Retrieving the data
In this section we read in the data and put it into an appropriate data structure. These 'data' are modeled weather forecasts for individual weather stations across the United States. (I put quotes around 'data' because these are model output, not actual observations.) The file we will read contains the forecasts for a single target day (April 22, 2014), made from 0 to 7 days before it, where day 0 is the forecast made on April 22 itself:
# Read the file; the with block closes it automatically
filename = 'target_day_20140422.dat'
with open(filename, 'r') as f:
    contents = f.readlines()
Where contents looks like this:
['Lat, Lon, days_out, MaxT, MinT \n',
'38.576698 -92.173523 0 18.71 6.97\n',
'38.576698 -92.173523 1 21.03 8.7\n',
'38.576698 -92.173523 2 20.67 9.72\n',
'38.576698 -92.173523 3 19.01 7.23\n',
'38.576698 -92.173523 4 22.08 9.07\n',
'38.576698 -92.173523 5 21.68 9.53\n',
'38.576698 -92.173523 6 22.33 10.22\n',
'38.576698 -92.173523 7 16.18 12.14\n',
'34.154179 -117.344208 0 17.37 6.16\n',
'34.154179 -117.344208 1 19.66 7.48\n',
'34.154179 -117.344208 2 21.24 6.27\n',
'34.154179 -117.344208 3 21.71 5.5\n',
'34.154179 -117.344208 4 18.34 8.88\n', ...]
A couple of things to note here: we have a list of strings, and each string ends with '\n'. This newline character marks the end of a line in the file, and it has to be accounted for when we convert the data into a usable form. The first element of the list is a header line, which we will skip.
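As a minimal sketch of what that clean-up can look like, the snippet below takes one line copied from the listing above and shows two common ways to deal with the trailing newline. The variable names are only illustrative.

```python
# One raw line, copied from the contents listing above
raw_line = '38.576698 -92.173523 0 18.71 6.97\n'

# Option 1: strip the newline explicitly, then split on single spaces
fields_a = raw_line.rstrip('\n').split(' ')

# Option 2: split() with no argument splits on any run of whitespace,
# so the trailing newline is discarded automatically
fields_b = raw_line.split()

print(fields_a)
print(fields_b)

# float() ignores surrounding whitespace, so converting the raw
# last field also works without stripping it first
min_t = float('6.97\n')
print(min_t)
```

Both options give the same five fields for this line. Option 2 is a little more forgiving if the file ever contains extra spaces or tabs between columns.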
Let's make a dictionary in which (lat, lon) tuples are the keys. The values are themselves dictionaries, keyed by the number of days out, and each of those entries holds the MaxT and MinT values:
forecast_dict = {}
for line in range(1, len(contents)):   # start at 1 to skip the header line
    line_split = contents[line].split(' ')
    try:
        forecast_dict[line_split[0], line_split[1]][line_split[2]] = {'MaxT':float(line_split[3]),
                                                                      'MinT':float(line_split[4].strip())}
    except KeyError:
        # First time we see this station: create its inner dictionary
        forecast_dict[line_split[0], line_split[1]] = {}
        forecast_dict[line_split[0], line_split[1]][line_split[2]] = {'MaxT':float(line_split[3]),
                                                                      'MinT':float(line_split[4].strip())}
Here forecast_dict looks like this:
{('19.068609', '-155.764999'): {'0': {'MaxT': 25.67, 'MinT': 24.45},
'1': {'MaxT': 25.88, 'MinT': 24.66},
'2': {'MaxT': 25.17, 'MinT': 24.49},
'3': {'MaxT': 25.67, 'MinT': 24.37},
'4': {'MaxT': 25.35, 'MinT': 23.76},
'5': {'MaxT': 24.57, 'MinT': 23.27},
'6': {'MaxT': 24.26, 'MinT': 23.33},
'7': {'MaxT': 24.71, 'MinT': 23.78}},
('19.43083', '-155.237778'): {'0': {'MaxT': 25.38, 'MinT': 23.41},
'1': {'MaxT': 25.39, 'MinT': 22.47},
'2': {'MaxT': 24.77, 'MinT': 23.35},
'3': {'MaxT': 25.38, 'MinT': 22.45},
'4': {'MaxT': 24.36, 'MinT': 22.5},
'5': {'MaxT': 23.92, 'MinT': 22.57},
'6': {'MaxT': 23.21, 'MinT': 22.45},
'7': {'MaxT': 23.56, 'MinT': 22.68}},...
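The same nested structure can also be built without try/except. The sketch below uses dict.setdefault on a few sample lines taken from the contents listing above; sample_lines and sample_dict are illustrative names, not part of the original script.

```python
# A small sample in the same format as the file (header plus data lines)
sample_lines = ['Lat, Lon, days_out, MaxT, MinT \n',
                '38.576698 -92.173523 0 18.71 6.97\n',
                '38.576698 -92.173523 1 21.03 8.7\n',
                '34.154179 -117.344208 0 17.37 6.16\n']

sample_dict = {}
for line in sample_lines[1:]:          # skip the header line
    lat, lon, days_out, max_t, min_t = line.split()
    # setdefault returns the station's inner dict, creating it if needed
    station = sample_dict.setdefault((lat, lon), {})
    station[days_out] = {'MaxT': float(max_t), 'MinT': float(min_t)}

print(sample_dict[('38.576698', '-92.173523')]['1'])
```

Unpacking the split line into named variables also makes it easier to see which column is which than indexing into line_split.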
So now, for each site (identified by its latitude and longitude), we have the maximum temperature (MaxT) and minimum temperature (MinT) from each forecast, from the one made on the day itself (day '0') to the one made 7 days prior. It's easy to retrieve the stations (and hence their latitudes and longitudes) by typing:
forecast_dict.keys()
which gives:
[('37.224239', '-95.708313'),
('27.53587', '-82.561211'),
('32.709301', '-96.008301'),
('42.09808', '-88.28286'),
('36.424229', '-89.057007'),
('36.98801', '-121.956627'),
('43.02496', '-108.380096'),
('41.802601', '-71.88591'),
('37.99548', '-122.332748'),
('43.416679', '-86.35701'),
('41.85371', '-71.758118'),...
And you can extract the values for any one station by using its key, e.g.:
forecast_dict[('40.51218', '-111.47435')]
gives you:
{'0': {'MaxT': 17.45, 'MinT': 2.04},
'1': {'MaxT': 17.95, 'MinT': 5.84},
'2': {'MaxT': 18.33, 'MinT': 7.99},
'3': {'MaxT': 18.16, 'MinT': 7.7},
'4': {'MaxT': 13.75, 'MinT': 3.62},
'5': {'MaxT': 14.58, 'MinT': 9.23},
'6': {'MaxT': 14.58, 'MinT': 9.23},
'7': {'MaxT': 13.08, 'MinT': -2.99}}
The output above shows the forecast MaxT and MinT values from 0 to 7 days out for the station at latitude 40.51218, longitude -111.47435 (that is, 40.51218 N, 111.47435 W).
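One thing to keep in mind is that the keys are strings, exactly as they appeared in the file, so a lookup with numbers will not find the station. The sketch below illustrates this on a small hand-built dictionary holding two of the entries from the output above; sample_dict and get_maxt are hypothetical names, not part of the original script.

```python
# A tiny stand-in for forecast_dict with two of the values shown above
station_key = ('40.51218', '-111.47435')
sample_dict = {station_key: {'0': {'MaxT': 17.45, 'MinT': 2.04},
                             '7': {'MaxT': 13.08, 'MinT': -2.99}}}

def get_maxt(data, station, days_out):
    """Hypothetical helper: accept days_out as an int and convert it to the string key."""
    return data[station][str(days_out)]['MaxT']

# A tuple of floats does not match the tuple of strings stored as the key
found_with_floats = (40.51218, -111.47435) in sample_dict
print(found_with_floats)

# The string key works, directly or through the helper
print(sample_dict[station_key]['0']['MaxT'])
print(get_maxt(sample_dict, station_key, 7))
```

Keeping the coordinates as strings avoids floating-point comparison problems in the keys, at the cost of needing the exact text from the file for every lookup.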
Preparing the data for plotting
The plot will be MaxT vs. days out for this one station. It will be a simple plot, but first we need to make some lists that matplotlib can use to do the plotting. We will need a list of days and a list of the corresponding MaxT values:
# First retrieve the days; list() makes this work whether keys() returns a list or a view
day_keys = list(forecast_dict[('40.51218', '-111.47435')].keys())
day_keys gives you:
['1', '0', '3', '2', '5', '4', '7', '6']
Dictionary keys are not guaranteed to come back in alphabetical or numerical order, so let's sort them:
day_keys.sort()
This sorts the list in place, after which day_keys is:
['0', '1', '2', '3', '4', '5', '6', '7']
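Sorting the keys as strings works here because every key is a single digit. If a file held forecasts 10 or more days out, string sorting would put '10' before '2'. A short sketch of the difference, using a made-up list of keys:

```python
# Made-up keys for illustration; the file in this section only goes up to '7'
example_keys = ['1', '0', '10', '2', '7']

# Sorting as strings compares character by character
string_sorted = sorted(example_keys)
print(string_sorted)     # ['0', '1', '10', '2', '7']

# Sorting with key=int compares the numeric values instead
numeric_sorted = sorted(example_keys, key=int)
print(numeric_sorted)    # ['0', '1', '2', '7', '10']
```

The same key=int argument can be passed to the in-place list sort() method.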
Matplotlib plots one list of values against another, so let's make our lists:
# First define the variables as empty lists
day_list = []
maxt_list = []
# Then populate the lists
for day_key in day_keys:
    day_list.append(float(day_key))
    maxt_list.append(forecast_dict[('40.51218', '-111.47435')][day_key]['MaxT'])
Now each element in one list corresponds to the element at the same index in the other list. For example, day_list[0] corresponds to maxt_list[0].
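The same two lists can be built more compactly with list comprehensions, and zip makes the index-by-index pairing explicit. This is a sketch on a small hand-built dictionary holding three of the day entries shown earlier for this station; station_forecast and sorted_keys are illustrative names.

```python
# Three of the day entries shown earlier for station ('40.51218', '-111.47435')
station_forecast = {'1': {'MaxT': 17.95, 'MinT': 5.84},
                    '0': {'MaxT': 17.45, 'MinT': 2.04},
                    '2': {'MaxT': 18.33, 'MinT': 7.99}}

# Sort the day keys by their numeric value, then build both lists in that order
sorted_keys = sorted(station_forecast, key=int)
day_list = [float(k) for k in sorted_keys]
maxt_list = [station_forecast[k]['MaxT'] for k in sorted_keys]

# zip pairs up the elements that share an index
for day, maxt in zip(day_list, maxt_list):
    print(day, maxt)
```

Because both lists are built from the same sorted_keys, they are guaranteed to have the same length and the same ordering.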