diff --git a/.nojekyll b/.nojekyll
deleted file mode 100644
index e69de29..0000000
diff --git a/README.md b/README.md
index 9475786..e839f2e 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,14 @@ web-data-python
===============
Introduction to getting and publishing data over the web for Python programmers.
+Please see for a rendered version of this material,
+[the lesson template documentation][lesson-example]
+for instructions on formatting, building, and submitting material,
+or run `make` in this directory for a list of helpful commands.
-> Please see [https://github.com/swcarpentry/lesson-example](https://github.com/swcarpentry/lesson-example)
-> for instructions on formatting, building, and submitting lessons,
-> or run `make` in this directory for a list of helpful commands.
+Maintainer(s):
+
+* [Greg Wilson][wilson-greg]
+
+[lesson-example]: https://swcarpentry.github.com/lesson-example/
+[wilson-greg]: http://software-carpentry.org/team/#wilson_g
diff --git a/_episodes/01-getdata.md b/_episodes/01-getdata.md
index cf17d7b..c4b2f30 100644
--- a/_episodes/01-getdata.md
+++ b/_episodes/01-getdata.md
@@ -1,10 +1,14 @@
---
title: "Getting Data"
-minutes: 15
+teaching: 15
+exercises: 0
+questions:
+- "FIXME"
+objectives:
+- "Write Python programs to download data sets using simple REST APIs."
+keypoints:
+- "FIXME"
---
-> ## Learning Objectives {.objectives}
->
-> * Write Python programs to download data sets using simple REST APIs.
A growing number of organizations make data sets available on the web in a style called [REST](reference.html#rest),
which stands for REpresentational State Transfer.
@@ -41,6 +45,7 @@ For example, if we want the average annual temperature in Canada as a CSV file,
~~~
http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv
~~~
+{: .source}
If we paste that URL into a browser, it displays:
@@ -54,8 +59,9 @@ year,data
2008,-7.2008957862854
2009,-6.997011661529541
~~~
+{: .source}
-> ## Behind the Scenes {.callout}
+> ## Behind the Scenes
>
> This particular data set might be stored in a file on the World Bank's server,
> or that server might:
@@ -70,6 +76,7 @@ year,data
> As long as the World Bank doesn't change its URLs,
> we don't need to know which method it's using
> and it can switch back and forth between them without breaking our programs.
+{: .callout}
If we only wanted to look at data for a couple of countries,
we could just download those files one by one.
@@ -81,39 +88,43 @@ It is clumsy to use, though, so many people (including us) prefer
a newer library called [Requests](http://docs.python-requests.org).
To install it, run the command:
-~~~ {.bash}
+~~~
$ pip install requests
~~~
+{: .bash}
-> ## Installing with Pip {.callout}
+> ## Installing with Pip
>
> Note that `pip` is a program in its own right,
> so the command above must be run in the shell,
> and *not* from within Python itself.
+{: .callout}
If Requests is not already installed,
`pip`'s output is:
-~~~ {.output}
+~~~
Downloading/unpacking requests
Downloading requests-2.7.0-py2.py3-none-any.whl (470kB): 470kB downloaded
Installing collected packages: requests
Successfully installed requests
Cleaning up...
~~~
+{: .output}
If it's already present,
the output will be:
-~~~ {.output}
+~~~
Requirement already satisfied (use --upgrade to upgrade): requests in /Users/swc/anaconda/lib/python2.7/site-packages
Cleaning up...
~~~
+{: .output}
Either way,
we can now get the data we want like this:
-~~~ {.python}
+~~~
import requests
url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'
response = requests.get(url)
@@ -123,7 +134,8 @@ else:
print('First 100 characters of data are')
print(response.text[:100])
~~~
-~~~ {.output}
+{: .python}
+~~~
First 100 characters of data are
year,data
1901,-7.67241907119751
@@ -131,6 +143,7 @@ year,data
1903,-7.910782814025879
1904,-8.15572929382
~~~
+{: .output}
The first line imports the `requests` library.
The second defines the URL for the data we want;
@@ -163,27 +176,31 @@ the most common are:
if we get anything else, the response probably doesn't contain actual data
(though it might contain an error message).
-> ## Some People Don't Follow the Rules {.callout}
+> ## Some People Don't Follow the Rules
>
> Unfortunately, some sites don't return a meaningful status code.
> Instead, they return 200 for *everything*,
> then put an error message (if appropriate) in the text of the response.
> This works when the result is being displayed to a human being,
> but fails miserably when the "reader" is a program that can't actually read.
+{: .callout}
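
One hedged way to cope with sites like this is to check both the status code and the body before trusting a response. The sketch below is not part of the lesson's code: `looks_like_data` is a hypothetical helper, and treating an empty or whitespace-only body as "no data" is an assumption that may not hold for every site.

```python
def looks_like_data(status_code, text):
    """Return True only if the status is 200 and the body is non-empty.

    Hypothetical helper: 'empty body means no data' is an assumption.
    """
    return status_code == 200 and bool(text.strip())


# With Requests, this would be called as
# looks_like_data(response.status_code, response.text).
print(looks_like_data(200, 'year,data\n1901,-7.67\n'))  # True
print(looks_like_data(200, ''))                         # False
print(looks_like_data(404, 'Not Found'))                # False
```

Keeping the check in one small function means the rule can be tightened later without touching the calling code.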
-> ## Defining REST API {.challenge}
+> ## Defining REST API
>
> A REST API is:
> 1. A data format.
> 2. A way of accessing data via a URL.
> 3. Less work for the server.
> 4. Only accessible via Python libraries like Requests.
+{: .challenge}
-> ## Get Data for Guatemala {.challenge}
+> ## Get Data for Guatemala
>
> Modify the little program above to fetch temperatures for Guatemala.
+{: .challenge}
-> ## How Hot is Afghanistan? {.challenge}
+> ## How Hot is Afghanistan?
>
> Read the [documentation](http://data.worldbank.org/developers/climate-data-api) for the Climate Data API,
> and then write URLs to find the annual average temperature for Afghanistan between 1980 and 1999.
+{: .challenge}
diff --git a/_episodes/02-csv.md b/_episodes/02-csv.md
index 2fb237c..6193c6a 100644
--- a/_episodes/02-csv.md
+++ b/_episodes/02-csv.md
@@ -1,11 +1,15 @@
---
title: "Handling CSV Data"
-minutes: 15
+teaching: 15
+exercises: 0
+questions:
+- "FIXME"
+objectives:
+- "Parse CSV data using the `csv` library."
+- "Test a program that parses CSV using multiline strings."
+keypoints:
+- "FIXME"
---
-> ## Learning Objectives {.objectives}
->
-> * Parse CSV data using the `csv` library.
-> * Test a program that parses CSV using multiline strings.
Our little program gets the data we want,
but returns it as one long character string rather than as a list of numbers.
@@ -28,33 +32,38 @@ we create a file called `test01.csv` that contains the following three lines:
1902,45.6
1903,78.9
~~~
+{: .source}
It's easy to read this file line by line and (for example) report the length of each line:
-~~~ {.python}
+~~~
with open('test01.csv', 'r') as reader:
for line in reader:
print(len(line))
~~~
-~~~ {.output}
+{: .python}
+~~~
10
10
10
~~~
+{: .output}
We can also split each line on commas to turn each one into a list of string fragments:
-~~~ {.python}
+~~~
with open('test01.csv', 'r') as reader:
for line in reader:
fields = line.split(',')
print(fields)
~~~
-~~~ {.output}
+{: .python}
+~~~
['1901', '12.3\n']
['1902', '45.6\n']
['1903', '78.9\n']
~~~
+{: .output}
The dates are correct,
but the values all end with `\n`.
@@ -63,17 +72,19 @@ the newline character at the end of each line.
To get rid of it,
we should strip leading and trailing whitespace from each line before splitting it on commas:
-~~~ {.python}
+~~~
with open('test01.csv', 'r') as reader:
for line in reader:
fields = line.strip().split(',')
print(fields)
~~~
-~~~ {.output}
+{: .python}
+~~~
['1901', '12.3']
['1902', '45.6']
['1903', '78.9']
~~~
+{: .output}
Now let's have a look at how we could parse the data using standard Python libraries instead.
The library we'll use is called `csv`.
@@ -81,7 +92,7 @@ It doesn't read data itself:
instead, it takes the lines read by something else and turns them into lists of values by splitting on commas.
Here's one way to use it:
-~~~ {.python}
+~~~
import csv
with open('test01.csv', 'r') as raw:
@@ -89,11 +100,13 @@ with open('test01.csv', 'r') as raw:
for record in cooked:
print(record)
~~~
-~~~ {.ouptut}
+{: .python}
+~~~
['1901', '12.3']
['1902', '45.6']
['1903', '78.9']
~~~
+{: .output}
Here,
`raw` reads data in the normal way,
@@ -102,7 +115,7 @@ that takes a line of text and turns it into a list of fields.
We can equally well give a `csv.reader` a list of strings rather than a file:
-~~~ {.python}
+~~~
import csv
with open('test01.csv', 'r') as raw:
@@ -111,11 +124,13 @@ cooked = csv.reader(lines)
for record in cooked:
print(record)
~~~
-~~~ {.output}
+{: .python}
+~~~
['1901', '12.3']
['1902', '45.6']
['1903', '78.9']
~~~
+{: .output}
Using the `csv` library doesn't seem any simpler than just splitting strings,
but look at what happens when we have data like this:
@@ -125,14 +140,16 @@ but look at what happens when we have data like this:
"Spence, Frances Bilas",1922,2012
"Teitelbaum,Ruth Lichterman",1924,1986
~~~
+{: .source}
With simple string splitting, our output is:
-~~~ {.output}
+~~~
['"Meltzer', ' Marlyn Wescoff"', '1922', '2008']
['"Spence', ' Frances Bilas"', '1922', '2012']
['"Teitelbaum', 'Ruth Lichterman"', '1924', '1986']
~~~
+{: .output}
The double quotes are still there,
and the field containing each person's name has been split into pieces.
@@ -140,11 +157,12 @@ If we use the `csv` library,
on the other hand,
the output is:
-~~~ {.output}
+~~~
['Meltzer, Marlyn Wescoff', '1922', '2008']
['Spence, Frances Bilas', '1922', '2012']
['Teitelbaum,Ruth Lichterman', '1924', '1986']
~~~
+{: .output}
because the library understands how to handle text fields containing commas
(and a lot more).
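
The difference can be reproduced without any files, since `csv.reader` accepts a list of strings. This is a minimal sketch; the first input line is reconstructed from the outputs shown above, and the variable names are illustrative only.

```python
import csv

lines = [
    '"Meltzer, Marlyn Wescoff",1922,2008',
    '"Spence, Frances Bilas",1922,2012',
    '"Teitelbaum,Ruth Lichterman",1924,1986',
]

# Naive splitting breaks the quoted name into two pieces and keeps the quotes.
naive = [line.split(',') for line in lines]

# csv.reader treats each quoted name as a single field.
parsed = list(csv.reader(lines))

print(naive[0])
print(parsed[0])
```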
@@ -158,8 +176,9 @@ year,data
1901,-7.67241907119751
1902,-7.862711429595947
1903,-7.910782814025879
-...
+⋮ ⋮ ⋮
~~~
+{: .source}
We have to break this into lines before giving it to `csv.reader`,
and we can do that by splitting the string on the same `\n` escape sequence
@@ -167,43 +186,49 @@ we encountered a few moments ago.
To see how this works,
let's read `test01.csv` into memory and split it into pieces:
-~~~ {.python}
+~~~
with open('test01.csv', 'r') as reader:
data = reader.read()
lines = data.split('\n')
print(lines)
~~~
-~~~ {.output}
+{: .python}
+~~~
['1901,12.3', '1902,45.6', '1903,78.9', '']
~~~
+{: .output}
That's *almost* right, but why is there an empty string at the end of the list?
The answer is that the last line of the file ends in a newline,
so Python does the same thing it does in the example below:
-~~~ {.python}
+~~~
fields = 'a-b-'.split('-')
print(fields)
~~~
-~~~ {.output}
+{: .python}
+~~~
['a', 'b', '']
~~~
+{: .output}
The solution once again is to strip leading and trailing whitespace before splitting:
-~~~ {.python}
+~~~
with open('test01.csv', 'r') as reader:
data = reader.read()
lines = data.strip().split('\n')
print(lines)
~~~
-~~~ {.output}
+{: .python}
+~~~
['1901,12.3', '1902,45.6', '1903,78.9']
~~~
+{: .output}
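
As an aside, Python's built-in `str.splitlines` method is an alternative to stripping and then splitting. The sketch below compares the three approaches on an in-memory string that mirrors the contents of `test01.csv`; it is offered as an option, not as the lesson's chosen method.

```python
data = '1901,12.3\n1902,45.6\n1903,78.9\n'

# Plain split leaves an empty string after the final newline.
print(data.split('\n'))

# Stripping first removes the trailing newline before splitting.
print(data.strip().split('\n'))

# splitlines() does not produce the trailing empty string.
print(data.splitlines())
```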
Putting this all together, we can get data for Canada like this:
-~~~ {.python}
+~~~
import requests
import csv
@@ -216,20 +241,22 @@ else:
for record in wrapper:
print(record)
~~~
-~~~ {.output}
+{: .python}
+~~~
['year', 'data']
['1901', '-7.67241907119751']
['1902', '-7.862711429595947']
['1903', '-7.910782814025879']
['1904', '-8.155729293823242']
['1905', '-7.547311305999756']
-...
+⋮ ⋮ ⋮
~~~
+{: .output}
That looks like progress,
so let's convert the data from strings to the numbers we actually want:
-~~~ {.python}
+~~~
import requests
import csv
@@ -244,25 +271,28 @@ else:
value = float(record[1])
print(year, value)
~~~
-~~~ {.error}
+{: .python}
+~~~
Traceback (most recent call last):
File "api-with-naive-converting.py", line 11, in <module>
year = int(record[0])
ValueError: invalid literal for int() with base 10: 'year'
~~~
+{: .error}
The error occurs because the first line of data is:
~~~
year,data
~~~
+{: .source}
When we try to convert the string `'year'` to an integer,
Python quite rightly complains.
The fix is straightforward:
we just need to ignore lines that start with the word `year`:
-~~~ {.python}
+~~~
import requests
import csv
@@ -279,7 +309,8 @@ else:
value = float(record[1])
print(year, value)
~~~
-~~~ {.output}
+{: .python}
+~~~
1901 -7.67241907119751
1902 -7.862711429595947
1903 -7.910782814025879
@@ -287,8 +318,9 @@ else:
1905 -7.547311305999756
...
~~~
+{: .output}
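
The parsing steps can be tried without a network connection by separating them from the download. The sketch below assumes the response body has the `year,data` layout shown earlier; `parse_annual_csv` is a hypothetical name, and skipping blank records is an added assumption so that an empty body yields an empty list.

```python
import csv


def parse_annual_csv(text):
    """Turn 'year,data' CSV text into a list of [year, value] pairs."""
    results = []
    for record in csv.reader(text.strip().split('\n')):
        # Skip blank lines and the header line that starts with 'year'.
        if not record or record[0] == 'year':
            continue
        results.append([int(record[0]), float(record[1])])
    return results


sample = 'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n'
print(parse_annual_csv(sample))
```

With Requests, the same function could be given `response.text`.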
-> ## The Makeup of CSV Files {.challenge}
+> ## The Makeup of CSV Files
>
> CSV Files need to be separated into:
>
@@ -296,3 +328,4 @@ else:
> 2. Rows (lines) then records (fields).
> 3. Newline characters.
> 4. Commas and other characters.
+{: .challenge}
diff --git a/_episodes/03-generalize.md b/_episodes/03-generalize.md
index 97b4eab..e485905 100644
--- a/_episodes/03-generalize.md
+++ b/_episodes/03-generalize.md
@@ -1,11 +1,15 @@
---
title: "Generalizing and Handling Errors"
-minutes: 15
+teaching: 15
+exercises: 0
+questions:
+- "FIXME"
+objectives:
+- "Turn a script into a function."
+- "Make a function more robust by explicitly handling errors."
+keypoints:
+- "FIXME"
---
-> ## Learning Objectives {.objectives}
->
-> * Turn a script into a function.
-> * Make a function more robust by explicitly handling errors.
Now that we know how to get the data for Canada,
let's create a function that will do the same thing for an arbitrary country.
@@ -17,7 +21,7 @@ The steps are simple:
The resulting function looks like:
-~~~ {.python}
+~~~
def annual_mean_temp(country):
'''Get the annual mean temperature for a country given its 3-letter ISO code (such as "CAN").'''
url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/' + country + '.csv'
@@ -34,28 +38,33 @@ def annual_mean_temp(country):
results.append([year, value])
return results
~~~
+{: .python}
This works:
-~~~ {.python}
+~~~
canada = annual_mean_temp('CAN')
print('first three entries for Canada:', canada[:3])
~~~
-~~~ {.output}
+{: .python}
+~~~
first three entries for Canada: [[1901, -7.67241907119751], [1902, -7.862711429595947], [1903, -7.910782814025879]]
~~~
+{: .output}
However,
there's a problem.
Look what happens when we pass in an invalid country identifier:
-~~~ {.python}
+~~~
latveria = annual_mean_temp('LTV')
print('first three entries for Latveria:', latveria[:3])
~~~
-~~~ {.output}
+{: .python}
+~~~
first three entries for Latveria: []
~~~
+{: .output}
Latveria doesn't exist,
so why is our function returning an empty list rather than printing an error message?
@@ -69,7 +78,7 @@ and returned `None`
So if the response code was 200 and there was no data, that would explain what we're seeing.
Let's check:
-~~~ {.python}
+~~~
def annual_mean_temp(country):
'''Get the annual mean temperature for a country given its 3-letter ISO code (such as "CAN").'''
url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/' + country + '.csv'
@@ -92,12 +101,14 @@ def annual_mean_temp(country):
latveria = annual_mean_temp('LTV')
print('number of records for Latveria:', len(latveria))
~~~
-~~~ {.output}
+{: .python}
+~~~
url used is http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/LTV.csv
response code: 200
length of data: 0
number of records for Latveria: 0
~~~
+{: .output}
In other words,
the World Bank is always saying,
@@ -107,7 +118,7 @@ After a bit more experimenting, we discover that the site *always* returns a 200
The only way to tell if there's real data or not is to check if `response.text` is empty.
Here's the updated function:
-~~~ {.python}
+~~~
def annual_mean_temp(country):
'''
Get the annual mean temperature for a country given its 3-letter ISO code (such as "CAN").
@@ -128,17 +139,19 @@ def annual_mean_temp(country):
print('number of records for Canada:', len(annual_mean_temp('CAN')))
print('number of records for Latveria:', len(annual_mean_temp('LTV')))
~~~
-~~~ {.output}
+{: .python}
+~~~
number of records for Canada: 109
number of records for Latveria: 0
~~~
+{: .output}
Now that we can get surface temperatures for different countries,
we can write a function to compare those values.
(We'll jump straight into writing a function because by now it's clear that's what we're eventually going to do anyway.)
Here's our first attempt:
-~~~ {.python}
+~~~
def diff_records(left, right):
'''Given lists of [year, value] pairs, return list of [year, difference] pairs.'''
num_years = len(left)
@@ -150,13 +163,15 @@ def diff_records(left, right):
results.append([left_year, difference])
return results
~~~
+{: .python}
Here, we're using the number of entries in `left` (which we find with `len(left)`) to control our loop.
The expression:
-~~~ {.python}
+~~~
for i in range(num_years):
~~~
+{: .python}
runs `i` from 0 to `num_years-1`, which corresponds exactly to the legal indices of `left`.
Inside the loop we unpack the left and right years and values from the list entries,
@@ -165,20 +180,22 @@ which we return at the end.
To see if this function works, we can run a couple of tests on made-up data:
-~~~ {.python}
+~~~
print('one record:', diff_records([[1900, 1.0]],
[[1900, 2.0]]))
print('two records:', diff_records([[1900, 1.0], [1901, 10.0]],
[[1900, 2.0], [1901, 20.0]]))
~~~
-~~~ {.output}
+{: .python}
+~~~
one record: [[1900, -1.0]]
two records: [[1900, -1.0], [1901, -10.0]]
~~~
+{: .output}
That looks pretty good—but what about these cases?
-~~~ {.python}
+~~~
print('mis-matched years:', diff_records([[1900, 1.0]],
[[1999, 2.0]]))
print('left is shorter', diff_records([[1900, 1.0]],
@@ -186,7 +203,8 @@ print('left is shorter', diff_records([[1900, 1.0]],
print('right is shorter', diff_records([[1900, 1.0], [1901, 2.0]],
[[1900, 10.0]]))
~~~
-~~~ {.error}
+{: .python}
+~~~
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in ()
@@ -205,6 +223,7 @@ IndexError: list index out of rangemis-matched years: [[1900, -1.0]]
left is shorter [[1900, -9.0]]
right is shorter
~~~
+{: .error}
The first test gives us an answer even though the years didn't match:
we get a result, but it's meaningless.
@@ -218,7 +237,7 @@ because they are [silent failures](reference.html#silent-failure):
the function does the wrong thing, but doesn't indicate that in any way.
Let's fix that:
-~~~ {.python}
+~~~
def diff_records(left, right):
'''
Given lists of [year, value] pairs, return list of [year, difference] pairs.
@@ -237,27 +256,31 @@ def diff_records(left, right):
results.append([left_year, difference])
return results
~~~
+{: .python}
Do our "good" tests pass?
-~~~ {.python}
+~~~
print('one record:', diff_records([[1900, 1.0]],
[[1900, 2.0]]))
print('two records:', diff_records([[1900, 1.0], [1901, 10.0]],
[[1900, 2.0], [1901, 20.0]]))
~~~
-~~~ {.output}
+{: .python}
+~~~
one record: [[1900, -1.0]]
two records: [[1900, -1.0], [1901, -10.0]]
~~~
+{: .output}
What about the three tests that we now expect to fail?
-~~~ {.python}
+~~~
print('mis-matched years:', diff_records([[1900, 1.0]],
[[1999, 2.0]]))
~~~
-~~~ {.error}
+{: .python}
+~~~
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in ()
@@ -273,12 +296,14 @@ AssertionError Traceback (most recent call last)
AssertionError: Record 0 is for different years: 1900 vs 1999mis-matched years:
~~~
+{: .error}
-~~~ {.python}
+~~~
print('left is shorter', diff_records([[1900, 1.0]],
[[1900, 10.0], [1901, 20.0]]))
~~~
-~~~ {.error}
+{: .python}
+~~~
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in ()
@@ -294,11 +319,13 @@ AssertionError Traceback (most recent call last)
AssertionError: Inputs have different lengths. left is shorter
~~~
-~~~ {.python}
+{: .error}
+~~~
print('right is shorter', diff_records([[1900, 1.0], [1901, 2.0]],
[[1900, 10.0]]))
~~~
-~~~ {.error}
+{: .python}
+~~~
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
in ()
@@ -314,10 +341,11 @@ AssertionError Traceback (most recent call last)
AssertionError: Inputs have different lengths. right is shorter
~~~
+{: .error}
Excellent: the assertions we've added will now alert us if we try to work with badly-formatted or inconsistent data.
->## Error Handling {.challenge}
+> ## Error Handling
>
> Python scripts should have error handling code because:
>
@@ -325,8 +353,9 @@ Excellent: the assertions we've added will now alert us if we try to work with b
> 2. Functions can return errors.
> 3. One should never trust the data provided is what is expected.
> 4. A Python script would stop on an error, so the task wouldn't be accomplished.
+{: .challenge}
-> ## When to Complain? {.challenge}
+> ## When to Complain?
>
> We have actually just committed the same mistake as the World Bank:
> if someone gives `annual_mean_temp` an invalid country identifier,
@@ -335,23 +364,27 @@ Excellent: the assertions we've added will now alert us if we try to work with b
> so the caller has to somehow know to look for that.
> Should it use an assertion to fail if it doesn't get data?
> Why or why not?
+{: .challenge}
-> ## Enumerating {.challenge}
+> ## Enumerating
>
> Python includes a function called `enumerate` that's often used in `for` loops.
> This loop:
>
-> ~~~ {.python}
+> ~~~
> for (i, c) in enumerate('abc'):
> print(i, '=', c)
> ~~~
+> {: .python}
>
> prints:
>
-> ~~~ {.output}
+> ~~~
> 0 = a
> 1 = b
> 2 = c
> ~~~
+> {: .output}
>
> Rewrite `diff_records` to use `enumerate`.
+{: .challenge}
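
One possible answer to the `enumerate` exercise is sketched below. It is not the lesson's reference solution: the assertion messages are copied from the error output shown earlier, but the exact body of the original function is not visible here, so the structure is an assumption.

```python
def diff_records(left, right):
    """Given lists of [year, value] pairs, return list of [year, difference] pairs."""
    assert len(left) == len(right), 'Inputs have different lengths.'
    results = []
    # enumerate supplies the index and the left-hand record together.
    for i, (left_year, left_value) in enumerate(left):
        right_year, right_value = right[i]
        assert left_year == right_year, \
            'Record {0} is for different years: {1} vs {2}'.format(i, left_year, right_year)
        results.append([left_year, left_value - right_value])
    return results


print('two records:', diff_records([[1900, 1.0], [1901, 10.0]],
                                   [[1900, 2.0], [1901, 20.0]]))
```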
diff --git a/_episodes/04-visualize.md b/_episodes/04-visualize.md
index 1abfd25..cc0ef99 100644
--- a/_episodes/04-visualize.md
+++ b/_episodes/04-visualize.md
@@ -1,15 +1,19 @@
---
title: "Visualization"
-minutes: 15
+teaching: 15
+exercises: 0
+questions:
+- "FIXME"
+objectives:
+- "Construct a simple visualization using pyplot."
+keypoints:
+- "FIXME"
---
-> ## Learning Objectives {.objectives}
->
-> * Construct a simple visualization using pyplot.
Long lists of numbers are not particularly useful,
but we now have the tools we need to visualize the temperature differences between countries:
-~~~ {.python}
+~~~
from matplotlib import pyplot as plt
australia = annual_mean_temp('AUS')
@@ -18,6 +22,7 @@ diff = diff_records(australia, canada)
plt.plot(diff)
plt.show()
~~~
+{: .python}
![First Plot](fig/plot-01.png)
@@ -26,17 +31,19 @@ pyplot has interpreted the list of pairs returned by `annual_mean_temp`
as two corresponding curves rather than as the (x,y) coordinates for one curve.
Let's convert our list of (year, difference) pairs into a NumPy array:
-~~~ {.python}
+~~~
import numpy as np
d = np.array(diff)
~~~
+{: .python}
and then plot the first column against the second:
-~~~ {.python}
+~~~
plt.plot(d[:, 0], d[:, 1])
plt.show()
~~~
+{: .python}
![Second Plot](fig/plot-02.png)
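
The same separation into columns can be sketched in plain Python with `zip`, which may help show what the NumPy slicing is doing. This is an alternative technique rather than the lesson's approach, and the numbers below are made up for illustration.

```python
# Made-up (year, difference) pairs standing in for the output of diff_records.
diff = [[1900, -1.0], [1901, -10.0], [1902, -5.5]]

# zip(*diff) transposes the list of pairs into one tuple per column.
years, differences = zip(*diff)

print(years)
print(differences)
# plt.plot(years, differences) would then draw a single curve.
```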
@@ -45,7 +52,8 @@ At this point, if we wanted to do some real science,
it would be time to use a curve-fitting library
or calculate some meaningful statistics.
-> ## Changing Visualizations {.challenge}
+> ## Changing Visualizations
>
> Modify the plotting commands so that the Y-axis scale runs from 0 to 32.
> Do you think this gives you a more accurate or less accurate view of this data?
+{: .challenge}
diff --git a/_episodes/05-makedata.md b/_episodes/05-makedata.md
index 5b5c20d..3a4b95c 100644
--- a/_episodes/05-makedata.md
+++ b/_episodes/05-makedata.md
@@ -1,10 +1,14 @@
---
title: "Publishing Data"
-minutes: 15
+teaching: 15
+exercises: 0
+questions:
+- "FIXME"
+objectives:
+- "Write Python programs that share static data sets."
+keypoints:
+- "FIXME"
---
-> ## Learning Objectives {.objectives}
->
-> * Write Python programs that share static data sets.
We now have functions to download temperature data for different countries and find annual differences.
The next step is to share our findings with the world by publishing the data sets we generate.
@@ -17,7 +21,7 @@ To do this, we have to answer three questions:
The first question is the easiest to answer:
`diff_records` returns a list of (year, difference) pairs that we can write out as a CSV file:
-~~~ {.python}
+~~~
import csv
def save_records(filename, records):
@@ -26,18 +30,21 @@ def save_records(filename, records):
writer = csv.writer(raw)
writer.writerows(records)
~~~
+{: .python}
-> ## Lessons Learned {.callout}
+> ## Lessons Learned
>
> We use the `csv` library to write data
> for the same reason we use it to read:
> it correctly handles special cases (such as text containing commas).
+{: .callout}
Let's test it:
-~~~ {.python}
+~~~
save_records('temp.csv', [[1, 2], [3, 4]])
~~~
+{: .python}
If we then look in the file `temp.csv`, we find:
@@ -45,6 +52,7 @@ If we then look in the file `temp.csv`, we find:
1,2
3,4
~~~
+{: .source}
as desired.
@@ -79,7 +87,7 @@ Someone could, for example, call `save_records('aus+bra.csv', records)`.
To reduce the odds of this happening,
let's modify `save_records` to take country identifiers as parameters:
-~~~ {.python}
+~~~
import csv
def save_records(left, right, records):
@@ -89,30 +97,34 @@ def save_records(left, right, records):
writer = csv.writer(raw)
writer.writerows(records)
~~~
+{: .python}
We can now call it like this:
-~~~ {.python}
+~~~
save_records('AUS', 'BRA', [[1, 2], [3, 4]])
~~~
+{: .python}
and then check that the right output file has been created.
We are bound to have the country codes anyway (having used them to look up our data),
so this should seem natural to our users.
-> ## Deciding What to Check {.challenge}
+> ## Deciding What to Check
>
> Should `save_records` check that every record in its input has exactly two fields?
> Why or why not?
> What about country codes -
> should it contain a list of those that match actual countries
> and check that `left` and `right` are in that list?
+{: .challenge}
-> ## Setting Up Locally {.challenge}
+> ## Setting Up Locally
>
> Find out how to publish a file on your department's server.
+{: .challenge}
-> ## Published Data Consistency {.challenge}
+> ## Published Data Consistency
>
> It is important for the file names of published data to be consistent because:
>
@@ -120,3 +132,4 @@ so this should seem natural to our users.
> 2. You may not have access to your department's server to rename them.
> 3. The `csv` library requires it.
> 4. Programs can only process files and data correctly when they are.
+{: .challenge}
diff --git a/_episodes/06-findable.md b/_episodes/06-findable.md
index fd31a48..cceb939 100644
--- a/_episodes/06-findable.md
+++ b/_episodes/06-findable.md
@@ -1,10 +1,14 @@
---
title: "Making Data Findable"
-minutes: 15
+teaching: 15
+exercises: 0
+questions:
+- "FIXME"
+objectives:
+- "Make data sets more useful by providing metadata."
+keypoints:
+- "FIXME"
---
-> ## Learning Objectives {.objectives}
->
-> * Make data sets more useful by providing metadata.
It's not enough to tell people what the rule is for creating filenames,
since that doesn't tell them what data sets we've actually generated.
@@ -20,6 +24,7 @@ Here's the format we will use:
2014-05-27,AUS,CAN,AUS-CAN.csv
2014-05-28,BRA,CAN,BRA-CAN.csv
~~~
+{: .source}
The columns are the date the data set was generated,
the identifiers of the two countries being compared,
@@ -32,7 +37,7 @@ other people's programs shouldn't have to.
Here's a function that updates the index file every time we generate a new data file:
-~~~ {.python}
+~~~
import time
def update_index(index_filename, left, right):
@@ -56,6 +61,7 @@ def update_index(index_filename, left, right):
writer = csv.writer(raw)
writer.writerows(records)
~~~
+{: .python}
Let's test it.
If our index file contains:
@@ -66,12 +72,14 @@ If our index file contains:
2014-05-27,AUS,CAN,AUS-CAN.csv
2014-05-28,BRA,CAN,BRA-CAN.csv
~~~
+{: .source}
and we run:
-~~~ {.python}
+~~~
update_index('data/index.csv', 'TCD', 'CAN')
~~~
+{: .python}
then our index file now contains:
@@ -82,6 +90,7 @@ then our index file now contains:
2014-05-28,BRA,CAN,BRA-CAN.csv
2014-05-29,TCD,CAN,TCD-CAN.csv
~~~
+{: .source}
Now that all of this is in place,
it's easy for us—and other people—to do new and exciting things with our data.
@@ -89,7 +98,7 @@ For example,
we can easily write a small program that tells us what data sets include information about a particular country
*and* have been published since we last checked:
-~~~ {.python}
+~~~
def what_is_available(index_file, country, after):
'''What data files include a country and have been published since 'after'?'''
with open(index_file, 'r') as raw:
@@ -102,11 +111,13 @@ def what_is_available(index_file, country, after):
print(what_is_available('data/index.csv', 'BRA', '2014-05-27'))
~~~
-~~~ {.output}
+{: .python}
+~~~
['AUS-BRA.csv', 'BRA-CAN.csv']
~~~
+{: .output}
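
The filtering logic can be explored without an index file. The sketch below is an assumption-laden stand-in for the function above: `filter_index` is a hypothetical name, it takes index lines directly instead of a filename, and it treats "since" as inclusive, relying on the fact that ISO-format dates sort correctly as strings.

```python
import csv


def filter_index(index_lines, country, after):
    """Return filenames of data sets that involve country and are dated on or after 'after'."""
    results = []
    for date, left, right, filename in csv.reader(index_lines):
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if date >= after and country in (left, right):
            results.append(filename)
    return results


index_lines = [
    '2014-05-27,AUS,CAN,AUS-CAN.csv',
    '2014-05-28,BRA,CAN,BRA-CAN.csv',
    '2014-05-29,TCD,CAN,TCD-CAN.csv',
]
print(filter_index(index_lines, 'CAN', '2014-05-28'))
```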
-> ## New Kinds of Science {.callout}
+> ## New Kinds of Science
>
> This may not seem like a breakthrough,
> but it is actually an example of how the web helps researchers do new kinds of science.
@@ -125,8 +136,9 @@ print what_is_available('data/index.csv', 'BRA', '2014-05-27')
> It then checks the articles listed in that index against its local record of what has already been seen,
> then downloads any articles that are new.
> By automating this process, blogging tools help us focus attention on things that are actually worth looking at.
+{: .callout}
-> ## Indexing {.challenge}
+> ## Indexing
>
> We should always create an index for generated data because:
>
@@ -134,19 +146,23 @@ print what_is_available('data/index.csv', 'BRA', '2014-05-27')
> 2. The web server will not display the directory without an index.
> 3. REST APIs require an index to function.
> 4. It is too complicated for a program to calculate itself.
+{: .challenge}
-> ## Metadata for Metadata {.challenge}
+> ## Metadata for Metadata
>
> Should the first line of the index file be a header giving the names of the columns?
> Why or why not?
+{: .challenge}
-> ## To Automate or Not {.challenge}
+> ## To Automate or Not
>
> Should `update_index` be called inside `save_records`
> so that the index is automatically updated every time a new data set is generated?
> Why or why not?
+{: .challenge}
-> ## Removing Redundant Redundancy {.challenge}
+> ## Removing Redundant Redundancy
>
> `update_index` and `save_records` both construct the name of the data file.
> Refactor them to remove this redundancy.
+{: .challenge}
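
One way to approach the refactoring exercise is to move the naming rule into a single helper. The sketch below is an assumption, not the lesson's solution: `make_filename` and the `directory` parameter are hypothetical, and the `LEFT-RIGHT.csv` pattern is inferred from the index entries shown above.

```python
import csv
import os
import tempfile


def make_filename(left, right):
    """Build the data file name from two country codes (assumed pattern)."""
    return left + '-' + right + '.csv'


def save_records(left, right, records, directory='.'):
    """Save records as CSV under the conventional name and return the path."""
    path = os.path.join(directory, make_filename(left, right))
    with open(path, 'w', newline='') as raw:
        writer = csv.writer(raw)
        writer.writerows(records)
    return path


# Try it in a scratch directory so no files are left behind.
with tempfile.TemporaryDirectory() as scratch:
    path = save_records('AUS', 'BRA', [[1, 2], [3, 4]], directory=scratch)
    print(os.path.basename(path))
```

`update_index` could call the same helper when it records the file name, so the naming rule lives in one place.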
diff --git a/_extras/figures.md b/_extras/figures.md
new file mode 100644
index 0000000..de99a57
--- /dev/null
+++ b/_extras/figures.md
@@ -0,0 +1,6 @@
+---
+layout: page
+title: Figures
+permalink: /figures/
+---
+{% include all_figures.html %}
diff --git a/_includes/all_figures.html b/_includes/all_figures.html
new file mode 100644
index 0000000..97f3c5d
--- /dev/null
+++ b/_includes/all_figures.html
@@ -0,0 +1,3 @@
+![First Plot](fig/plot-01.png)
+
+![Second Plot](fig/plot-02.png)
diff --git a/fig/maddie-chat.jpg b/fig/maddie-chat.jpg
deleted file mode 100644
index 7f5eb77..0000000
Binary files a/fig/maddie-chat.jpg and /dev/null differ
diff --git a/index.md b/index.md
index 80b28e3..6a30899 100644
--- a/index.md
+++ b/index.md
@@ -4,7 +4,7 @@ layout: lesson
This lesson explains how to consume data from the web,
and how to create data for others to use.
-> ## Prerequisites {.prereq}
+> ## Prerequisites
>
> This lesson assumes learners understand basic Python programming,
> and can:
@@ -13,3 +13,4 @@ and how to create data for others to use.
> * Use strings and lists
> * Write and call simple functions
> * Access the Internet in class
+{: .prereq}