This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

Getting Started

Dave Landry edited this page Aug 31, 2016 · 1 revision

The API returns data as JSON, with the column headers listed once and kept separate from the rows of data points. Because the header names are not repeated in every row, the response is smaller and loads faster. For example, to request the average wage for each state, the data call looks like this:

http://api.datausa.io/api/?show=geo&sumlevel=state&required=avg_wage

If you copy/paste that URL into a web browser, you should see results that look like this:

{
  "data": [
    [2014, "04000US01", 41185.2],
    [2014, "04000US02", 51959.1],
    [2014, "04000US04", 43582.8],
    [2014, "04000US05", 38570.2],
    [2014, "04000US06", 51851.5],
    ...
  ],
  "headers": ["year", "geo", "avg_wage"],
  "source": {
    "link": "http://census.gov/programs-surveys/acs/technical-documentation/pums.html",
    "org": "Census Bureau",
    "table": "yg",
    "supported_levels": {
      "geo": ["nation", "state", "puma", "all"]
    },
    "dataset": "ACS PUMS 1-year Estimate"
  },
  "subs": {},
  "logic": [
    {
      "link": "http://census.gov/programs-surveys/acs/technical-documentation/pums.html",
      "org": "Census Bureau",
      "table": "yg",
      "supported_levels": {
        "geo": ["nation", "state", "puma", "all"]
      },
      "dataset": "ACS PUMS 1-year Estimate"
    },
    {
      "link": "http://census.gov/programs-surveys/acs/technical-documentation/pums.html",
      "org": "Census Bureau",
      "table": "ygo",
      "supported_levels": {
        "soc": ["0", "1", "2", "3", "all"],
        "geo": ["nation", "state", "puma", "all"]
      },
      "dataset": "ACS PUMS 5-year Estimate"
    },
    ...
  ]
}

In addition to the data and headers, the response describes the data source (the "source" key) and lists the tables that could have provided the data (the "logic" key). The Data API includes logic to determine which table best suits the query, based on granularity and margin of error.
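As a rough illustration of reading that metadata, here is a minimal Python sketch that works on a trimmed, hard-coded copy of the sample response above. The helper names describe_source and candidate_tables are hypothetical and not part of the API, and the "logic" list is cut down to the two entries visible in the sample.

```python
# Trimmed copy of the sample response shown above (only the fields used here).
response = {
    "headers": ["year", "geo", "avg_wage"],
    "source": {
        "org": "Census Bureau",
        "table": "yg",
        "dataset": "ACS PUMS 1-year Estimate",
    },
    "logic": [
        {"table": "yg", "dataset": "ACS PUMS 1-year Estimate"},
        {"table": "ygo", "dataset": "ACS PUMS 5-year Estimate"},
    ],
}


def describe_source(payload):
    """Summarize the table that supplied the data (hypothetical helper)."""
    source = payload["source"]
    return "{0}: {1} (table {2})".format(
        source["org"], source["dataset"], source["table"]
    )


def candidate_tables(payload):
    """List the table names found under the "logic" key (hypothetical helper)."""
    return [entry["table"] for entry in payload.get("logic", [])]


print(describe_source(response))
print(candidate_tables(response))
```

Comparing the "source" table against the "logic" list is one way to see which table was picked out of the candidates for a given query.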

A common use case when building visualizations with the API is to load the data in a client-side AJAX request and pass it to a visualization library as an array of objects. Here is an example that uses D3 to load the data, and then combines the headers with the data array in preparation for the visualization:

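var url = "http://api.datausa.io/api/?show=geo&sumlevel=state&required=avg_wage";

// Assumes the callback-style d3.json of D3 v3/v4, which passes (error, json).
d3.json(url, function(error, json) {

  if (error) throw error;

  // Convert each row array into an object keyed by column header.
  var data = json.data.map(function(row) {
    return json.headers.reduce(function(obj, header, i) {
      obj[header] = row[i];
      return obj;
    }, {});
  });

});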

Using the Data USA API in Python is very similar to the JavaScript approach shown above. Here is an example using the open source requests library to load the data (install using pip install requests):

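import requests

url = "http://api.datausa.io/api/?show=geo&sumlevel=state&required=avg_wage"

# Fetch the response and parse its JSON body.
result = requests.get(url).json()

# Pair each row with the column headers to get one dict per data point.
data = [dict(zip(result["headers"], row)) for row in result["data"]]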
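To show what the combined records look like once the headers and rows are zipped together, here is a small sketch that runs without a network call. It uses only the five sample rows shown earlier on this page, and the function name to_records is a hypothetical helper, not something the API or requests provides.

```python
def to_records(payload):
    """Pair each data row with the column headers, one dict per row."""
    headers = payload["headers"]
    return [dict(zip(headers, row)) for row in payload["data"]]


# The five sample rows from the response shown earlier on this page.
sample = {
    "headers": ["year", "geo", "avg_wage"],
    "data": [
        [2014, "04000US01", 41185.2],
        [2014, "04000US02", 51959.1],
        [2014, "04000US04", 43582.8],
        [2014, "04000US05", 38570.2],
        [2014, "04000US06", 51851.5],
    ],
}

records = to_records(sample)

# Each record can now be addressed by column name instead of position.
highest = max(records, key=lambda record: record["avg_wage"])
print(highest["geo"], highest["avg_wage"])  # prints: 04000US02 51959.1
```

Keeping the conversion in one function means the same step can be reused for any query, since it relies only on the "headers" and "data" keys of the response.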