Collection of user guides, tools, and links to resources for working with OpenAQ data.
- Resources
- Download OpenAQ archive data from S3 using awscli
- How big is the OpenAQ S3 bucket?
- Convert ndjson to InfluxDB line protocol format
- Convert CSV to InfluxDB line protocol format
- Contributing
Resources:
- Access OpenAQ data via a filterable SNS topic
- Using Athena to access the whole archive
- Air Quality Collection with TimescaleDB - Sample Application
- openaq.org - The main OpenAQ website, contains CSV download pages and the world pollutant map.
- ropensci/ropenaq - R package for the OpenAQ API
- nickolasclarke/openaq - JavaScript client for the OpenAQ API
- dhhagan/py-openaq - Python wrapper for the OpenAQ API
- openaq-postman - Postman collections for working with the OpenAQ API
- jackkoppa/cityaq - Compare air quality between cities
- dolugen/openaq-browser - A web client for the OpenAQ API
- barronh/scrapenaq - Download and convert OpenAQ archived data with Pandas
- dolugen/openaq-swagger - OpenAPI v3 spec of the OpenAQ API
- dolugen/sns-s3-influxdb - Populate InfluxDB with air quality data
- OpenAQ on AWS - Information about OpenAQ's publicly available S3 bucket and SNS topic.
OpenAQ stores metric data in a publicly available S3 bucket. One way to download from the archive is the aws s3 command.
Prerequisites: a free AWS account, and awscli installed and configured.
Download a single file:
aws s3 cp s3://openaq-fetches/realtime-gzipped/2020-06-06/1591476667.ndjson .
Download all files for one day:
aws s3 cp s3://openaq-fetches/realtime-gzipped/2020-06-06/ . --recursive
You can go up one level in the path and download the entire archive, if you wish.
If you prefer not to use awscli, take a look at barronh/scrapenaq, a tool that takes a scraping approach.
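A route that may avoid awscli entirely is S3's REST API. The sketch below lists one day's files with ListObjectsV2 requests over plain HTTPS, using only the Python standard library and following continuation tokens across pages. It assumes the openaq-fetches bucket is reachable at https://openaq-fetches.s3.amazonaws.com/ and allows anonymous (unsigned) listing, which the prerequisites above do not promise; BUCKET_URL, list_page_url, parse_listing and list_day are hypothetical names.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by S3 ListObjectsV2 XML responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
# Assumption: virtual-hosted-style URL with anonymous list access allowed.
BUCKET_URL = "https://openaq-fetches.s3.amazonaws.com/"


def list_page_url(prefix, continuation_token=None):
    """Build a ListObjectsV2 request URL for one page of results."""
    params = {"list-type": "2", "prefix": prefix}
    if continuation_token:
        params["continuation-token"] = continuation_token
    return BUCKET_URL + "?" + urllib.parse.urlencode(params)


def parse_listing(xml_bytes):
    """Return (keys, next_token) from one ListObjectsV2 XML response.

    next_token is None when the listing is not truncated (last page).
    """
    root = ET.fromstring(xml_bytes)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    return keys, (token if truncated == "true" else None)


def list_day(day):
    """Yield every object key under realtime-gzipped/<day>/ (network access)."""
    prefix = f"realtime-gzipped/{day}/"
    token = None
    while True:
        with urllib.request.urlopen(list_page_url(prefix, token)) as resp:
            keys, token = parse_listing(resp.read())
        yield from keys
        if token is None:
            break
```

If anonymous access works, for key in list_day("2020-06-06"): print(BUCKET_URL + key) prints one HTTPS URL per file, and each can be fetched with urllib.request.urlretrieve. Keeping the XML parsing separate from the network call makes it easy to check the parser offline.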
To check how big the OpenAQ S3 bucket is, list it recursively with a summary:
aws s3 ls --summarize --human-readable --recursive s3://openaq-fetches
As of June 2020, the bucket holds 323 GB.
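The total that --summarize reports is the sum of every object's size across all pages of the recursive listing. A small Python sketch of that arithmetic follows; fetch_page is a hypothetical callable you would back with real listing requests (for example, S3 ListObjectsV2 pages, which include each object's Size), and human_readable formats the result with 1024-based units.

```python
def total_size(fetch_page):
    """Sum object sizes across all pages of a listing.

    fetch_page(token) is any callable returning (sizes, next_token) for one
    page; next_token is None on the last page. Start with token=None.
    """
    total, token = 0, None
    while True:
        sizes, token = fetch_page(token)
        total += sum(sizes)
        if token is None:
            return total


def human_readable(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("Bytes", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

Passing the page fetcher in as a function keeps the pagination loop independent of any particular HTTP client, so it can be exercised with fake pages.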
The archive files in the S3 bucket are in ndjson (newline-delimited JSON) format: plain JSON, but with each line holding a separate JSON object.
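Reading such a file therefore means parsing each line on its own with json.loads rather than the whole file at once. The sketch below does that, and also checks for the gzip magic bytes in case a file is compressed (the realtime-gzipped prefix suggests some may be, but that is an assumption). open_ndjson and iter_records are illustrative names.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_ndjson(path):
    """Open an archive file as UTF-8 text, gunzipping it if it is compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_records(lines):
    """Yield one dict per non-blank line of ndjson text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

For example, with open_ndjson("1591476667.ndjson") as fh: records = list(iter_records(fh)) loads a downloaded file whether or not it is gzipped.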
To convert these files to InfluxDB's line protocol, use the ndjson2lineprotocol.py script in this repo:
cat *.ndjson | ./ndjson2lineprotocol.py
The script writes to standard output, so you may want to redirect its output to a file.
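To illustrate what such a conversion involves, here is a minimal Python sketch for a single record. It is not the repo's ndjson2lineprotocol.py. It assumes each record has location, city, country, parameter, value, unit and date.utc fields, and the measurement name openaq and the choice of tags are arbitrary. A line has the shape measurement,tag_set field_set timestamp; commas, equals signs and spaces in tag values are escaped with a backslash, and the timestamp is in nanoseconds by default.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: which record fields become tags (kept sorted by key).
TAG_FIELDS = ("city", "country", "location", "parameter", "unit")


def escape_tag(text):
    """Backslash-escape commas, equals signs and spaces in a tag value."""
    return (str(text).replace(",", r"\,")
            .replace("=", r"\=")
            .replace(" ", r"\ "))


def to_ns(iso_utc):
    """Convert an ISO 8601 UTC string such as '...T20:00:00.000Z' to epoch ns."""
    dt = datetime.fromisoformat(iso_utc.replace("Z", "+00:00"))
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def to_line(record, measurement="openaq"):
    """Render one OpenAQ measurement record as a line-protocol string."""
    tags = ",".join(
        f"{key}={escape_tag(record[key])}"
        for key in TAG_FIELDS
        if record.get(key) not in (None, "")  # skip empty tag values
    )
    head = f"{measurement},{tags}" if tags else measurement
    value = float(record["value"])  # written without a suffix, i.e. a float field
    return f"{head} value={value!r} {to_ns(record['date']['utc'])}"
```

Tags are emitted in key order, which InfluxDB's documentation recommends for write performance. To process a whole file, something like for line in sys.stdin: print(to_line(json.loads(line))) (with sys and json imported) would stream records the same way the cat pipeline above does.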
In addition to the S3 archive, you can filter and download data as CSV from the openaq.org website.
After downloading the CSV, feed the file to csv2lineprotocol.py like so:
cat openaq.csv | ./csv2lineprotocol.py
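A comparable sketch for CSV rows uses csv.DictReader. It is not the repo's csv2lineprotocol.py, and the column names (location, city, country, utc, parameter, value, unit) are assumptions about the openaq.org export: check the header of your own download and adjust TAG_COLUMNS and the field lookups. Like the script above, it reads a text stream and produces one line per data row.

```python
import csv
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumed column names in the openaq.org CSV export; verify against your file.
TAG_COLUMNS = ("city", "country", "location", "parameter", "unit")


def escape_tag(text):
    """Backslash-escape commas, equals signs and spaces in a tag value."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def csv_to_lines(stream, measurement="openaq"):
    """Yield one line-protocol string per CSV data row."""
    for row in csv.DictReader(stream):
        tags = ",".join(
            f"{col}={escape_tag(row[col])}" for col in TAG_COLUMNS if row.get(col)
        )
        head = f"{measurement},{tags}" if tags else measurement
        dt = datetime.fromisoformat(row["utc"].replace("Z", "+00:00"))
        ts = (dt - EPOCH) // timedelta(microseconds=1) * 1000  # nanoseconds
        yield f"{head} value={float(row['value'])!r} {ts}"
```

Used from a shell pipeline, for line in csv_to_lines(sys.stdin): print(line) (with sys imported) would mirror the cat openaq.csv invocation above.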
Something missing or need fixing here? Please use the issues page to submit requests and ask questions. You can also create a Pull Request with your changes.