OpenAQ Data Ingest Pipeline

Overview

This is the main data ingest pipeline for the OpenAQ project.

Starting with index.js, there is an ingest mechanism to gather global air quality measurements from a variety of sources. This is currently run every 10 minutes and saves all unique measurements to a database.

openaq-api powers the API, and more information on the data format can be found in openaq-data-format.

For more info see the OpenAQ-Fetch documentation index.

Installing & Running

To run the pipeline locally, you will need Node.js installed.

Install the necessary Node.js packages by running:

npm install

Now you can get started with:

node index.js --help

For a full development quick start (with database setup etc.), please see the dev-quick-start doc.

For production deployment, you will need to set certain environment variables, as listed in the table below:

Name	Description	Default
SENDGRID_PASSWORD	Email service password	not set
SENDGRID_USERNAME	Email service username	not set
API_URL	URL of openaq-api	http://localhost:3004/v1/webhooks
WEBHOOK_KEY	Secret key to interact with openaq-api	'123'
AIRNOW_FTP_USER	User for AirNow FTP	not set
AIRNOW_FTP_PASSWORD	Password for AirNow FTP	not set
EEA_TOKEN	API token for EEA API	not set
DATA_GOV_IN_TOKEN	API token for data.gov.in	not set
EPA_VICTORIA_TOKEN	API token for portal.api.epa.vic.gov.au	not set
EEA_GLOBAL_TIMEOUT	How long to check for EEA async results before quitting in seconds	360
EEA_ASYNC_RECHECK	How long to wait to recheck for EEA async results in seconds	60
SAVE_TO_S3	Whether the process saves the measurements to an AWS S3 bucket	not set

For the full list of environment variables and process arguments, see the environment documentation.
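As a rough illustration of how the defaults in the table above could be applied, here is a minimal sketch. The `loadConfig` function and its property names are hypothetical and are not taken from the openaq-fetch source; only the variable names and default values come from the table.

```javascript
// Hypothetical helper (not part of openaq-fetch): resolve the variables
// from the table, falling back to the documented default when unset.
function loadConfig(env = process.env) {
  // Parse a value given in seconds; use the fallback if it is missing or invalid.
  const toSeconds = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    apiUrl: env.API_URL || 'http://localhost:3004/v1/webhooks',
    webhookKey: env.WEBHOOK_KEY || '123',
    eeaGlobalTimeout: toSeconds(env.EEA_GLOBAL_TIMEOUT, 360),
    eeaAsyncRecheck: toSeconds(env.EEA_ASYNC_RECHECK, 60),
    // Saving to S3 is enabled only by the literal value "true".
    saveToS3: env.SAVE_TO_S3 === 'true',
    // Tokens and credentials have no default and stay undefined when unset.
    eeaToken: env.EEA_TOKEN,
    dataGovInToken: env.DATA_GOV_IN_TOKEN,
    epaVictoriaToken: env.EPA_VICTORIA_TOKEN
  };
}
```

Passing the environment as an argument keeps the function easy to exercise without touching the real process environment, for example `loadConfig({ SAVE_TO_S3: 'true' })`.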

Pushing to AWS S3

If you want to push results to an S3 bucket as well for further processing, the environment variable SAVE_TO_S3 should be set to the value true. Additionally, you have to set the following environment variables (or be running in a process with a suitable IAM role):

Name	Description	Default
AWS_BUCKET_NAME	AWS Bucket to store the results	not set
AWS_ACCESS_KEY_ID	AWS Credentials key ID	not set
AWS_SECRET_ACCESS_KEY	AWS Credentials secret key	not set

The measurements will be stored using the structure bucket_name/fetches/yyyy-mm-dd/unixtime.ndjson for each fetch.
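The key layout above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes the date is taken in UTC and that `unixtime` is in whole seconds, neither of which is stated here.

```javascript
// Hypothetical helper: build the object key (inside the bucket) for one fetch.
// Assumptions: the date is in UTC and unixtime is in whole seconds.
function buildFetchKey(fetchedAt = new Date()) {
  const day = fetchedAt.toISOString().slice(0, 10); // yyyy-mm-dd
  const unixtime = Math.floor(fetchedAt.getTime() / 1000);
  return `fetches/${day}/${unixtime}.ndjson`;
}

// Serialize measurements as newline-delimited JSON: one object per line.
function toNdjson(measurements) {
  return measurements.map((m) => JSON.stringify(m)).join('\n') + '\n';
}
```

The key returned by `buildFetchKey` would be combined with the bucket named in AWS_BUCKET_NAME when uploading the output of `toNdjson`.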

Tests

To confirm that everything is working as expected, you can run the tests with

npm test

To test an individual adapter, you can use something like:

node index.js --dryrun --source 'Beijing US Embassy'

For a more detailed description of the command line options available, use: node index.js --help

Deployment

Deployment is being built from the lambda-deployment branch. Any development for openaq-fetch should be branched/merged from/to the lambda-deployment branch until further notice.

Deployments rely on a JSON object that contains the different deployments. The scheduler is then used to loop through that object and post a message that will trigger a lambda to run that deployment. A deployment consists of a set of arguments that are passed to the fetch script to limit the sources that are run.
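The scheduling loop described above might look roughly like the sketch below. Everything in it is hypothetical: the shape of the deployment entries, the `selectDeployments` and `schedule` functions, and the injected `postMessage` callback stand in for whatever the real scheduler and messaging service use.

```javascript
// Hypothetical deployments object: each entry carries the arguments that
// limit which sources the fetch script runs. Field names are illustrative.
const deployments = [
  { name: 'japan', args: { country: 'JP' } },
  { name: 'example', args: { source: 'Example Source' } }
];

// Pick every deployment ("all") or only the one with the requested name.
function selectDeployments(all, requested = 'all') {
  if (requested === 'all') return all;
  return all.filter((d) => d.name === requested);
}

// Loop through the selected deployments and post one message per deployment.
// postMessage is supplied by the caller, e.g. a wrapper around a queue client.
async function schedule(selected, postMessage) {
  const posted = [];
  for (const deployment of selected) {
    await postMessage(JSON.stringify(deployment));
    posted.push(deployment.name);
  }
  return posted;
}
```

Injecting `postMessage` keeps the loop independent of any particular AWS client, so it can be tried locally with a function that only logs the message.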

You can test the deployments with the following:

Show all deployments, but don't submit and don't run the fetcher:

node index.js --dryrun --deployments all --nofetch

Only the japan deployment, but don't run the fetcher:

node index.js --dryrun --deployments japan --nofetch

Only the japan deployment; don't submit a file, but run the fetcher:

node index.js --dryrun --deployments japan

Data Source Criteria

This section lists the key criteria for air quality data aggregated onto the platform. A full explanation can be accessed here. OpenAQ is an ever-evolving process that is shaped by its community: your feedback and questions are actively invited on the criteria listed in this section.

Data must be of one of these pollutant types: PM10, PM2.5, sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and black carbon (BC).
Data must be from an official-level outdoor air quality source, as defined as data produced by a government entity or international organization. We do not, at this stage, include data from low-cost, temporary, and/or indoor sensors.
Data must be ‘raw’ and reported in physical concentrations on their originating site. Data cannot be shared in an 'Air Quality Index' or equivalent (e.g. AQI, PSI, API) format.
Data must be at the ‘station-level,’ associable with geographic coordinates, not aggregated into a higher (e.g. city) level.
Data must be from measurements averaged between 10 minutes and 24 hours.
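The machine-checkable parts of these criteria (pollutant type, coordinates, and averaging period) can be sketched as a simple filter. This is an illustration under assumptions: the `meetsCriteria` function is hypothetical, and the measurement shape (`parameter`, `coordinates`, and `averagingPeriod` with a value in hours) is assumed here rather than taken from this document. Whether a source is official or reports raw concentrations still requires human review.

```javascript
// Pollutant types accepted by the criteria, as lowercase identifiers (assumed naming).
const ALLOWED_PARAMETERS = new Set(['pm10', 'pm25', 'so2', 'co', 'no2', 'o3', 'bc']);

// Averaging period bounds in hours: 10 minutes to 24 hours.
const MIN_HOURS = 10 / 60;
const MAX_HOURS = 24;

// Hypothetical check covering pollutant type, station coordinates, and averaging period.
function meetsCriteria(m) {
  const hasCoordinates = Boolean(m.coordinates) &&
    Number.isFinite(m.coordinates.latitude) &&
    Number.isFinite(m.coordinates.longitude);

  // Only periods expressed in hours are understood by this sketch.
  const hours = m.averagingPeriod && m.averagingPeriod.unit === 'hours'
    ? m.averagingPeriod.value
    : NaN;

  return ALLOWED_PARAMETERS.has(m.parameter) &&
    hasCoordinates &&
    hours >= MIN_HOURS &&
    hours <= MAX_HOURS;
}
```

For example, a measurement with parameter `pm25`, valid coordinates, and a one-hour averaging period passes, while one reported as an index value under a parameter such as `aqi` does not.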

Contributing

There are a lot of ways to contribute to this project; more details can be found in the contributing guide.
