tee8z/noaa-data-pipeline

NOAA data pipeline, queryable from the browser, that supports a Bitcoin DLC oracle in dlctix style

A simple system showing how to create a data pipeline from NOAA

  • Live site located at: 4casttruth.win
  • Feel free to pull the parquet files and use them in your own data analysis; Python is a good choice for this (see the sketch below)
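
As a minimal sketch of that kind of analysis, assuming you have already downloaded one of the parquet files from the site (the file name weather_data.parquet is just a placeholder), a few lines of pandas are enough to start exploring:

```python
# Minimal sketch: explore one of the downloaded parquet files with pandas.
# "weather_data.parquet" is a placeholder name; use whichever file you pulled
# from the site. Requires `pip install pandas pyarrow`.
import pandas as pd

df = pd.read_parquet("weather_data.parquet")

print(df.dtypes)          # see the schema of the flattened NOAA data
print(df.head())          # peek at the first few rows
print(f"{len(df)} rows")  # how much data the snapshot contains
```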

Example of using the UI:

Screenshots (in order): Initial Screen, Select Files, View File Schemas, Enter Bad Query, Enter Good Query

Where data comes from:

How the system works:

  • daemon:
    • Background process that pulls down data from NOAA and transforms it into flattened parquet files. These files are then pushed to the oracle via the REST endpoint POST http://localhost:9100/file (as a multipart form); see the upload sketch after this list
  • oracle:
    • A REST API that takes in the parquet files and lets them be downloaded from a browser UI that is also hosted by the API
  • ui:
    • Holds the browser UI, which is just an index.html and main.js file. It uses @duckdb/duckdb-wasm to let the end user query directly against the downloaded parquet files
    • It uses https://bulma.io/ for CSS styling
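
The upload step the daemon performs can be sketched roughly as below. The endpoint is the one documented above; the multipart field name "file" and the local file name are assumptions, not taken from the repo:

```python
# Rough sketch of pushing a parquet file to the oracle the way the daemon does.
# The endpoint is the one documented above; the multipart field name "file" and
# the file name "observations.parquet" are assumptions. Requires `pip install requests`.
import requests

ORACLE_URL = "http://localhost:9100/file"

with open("observations.parquet", "rb") as f:
    resp = requests.post(
        ORACLE_URL,
        files={"file": ("observations.parquet", f, "application/octet-stream")},
    )
resp.raise_for_status()
print(f"uploaded: HTTP {resp.status_code}")
```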

Why build a data pipeline like this:

  • No remote DB needed, only a dumb file server, which makes this cheap to run
  • Faster and more flexible querying for the end user, letting them find insights the original system design may not have been looking for
  • Each piece is a 'simple' logical unit, allowing the service to scale to however much usage it sees
  • Would NOT recommend this approach if the tracked data needs to be updated in the parquet files and stored as a relational model; it only really works if the data can be modeled as immutable snapshots over time

Data Pipeline Process (arrows point from the component that initiates the call):

[noaa api] <- [daemon] -> parquet files -> [oracle] <- parquet files <- [browser duck_db]
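
To get a feel for the kind of SQL the browser runs through @duckdb/duckdb-wasm, here is a rough local equivalent using the duckdb Python package. The file name and the station_id column are placeholders, since the real schemas are the ones shown in the UI:

```python
# Local approximation of what the browser does with @duckdb/duckdb-wasm:
# run SQL directly against a downloaded parquet file. The file name and the
# "station_id" column are placeholders; check the real schema in the UI first.
# Requires `pip install duckdb`.
import duckdb

con = duckdb.connect()  # in-memory database, like the wasm build in the browser
result = con.execute(
    """
    SELECT station_id, COUNT(*) AS readings
    FROM read_parquet('observations.parquet')
    GROUP BY station_id
    ORDER BY readings DESC
    LIMIT 10
    """
).fetchdf()
print(result)
```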

How to use: