By Drew White
This project is part of a larger project located here: Team Week 3
Summary | Technologies Used | Sources | Description | dw_weather_scrape.py | dw_weekly_avg.py | Visualizations | Known Bugs
This is a data engineering project that scrapes weather data, transforms it, and stores it in Google BigQuery. The project is written in Python and uses Apache Airflow as its workflow management system. BeautifulSoup scrapes the data from the National Weather Service, Pandas manipulates it, and Google BigQuery serves as the primary data store.
The dw_weather_scrape.py script contains three functions that work together to scrape weather data from the National Weather Service, transform the data, and write it to Google BigQuery on a daily basis.
The dw_weekly_avg.py script pulls data from the daily table in Google BigQuery and calculates weekly averages for select columns. The script then writes the averages to the weekly_avg table on a weekly basis.
Note: For demonstration purposes, the scheduled intervals in this project are hourly and daily rather than daily and weekly. This made it possible to gather more data for the project presentation. In a full production environment, the Airflow DAGs would trigger at daily and weekly intervals.
- Python
- Apache Airflow
- Pandas
- BeautifulSoup
- Google BigQuery
A dictionary of the sources of the city weather data:
scrape_weather_data
- Uses BeautifulSoup to scrape the National Weather Service and loads the results into a Pandas DataFrame.
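As a rough illustration of this step, the sketch below parses a small HTML snippet with BeautifulSoup and loads the values into a Pandas DataFrame. It is not the project's actual code: the markup in `SAMPLE_HTML`, the `conditions` table id, the function name `scrape_conditions_sketch`, and the city name are all assumptions made for the example, and the real National Weather Service pages may be structured differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page; the real
# National Weather Service HTML may be structured differently.
SAMPLE_HTML = """
<table id="conditions">
  <tr><td><b>Humidity</b></td><td>65%</td></tr>
  <tr><td><b>Wind Speed</b></td><td>NW 7 mph</td></tr>
  <tr><td><b>Temperature</b></td><td>64°F</td></tr>
</table>
"""


def scrape_conditions_sketch(html: str, city: str) -> pd.DataFrame:
    """Parse label/value table rows into a one-row DataFrame for one city."""
    soup = BeautifulSoup(html, "html.parser")
    record = {"city": city}
    table = soup.find("table", id="conditions")
    rows = table.find_all("tr") if table is not None else []
    for row in rows:
        cells = row.find_all("td")
        if len(cells) != 2:
            continue  # skip rows that are not label/value pairs
        # Turn a label such as "Wind Speed" into a column name "wind_speed".
        label = cells[0].get_text(strip=True).lower().replace(" ", "_")
        record[label] = cells[1].get_text(strip=True)
    return pd.DataFrame([record])


raw_df = scrape_conditions_sketch(SAMPLE_HTML, city="Portland")
print(raw_df.to_dict(orient="records"))
```

In a real run the HTML would come from an HTTP request rather than an inline string; keeping the parsing in its own function makes it easy to exercise against saved pages.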
transform_weather_data
- Takes the Pandas DataFrame and transforms the data into more usable values.
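One plausible form of such a transformation is turning scraped strings like "64°F" into numeric columns. The sketch below shows that idea only; the column names, the sample values, and the helper names `first_number` and `clean_weather_frame` are hypothetical, and the project's real transformations may differ.

```python
import pandas as pd

# Hypothetical raw values as they might come out of a scraping step.
raw_df = pd.DataFrame(
    {
        "city": ["Portland", "Seattle"],
        "temperature": ["64°F", "58°F"],
        "humidity": ["65%", "80%"],
        "wind_speed": ["NW 7 mph", "Calm"],
    }
)


def first_number(series: pd.Series) -> pd.Series:
    """Extract the first number in each string; unmatched values become NaN."""
    return pd.to_numeric(series.str.extract(r"(-?\d+(?:\.\d+)?)", expand=False))


def clean_weather_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Replace display strings with numeric columns that are easier to aggregate."""
    out = df[["city"]].copy()
    out["temperature_f"] = first_number(df["temperature"])
    out["humidity_pct"] = first_number(df["humidity"])
    # "Calm" contains no number, so a missing match is treated as 0 mph here.
    out["wind_mph"] = first_number(df["wind_speed"]).fillna(0.0)
    return out


clean_df = clean_weather_frame(raw_df)
print(clean_df)
```

Treating "Calm" as 0 mph is a design choice for this example; leaving it as a missing value would be equally defensible if downstream averages should ignore it.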
write_weather_data_to_bq
- Writes the scraped and transformed data to Google BigQuery daily, appending to the existing daily table.
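The append behaviour can be sketched as follows. This example assumes a streaming insert through the `insert_rows_json` method of a `google.cloud.bigquery.Client`, which may not be how the project itself writes data (a DataFrame load job is another common option). The function name `append_rows_sketch`, the `FakeClient` stand-in, and the table id `my-project.weather.daily` are hypothetical and exist only so the example runs without credentials.

```python
import json

import pandas as pd


def append_rows_sketch(client, table_id: str, df: pd.DataFrame) -> int:
    """Append DataFrame rows to a table and return the number of rows sent.

    `client` is assumed to behave like google.cloud.bigquery.Client.
    """
    # Round-trip through JSON so NumPy scalars and timestamps become
    # plain Python values that can be sent as JSON rows.
    rows = json.loads(df.to_json(orient="records", date_format="iso"))
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
    return len(rows)


class FakeClient:
    """Stand-in that records rows in memory instead of calling BigQuery."""

    def __init__(self):
        self.tables = {}

    def insert_rows_json(self, table, json_rows):
        self.tables.setdefault(table, []).extend(json_rows)
        return []  # an empty error list signals success


df = pd.DataFrame({"city": ["Portland"], "temperature_f": [64.0]})
fake = FakeClient()
# Two runs against the same table: the second run adds to the first.
append_rows_sketch(fake, "my-project.weather.daily", df)
append_rows_sketch(fake, "my-project.weather.daily", df)
print(len(fake.tables["my-project.weather.daily"]))
```

With real credentials, the fake would be replaced by a `bigquery.Client()` instance; passing the client in as an argument keeps the function easy to test.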
calculate_weekly_averages
- Pulls daily data from BigQuery and gets averages of select columns.
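The averaging step might look something like the sketch below, which groups daily rows by city and calendar week in Pandas. The in-memory `daily_df`, its column names, and the function name `weekly_averages_sketch` are assumptions for illustration; in the project the rows come from the daily table in BigQuery, and the averaged columns may differ.

```python
import pandas as pd

# Hypothetical daily rows; in the project these would be read from BigQuery.
daily_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-03-06", "2023-03-08", "2023-03-13"]),
        "city": ["Portland", "Portland", "Portland"],
        "temperature_f": [60.0, 64.0, 70.0],
        "humidity_pct": [70.0, 80.0, 50.0],
    }
)


def weekly_averages_sketch(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Average the selected columns per city and per week (weeks end on Sunday)."""
    return (
        df.groupby(["city", pd.Grouper(key="date", freq="W")])[columns]
        .mean()
        .reset_index()
        .rename(columns={"date": "week_ending"})
    )


weekly_df = weekly_averages_sketch(daily_df, ["temperature_f", "humidity_pct"])
print(weekly_df)
```

Grouping by city as well as week is a choice made for this example; a single overall average per week would simply drop `"city"` from the grouping keys.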
write_weekly_avg_to_bq
- Writes averages to the weekly_avg table in BigQuery on a weekly schedule.
- No known bugs
If you find any issues, please reach out at: [email protected].
Copyright (c) 2023 Drew White