This project explores shifts in yearly temperature and rainfall patterns dating back to 1979.
I've chosen to focus on five beautiful destinations worldwide:
Berlin, Ko Tao, Parque Nacional Corcovado, San Diego, and Tulum.
The project is divided into two main parts:
historical weather data collection and analysis, and current weather data collection and analysis.
- OpenWeather free API key (1,000 free API calls per day)
- Amazon Web Services account (there is a free-tier option, but check the AWS Cost Explorer regularly, as additional costs might arise)
- AWS RDS PostgreSQL database (you can follow the steps in the video, choosing PostgreSQL instead of MySQL for this project)
- AWS S3
- AWS Lambda
- A database administration tool, for example DBeaver 23.3.5, connected to your AWS RDS database
- Python 3.11
- Jupyter Notebook
pip install notebook
- Download historical weather data for your cities of interest as a JSON file (costs 9€ per location)
- Save the data as "weather_history_bulk.json"
- Create an AWS S3 bucket for historical weather data
- Create a credentials.json file containing all values required by the historical_weather_data.ipynb notebook
- Install the required dependencies by executing
pip install -r requirements.txt
- Run the Jupyter notebook for historical data transformation
- Using DBeaver, create RDS weather tables by executing the queries provided in the .sql file
- In DBeaver, import the historical weather data into the designated table for each city
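The exact contents of credentials.json depend on what the notebook reads; the sketch below is a minimal, assumed layout (all key names and placeholder values are illustrative, not the project's actual schema) together with a small helper for loading it:

```python
import json

# Hypothetical credentials.json layout -- adjust the keys to whatever
# historical_weather_data.ipynb actually expects.
EXAMPLE_CREDENTIALS = {
    "aws_access_key_id": "YOUR_ACCESS_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "s3_bucket": "my-historical-weather-bucket",
    "db_host": "mydb.xxxxxxxx.eu-central-1.rds.amazonaws.com",
    "db_name": "weather",
    "db_user": "postgres",
    "db_password": "YOUR_DB_PASSWORD",
}

def load_credentials(path="credentials.json"):
    """Read the credentials file and return its contents as a dict."""
    with open(path) as f:
        return json.load(f)
```

Keeping all secrets in this one untracked file (add it to .gitignore) means the notebook itself contains no keys or passwords.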
This part of the project uses two different AWS Lambda functions. The first is scheduled to run automatically every hour; the second is triggered whenever a .csv file is uploaded into the designated S3 bucket, and that upload is the final step of the first function's run.
AWS Lambda functions need to be initialized with a specific Python runtime. If additional dependencies are required, such as psycopg2 for database connections, they must be uploaded as a .zip file along with the lambda_function. Pandas should be integrated as an AWS Lambda layer rather than being part of the uploaded dependencies.
❗ I highly recommend downloading the dependencies for the second AWS Lambda function in the Python 3.8 runtime, as shown in the video, since this resolves several compatibility issues encountered with alternative methods ❗
AWS Lambda ❗Python 3.10❗
- Create an AWS S3 bucket for current weather data
- Create an AWS Lambda function in Python 3.10 runtime
- Add the Pandas layer for Python 3.10
- Upload the function and its dependencies as a .zip file, as described in the video
- Create an AWS CloudWatch Event to activate the function at hourly intervals
AWS Lambda ❗Python 3.8❗
- Create an AWS Lambda function in Python 3.8 runtime
- Add the Pandas layer for Python 3.8
- Download the required dependencies as described here, using AWS Cloud9
- Upload the function and its dependencies as a .zip file, as described in the video
- Add an event trigger to invoke the function whenever a .csv file is uploaded into the S3 bucket
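The event-triggered function can be sketched as follows. This is a hedged outline, not the project's actual code: the table name, connection placeholders, and the assumption that the CSV column order matches the table definition all need to be adapted to your own database.

```python
import csv
import io

def event_to_object(event):
    """Extract (bucket, key) from an S3 'ObjectCreated' event record."""
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

def lambda_handler(event, context):
    """Entry point: load the freshly uploaded CSV into the RDS database."""
    import boto3
    import psycopg2  # shipped in the .zip, built on Python 3.8 as recommended above

    bucket, key = event_to_object(event)
    body = (
        boto3.client("s3")
        .get_object(Bucket=bucket, Key=key)["Body"]
        .read()
        .decode("utf-8")
    )
    # Fill in your own RDS endpoint and credentials here.
    conn = psycopg2.connect(
        host="YOUR_RDS_ENDPOINT",
        dbname="weather",        # assumed database name
        user="postgres",
        password="YOUR_DB_PASSWORD",
    )
    with conn, conn.cursor() as cur:
        # COPY expects the CSV column order to match the table definition;
        # "current_weather" is an assumed table name.
        cur.copy_expert(
            "COPY current_weather FROM STDIN WITH (FORMAT csv, HEADER true)",
            io.StringIO(body),
        )
    conn.close()
```

Using PostgreSQL's COPY via psycopg2's copy_expert keeps the load to a single round trip instead of one INSERT per row.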
Have fun collecting weather data for your personal analysis! 🌿