A wrap-up project marking the completion of the 7 weeks of the ZoomCamp!
Explore the docs » · View Dashboard · Report Bug
Microblogging has become a very common way to exchange opinions, and many users share their thoughts on various aspects of their lives. Consequently, microblogging websites are a substantial source of information for sentiment analysis and opinion mining. Twitter is a well-known microblogging website where 500 million tweets are posted every day. This project summarizes a dataset of tweets related to the forthcoming 2023 general election in Nigeria, targeted at the two leading presidential aspirants in the country.
The dataset will be scraped daily from Twitter, then cleaned and transformed, with sentiment analysis carried out on the tweets. The results are loaded to the data lake and then to the data warehouse, where they are stored and staged to provide Data Studio with clean data. Insights and analysis are presented there using well-defined charts and dashboards. All of these processes will be carried out using the knowledge and tools (cloud engineering and DevOps) associated with data and analytics engineering.
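As a rough illustration of the clean-and-score step described above, here is a minimal sketch using only the Python standard library. The function names (`clean_tweet`, `label_sentiment`) and the tiny word lists are hypothetical and are not taken from this project. The word-count scoring is a simple stand-in for whatever sentiment method the pipeline actually uses.

```python
import re

# Tiny illustrative word lists (assumption); a real run would use a proper
# sentiment model or lexicon instead.
POSITIVE = {"good", "great", "hope", "win", "support"}
NEGATIVE = {"bad", "corrupt", "fail", "lose", "worst"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, digits, and punctuation."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text.lower())
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def label_sentiment(cleaned: str) -> str:
    """Label cleaned text by counting positive versus negative words."""
    words = cleaned.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    raw = "Great rally today! @someone https://example.com #Nigeria2023"
    cleaned = clean_tweet(raw)
    print(cleaned, "->", label_sentiment(cleaned))
```

Keeping cleaning and scoring as separate functions means the scoring stand-in can later be swapped for a real model without touching the cleaning logic.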
- Nigeria Political Tweets: the dataset we will use during the course.
- Pandas: a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- Google BigQuery: serverless data warehouse (central repository of integrated data from one or more disparate sources).
- Airflow: workflow management platform for data engineering pipelines. In other words, a pipeline orchestration tool.
- Docker: a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
- Google Cloud Storage: a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure.
- Google Data Studio: turns your data into fully customizable, informative reports and dashboards that are easy to read and share.
Languages, frameworks, libraries, services, and tools used to bootstrap this project.
For a real-time dashboard of the data and its analysis, please refer to the Political Arena Dashboard.
- Create a GCP project, get the Google service key, and store it at a known file path
- Install Terraform and create the main.tf and variable.tf files
- Provision the various Google Cloud resources using Terraform
- Create an Airflow folder with dags, logs, and plugins folders inside it
- Install Docker and Docker Compose
- Add a custom Dockerfile based on the Airflow image that sets up the Airflow environment, the Python environment, and the Google development kit/environment
- Build the Airflow image
- Add the Docker Compose file with the various Airflow services and variables, together with the Google variables
- Build the bash data ingestion script
- Build the DAG Python file with various operators for the execution of the various tasks
- Run docker compose up to build and start the containers for the execution of the project
- Connect the ingested dataset in the data warehouse to Google Data Studio
- Build dashboards to convey the necessary information effectively
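The DAG step in the list above could look roughly like the following sketch, which assumes Airflow 2.x (where `schedule_interval` is accepted). The DAG id, task names, start date, and placeholder `echo` commands are all hypothetical; the project's real DAG would call its own ingestion script and Google Cloud operators. Airflow is imported inside the factory function only so the file can be read and checked on a machine without Airflow installed.

```python
from datetime import datetime

# Hypothetical task names, in the order the steps above describe them.
TASK_ORDER = ["ingest_tweets", "upload_to_gcs", "load_to_bigquery"]


def build_dag():
    """Build a daily DAG that runs the tasks in TASK_ORDER one after another."""
    # Imported lazily (assumption for this sketch) so importing this module
    # does not require Airflow to be installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="political_tweets_daily",  # hypothetical name
        start_date=datetime(2022, 1, 1),  # placeholder date
        schedule_interval="@daily",       # the tweets are scraped once a day
        catchup=False,                    # do not backfill past days
    ) as dag:
        tasks = [
            # Placeholder commands; real tasks would run the ingestion script
            # and the upload and load steps.
            BashOperator(task_id=name, bash_command=f"echo 'placeholder for {name}'")
            for name in TASK_ORDER
        ]
        # Chain the tasks so each runs only after the previous one succeeds.
        for upstream, downstream in zip(tasks, tasks[1:]):
            upstream >> downstream
    return dag


# In a file under the dags folder, expose the DAG at module level so the
# scheduler can discover it:
# dag = build_dag()
```

Once the containers are running, a file like this in the dags folder should appear in the Airflow UI under the chosen DAG id.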
See the open issues for a full list of proposed features (and known issues).
I am extremely grateful for the time this set of wonderful people put in to ensure we understood the various aspects of data and analytics engineering.
Adesoba Adewale Olamide
Project Link: 2023 Political Arena