The purpose of this project is to build a data pipeline using Airflow and Python. The pipeline can be divided into three steps. First, Twitter data is extracted using the Twitter API and transformed with Python. Second, the code is deployed to Airflow running on an Amazon EC2 instance. Third, triggering the newly created DAG runs the code and saves the final result, a CSV file, to Amazon S3.
- Airflow
- Python
- Amazon EC2
- Amazon S3
23 Nov 2022
- `twitter_etl.py`: contains the `run_twitter_etl` function, which creates a connection to the Twitter API and extracts data from it. The file also includes the steps and commands used to launch the Airflow server, deploy the code, and trigger the DAG. (See the first sketch below.)
- `twitter_dag.py`: creates the DAG and runs `run_twitter_etl` through a `PythonOperator` task. (See the second sketch below.)
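
A minimal sketch of what `run_twitter_etl` could look like, assuming tweepy for the Twitter API connection and pandas (with s3fs) for the CSV write. The handle, bucket name, and credentials below are placeholders, not the project's actual values:

```python
import tweepy
import pandas as pd


def run_twitter_etl():
    # Placeholder credentials -- replace with your own Twitter API keys.
    consumer_key = "YOUR_CONSUMER_KEY"
    consumer_secret = "YOUR_CONSUMER_SECRET"
    access_token = "YOUR_ACCESS_TOKEN"
    access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

    # Connect the code to the Twitter API.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    # Extract: pull recent tweets from a timeline (the handle is illustrative).
    tweets = api.user_timeline(
        screen_name="@nasa",
        count=200,
        include_rts=False,
        tweet_mode="extended",
    )

    # Transform: keep only the fields of interest.
    records = [
        {
            "user": t.user.screen_name,
            "text": t.full_text,
            "favorite_count": t.favorite_count,
            "retweet_count": t.retweet_count,
            "created_at": t.created_at,
        }
        for t in tweets
    ]

    # Load: write the result as a CSV to S3 (the bucket name is a placeholder;
    # this requires s3fs and AWS credentials on the EC2 instance).
    df = pd.DataFrame(records)
    df.to_csv("s3://my-twitter-etl-bucket/refined_tweets.csv", index=False)
```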
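
A minimal sketch of the DAG definition, assuming Airflow 2.x. The DAG id, schedule, task id, and default arguments are illustrative choices; the start date simply echoes the project date above:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

from twitter_etl import run_twitter_etl

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2022, 11, 23),
    "retries": 1,
    "retry_delay": timedelta(minutes=1),
}

# The DAG id and daily schedule are assumptions, not confirmed settings.
dag = DAG(
    "twitter_dag",
    default_args=default_args,
    description="ETL pipeline that pulls Twitter data and writes a CSV to S3",
    schedule_interval="@daily",
    catchup=False,
)

# PythonOperator runs run_twitter_etl each time the DAG is triggered.
run_etl = PythonOperator(
    task_id="complete_twitter_etl",
    python_callable=run_twitter_etl,
    dag=dag,
)
```

Once both files are copied to the Airflow `dags` folder on the EC2 instance, the DAG appears in the Airflow UI and can be triggered to run the ETL end to end.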