ETL pipeline that extracts tweets from the accounts a user follows using the Twitter API v2. The tweets are then transformed into a format suitable for natural language processing (NLP) and loaded into a PostgreSQL database.
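For context, the extract step boils down to authenticated GET requests against the Twitter API v2. The sketch below is illustrative only; the user ID handling, pagination, and field selection are assumptions rather than the pipeline's actual code.

```python
import os
import requests

# Bearer token from the Twitter developer portal (assumed to be exported as an env var).
BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]
HEADERS = {"Authorization": f"Bearer {BEARER_TOKEN}"}
BASE_URL = "https://api.twitter.com/2"

def get_following(user_id: str) -> list[dict]:
    """Accounts the given user follows (first page only, for brevity)."""
    resp = requests.get(f"{BASE_URL}/users/{user_id}/following",
                        headers=HEADERS, params={"max_results": 100})
    resp.raise_for_status()
    return resp.json().get("data", [])

def get_recent_tweets(account_id: str) -> list[dict]:
    """Recent tweets from one followed account."""
    resp = requests.get(f"{BASE_URL}/users/{account_id}/tweets",
                        headers=HEADERS,
                        params={"max_results": 100, "tweet.fields": "created_at,lang"})
    resp.raise_for_status()
    return resp.json().get("data", [])
```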
The following are requirements for using this code:
- AWS account & IAM user created: AWS Homepage
- AWS CLI installed and configured, and access key for IAM user created
- Twitter account & API Key, Secret, and Bearer Token created: Twitter Developer Docs
Run the insert_account_id script with your AWS account ID as an argument as follows:
$ ./insert_account_id ###########
This inserts your account ID into the setup script and into the JSON files detailing the role IDs. The following script then creates the Lambda functions and their associated roles, the S3 bucket, and the PostgreSQL RDS database instance.
$ ./setup_script
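The setup script is the source of truth for the exact resources and names it creates; the boto3 snippet below only illustrates the kind of provisioning calls involved (bucket name, instance identifier, and credentials are placeholders, and the script itself may use the AWS CLI instead).

```python
import boto3

s3 = boto3.client("s3")
rds = boto3.client("rds")

# Staging bucket for extracted/transformed tweet data (placeholder name).
s3.create_bucket(Bucket="example-tweet-etl-bucket")

# Small PostgreSQL instance for the load step (placeholder identifiers/credentials).
rds.create_db_instance(
    DBInstanceIdentifier="example-tweet-etl-db",
    DBInstanceClass="db.t3.micro",
    Engine="postgres",
    AllocatedStorage=20,
    MasterUsername="etl_user",
    MasterUserPassword="change-me",
)
```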
The dependencies are added to the Lambda functions in the form of layers. A layer is created from a provided zip file with the relevant libraries installed. More information can be found here: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
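For illustration, publishing a zipped layer and attaching it to a function might look like the following with boto3 (layer, file, and function names are placeholders; the project's scripts may do this differently):

```python
import boto3

lambda_client = boto3.client("lambda")

# Publish the zipped dependencies as a new layer version.
with open("nlp-layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="example-nlp-layer",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.9"],
    )

# Attach the layer to the function that needs it (placeholder function name).
lambda_client.update_function_configuration(
    FunctionName="transform-tweets",
    Layers=[layer["LayerVersionArn"]],
)
```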
To create layers that are compatible with the Lambda operating system, an EC2 instance is created to install and package the dependencies. This is accomplished by executing the following script:
$ ./ec2-instance-setup
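Conceptually, the packaging step on the instance amounts to installing the libraries into the python/ directory structure that Lambda layers expect and zipping the result. A rough sketch (library list and archive name are examples only):

```python
import shutil
import subprocess
import sys

# Install one layer's libraries into the "python/" directory Lambda layers expect.
libraries = ["pandas", "nltk", "boto3"]  # example set; each layer has its own list
subprocess.run([sys.executable, "-m", "pip", "install", *libraries, "--target", "python"],
               check=True)

# Zip the directory into a layer package, e.g. nlp-layer.zip.
shutil.make_archive("nlp-layer", "zip", root_dir=".", base_dir="python")
```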
Python libraries required by each function's layer:
- Extract: requests, boto3
- Transform: boto3, pandas, nltk
- Load: pandas, psycopg2, sqlalchemy, boto3
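To show how these libraries fit together, here is an illustrative sketch of the transform and load steps; the tokenization choices, table name, and connection string are assumptions, not the pipeline's actual implementation:

```python
import pandas as pd
import nltk
from nltk.corpus import stopwords
from sqlalchemy import create_engine

nltk.download("punkt")
nltk.download("stopwords")
STOP_WORDS = set(stopwords.words("english"))

def transform(tweets: list[dict]) -> pd.DataFrame:
    """Lower-case, tokenize, and strip stop words from raw tweet text."""
    df = pd.DataFrame(tweets)
    df["clean_text"] = df["text"].apply(
        lambda t: " ".join(w for w in nltk.word_tokenize(t.lower())
                           if w.isalpha() and w not in STOP_WORDS)
    )
    return df

def load(df: pd.DataFrame) -> None:
    """Write the transformed tweets to PostgreSQL via SQLAlchemy/psycopg2 (placeholder DSN)."""
    engine = create_engine("postgresql+psycopg2://etl_user:change-me@example-host:5432/tweets")
    df.to_sql("tweets_nlp", engine, if_exists="append", index=False)
```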