
E-Commerce-DataPipeline


Project Overview

This project builds an event-driven data ingestion and transformation pipeline for e-commerce transactional data. The system uses AWS services such as S3, Lambda, Glue, Redshift, and SNS to ingest, transform, validate, and upsert data into Amazon Redshift for analytical purposes.


Architectural Diagram

Architecture Design


Key Steps

1. Dimension Tables with Sample Records and Fact Table

  • Pre-load these dimension tables into Redshift as part of the setup process; a DDL sketch follows the list below.

  • Products Dimension Table (dim_products):

    • Columns: product_id, product_name, category, price, supplier_id

  • Customers Dimension Table (dim_customers):

    • Columns: customer_id, first_name, last_name, email, membership_level

  • Transactions Fact Table (fact_transactions):

    • Columns: transaction_id, customer_id, customer_email, product_id, product_name, total_price, transaction_date, payment_type, status
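
A minimal DDL sketch for these tables, submitted through the Redshift Data API; the cluster identifier, database, and user here are placeholder values:

```python
import boto3

# Hypothetical cluster/database/user names -- replace with your own.
client = boto3.client("redshift-data")

DDL = """
CREATE TABLE IF NOT EXISTS dim_products (
    product_id    INT PRIMARY KEY,
    product_name  VARCHAR(255),
    category      VARCHAR(100),
    price         DECIMAL(10, 2),
    supplier_id   INT
);

CREATE TABLE IF NOT EXISTS dim_customers (
    customer_id      INT PRIMARY KEY,
    first_name       VARCHAR(100),
    last_name        VARCHAR(100),
    email            VARCHAR(255),
    membership_level VARCHAR(50)
);

CREATE TABLE IF NOT EXISTS fact_transactions (
    transaction_id   VARCHAR(64) PRIMARY KEY,
    customer_id      INT,
    customer_email   VARCHAR(255),
    product_id       INT,
    product_name     VARCHAR(255),
    total_price      DECIMAL(12, 2),
    transaction_date DATE,
    payment_type     VARCHAR(50),
    status           VARCHAR(50)
);
"""

# Submit each statement separately; the Data API runs them asynchronously.
for statement in filter(None, (s.strip() for s in DDL.split(";"))):
    client.execute_statement(
        ClusterIdentifier="ecommerce-cluster",  # assumed cluster name
        Database="dev",
        DbUser="awsuser",
        Sql=statement,
    )
```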

2. Mock Data Generation

  • Generate mock transaction data for the customers and products. The Lambda function is built and deployed with AWS CodeBuild; it generates the data and uploads CSV files to S3 using Hive-style partitioning (a handler sketch follows below).

    • CodeBuild

    • Mock CSV files are generated by the Lambda; the Lambda code is updated from the GitHub repo through the CI/CD setup with AWS CodeBuild. The files are stored in S3 using the following Hive-style partitioning: s3://your-bucket/transactions/year=2023/month=03/day=15/transactions_2023-03-15.csv.

    RawCSVFiles
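
A minimal sketch of such a Lambda handler, assuming a bucket named your-bucket and hypothetical value ranges for the mock fields:

```python
import boto3
import csv
import io
import random
import uuid
from datetime import date

s3 = boto3.client("s3")
BUCKET = "your-bucket"  # assumed bucket name

def lambda_handler(event, context):
    today = date.today()
    rows = []
    for _ in range(100):  # number of mock transactions per run
        rows.append({
            "transaction_id": str(uuid.uuid4()),
            "customer_id": random.randint(1, 50),
            "customer_email": f"customer{random.randint(1, 50)}@example.com",
            "product_id": random.randint(1, 20),
            "product_name": f"product_{random.randint(1, 20)}",
            "total_price": round(random.uniform(5, 500), 2),
            "transaction_date": today.isoformat(),
            "payment_type": random.choice(["card", "paypal", "cod"]),
            "status": random.choice(["completed", "pending", "refunded"]),
        })

    # Serialize the mock records as a CSV file in memory.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

    # Hive-style partitioning: year=/month=/day= prefixes.
    key = (f"transactions/year={today:%Y}/month={today:%m}/day={today:%d}/"
           f"transactions_{today.isoformat()}.csv")
    s3.put_object(Bucket=BUCKET, Key=key, Body=buffer.getvalue())
    return {"s3_key": key, "rows": len(rows)}
```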


3. Create AWS Glue Crawlers

  • Create a Glue crawler for the S3 bucket input file directory (data stored in a Hive-style layout with multiple partitions)
  • Create a Glue crawler for the fact_transactions Redshift table. NOTE: this requires creating a Glue connection to Redshift, with two important points to remember:
    • The security group associated with Redshift must allow inbound traffic on the Redshift port 5439.
    • The VPC associated with Redshift must have an S3 endpoint defined.

crawlers
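
Both crawlers could be created with boto3 along these lines; the role, database, and connection names are assumptions:

```python
import boto3

glue = boto3.client("glue")

# Crawler over the Hive-partitioned raw CSV prefix in S3.
glue.create_crawler(
    Name="raw-transactions-crawler",   # assumed crawler name
    Role="GlueServiceRole",            # assumed IAM role
    DatabaseName="ecommerce_raw",
    Targets={"S3Targets": [{"Path": "s3://your-bucket/transactions/"}]},
)

# Crawler over the Redshift fact table, via a Glue JDBC connection.
glue.create_crawler(
    Name="redshift-fact-crawler",
    Role="GlueServiceRole",
    DatabaseName="ecommerce_redshift",
    Targets={"JdbcTargets": [{
        "ConnectionName": "redshift-connection",  # Glue connection to the cluster
        "Path": "dev/public/fact_transactions",
    }]},
)
```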


4. Create Glue ETL Flow

  • GlueETL
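
A sketch of what the Glue job script might look like: it reads the raw partitioned CSVs from the catalog, drops invalid records, and stages the result in Redshift. The catalog names, connection name, and staging table (staging_transactions) are assumptions:

```python
import sys
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the raw, Hive-partitioned CSV data discovered by the S3 crawler.
raw = glueContext.create_dynamic_frame.from_catalog(
    database="ecommerce_raw", table_name="transactions"
)

# Basic validation: drop records missing required keys.
valid = Filter.apply(
    frame=raw,
    f=lambda r: r["transaction_id"] is not None and r["customer_id"] is not None,
)

# Stage the cleaned data in Redshift; the UPSERT into fact_transactions
# runs afterwards (see step 10).
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=valid,
    catalog_connection="redshift-connection",  # assumed Glue connection name
    connection_options={"dbtable": "staging_transactions", "database": "dev"},
    redshift_tmp_dir="s3://your-bucket/glue-temp/",
)

job.commit()
```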

5. Create a Lambda to start the Glue job when data is generated in S3:

  • s3lambdaGlueJob
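
A minimal handler for this S3-triggered Lambda; the Glue job name is an assumption:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Triggered by an S3 ObjectCreated notification on the raw transactions prefix.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Pass the new object's location to the Glue job as job arguments.
        response = glue.start_job_run(
            JobName="ecommerce-etl-job",  # assumed Glue job name
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        print(f"Started Glue job run {response['JobRunId']} for s3://{bucket}/{key}")
```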

6. Create SNS Topic:

  • Create an SNS topic to send an email when the Glue ETL job finishes; the same notification also triggers the final Lambda that archives the data (see step 8)
    • s3lambdaGlueJob
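
A sketch of creating the topic and an e-mail subscription with boto3; the topic name and address are placeholders:

```python
import boto3

sns = boto3.client("sns")

# Topic that fans out Glue job status notifications.
topic = sns.create_topic(Name="glue-etl-status")  # assumed topic name
topic_arn = topic["TopicArn"]

# E-mail subscription; the recipient must confirm the subscription once.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")
```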

7. Create EventBridge Rule:

  • Create an EventBridge rule that publishes to the SNS topic when the Glue job changes status
    • EventRuleforSNS
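
One way to express the rule with boto3; the job name, rule name, and SNS topic ARN below are placeholders:

```python
import boto3
import json

events = boto3.client("events")

# Match Glue job state changes for the ETL job.
pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {
        "jobName": ["ecommerce-etl-job"],          # assumed job name
        "state": ["SUCCEEDED", "FAILED", "TIMEOUT"],
    },
}

events.put_rule(Name="glue-etl-status-rule", EventPattern=json.dumps(pattern))

# Route matching events to the SNS topic created in step 6.
events.put_targets(
    Rule="glue-etl-status-rule",
    Targets=[{"Id": "sns-target",
              "Arn": "arn:aws:sns:us-east-1:123456789012:glue-etl-status"}],
)
```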

8. Create a Lambda to archive the data after the SNS notification:

  • s3lambdaGlueJob
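
A sketch of the archiving Lambda, assuming the source and archive bucket names shown and that it archives the current day's partition:

```python
import boto3
from datetime import date

s3 = boto3.client("s3")
SOURCE_BUCKET = "your-bucket"           # assumed source bucket
ARCHIVE_BUCKET = "your-archive-bucket"  # assumed archive bucket (step 9)

def lambda_handler(event, context):
    # Invoked via the SNS subscription once the Glue job has finished.
    today = date.today()
    prefix = f"transactions/year={today:%Y}/month={today:%m}/day={today:%d}/"

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Copy the processed file to the archive bucket, then remove the original.
            s3.copy_object(
                Bucket=ARCHIVE_BUCKET,
                Key=key,
                CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
            )
            s3.delete_object(Bucket=SOURCE_BUCKET, Key=key)
```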

9. Create S3 Bucket to Archive the data:

  • dataArchive

10. Final fact_transactions Table in Redshift:

  • The data ingestion process follows the UPSERT method, so reprocessed records update existing rows instead of creating duplicates (see the sketch below).
    • fact_ouput
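
A sketch of the delete-then-insert UPSERT the Glue job could run after loading the staging table; staging_transactions is an assumed name:

```python
# Delete-then-insert UPSERT, run after the Glue job loads staging_transactions.
UPSERT_SQL = """
BEGIN;
DELETE FROM fact_transactions
USING staging_transactions
WHERE fact_transactions.transaction_id = staging_transactions.transaction_id;
INSERT INTO fact_transactions
SELECT * FROM staging_transactions;
TRUNCATE TABLE staging_transactions;
COMMIT;
"""
```

Passing this string as the postactions connection option of the Glue Redshift write in step 4 lets the staging load and the UPSERT run within the same job.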