Take Home Exam Solution #1

Open
wants to merge 121 commits into
base: master
Commits
5fdfdde
Added idea folder in gitignore
Jul 28, 2021
0241e85
Deleted data.zip to save storage
Jul 28, 2021
4d17309
Development Checklist for Continuous Integration
Jul 28, 2021
f0f41d9
Folder blueprint for development
Jul 28, 2021
2734723
Docker compose setup
Jul 28, 2021
51b2978
Added Project Package Requirements
Jul 28, 2021
df53a47
Git ignore logs folder
Jul 28, 2021
b8b5eb8
Renamed README.md to instructions.md
Jul 28, 2021
a5bcb3c
Renamed README.md to instructions.md
Jul 28, 2021
8f090a1
Added the project's manual
Jul 28, 2021
1785d63
Renamed README.md to test-instructions.md
Jul 28, 2021
b1f9065
Flake8 and Pytest configuration
Jul 28, 2021
ae24fb0
Spark package where Spark Jobs will reside, and deleted airflow confi…
Jul 28, 2021
2190b52
Added Note for docker-compose Postgresql execution, airflow web serve…
Jul 28, 2021
5e5d8b9
Adde pyspark in the project's package requirements
Jul 28, 2021
0d57403
Docker config for Spark with 3 executors
Jul 28, 2021
f8c9607
Added a note for future development
Jul 28, 2021
25170f5
Git Ignore Pycharm Config
Jul 28, 2021
c0015ed
Added TODO notes in README
Jul 28, 2021
33507d8
Fixed airflow scheduler is failing when database creation is not yet …
Jul 29, 2021
fdeea40
Added Development Notes
Jul 29, 2021
331778f
Upgraded airflow version, and reverted back pyspark to 3.0.1
Jul 29, 2021
9cbaa96
Retail dag auto formatted by black
Jul 29, 2021
5740f82
Docker configuration for Spark added in airflow Dockerfile, since we …
Jul 29, 2021
7724086
Added example dag for spark job
Jul 29, 2021
0e48e1f
Added Kafka in the Docker registry
Jul 29, 2021
e35cb40
Added log4j config for Kafka Logs
Jul 29, 2021
2f1828a
Added Kafka Python library in the package requirements
Jul 29, 2021
0247765
Added data folder in git ignore file
Jul 29, 2021
948919c
Added few more development notes for the future
Jul 29, 2021
3f84053
Added mysqlclient and mysqlconnector to support airflow's mysql hook …
Jul 29, 2021
441f848
Data generation (?)
Jul 29, 2021
a49e46c
Initialized data sync operator to test the MySQLHook installed
Jul 29, 2021
d049b17
Separate Dockerfile for MySQL service.
Jul 29, 2021
e6e9f6e
Black Auto Formatting
Jul 29, 2021
a62df65
Added development env file
Jul 29, 2021
6cc637b
Fixed Checklist Lints
Jul 29, 2021
82036b2
Data Generation (???)
Jul 29, 2021
a5f12db
Added notes for documentation later on
Jul 30, 2021
505d612
Added clean up of cache objects in make cmd
Jul 30, 2021
749185f
Remove synchronization of data folder to docker container
Jul 30, 2021
e585df7
Tbl to Stage Operator setup
Jul 30, 2021
0f91bd2
Added pandas in package requirements
Jul 30, 2021
4e02172
Renamed ORDERS table to ORDER
Jul 30, 2021
9ab60f8
TblToStage Operator implementation
Jul 30, 2021
b739847
Black Autoformatting
Jul 30, 2021
ec6bb8e
Added Upsert Query for region table
Jul 30, 2021
880841c
Renamed table names into lowercase
Jul 30, 2021
628a9c8
Renamed SQL table names into lowercase
Jul 30, 2021
1ca3572
Removed Kafka and Upgraded postgres into v13
Jul 30, 2021
35aa39d
Configured automated security check and static type checking
Jul 31, 2021
4358d2e
Added a command for pruning docker containers
Jul 31, 2021
69448e0
Replaced the usage of pandas to_sql for inserting.
Jul 31, 2021
a498131
Modularization of TblToStageOperator class
Jul 31, 2021
f2e98e0
Correction on each of items in the checklist configuration
Jul 31, 2021
c416dcc
Pytest plugin for dockerized testing
Jul 31, 2021
6b24f1a
Removed variables in conftest.py
Jul 31, 2021
f2bc149
Added unit tests for general rules for each DAGs
Jul 31, 2021
7f5bd13
Added unit tests for the retail_dag DAG
Jul 31, 2021
9ae3bc2
Removed unused variables in dag integrity test suite
Jul 31, 2021
3ec82ec
Black Autoformatting
Jul 31, 2021
59ca313
Sample dag for parallel spark job tasks
Jul 31, 2021
688b2c0
New Integration Tests for TblToStageOperator
Jul 31, 2021
6388788
Added pytest-docker in package requirements for dockerized testing
Jul 31, 2021
1009b23
Added documentation for common errors
Jul 31, 2021
b82ff26
Enhanced the static typing for TblToStageOperator.execute() method
Jul 31, 2021
4649e78
Removed Kafka Package Dependency
Jul 31, 2021
987057b
Added more notes for documentation purposes.
Jul 31, 2021
7db71f7
Adjustment on test for dag integrity due to changes in dag definition
Jul 31, 2021
8218ad5
Added a getter function to get upsert queries for a given table name.
Jul 31, 2021
6f78f01
Clean up retail_dag.py
Jul 31, 2021
bd81a43
Wrote an instruction to mitigate issues when running tests
Jul 31, 2021
4d03d41
Added a test for idempotency, also DRYed the code for cleaner tests.
Jul 31, 2021
dfa255a
Added INSERT queries for data insertion to MySQL
Aug 1, 2021
44ea6a2
Added execution time for query table filtering
Aug 1, 2021
18fef67
Modification of typo errors in SupplierHeader model
Aug 1, 2021
cc74896
Black Autoformatting
Aug 1, 2021
bb9a2fc
Improved the ddl operations by adding if not exists upon creation.
Aug 1, 2021
c9c532a
Restructured DAG connections
Aug 1, 2021
f5623bc
Black Autoformatting
Aug 1, 2021
ba99845
Fixed test errors for dag test suites.
Aug 1, 2021
7c18b63
Reduced verbosity of pytest
Aug 1, 2021
3818765
Fixed unit test errors for TblToStageOperator
Aug 1, 2021
38baba6
Merge pull request #1 from 1byte-yoda/mark/data-ingester-pipeline
1byte-yoda Aug 1, 2021
5107390
Merge pull request #1 from 1byte-yoda/mark/data-ingester-pipeline
1byte-yoda Aug 1, 2021
493e072
Modified the commit message for the merged branch.
Aug 1, 2021
9dde073
Removed PySpark from the project
Aug 1, 2021
12c4229
Removed spark volume binding in docker
Aug 1, 2021
f6096fd
Added PostgreSQL requirements for development
Aug 1, 2021
810700b
Health check for Postgres DW
Aug 1, 2021
70304ca
Optional return for get_upsert_query method
Aug 1, 2021
bd819f4
Removed unused files / irrelevant to project
Aug 2, 2021
6913d84
Added a Step-by-Step guide to run the project.
Aug 2, 2021
e0b8042
Postgres to Datawarehouse Operator Implementation
Aug 2, 2021
5723147
Clean up and added some TODO notes for later development use.
Aug 2, 2021
d46f64e
Added SQL Analytical Queries that will serve as an anwer for the Repo…
Aug 2, 2021
dc3fdb4
Initial test setup for postgres_dw_operator
Aug 2, 2021
b30c878
Decoupled environment variables for docker-compose services
Aug 2, 2021
c46a3c8
README minimal typo error
Aug 2, 2021
b45cef1
Removed extra/unuseful texts in README.md
Aug 2, 2021
772d2c4
Cleanup TODO lists
Aug 2, 2021
aab3bfd
Added Answers for each exam questions
Aug 2, 2021
d1eaf54
Removed some unnecessary details in README and ANSWERS file
Aug 2, 2021
1b1b26f
Removed docker environment variables in git ignored files
Aug 2, 2021
cd554cb
Added a link in the ANSWERS.md file for SQL Queries for Dashboard Rep…
Aug 2, 2021
464be36
Format SQL query for better readability
Aug 2, 2021
f20024c
Fixed formatting issues in ANSWERS.md
Aug 2, 2021
423aca2
Fixed some line indentions
Aug 2, 2021
e146119
Fixed Checklist Errors
Aug 2, 2021
86c1b5e
Fixed typo error in dashboard queries
Aug 2, 2021
5223044
Fixed some typo issue in ANSWERS.md file
Aug 2, 2021
1ba1cbb
Added more make commands in README.md file
Aug 2, 2021
30dcb30
Improved SQL Query variable naming
Aug 2, 2021
ac36a35
Improvements in the Table DDL file formatting
Aug 2, 2021
1634b49
Added surrogate key for each dimension tables and also adjusted the f…
Aug 4, 2021
cb6eff7
Fixed flake8 lints
Aug 4, 2021
6ec0182
Fixed Checklist lints
Aug 4, 2021
1388bde
Improved star schema by adding data types
Aug 4, 2021
492e8f0
Added key notations for start schema for better readability
Aug 4, 2021
eefe524
Optimized dag batch run to 15,000 rows per chunks
Aug 4, 2021
6d098da
Fixed checklist errors and optimized sql queries for auto incremented…
Aug 4, 2021
Adjustment on test for dag integrity due to changes in dag definition
Mark committed Jul 31, 2021
commit 7db71f76cbc4fd79bac885f17b7d18ed414f0d70
2 changes: 1 addition & 1 deletion tests/retail_etl/dags/dag_integrity_test.py
@@ -37,6 +37,6 @@ def test_dag_default_configs(dag_file: str):
     emails = dag.default_args.get("email", [])
     num_retries = dag.default_args.get("retries", None)
     retry_delay_sec = dag.default_args.get("retry_delay", None)
-    assert emails == ["airflow@airflow.com"]
+    assert emails == ["1byteyoda@makr.dev"]
     assert num_retries is not None
     assert retry_delay_sec is not None
2 changes: 1 addition & 1 deletion tests/retail_etl/dags/retail_dag_tasks_definition_test.py
@@ -8,7 +8,7 @@


 class RetailDagTaskDefTest:
-    EXPECTED_TASKS_COUNT = 3
+    EXPECTED_TASKS_COUNT = 9
     DAG_ID = "retail_dag"
     EXPECTED_TASKS = ["begin_execution", "region_tbl_to_staging_db", "end_execution"]
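
For reference, the default-args assertions in dag_integrity_test.py can be sketched against a plain dictionary, so the check runs without Airflow installed. This is only an illustration: the helper name `check_default_args` and the sample retry values are hypothetical and do not come from the repository.

```python
from datetime import timedelta
from typing import Any, Dict, List


def check_default_args(default_args: Dict[str, Any], expected_emails: List[str]) -> List[str]:
    """Return the names of default_args keys that fail the integrity rules."""
    problems = []
    # The alert recipients must match the expected list exactly.
    if default_args.get("email", []) != expected_emails:
        problems.append("email")
    # Retries and the retry delay must both be set explicitly.
    if default_args.get("retries") is None:
        problems.append("retries")
    if default_args.get("retry_delay") is None:
        problems.append("retry_delay")
    return problems


# Hypothetical default_args resembling what the updated test expects.
sample_args = {
    "email": ["1byteyoda@makr.dev"],
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}
```

Returning a list of failing keys rather than asserting one at a time reports every misconfigured key in a single run, which may be convenient when the same rules are applied to several DAG files.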