patcha-ranat/kde-zoomcamp

Learning Materials for Data Engineering Zoomcamp

DataTalksClub - data-engineering-zoomcamp

Topics

  • Week 1: Basics and setup (Done)

    • Note for Week 1 (Highly Recommended)
    • Learn basic Docker and Docker Compose
    • Learn basic Terraform
    • Set up a VM on Google Cloud Platform
      • gcloud CLI, IAM & Admin, service accounts
    • Git Bash

    Note: Credentials and other sensitive information have already been hidden.

  • Week 2: Workflow Orchestration (Done)

    • Note for Week 2
    • Prefect (2023)
    • Airflow (2022)
    • Setting up a data pipeline and ETL jobs using Google Cloud Storage and a local, containerized PostgreSQL
  • Week 3: Data Warehouse (Done)

  • Week 4: Analytics Engineering (Done)

  • Week 5: Batch Processing (Done)

    • Note for Week 5
    • Examples of using the PySpark API for data transformation: initializing a session, defining a schema, UDFs, read/write, SQL, and joins
    • Anatomy of Spark and how it works underneath, such as reshuffling, repartitioning, and broadcasting
    • Submitting a Spark job to a cluster with a script parameterized via argparse, so that parameters can be passed in and used within the script
  • Week 6: Stream Processing (In progress)
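
The Week 5 item on submitting a parameterized Spark job can be illustrated with a minimal sketch. This is not the script from the notes: the flag names `--input` and `--output`, the app name, and the Parquet read/write step are assumptions chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse job parameters. The flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="Parameterized Spark job (sketch)")
    parser.add_argument("--input", required=True, help="path to the input data")
    parser.add_argument("--output", required=True, help="path to write results to")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)

    # Imported lazily so argument parsing can be tried without Spark installed.
    from pyspark.sql import SparkSession

    # The master URL is supplied by spark-submit, so it is not hard-coded here.
    spark = SparkSession.builder.appName("parameterized-job").getOrCreate()

    # Placeholder transformation: read Parquet and write it back out.
    df = spark.read.parquet(args.input)
    df.write.parquet(args.output, mode="overwrite")

    spark.stop()


# Entry point when launched via spark-submit (left commented so the
# sketch can be imported without starting Spark):
# if __name__ == "__main__":
#     main()
```

With the entry point enabled, such a script would typically be launched as `spark-submit --master <cluster-url> job.py --input <input-path> --output <output-path>`, where `job.py` and the angle-bracket values are placeholders; arguments placed after the script name are passed through to argparse.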

About

Learning materials for Zoomcamp courses (DE/MLOps) and more
