HUBBLE 444 - Refactor Elementary monitoring to run every 30 min #379

Closed
wants to merge 17 commits into from
113 changes: 71 additions & 42 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,47 +4,68 @@ This repository contains the Airflow DAGs for the [Stellar ETL](https://github.c

## **Table of Contents**

- [stellar-etl-airflow](#stellar-etl-airflow)
- [**Table of Contents**](#table-of-contents)
- [Installation and Setup](#installation-and-setup)
- [Google Cloud Platform](#google-cloud-platform)
- [Cloud Composer](#cloud-composer)
- [Airflow Variables Explanation](#airflow-variables-explanation)
- [Normal Variables](#normal-variables)
- [Kubernetes Specific Variables](#kubernetes-specific-variables)
- [**Google Cloud Platform**](#google-cloud-platform)
- [**Setup the Cloud SDK**](#setup-the-cloud-sdk)
- [**Create Google Project**](#create-google-project)
- [**Create BigQuery Dataset**](#create-bigquery-dataset)
- [**Create Google Cloud Storage bucket**](#create-google-cloud-storage-bucket)
- [**Cloud Composer**](#cloud-composer)
- [**Create Google Cloud Composer environment**](#create-google-cloud-composer-environment)
- [**Upload DAGs and Schemas to Cloud Composer**](#upload-dags-and-schemas-to-cloud-composer)
- [**Add Service Account Key**](#add-service-account-key)
- [**Add private docker registry auth secrets**](#add-private-docker-registry-auth-secrets)
- [**Create Namespace for ETL Tasks (Optional)**](#create-namespace-for-etl-tasks-optional)
- [**Authenticating Tasks in an Autopilot-Managed Environment**](#authenticating-tasks-in-an-autopilot-managed-environment)
- [**Modify Kubernetes Config for Airflow Workers**](#modify-kubernetes-config-for-airflow-workers)
- [**Add Airflow Variables and Connections**](#add-airflow-variables-and-connections)
- [**Airflow Variables Explanation**](#airflow-variables-explanation)
- [**Normal Variables**](#normal-variables)
- [**DBT Variables**](#dbt-variables)
- [**Kubernetes-Specific Variables**](#kubernetes-specific-variables)
- [Execution Procedures](#execution-procedures)
- [Starting Up](#starting-up)
- [Handling Failures](#handling-failures)
- [Clearing Failures](#clearing-failures)
- [**Starting Up**](#starting-up)
- [**Handling Failures**](#handling-failures)
- [**Clearing Failures**](#clearing-failures)
- [Understanding the Setup](#understanding-the-setup)
- [DAG Diagrams](#dag-diagrams)
- [Public DAGs](#public-dags)
- [History Table Export DAG](#history-table-export-dag)
- [State Table Export DAG](#state-table-export-dag)
- [DBT Enriched Base Tables DAG](#dbt-enriched-base-tables-dag)
- [SDF Internal DAGs](#sdf-internal-dags)
- [Sandbox update DAG](#sandbox-update-dag)
- [Cleanup metadata DAG](#cleanup-metadata-dag)
- [Partner Pipeline DAG](#partner-pipeline-dag)
- [DBT SDF Marts DAG](#dbt-sdf-marts-dag)
- [Daily Euro OHLC DAG](#daily-euro-ohlc-dag)
- [Audit Log DAG](#audit-log-dag)
- [Task Explanations](#task-explanations)
- [build_time_task](#build_time_task)
- [build_export_task](#build_export_task)
- [build_gcs_to_bq_task](#build_gcs_to_bq_task)
- [build_apply_gcs_changes_to_bq_task](#build_apply_gcs_changes_to_bq_task)
- [build_batch_stats](#build_batch_stats)
- [bq_insert_job_task](#bq_insert_job_task)
- [cross_dependency_task](#cross_dependency_task)
- [build_delete_data_task](#build_delete_data_task)
- [build_dbt_task](#build_dbt_task)
- [build_elementary_slack_alert_task](#build_elementary_slack_alert_task)
- [**DAG Diagrams**](#dag-diagrams)
- [**Public DAGs**](#public-dags)
- [**History Table Export DAG**](#history-table-export-dag)
- [**State Table Export DAG**](#state-table-export-dag)
- [**DBT Enriched Base Tables DAG**](#dbt-enriched-base-tables-dag)
- [**SDF Internal DAGs**](#sdf-internal-dags)
- [**Sandbox DAGs**](#sandbox-dags)
- [**Sandbox Create DAG**](#sandbox-create-dag)
- [**Sandbox Update DAG**](#sandbox-update-dag)
- [**Cleanup Metadata DAG**](#cleanup-metadata-dag)
- [**Partner Pipeline DAG**](#partner-pipeline-dag)
- [**DBT Stellar Marts DAG**](#dbt-stellar-marts-dag)
- [**DBT Data Quality Alerts DAG**](#dbt-data-quality-alerts-dag)
- [**Daily Euro OHLC DAG**](#daily-euro-ohlc-dag)
- [**Audit Log DAG**](#audit-log-dag)
- [**Task Explanations**](#task-explanations)
- [**build_time_task**](#build_time_task)
- [**build_export_task**](#build_export_task)
- [**build_gcs_to_bq_task**](#build_gcs_to_bq_task)
- [**build_apply_gcs_changes_to_bq_task**](#build_apply_gcs_changes_to_bq_task)
- [**build_batch_stats**](#build_batch_stats)
- [**bq_insert_job_task**](#bq_insert_job_task)
- [**cross_dependency_task**](#cross_dependency_task)
- [**build_delete_data_task**](#build_delete_data_task)
- [**build_copy_table_task**](#build_copy_table_task)
- [**build_coingecko_api_to_gcs_task**](#build_coingecko_api_to_gcs_task)
- [**build_check_execution_date_task**](#build_check_execution_date_task)
- [**build_dbt_task**](#build_dbt_task)
- [**build_elementary_slack_alert_task**](#build_elementary_slack_alert_task)
- [Further Development](#further-development)
- [Extensions](#extensions)
- [Pre-commit Git hook scripts](#pre-commit-git-hook-scripts)
- [Adding New DAGs](#adding-new-dags)
- [Adding tasks to existing DAGs](#adding-tasks-to-existing-dags)
- [Adding New Tasks](#adding-new-tasks)
- [Testing Changes](#testing-changes)
- [**Extensions**](#extensions)
- [**Pre-commit Git hook scripts**](#pre-commit-git-hook-scripts)
- [**Adding New DAGs**](#adding-new-dags)
- [**Adding tasks to existing DAGs**](#adding-tasks-to-existing-dags)
- [**Adding New Tasks**](#adding-new-tasks)
- [**Testing Changes**](#testing-changes)

<br>

Expand Down Expand Up @@ -534,7 +555,8 @@ This section contains information about the Airflow setup. It includes our DAG d
- [Sandbox Update DAG](#sandbox-update-dag)
- [Cleanup Metadata DAG](#cleanup-metadata-dag)
- [Partner Pipeline DAG](#partner-pipeline-dag)
- [DBT SDF Marts DAG](#dbt-sdf-marts-dag)
- [DBT Stellar Marts DAG](#dbt-stellar-marts-dag)
- [DBT Data Quality Alerts DAG](#dbt-data-quality-alerts-dag)
- [Daily Euro OHLC DAG](#daily-euro-ohlc-dag)
- [Audit Log DAG](#audit-log-dag)
- [Task Explanations](#task-explanations)
Expand Down Expand Up @@ -586,7 +608,6 @@ This section contains information about the Airflow setup. It includes our DAG d
- Creates the DBT staging views for models
- Updates the enriched_history_operations table
- Updates the current state tables
- If any warnings are found, it sends a Slack notification identifying the table with the warning and the date and time it occurred.

![dbt_enriched_base_tables DAG](documentation/images/dbt_enriched_base_tables.png)

Expand Down Expand Up @@ -625,14 +646,22 @@ This section contains information about the Airflow setup. It includes our DAG d

- Used by SDF for internal partnership pipelines

#### **DBT SDF Marts DAG**
#### **DBT Stellar Marts DAG**

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/dbt_sdf_marts_dag.py)
[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/dbt_stellar_marts_dag.py)

- Updates the DBT mart tables daily

![dbt_stellar_marts DAG](documentation/images/dbt_stellar_marts.png)

#### **DBT Data Quality Alerts DAG**

[This DAG](https://github.com/stellar/stellar-etl-airflow/blob/master/dags/dbt_data_quality_alerts_dag.py)

- Runs DBT tests tagged as `singular_test` in `stellar-dbt` and Elementary alerts at a half-hourly cadence
- If any warnings are found, it sends a Slack notification identifying the table with the warning and the date and time it occurred.

![dbt_sdf_marts DAG](documentation/images/dbt_sdf_marts.png)
![dbt_data_quality_alerts DAG](documentation/images/dbt_data_quality_alerts.png)

#### **Daily Euro OHLC DAG**

Expand Down
5 changes: 3 additions & 2 deletions airflow_variables_dev.json
Original file line number Diff line number Diff line change
Expand Up @@ -251,8 +251,8 @@
"build_time_task": 480,
"build_export_task": 840,
"enriched_history_operations": 780,
"enriched_history_operations_with_exclude": 780,
"current_state": 720,
"elementary_dbt_enriched_base_tables": 1080,
"ohlc": 720,
"liquidity_pool_trade_volume": 1140,
"mgi": 660,
Expand All @@ -267,7 +267,8 @@
"history_assets": 720,
"soroban": 720,
"snapshot_state": 600,
"elementary_dbt_stellar_marts": 1620,
"singular_test": 600,
"elementary_dbt_data_quality": 1620,
"create_sandbox": 2400,
"update_sandbox": 60,
"cleanup_metadata": 60,
Expand Down
5 changes: 3 additions & 2 deletions airflow_variables_prod.json
Original file line number Diff line number Diff line change
Expand Up @@ -246,8 +246,8 @@
"build_time_task": 300,
"build_export_task": 600,
"enriched_history_operations": 1800,
"enriched_history_operations_with_exclude": 1800,
"current_state": 1200,
"elementary_dbt_enriched_base_tables": 2100,
"ohlc": 960,
"liquidity_pool_trade_volume": 1200,
"mgi": 1020,
Expand All @@ -262,7 +262,8 @@
"history_assets": 360,
"soroban": 420,
"snapshot_state": 840,
"elementary_dbt_stellar_marts": 1560,
"singular_test": 840,
"elementary_dbt_data_quality": 1560,
"create_sandbox": 1020,
"update_sandbox": 5460,
"cleanup_metadata": 60,
Expand Down
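The renamed and added keys in the two variables files have to match the task names the DAG code derives, otherwise the lookup fails at parse time. Below is a minimal sketch of that lookup, using three values copied from the prod hunk. The dictionary name `task_timeouts`, the helper `timeout_for`, and the reading of the integers as seconds are assumptions made for illustration; only the `elementary_` prefix and the `_with_exclude` suffix come from the diff itself.

```python
from datetime import timedelta

# Values copied from the airflow_variables_prod.json hunk above.
# The dictionary name is hypothetical.
task_timeouts = {
    "enriched_history_operations_with_exclude": 1800,
    "singular_test": 840,
    "elementary_dbt_data_quality": 1560,
}


def timeout_for(task_name: str, elementary: bool = False) -> timedelta:
    """Look up a task's timeout; Elementary tasks use an 'elementary_' key prefix."""
    key = f"elementary_{task_name}" if elementary else task_name
    # Assumption: the integers are seconds.
    return timedelta(seconds=task_timeouts[key])


print(timeout_for("singular_test"))                      # 0:14:00
print(timeout_for("dbt_data_quality", elementary=True))  # 0:26:00
```

A task name with no matching key raises `KeyError` in this sketch, which is why the old `elementary_dbt_stellar_marts` entry is replaced rather than left alongside the new one.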
42 changes: 42 additions & 0 deletions dags/dbt_data_quality_alerts_dag.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from kubernetes.client import models as k8s
from stellar_etl_airflow.build_dbt_task import dbt_task
from stellar_etl_airflow.build_elementary_slack_alert_task import elementary_task
from stellar_etl_airflow.default import (
    alert_sla_miss,
    get_default_dag_args,
    init_sentry,
)

init_sentry()

dag = DAG(
    "dbt_data_quality_alerts",
    default_args=get_default_dag_args(),
    start_date=datetime(2024, 6, 11, 0, 0),
    description="This DAG runs dbt tests and Elementary alerts at a half-hourly cadence",
    schedule="15,45 * * * *",  # Runs at minute 15 and minute 45 of every hour
    user_defined_filters={
        "container_resources": lambda s: k8s.V1ResourceRequirements(requests=s),
    },
    max_active_runs=1,
    catchup=False,
    tags=["dbt-data-quality", "dbt-elementary-alerts"],
    sla_miss_callback=alert_sla_miss,
)


# DBT tests to run
dbt_unit_tests = dbt_task(
    dag,
    command_type="test",
    tag="singular_test",
)
singular_tests_elementary_alerts = elementary_task(dag, "dbt_data_quality")
start_tests = EmptyOperator(task_id="start_tests_task")

# DAG task graph
start_tests >> dbt_unit_tests >> singular_tests_elementary_alerts
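A quick way to check a cron minute field against the intended half-hourly cadence is to expand it into the minutes it matches. The helper below is a hypothetical, standard-library-only sketch that handles just `*`, single values, ranges, comma lists, and `/step`; it is not the parser Airflow uses, and it ignores the other four cron fields.

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field into the sorted list of minutes it matches."""
    minutes: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            low, high = base.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(expand_minute_field("15,45"))      # [15, 45]
print(expand_minute_field("*/15,*/45"))  # [0, 15, 30, 45]
```

A plain list such as `15,45` fires twice an hour, whereas step expressions count from minute 0, so `*/15,*/45` matches four minutes per hour.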
10 changes: 3 additions & 7 deletions dags/dbt_enriched_base_tables_dag.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,16 +35,12 @@
wait_on_state_table = build_cross_deps(dag, "wait_on_state_table", "state_table_export")

# DBT models to run
enriched_history_operations_task = dbt_task(dag, tag="enriched_history_operations")
enriched_history_operations_task = dbt_task(
    dag, tag="enriched_history_operations", excluded="singular_test"
)
current_state_task = dbt_task(dag, tag="current_state")

elementary = elementary_task(dag, "dbt_enriched_base_tables")

# DAG task graph
wait_on_history_table >> enriched_history_operations_task

wait_on_state_table >> current_state_task

enriched_history_operations_task >> elementary

current_state_task >> elementary
16 changes: 0 additions & 16 deletions dags/dbt_stellar_marts_dag.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,8 +54,6 @@
soroban = dbt_task(dag, tag="soroban")
snapshot_state = dbt_task(dag, tag="snapshot_state")

elementary = elementary_task(dag, "dbt_stellar_marts")

# DAG task graph
wait_on_dbt_enriched_base_tables >> ohlc_task >> liquidity_pool_trade_volume_task

Expand All @@ -73,17 +71,3 @@
wait_on_dbt_enriched_base_tables >> history_assets
wait_on_dbt_enriched_base_tables >> soroban
wait_on_dbt_enriched_base_tables >> snapshot_state

mgi_task >> elementary
liquidity_providers_task >> elementary
liquidity_pools_values_task >> elementary
liquidity_pools_value_history_task >> elementary
trade_agg_task >> elementary
fee_stats_agg_task >> elementary
asset_stats_agg_task >> elementary
network_stats_agg_task >> elementary
partnership_assets_task >> elementary
history_assets >> elementary
soroban >> elementary
liquidity_pool_trade_volume_task >> elementary
snapshot_state >> elementary
10 changes: 10 additions & 0 deletions dags/stellar_etl_airflow/build_dbt_task.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ def dbt_task(
    flag="select",
    operator="",
    command_type="build",
    excluded=None,
    resource_cfg="default",
):
    namespace = conf.get("kubernetes", "NAMESPACE")
Expand Down Expand Up @@ -97,6 +98,15 @@ def dbt_task(
        args.append(",".join(models))
    else:
        args.append(models[0])
    # --exclude selector added for necessary use cases
    # Argument should be string or list of strings
    if excluded:
        task_name = f"{task_name}_with_exclude"
        args.append("--exclude")
        if isinstance(excluded, list):
            args.append(" ".join(excluded))
        else:
            args.append(excluded)

    if Variable.get("dbt_full_refresh_models", deserialize_json=True).get(task_name):
        args.append("--full-refresh")
Expand Down
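To make the effect of the new `excluded` parameter concrete, here is a minimal standalone sketch of the argument assembly. `build_selector_args` is a hypothetical function, not the repo's `dbt_task`, and the opening `[command_type, "--select", "tag:<tag>"]` layout is an assumption about code outside the visible hunk. Only the `--exclude` branch and the `_with_exclude` suffix are taken from the diff.

```python
from typing import List, Optional, Tuple, Union


def build_selector_args(
    tag: str,
    command_type: str = "build",
    flag: str = "select",
    excluded: Optional[Union[str, List[str]]] = None,
) -> Tuple[str, List[str]]:
    """Return (task_name, args), mirroring the --exclude branch shown in the hunk."""
    task_name = tag
    # Assumption: the command starts with the dbt subcommand and a tag selector.
    args = [command_type, f"--{flag}", f"tag:{tag}"]
    if excluded:
        task_name = f"{task_name}_with_exclude"
        args.append("--exclude")
        # A list is collapsed into one space-separated argument, as in the diff.
        args.append(" ".join(excluded) if isinstance(excluded, list) else excluded)
    return task_name, args


name, args = build_selector_args(
    "enriched_history_operations", excluded="singular_test"
)
print(name)  # enriched_history_operations_with_exclude
print(args)  # ['build', '--select', 'tag:enriched_history_operations', '--exclude', 'singular_test']
```

Because the task name changes when an exclusion is passed, any per-task settings keyed by name (such as the `enriched_history_operations_with_exclude` timeout added in this PR) need an entry under the suffixed name.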
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,7 @@

from airflow.configuration import conf
from airflow.models import Variable
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes import client, config
from kubernetes.client import models as k8s
from stellar_etl_airflow.default import alert_after_max_retries
Expand Down Expand Up @@ -93,4 +91,5 @@ def elementary_task(
f"elementary_{task_name}"
]
),
trigger_rule="all_done",
)
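The added `trigger_rule="all_done"` matters in the new DAG because the Elementary alert task sits downstream of the dbt test task: with Airflow's default `all_success` rule, a failing test would prevent the alert from being sent. The sketch below is a simplified pure-Python model of those two rules for illustration only; `should_run` is a hypothetical function and does not call Airflow.

```python
def should_run(trigger_rule: str, upstream_states: list[str]) -> bool:
    """Simplified model of two Airflow trigger rules over upstream task states."""
    finished = {"success", "failed", "upstream_failed", "skipped"}
    if trigger_rule == "all_success":
        return all(state == "success" for state in upstream_states)
    if trigger_rule == "all_done":
        return all(state in finished for state in upstream_states)
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")


# The dbt test task failed: the alert task is held back under the default
# rule but still runs with trigger_rule="all_done".
print(should_run("all_success", ["failed"]))  # False
print(should_run("all_done", ["failed"]))     # True
print(should_run("all_done", ["running"]))    # False
```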
Binary file modified documentation/images/dbt_enriched_base_tables.png
Binary file removed documentation/images/dbt_sdf_marts.png
Binary file not shown.
Binary file added documentation/images/dbt_stellar_marts.png