create ledger transaction dag #230
Conversation
@cayod do you mind giving me an overview of the data pipeline design?
I think this is how the pipeline will run, but want to confirm:
- Airflow will specify a time range, converting that to a ledger range to export
- one export job will spin up a captive core, read the ledgers and dump tx data to a file
- Each separate ledger will be read and converted to its own JSON file (lt_lake_export_task)
- Each ledger will individually be loaded to BQ (lt_bq_task)
Is this correct?
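Purely as a sketch of the dependency shape I have in mind (the task ids, EmptyOperator placeholders, and hardcoded ledger range are my own illustrations, not taken from this PR):

```python
# Hypothetical sketch of the chain described above, using EmptyOperator
# placeholders (Airflow 2.x). Task ids and the ledger range are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="ledger_transaction_sketch",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    time_task = EmptyOperator(task_id="get_ledger_range_from_times")  # time range -> ledger range
    lt_export_task = EmptyOperator(task_id="export_transactions")     # captive core export to a file

    # one lake-export + BQ-load pair per ledger in the range
    for ledger in range(100, 103):
        lt_lake_export_task = EmptyOperator(task_id=f"lt_lake_export_{ledger}")  # per-ledger JSON file
        lt_bq_task = EmptyOperator(task_id=f"lt_bq_{ledger}")                    # load that file to BQ
        time_task >> lt_export_task >> lt_lake_export_task >> lt_bq_task
```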
init_sentry()

def build_data_lake_to_bq_task(dag, project, dataset, data_type, ledger_range):
We have multiple tasks that load GCS files to BQ using the GoogleCloudStorageToBigQueryOperator. Can you explain how this one differs and the thinking behind loading files with a different design?
This task will be used to load multiple ledger files, since the data lake will be organized by ledger number.
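For comparison, here is a minimal, hypothetical sketch of loading several per-ledger files in a single task with the existing operator (GCSToBigQueryOperator is the newer name for GoogleCloudStorageToBigQueryOperator); the bucket, object paths, and table names below are made up:

```python
# Hypothetical sketch: loading multiple per-ledger JSON files in one task with the
# existing GCS-to-BigQuery operator. All names and paths are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_sketch",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    load_ledgers_to_bq = GCSToBigQueryOperator(
        task_id="load_ledger_transactions_to_bq",
        bucket="exported-data-bucket",                          # hypothetical bucket name
        source_objects=["ledger_transactions/ledger_*.json"],   # wildcard: one JSON file per ledger
        destination_project_dataset_table="my-project.my_dataset.ledger_transactions",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
        autodetect=True,  # or pass an explicit schema via schema_fields / schema_object
    )
```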
ledger_range = "{{ task_instance.xcom_pull(task_ids='time_task.task_id') }}" | ||
ledger_range = ast.literal_eval(ledger_range) | ||
for ledger in range(ledger_range["start"], ledger_range["end"]): | ||
lt_lake_export_task = export_to_lake(dag, lt_export_task.task_id, ledger) |
will this spin up individual captive cores for each ledger we export?
Correct. This ledger will be loaded into a temp table to be consumed by dbt, which will use the js-stellar-base package and generate the history_transactions table.
def export_to_lake(dag, export_task_id, ledger_sequence):
    bucket_source = Variable.get("gcs_exported_data_bucket_name")
You should be using a Jinja template to avoid resolving variables at DAG parse time, e.g. {{ var.value.variable_name }} instead of Variable.get("variable_name").
See https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code for more details.
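A minimal sketch of that suggestion, assuming a templated operator field; the BashOperator and the gsutil command are just for illustration:

```python
# Hypothetical sketch: reference the Airflow Variable through Jinja in a templated
# operator field, so it is resolved at task run time rather than at DAG parse time.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="jinja_variable_sketch",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Avoid (resolved at parse time, hits the metadata DB on every DAG file parse):
    #   from airflow.models import Variable
    #   bucket_source = Variable.get("gcs_exported_data_bucket_name")
    #
    # Prefer: let Jinja resolve the variable when the task actually runs.
    list_exported_files = BashOperator(
        task_id="list_exported_files",
        bash_command="gsutil ls gs://{{ var.value.gcs_exported_data_bucket_name }}/",
    )
```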
This PR creates a ledger transaction dag.