Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized DAGs parse time #258

Merged
merged 4 commits into from
Nov 28, 2023
Merged

Optimized DAGs parse time #258

merged 4 commits into from
Nov 28, 2023

Conversation

lucaszanotelli
Copy link
Contributor

Summary

This PR focuses on implementing optimizations recommended by Google Cloud to reduce Airflow DAG parse times, as outlined in their blog article.

Changes

  • Reduced Variable.get() in top level code.
  • Reorganized DAG structures and dependencies to align with the recommended practices.
  • Streamlined the import of external dependencies to enhance DAG parsing efficiency.
  • Eliminated redundant imports and organized dependencies based on DAG requirements.

Testing

I tested these modifications in our test environment and the DAG parse time saw a reduction of +60%.
image
full line: DAG parse time after changes

Next Steps

  • Use .airflowignore
  • Review paused DAGs

@lucaszanotelli lucaszanotelli requested a review from a team as a code owner November 28, 2023 15:48
Copy link
Contributor

@sydneynotthecity sydneynotthecity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

Was the intent to replace all instances of Variable.get() with {{ var.value.x }}? If so, there are a quite a few instances of the original Variable.get() references that remain.

This change is quite exciting--our DAG parse times were really long and this reduction will be a big impact in making Airflow healthier.

@lucaszanotelli
Copy link
Contributor Author

Was the intent to replace all instances of Variable.get() with {{ var.value.x }}? If so, there are a quite a few instances of the original Variable.get() references that remain.

You can only replace Variable.get() with {{ var.value.x }} when you use the value inside a templateable field.
More info about Airflow templating here: https://docs.astronomer.io/learn/templating

If you're using the value at top-level code (we are) and/or inside non templateable fields (also we are), there isn't a pretty way to access the Airflow variables.

I thought about parsing the variable JSON file that we upload to the bucket but to accomplish that but I'd need to invest more time in this.

@sydneynotthecity
Copy link
Contributor

Alright, that makes sense. I don't think we need to invest more time in it unless you think we would have significant benefit. The current improvement is good enough as a first pass

@lucaszanotelli lucaszanotelli merged commit 478dc2c into master Nov 28, 2023
4 checks passed
@lucaszanotelli lucaszanotelli deleted the reduce-dags-parse-time branch November 28, 2023 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants