By combining PostgreSQL's ROW_NUMBER window function with each DAG's real remaining limit (max_active_tasks - running_tasks), we can query only for the task_instances that can actually be executed, instead of fetching all of them and then filtering out those that can't be executed.
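To make the idea concrete, here is a minimal sketch of a per-DAG ROW_NUMBER filter. It is not the query from this PR: the two-table schema, the `QUERY` text, and the helper names `build_demo_db` and `select_executable` are hypothetical, it counts both running and queued TIs against the limit as an assumption, and it runs on an in-memory SQLite database (3.25 or newer, for window functions) rather than PostgreSQL.

```python
import sqlite3

# Hypothetical, simplified schema and query; not Airflow's real models.
QUERY = """
WITH running AS (
    -- Assumption: running and queued TIs both count against the DAG limit.
    SELECT dag_id, COUNT(*) AS running_tasks
    FROM task_instance
    WHERE state IN ('running', 'queued')
    GROUP BY dag_id
),
ranked AS (
    -- Number the scheduled TIs inside each DAG, highest priority first.
    SELECT dag_id, task_id, priority_weight,
           ROW_NUMBER() OVER (
               PARTITION BY dag_id
               ORDER BY priority_weight DESC, task_id
           ) AS rn
    FROM task_instance
    WHERE state = 'scheduled'
)
SELECT r.dag_id, r.task_id
FROM ranked r
JOIN dag d ON d.dag_id = r.dag_id
LEFT JOIN running ru ON ru.dag_id = r.dag_id
-- Keep only as many rows per DAG as it has free slots.
WHERE r.rn <= d.max_active_tasks - COALESCE(ru.running_tasks, 0)
ORDER BY r.priority_weight DESC, r.dag_id, r.task_id
LIMIT :max_tis
"""


def build_demo_db():
    """Create an in-memory database with one busy DAG and one idle DAG."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_tasks INTEGER);
        CREATE TABLE task_instance (
            dag_id TEXT, task_id TEXT, state TEXT, priority_weight INTEGER
        );
        """
    )
    conn.executemany("INSERT INTO dag VALUES (?, ?)", [("big", 4), ("small", 2)])
    rows = [("big", f"r{i}", "running", 10) for i in range(3)]
    rows += [("big", f"b{i}", "scheduled", 10) for i in range(5)]
    rows += [("small", f"s{i}", "scheduled", 1) for i in range(2)]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    return conn


def select_executable(conn, max_tis):
    """Return up to max_tis (dag_id, task_id) pairs that fit their DAG's limit."""
    return conn.execute(QUERY, {"max_tis": max_tis}).fetchall()


if __name__ == "__main__":
    print(select_executable(build_demo_db(), max_tis=4))
```

In this toy data the "big" DAG has 3 of its 4 slots in use, so only one of its five scheduled TIs is returned and the rest of the batch is left for the "small" DAG.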
This PR improves the way the Airflow scheduler takes batches of TIs to execute, so we should no longer see this message:
Not executing %s since the number of tasks running or queued from DAG %s is >= to the DAG's max_active_tasks limit of %s
This issue is most visible when we run a DAG with many tasks (close to the max_tis_per_query param) and a high priority weight. Such a DAG takes all the scheduler slots most of the time, even when only 1 of its tasks can be executed, which leaves many lower-priority tasks stuck in scheduled.
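The starvation described above can be reproduced with a small simulation of the "fetch first, filter afterwards" approach. This is only an illustrative sketch: `old_style_batch`, the tuple layout, and the sample numbers are hypothetical and do not come from the scheduler code.

```python
from collections import Counter


def old_style_batch(scheduled, free_slots, max_tis):
    """Pick max_tis TIs by priority, then drop those over their DAG limit."""
    # Step 1: take the top max_tis candidates by priority, ignoring DAG limits.
    candidates = sorted(scheduled, key=lambda ti: -ti[2])[:max_tis]
    # Step 2: keep a candidate only while its DAG still has a free slot.
    used = Counter()
    executable = []
    for dag_id, task_id, _priority in candidates:
        if used[dag_id] < free_slots.get(dag_id, 0):
            used[dag_id] += 1
            executable.append((dag_id, task_id))
    return executable


# A high-priority DAG with five scheduled TIs but one free slot,
# and a low-priority DAG with two scheduled TIs and two free slots.
scheduled = [("big", f"b{i}", 10) for i in range(5)]
scheduled += [("small", f"s{i}", 1) for i in range(2)]
free_slots = {"big": 1, "small": 2}

if __name__ == "__main__":
    print(old_style_batch(scheduled, free_slots, max_tis=4))
```

With a batch size of 4, all four candidates come from the "big" DAG, three of them are discarded by the limit check, and the "small" DAG's TIs are never considered in that batch even though it has free slots.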