Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add Support for Expanding Task Groups in dag-factory #304

Open
su-four opened this issue Dec 1, 2024 · 1 comment
Open

[Feature] Add Support for Expanding Task Groups in dag-factory #304

su-four opened this issue Dec 1, 2024 · 1 comment
Labels
enhancement New feature or request triage-needed

Comments

@su-four
Copy link

su-four commented Dec 1, 2024

Description

Airflow supports dynamic task mapping within task groups, allowing all tasks inside the group to be expanded against the same inputs. This feature, referred to as "expanding task groups," introduces depth-first execution for mapped tasks, enabling more logical task separation, fine-grained dependency rules, and efficient resource allocation.

Airflow Example:

@task_group  
def file_transforms(filename):  
    converted = convert_to_yaml(filename)  
    return replace_defaults(converted)  

file_transforms.expand(filename=["data1.json", "data2.json"])  

In this example, the task group file_transforms is expanded for each filename. Inside the group:

The convert_to_yaml task will have two instances: one for "data1.json" and another for "data2.json".
The replace_defaults task will also have two instances, each processing the output of its corresponding convert_to_yaml task.

With task groups, each task instance only depends on its relevant upstream task, allowing tasks to execute as soon as their specific input is ready. This ensures that, for instance, the first replace_defaults task can run before the second convert_to_yaml completes, as it only depends on the first convert_to_yaml task.

Use case/motivation

Benefits:

Expanding task groups provides several advantages over expanding individual tasks:

  1. Depth-First Execution: Each task within the group processes its specific inputs independently, ensuring efficient execution and allowing downstream tasks to begin as soon as upstream tasks complete for their respective inputs.
  2. Logical Task Separation: Expanding the task group together ensures that dependencies are scoped logically within the group, avoiding unintended cross-task interference.
  3. Improved Resource Allocation: Resources can be allocated more accurately since task dependencies within the group are isolated, preventing delays caused by tasks waiting unnecessarily.

Use Case:

This feature is especially useful for workflows involving multiple dependent transformations on the same set of inputs. For instance, file processing pipelines, where each file undergoes a series of transformations, can benefit significantly from this depth-first execution model.

@su-four su-four added enhancement New feature or request triage-needed labels Dec 1, 2024
@tatiana
Copy link
Collaborator

tatiana commented Dec 4, 2024

This looks like a great addition, @su-four ! Would you be willing to contribute to this feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage-needed
Projects
None yet
Development

No branches or pull requests

2 participants