AIP-83 run_id logic when no logical date #46398

Open, wants to merge 2 commits into base: main

5 changes: 4 additions & 1 deletion airflow/api/common/trigger_dag.py
@@ -84,7 +84,10 @@ def _trigger_dag(

data_interval = dag.timetable.infer_manual_data_interval(run_after=coerced_logical_date)
run_id = run_id or dag.timetable.generate_run_id(
- run_type=DagRunType.MANUAL, logical_date=coerced_logical_date, data_interval=data_interval
+ run_type=DagRunType.MANUAL,
+ logical_date=coerced_logical_date,
+ data_interval=data_interval,
+ run_after=data_interval.end,
Comment on lines +87 to +90
Member

At some point (before 3.0) I want to reduce the arguments here to just take a DagRunInfo, but that can be done separately instead.

Contributor

According to the doc, when the logical date is null, the data interval end should be null too, so it would not make sense to use the data interval end as the run_after date. Thoughts, @uranusjr?

Member

Oh yeah you’re right. This should do coerced_logical_date or timezone.utcnow() (as currently implemented, coerced_logical_date can never be None, but it will be when everything is finished).

Contributor Author

Should we include this change as a part of this PR or have a separate PR?
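
The fallback suggested in this thread can be sketched roughly as follows. This is only an illustration, not the PR's code: `resolve_run_after` and its `now` parameter are hypothetical names, and the standard library's `datetime.now(timezone.utc)` stands in for Airflow's `timezone.utcnow()`. It assumes, as the reviewers note, that a missing logical date also means there is no data interval whose end could be used.

```python
from __future__ import annotations

from datetime import datetime, timezone


def resolve_run_after(logical_date: datetime | None, now: datetime | None = None) -> datetime:
    """Hypothetical helper: choose run_after for a manually triggered run.

    Mirrors the "coerced_logical_date or timezone.utcnow()" idea from the
    review thread. It never touches data_interval.end, because the data
    interval is expected to be None whenever logical_date is None.
    """
    if logical_date is not None:
        return logical_date
    # 'now' is injectable only to keep the sketch testable (an assumption).
    return now if now is not None else datetime.now(timezone.utc)
```

A caller would then pass `run_after=resolve_run_after(coerced_logical_date)` instead of `run_after=data_interval.end`, which avoids an attribute lookup on a data interval that may not exist.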

)

# This intentionally does not use 'session' in the current scope because it

3 changes: 2 additions & 1 deletion airflow/api_connexion/schemas/dag_run_schema.py
@@ -88,7 +88,8 @@ def autogenerate(self, data, **kwargs):
if "dag_run_id" not in data:
try:
data["dag_run_id"] = DagRun.generate_run_id(
- DagRunType.MANUAL, timezone.parse(data["logical_date"])
+ DagRunType.MANUAL,
+ timezone.parse(data["logical_date"]),
)
Comment on lines 90 to 93
Member

Do we want to pass run_after here?

Contributor

Looks like there is no change here, just a new line?

Member (@uranusjr, Feb 5, 2025)

Yes and I think that’s wrong, we should change the logic here.

Contributor Author

Is this required here? run_after was not added to this schema in the PR that introduced the new run_after field (#46195).
I assumed this code was going to be deprecated, so I did not add it here.

except (ParserError, TypeError) as err:
raise BadRequest("Incorrect datetime argument", detail=str(err))

1 change: 1 addition & 0 deletions airflow/api_fastapi/core_api/routes/public/dag_run.py
@@ -371,6 +371,7 @@ def trigger_dag_run(
run_type=DagRunType.MANUAL,
logical_date=logical_date,
data_interval=data_interval,
+ run_after=data_interval.end,
Contributor

Again, if the logical date is null, then we generally won't have a data interval. Correct, @uranusjr?

Collaborator

That's right.

Collaborator

data_interval would be null as well.

)

dag_run = dag.create_dagrun(

2 changes: 2 additions & 0 deletions airflow/jobs/scheduler_job_runner.py
@@ -1281,6 +1281,7 @@ def _create_dag_runs(self, dag_models: Collection[DagModel], session: Session) -
run_type=DagRunType.SCHEDULED,
logical_date=dag_model.next_dagrun,
data_interval=data_interval,
+ run_after=dag_model.next_dagrun_create_after,
),
logical_date=dag_model.next_dagrun,
data_interval=data_interval,
@@ -1391,6 +1392,7 @@ def _create_dag_runs_asset_triggered(
run_type=DagRunType.ASSET_TRIGGERED,
logical_date=logical_date,
data_interval=data_interval,
+ run_after=max(logical_dates.values()),
Member

There’s a task to make asset-triggered runs have None logical_date instead, so we’ll need to change this again soon. This is good enough for now.

session=session,
events=asset_events,
),

1 change: 1 addition & 0 deletions airflow/models/backfill.py
@@ -293,6 +293,7 @@ def _create_backfill_dag_run(
run_type=DagRunType.BACKFILL_JOB,
logical_date=info.logical_date,
data_interval=info.data_interval,
+ run_after=info.run_after,
),
logical_date=info.logical_date,
data_interval=info.data_interval,

6 changes: 5 additions & 1 deletion airflow/models/baseoperator.py
@@ -628,7 +628,11 @@ def run(
# This is _mostly_ only used in tests
dr = DagRun(
dag_id=self.dag_id,
- run_id=DagRun.generate_run_id(DagRunType.MANUAL, info.logical_date),
+ run_id=DagRun.generate_run_id(
+ DagRunType.MANUAL,
+ info.logical_date,
+ run_after=info.run_after,
+ ),
run_type=DagRunType.MANUAL,
logical_date=info.logical_date,
data_interval=info.data_interval,

2 changes: 1 addition & 1 deletion airflow/models/dag.py
@@ -1638,7 +1638,7 @@ def add_logger_if_needed(ti: TaskInstance):
logical_date=logical_date,
data_interval=data_interval,
run_after=data_interval.end,
- run_id=DagRun.generate_run_id(DagRunType.MANUAL, logical_date),
+ run_id=DagRun.generate_run_id(DagRunType.MANUAL, logical_date, run_after=data_interval.end),
session=session,
conf=run_conf,
triggered_by=DagRunTriggeredByType.TEST,

6 changes: 4 additions & 2 deletions airflow/models/dagrun.py
@@ -621,10 +621,12 @@ def find_duplicate(cls, dag_id: str, run_id: str, *, session: Session = NEW_SESS
return session.scalars(select(cls).where(cls.dag_id == dag_id, cls.run_id == run_id)).one_or_none()

@staticmethod
- def generate_run_id(run_type: DagRunType, logical_date: datetime) -> str:
+ def generate_run_id(
+ run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None
+ ) -> str:
"""Generate Run ID based on Run Type and logical Date."""
# _Ensure_ run_type is a DagRunType, not just a string from user code
- return DagRunType(run_type).generate_run_id(logical_date)
+ return DagRunType(run_type).generate_run_id(logical_date, run_after)

@staticmethod
@provide_session

5 changes: 3 additions & 2 deletions airflow/timetables/base.py
@@ -281,8 +281,9 @@ def generate_run_id(
self,
*,
run_type: DagRunType,
- logical_date: DateTime,
+ logical_date: DateTime | None,
data_interval: DataInterval | None,
+ run_after: DateTime | None = None,
**extra,
) -> str:
- return run_type.generate_run_id(logical_date)
+ return run_type.generate_run_id(logical_date, run_after)
5 changes: 3 additions & 2 deletions airflow/timetables/simple.py
@@ -186,15 +186,16 @@ def generate_run_id(
self,
*,
run_type: DagRunType,
- logical_date: DateTime,
+ logical_date: DateTime | None,
data_interval: DataInterval | None,
+ run_after: DateTime | None = None,
session: Session | None = None,
events: Collection[AssetEvent] | None = None,
**extra,
) -> str:
from airflow.models.dagrun import DagRun

- return DagRun.generate_run_id(run_type, logical_date)
+ return DagRun.generate_run_id(run_type, logical_date, run_after)

def data_interval_for_events(
self,

7 changes: 6 additions & 1 deletion airflow/utils/types.py
@@ -20,6 +20,7 @@
from typing import TYPE_CHECKING, TypedDict

import airflow.sdk.definitions._internal.types
+ from airflow.utils.strings import get_random_string

if TYPE_CHECKING:
from datetime import datetime
@@ -42,7 +43,11 @@ class DagRunType(str, enum.Enum):
def __str__(self) -> str:
return self.value

- def generate_run_id(self, logical_date: datetime) -> str:
+ def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None) -> str:
+ if logical_date is None:
+ if run_after is None:
+ raise ValueError("run_after cannot be None")
+ return run_after + get_random_string()
return f"{self}__{logical_date.isoformat()}"
Comment on lines -45 to 51
Member

Maybe this function should continue to accept one single datetime value, and we do the if-else check outside instead.

Member

run_after could be None as well?

Member

It can’t

Contributor Author

> Maybe this function should continue to accept one single datetime value, and we do the if-else check outside instead.

Here we need to know whether logical_date is None before generating the random string that gets appended to run_after.
If the function took just one argument, it wouldn't know when to append the random string. Are you suggesting moving this logic into the callers (Timetable.generate_run_id and DagRun.generate_run_id)?
I went with the current implementation to avoid duplicate code and multiple function calls.
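
For reference, the two-argument behavior discussed in this thread can be sketched in a self-contained form. Everything here is an assumption made for illustration: `RunType` is a cut-down stand-in for `DagRunType`, `random_suffix` is a hypothetical replacement for `get_random_string`, and the exact run_id layout for the no-logical-date case is not specified by the PR. One point worth noting is that adding a `str` to a `datetime`, as the diff line `run_after + get_random_string()` does, raises `TypeError` in Python, so the sketch formats the timestamp first.

```python
from __future__ import annotations

import enum
import secrets
from datetime import datetime, timezone


def random_suffix(length: int = 8) -> str:
    """Hypothetical stand-in for airflow.utils.strings.get_random_string."""
    return secrets.token_hex(length // 2)


class RunType(str, enum.Enum):
    """Cut-down stand-in for DagRunType (illustrative only)."""

    MANUAL = "manual"
    SCHEDULED = "scheduled"

    def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None = None) -> str:
        # With a logical date, keep the existing "<type>__<iso date>" form.
        if logical_date is not None:
            return f"{self.value}__{logical_date.isoformat()}"
        if run_after is None:
            raise ValueError("run_after cannot be None when logical_date is None")
        # Assumed layout: ISO timestamp plus a random suffix, so two runs
        # sharing the same run_after are unlikely to collide.
        return f"{self.value}__{run_after.isoformat()}_{random_suffix()}"
```

The alternative raised by the reviewer, keeping a single datetime parameter and branching in the callers, would move the `logical_date is None` check and the suffix generation into `Timetable.generate_run_id` and `DagRun.generate_run_id`.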


@staticmethod

1 change: 1 addition & 0 deletions airflow/www/views.py
@@ -2230,6 +2230,7 @@ def trigger(self, dag_id: str, session: Session = NEW_SESSION):
logical_date=logical_date,
data_interval=data_interval,
run_type=DagRunType.MANUAL,
+ run_after=data_interval.end,
)

try:

2 changes: 1 addition & 1 deletion tests/models/test_dag.py
@@ -3121,7 +3121,7 @@ def test_get_asset_triggered_next_run_info_with_unresolved_asset_alias(dag_maker
)
def test_create_dagrun_disallow_manual_to_use_automated_run_id(run_id_type: DagRunType) -> None:
dag = DAG(dag_id="test", start_date=DEFAULT_DATE, schedule="@daily")
- run_id = run_id_type.generate_run_id(DEFAULT_DATE)
+ run_id = run_id_type.generate_run_id(logical_date=DEFAULT_DATE, run_after=DEFAULT_DATE)

with pytest.raises(ValueError) as ctx:
dag.create_dagrun(