AIP-83 run_id logic when no logical date #46398

Closed
wants to merge 2 commits into from
5 changes: 4 additions & 1 deletion airflow/api/common/trigger_dag.py
@@ -84,7 +84,10 @@ def _trigger_dag(

data_interval = dag.timetable.infer_manual_data_interval(run_after=coerced_logical_date)
run_id = run_id or dag.timetable.generate_run_id(
run_type=DagRunType.MANUAL, logical_date=coerced_logical_date, data_interval=data_interval
run_type=DagRunType.MANUAL,
logical_date=coerced_logical_date,
data_interval=data_interval,
run_after=data_interval.end,
Comment on lines +87 to +90
Member

At some point (before 3.0) I want to reduce the arguments here to just take a DagRunInfo, but that can be done separately instead.

Contributor

According to the doc, when the logical date is null, the data interval end should be null as well, so it would not make sense to use the data interval end as the run_after date. Thoughts, @uranusjr?

Member

Oh yeah, you’re right. This should use coerced_logical_date or timezone.utcnow() (as currently implemented, coerced_logical_date can never be None, but it will be once everything is finished).

Contributor Author

Should we include this change as a part of this PR or have a separate PR?

Member

I think we'll need to do it in this PR 🤔

Member

timezone.utcnow() seems to be a reasonable default for run_after

Collaborator

I have changed it here: #46616
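
The fallback discussed in this thread can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code that landed in #46616: resolve_run_after is a hypothetical helper, and datetime.now(timezone.utc) stands in for Airflow's timezone.utcnow().

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_run_after(logical_date: Optional[datetime]) -> datetime:
    """Hypothetical helper: choose run_after for a manually triggered run."""
    # Use the logical date when one was supplied; otherwise fall back to
    # the current time in UTC (stand-in for timezone.utcnow()).
    return logical_date or datetime.now(timezone.utc)


explicit = datetime(2025, 2, 5, tzinfo=timezone.utc)
print(resolve_run_after(explicit))  # the supplied logical date is returned unchanged
print(resolve_run_after(None))      # a timezone-aware "now" is returned instead
```

The `or` form is safe here because datetime instances are always truthy, so only None triggers the fallback.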

)

# This intentionally does not use 'session' in the current scope because it
3 changes: 2 additions & 1 deletion airflow/api_connexion/schemas/dag_run_schema.py
@@ -88,7 +88,8 @@ def autogenerate(self, data, **kwargs):
if "dag_run_id" not in data:
try:
data["dag_run_id"] = DagRun.generate_run_id(
DagRunType.MANUAL, timezone.parse(data["logical_date"])
DagRunType.MANUAL,
timezone.parse(data["logical_date"]),
Comment on lines +91 to +92
Member

Suggested change
DagRunType.MANUAL,
timezone.parse(data["logical_date"]),
run_type=DagRunType.MANUAL,
logical_date=timezone.parse(data["logical_date"]),

)
Comment on lines 90 to 93
Member

Do we want to pass run_after here?

Contributor

Looks like there is no change here, just a new line?

Member (@uranusjr, Feb 5, 2025)

Yes, and I think that’s wrong; we should change the logic here.

Contributor Author

Is this required here? run_after was not added to this schema in the PR that introduced the new run_after field (#46195).
I assumed this API was going to be deprecated, so it wasn't added here.

Collaborator

I have changed it here: #46616

except (ParserError, TypeError) as err:
raise BadRequest("Incorrect datetime argument", detail=str(err))
1 change: 1 addition & 0 deletions airflow/api_fastapi/core_api/routes/public/dag_run.py
@@ -371,6 +371,7 @@ def trigger_dag_run(
run_type=DagRunType.MANUAL,
logical_date=logical_date,
data_interval=data_interval,
run_after=data_interval.end,
Contributor

Again, if the logical date is null, then we generally won't have a data interval. Correct, @uranusjr?

Collaborator

That's right.

Collaborator

data_interval would be null as well.

Member

Use utcnow for this as well, probably?

Collaborator

I have changed it here: #46616

)

dag_run = dag.create_dagrun(
2 changes: 2 additions & 0 deletions airflow/jobs/scheduler_job_runner.py
@@ -1281,6 +1281,7 @@ def _create_dag_runs(self, dag_models: Collection[DagModel], session: Session) -
run_type=DagRunType.SCHEDULED,
logical_date=dag_model.next_dagrun,
data_interval=data_interval,
run_after=dag_model.next_dagrun_create_after,
),
logical_date=dag_model.next_dagrun,
data_interval=data_interval,
@@ -1391,6 +1392,7 @@ def _create_dag_runs_asset_triggered(
run_type=DagRunType.ASSET_TRIGGERED,
logical_date=logical_date,
data_interval=data_interval,
run_after=max(logical_dates.values()),
Member

There’s a task to make asset-triggered runs have None logical_date instead, so we’ll need to change this again soon. This is good enough for now.

session=session,
events=asset_events,
),
1 change: 1 addition & 0 deletions airflow/models/backfill.py
@@ -293,6 +293,7 @@ def _create_backfill_dag_run(
run_type=DagRunType.BACKFILL_JOB,
logical_date=info.logical_date,
data_interval=info.data_interval,
run_after=info.run_after,
),
logical_date=info.logical_date,
data_interval=info.data_interval,
6 changes: 5 additions & 1 deletion airflow/models/baseoperator.py
@@ -628,7 +628,11 @@ def run(
# This is _mostly_ only used in tests
dr = DagRun(
dag_id=self.dag_id,
run_id=DagRun.generate_run_id(DagRunType.MANUAL, info.logical_date),
run_id=DagRun.generate_run_id(
DagRunType.MANUAL,
info.logical_date,
Comment on lines +632 to +633
Member

Suggested change
DagRunType.MANUAL,
info.logical_date,
run_type=DagRunType.MANUAL,
logical_date=info.logical_date,

Collaborator

I have changed it here: #46616

run_after=info.run_after,
),
run_type=DagRunType.MANUAL,
logical_date=info.logical_date,
data_interval=info.data_interval,
2 changes: 1 addition & 1 deletion airflow/models/dag.py
@@ -1638,7 +1638,7 @@ def add_logger_if_needed(ti: TaskInstance):
logical_date=logical_date,
data_interval=data_interval,
run_after=data_interval.end,
run_id=DagRun.generate_run_id(DagRunType.MANUAL, logical_date),
run_id=DagRun.generate_run_id(DagRunType.MANUAL, logical_date, run_after=data_interval.end),
session=session,
conf=run_conf,
triggered_by=DagRunTriggeredByType.TEST,
6 changes: 4 additions & 2 deletions airflow/models/dagrun.py
@@ -621,10 +621,12 @@ def find_duplicate(cls, dag_id: str, run_id: str, *, session: Session = NEW_SESS
return session.scalars(select(cls).where(cls.dag_id == dag_id, cls.run_id == run_id)).one_or_none()

@staticmethod
def generate_run_id(run_type: DagRunType, logical_date: datetime) -> str:
def generate_run_id(
run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None
Member

Suggested change
run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None
*, run_type: DagRunType, logical_date: datetime | None, run_after: datetime | None = None

As the method becomes more complicated (it is hard to reason about the order of logical_date and run_after), we should probably make it keyword-only.

Collaborator

I have changed it here: #46616
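
To illustrate the keyword-only suggestion above, here is a minimal standalone sketch. The function body is a simplified stand-in written for this example (the real formatting lives in DagRunType and is not reproduced here), and the string "manual" stands in for DagRunType.MANUAL.

```python
from datetime import datetime, timezone
from typing import Optional


def generate_run_id(
    *, run_type: str, logical_date: Optional[datetime], run_after: Optional[datetime] = None
) -> str:
    """Simplified stand-in; the bare * makes every parameter keyword-only."""
    when = logical_date if logical_date is not None else run_after
    if when is None:
        raise ValueError("either logical_date or run_after must be provided")
    return f"{run_type}__{when.isoformat()}"


ts = datetime(2025, 2, 5, tzinfo=timezone.utc)

# Keyword call: the two datetimes cannot be swapped by accident.
print(generate_run_id(run_type="manual", logical_date=ts, run_after=ts))

# Positional call: rejected before the function body runs.
try:
    generate_run_id("manual", ts, ts)
except TypeError as err:
    print(f"rejected: {err}")
```

With two parameters of the same type next to each other, a positional call site gives no hint about which datetime is which; the bare * turns that ambiguity into an immediate TypeError.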

) -> str:
"""Generate Run ID based on Run Type and logical Date."""
# _Ensure_ run_type is a DagRunType, not just a string from user code
return DagRunType(run_type).generate_run_id(logical_date)
return DagRunType(run_type).generate_run_id(logical_date, run_after)

@staticmethod
@provide_session
5 changes: 3 additions & 2 deletions airflow/timetables/base.py
@@ -281,8 +281,9 @@ def generate_run_id(
self,
*,
run_type: DagRunType,
logical_date: DateTime,
logical_date: DateTime | None,
data_interval: DataInterval | None,
run_after: DateTime | None = None,
**extra,
) -> str:
return run_type.generate_run_id(logical_date)
return run_type.generate_run_id(logical_date, run_after)
5 changes: 3 additions & 2 deletions airflow/timetables/simple.py
@@ -186,15 +186,16 @@ def generate_run_id(
self,
*,
run_type: DagRunType,
logical_date: DateTime,
logical_date: DateTime | None,
data_interval: DataInterval | None,
run_after: DateTime | None = None,
session: Session | None = None,
events: Collection[AssetEvent] | None = None,
**extra,
) -> str:
from airflow.models.dagrun import DagRun

return DagRun.generate_run_id(run_type, logical_date)
return DagRun.generate_run_id(run_type, logical_date, run_after)

def data_interval_for_events(
self,
7 changes: 6 additions & 1 deletion airflow/utils/types.py
@@ -20,6 +20,7 @@
from typing import TYPE_CHECKING, TypedDict

import airflow.sdk.definitions._internal.types
from airflow.utils.strings import get_random_string

if TYPE_CHECKING:
from datetime import datetime
@@ -42,7 +43,11 @@ class DagRunType(str, enum.Enum):
def __str__(self) -> str:
return self.value

def generate_run_id(self, logical_date: datetime) -> str:
def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None) -> str:
Member

Suggested change
def generate_run_id(self, logical_date: datetime | None, run_after: datetime | None) -> str:
def generate_run_id(self, *, logical_date: datetime | None, run_after: datetime | None) -> str:

same here

Collaborator

I have changed it here: #46616

if logical_date is None:
if run_after is None:
raise ValueError("run_after cannot be None")
return run_after + get_random_string()
Member

Suggested change
return run_after + get_random_string()
return f"{run_after}{get_random_string()}"

Collaborator

I have changed it here: #46616
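
The suggested change above matters because Python does not define addition between a datetime and a str, so the original line would raise TypeError at runtime. A minimal sketch, where random_suffix is a hypothetical stand-in for get_random_string() built on the standard library:

```python
import secrets
from datetime import datetime, timezone


def random_suffix(length: int = 8) -> str:
    """Hypothetical stand-in for airflow.utils.strings.get_random_string."""
    return secrets.token_hex(length // 2)


run_after = datetime(2025, 2, 5, tzinfo=timezone.utc)

try:
    run_after + random_suffix()  # datetime + str is not a supported operation
except TypeError as err:
    print(f"concatenation failed: {err}")

# An f-string converts the datetime with str() before joining the parts.
run_id = f"{run_after}{random_suffix()}"
print(run_id)
```

Note that str() on a datetime uses a space between date and time, unlike the isoformat() call used for runs that do have a logical date.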

return f"{self}__{logical_date.isoformat()}"
Comment on lines -45 to 51
Member

Maybe this function should continue to accept a single datetime value, and we do the if-else check outside instead.

Member

run_after could be None as well?

Member

It can’t

Contributor Author

“Maybe this function should continue to accept a single datetime value, and we do the if-else check outside instead.”

Here we need to know whether logical_date is None before generating the random string that gets appended to run_after.
If the function takes just one argument, it cannot know when to append the random string. Are you suggesting moving this logic into the callers (Timetable.generate_run_id and DagRun.generate_run_id)?
I went with the current implementation to avoid duplicated code and multiple function calls.

Member (@uranusjr, Feb 10, 2025)

I would move random string generation to DagRun.

Collaborator

I have changed it here: #46616
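
One way to read the suggestion in this thread is sketched below: the enum method keeps taking a single datetime, and the caller decides when to add a random suffix. Everything here is an assumption made for illustration. RunType is a hypothetical stand-in for DagRunType, the module-level function stands in for DagRun.generate_run_id, secrets.token_hex replaces get_random_string, and the ID format for runs without a logical date is invented, since the format chosen in #46616 is not shown in this thread.

```python
import enum
import secrets
from datetime import datetime, timezone
from typing import Optional


class RunType(str, enum.Enum):
    """Hypothetical stand-in for DagRunType; only two members are shown."""

    MANUAL = "manual"
    SCHEDULED = "scheduled"

    def __str__(self) -> str:
        return self.value

    def generate_run_id(self, when: datetime) -> str:
        # The enum keeps accepting one datetime and never has to branch.
        return f"{self}__{when.isoformat()}"


def generate_run_id(*, run_type: str, logical_date: Optional[datetime], run_after: datetime) -> str:
    """Stand-in for DagRun.generate_run_id with the branching done by the caller."""
    kind = RunType(run_type)
    if logical_date is not None:
        return kind.generate_run_id(logical_date)
    # Assumed format: without a logical date, a random suffix keeps IDs unique.
    return f"{kind.generate_run_id(run_after)}_{secrets.token_hex(4)}"


ts = datetime(2025, 2, 10, tzinfo=timezone.utc)
print(generate_run_id(run_type="manual", logical_date=ts, run_after=ts))
print(generate_run_id(run_type="manual", logical_date=None, run_after=ts))
```

Keeping the None check in one caller means the enum method's signature does not change, at the cost of every other caller having to route through that function to get the suffix behavior.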


@staticmethod
1 change: 1 addition & 0 deletions airflow/www/views.py
@@ -2230,6 +2230,7 @@ def trigger(self, dag_id: str, session: Session = NEW_SESSION):
logical_date=logical_date,
data_interval=data_interval,
run_type=DagRunType.MANUAL,
run_after=data_interval.end,
)

try:
2 changes: 1 addition & 1 deletion tests/models/test_dag.py
@@ -3121,7 +3121,7 @@ def test_get_asset_triggered_next_run_info_with_unresolved_asset_alias(dag_maker
)
def test_create_dagrun_disallow_manual_to_use_automated_run_id(run_id_type: DagRunType) -> None:
dag = DAG(dag_id="test", start_date=DEFAULT_DATE, schedule="@daily")
run_id = run_id_type.generate_run_id(DEFAULT_DATE)
run_id = run_id_type.generate_run_id(logical_date=DEFAULT_DATE, run_after=DEFAULT_DATE)

with pytest.raises(ValueError) as ctx:
dag.create_dagrun(