
Late flow runs appear despite CANCEL_NEW collision strategy when worker or work pool has a concurrency limit #16984

Open
biancaines opened this issue Feb 5, 2025 · 0 comments
Labels
bug Something isn't working

Comments

biancaines (Contributor) commented Feb 5, 2025

Bug summary

When using the CANCEL_NEW collision strategy, new flow runs exceeding the defined concurrency limit should be immediately canceled rather than queued. However, users are observing that flow runs still accumulate with a Late status when concurrency limits are set at the worker or work pool level, even though the deployment’s collision strategy is correctly configured as CANCEL_NEW.

This behavior suggests that the additional concurrency limits at the worker or work pool level may be interfering with the expected cancellation process, causing flow runs to remain in a Late state instead of being cleared out as intended.

Expected Behavior:
Flow runs that exceed the concurrency limit should be canceled immediately when CANCEL_NEW is set, regardless of additional concurrency limits on the worker or work pool.

Actual Behavior:
Instead of being canceled, excess flow runs remain in a Late state.

Steps to Reproduce:

  1. Create a work pool and a worker, each with a concurrency limit of 3.
  2. Deploy a slow-running flow with a concurrency limit of 3 and the CANCEL_NEW collision strategy.
  3. Observe that new runs accumulate in the queue with a Late status instead of being canceled.
  4. Remove the concurrency limits at both the worker and work pool levels and observe that the expected behavior (cancellation of new runs) now occurs.
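The expected versus observed behavior can be sketched as a toy model (this is not Prefect internals, just an illustration of the semantics the CANCEL_NEW strategy promises): with a deployment concurrency limit of 3, submitting a fourth run while three are active should cancel the new run rather than leave it queued as Late.

```python
# Toy model of CANCEL_NEW semantics (illustrative only, not Prefect code).
from dataclasses import dataclass, field

@dataclass
class Deployment:
    limit: int
    collision_strategy: str = "CANCEL_NEW"
    active: list = field(default_factory=list)

    def submit(self, run_id: str) -> str:
        # Under the limit: the run starts normally.
        if len(self.active) < self.limit:
            self.active.append(run_id)
            return "RUNNING"
        # At the limit: CANCEL_NEW should cancel the excess run outright...
        if self.collision_strategy == "CANCEL_NEW":
            return "CANCELLED"  # expected behavior
        # ...rather than letting it sit in the queue past its scheduled time.
        return "LATE"  # observed behavior when worker/pool limits interfere

dep = Deployment(limit=3)
states = [dep.submit(f"run-{i}") for i in range(4)]
print(states)  # ['RUNNING', 'RUNNING', 'RUNNING', 'CANCELLED']
```

The bug is that, with additional worker/pool limits in place, the fourth run behaves like the `LATE` branch above instead of the `CANCELLED` one.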

Reproduction Example:

Test flow:

# flows.py
from prefect import flow
from time import sleep

@flow
def slow_flow():
    sleep(90)

prefect.yaml file:

# prefect.yaml
name: slow_flow_test

deployments:
    - name: "slow_flow"
      entrypoint: flows.py:slow_flow
      work_pool:
        name: slow_flow_pool
      schedule:
          interval: 30  # seconds
      concurrency_limit:
          limit: 3
          collision_strategy: CANCEL_NEW

To create the work pool, deploy the flow, and start the worker through the CLI:

prefect work-pool create slow_flow_pool --type process
prefect work-pool set-concurrency-limit slow_flow_pool 3
prefect deploy --all
prefect worker start --pool slow_flow_pool --type process --limit 3

Version info

Version:             3.1.12.dev4
API version:         0.8.4
Python version:      3.12.8
Git commit:          e9890a92
Built:               Tue, Jan 7, 2025 3:17 PM
OS/Arch:             darwin/arm64
Profile:             bianca-sandbox
Server type:         cloud
Pydantic version:    2.10.4

Additional context

No response

biancaines added the bug label on Feb 5, 2025