Function on flex consumption cancelling <100ms under heavy load #10693

Open
jeremy-skippen-jbhifi opened this issue Dec 12, 2024 · 8 comments
Labels
bot: do not close Prevents bot automation from closing issues Needs: Attention 👋

Comments

@jeremy-skippen-jbhifi

I'm seeing some behavior in a test app I've stood up where a small percentage of function invocations are being cancelled under heavy load.
The app is a .NET 9 isolated function with a storage queue trigger.

Example:

  • Timestamp: 2024-12-12T03:33:23.2017486Z
  • Function App version: azurefunctions-netiso: 2.0.0+d8b5fe998a8c92819b8ee41d2569d252541
  • Invocation ID: 61472a11-657d-4bd9-9fa7-4430ff62de9a
  • Host Instance ID: 498be43b-39ce-4b92-94f5-9944e9e3f43e
  • Region: Australia East

Repro steps

My test function is triggered off a storage queue.
It does the following.

QueueMessage -> Get Blob -> Update Blob -> Call API

For my test I'm spamming the queue with messages and getting anywhere from 50k - 120k requests per minute.
A small percentage of these are failing with a task cancelled exception:

Image

The cancellations are happening at all points in my test app: some during the Get Blob step, some during the Update Blob step, and some during the Call API step. One seems to have failed before the worker was invoked.
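
To make it easier to see where each cancellation lands, the pipeline can be wrapped so that it records the stage in progress when the cancellation arrives. The following is a minimal, language-agnostic sketch in Python rather than the .NET app itself; `run_pipeline`, `demo`, and the three step functions are hypothetical stand-ins, not the real blob or API calls.

```python
import asyncio


async def run_pipeline(message, get_blob, update_blob, call_api, cancelled_at):
    """Run the three steps in order; on cancellation, record the active stage."""
    stage = "get_blob"
    try:
        blob = await get_blob(message)
        stage = "update_blob"
        blob = await update_blob(blob)
        stage = "call_api"
        return await call_api(blob)
    except asyncio.CancelledError:
        cancelled_at[stage] = cancelled_at.get(stage, 0) + 1
        raise  # re-raise so the caller still sees the cancellation


async def demo():
    # Hypothetical stand-ins for the real blob and API calls (assumptions).
    async def get_blob(message):
        return {"id": message, "version": 0}

    async def update_blob(blob):
        await asyncio.Event().wait()  # never completes, simulating a slow call
        return blob

    async def call_api(blob):
        return "ok"

    cancelled_at = {}
    task = asyncio.create_task(
        run_pipeline("msg-1", get_blob, update_blob, call_api, cancelled_at)
    )
    await asyncio.sleep(0)  # let the task start and block inside update_blob
    task.cancel()           # simulate the host cancelling the invocation
    try:
        await task
    except asyncio.CancelledError:
        pass
    return cancelled_at
```

Running `asyncio.run(demo())` returns `{'update_blob': 1}`, since the stand-in update step is the one blocking when the task is cancelled.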

@jeremy-skippen-jbhifi
Author

I've checked on a Linux Consumption plan and can see the same behavior, e.g.:

  • Timestamp: 2024-12-17T03:32:28.837434Z
  • Function App version: azurefunctions-netiso: 2.0.0+d8b5fe998a8c92819b8ee41d2569d252541
  • Invocation ID: b72b1602-73c1-4a59-81dc-5f0165cd2c32
  • Host Instance ID: 256a5709-c776-4aab-b097-4d3752a2840d
  • Region: Australia East

The consumption plan performed significantly worse and I started to get ~2.4k of these messages a minute:

[HostMonitor] Host CPU threshold exceeded (82 >= 80)
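
For tallying these warnings per minute, here is a small sketch that parses the message format quoted above. It is written in Python, and the `(timestamp, message)` pair shape and both function names are assumptions for illustration, not part of the Functions host.

```python
import re
from collections import Counter

# Matches the message format quoted above: "(<observed> >= <threshold>)".
CPU_LINE = re.compile(
    r"\[HostMonitor\] Host CPU threshold exceeded \((\d+) >= (\d+)\)"
)


def parse_cpu_warning(line):
    """Return (observed, threshold) as ints, or None if the line does not match."""
    match = CPU_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def count_per_minute(entries):
    """Count matching warnings per minute.

    `entries` is an iterable of (iso_timestamp, message) pairs; this shape is
    an assumption about how the logs were exported.
    """
    counts = Counter()
    for timestamp, message in entries:
        if parse_cpu_warning(message) is not None:
            counts[timestamp[:16]] += 1  # truncate to "YYYY-MM-DDTHH:MM"
    return counts
```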

@jeremy-skippen-jbhifi
Author

I've run the same load on the same build on an Elastic Premium plan and did not get any exceptions.

@JAdluri JAdluri self-assigned this Dec 19, 2024
@JAdluri

JAdluri commented Dec 19, 2024

Hello @jeremy-skippen-jbhifi, thank you for reporting the issue. I will check and let you know the next steps.

@JAdluri

JAdluri commented Dec 20, 2024

Could you please let me know the steps to reproduce the issue?

@jeremy-skippen-jbhifi
Author

/bot not-stale

@jeremy-skippen-jbhifi
Author

@JAdluri can you please reopen this issue? I've been on leave for 2 weeks over the holiday period, so I missed these updates.

The reproduction steps are described in the description of the issue - I have a storage queue triggered function that I'm hitting hard with lots of messages. Do you need a sample application?

@jeremy-skippen-jbhifi
Author

@JAdluri here's a cut-down version of the app that triggered the errors.
JBHi-Fi/az-fn-host-10693

I've run it locally and am getting the same cancellation errors, e.g. Invocation Id: 6299feb7-bb55-4dc2-af21-b4bd7c6efbfb

@JAdluri JAdluri reopened this Jan 6, 2025
@JAdluri

JAdluri commented Jan 6, 2025

Hello @jeremy-skippen-jbhifi, I will look into this further. Thanks for reporting back.
