make the default JOB_EVENT_BUFFER_SECONDS 1 seconds #14335

jainnikhil30 · 2023-08-11T14:23:38Z

This accompanies an experiment that increased JOB_EVENT_BUFFER_SECONDS from 0.1 seconds to 1 second. Under high load, when many events are being generated, the default of 0.1 seconds leaves us with many small bins, which is not desirable. With 1 second we get larger bins. The difference is clearly visible in the following graphs from Grafana:

With 0.1 seconds:

With 1 second:

Making 1 second the default for JOB_EVENT_BUFFER_SECONDS therefore makes sense.
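To make the bin-size argument concrete, here is a rough, idealized sketch and not the actual AWX flush logic. It assumes events arrive at a steady rate and that the buffer is flushed exactly once per interval; the function name `bin_sizes` and the load figures are made up for illustration.

```python
def bin_sizes(arrivals_ms, buffer_ms):
    """Count how many events land in each flush window of width buffer_ms.

    Idealized model: one flush per window, integer milliseconds throughout.
    """
    bins = {}
    for t in arrivals_ms:
        index = t // buffer_ms
        bins[index] = bins.get(index, 0) + 1
    return [bins[i] for i in sorted(bins)]


# Hypothetical load: 2000 events arriving evenly over 2 seconds.
arrivals = list(range(2000))

small_bins = bin_sizes(arrivals, 100)    # models the old 0.1 second default
large_bins = bin_sizes(arrivals, 1000)   # models the new 1 second default

print(len(small_bins), "bins of about", small_bins[0], "events")
print(len(large_bins), "bins of about", large_bins[0], "events")
```

Under these assumptions the same load is written in 2 large batches instead of 20 small ones, which is the effect the graphs are meant to show.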

ISSUE TYPE

Bug, Docs Fix or other nominal change

AlanCoding · 2023-08-11T14:37:21Z

Let me go ahead and make some comments public - specifically why I'm not worried about this change for responsiveness.

When I think of responsiveness, I have manual behavior in mind. If you are trying to do something by clicking, you presumably go to a job template and click the "launch" button. In that case, it is unlikely that other background jobs are actively making progress at that moment, because most long-running jobs, while doing work, spend their time inside heavy tasks.

If your playbook runs fast (or just runs in bursts, which will always be true), then you'll see a delay due to the Redis read timeout.

awx/awx/main/dispatch/worker/callback.py

Line 89 in 5cf93fe

res = self.redis.blpop(self.queue_name, timeout=1)

What's probably normal and predictable (as the user is watching the standard out) is that a handful of events come in, go in the buffer, and then there's nothing else to read. As such, the worker stays in the read until the timeout is hit. After that, it flushes. So that 1 second read timeout is the main thing that delays the time until the user sees the first events. This is particularly true with a significant number of callback workers (in practice there are at least 4) that split the events between themselves.
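As a rough illustration of that sequence, here is a simulated single worker with a fake clock. This is a simplified model under my own assumptions, not the code in callback.py: `simulate_worker` and its parameters are hypothetical names, times are integer milliseconds, and the only behaviors modeled are "a read returns as soon as an event is available", "a timed-out read flushes the buffer", and "the buffer flushes once the buffer interval has elapsed".

```python
from collections import deque


def simulate_worker(arrivals_ms, read_timeout_ms=1000, buffer_ms=1000):
    """Return a list of (flush_time_ms, [event arrival times]) pairs."""
    pending = deque(sorted(arrivals_ms))
    now = 0
    last_flush = 0
    buffer = []
    flushes = []
    while pending or buffer:
        if pending and pending[0] <= now + read_timeout_ms:
            # The blocking read returns as soon as an event is available.
            event = pending.popleft()
            now = max(now, event)
            buffer.append(event)
            if now - last_flush >= buffer_ms:
                flushes.append((now, buffer))
                buffer = []
                last_flush = now
        else:
            # Nothing arrived within the read timeout: flush what we have.
            now += read_timeout_ms
            if buffer:
                flushes.append((now, buffer))
                buffer = []
            last_flush = now
    return flushes


# A small burst of three events, then silence.
burst = simulate_worker([0, 10, 20])
print(burst)
```

In this model the three events sit in the buffer until the read after the last one times out, so nothing is flushed until roughly one second after the burst.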

I think it's worth considering how we can increase responsiveness. For example, if the last read was known to be a timeout, we could flush on seeing the first new event (only if stdout is not empty). That would alert the user to a batch of events coming in. However, to back this up with measurements, we should have some first-event timing benchmarks.
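A sketch of how that idea could behave, again as a simulation with a fake clock and not as a patch against callback.py. Everything here is an assumption for illustration: `simulate_worker`, the `eager_first` flag, and the event shape `(arrival_ms, stdout)` are hypothetical.

```python
from collections import deque


def simulate_worker(events, read_timeout_ms=1000, buffer_ms=1000, eager_first=False):
    """events: (arrival_ms, stdout) pairs. Returns (flush_time_ms, [stdout]) pairs."""
    pending = deque(sorted(events))
    now = 0
    last_flush = 0
    last_read_timed_out = False
    buffer = []
    flushes = []
    while pending or buffer:
        if pending and pending[0][0] <= now + read_timeout_ms:
            arrival, stdout = pending.popleft()
            now = max(now, arrival)
            buffer.append(stdout)
            # Proposed heuristic: first event after a timed-out read,
            # and only if it carries non-empty stdout.
            eager = eager_first and last_read_timed_out and stdout != ""
            last_read_timed_out = False
            if eager or now - last_flush >= buffer_ms:
                flushes.append((now, buffer))
                buffer = []
                last_flush = now
        else:
            # The read timed out: flush whatever is buffered.
            now += read_timeout_ms
            last_read_timed_out = True
            if buffer:
                flushes.append((now, buffer))
                buffer = []
            last_flush = now
    return flushes


# The worker idles for 2.5 seconds, then a burst of three events arrives.
events = [(2500, "a"), (2510, "b"), (2520, "c")]
baseline = simulate_worker(events)
eager = simulate_worker(events, eager_first=True)
print("first flush without heuristic:", baseline[0][0], "ms")
print("first flush with heuristic:   ", eager[0][0], "ms")
```

In this toy model the first event becomes visible as soon as it arrives instead of about a second later, at the cost of one extra small flush per burst. Real first-event timing benchmarks would still be needed to know whether that trade-off holds up.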

AlanCoding

those are some big 🪣 s

make the default JOB_EVENT_BUFFER_SECONDS 1 seconds

d6bf985

github-actions bot added the component:api label Aug 11, 2023

jainnikhil30 requested a review from kdelee August 11, 2023 14:23

github-actions bot added the community label Aug 11, 2023

jainnikhil30 requested a review from AlanCoding August 11, 2023 14:24

AlanCoding approved these changes Aug 11, 2023


kdelee approved these changes Aug 11, 2023


jainnikhil30 merged commit 4cd9016 into ansible:devel Aug 12, 2023

jainnikhil30 deleted the increase_the_job_event_buffer_seconds branch August 12, 2023 02:19

djyasin pushed a commit to djyasin/awx that referenced this pull request Sep 16, 2024

make the default JOB_EVENT_BUFFER_SECONDS 1 seconds (ansible#14335)

730d580

djyasin pushed a commit to djyasin/awx that referenced this pull request Nov 11, 2024

make the default JOB_EVENT_BUFFER_SECONDS 1 seconds (ansible#14335)

d298d08
