feat(runner): heartbeat #1493
Co-authored-by: Frank Elsinga <[email protected]>
Tested it locally and everything works as expected. There's just one edge case/bug (?) where I'm not sure if this behavior is intended:
Setup:
- Start GoCast API
- Start one runner:
+-----------+-------+-------------------------+----------+-----------+
| hostname | port | last_seen | draining | job_count |
+-----------+-------+-------------------------+----------+-----------+
| localhost | 43215 | 2025-02-07 01:12:21.015 | 0 | 0 |
+-----------+-------+-------------------------+----------+-----------+
- Refresh the runner table; the port should be set (here to 43215), and last_seen should be updated.
- Start a second runner (registered on the same hostname):
+-----------+-------+-------------------------+----------+-----------+
| hostname | port | last_seen | draining | job_count |
+-----------+-------+-------------------------+----------+-----------+
| localhost | 44173 | 2025-02-07 01:13:36.044 | 0 | 0 |
+-----------+-------+-------------------------+----------+-----------+
-> As per #1490, on insert conflicts, the port is updated (here to 44173).
-> Refresh the runner table; last_seen should now be updated.
To reproduce the unexpected part:
- Stop the second runner:
{"time":"2025-02-07T01:16:59.060683039+01:00","level":"INFO","msg":"Runner set to drain.","version":"dev"}
2025/02/07 01:17:09 INFO No jobs left, shutting down
- I'd expect the runner to be set to draining=1, but it remains draining=0.
- I'd expect that last_seen is set to the time of the last notification received from that runner before it was stopped. Instead, it continues to be updated by the first runner:
+-----------+-------+-------------------------+----------+-----------+
| hostname | port | last_seen | draining | job_count |
+-----------+-------+-------------------------+----------+-----------+
| localhost | 44173 | 2025-02-07 01:20:11.049 | 0 | 0 |
+-----------+-------+-------------------------+----------+-----------+
- This results in the database making it look like there's an alive runner (in this case on port 44173).
I think we can assume that only one runner ever runs on a single host. If we want to change that in the future, we need to add the port to the Get() query, and then we should be good to go.
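To illustrate the edge case discussed in this thread, here is a minimal Go sketch. It is not GoCast's code: `Store`, `Register`, `Heartbeat`, `Get` and `at` are hypothetical names, and an in-memory map stands in for the runner table. With `usePort=false`, rows are looked up by hostname only, which reproduces the observed behavior. With `usePort=true`, the port is part of the lookup, as suggested above.

```go
package main

import (
	"fmt"
	"time"
)

// Runner mirrors the columns shown in the tables above (hypothetical struct).
type Runner struct {
	Hostname string
	Port     int
	LastSeen time.Time
	Draining bool
	JobCount int
}

// Store is an in-memory stand-in for the runner table (assumption).
type Store struct {
	usePort bool // false: key is hostname; true: key is hostname and port
	rows    map[string]*Runner
}

func NewStore(usePort bool) *Store {
	return &Store{usePort: usePort, rows: make(map[string]*Runner)}
}

func (s *Store) key(hostname string, port int) string {
	if s.usePort {
		return fmt.Sprintf("%s:%d", hostname, port)
	}
	return hostname
}

// Register inserts a runner; on a key conflict it overwrites the port.
func (s *Store) Register(hostname string, port int, now time.Time) {
	k := s.key(hostname, port)
	if r, ok := s.rows[k]; ok {
		r.Port = port
		r.LastSeen = now
		return
	}
	s.rows[k] = &Runner{Hostname: hostname, Port: port, LastSeen: now}
}

// Heartbeat refreshes last_seen on whichever row the sender maps to.
func (s *Store) Heartbeat(hostname string, port int, now time.Time) bool {
	r, ok := s.rows[s.key(hostname, port)]
	if !ok {
		return false
	}
	r.LastSeen = now
	return true
}

// Get returns a copy of the row the given runner maps to.
func (s *Store) Get(hostname string, port int) (Runner, bool) {
	r, ok := s.rows[s.key(hostname, port)]
	if !ok {
		return Runner{}, false
	}
	return *r, true
}

// at returns a fixed base time plus the given number of minutes.
func at(minutes int) time.Time {
	base := time.Date(2025, 2, 7, 1, 12, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

func main() {
	for _, usePort := range []bool{false, true} {
		s := NewStore(usePort)
		s.Register("localhost", 43215, at(0)) // first runner starts
		s.Register("localhost", 44173, at(1)) // second runner starts on the same host
		// The second runner stops; only the first keeps sending heartbeats.
		s.Heartbeat("localhost", 43215, at(8))
		r, _ := s.Get("localhost", 44173)
		fmt.Printf("usePort=%t port=%d last_seen=%s\n",
			usePort, r.Port, r.LastSeen.Format("15:04:05"))
	}
}
```

In the hostname-only case, the first runner's heartbeat refreshes the row that now carries the second runner's port, so the stopped runner looks alive. With the composite key, each runner has its own row and the stopped runner's last_seen stays put.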
Ok, I think we should keep it in mind for the future in case someone plans to deploy runners manually or if we ever support multiple runners per host.
I agree, we should get back to this in the future. Thanks for pointing this out!
Motivation and Context
A runner should regularly report its liveness, load and state so that GoCast can make decisions when scheduling jobs.
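As a rough illustration of that idea (not the implementation in this PR), the sketch below shows a ticker-driven heartbeat loop in Go. `Heartbeat`, `StatusFunc`, `SendHeartbeats` and `Collect` are hypothetical names chosen for this example. The payload carries the three values named in this PR, and a plain channel stands in for whatever transport the runner actually uses.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Heartbeat is a hypothetical payload: when it was sent, whether the
// runner is draining, and how many jobs it is currently running.
type Heartbeat struct {
	LastSeen time.Time
	Draining bool
	JobCount int
}

// StatusFunc reports the runner's current state (hypothetical hook).
type StatusFunc func() (draining bool, jobCount int)

// SendHeartbeats emits one Heartbeat per tick on out until ctx is cancelled.
func SendHeartbeats(ctx context.Context, interval time.Duration, status StatusFunc, out chan<- Heartbeat) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-ticker.C:
			draining, jobs := status()
			select {
			case out <- Heartbeat{LastSeen: now, Draining: draining, JobCount: jobs}:
			case <-ctx.Done():
				return
			}
		}
	}
}

// Collect runs SendHeartbeats until n heartbeats have arrived, then stops it.
func Collect(n int, intervalMillis int, status StatusFunc) []Heartbeat {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	out := make(chan Heartbeat)
	go SendHeartbeats(ctx, time.Duration(intervalMillis)*time.Millisecond, status, out)
	got := make([]Heartbeat, 0, n)
	for len(got) < n {
		got = append(got, <-out)
	}
	return got
}

func main() {
	status := func() (bool, int) { return false, 2 }
	for _, hb := range Collect(3, 10, status) {
		fmt.Printf("last_seen=%s draining=%t job_count=%d\n",
			hb.LastSeen.Format(time.RFC3339Nano), hb.Draining, hb.JobCount)
	}
}
```

On the receiving side, a scheduler could treat a runner as dead once its last heartbeat is older than a few intervals, and skip runners that report draining.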
Description
This PR
- adds LastSeen, Draining and JobCount to the runner model
- [...] Notification.data; [...] HeartbeatNotification
- [...] runner_manager.Manager
- [...] r.notifications channel
Steps for Testing
- [...] (last_seen should update periodically)