Add reachability check loop to expose reachability failures in node logs #556
Adds auto semantic versioning to node makefile.
GitVersion knows that the latest tag release (currently) is
node/node.go
}

checkUrl, err := url.Parse(fmt.Sprintf("%s/api/v1/operators-info/port-check?operator_id=%s", n.Config.DataApiUrl, n.Config.ID.Hex()))
if err != nil {
Which condition does it fall into if the dataapi itself was down?
Not this one. This is just ensuring that the constructed URL parses as valid.
If DataAPI is down/unresolvable
If DataAPI is up but throwing non-200
Default check interval is 5min
Minimum check interval is 10sec
For backwards compatibility, the reachability check loop is disabled if
1. NODE_DATA_URL is undefined (ie operator did not update their .env)
2. NODE_REACHABILITY_POLL_INTERVAL is set to 0
…go unnoticed.
- If NODE_DATAAPI_URL is not defined, we will continue to log error every interval.
- If the constructed checkUrl is invalid, we will continue to log error every interval.
Disable reachability goroutine if configuration is broken
n.Logger.Debug("Calling reachability check", "url", checkUrl.String())

resp, err := http.Get(checkUrl.String())
if err != nil {
Should we set a value in the gauge in case it gets error? @shrimalmadhur
Maybe we should; otherwise it will keep showing the last value.
LGTM, just one comment on whether we need to update the gauge when the call to dataapi fails to get a response. @shrimalmadhur may have dealt with an issue for the onchain metric before, please chime in.
Thanks @jianoaix. If DataAPI is down, this isn't really the operator's issue and does not mean they are unreachable. We will log this condition as an error every interval, so I feel like it won't go unnoticed.
So if the dataapi is down, what's the indicator for operators? They will still see reachable from their side, since the gauge value has not changed. How do they know that the data api is down and the reachability check is not working?
Default check interval is 5min
Minimum check interval is 10sec
Operators can disable by setting NODE_REACHABILITY_POLL_INTERVAL to 0
For backwards compatibility, the reachability check loop is disabled if NODE_DATAAPI_URL is undefined (ie operator did not update their .env)
See also Layr-Labs/eigenda-operator-setup#123