Adding Health endpoint #836

Open
wants to merge 19 commits into base: master

Conversation

otherview
Member

Description

A health check service that lives inside the node as an API endpoint.

I think it's a good illustrative start for a PR. A few of the technical choices made here are worth discussing, such as:

  • Should this endpoint live in the Admin API?
  • Should this be a singleton service?
  • Is the naming on point?

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • New and existing E2E tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have not added any vulnerable dependencies to my code

@codecov-commenter

codecov-commenter commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 58.33333% with 75 lines in your changes missing coverage. Please review.

Project coverage is 60.66%. Comparing base (49d9704) to head (c0aee23).
Report is 1 commits behind head on master.

Files with missing lines           Patch %   Lines
cmd/thor/main.go                   0.00%     29 Missing ⚠️
api/admin/health/health_api.go     63.15%    10 Missing and 4 partials ⚠️
api/admin/loglevel/log_level.go    69.76%    12 Missing and 1 partial ⚠️
api/admin/admin.go                 0.00%     10 Missing ⚠️
api/admin/health/health.go         94.44%    1 Missing and 2 partials ⚠️
api/admin_server.go                0.00%     3 Missing ⚠️
comm/communicator.go               0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #836      +/-   ##
==========================================
+ Coverage   60.62%   60.66%   +0.04%     
==========================================
  Files         215      218       +3     
  Lines       23099    23204     +105     
==========================================
+ Hits        14003    14076      +73     
- Misses       7947     7973      +26     
- Partials     1149     1155       +6     


@otherview otherview marked this pull request as ready for review October 29, 2024 10:49
@otherview otherview requested a review from a team as a code owner October 29, 2024 10:49
@leszek-vechain
Contributor

Do we have any tests for this new endpoint?

Member Author

The admin endpoints were bundled into this package and reformatted to follow the same pattern as the other endpoints.

Member Author

Potentially, this could live in the init functions of thor and thor-solo?

@darrenvechain
Member

darrenvechain commented Nov 4, 2024

I got this when syncing a new node. Is it expected?

Edit: I just saw the previous comment. IMO it is healthy if it's syncing like this.

curl http://localhost:2113/admin/health
{"healthy":false,"blockIngestion":{"bestBlock":"0x00014fe5a9e48a02268655e8206bb4cbcc95a1ae0d1f5f4360f697f02077c727","bestBlockIngestionTimestamp":"2024-11-04T10:49:16.06031Z"},"chainSync":false}

@darrenvechain
Member

darrenvechain commented Nov 4, 2024

Is the response always 200? From my (biased) experience with Spring Actuator, it returns 503 if some aspect of the application is unhealthy. This makes @kgapos's health checks easier, as he doesn't have to fetch the response body, parse it, and check it. He can just check the response code, which AWS supports out of the box.

FYI we can still send a response body with a 503, so clients can grab the response and do what they want with it.

Edit: nvm: looks like it has 503, wasn't seeing it in testing locally

@kgapos
Member

kgapos commented Nov 4, 2024

Yeah, @darrenvechain is right, it's going to be useless as far as the AWS ALB is concerned if it doesn't return a 5XX status code when the node is unhealthy. I understand this is unfortunate and counterintuitive, but unless you build it like that I have to implement some sort of wrapper, which is the exact thing we're trying to replace. Tagging @otherview for visibility.

Note that I mentioned this constraint in the related issue.

@otherview
Member Author

I got this when syncing a new node. Is it expected?
Edit: I just saw the previous comment. IMO it is healthy if it's syncing like this.

I understand what you mean: syncing to the latest block is regular, expected behaviour.
Perhaps another name would fit better ("ready" comes to mind), but the idea is that a node is not healthy enough to serve node operations while it is still in the sync period.

This is the standard I've seen in other nodes, as it helps node operators to know when the node is 100% ready to process blockchain operations.

@otherview
Member Author

Yeah, @darrenvechain is right, it's going to be useless as far as the AWS ALB is concerned if it doesn't return a 5XX status code when the node is unhealthy. I understand this is unfortunate and counterintuitive, but unless you build it like that I have to implement some sort of wrapper, which is the exact thing we're trying to replace. Tagging @otherview for visibility.

Yeah I got you covered :) It's returning a 503 when "healthy":false

@libotony
Member

libotony commented Nov 5, 2024

I would suggest relying only on the best block's timestamp in this function.

@darrenvechain darrenvechain mentioned this pull request Nov 8, 2024
10 tasks
health/health.go Outdated
Comment on lines 45 to 51
func New(repo *chain.Repository, p2p *comm.Communicator, timeBetweenBlocks time.Duration) *Health {
	return &Health{
		repo:              repo,
		timeBetweenBlocks: timeBetweenBlocks + delayBuffer,
		p2p:               p2p,
	}
}
Member Author

I refactored the health service to accept components and work in a pull fashion.
I think it looks better, thanks guys 🙏

}
}

acc, err := h.healthStatus.Status(maxTimeBetweenSlots, minPeerCount)
Member Author

Added HTTP query params so callers can request different tolerances for the health constraints.

darrenvechain previously approved these changes Nov 21, 2024
type Status struct {
	Healthy           bool            `json:"healthy"`
	BlockIngestion    *BlockIngestion `json:"blockIngestion"`
	ChainBootstrapped bool            `json:"chainBootstrapped"`
Member

Can ChainBootstrapped be removed, considering that BlockIngestion is already reported?

7 participants