Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds NodeInfo endpoint to retriever and dispersal services for observability #630

Merged
merged 8 commits into from
Jul 10, 2024

Conversation

pschork
Copy link
Contributor

@pschork pschork commented Jul 9, 2024

After deploying this change, we can have dataapi (via reachability check) hit this endpoint and log semver per operator.

grpcurl --plaintext localhost:32005 node.Dispersal.VersionInfo
{
  "semver": "0.7.5"
}
grpcurl --plaintext localhost:32005 node.Dispersal.VersionInfo
{
  "semver": "0.8.0-rc.1"
}

Why are these changes needed?

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@pschork pschork requested review from jianoaix and ian-shim July 9, 2024 20:07
…ervability into node versions

```
grpcurl --plaintext localhost:32005 node.Dispersal.VersionInfo
{
  "semver": "0.7.5"
}
```

```
grpcurl --plaintext localhost:32005 node.Dispersal.VersionInfo
{
  "semver": "0.8.0-rc.1"
}
```
@pschork pschork force-pushed the pschork/version_info branch from 6e4b456 to ad343e7 Compare July 9, 2024 20:13
@pschork pschork marked this pull request as ready for review July 9, 2024 20:19
Copy link
Contributor

@ian-shim ian-shim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
Does this work in parallel with the existing http endpoint? xxx.xxx.xxx.xxx/eigen/node
What's the advantage over the previous version check?

@@ -14,13 +14,17 @@ service Dispersal {
// for the protocol-defined length of custody. It will return a signature at the
// end to attest to the data in this request it has processed.
rpc StoreChunks(StoreChunksRequest) returns (StoreChunksReply) {}
// Retrieve version into metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

@@ -33,7 +33,7 @@ var (
0: "eth_quorum",
1: "permissioned_quorum",
}
SemVer = "v0.0.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this used today and what difference does this make?

Copy link
Contributor Author

@pschork pschork Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fallback. Only gets triggered if you do a go build without passing -ldflags="-X 'github.com/Layr-Labs/eigenda/node.SemVer=${SEMVER}' so pretty much just our unit tests will see 0.0.0

I'm removed the v to be semver spec compliant. None of our released docker builds use the v prefix.

node/config.go Outdated
@@ -33,7 +33,7 @@ var (
0: "eth_quorum",
1: "permissioned_quorum",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this is used. Do we need to update this to eigen_quorum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

node/config.go Outdated
@@ -33,7 +33,7 @@ var (
0: "eth_quorum",
1: "permissioned_quorum",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

}

service Retrieval {
// RetrieveChunks retrieves the chunks for a blob custodied at the Node.
rpc RetrieveChunks(RetrieveChunksRequest) returns (RetrieveChunksReply) {}
// GetBlobHeader is similar to RetrieveChunks, this just returns the header of the blob.
rpc GetBlobHeader(GetBlobHeaderRequest) returns (GetBlobHeaderReply) {}
// Retrieve version into metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

into -> info

api/proto/node/node.proto Outdated Show resolved Hide resolved
@pschork
Copy link
Contributor Author

pschork commented Jul 10, 2024

lgtm Does this work in parallel with the existing http endpoint? xxx.xxx.xxx.xxx/eigen/node What's the advantage over the previous version check?

The advantage is that we have a guarantee that it is reachable because the IP:PORT is registered. The node api runs on port 9093 by default, but operator is free to change that port, and there is no guarantee that port is publicly exposed.

@pschork pschork force-pushed the pschork/version_info branch from ba014f7 to fb9314f Compare July 10, 2024 01:43
@pschork
Copy link
Contributor Author

pschork commented Jul 10, 2024

Adds node info metadata

grpcurl --plaintext localhost:32005 node.Dispersal.NodeInfo
{
  "semver": "0.8.0-pschork-version-info.1",
  "arch": "arm64",
  "os": "linux",
  "numCpu": 8,
  "memBytes": "8222203904"
}

api/proto/node/node.proto Show resolved Hide resolved
@jianoaix
Copy link
Contributor

jianoaix commented Jul 10, 2024

lgtm Does this work in parallel with the existing http endpoint? xxx.xxx.xxx.xxx/eigen/node What's the advantage over the previous version check?

The advantage is that we have a guarantee that it is reachable because the IP:PORT is registered. The node api runs on port 9093 by default, but operator is free to change that port, and there is no guarantee that port is publicly exposed.

If we want to rely on this API operationally (not just informationally), it has to have a granrantee.
However I wonder if there is any concern of forcing this API on each operator

node/grpc/server.go Outdated Show resolved Hide resolved
@pschork
Copy link
Contributor Author

pschork commented Jul 10, 2024

However I wonder if there is any concern of forcing this API on each operator

Exposing semver seems reasonable and defensible for purpose of managing breaking migration rollout. Semver is already exposed on node api, though there is no guarantee node api is exposed externally.

System level info is less defensible and operators could raise security/privacy concerns to allowing the enumeration of internal specs.

cc @anupsv

@pschork
Copy link
Contributor Author

pschork commented Jul 10, 2024

Added a flag to disable resource information (os, arch, cpus, mem) from the NodeInfo API.
This will allow operator to OPT-OUT of exposing resource info.
Semver is always exposed.

# NODE_DISABLE_NODE_INFO_RESOURCES=true
grpcurl --plaintext localhost:32005 node.Dispersal.NodeInfo
{
  "semver": "0.8.0-pschork-version-info.1"
}

@pschork pschork force-pushed the pschork/version_info branch from 6e6f421 to 44695ea Compare July 10, 2024 21:47
@pschork
Copy link
Contributor Author

pschork commented Jul 10, 2024

Added a flag to disable resource information (os, arch, cpus, mem) from the NodeInfo API. This will allow operator to OPT-OUT of exposing resource info. Semver is always exposed.

# NODE_DISABLE_NODE_INFO_RESOURCES=true
grpcurl --plaintext localhost:32005 node.Dispersal.NodeInfo
{
  "semver": "0.8.0-pschork-version-info.1"
}

If we want to be less aggressive, perhaps this should be OPT-IN. @jianoaix @ian-shim @anupsv @nivethapu

Copy link
Contributor

@ian-shim ian-shim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think opt out is totally reasonable

@pschork pschork changed the title Adds VersionInfo endpoint to retriever and dispersal services for observability Adds NodeInfo endpoint to retriever and dispersal services for observability Jul 10, 2024
@pschork pschork merged commit c2a0a33 into master Jul 10, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants