RPC-signer configuration #3725
base: master
Conversation
config/config.go (outdated diff)

switch {
case ephemeralSignerEnabled && !contentKeyExists && !keyPathIsSet && !rpcSignerURLExists:
Hmm this is annoying... MarkFlagsMutuallyExclusive is defined, but it's only in cobra, not viper (and I don't want to add scope to this PR). This comment is a bit random... just leaving thoughts, so feel free to ignore this.
I agree, it would be nice. We could abstract it (in another PR).
The problem (IMO) with the previous code is that it wasn't really clear what the behaviour should be. The flags had an implicit, undocumented hierarchy, but it makes no sense to set more than one flag. I also don't think it's safe to assume that users will never accidentally set more than one of these flags.
Random comment, random response.
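For reference, a minimal sketch of the kind of viper-side helper being wished for here (the helper and its error are hypothetical; viper has no built-in counterpart to cobra's MarkFlagsMutuallyExclusive):

package config

import (
	"errors"
	"fmt"
	"strings"

	"github.com/spf13/viper"
)

var errConflictingKeys = errors.New("conflicting configuration keys")

// verifyMutuallyExclusive is hypothetical: it rejects configurations where
// more than one of the given keys is explicitly set.
func verifyMutuallyExclusive(v *viper.Viper, keys ...string) error {
	set := make([]string, 0, len(keys))
	for _, key := range keys {
		if v.IsSet(key) {
			set = append(set, key)
		}
	}
	if len(set) > 1 {
		return fmt.Errorf("%w: at most one of [%s] may be set", errConflictingKeys, strings.Join(set, ", "))
	}
	return nil
}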
config/config.go (outdated diff)

contentKeyExists := v.IsSet(StakingSignerKeyContentKey)
keyPathIsSet := v.IsSet(StakingSignerKeyPathKey)
rpcSignerURLExists := v.IsSet(StakingRPCSignerKey)
Can we be consistent w/ naming here? I think these should all just follow a naming scheme like *IsSet instead of some of them adopting the *Exists suffix, to also be consistent w/ viper.IsSet.
config/config.go (outdated diff)

signingKeyPath := getExpandedArg(v, StakingSignerKeyPathKey)
_, err := os.Stat(signingKeyPath)
keyFileNotFound := errors.Is(err, fs.ErrNotExist)
I think this looks better + reduces the diff if we just keep this within the case statement it's used in (+ we don't need to name the path-not-existing error).
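i.e., something roughly like this (a sketch; the exact case condition and fallback behaviour are illustrative, not the PR's actual code):

case keyPathIsSet && !contentKeyExists && !rpcSignerURLExists:
	signingKeyPath := getExpandedArg(v, StakingSignerKeyPathKey)
	// Check the error inline instead of naming a keyFileNotFound variable.
	if _, err := os.Stat(signingKeyPath); errors.Is(err, fs.ErrNotExist) {
		// handle the missing key file (e.g. generate a new key at this path)
	}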
config/config.go (outdated diff)

signerKeyRawContent := v.GetString(StakingSignerKeyContentKey)
signerKeyContent, err := base64.StdEncoding.DecodeString(signerKeyRawContent)
nit: unneeded diff, generally errors are checked on the following line without an extra line break
Old habits die hard. I always prefer empty lines surrounding any curly braces, but I think error checking is an exception.
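For illustration, the convention in question applied to the decode call from the diff above (the wrapped error message here is illustrative):

signerKeyContent, err := base64.StdEncoding.DecodeString(signerKeyRawContent)
if err != nil { // checked on the immediately following line, no blank line in between
	return nil, fmt.Errorf("failed to decode signer key content: %w", err)
}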
config/config.go (outdated diff)

signer, err := rpcsigner.NewClient(ctx, conn)
if err != nil {
	conn.Close()
I think the way we manage the lifecycle of this conn is supposed to live somewhere else (upstream of this code). I think we need to hold a reference to this conn and, upon a fatal error, call Close to prevent the server from relying on a timeout to terminate the connection.
I'm open to suggestions. I'm not used to having to manage connections and clients separately, so I'm not sure what the best pattern is here. The conn is only required if the rpc-signer is enabled; does it not make sense for it to manage the connection? I know we had a bit of that discussion on my other PR.
Alternatively, I could return (bls.Signer, Conn, error) from this function (whatever the actual type of the connection is) and attach it to the config? It seems weird even to do that. I wouldn't have attached any kind of private key or signer to the config, but that change would have been a lot more intrusive.
I'm open to suggestions here.
As discussed offline, I think it's fine to leave the conn as is for now and have the signer manage the connection.
So... I went down a little bit of a rabbit hole here.
It would make sense to initialize the signer before initializing the node and pass it in to the constructor. That way, the node.Node can actually manage the signer as well as the connection. We do, however, get into a little bit of dependency hell, as the same signer is passed down to a couple of other places as well.
I propose we stick with the current design for now and make broader configuration changes while we test out the rpc signer on a live network.
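For reference, a sketch of the alternative wiring described above (the node.New shape and the three-value return from getStakingSigner are assumptions, not this PR's code):

// Hypothetical: construct the signer (and its conn) before the node,
// so the node can own both lifecycles.
signer, conn, err := getStakingSigner(ctx, v)
if err != nil {
	return err
}
n, err := node.New(cfg, signer)
if err != nil {
	conn.Close() // the node never took ownership of the connection
	return err
}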
config/config_test.go (outdated diff)

type config map[string]any
type test map[string]struct {
	viperKeys          string
	config             config
	expectedSignerType reflect.Type
	expectedErr        error
}
nit: you can define these anonymously like so (and also avoid the map-style initialization that is inconsistent w/ the way we have most of our tests written):

tests := []struct {
	name string
	foo  bar
}{
	{
		name: "foo",
		foo:  bar{},
	},
}
for _, tt := range tests {
	// test here
}
}

// required for proper write permissions for the default signer-key location
t.Setenv("HOME", t.TempDir())
Should we have a defer step to clean up stuff? Or is this overkill?
I believe t.Setenv takes care of this for you.
FYI:

// Setenv calls os.Setenv(key, value) and uses Cleanup to
// restore the environment variable to its original value
// after the test.
//
// Because Setenv affects the whole process, it cannot be used
// in parallel tests or tests with parallel ancestors.
func (t *T) Setenv(key, value string) {
	// Non-parallel subtests that have parallel ancestors may still
	// run in parallel with other tests: they are only non-parallel
	// with respect to the other subtests of the same parent.
	// Since SetEnv affects the whole process, we need to disallow it
	// if the current test or any parent is parallel.
	isParallel := false
	for c := &t.common; c != nil; c = c.parent {
		if c.isParallel {
			isParallel = true
			break
		}
	}
	if isParallel {
		panic("testing: t.Setenv called after t.Parallel; cannot set environment variables in parallel tests")
	}
	t.isEnvSet = true
	t.common.Setenv(key, value)
}
v.Set(key, value)
}

signer, err := getStakingSigner(context.Background(), v)
Why do we test this function instead of GetNodeConfig?
I didn't change GetNodeConfig aside from taking a context.Context as an argument. These are unit tests; they are testing the most granular unit possible.
This is testing code not exported by the actual package though (i.e., getStakingSigner is an implementation detail of the package and not part of its public API)... shouldn't we test this through GetNodeConfig instead?
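For comparison, a test through the public API might look roughly like this (cfg.StakingSigner is an assumed field name, not necessarily what the node config actually exposes):

// Hypothetical: exercise the signer selection via the exported entry point.
cfg, err := GetNodeConfig(context.Background(), v)
require.NoError(err)
// Substitute whatever field the node config actually exposes for the signer.
require.Equal(test.expectedSignerType, reflect.TypeOf(cfg.StakingSigner))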
config/config_test.go (outdated diff)

if err != nil {
	require.ErrorIs(err, test.expectedErr)
} else {
	require.Equal(test.expectedSignerType, reflect.TypeOf(signer))
}
Do we need this nil handling? I thought the expected type of nil would work here.
Are you suggesting

require.ErrorIs(err, test.expectedErr)
require.Equal(test.expectedSigner, reflect.TypeOf(signer))

instead of the if? I hadn't thought about it, but I think that'll work.
config/flags.go (outdated diff)

@@ -271,6 +271,7 @@ func addNodeFlags(fs *pflag.FlagSet) {
	fs.Bool(StakingEphemeralSignerEnabledKey, false, "If true, the node uses an ephemeral staking signer key")
	fs.String(StakingSignerKeyPathKey, defaultStakingSignerKeyPath, fmt.Sprintf("Path to the signer private key for staking. Ignored if %s is specified", StakingSignerKeyContentKey))
	fs.String(StakingSignerKeyContentKey, "", "Specifies base64 encoded signer private key for staking")
	fs.String(StakingRPCSignerKey, "", "Specifies the gRPC endpoint of the staking signer")
Can we be consistent w/ rpc/grpc naming here?
// TODO: figure out the best parameters here given the target block-time
opts := grpc.WithConnectParams(grpc.ConnectParams{
	Backoff: backoff.DefaultConfig,
})
Can we let the caller provide these options? Or was there a reason we chose to not pass this in as a parameter?
I was thinking I didn't want different callers to pass in different parameters, but rather have something that's more of a global configuration. I don't think this is something a node operator should really be messing with (yet).
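If we did later want caller-supplied options, a variadic signature would be a natural shape; a sketch under that assumption (dial is a hypothetical helper, not this PR's NewClient):

package rpcsigner

import (
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
	"google.golang.org/grpc/credentials/insecure"
)

// dial keeps the global defaults but lets callers append their own options.
func dial(rpcSignerURL string, extra ...grpc.DialOption) (*grpc.ClientConn, error) {
	opts := []grpc.DialOption{
		grpc.WithConnectParams(grpc.ConnectParams{Backoff: backoff.DefaultConfig}),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	}
	conn, err := grpc.NewClient(rpcSignerURL, append(opts, extra...)...)
	if err != nil {
		return nil, fmt.Errorf("failed to create rpc signer client: %w", err)
	}
	return conn, nil
}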
// the request to the actual signer instead of relying on tls-credentials
conn, err := grpc.NewClient(rpcSignerURL, opts, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
	return nil, fmt.Errorf("couldn't create rpc signer client: %w", err)
nit: I generally avoid special characters where possible in logs/errors, since you usually have to escape them when grepping through logs. Can we make this something like could not or failed to, or anything along those lines? Our old code doesn't follow this pattern, but I think we should avoid it in new code we write.
I think that's a good reason. What about the :?
var err error
defer func() {
	if err != nil {
		c.Close()
Another idea I'm wondering about: can we do better than the current UX of forcing the caller to re-create the client when this errors, and manage that within the Client type? What if we started a goroutine in New that would keep trying to re-connect if we closed the connection, and Sign* could just fast-fail and return some "disconnected" error if we're not connected?
So Sign* fast-fails if we've closed the connection in the current implementation. You think we should just keep trying to reconnect indefinitely? Otherwise, we should just set the backoff parameters accordingly.
Remember that this is meant to run as a sidecar. I think we could enhance it to a point where we're comfortable with people running it on a less reliable network too, but I think it's beneficial to take baby steps here. A good way to enforce that would be to only take in a port as the configuration parameter and add a URL later (always connecting on the loopback IP).
But I just can't imagine what strategy would be used on top of the grpc-client strategy for reconnecting. If we have an error, it's likely that we need some manual intervention.
The good thing right now is that we sign our IP when the node is booting up and then re-use that signed IP. I'm not sure how an IP change is handled, but PoP on the IP-message only happens once from what I can tell. If the signer fails, we would stop signing warp messages, which doesn't necessarily mean we need to crash the node, but it's likely that there needs to be manual intervention. Maybe in that case it would make sense to just keep trying to reconnect periodically in a loop; is that what you're suggesting?
One thing for both of us to keep in mind is that if we see further BLS adoption, this code might be used in ways we wouldn't expect right now (like signing blocks or something), in which case we will have to provide further protections here. Maybe you're already thinking of that though.
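A sketch of the fast-fail guard under discussion (ErrDisconnected and this Client layout are assumptions, not the PR's actual code):

package rpcsigner

import (
	"context"
	"errors"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

var ErrDisconnected = errors.New("rpc signer: connection is down")

type Client struct {
	conn *grpc.ClientConn
}

// Sign fast-fails when the underlying connection has been shut down,
// rather than blocking until a gRPC deadline or backoff expires.
func (c *Client) Sign(ctx context.Context, msg []byte) ([]byte, error) {
	if c.conn.GetState() == connectivity.Shutdown {
		return nil, ErrDisconnected
	}
	// ... issue the actual gRPC Sign request here ...
	return nil, errors.New("not implemented in this sketch")
}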
Why this should be merged
This enables the rpc-signer configuration.
How this works
If the proper flag is set, a grpc client is instantiated and used for signing. The node will fail to start if there isn't a grpc-server implementation at the supplied connection string.
How this was tested
A table test was added in config/config_test.go that tests all possible signing configurations. This only tests the instantiation of the signer, as the actual signing tests live in the respective signer packages.
Need to be documented in RELEASES.md?
Maybe? Do we have some kind of experimental flag? I believe we should have some actual experience running live nodes with an rpc-signer before we really document the usage here.