Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC-signer configuration #3725

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

RPC-signer configuration #3725

wants to merge 7 commits into from

Conversation

richardpringle
Copy link
Contributor

Why this should be merged

The enables the rpc-signer configuration

How this works

If the proper flag is set, a gprc-client is instantiated and used for signing. The node will fail to start if there isn't a grpc-server implmentation at supplied connection-string.

How this was tested

A table test was added in config/config_test.go that tests all possible signing configurations. This only tests the instantiation of the signer as the actual signing tests live in the respective signer packages.

Need to be documented in RELEASES.md?

Maybe? Do we have some kind of experimental flag? I believe we should have some actual experience running live nodes with an rpc-signer before we really document the usage here.

config/config.go Outdated

switch {

case ephemeralSignerEnabled && !contentKeyExists && !keyPathIsSet && !rpcSignerURLExists:
Copy link
Contributor

@joshua-kim joshua-kim Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is annoying... MarkFlagsMutuallyExclusive is defined but it's only in cobra, not viper (and I don't want to add scope to this PR). This comment is a bit random... just leaving thoughts so feel free to ignore this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it would be nice. We could abstract it (in another PR).

The problem (IMO) with the previous code is that it really clear what the behaviour should be. The flags had an implicit undocumented hierarchy, but it makes no sense to set more than one flag. I also don't think it's safe to assume that users will never accidentally set more than one of these flags.

Random comment, random response.

config/config.go Outdated
Comment on lines 649 to 651
contentKeyExists := v.IsSet(StakingSignerKeyContentKey)
keyPathIsSet := v.IsSet(StakingSignerKeyPathKey)
rpcSignerURLExists := v.IsSet(StakingRPCSignerKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be consistent w/ naming here? I think these should all just follow the naming of something like *IsSet instead of some adopting the *Exists suffix to also be consistent w/ viper.IsSet

config/config.go Outdated
Comment on lines 653 to 655
signingKeyPath := getExpandedArg(v, StakingSignerKeyPathKey)
_, err := os.Stat(signingKeyPath)
keyFileNotFound := errors.Is(err, fs.ErrNotExist)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks better + reduces the diff if we just keep this within the case statement it's used in (+ we don't need to name the path not existing error).

config/config.go Outdated
signerKeyRawContent := v.GetString(StakingSignerKeyContentKey)
signerKeyContent, err := base64.StdEncoding.DecodeString(signerKeyRawContent)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unneeded diff, generally errors are checked on the following line without an extra line break

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old habits die hard. I always prefer empty lines surround any curly braces, but I think error checking is an exception

config/config.go Outdated

signer, err := rpcsigner.NewClient(ctx, conn)
if err != nil {
conn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way we manage the lifecycle of this conn is supposed to live somewhere else (upstream of this code). I think we need to hold a reference to this conn and upon a fatal error and call Close to prevent the server from relying on a timeout to terminate the connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions. I'm not used to having to manage connections and clients separately so not sure what the best pattern is here. The conn is only required if the rpc-signer is enabled, does it not make sense for it to manage the connection? I know we had a bit of that discussion on my other PR.

Alternatively, I could return (bls.Signer, Conn, error) from this function (whatever the actual type of the connection is) and attach it to the config? Seems even weird to do that. I wouldn't have attached attached any kind of private-key or signer to the config, but that change would have been a lot more intrusive.

I'm open to suggestions here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, I think it's fine to leave the conn as is for now and the signer manage the connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... I went down a little bit of a rabbit hole here.

It would make sense to initialize the signer before initializing the node and pass it in to the constructor. That way, the node.Node can actually manage the signer as well as the connection. We do however get into a little bit of dependency hell as the same signer is passed down to a couple of other places as well.

I propose that sick with the current design for now and make broader configuration changes while we test out the rpc signer on a live network.

Comment on lines 585 to 591
type config map[string]any
type test map[string]struct {
viperKeys string
config config
expectedSignerType reflect.Type
expectedErr error
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can define these anonymously like so (and also avoid the map-style initialization that is inconsistent w/ the way we have most of our tests written)

	tests := []struct{
		name string
		foo     bar
	} {
		{
			name: "foo",
			foo:  bar{},
		},
	}

	for _, tt := range tests {
      // test here
	}

}

// required for proper write permissions for the default signer-key location
t.Setenv("HOME", t.TempDir())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a defer step to clean up stuff? Or is this overkill?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe t.SetEnv takes care of this for you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI:

// Setenv calls os.Setenv(key, value) and uses Cleanup to
// restore the environment variable to its original value
// after the test.
//
// Because Setenv affects the whole process, it cannot be used
// in parallel tests or tests with parallel ancestors.
func (t *T) Setenv(key, value string) {
	// Non-parallel subtests that have parallel ancestors may still
	// run in parallel with other tests: they are only non-parallel
	// with respect to the other subtests of the same parent.
	// Since SetEnv affects the whole process, we need to disallow it
	// if the current test or any parent is parallel.
	isParallel := false
	for c := &t.common; c != nil; c = c.parent {
		if c.isParallel {
			isParallel = true
			break
		}
	}
	if isParallel {
		panic("testing: t.Setenv called after t.Parallel; cannot set environment variables in parallel tests")
	}

	t.isEnvSet = true

	t.common.Setenv(key, value)
}

v.Set(key, value)
}

signer, err := getStakingSigner(context.Background(), v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we test this function instead of GetNodeConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change GetNodeConfig aside from taking a context.Context as an argument. These are unit tests; they are testing the most granular unit possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing code not exported by the actual package though (i.e, getStakingSigner is an implementation detail of the package and not part of its public API) ... shouldn't we test this through GetNodeConfig instead?

Comment on lines 644 to 648
if err != nil {
require.ErrorIs(err, test.expectedErr)
} else {
require.Equal(test.expectedSignerType, reflect.TypeOf(signer))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this nil handling? I thought the expected type of nil would work here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting

require.ErrorIs(err, test.expectedErr)
require.Equal(test.expectedSigner, reflect.TypeOf(signer))

instead of the if? I hadn't thought about it, but I think that'll work

config/flags.go Outdated
@@ -271,6 +271,7 @@ func addNodeFlags(fs *pflag.FlagSet) {
fs.Bool(StakingEphemeralSignerEnabledKey, false, "If true, the node uses an ephemeral staking signer key")
fs.String(StakingSignerKeyPathKey, defaultStakingSignerKeyPath, fmt.Sprintf("Path to the signer private key for staking. Ignored if %s is specified", StakingSignerKeyContentKey))
fs.String(StakingSignerKeyContentKey, "", "Specifies base64 encoded signer private key for staking")
fs.String(StakingRPCSignerKey, "", "Specifies the gRPC endpoint of the staking signer")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be consistent w/ rpc/grpc naming here

@richardpringle richardpringle force-pushed the signers-config-wip branch 2 times, most recently from 75bbe5c to ca2d04c Compare February 19, 2025 23:31
@richardpringle richardpringle marked this pull request as ready for review February 19, 2025 23:33
Comment on lines +28 to +31
// TODO: figure out the best parameters here given the target block-time
opts := grpc.WithConnectParams(grpc.ConnectParams{
Backoff: backoff.DefaultConfig,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we let the caller doesn't provide these options? Or was there a reason we chose to not pass this in as a parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking I didn't want different callers to pass in different parameters but rather have something that's more of a global configuration. I don't think that this is something that a node operator should really be messing with (yet).

// the request to the actual signer instead of relying on tls-credentials
conn, err := grpc.NewClient(rpcSignerURL, opts, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
return nil, fmt.Errorf("couldn't create rpc signer client: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I generally avoid special characters where possible in logs errors since you usually have to escape them when grepping through logs. Can we make this something like could not or failed to or anything along those lines? Our old code doesn't follow this pattern but I think we should avoid it in new code we write

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good reason. What about the :?

var err error
defer func() {
if err != nil {
c.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea I'm wondering is that if we can do better than the current ux of forcing the caller to re-create the client when this errors and manage that within the Client type. What if we started a goroutine in New that would keep trying to re-connect if we closed the connection, and Sign* if we're not connected could just fast-fail and return some "disconnected error"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Sign* fast fails if we've closed the connection in the current implementation. You think we should just infinitely keep trying to reconnect? Otherwise, we should just set the backoff parameters accordingly.

Remember that this is meant to run as a sidecar. I think we could enhance it to a point where we're comfortable with people running on a less reliable network too, but I think it's beneficial to go with baby steps here. A good way to enforce that would be to only take in a port as the configuration parameter and add a URL later (always connecting on the loopback IP).

But I just can't imagine what strategy would be used on top of the gprc-client strategy for reconnecting. If we have an error, it's likely that we need some manual intervention.

The good thing right now is that we sign our IP when the node is booting up then re-use that signed IP. I'm not sure how an IP change is handled, but PoP on the IP-message only happens once from what I can tell. If the signer fails, we would stop signing warp messages, which doesn't necessarily mean we need to crash the node, but it's likely that there needs to be manual intervention. Maybe in that case it would make sense that we just keep trying to reconnect periodically in a loop, is that what you're suggesting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing for both of us to keep in mind is that if we have further BLS adoption, this code might be used in ways that we wouldn't expect to use it right now (like signing blocks or something) in which case we will have to provide further protections here. Maybe you're already thinking of that though

v.Set(key, value)
}

signer, err := getStakingSigner(context.Background(), v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing code not exported by the actual package though (i.e, getStakingSigner is an implementation detail of the package and not part of its public API) ... shouldn't we test this through GetNodeConfig instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants