RPC-signer configuration #3725

richardpringle · 2025-02-18T00:53:24Z

Why this should be merged

This enables the rpc-signer configuration.

How this works

If the proper flag is set, a gRPC client is instantiated and used for signing. The node will fail to start if there isn't a gRPC server implementation at the supplied connection string.

How this was tested

A table test was added in config/config_test.go that tests all possible signing configurations. This only tests the instantiation of the signer as the actual signing tests live in the respective signer packages.

Need to be documented in RELEASES.md?

Maybe? Do we have some kind of experimental flag? I believe we should have some actual experience running live nodes with an rpc-signer before we really document the usage here.

joshua-kim · 2025-02-18T15:50:26Z

config/config.go

+
+	switch {
+
+	case ephemeralSignerEnabled && !contentKeyExists && !keyPathIsSet && !rpcSignerURLExists:


Hmm this is annoying... MarkFlagsMutuallyExclusive is defined but it's only in cobra, not viper (and I don't want to add scope to this PR). This comment is a bit random... just leaving thoughts so feel free to ignore this.

I agree, it would be nice. We could abstract it (in another PR).

The problem (IMO) with the previous code is that it isn't really clear what the behaviour should be. The flags had an implicit, undocumented hierarchy, but it makes no sense to set more than one flag. I also don't think it's safe to assume that users will never accidentally set more than one of these flags.

Random comment, random response.

joshua-kim · 2025-02-18T15:54:23Z

config/config.go

+	contentKeyExists := v.IsSet(StakingSignerKeyContentKey)
+	keyPathIsSet := v.IsSet(StakingSignerKeyPathKey)
+	rpcSignerURLExists := v.IsSet(StakingRPCSignerKey)


Can we be consistent w/ naming here? I think these should all just follow the naming of something like *IsSet instead of some adopting the *Exists suffix to also be consistent w/ viper.IsSet

joshua-kim · 2025-02-18T16:03:43Z

config/config.go

+	signingKeyPath := getExpandedArg(v, StakingSignerKeyPathKey)
+	_, err := os.Stat(signingKeyPath)
+	keyFileNotFound := errors.Is(err, fs.ErrNotExist)


I think this looks better + reduces the diff if we just keep this within the case statement it's used in (+ we don't need to name the path not existing error).

joshua-kim · 2025-02-18T16:05:27Z

config/config.go

 		signerKeyRawContent := v.GetString(StakingSignerKeyContentKey)
 		signerKeyContent, err := base64.StdEncoding.DecodeString(signerKeyRawContent)
+


nit: unneeded diff, generally errors are checked on the following line without an extra line break

Old habits die hard. I always prefer empty lines surrounding any curly braces, but I think error checking is an exception.

joshua-kim · 2025-02-18T16:17:30Z

config/config.go

+
+		signer, err := rpcsigner.NewClient(ctx, conn)
+		if err != nil {
+			conn.Close()


I think the way we manage the lifecycle of this conn is supposed to live somewhere else (upstream of this code). I think we need to hold a reference to this conn and, upon a fatal error, call Close to prevent the server from relying on a timeout to terminate the connection.

I'm open to suggestions. I'm not used to having to manage connections and clients separately so not sure what the best pattern is here. The conn is only required if the rpc-signer is enabled, does it not make sense for it to manage the connection? I know we had a bit of that discussion on my other PR.

Alternatively, I could return (bls.Signer, Conn, error) from this function (whatever the actual type of the connection is) and attach it to the config? Seems even weirder to do that. I wouldn't have attached any kind of private key or signer to the config, but that change would have been a lot more intrusive.

I'm open to suggestions here.

As discussed offline, I think it's fine to leave the conn as is for now and let the signer manage the connection.

So... I went down a little bit of a rabbit hole here.

It would make sense to initialize the signer before initializing the node and pass it in to the constructor. That way, the node.Node can actually manage the signer as well as the connection. We do however get into a little bit of dependency hell as the same signer is passed down to a couple of other places as well.

I propose that we stick with the current design for now and make broader configuration changes while we test out the rpc signer on a live network.

joshua-kim · 2025-02-18T16:30:34Z

config/config_test.go

+	type config map[string]any
+	type test map[string]struct {
+		viperKeys          string
+		config             config
+		expectedSignerType reflect.Type
+		expectedErr        error
+	}


nit: you can define these anonymously like so (and also avoid the map-style initialization that is inconsistent w/ the way we have most of our tests written)

tests := []struct {
	name string
	foo  bar
}{
	{
		name: "foo",
		foo:  bar{},
	},
}
for _, tt := range tests {
	// test here
}

joshua-kim · 2025-02-18T16:32:50Z

config/config_test.go

+	}
+
+	// required for proper write permissions for the default signer-key location
+	t.Setenv("HOME", t.TempDir())


Should we have a defer step to clean up stuff? Or is this overkill?

I believe t.Setenv takes care of this for you.

FYI:

// Setenv calls os.Setenv(key, value) and uses Cleanup to
// restore the environment variable to its original value
// after the test.
//
// Because Setenv affects the whole process, it cannot be used
// in parallel tests or tests with parallel ancestors.
func (t *T) Setenv(key, value string) {
	// Non-parallel subtests that have parallel ancestors may still
	// run in parallel with other tests: they are only non-parallel
	// with respect to the other subtests of the same parent.
	// Since SetEnv affects the whole process, we need to disallow it
	// if the current test or any parent is parallel.
	isParallel := false
	for c := &t.common; c != nil; c = c.parent {
		if c.isParallel {
			isParallel = true
			break
		}
	}
	if isParallel {
		panic("testing: t.Setenv called after t.Parallel; cannot set environment variables in parallel tests")
	}

	t.isEnvSet = true
	t.common.Setenv(key, value)
}

joshua-kim · 2025-02-18T16:34:04Z

config/config_test.go

+				v.Set(key, value)
+			}
+
+			signer, err := getStakingSigner(context.Background(), v)


Why do we test this function instead of GetNodeConfig?

I didn't change GetNodeConfig aside from taking a context.Context as an argument. These are unit tests; they are testing the most granular unit possible.

This is testing code not exported by the actual package though (i.e., getStakingSigner is an implementation detail of the package and not part of its public API) ... shouldn't we test this through GetNodeConfig instead?

joshua-kim · 2025-02-18T16:34:46Z

config/config_test.go

+			if err != nil {
+				require.ErrorIs(err, test.expectedErr)
+			} else {
+				require.Equal(test.expectedSignerType, reflect.TypeOf(signer))
+			}


Do we need this nil handling? I thought the expected type of nil would work here

Are you suggesting

require.ErrorIs(err, test.expectedErr)
require.Equal(test.expectedSigner, reflect.TypeOf(signer))

instead of the if? I hadn't thought about it, but I think that'll work

joshua-kim · 2025-02-18T16:35:03Z

config/flags.go

@@ -271,6 +271,7 @@ func addNodeFlags(fs *pflag.FlagSet) {
 	fs.Bool(StakingEphemeralSignerEnabledKey, false, "If true, the node uses an ephemeral staking signer key")
 	fs.String(StakingSignerKeyPathKey, defaultStakingSignerKeyPath, fmt.Sprintf("Path to the signer private key for staking. Ignored if %s is specified", StakingSignerKeyContentKey))
 	fs.String(StakingSignerKeyContentKey, "", "Specifies base64 encoded signer private key for staking")
+	fs.String(StakingRPCSignerKey, "", "Specifies the gRPC endpoint of the staking signer")


Can we be consistent w/ rpc/grpc naming here?

joshua-kim · 2025-02-21T20:19:28Z

utils/crypto/bls/signer/rpcsigner/client.go

+	// TODO: figure out the best parameters here given the target block-time
+	opts := grpc.WithConnectParams(grpc.ConnectParams{
+		Backoff: backoff.DefaultConfig,
+	})


Can we let the caller provide these options? Or was there a reason we chose to not pass this in as a parameter?

I was thinking I didn't want different callers to pass in different parameters but rather have something that's more of a global configuration. I don't think that this is something that a node operator should really be messing with (yet).

joshua-kim · 2025-02-21T20:20:09Z

utils/crypto/bls/signer/rpcsigner/client.go

+	// the request to the actual signer instead of relying on tls-credentials
+	conn, err := grpc.NewClient(rpcSignerURL, opts, grpc.WithTransportCredentials(insecure.NewCredentials()))
+	if err != nil {
+		return nil, fmt.Errorf("couldn't create rpc signer client: %w", err)


nit: I generally avoid special characters where possible in logs/errors since you usually have to escape them when grepping through logs. Can we make this something like could not or failed to or anything along those lines? Our old code doesn't follow this pattern but I think we should avoid it in new code we write.

I think that's a good reason. What about the :?

joshua-kim · 2025-02-21T20:37:09Z

utils/crypto/bls/signer/rpcsigner/client.go

+	var err error
+	defer func() {
+		if err != nil {
+			c.Close()


Another idea: I'm wondering if we can do better than the current UX of forcing the caller to re-create the client when this errors, and instead manage that within the Client type. What if we started a goroutine in New that would keep trying to reconnect if we closed the connection, and Sign*, if we're not connected, could just fast-fail and return some "disconnected error"?

So Sign* fast fails if we've closed the connection in the current implementation. You think we should just infinitely keep trying to reconnect? Otherwise, we should just set the backoff parameters accordingly.

Remember that this is meant to run as a sidecar. I think we could enhance it to a point where we're comfortable with people running on a less reliable network too, but I think it's beneficial to go with baby steps here. A good way to enforce that would be to only take in a port as the configuration parameter and add a URL later (always connecting on the loopback IP).

But I just can't imagine what strategy would be used on top of the gRPC client strategy for reconnecting. If we have an error, it's likely that we need some manual intervention.

The good thing right now is that we sign our IP when the node is booting up then re-use that signed IP. I'm not sure how an IP change is handled, but PoP on the IP-message only happens once from what I can tell. If the signer fails, we would stop signing warp messages, which doesn't necessarily mean we need to crash the node, but it's likely that there needs to be manual intervention. Maybe in that case it would make sense that we just keep trying to reconnect periodically in a loop, is that what you're suggesting?

One thing for both of us to keep in mind is that if we have further BLS adoption, this code might be used in ways that we wouldn't expect to use it right now (like signing blocks or something) in which case we will have to provide further protections here. Maybe you're already thinking of that though

joshua-kim · 2025-02-21T20:38:17Z

config/config_test.go

+				v.Set(key, value)
+			}
+
+			signer, err := getStakingSigner(context.Background(), v)


This is testing code not exported by the actual package though (i.e., getStakingSigner is an implementation detail of the package and not part of its public API) ... shouldn't we test this through GetNodeConfig instead?

joshua-kim reviewed Feb 18, 2025

richardpringle force-pushed the signers-config-wip branch 2 times, most recently from 75bbe5c to ca2d04c Compare February 19, 2025 23:31

richardpringle marked this pull request as ready for review February 19, 2025 23:33

richardpringle requested a review from StephenButtolph as a code owner February 19, 2025 23:33

richardpringle added 7 commits February 21, 2025 14:16

Add comments to signer-config setup

3cad5e6

Extend siging configuration to include RPC

30834ff

Add default behaviour to switch

3175c0d

Add timeout to signer instantiation

9551683

Make rpc-signer client handle the connection

8742f9e

Fix linter error

df0e4ce

Close the connection on any error

9801754

richardpringle force-pushed the signers-config-wip branch from 0ba7115 to 9801754 Compare February 21, 2025 19:16

joshua-kim reviewed Feb 21, 2025
