SigningConfig proto to have start dates? #474

Open
loosebazooka opened this issue Jan 14, 2025 · 18 comments · May be fixed by #539

Comments

@loosebazooka
Member

When rotating keys for rekor (while doing v2 sharding), currently we would require two signing events for root signing

  1. add new key to TrustedRoot for new current rekor
  2. rotate signers to the new rekor in SigningConfig

Root signing is a somewhat expensive process but we don't want the ecosystem to end up in a situation where verifiers slightly behind on time can't verify new signatures.

I think this can be solved by making SigningConfig more flexible to key rotations. Potentially by adding startTimes or some sort of time for when signers should start using it. Signers could go down the list of providers and pick the first one that is valid.
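A minimal sketch of the "go down the list and pick the first one that is valid" idea, assuming each provider carries a start time and that maintainers list the newest entry first. The `Provider` type, the `pick_first_started` helper, and the example URLs are hypothetical and not part of any Sigstore client.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Provider:
    # Hypothetical shape: a service URL plus the time signers may start using it.
    url: str
    start: datetime


def pick_first_started(providers: list[Provider], now: datetime) -> Optional[Provider]:
    """Return the first listed provider whose start time is not in the future."""
    for provider in providers:
        if provider.start <= now:
            return provider
    return None


# Assumption: the newest entry is listed first, so it wins once its start time arrives.
providers = [
    Provider("https://rekor-new.example", datetime(2025, 3, 1, tzinfo=timezone.utc)),
    Provider("https://rekor-old.example", datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
```

With this ordering, a single root signing event can publish both entries ahead of time, and signers switch on their own once the start time passes.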

fyi: @jku

@haydentherapper
Collaborator

we don't want the ecosystem to end up in a situation where verifiers slightly behind on time can't verify new signatures.

As long as the sharding does not take place until a complete expiration period of the TUF metadata has passed, this shouldn't happen. For PGI, that would mean a) add the new key to TrustedRoot and do a target signing, b) wait a week, c) start the sharding.

With that said, I do like the idea of simplifying sharding to require only one root signing event. I could see us having the same validity windows between the key material in the TrustedRoot and SigningConfig.

Do you think including validity windows at signing time overly complicates signing? How do we handle overlapping windows? How do we handle overlapping windows and private logs concurrently? To give a specific example:

  • rekorv1.sigstore.dev - valid from X to Y
  • rekorv2.sigstore.dev - valid starting at Y-1 month (e.g. rolling out rekorv2 one month before rekorv1 expires)
  • internal.log - valid from X

At time Y-1 month, when we roll out rekorv2, I would want a client publishing to rekorv2 and also internal.log, but not rekorv1.
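To make the overlap concrete, here is a small sketch of a naive "use every log whose window contains now" rule applied to the three logs above. The concrete dates standing in for X and Y, the `Log` type, and the exclusive end bound are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Log:
    url: str
    start: datetime
    end: Optional[datetime] = None  # None means no end date has been set


def active(logs: list[Log], now: datetime) -> list[str]:
    """Naive rule: every log whose window contains `now` (end treated as exclusive)."""
    return [
        log.url
        for log in logs
        if log.start <= now and (log.end is None or now < log.end)
    ]


# Placeholder dates for X, Y and "Y-1 month"; the thread gives no real values.
X = datetime(2024, 1, 1, tzinfo=timezone.utc)
Y = datetime(2026, 1, 1, tzinfo=timezone.utc)
Y_MINUS_1_MONTH = datetime(2025, 12, 1, tzinfo=timezone.utc)

logs = [
    Log("rekorv1.sigstore.dev", X, Y),
    Log("rekorv2.sigstore.dev", Y_MINUS_1_MONTH),
    Log("internal.log", X),
]
```

Calling `active(logs, Y_MINUS_1_MONTH)` returns all three URLs, including rekorv1, which is exactly the outcome the example wants to avoid. Validity windows alone do not express "prefer rekorv2 over rekorv1".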

@jku
Member

jku commented Jan 15, 2025

I think this can be solved by making SigningConfig more flexible to key rotations.

technically TrustedRoot + SigningConfig might already have the required data for this, and the only thing strictly needed is the rules for how clients should operate:

  • TrustedRoot tlogs have a baseUrl and publicKey.validFor
  • SigningConfig tlogs have baseUrl
  • We could tell signing clients to use all tlogs listed in SigningConfig but only if the log has a currently valid key in TrustedRoot (with maybe a little buffer time in there)
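The rule in the last bullet could be sketched roughly as follows. `TrustedLog`, `usable_tlogs`, and the 30-minute default buffer are hypothetical, and applying the buffer to both ends of the window is one possible reading of "a little buffer time".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TrustedLog:
    # Mirrors the two TrustedRoot facts used here: baseUrl and publicKey.validFor.
    base_url: str
    valid_start: datetime
    valid_end: Optional[datetime] = None


def usable_tlogs(
    signing_urls: list[str],
    trusted_logs: list[TrustedLog],
    now: datetime,
    buffer: timedelta = timedelta(minutes=30),  # assumed value, not from any spec
) -> list[str]:
    """Keep SigningConfig URLs that have a currently valid key in TrustedRoot."""
    usable = []
    for url in signing_urls:
        for log in trusted_logs:
            if log.base_url != url:
                continue
            # Shrink the window by `buffer` on both sides to tolerate clock skew.
            started = log.valid_start + buffer <= now
            not_ended = log.valid_end is None or now + buffer < log.valid_end
            if started and not_ended:
                usable.append(url)
                break  # one valid key is enough for this URL
    return usable
```

Under this reading, no schema change is needed: the signing side simply joins the two documents on the log URL.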

This btw is a good example of how keeping SigningConfig and TrustedRoot separate is not as great as it sounded in theory...

@kommendorkapten
Member

Thanks for calling this out.

I could see us having the same validity windows between the key material in the TrustedRoot and SigningConfig.

My current thinking is that this is the most desirable option. In general this gives a flexible way to plan for a sharding, or change in a service at the time of a signing event, giving clients time to learn about it, and so when the shift happens, there should be no outages, or cases where multiple clients are out of sync.

Note that the time ranges (between TrustedRoot and SigningConfig) don't necessarily need to be 100% in sync. To allow some leeway (e.g. 30 minutes or so), the old service should stay valid for verification a bit longer than the point at which the new service is commissioned.

@loosebazooka
Member Author

loosebazooka commented Jan 21, 2025

So how's this for a proposal (TimeBoundURL is not a final name)?

message SigningConfig {
    string media_type = 5;
    string ca_url = 1;
    string oidc_url = 2;
-   repeated string tlog_urls = 3;
+   repeated TimeBoundURL tlog_urls = 3;
    repeated string tsa_urls = 4;
}

+ message TimeBoundURL {
+    string url = 1;
+    TimeRange valid_for = 2; // TimeRange is what we use everywhere else in trusted_root
+ }

It's not obvious to me that we need to do this for the other types, but we can. Rekor is the only one that will be sharded and URL-swapped. But we can make all of these repeated TimeBoundURL if we want, and rotate and move those around at will. (We can also just two-step those with root signing, as they won't happen at a regular interval.)

@loosebazooka
Member Author

Also may need to include information about the protocol. If the signingConfig is updated to include an endpoint that is communicating over protocol version V+1, then maybe clients need to know?

@haydentherapper
Collaborator

Did you have thoughts on how to handle overlapping windows like in the example I noted above? Maybe the answer is also including log operator, and a client should only write to the latest log for each operator?

+1 on including protocol version, this will make transitioning between APIs simpler.

@kommendorkapten
Member

If we include log operator, I think we should key the repeated URLs on the operator name, to simplify the client-side processing. I'm assuming the log operator names don't bear any specific meaning in general? Clients who care ought to know what operators they trust. Or would the behaviour be that the client puts a message on one log from each log operator? This should probably go into the client spec once we know the details.

My gut feeling is that the client should put a message on one log from each configured log operator as the default mode, but allow for configuration.
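A rough sketch of that default mode, assuming each log entry names its operator and that "one log per operator" means the most recently started log for that operator. `OperatorLog` and `one_log_per_operator` are hypothetical names, and the dates are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class OperatorLog:
    url: str
    operator: str
    start: datetime


def one_log_per_operator(logs: list[OperatorLog], now: datetime) -> dict[str, str]:
    """Map each operator to the URL of its most recently started log."""
    chosen: dict[str, OperatorLog] = {}
    for log in logs:
        if log.start > now:
            continue  # not yet in service
        current = chosen.get(log.operator)
        if current is None or log.start > current.start:
            chosen[log.operator] = log
    return {operator: log.url for operator, log in chosen.items()}


# Placeholder dates for X and the rekorv2 rollout.
X = datetime(2024, 1, 1, tzinfo=timezone.utc)
ROLLOUT = datetime(2025, 12, 1, tzinfo=timezone.utc)

logs = [
    OperatorLog("rekorv1.sigstore.dev", "sigstore", X),
    OperatorLog("rekorv2.sigstore.dev", "sigstore", ROLLOUT),
    OperatorLog("internal.log", "internal", X),
]
```

At `ROLLOUT` this selects rekorv2 for the `sigstore` operator and `internal.log` for the `internal` operator, which matches the outcome asked for in the earlier overlapping-window example.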

@jku
Member

jku commented Jan 22, 2025

Did you have thoughts on how to handle overlapping windows like in the example I noted above?

I assume you want this because you want to allow clients to use the v1 log for a time even though we have a v2 available already? At least I can't think of other use cases where this would be useful.

What kind of time frame are we talking about when both would be valid? Would it be problematic if clients just used both logs during that time?

@jku
Member

jku commented Jan 22, 2025

Assuming we can avoid clients thinking about operators (as described in previous comment), I suppose this might be enough:

message SigningConfig {
  ... 
-   repeated string tlog_urls = 3;
+   repeated SigningTLog tlogs = 3;
}

+ message SigningTLog {
+    string url = 1;
+    string apiVersion = 2; // hand waving actual values, but something like "v1" and "v2"
+    TimeRange valid_for = 3; // TimeRange is what we use everywhere else in trusted_root
+ }

@haydentherapper
Collaborator

@jku, the example I was thinking of is for users that want to concurrently publish to multiple logs by different operators, either for reliability, verification, or trust. This could be for when we have multiple logs in the ecosystem, or for private deployers who want to write to both their internal log and the public log. I don't think this is farfetched - Certificate Transparency requires publishing to multiple logs with distinct operators.

I think we should include operator for signing. We'll need operator for verification as well, as verification policies like "a signature is published to >= 2 logs" should be grouped on log operator, not each log instance.
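As an illustration of why the grouping matters for verification, the sketch below counts distinct operators rather than log instances. The `meets_operator_threshold` helper and the URL-to-operator mapping are hypothetical; no existing verifier API is implied.

```python
def meets_operator_threshold(
    entry_log_urls: list[str],
    operator_by_url: dict[str, str],
    threshold: int = 2,
) -> bool:
    """True if the signature appears in logs run by at least `threshold` operators."""
    # Logs missing from the mapping are ignored rather than counted as operators.
    operators = {
        operator_by_url[url] for url in entry_log_urls if url in operator_by_url
    }
    return len(operators) >= threshold


# Hypothetical mapping: two shards run by one operator, plus an unrelated log.
operator_by_url = {
    "shard-2024.example": "op-a",
    "shard-2025.example": "op-a",
    "other.example": "op-b",
}
```

Two entries in `shard-2024.example` and `shard-2025.example` would not satisfy the policy, because both shards belong to the same operator, while one entry in `shard-2025.example` plus one in `other.example` would.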

@jku
Member

jku commented Jan 25, 2025

I'm just saying that verifying correct behaviour of a sigstore client is already really difficult. Adding complexity to the policy management multiplies the difficulty for each added policy knob. So let's be really sure this is required

It feels like defining "operators" only works for non-malicious cases like running rekor v1 and v2 at same time during migration -- in other cases who gets to decide when two logs have separate operators?

@haydentherapper
Collaborator

haydentherapper commented Jan 25, 2025

Focusing only on this first migration - How do we want to signify to a client that when given a signing config with both Rekor v1 and Rekor v2, the client should always prefer the latter? When we roll out v2, we can't specify an end-date for v1 (initially) since some clients may not yet have support. We shouldn't have clients writing to both. We could use API version, prefer the highest API version, and write to all logs with that version? In this case, operator is not necessary.
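The version-based rule could look something like this, assuming the API version is a plain integer and that "highest" means the highest version the client itself supports. `VersionedLog` and `logs_at_highest_supported_version` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VersionedLog:
    url: str
    api_version: int


def logs_at_highest_supported_version(
    logs: list[VersionedLog], supported: set[int]
) -> list[str]:
    """Return every log that speaks the highest API version the client supports."""
    candidates = [log for log in logs if log.api_version in supported]
    if not candidates:
        return []
    best = max(log.api_version for log in candidates)
    return [log.url for log in candidates if log.api_version == best]


logs = [
    VersionedLog("rekor-v1.example", 1),
    VersionedLog("rekor-v2.example", 2),
]
```

A client that supports only version 1 keeps writing to `rekor-v1.example`, while a client that supports both writes only to `rekor-v2.example`, so no end date is needed on the v1 entry at rollout time.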

For sharding, if the client policy is to write to all active logs, we could keep the overlapping validity window as tiny as possible. Since creating a new shard can happen before we publish the new key (which is not true with v1, since sharding instantly swaps writes over to the new log), the procedure could look like:

  1. Create the new shard with the new key. Manually test against the shard. Clients are not aware of this shard or the new shard key.
  2. Update the SigningConfig and TrustBundle. The SigningConfig will specify that clients should begin to use the new shard 1 week (or whatever the TUF timestamp validity is) from publishing the updated TUF targets, and set an end date for the current shard to be the start of the new shard. The TrustBundle should specify that the key is to be trusted starting at the time of shard creation, since a client not using the SigningConfig might start writing to it. The TrustBundle should also include an end date for the current shard key, at least 1 week out, and give additional time to handle manually freezing the current shard.
  3. Publish TUF targets.
  4. Wait a week. During this week, as clients pick up the updated TUF targets, they'll start trusting the new shard public key, but not yet write to the new shard.
  5. After a week, all clients should begin writing to the new shard. The log operator can then freeze the current shard.
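The dates implied by the steps above can be worked out mechanically. This is only a sketch: `ShardPlan`, `plan_shard_rotation`, and the one-day freeze margin are hypothetical, and the one-week TUF validity is the example figure from the steps rather than a fixed constant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ShardPlan:
    new_key_trusted_from: datetime       # TrustBundle start for the new shard key
    new_shard_signing_start: datetime    # SigningConfig start for the new shard
    current_shard_signing_end: datetime  # SigningConfig end for the current shard
    current_key_trusted_until: datetime  # TrustBundle end for the current shard key


def plan_shard_rotation(
    shard_created: datetime,
    targets_published: datetime,
    tuf_validity: timedelta = timedelta(weeks=1),
    freeze_margin: timedelta = timedelta(days=1),  # assumed time to freeze manually
) -> ShardPlan:
    """Derive the four dates from the shard creation and TUF publication times."""
    switch = targets_published + tuf_validity
    return ShardPlan(
        new_key_trusted_from=shard_created,
        new_shard_signing_start=switch,
        current_shard_signing_end=switch,
        current_key_trusted_until=switch + freeze_margin,
    )
```

For a shard created on 2025-03-01 with targets published on 2025-03-03, clients would switch writes on 2025-03-10, and the old key would stay trusted until 2025-03-11 under the assumed margin.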

No operator needed! Thoughts?

in other cases who gets to decide when two logs have separate operators?

Whoever distributes the signing config/trust bundle decides. As we bring up more logs in the ecosystem, we should know who operates which logs. Though I think I'm convinced now we can make signing configs work without operators, and operators are only for verification policy.

@loosebazooka
Member Author

loosebazooka commented Feb 4, 2025

Update from clients meeting:

Maybe each item should be a message for flexibility and we can add fields later.

message SigningConfig {
    string media_type = 5;
    repeated SigningCA cas = 1;
    repeated SigningOIDC oidc = 2;
    repeated SigningTLog tlogs = 3;
    repeated SigningTSA tsas = 4;
}

Although these may still each just contain the same kind of information (and be redundant)

message SigningX {
    string url = 1;
    string api_version = 2;
    TimeRange valid_for = 3;
}

For public good, these will have 1 of each (unless we're transitioning), but otherwise the workflow for clients could be:

  1. Pick the first valid item in each repeated entry and use it.
  2. If configuration requires multiple TSAs or Logs, pick the first X and use them.
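Both steps reduce to "take the first N entries that pass a validity check", which might be sketched as below. `pick_first_valid` is a hypothetical helper, and what counts as valid is left to the caller.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pick_first_valid(
    entries: Sequence[T],
    is_valid: Callable[[T], bool],
    count: int = 1,
) -> list[T]:
    """Return up to `count` valid entries, preserving the listed order."""
    picked: list[T] = []
    for entry in entries:
        if len(picked) >= count:
            break
        if is_valid(entry):
            picked.append(entry)
    return picked
```

With `count=1` this covers step 1. A deployment that requires two TSAs or two logs would pass `count=2` and would need to treat a shorter result as a configuration error.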

@haydentherapper
Collaborator

Pick the first valid item in each repeated entry and use it.

Defining "valid", what do you think about "Clients pick the entry with the latest API version that the client supports"? Or do you want it to be the responsibility of trust root maintainers to order the SigningX entries?

@loosebazooka
Member Author

loosebazooka commented Feb 4, 2025

Yeah, I guess that's kind of vague.
Valid would mean that:

  1. the current time is within validity window
  2. client can understand the api version
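Those two conditions translate into a small predicate. The `Service` type below loosely follows the `SigningX` fields proposed earlier in the thread; the exclusive end bound and the set of understood versions are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Service:
    url: str
    api_version: str
    start: datetime
    end: Optional[datetime] = None  # None means no end date has been set


def is_valid(service: Service, now: datetime, understood_versions: set[str]) -> bool:
    """Condition 1: now is inside the window. Condition 2: the client speaks the API."""
    in_window = service.start <= now and (service.end is None or now < service.end)
    return in_window and service.api_version in understood_versions
```

An old client that has not learned a new API version gets `False` for the new entry and can fall back to an older entry if one is still listed, which is the graceful failure case described next.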

This can fail for old clients

  1. Gracefully if clients have implemented signing config consumption but have not updated to a new current api version.
  2. Ungracefully if they've hardcoded infra endpoints and/or the api changes

We have to move them to a new version of the client or backport. As adoption grows, the need for backports grows. We might want to strongly define what versions of sigstore clients are supported and have passed conformance (and tuf conformance). Let's also hope we don't have to make breaking changes after rekor v2.

@kommendorkapten
Member

The API version is listed as a string. Do we want to be open-ended here? Let the service define the API version string, and assume clients of that service understand how the API version should be decoded? Can we force this to two integers, one for major and one for minor? Or possibly even a leap number?

@haydentherapper
Collaborator

I'm taking a stab at this now, I'll have a PR up shortly.

@kommendorkapten, I agree, I've added the major API version as an int. I'm proposing leaving out minor. minor would be nice to have for signifying new methods (as non-breaking changes), but that's going to complicate the selection logic and how we update the SigningConfig with new versions. If I add a method to Fulcio, will I update the current entry? What happens if a client doesn't yet support that minor version? Clients are now also keeping track of per-minor-version methods.

haydentherapper added a commit to haydentherapper/protobuf-specs that referenced this issue Feb 14, 2025
In order to facilitate clients gracefully handling breaking API
changes, the SigningConfig will now include API versions for each
of the service URLs so that clients can determine what services
they are compatible with. Additionally, we've included validity periods
which will be used to facilitate Rekor log sharding, when we spin up new
log shards and distribute new key material.

Fixes sigstore#474

Signed-off-by: Hayden Blauzvern <[email protected]>
@haydentherapper
Collaborator

#539

A few things to note:

  • I kept the old url fields because I'm not sure if that will break sigstore-python, which supports ClientTrustConfig.
  • That also means we need a new media type version, which I added.
  • I've included comments for the selection process. For CA and OIDC URLs, the client must select one URL that matches its criteria. For Rekor and TSA URLs, the client must select all URLs that match its criteria. If this is confusing, lemme know and we can think about grouping things differently.
  • Naming is hard because we've already used the abbreviated names I would have used for the pluralized fields....so the names are a bit verbose.
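A loose sketch of the "select one" versus "select all" split described in the third bullet. This is not the logic from #539: the service kind names, the `select_urls` helper, and taking the first match as "one" are all assumptions.

```python
from typing import Callable

# Assumed kind names; the actual field names in the PR may differ.
SELECT_ONE = {"ca", "oidc"}
SELECT_ALL = {"tlog", "tsa"}


def select_urls(kind: str, urls: list[str], matches: Callable[[str], bool]) -> list[str]:
    """Apply the per-kind rule: a single match for CA/OIDC, every match for tlog/TSA."""
    matching = [url for url in urls if matches(url)]
    if kind in SELECT_ONE:
        return matching[:1]  # assumption: "one" means the first match in list order
    if kind in SELECT_ALL:
        return matching
    raise ValueError(f"unknown service kind: {kind}")
```

Keeping the rule in one place per service kind means the matching criteria (validity window, API version) can stay identical across all four kinds.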

4 participants