Implement predefined field constraints #178

jchadwick-buf · 2024-09-12T14:00:55Z

Adds the ability to specify an ExtensionRegistry and a TypeRegistry for resolving protobuf messages. Ordinarily, only a TypeRegistry would necessarily be needed. However, we need to be able to resolve extensions defined in file descriptor sets we don't control, which means we need to be able to reparse to and from the user's descriptors in the worst case: to the user's descriptors to get the extended rule message (whose message type descriptors may have a different hashcode and thus may not resolve using just an ExtensionRegistry alone) and back from the user's descriptors in order to parse the priv/shared field.
Refactors some of the code around reparsing and extensions in general:
- Reparsing options for protovalidate built-ins will always use a static extension registry.
Adds the rule variable.
- Some refactoring is done around the individual rule compilation, since the code was getting a bit unwieldy.
Updates the conformance runner to generate an ExtensionRegistry and a TypeRegistry. This enables the conformance runner to pass both the old conformance test suite and the new one, regardless of whether the proto descriptors match up.

TODO:

Update to new version of protovalidate protos when they are merged.

This will depend on bufbuild/protovalidate#246.

…into jchadwick/shared-field-rules

jchadwick-buf · 2024-09-19T17:56:45Z

We still need to synchronize this to the latest version of protovalidate, but that is held up for now. This can be reviewed/merged earlier if desired; it should pass all relevant conformance tests, except for the boolean rules fixes, because those would require the internal set of proto descriptors to be up-to-date. I'm going to push for reviews now so we can get this closed up as soon as possible.

jchadwick-buf · 2024-09-20T20:28:14Z

OK, it is now synchronized to Protovalidate v0.8.1 and should be ready to go.

… tests

jchadwick-buf · 2024-09-24T17:08:43Z

I decided to go over this PR today and made some adjustments.

Updated to add an allowUnknownFields option and have unknown fields behavior match other runtimes closer. Added a test that tests predefined fields with dynamic descriptor sets. Made a slight adjustment to TypeRegistry behavior so that it's possible to fall back to the built-in descriptors when appropriate (e.g. when using an ExtensionRegistry with generated code.)

jhump

This is some fairly complicated stuff, but I think I now understand it all and it seems mostly good.

I think there are some unnecessary uses of extensionRegistry in a few places, and likely some things we could to make it less error-prone (like maybe the ability to auto-create a correct type registry?).

jhump · 2024-09-24T19:34:40Z

conformance/src/main/java/build/buf/protovalidate/conformance/FileDescriptorUtil.java

@@ -70,4 +73,29 @@ static Map<String, Descriptors.FileDescriptor> parseFileDescriptors(
    }
    return fileDescriptorMap;
  }
+
+  static TypeRegistry createTypeRegistry(


Seems like this function and the one below it want to be recursive, in case there are ever messages or extensions nested inside other messages.

The one below it probably does, but createTypeRegistry doesn't: TypeRegistry.add does its own recursion.

OK, I have addressed this for the latter function (as mentioned above it is not needed for TypeRegistry.)

jhump · 2024-09-24T19:41:06Z

src/main/java/build/buf/protovalidate/Config.java

+   *
+   * @return if allowing unknown constraint fields is enabled
+   */
+  public boolean isAllowUnknownFieldsEnabled() {


This seems like a kind of awkward name. Maybe isAllowingUnknownFields? Or maybe even just allowUnknownFields?

I went with isAllowingUnknownFields. Personally I prefer allowUnknownFields but aside from the fact that this conflicts with the variable name I think it is just going against Java convention to have accessors named that way. Y'know, when in Rome...

jhump · 2024-09-24T20:40:31Z

src/main/java/build/buf/protovalidate/Config.java

+     * @param typeRegistry the type registry to use
+     * @return this builder
+     */
+    public Builder setTypeRegistry(TypeRegistry typeRegistry) {


I'm not really clear on why we need this. Is it for performance? Because when we have a descriptor with options, we also have the full transitive dependency graph for that proto and could use that to build these registries. I guess if we built them on demand, we might end up allocating/building numerous registries and re-processing (re-adding across multiple registry instances) the types from common imports. But we only ever need to do this once per FileDescriptor, so is a one-time cost.

Maybe the behavior of null/unset registries might be to do the above, so if you only have to set them as a performance optimization? And we could even provide helpers, similar to what you have in the conformance package, that take a set of file descriptors and crawl them to produce the type and extension registries. That way people don't have to really compute them on their own -- can use the helper to compute them once up-front.

It might actually be possible to do this but it's sort of complicated since we're jumping back and forth between user protos and our own. But actually, that's not my main concern. I have really two problems:

TypeRegistry is immutable. Once it is constructed, it is done. Therefore, incrementally building a TypeRegistry is always inefficient. We can, of course, build our own TypeRegistry which is essentially just a Map, but that smells. And it gets worse, because,

The behavior of conflicting file descriptors will depend on the order in which you try to validate messages. To me this is just too confusing.

Bear in mind that this situation (unknown extensions) should realistically only be a problem for users that are dealing with dynamic file descriptor sets. In many cases users that are dealing with dynamic descriptor sets might have to deal with multiple different dynamic descriptor sets, especially if they're doing some kind of multi-tenant service. By exposing the TypeRegistry we make it explicit: The user controls what resolves to what Descriptor, and you need a new Validator for a new TypeRegistry.

I can definitely see a case for us providing better helper functions. However, the helper functions for this would really not be terribly protovalidate-specific and so I opted to keep the helpers in our private code in the conformance test suite, which users could conceivably copy if they wanted our behavior. Users who are dealing with dynamic protocol buffers likely need to build an ExtensionRegistry and/or a TypeRegistry anyhow, and probably pass it to other libraries or use it in their own code.

Of course I am open to trying to improve this situation but I opted to do things the dumb way, because it's already pretty confusing to begin with, and adding more magic on top of it seems like it is just not the greatest user experience.

It's a good point that we could just provide our own type since we're not actually providing the TypeRegistry to any API that requires it.

The behavior of conflicting file descriptors will depend on the order in which you try to validate messages. To me this is just too confusing.

If we do it right, there is no such thing as conflicting descriptors. A single descriptor is not valid if it has conflicts within its own transitive dependency graph. So we really want a tree of maps -- each node has a map of the messages in that file, and then it has tree links to its imports. So finding a type could require O(n) time (where n is number of files in transitive graph) instead of O(1), but it wouldn't be subject to conflicts.

Sadly, we can't really do this with extension types since ExtensionRegistry is a concrete type that is required by APIs we're using.

Note that I wasn't suggesting incrementally building a single shared registry but rather to build a new registry per file descriptor. I understand that could be a perf issue, but since it's only done once per file per cache (and there will likely be a single cache for most programs plus fixed number of files with generated message types) it is likely to be perfectly acceptable.

Unfortunately, I just realized that our own tests illuminate a case where this won't work.

Consider the following:

ExtensionRegistry registry = ExtensionRegistry.newInstance(); registry.add(isIdent); TypeRegistry typeRegistry = TypeRegistry.newBuilder().add(isIdent.getDescriptor().getContainingType()).build(); Config config = Config.newBuilder().setTypeRegistry(typeRegistry).setExtensionRegistry(registry).build();

This obviously works just fine because the TypeRegistry and the ExtensionRegistry are synced with each-other. However... if we get rid of the TypeRegistry, this test can never work!

This is because this test does not actually go across two different Descriptor types like one might assume. It goes across three, each with their own distinct hashCode()s.

The first one is build.buf.validate.StringRules.

The second one is com.example.imports.buf.validate.StringRules. We have a second copy of the gencode to specifically test this case (dynamic proto of separate gencode.)

The third one is a dynamically-resolved com.validate.imports.buf.validate.StringRules - crucially, extensions aren't resolved in nested types. This means e.g. buf.validate.predefined fields are not resolved inside of field options, which results in a different hashCode() everywhere!

Our code already intends to handle this case (dynamic message parsed from potentially distinct file descriptor; we test it for other cases that don't involve user-defined extensions). This puts us in a tricky position, because we really want to reparse the extensions using the extension types, not the types that happen to be attached to the descriptor of a message, which may be dynamic or have a different gencode than whoever registered the extensions.

The only other alternative I can think of to explicitly having the user register a TypeRegistry is if we enumerate their ExtensionRegistry to find the types instead, but it seems like it just keeps getting worse the harder you try:

There's no great way to enumerate all extensions in an ExtensionRegistry so what you actually have to do is enumerate the extensions of each possible message that you might want to query extensions for.

There's no guarantee that there's only one Descriptor for a given full name stored in the ExtensionRegistry - it will happy return a list with extensions that have many distinct Descriptor types. We would probably need to throw an exception in this case since there's not much reasonable to do with it.

At this point I'm torn. The test case is indeed a bit convoluted, but it's convoluted on purpose. With the TypeRegistry it works for fairly obvious reasons. Without it, it doesn't work for reasons that I found very unintuitive.

What would be nice is if we could just have the ExtensionRegistry key on the full name of the descriptor instead of the descriptor identity, but after looking around for too long I've determined there's nothing we can really do on that front other than things that I'm pretty sure we shouldn't do.

jhump · 2024-09-24T20:41:19Z

src/main/java/build/buf/protovalidate/Validator.java

+    this.evaluatorBuilder =
+        new EvaluatorBuilder(
+            env,
+            config.isDisableLazy(),


Why not just pass the whole config through, since it seems to need every field?

Hey, that's a good idea. I just made the same change for ConstraintCache as well since there's really no obvious downside to doing this.

jhump · 2024-09-24T20:47:19Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+    if (descriptorMap.containsKey(constraintFieldDesc)) {
+      return descriptorMap.get(constraintFieldDesc);
+    }


Just use get and then return if the value is non-null, instead of querying the map 2x.

Oops, good point. (I didn't actually change this line, it just got relocated, but there's no sense in keeping it like this, since it's not relying on null values in the map or anything like that.)

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

jhump · 2024-09-25T15:16:18Z

src/main/java/build/buf/protovalidate/internal/evaluator/EvaluatorBuilder.java

@@ -165,11 +172,10 @@ private void buildMessage(Descriptor desc, MessageEvaluator msgEval)
      try {
        DynamicMessage defaultInstance =
            DynamicMessage.newBuilder(desc)
-                .mergeFrom(new byte[0], EXTENSION_REGISTRY)
+                .mergeFrom(new byte[0], extensionRegistry)


Pretty sure this line does nothing. Remove?

Yep. Removed.

jhump · 2024-09-25T15:31:22Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+      Descriptors.Descriptor expectedConstraintMessageDescriptor =
+          typeRegistry.find(expectedConstraintDescriptor.getMessageType().getFullName());


I don't understand the purpose of this. Why would we need touse the descriptor in a provided type registry instead of the descriptor we already have?

This appears to be the only use of typeRegistry, so I'm wondering if it's even needed.

Aside: oneofFieldDescriptor is an awful name. I was so confused reading this, thinking it might be the descriptor for OneofFieldConstraints. But, IIUC, it really means the descriptor for a field inside the type oneof on FieldConstraints, right? Might be nice to rename it... typedConstraintFieldDescriptor?

Okay, I re-read the PR description and think I understand now: we have to use the provided typeRegistry in case its version of the message descriptor has a different hash code (so it's queryable from a map inside extensionRegistry I think).

That is really subtle and I think that explanation deserves to be in the code here, not just in a PR description/commit message.

A comment explaining it has been added. I agree with you that it definitely can't just live in the PR comment although I put most of the effort into explaining what's going on here over in the setTypeRegistry documentation.

jhump · 2024-09-25T15:33:46Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+            build.buf.validate.PredefinedConstraints.parseFrom(
+                ((MessageLite) extensionValue).toByteString(), extensionRegistry);


PredefinedConstraints does not have any extensions (or nested extendable message fields), so I think the extensionReigstry here is unnecessary.

Good point, removed extensionRegistry here.

jhump · 2024-09-25T15:40:13Z

src/main/java/build/buf/protovalidate/internal/evaluator/EvaluatorBuilder.java

      this.env = Objects.requireNonNull(env, "env");
      this.constraintCache = Objects.requireNonNull(constraintCache, "constraintCache");
      this.cache = new HashMap<>(previousCache);
+      this.extensionRegistry = Objects.requireNonNull(extensionRegistry, "extensionRegistry");


This field appears to be unnecessary. It seems to only be used by calls that parse zero bytes (so never any extensions to actually parse). I assume we do that in order to build a partial message in the off-chance that any message descriptor we have has required fields (in which case the simple case of creating a new empty message would throw).

Note: if I am right and we can remove this, we can also remove from the enclosing type, which only has an extensionRegistry to pass to this (AFAICT).

Yeah, you're actually right. I realized this at some point and never looped back to fix it. (Frankly, I'm rather impressed you picked this one up during a code review. Very nice attention to detail.)

jchadwick-buf

OK, I think I have addressed all of the feedback now. Most of the things I have trivially resolved.

The main sticking point for me is how to handle the TypeRegistry bits. It would be nice for congruent behavior with other runtimes if we could avoid needing the TypeRegistry part but I think that it's not trivially possible to avoid it.

jchadwick-buf · 2024-09-25T17:32:59Z

conformance/src/main/java/build/buf/protovalidate/conformance/FileDescriptorUtil.java

@@ -70,4 +73,29 @@ static Map<String, Descriptors.FileDescriptor> parseFileDescriptors(
    }
    return fileDescriptorMap;
  }
+
+  static TypeRegistry createTypeRegistry(


OK, I have addressed this for the latter function (as mentioned above it is not needed for TypeRegistry.)

jchadwick-buf · 2024-09-25T17:34:41Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+      Descriptors.Descriptor expectedConstraintMessageDescriptor =
+          typeRegistry.find(expectedConstraintDescriptor.getMessageType().getFullName());


A comment explaining it has been added. I agree with you that it definitely can't just live in the PR comment although I put most of the effort into explaining what's going on here over in the setTypeRegistry documentation.

jchadwick-buf · 2024-09-25T17:36:03Z

src/main/java/build/buf/protovalidate/Config.java

 /** Config is the configuration for a Validator. */
 public final class Config {
+  private static final TypeRegistry DEFAULT_TYPE_REGISTRY = TypeRegistry.getEmptyTypeRegistry();
+  private static final ExtensionRegistry DEFAULT_EXTENSION_REGISTRY =
+      ExtensionRegistry.newInstance();


jchadwick-buf · 2024-09-25T17:37:22Z

src/main/java/build/buf/protovalidate/Config.java

+   *
+   * @return if allowing unknown constraint fields is enabled
+   */
+  public boolean isAllowUnknownFieldsEnabled() {


I went with isAllowingUnknownFields. Personally I prefer allowUnknownFields but aside from the fact that this conflicts with the variable name I think it is just going against Java convention to have accessors named that way. Y'know, when in Rome...

jchadwick-buf · 2024-09-25T17:38:01Z

src/main/java/build/buf/protovalidate/Validator.java

+    this.evaluatorBuilder =
+        new EvaluatorBuilder(
+            env,
+            config.isDisableLazy(),


Hey, that's a good idea. I just made the same change for ConstraintCache as well since there's really no obvious downside to doing this.

jchadwick-buf · 2024-09-25T17:39:41Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+    if (descriptorMap.containsKey(constraintFieldDesc)) {
+      return descriptorMap.get(constraintFieldDesc);
+    }


Oops, good point. (I didn't actually change this line, it just got relocated, but there's no sense in keeping it like this, since it's not relying on null values in the map or anything like that.)

jchadwick-buf · 2024-09-25T17:40:06Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+    build.buf.validate.PredefinedConstraints constraints = getFieldConstraints(constraintFieldDesc);
+    if (constraints == null) return null;
+    List<Expression> expressions = Expression.fromConstraints(constraints.getCelList());
+    List<CelRule> celRules = new ArrayList<>();


Great idea, done.

jchadwick-buf · 2024-09-25T17:40:21Z

src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java

+            build.buf.validate.PredefinedConstraints.parseFrom(
+                ((MessageLite) extensionValue).toByteString(), extensionRegistry);


Good point, removed extensionRegistry here.

jchadwick-buf · 2024-09-25T17:42:17Z

src/main/java/build/buf/protovalidate/internal/evaluator/EvaluatorBuilder.java

      this.env = Objects.requireNonNull(env, "env");
      this.constraintCache = Objects.requireNonNull(constraintCache, "constraintCache");
      this.cache = new HashMap<>(previousCache);
+      this.extensionRegistry = Objects.requireNonNull(extensionRegistry, "extensionRegistry");


Yeah, you're actually right. I realized this at some point and never looped back to fix it. (Frankly, I'm rather impressed you picked this one up during a code review. Very nice attention to detail.)

jchadwick-buf · 2024-09-25T17:42:33Z

src/main/java/build/buf/protovalidate/internal/evaluator/EvaluatorBuilder.java

@@ -165,11 +172,10 @@ private void buildMessage(Descriptor desc, MessageEvaluator msgEval)
      try {
        DynamicMessage defaultInstance =
            DynamicMessage.newBuilder(desc)
-                .mergeFrom(new byte[0], EXTENSION_REGISTRY)
+                .mergeFrom(new byte[0], extensionRegistry)


Yep. Removed.

jhump

It's a shame about needing users to provide type and extension registries for any custom extensions. But the Protobuf runtime's APIs here for Java really make this hard to do otherwise. @jchadwick-buf and I discussed a couple of alternatives that could make it so users don't have to supply anything extra, but they are all much more complicated to implement and test and possibly brittle/risk in some ways, so it seems like maybe we could merge as-is and noodle on it more later.

So this LGTM. Hopefully 🤞 we can find a solution to not need user-provided registries before a v1.0. But if not, c'est la via.

jchadwick-buf · 2024-09-26T17:28:29Z

Yeah, I kept wrestling with it a bit more because it really is a shame that we have to expose this complexity, but at the very least we can now say that we did an extraordinary amount of diligence before just accepting it, so if we missed anything obvious it wasn't for lack of trying.

Thanks for the very thorough review. This wound up being a challenging patchset to write and it was great to have a very in-depth review, I'm pretty confident in the shape this is in now at least.

jchadwick-buf added 4 commits September 6, 2024 08:08

Implement shared field rules

ea94ba1

Fix things spotlessApply borked

793ae90

Clean up extension registry bits

0348196

Fix javadoc errors

55b6de6

jchadwick-buf marked this pull request as draft September 12, 2024 14:14

Merge branch 'main' of https://github.com/bufbuild/protovalidate-java …

79f1955

…into jchadwick/shared-field-rules

jchadwick-buf marked this pull request as ready for review September 19, 2024 17:51

jchadwick-buf requested a review from pkwarren September 19, 2024 17:57

Update to Protovalidate v0.8.1

8d59611

jchadwick-buf removed the request for review from pkwarren September 24, 2024 14:15

Documentation, allowUnknownFields, better TypeRegistry behavior, more…

6d47d34

… tests

jchadwick-buf requested a review from jhump September 24, 2024 17:19

jhump reviewed Sep 25, 2024

View reviewed changes

Address review feedback

49bf22b

jchadwick-buf commented Sep 25, 2024

View reviewed changes

jhump approved these changes Sep 26, 2024

View reviewed changes

jchadwick-buf merged commit 7345e69 into main Sep 26, 2024
4 checks passed

jchadwick-buf deleted the jchadwick/shared-field-rules branch September 26, 2024 17:46

		Descriptors.Descriptor expectedConstraintMessageDescriptor =
		typeRegistry.find(expectedConstraintDescriptor.getMessageType().getFullName());

		build.buf.validate.PredefinedConstraints.parseFrom(
		((MessageLite) extensionValue).toByteString(), extensionRegistry);

Implement predefined field constraints #178

Implement predefined field constraints #178

Conversation

jchadwick-buf commented Sep 12, 2024 • edited Loading

jchadwick-buf commented Sep 19, 2024

jchadwick-buf commented Sep 20, 2024

jchadwick-buf commented Sep 24, 2024

jhump left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jchadwick-buf Sep 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jchadwick-buf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jhump left a comment

Choose a reason for hiding this comment

jchadwick-buf commented Sep 26, 2024

jchadwick-buf commented Sep 12, 2024 •

edited

Loading

jchadwick-buf Sep 25, 2024 •

edited

Loading