-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement predefined field constraints #178
Conversation
…into jchadwick/shared-field-rules
We still need to synchronize this to the latest version of protovalidate, but that is held up for now. This can be reviewed/merged earlier if desired; it should pass all relevant conformance tests, except for the boolean rules fixes, because those would require the internal set of proto descriptors to be up-to-date. I'm going to push for reviews now so we can get this closed up as soon as possible. |
OK, it is now synchronized to Protovalidate v0.8.1 and should be ready to go. |
I decided to go over this PR today and made some adjustments. Updated to add an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some fairly complicated stuff, but I think I now understand it all and it seems mostly good.
I think there are some unnecessary uses of extensionRegistry
in a few places, and likely some things we could to make it less error-prone (like maybe the ability to auto-create a correct type registry?).
@@ -70,4 +73,29 @@ static Map<String, Descriptors.FileDescriptor> parseFileDescriptors( | |||
} | |||
return fileDescriptorMap; | |||
} | |||
|
|||
static TypeRegistry createTypeRegistry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this function and the one below it want to be recursive, in case there are ever messages or extensions nested inside other messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one below it probably does, but createTypeRegistry
doesn't: TypeRegistry.add
does its own recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have addressed this for the latter function (as mentioned above it is not needed for TypeRegistry
.)
* | ||
* @return if allowing unknown constraint fields is enabled | ||
*/ | ||
public boolean isAllowUnknownFieldsEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a kind of awkward name. Maybe isAllowingUnknownFields
? Or maybe even just allowUnknownFields
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with isAllowingUnknownFields
. Personally I prefer allowUnknownFields
but aside from the fact that this conflicts with the variable name I think it is just going against Java convention to have accessors named that way. Y'know, when in Rome...
* @param typeRegistry the type registry to use | ||
* @return this builder | ||
*/ | ||
public Builder setTypeRegistry(TypeRegistry typeRegistry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really clear on why we need this. Is it for performance? Because when we have a descriptor with options, we also have the full transitive dependency graph for that proto and could use that to build these registries. I guess if we built them on demand, we might end up allocating/building numerous registries and re-processing (re-adding across multiple registry instances) the types from common imports. But we only ever need to do this once per FileDescriptor
, so is a one-time cost.
Maybe the behavior of null
/unset registries might be to do the above, so if you only have to set them as a performance optimization? And we could even provide helpers, similar to what you have in the conformance package, that take a set of file descriptors and crawl them to produce the type and extension registries. That way people don't have to really compute them on their own -- can use the helper to compute them once up-front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might actually be possible to do this but it's sort of complicated since we're jumping back and forth between user protos and our own. But actually, that's not my main concern. I have really two problems:
TypeRegistry
is immutable. Once it is constructed, it is done. Therefore, incrementally building aTypeRegistry
is always inefficient. We can, of course, build our ownTypeRegistry
which is essentially just aMap
, but that smells. And it gets worse, because,- The behavior of conflicting file descriptors will depend on the order in which you try to validate messages. To me this is just too confusing.
Bear in mind that this situation (unknown extensions) should realistically only be a problem for users that are dealing with dynamic file descriptor sets. In many cases users that are dealing with dynamic descriptor sets might have to deal with multiple different dynamic descriptor sets, especially if they're doing some kind of multi-tenant service. By exposing the TypeRegistry
we make it explicit: The user controls what resolves to what Descriptor
, and you need a new Validator
for a new TypeRegistry
.
I can definitely see a case for us providing better helper functions. However, the helper functions for this would really not be terribly protovalidate-specific and so I opted to keep the helpers in our private code in the conformance test suite, which users could conceivably copy if they wanted our behavior. Users who are dealing with dynamic protocol buffers likely need to build an ExtensionRegistry
and/or a TypeRegistry
anyhow, and probably pass it to other libraries or use it in their own code.
Of course I am open to trying to improve this situation but I opted to do things the dumb way, because it's already pretty confusing to begin with, and adding more magic on top of it seems like it is just not the greatest user experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point that we could just provide our own type since we're not actually providing the TypeRegistry
to any API that requires it.
The behavior of conflicting file descriptors will depend on the order in which you try to validate messages. To me this is just too confusing.
If we do it right, there is no such thing as conflicting descriptors. A single descriptor is not valid if it has conflicts within its own transitive dependency graph. So we really want a tree of maps -- each node has a map of the messages in that file, and then it has tree links to its imports. So finding a type could require O(n) time (where n is number of files in transitive graph) instead of O(1), but it wouldn't be subject to conflicts.
Sadly, we can't really do this with extension types since ExtensionRegistry
is a concrete type that is required by APIs we're using.
Note that I wasn't suggesting incrementally building a single shared registry but rather to build a new registry per file descriptor. I understand that could be a perf issue, but since it's only done once per file per cache (and there will likely be a single cache for most programs plus fixed number of files with generated message types) it is likely to be perfectly acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I just realized that our own tests illuminate a case where this won't work.
Consider the following:
ExtensionRegistry registry = ExtensionRegistry.newInstance();
registry.add(isIdent);
TypeRegistry typeRegistry =
TypeRegistry.newBuilder().add(isIdent.getDescriptor().getContainingType()).build();
Config config = Config.newBuilder().setTypeRegistry(typeRegistry).setExtensionRegistry(registry).build();
This obviously works just fine because the TypeRegistry
and the ExtensionRegistry
are synced with each-other. However... if we get rid of the TypeRegistry
, this test can never work!
This is because this test does not actually go across two different Descriptor
types like one might assume. It goes across three, each with their own distinct hashCode()
s.
- The first one is
build.buf.validate.StringRules
. - The second one is
com.example.imports.buf.validate.StringRules
. We have a second copy of the gencode to specifically test this case (dynamic proto of separate gencode.) - The third one is a dynamically-resolved
com.validate.imports.buf.validate.StringRules
- crucially, extensions aren't resolved in nested types. This means e.g.buf.validate.predefined
fields are not resolved inside of field options, which results in a differenthashCode()
everywhere!
Our code already intends to handle this case (dynamic message parsed from potentially distinct file descriptor; we test it for other cases that don't involve user-defined extensions). This puts us in a tricky position, because we really want to reparse the extensions using the extension types, not the types that happen to be attached to the descriptor of a message, which may be dynamic or have a different gencode than whoever registered the extensions.
The only other alternative I can think of to explicitly having the user register a TypeRegistry
is if we enumerate their ExtensionRegistry
to find the types instead, but it seems like it just keeps getting worse the harder you try:
- There's no great way to enumerate all extensions in an
ExtensionRegistry
so what you actually have to do is enumerate the extensions of each possible message that you might want to query extensions for. - There's no guarantee that there's only one
Descriptor
for a given full name stored in theExtensionRegistry
- it will happy return a list with extensions that have many distinctDescriptor
types. We would probably need to throw an exception in this case since there's not much reasonable to do with it.
At this point I'm torn. The test case is indeed a bit convoluted, but it's convoluted on purpose. With the TypeRegistry
it works for fairly obvious reasons. Without it, it doesn't work for reasons that I found very unintuitive.
What would be nice is if we could just have the ExtensionRegistry
key on the full name of the descriptor instead of the descriptor identity, but after looking around for too long I've determined there's nothing we can really do on that front other than things that I'm pretty sure we shouldn't do.
this.evaluatorBuilder = | ||
new EvaluatorBuilder( | ||
env, | ||
config.isDisableLazy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just pass the whole config through, since it seems to need every field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, that's a good idea. I just made the same change for ConstraintCache
as well since there's really no obvious downside to doing this.
if (descriptorMap.containsKey(constraintFieldDesc)) { | ||
return descriptorMap.get(constraintFieldDesc); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use get
and then return if the value is non-null, instead of querying the map 2x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good point. (I didn't actually change this line, it just got relocated, but there's no sense in keeping it like this, since it's not relying on null
values in the map or anything like that.)
src/main/java/build/buf/protovalidate/internal/constraints/ConstraintCache.java
Show resolved
Hide resolved
@@ -165,11 +172,10 @@ private void buildMessage(Descriptor desc, MessageEvaluator msgEval) | |||
try { | |||
DynamicMessage defaultInstance = | |||
DynamicMessage.newBuilder(desc) | |||
.mergeFrom(new byte[0], EXTENSION_REGISTRY) | |||
.mergeFrom(new byte[0], extensionRegistry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this line does nothing. Remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Removed.
Descriptors.Descriptor expectedConstraintMessageDescriptor = | ||
typeRegistry.find(expectedConstraintDescriptor.getMessageType().getFullName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the purpose of this. Why would we need touse the descriptor in a provided type registry instead of the descriptor we already have?
This appears to be the only use of typeRegistry
, so I'm wondering if it's even needed.
Aside: oneofFieldDescriptor
is an awful name. I was so confused reading this, thinking it might be the descriptor for OneofFieldConstraints
. But, IIUC, it really means the descriptor for a field inside the type
oneof on FieldConstraints
, right? Might be nice to rename it... typedConstraintFieldDescriptor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I re-read the PR description and think I understand now: we have to use the provided typeRegistry
in case its version of the message descriptor has a different hash code (so it's queryable from a map inside extensionRegistry
I think).
That is really subtle and I think that explanation deserves to be in the code here, not just in a PR description/commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining it has been added. I agree with you that it definitely can't just live in the PR comment although I put most of the effort into explaining what's going on here over in the setTypeRegistry
documentation.
build.buf.validate.PredefinedConstraints.parseFrom( | ||
((MessageLite) extensionValue).toByteString(), extensionRegistry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PredefinedConstraints
does not have any extensions (or nested extendable message fields), so I think the extensionReigstry
here is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, removed extensionRegistry
here.
this.env = Objects.requireNonNull(env, "env"); | ||
this.constraintCache = Objects.requireNonNull(constraintCache, "constraintCache"); | ||
this.cache = new HashMap<>(previousCache); | ||
this.extensionRegistry = Objects.requireNonNull(extensionRegistry, "extensionRegistry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field appears to be unnecessary. It seems to only be used by calls that parse zero bytes (so never any extensions to actually parse). I assume we do that in order to build a partial message in the off-chance that any message descriptor we have has required fields (in which case the simple case of creating a new empty message would throw).
Note: if I am right and we can remove this, we can also remove from the enclosing type, which only has an extensionRegistry
to pass to this (AFAICT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're actually right. I realized this at some point and never looped back to fix it. (Frankly, I'm rather impressed you picked this one up during a code review. Very nice attention to detail.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I have addressed all of the feedback now. Most of the things I have trivially resolved.
The main sticking point for me is how to handle the TypeRegistry
bits. It would be nice for congruent behavior with other runtimes if we could avoid needing the TypeRegistry
part but I think that it's not trivially possible to avoid it.
@@ -70,4 +73,29 @@ static Map<String, Descriptors.FileDescriptor> parseFileDescriptors( | |||
} | |||
return fileDescriptorMap; | |||
} | |||
|
|||
static TypeRegistry createTypeRegistry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have addressed this for the latter function (as mentioned above it is not needed for TypeRegistry
.)
Descriptors.Descriptor expectedConstraintMessageDescriptor = | ||
typeRegistry.find(expectedConstraintDescriptor.getMessageType().getFullName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining it has been added. I agree with you that it definitely can't just live in the PR comment although I put most of the effort into explaining what's going on here over in the setTypeRegistry
documentation.
/** Config is the configuration for a Validator. */ | ||
public final class Config { | ||
private static final TypeRegistry DEFAULT_TYPE_REGISTRY = TypeRegistry.getEmptyTypeRegistry(); | ||
private static final ExtensionRegistry DEFAULT_EXTENSION_REGISTRY = | ||
ExtensionRegistry.newInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* | ||
* @return if allowing unknown constraint fields is enabled | ||
*/ | ||
public boolean isAllowUnknownFieldsEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with isAllowingUnknownFields
. Personally I prefer allowUnknownFields
but aside from the fact that this conflicts with the variable name I think it is just going against Java convention to have accessors named that way. Y'know, when in Rome...
this.evaluatorBuilder = | ||
new EvaluatorBuilder( | ||
env, | ||
config.isDisableLazy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, that's a good idea. I just made the same change for ConstraintCache
as well since there's really no obvious downside to doing this.
if (descriptorMap.containsKey(constraintFieldDesc)) { | ||
return descriptorMap.get(constraintFieldDesc); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good point. (I didn't actually change this line, it just got relocated, but there's no sense in keeping it like this, since it's not relying on null
values in the map or anything like that.)
build.buf.validate.PredefinedConstraints constraints = getFieldConstraints(constraintFieldDesc); | ||
if (constraints == null) return null; | ||
List<Expression> expressions = Expression.fromConstraints(constraints.getCelList()); | ||
List<CelRule> celRules = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, done.
build.buf.validate.PredefinedConstraints.parseFrom( | ||
((MessageLite) extensionValue).toByteString(), extensionRegistry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, removed extensionRegistry
here.
this.env = Objects.requireNonNull(env, "env"); | ||
this.constraintCache = Objects.requireNonNull(constraintCache, "constraintCache"); | ||
this.cache = new HashMap<>(previousCache); | ||
this.extensionRegistry = Objects.requireNonNull(extensionRegistry, "extensionRegistry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're actually right. I realized this at some point and never looped back to fix it. (Frankly, I'm rather impressed you picked this one up during a code review. Very nice attention to detail.)
@@ -165,11 +172,10 @@ private void buildMessage(Descriptor desc, MessageEvaluator msgEval) | |||
try { | |||
DynamicMessage defaultInstance = | |||
DynamicMessage.newBuilder(desc) | |||
.mergeFrom(new byte[0], EXTENSION_REGISTRY) | |||
.mergeFrom(new byte[0], extensionRegistry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame about needing users to provide type and extension registries for any custom extensions. But the Protobuf runtime's APIs here for Java really make this hard to do otherwise. @jchadwick-buf and I discussed a couple of alternatives that could make it so users don't have to supply anything extra, but they are all much more complicated to implement and test and possibly brittle/risk in some ways, so it seems like maybe we could merge as-is and noodle on it more later.
So this LGTM. Hopefully 🤞 we can find a solution to not need user-provided registries before a v1.0. But if not, c'est la via.
Yeah, I kept wrestling with it a bit more because it really is a shame that we have to expose this complexity, but at the very least we can now say that we did an extraordinary amount of diligence before just accepting it, so if we missed anything obvious it wasn't for lack of trying. Thanks for the very thorough review. This wound up being a challenging patchset to write and it was great to have a very in-depth review, I'm pretty confident in the shape this is in now at least. |
ExtensionRegistry
and aTypeRegistry
for resolving protobuf messages. Ordinarily, only aTypeRegistry
would necessarily be needed. However, we need to be able to resolve extensions defined in file descriptor sets we don't control, which means we need to be able to reparse to and from the user's descriptors in the worst case: to the user's descriptors to get the extended rule message (whose message type descriptors may have a different hashcode and thus may not resolve using just anExtensionRegistry
alone) and back from the user's descriptors in order to parse thepriv
/shared
field.rule
variable.ExtensionRegistry
and aTypeRegistry
. This enables the conformance runner to pass both the old conformance test suite and the new one, regardless of whether the proto descriptors match up.TODO:
This will depend on bufbuild/protovalidate#246.