Enumerations #43
Comments
I believe there was a proposal elsewhere to add a new annotation-only keyword that would sit adjacent to
That way we wouldn't have to change the
I think the problem with
That said, I wouldn't be opposed to adding a string restriction to
I think JSON Schema's
Can you expand on this? I'm leaning more toward just not using
As a similar but simpler case, consider I'm suggesting the same kind of thing for
I'd very much like to avoid people having to define things differently when in a codegen context than when in a validation context. Ideally, people should be able to use the same schemas for both. If we need to provide more context for code generation it should be in the form of annotations that provide additional context to a standard schema.
I think about this differently. JSON Schema describes JSON. JSON doesn't have a concept of enums, so I don't expect enums to necessarily be generated by a code generator. It would be fine if it fit, but it doesn't. For example, it would be fine to generate a
However, I can see how that breaks down if the scope of this project is to also support generating schemas from types. Is that in scope here? Not having a way to represent enums would mean you couldn't generate a schema from a class that uses an enum.
After many years of messing around with code-gen tools and writing one myself, I found two things that I would definitely love to have:
@jdesrosiers you're still coming at this from a JSON-Schema-first approach: trying to fit concepts that exist in JSON Schema into a programming language. What we need to do is the opposite: try to find how to represent known programming language concepts in JSON Schema. Then this vocabulary only needs to support that subset of JSON Schema. The problem with JSON-Schema-first is that we already know there is a multitude of things that JSON Schema can represent that don't make sense in programming languages. Trying to invent schema constructs and then jamming them into a programming language is wrong. Start with the languages. Enumerations are a thing that just about every language does. We need to support that. The question is how we support it.
While this is true, it hasn't stopped people and entire specifications (*cough* OpenAPI) from using it as model definition and code generation. The entire purpose of this vocabulary is to fill that gap. Stating that this isn't what JSON Schema is designed for doesn't help.
Yes, I expect this vocab would define (or at least inform) the interface between JSON Schema and languages, both ways. I would love to see round-trip functionality where you start with a schema or a type, generate the other, then generate back to get the original.
As proposed,
I'm still unconvinced that what you're describing is the right way to approach this. I am very strongly against defining a subset of JSON Schema, or worse, an alternative dialect. I think people should be able to use the same schemas for codegen as they do for validation, and they should not be limited in what validation features they can use because they also want to use the schema for codegen. I think the result of this effort should be a vocabulary of annotations that sits on top of full-featured JSON Schema where the annotations inform the codegen process.
I'm very much not arguing for jamming JSON Schema concepts into static type system features. JSON Schema does a lot of things that don't fit in a static type system. That's ok. I expect generated types to include only the things a type system can express. I don't expect the parts that don't fit to be jerry-rigged in somehow. As an example, by default, I'd expect a code generator that encounters an
However, sometimes we do intend an inheritance-like relationship with an
I would recommend messing around with the most popular OpenAPI code-gen tool (I did that 😄) to realize that they are creating extensions to compensate for the shortcomings of the JSON Schema spec. Honestly, @jdesrosiers's responses are going over my head. Still, there is a need for identities and improved docs so that JSON Schemas can serve as the source of truth and code-gen can properly target most programming languages out there. The same as having some sort of
I've actually just this week reached out to a couple to invite them to this conversation.
We need another issue to discuss this. I'll open one.
In our company's swagger tool, we may use
{
  "type": "number",
  "enum": [1, 2, 3, 4],
  "x-enumeration": [
    { "name": "HEARTS", "data": 1, "description": "♥" },
    { "name": "DIAMONDS", "data": 2, "description": "◇" },
    { "name": "CLUBS", "data": 3, "description": "♧" },
    { "name": "SPADES", "data": 4, "description": "♤" }
  ]
}
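As a rough illustration of how a generator might consume an extension like this, here is a minimal Python sketch. The function name `enum_from_extension` and the type name `Suit` are hypothetical, the `description` fields are omitted for brevity, and the consistency check between `enum` and `x-enumeration` is an assumption, not something the tool above is known to do.

```python
from enum import IntEnum

# The schema from the comment above, with descriptions omitted for brevity.
schema = {
    "type": "number",
    "enum": [1, 2, 3, 4],
    "x-enumeration": [
        {"name": "HEARTS", "data": 1},
        {"name": "DIAMONDS", "data": 2},
        {"name": "CLUBS", "data": 3},
        {"name": "SPADES", "data": 4},
    ],
}


def enum_from_extension(type_name, schema):
    """Build an IntEnum from the (hypothetical) x-enumeration entries."""
    entries = schema["x-enumeration"]
    # Assumed sanity check: the named data values should match `enum` exactly,
    # so validation and code generation describe the same set of values.
    if sorted(entry["data"] for entry in entries) != sorted(schema["enum"]):
        raise ValueError("x-enumeration data does not match enum values")
    return IntEnum(type_name, {entry["name"]: entry["data"] for entry in entries})


Suit = enum_from_extension("Suit", schema)
```

With this, `Suit(3)` looks up the member by its underlying value and `Suit["CLUBS"]` looks it up by name, so the same schema can drive both representations.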
@xiaoxiangmoe thanks for the info. One of the decisions we've recently made is how ad-hoc annotation-only keywords are handled. The new spec will allow any unknown keyword that starts with
Since we're building a vocabulary here, we can't use
That said, using the data content or something similar is likely what we'll go for. The difficulty is trying to align with what languages support so that we can go back and forth between languages and schemas. Imagine generating a schema from TypeScript, then trying to generate C code from the schema.
Related: json-schema-org/json-schema-spec#1386 (deprecating individual enum values)
For my uses of JSON Schema, I don't use
Although I would prefer using
I think JSON Schema already has good descriptive patterns for enums, if you explicitly enforce not actually using the
https://github.com/Crell/enum-comparison
System.Drawing.SystemColors, also flag support allows bitwise operations
The link above has a good summary, grouping these into three categories.
I think for JSON Schema, the primary takeaway is that they are all lists of values. Some languages allow more nuanced and powerful behaviors, but JSON Schema is more concerned with the data aspect than anything. As such, I think the collection of names is the important part here, which all of them support.
The `enum` keyword could work, but it may not be sufficient if underlying values are desired. For example, in C#, an enum can support bitwise operations, but to enable that, it needs to generate a `[Flags]` attribute and set all of the underlying integer values to powers of 2. Then it can also create named bitwise combinations. If just using a list of names, there's no way to describe this intent for proper code generation.
The "descriptive enum" approach using the `anyOf` keyword could work for this because we're just defining names and annotations for those names. However, the subschemas would be required to be uniform, and we'd probably still need another keyword to tell the codegen engine that we're defining an enum.
I recommend a new keyword (e.g. `enumeration`) to support this. It's still an array, but the items must either all be strings, or all be objects with `name` and `data` properties, which give more explicit information for more complex support.
The second case becomes more complicated because of the different support among languages (even just the ones surveyed) for underlying data. Most support integer values, but not all, while some only support integer values. Some support more complex underlying data, while others don't support any underlying data.
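To make the bitwise case concrete, here is a small sketch of a flags-style enum. It uses Python's `IntFlag` as a stand-in for the C# `[Flags]` pattern described above; the `Permissions` type and its member names are hypothetical and only illustrate why the underlying values have to be carried in the schema.

```python
from enum import IntFlag


class Permissions(IntFlag):
    # Each basic member is a distinct power of 2 so that members can be
    # combined with bitwise OR without colliding.
    NONE = 0
    READ = 1
    WRITE = 2
    EXECUTE = 4
    # A named bitwise combination of two basic members.
    READ_WRITE = READ | WRITE


# Combining members at runtime yields a value whose bits record each flag.
combined = Permissions.READ | Permissions.EXECUTE
```

A plain list of names (`NONE`, `READ`, `WRITE`, ...) would not tell a generator that these values must be powers of 2, or that `READ_WRITE` is a combination and not a fifth independent value.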
I think the only resolution to this is that the schema can provide support for more complex needs, and those languages that don't support it can do what they deem appropriate, most likely just creating a list of names.
I also recommend the best practice of generating an "unknown" or "unset" enum value as the default.
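A minimal sketch of the names-only fallback combined with an "unknown" default might look like the following. It assumes the items of the proposed `enumeration` keyword are either plain strings or objects with a `name` property; the helper `build_enum` and the member name `UNKNOWN` are hypothetical choices for illustration.

```python
from enum import Enum


def build_enum(type_name, items):
    """Build a names-only Enum, ignoring any underlying `data` values."""
    # Reserve 0 for the default "unknown" member so that an unset value
    # never collides with a real name.
    members = {"UNKNOWN": 0}
    for position, item in enumerate(items, start=1):
        # Accept either a plain string or an object carrying a `name`.
        name = item if isinstance(item, str) else item["name"]
        members[name] = position
    return Enum(type_name, members)


Color = build_enum("Color", ["RED", "GREEN", "BLUE"])
Suit = build_enum("Suit", [{"name": "HEARTS", "data": 10}, {"name": "SPADES", "data": 20}])
```

Here `Suit.HEARTS.value` is `1`, not `10`: a target that cannot represent the underlying data simply numbers the names in order, which is one possible reading of "do what they deem appropriate".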
In a validation context, the new `enumeration` keyword validates that the instance is either the string value of the item or the string value of the `name` of the item, whichever is defined.
Serialization
Another aspect to consider for enumerations is how values are serialized.
In C# circles there is often a debate as to whether an enum should be serialized by name or by the underlying integer. Historically, by integer is the default, which inevitably leads to someone adding a value in the middle of an enum, thereby changing the numbering for all the values that come after and screwing up deserialization of previously-serialized data.
The proposed solution to this is serializing by name, but that comes with its own risks, like name changes. Once a name is serialized somewhere, you pretty much need to support deserializing that name. As a result, spelling and other errors are forever persisted.
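The renumbering hazard can be reproduced in a few lines. This sketch uses Python, not C#, and hypothetical `SuitV1`/`SuitV2` types standing in for two releases of the same enum; `auto()` assigns sequential integers starting at 1, which mimics implicit numbering.

```python
from enum import Enum, auto


class SuitV1(Enum):
    HEARTS = auto()    # 1
    CLUBS = auto()     # 2
    SPADES = auto()    # 3


class SuitV2(Enum):
    HEARTS = auto()    # 1
    DIAMONDS = auto()  # 2, inserted in the middle
    CLUBS = auto()     # 3 (was 2)
    SPADES = auto()    # 4 (was 3)


# Data written by the old version, once by integer and once by name.
stored_by_value = SuitV1.CLUBS.value
stored_by_name = SuitV1.CLUBS.name

# Reading it back with the new version.
decoded_by_value = SuitV2(stored_by_value)  # silently becomes DIAMONDS
decoded_by_name = SuitV2[stored_by_name]    # still CLUBS
```

Serializing by name survives the insertion, but `SuitV2["CLUBS"]` would raise `KeyError` if the member were later renamed, which is the trade-off described above.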
Do we want to provide guidance on this topic since we're effectively using schemas to define the serialized format?