validation #8
Open
norpan opened this issue Apr 5, 2017 · 8 comments
Comments

@norpan
Contributor

norpan commented Apr 5, 2017

I've started on a validator, incorporating the tests from https://github.com/json-schema-org/JSON-Schema-Test-Suite, which is probably a good source of tests.

However I realized when using these tests that they have schemas like this:

{"items": [{"type": "integer"}]}

That is, it's clearly a "type": "array" schema, but the type is not written out. That is still a perfectly valid schema, because each validation keyword stands on its own.
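For instance, assuming draft-4 semantics, both [1, 2] and "hello" are valid against {"items": [{"type": "integer"}]}, while ["not an integer"] is not; the items keyword simply does not apply to non-array instances.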

Here is another valid schema:

{"enum": [6, "foo", [], true, {"foo": 12}]}

So, how to handle these?

@jwoudenberg
Contributor

Hey norpan!

With regards to the first schema: if it is clear what should happen, that seems to me a challenge to make it happen in the decoder :). Can we add a fallback to the type decoder that checks for the existence of certain keywords if the type property is not set? Perhaps oneOf can be of help here.
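A minimal sketch of that fallback, with stand-in names (ArraySchema, OtherSchema) rather than this package's actual Schema type and decoders:

import Json.Decode as Decode exposing (Decoder)

type Schema
    = ArraySchema -- stand-in for the real array variant
    | OtherSchema String -- stand-in for the remaining variants

schemaDecoder : Decoder Schema
schemaDecoder =
    Decode.oneOf
        [ -- normal path: an explicit "type" keyword decides the variant
          Decode.field "type" Decode.string
            |> Decode.map OtherSchema

        , -- fallback: no "type", but the presence of "items" implies an array schema
          Decode.field "items" Decode.value
            |> Decode.map (always ArraySchema)
        ]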

With regards to the mixed enum case: although it is a valid schema, I think a well-designed API should have no need for this particular feature. So personally, I'm happy with dropping this one into the fallback Custom Value category we hypothesised about earlier, to prevent the general API from suffering for a less important use case. What are your thoughts?

@norpan
Contributor Author

norpan commented Apr 6, 2017

I'm thinking maybe it can be accommodated, so that you can't create these schemas with the constructor functions, but you can decode them. Otherwise decoding will fail on valid schemas, or they will decode into a general type based on an arbitrary restriction in the data type. Of course, if you want to do #3 with any JSON schema you're out of luck then.

The main reason to handle these is that they are sort of the official test suite for JSON schema so it makes sense to be able to decode and validate them.

@jwoudenberg
Contributor

Yeah, it would be nice if decoding worked for those too. Do you think we can decode them into the fallback schema case? That is, the Schema = ... | Custom Value branch, which can contain any JSON value?
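A minimal sketch of that catch-all, again with stand-in names rather than the package's real constructors; since Json.Decode.value never fails, the last branch always succeeds:

import Json.Decode as Decode exposing (Decoder, Value)

type Schema
    = Typed String -- stand-in for the structured variants
    | Custom Value -- catch-all that keeps the raw JSON

schemaDecoder : Decoder Schema
schemaDecoder =
    Decode.oneOf
        [ Decode.field "type" Decode.string
            |> Decode.map Typed
        , -- anything the structured decoders cannot handle still decodes, into Custom
          Decode.map Custom Decode.value
        ]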

@norpan
Copy link
Contributor Author

norpan commented Apr 7, 2017

Yes, of course we can decode into Custom Value. The problem is that something can be decoded either there or into a specific constructor based on a somewhat arbitrary heuristic, and we still have to handle the Custom Value case in validation and encoding etc., so the question is what the need for the other constructors is.

If you can limit the construction of the generic Schema type, then does it really matter that all validation keywords can be present at the same time?

I'm thinking Schema could be a type like this:

type Schema
    = Schema
        { typ : List Type
        , title : Maybe String
        , description : Maybe String
        , enum : List Value
        , properties : List ObjectProperty
        , items : Maybe Schema
        , minItems : Maybe Int
        , maxItems : Maybe Int
        , minimum : Maybe Float
        , maximum : Maybe Float
        , minLength : Maybe Int
        , maxLength : Maybe Int
        , pattern : Maybe String
        , format : Maybe StringFormat
        , oneOf : List Schema
        , anyOf : List Schema
        , allOf : List Schema
        , custom : Dict String Value -- all other, unknown, keys
        }
type Type
    = Object
    | Array
    | String
    | Integer
    | Number
    | Boolean
    | Null

Each validation keyword stands on its own; pattern only validates if the value is a string and the pattern matches it, for instance. Actually, that was incorrect: pattern also validates values that aren't strings, crazy.
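A minimal sketch of what that keyword-by-keyword check could look like, using elm-lang/core's Regex; the function name is just illustrative:

import Json.Decode as Decode exposing (Value)
import Regex exposing (Regex)

-- "pattern" only constrains strings; everything else passes the keyword
validatePattern : Regex -> Value -> Bool
validatePattern regex value =
    case Decode.decodeValue Decode.string value of
        Ok str ->
            Regex.contains regex str

        Err _ ->
            -- not a string, so the keyword does not apply and validation succeeds
            True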

@jwoudenberg
Contributor

Hey norpan, I appreciate your thoughts here, but this feels like the wrong trade-off to me. I think the big advantage of building this in Elm is having the type system enforce that our schemas make sense. By letting every keyword stand on its own, even though a lot of combinations of keywords don't make sense, we lose most of that advantage. The only thing we win is making it easier to express some schemas that probably shouldn't be written in the first place. That feels like a bad deal to me.

I think the encoding use case actually does work with the Custom Value option. It's a matter of taking that Value and adding it back to the encoding structure.
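A sketch of that encoding path, assuming a Custom variant that stores the raw Value (names are illustrative): encoding it is just handing the Value back.

import Json.Encode as Encode exposing (Value)

type Schema
    = Typed String -- stand-in for the structured variants
    | Custom Value -- raw JSON kept from decoding

encodeSchema : Schema -> Value
encodeSchema schema =
    case schema of
        Typed typeName ->
            Encode.object [ ( "type", Encode.string typeName ) ]

        Custom value ->
            -- the stored Value is already the original JSON, so pass it through
            value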

I agree implementing a validator working on a Value seems like a losing proposition. Personally, I don't feel strongly about supporting validation of these use cases, although I can see that being able to use the entire test suite would be nice. One possibility I see is to turn these incompatible schemas into compatible ones during the decoding process. In the case of the mixed enum example, for instance, that could also be expressed by an anyOf with multiple branches, where each branch contains an enum with instances of one type. But here too, I wonder if investing this much time in something people probably shouldn't be doing in the first place is worth the effort.
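For the mixed enum above, that rewrite would look something like {"anyOf": [{"type": "integer", "enum": [6]}, {"type": "string", "enum": ["foo"]}, {"type": "array", "enum": [[]]}, {"type": "boolean", "enum": [true]}, {"type": "object", "enum": [{"foo": 12}]}]}, with one branch per JSON type that appears in the original enum.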

What are your thoughts here? Are these corner cases important for your project or are we talking solely about being able to pass the full suite? If it's the latter, would it be acceptable to you to fail some of those cases and keep a doc explaining why we made the explicit decision not to support those use cases?

@norpan
Contributor Author

norpan commented Apr 7, 2017

Well yes, but the type system only helps insofar as it limits the schemas you can create with the functions provided, and that limit can be imposed anyway even with the more generic data type. It doesn't limit the decoder (since it can use the Custom Value), and it doesn't limit the encoder, since the encoder only encodes the data type; it does not create it.

It's like other libraries, for instance elm-css or the new bootstrap, where you have a target that accepts a lot (Html) but you want to limit that.

My use case is being able to use schemas from the world outside Elm, creating forms and validating data. It's probably OK to limit to the use cases allowed by the current data type, but I don't like the idea of failing decoding of valid schemas. And I don't like the idea of rewriting that enum case to use anyOf. I'll have to think about how to handle it.

So, the only reasonable options I see are either to change the data type to be more general, or to fail schemas that don't (more or less) exactly match the allowed schemas. But I'm guessing this will lead to some other people making a more general JSON schema package. Not that that's necessarily a bad thing.

I guess from your use case perspective, having a limited data type is best because that will allow you to generate Elm code that can be manipulated as per #3. So I'll think about this. Maybe I will have to make my own version or maybe using this data type will be sufficient.

@jwoudenberg
Contributor

> Well yes, but the type system only helps insofar as it limits the schemas you can create with the functions provided, and that limit can be imposed anyway even with the more generic data type.

Are you sure? How would you set up something like the minLength function to only work in combination with a string Schema, while the title function works with all schemas? Currently, using minLength on a non-string schema will result in a compilation error.
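One possible shape for such an API, purely a sketch of the idea rather than this package's actual implementation: give each schema kind its own property type, so that mixing keywords across kinds is a compile error.

type StringProperty
    = MinLength Int
    | MaxLength Int
    | StringTitle String

type ArrayProperty
    = MinItems Int
    | ArrayTitle String

type Schema
    = StringSchema (List StringProperty)
    | ArraySchema (List ArrayProperty)

minLength : Int -> StringProperty
minLength =
    MinLength

string : List StringProperty -> Schema
string =
    StringSchema

array : List ArrayProperty -> Schema
array =
    ArraySchema

-- string [ minLength 3 ] compiles; array [ minLength 3 ] does not.
-- A shared keyword like title has to be duplicated per kind (StringTitle,
-- ArrayTitle, ...) or needs a fancier encoding, which is part of the trade-off.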

The way I see it could be done is to change the API to use plain functions that simply take all the possible properties as arguments, like textSchema : Maybe Int -> Maybe Int -> ... -> Schema. That would be considerably less nice to work with, though.

> It's like other libraries, for instance elm-css or the new bootstrap, where you have a target that accepts a lot (Html) but you want to limit that.

Elm-css also limits pretty strictly which combinations of properties are allowed. If you use a color property, it can only be given colors, or you get a compilation error. This means that sometimes part of the spec is not supported and that some valid CSS constructs (like a random list of background properties) cannot be written in elm-css. On the flip side, of course, the CSS generated using elm-css has a much higher likelihood of being correct.

> My use case is being able to use schemas from the world outside Elm, creating forms and validating data. It's probably OK to limit to the use cases allowed by the current data type, but I don't like the idea of failing decoding of valid schemas. And I don't like the idea of rewriting that enum case to use anyOf. I'll have to think about how to handle it.

I completely get that. I'm sorry I don't have a better solution here. Just to be clear, I'm not against making model changes in general, but I do feel it should be weighed against the value of the extra functionality, which I believe we both agree in this case is relatively low. Good luck figuring out the next step!

@norpan
Contributor Author

norpan commented Apr 8, 2017

After some thinking I've come to the conclusion that the current limited data type is actually good enough for my purposes, so that's good.

I've also realized that it's not a proper operation to add a "type": "array" to

{ "items": [ { "type": "integer" } ] }

They are not the same thing.
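For example, the string "hello" is valid against { "items": [ { "type": "integer" } ] } but invalid against the same schema with "type": "array" added, so adding the type would change which instances validate.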

So, failing the decoder is probably the right choice here. The error message could be better though. I'll think about how to improve that.

Really appreciate your taking the time to discuss this!

@norpan norpan mentioned this issue Apr 8, 2017