Ambiguity around array inputs? #373
I think they are, yes.
IMO, it means that your process expects at least one array with a maximum of 2 string items, and also supports, for this input, 2 arrays of a maximum of 2 strings.
If the first two examples are equivalent, which would make it easy to "translate" between the two variants, then I'd assume that my third example is equivalent to:

```yaml
inputs:
  example:
    schema:
      type: array
      maxItems: 2
      items:
        type: array
        maxItems: 2
        items:
          type: string
```

But that would not match your description of it.
No, they are not equivalent ... using @m-mohr's example ... The first schema can result in an input like this:
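```yaml
# illustrative values, assuming the first schema is the array-of-strings variant
example: ["a", "b"]
```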
The second schema can result in an input like this:
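```yaml
# illustrative values: an array of arrays
example:
  - ["a", "b"]
  - ["c", "d"]
```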
The "schema" object defines the schema of a single instance of an input value. If the single instance happens to be an array then so be it. The Both the input and output need more explanation and I have it on my todo list to update the specification which I am hoping to do before the code sprint. |
If I recall properly, when there is no `maxOccurs` defined, it defaults to 1. So as you did not have the `maxOccurs` set, a single value is expected.
Also the following is equivalent, in my understanding, to your second example (maxOccurs).
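Perhaps along these lines (a sketch, assuming the second example used `minOccurs: 1` / `maxOccurs: 2` over strings):

```yaml
example:
  schema:
    oneOf:
      - type: string
      - type: array
        minItems: 1
        maxItems: 2
        items:
          type: string
```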
Sorry for saying in the first place that they were equivalent. It seems they are not, completely.
@pvretano How can the second schema (I just added numbers above, to be very explicit here) lead to an array of arrays? The schema itself is just a string type.
@m-mohr as I said, the `schema` object defines a single instance of the input value, while `minOccurs`/`maxOccurs` define how many of those instances can be passed.
@pvretano Sorry, but I don't get it: How can the following schema:
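```yaml
# presumably the minOccurs/maxOccurs variant over a string schema
example:
  minOccurs: 1
  maxOccurs: 2
  schema:
    type: string
```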
allow the following input?
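```yaml
# an array of arrays (illustrative values)
example:
  - ["a", "b"]
  - ["c", "d"]
```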
Where does the inner array come from? I really don't understand it.
@m-mohr sorry I may have gotten the schema numbers mixed up ... let me try again. This schema ...
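```yaml
# schema 1 (illustrative): a JSON Schema array of at most two strings
example:
  schema:
    type: array
    maxItems: 2
    items:
      type: string
```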
leads to inputs like this:
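```yaml
# a single instance that happens to be an array (values illustrative)
example: ["a", "b"]
```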
This schema ...
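```yaml
# schema 2 (illustrative): cardinality expressed with minOccurs/maxOccurs
example:
  minOccurs: 1
  maxOccurs: 2
  schema:
    type: string
```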
leads to inputs like this:
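```yaml
# up to two instances of a string value (values illustrative)
example: ["a", "b"]
```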
So these two are equivalent. But this schema:
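```yaml
# schema 3 (illustrative): minOccurs/maxOccurs combined with an array schema
example:
  minOccurs: 1
  maxOccurs: 2
  schema:
    type: array
    maxItems: 2
    items:
      type: string
```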
leads to inputs like this:
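```yaml
# up to two instances, each itself an array of strings (values illustrative)
example:
  - ["a", "b"]
  - ["c", "d"]
```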
Does this help?
Yes, this is what I expected, thanks. I'm not sure why OAP deviated from JSON Schema and added min/maxOccurs instead of just using the min/maxItems, but if I can just translate from min/maxOccurs to minItems/maxItems with an array type wrapper in JSON Schema, I guess it works for me. So if I spot schema 2, I'll just translate to schema 1 internally.
For this schema:
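```yaml
# presumably the array variant (illustrative)
example:
  schema:
    type: array
    maxItems: 2
    items:
      type: string
```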
There is still something missing: in case the example input takes only one value, it is not passed as an array. So we should add a `oneOf` that also allows a plain (non-array) value. To be complete, we also need to add the JSON object that can be used to pass a reference (href) for an input.
@gfenoy I could be wrong but I don't think that is correct. The single instance of the input is defined as an array, so it will always be an array. If you specify 1 value, you still need to use an array...
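```yaml
# a single value would still be wrapped in an array (illustrative)
example: ["a"]
```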
This is why I need to update the specification to clarify all this ... we have the schemas in the specification but very little discussion about what they imply in relation to encoding an execute request. I hope the others, @jerstlouis, @fmigneault, etc., chime in so we can get consensus about this before I start writing. This issue is also related to the email that I sent to @gfenoy and others about this question. Similar issues arise with the outputs too. I'm working on a PR to try and clarify all of this in the specification that should be ready soon, but I would appreciate the input of others too so that I capture the consensus position.
Playing devil's advocate here, but why not just ditch min/maxOccurs and purely rely on JSON Schema? |
@m-mohr not opposed to that but let's see what the others say. For some reason though, there is something in the back of my mind that says we did this for a reason, but I can't recall why. I will have to dig into my notes again.
@pvretano I would be in favour of this move personally. Are these comments #168 (comment), opengeospatial/ogcapi-routes#17 (comment) related?
That is not entirely true, because if using …
When we discussed this originally for 1.0, it was already a huge step to adopt JSON schema at all (see #122; previously, it used a completely different set of properties to describe inputs and outputs) and the impression at the time was that dropping the separate minOccurs/maxOccurs was a step too far that would complicate things for clients. In hindsight, it probably makes things easier. We were recently discussing this in #363. My proposal for a 1.1 version was to deprecate minOccurs / maxOccurs and encourage use of a schema array type with minItems / maxItems for inputs with multiplicity instead (with the default minOccurs = 1 / maxOccurs = 1 when it is not specified).
I would love to have this explicitly detailed in the specification. Because there was a lot of confusion in the past (as this thread shows) regarding input cardinality vs "single value array", CRIM's implementation evolved into trying to auto-detect the intended cardinality from the schema. Inputs using exclusively a single-value cardinality are resolved with:

```yaml
schema:
  oneOf:
    - <original-schema>
    - type: array
      items: <original-schema>
      minItems: 1
      maxItems: 1
```

And processes expecting multiple values use:

```yaml
schema:
  type: array
  items: <original-schema>
  minItems: 2
```

We need to consider very carefully how to handle cases of multiple nested arrays, so there is no ambiguity in process descriptions whether the I/O represent many single values or a single array value.
@fmigneault it's on my todo list ... both the inputs and outputs are under-described!
This is why I am suggesting to simply deprecate it, as in discourage implementors from deploying new processes that rely on minOccurs / maxOccurs (so they default to 1), and instead use an array schema with minItems / maxItems.
Actually, I don't think we can get rid of `minOccurs`. If we get rid of `minOccurs`, how do we indicate that an input is optional (currently done with `minOccurs: 0`)? Such a change is a breaking change and so we would have to go to V2.0. ... I think. Comments?
@pvretano Having a default value could indicate that a parameter is optional:

```json
"inputs": {
  "myInput": {
    "schema": {
      "type": ["string", "null"],
      "default": null
    }
  }, ...
}
```

or

```json
"inputs": {
  "myInput": {
    "schema": {
      "type": "string",
      "default": ""
    }
  }, ...
}
```
In order to reduce the ambiguity between array as a cardinality specifier and the JSON array container passed as a single value, we could disallow this kind of input for a single value:

```yaml
inputs:
  input_single_value_array: [1,2,3]
```

Instead, a single-value JSON array would have to be explicitly nested under `value`:

```yaml
inputs:
  input_single_value_array:
    value: [1,2,3]
```

And with this, the only case where an array could be directly provided under the input/output ID would be to represent cardinality. The following would always be equivalent and would assume a cardinality of at least 2:

```yaml
inputs:
  input_min_occurs2_short_form: [1, 2, 3]
  input_min_occurs2_long_form:
    - value: 1
    - value: 2
    - value: 3
```

In the above example, each array entry represents a distinct instance of the input.
@fmigneault with my suggestion, an array directly under the input ID would always be a single instance of the input, as described by its schema. Any other use would be deprecated along with maxOccurs. If you want an array with at least two elements, you would use `minItems: 2` in the schema.
It just dawned on me that there is one (very important) aspect of that non-JSON Schema input multiplicity that might benefit from the minOccurs / maxOccurs approach. The JSON Schema only applies to what goes into a direct value or qualified input value (`value`). It does not apply to inputs that are passed by reference (`href`), as collections, or as nested processes; i.e., the JSON schema was intended to represent a "single input" which could be replaced by an online reference, a collection, or a nested process.

To clarify, if the JSON Schema says it's an array of multiple items, then the type of the file at the href location, or what each collection or each process generates, would be multiple things. So if we rely on an array for that purpose, it mixes things up quite a bit.
What's the relation between min/maxOccurs and the "special" types anyway? What happens if I can only accept a single collection but multiple hrefs (e.g. multiple COGs)?
@m-mohr I don't think that would be possible. The hrefs are references to the one thing that the schema describes. So in that particular case, at least for 1.0, I would make maxOccurs unbounded and declare the schema to be binary (e.g. a GeoTIFF).
That sounds like a confusing concept to me. How do I know anyway where I can use these special types (href, collection, process)? How do I know when I can pass one or multiple? It looks like the parameters don't describe it. Is it pure trial & error?
You can always use `href`, for any input (Part 1: Core). For process, you can also use this type anywhere, if the server declares support for nested process execution (Part 3). For collection, you can use this type for any collection input that is collection-compatible, if the server declares support for collection input (Part 3).
This is the topic of this issue, isn't it? Currently in 1.0, it is with `minOccurs` / `maxOccurs`.
No! :)
I don't buy that, sorry. Your answers are somewhat conflicting for me. If you set maxOccurs to unbounded for the example, then it's not clear whether I can pass one URL, multiple URLs, one collection, multiple collections, one value, or multiple values. So it's trial & error.
@m-mohr If the process description has an input with maxOccurs: unbounded, then it is clear that you can pass for that input any of:

(as defined by Part 1: Core)

- one or more inline (or qualified) values, and/or
- one or more `href` references to values

(as defined by Part 3: Workflows & Chaining)

- one or more `collection` inputs, and/or
- one or more nested `process` executions

There is no trial & error. It's all clear.
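For illustration, a sketch of such an execution request (the input name, values, and the assumption that the different kinds can be mixed within one input array are all hypothetical):

```yaml
inputs:
  myInput:                                  # hypothetical input with maxOccurs: unbounded
    - "an inline value"                     # direct value (Part 1)
    - href: https://example.com/data.tif    # by reference (Part 1)
      type: image/tiff; application=geotiff
    - collection: https://example.com/ogcapi/collections/lakes  # collection input (Part 3)
```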
My process only accepts one collection but multiple hrefs. I can't encode that; you proposed to use unbounded maxOccurs. So it's trial & error for my users/clients?! For me that looks like a flaw in the spec.
@m-mohr As I explained above, that is not a supported use case. The collections and processes (and href) are a drop-in replacement for getting one input value. So if you can receive something from multiple GeoTIFFs or from one collection, your server should be able to also easily retrieve one TIFF per collection that is passed? (NOTE: You also need to support base64-encoded values for each TIFF, as crazy as that sounds :P I would complain more about that than about having to support multiple collections.)
I somewhat agree with @m-mohr regarding collections. The usual intent for anyone using a collection input is to obtain its items. If the collection happens to return only one item, it is still going to be an array of a single item. My understanding of a collection was that it essentially represents an array of items.
That is not at all what Part 3 - Collection Input is about. It's about bridging the OGC API data access Standards (Coverages, Features, Tiles, EDR, DGGS, Maps...) with OGC API - Processes, so that an OGC API data source can be the input to a process. It normally represents an infinite set of possible URL requests.
Part 3 extends Part 1 with additional types of inputs which are drop-in replacements for the value or href in Part 1 execution requests. Because collections are drop-in replacements for Part 1 value/href, the cardinality also means you can have one or multiple collections.
A collection input of Part 3 is written as `{ "collection": "https://example.com/ogcapi/collections/foo" }` (URI illustrative). It does not include items, and the collection URI must return a Collection Description as defined in OGC API - Common - Part 2, with one or more links to access mechanisms (/items, /coverage, /tiles, /dggs...).

The server then mints its own URIs to request data as needed, which will take into consideration the area/resolution/time of interest (which also facilitates supporting Collection Output, since area/resolution/time of interest flows in from process-triggering requests also as OGC API data access mechanisms), and the overlap in capabilities in terms of formats and APIs supported by both ends of the hop. What it gets back from those minted URIs is defined by the relevant OGC API data access standards.

A typical example of cardinality applied to Collection Input is our RenderMap process: https://maps.gnosis.earth/ogcapi/processes/RenderMap?f=json

You can provide one or more layers as either embedded values or hrefs of GeoTIFF or GeoJSON, or nested processes, or collections. https://maps.gnosis.earth/ogcapi/processes/RenderMap/execution?response=collection
I see, so a collection input is really a reference to a whole data source exposed through OGC APIs, not an array of items. Given that, I even further agree with @m-mohr.
Maybe now we better understand each other but agree to disagree ;) It's about OGC API collections as first-class objects that can be inputs to processes.
Correct, but the collection with id …
Not all, only at least one. The more APIs (and formats, and CRS, and TileMatrixSets, and DGGRSs...) it supports, the more input collections it would be interoperable with.
That should not be the case if implementations conform to the relevant OGC API standards. I understand you have a different view on the best way to do this and are not a fan of "Collection Input" / "Collection Output" :) But I strongly believe in the great potential value of it.
If the first process in a chain generates an output …
That is extremely optimistic of you. 😄 I like the idea behind collections if they were more controlled. They have potential. At the moment, however, they feel like a …
The idea is that the end-user client sends the whole workflow chain to the process at the top of the workflow, and if executing with `?response=collection`, it immediately gets back a collection description for the resulting (virtual) collection.
Now the client sees whether it supports the API / formats listed in that collection description, and it itself will either be good with it or not. Now that client can actually be any one of the servers for the nested processes in that workflow. Any of the process servers receiving an input collection can also validate the advertised access mechanisms against what it itself supports.

So each client along the chain has a simple single request, gets back a collection description, and is either happy with the OGC APIs / formats / CRS or not. That results in the end-user client either getting a 4xx or a 303 (redirection to the resulting virtual collection).

We need to test this more in actual implementations. I hope there's an opportunity in Testbed 20 to explore this further, and it would be great to work with you guys at CRIM to experiment further on this together as well.
If we have well defined requirements and abstract test suites, leading to good Executable Test Suites, surely there will be a good level of interoperability between implementations of the same standard :) I agree the original Processes - Part 1: Core still had quite a few ambiguities, but the point of this thread is to try to bring clarity and improve on the intended interpretation.

@pvretano To summarize on this aspect of href/collections/processes, if we are to deprecate maxOccurs in 1.1/2.0, what we would need to do is clarify that when substituting an input by an href (or a process or collection in Part 3):

- a single href (or collection or process) stands in for a single instance of the input, and
- when the schema uses an array to express multiplicity, multiple hrefs (or collections or processes) can be supplied to fill that array.
In the current 1.0, the href is a substitute only for the schema itself, without those slightly more complicated special rules, and the separate maxOccurs is used to handle that multiplicity.
Not sure if I really agree with "easily validated" 😅 What I have in mind is the following workflow (whichever representation is used between openEO, CWL, or nested processes): …
When the workflow that encodes … The only situation where I can foresee a …
I agree.
The thing to realize about the validation is that for each hop, the client (or server acting as a client) executes a simple operation that is well defined by an OGC API. Either it:

- executes a process with `?response=collection` and gets back a collection description, or
- accesses the data of a collection through one of the access mechanisms (/items, /coverage, /tiles, /dggs...) advertised in its collection description.
BTW please have a look at https://gitlab.ogc.org/ogc/T19-GDC/-/wikis/OpenEO/OGC-API-Processes-Part-3-comparison (Testbed-19/OGC GitLab registration required)
13-NOV-2023: SWG consensus is to NOT deprecate minOccurs and maxOccurs in this version because there are some input cases that might not be handled clearly if we go to a pure schema approach. We need more implementation experience with the schema approach, so for now we will proceed with minOccurs and maxOccurs in place. @pvretano will update PR #378 accordingly. @jerstlouis also mentioned that we somehow need to emphasize that implementations must be prepared to handle input inline or by-reference. This is what Requirement 18 says! Also @pvretano will look into adding direct links to each requirement!
If I want to provide an array of max. two strings, how am I supposed to do that? Are the variants below equivalent?

(Provided in YAML for simplicity.)
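Presumably, the variants in question were along these lines (reconstructed from the discussion above, so details are illustrative):

```yaml
# variant 1: pure JSON Schema array
example:
  schema:
    type: array
    maxItems: 2
    items:
      type: string
---
# variant 2: cardinality expressed with minOccurs / maxOccurs
example:
  minOccurs: 1
  maxOccurs: 2
  schema:
    type: string
---
# variant 3: both combined
example:
  minOccurs: 1
  maxOccurs: 2
  schema:
    type: array
    maxItems: 2
    items:
      type: string
```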