Process to run EO Application Packages (CWL) #507
Comments
+1, we will probably start implementation work on this in 2024 (I hope). The other difficult thing is how data goes in and out. STAC is for sure the solution, but it needs constraints to be usable.
That sounds pretty reasonable. The return value should probably be a data cube (or the new stac subtype, see #485). Here's a reference to an old PR, which had similar aims and has some discussion already: #332
That's not a thing in openEO, primarily because not all programming languages have a construct such as kwargs in Python.
I was just wondering whether CWL could just be another UDF runtime and whether we could use run_udf? @clausmichele
Maybe @jzvolensky can help, he's our OGC AP expert. I guess in this case we can't pass a single code block which contains everything (the definition and the input parameters) to run an AP?
@clausmichele I am not sure how that would work with the ADES. Since the CWL processes are stored in the ADES, I suppose they could be read in a UDF; you would then provide the input parameters in the UDF and send the processing request to the ADES? Maybe this is something we can look at/think about.
What is ADES in our context here? I assumed that you'd specify a CWL file and that no prior interaction took place to store the CWL.
Sorry, ADES is the Application Deployment and Execution Service from the EOEPCA project. Basically a CWL execution engine which also supports managing CWLs (deploy, undeploy etc.). Our idea is to plug this into openEO so that we can execute Application Packages with a process, or possibly a UDF. In this way we can have a set of predefined processes available to the user, or possibly allow the user to provide their own.
The specification should be independent of the implementation. So ADES might be a data point, but we should probably focus on the underlying specification (i.e. OGC API - Processes - Part 2/3). Plugging that in makes sense, but in the end a CWL could also be just a specific "language" to express UDFs in, similar to Python or R.
Hello, so I looked at the |
Yeah. If we are reusing run_udf instead of a new process, it could look as follows in a process graph:
{
  process_id: "run_udf",
  arguments: {
    udf: "... CWL as YAML or URL or string ...",
    runtime: "cwl",
    version: "1.2", // could be omitted as it's the default version, see below
    context: {
      cwl_param1: true,
      cwl_param2: 99
    }
  }
}
While
{
  title: "EO Application Packages (CWL)",
  type: "language",
  default: "1.2",
  versions: {
    "1.2": {
      libraries: { ... } // not sure about this entry. I guess it could be pre-loaded docker images or so?
    }
  }
}
It's just an idea that doesn't need an explicit process. If people think it would make sense to have a separate process, we can also discuss that. But right now I don't see an explicit reason why that might be better. Please let me know if you have any reasons in mind. Somewhat related issue: #515
Also, run_udf is usually meant to be executed in datacube processes such as reduce_dimension. This would not be the case for EO Application Packages I guess, which is somewhat against the best practice of UDFs. It's somewhat unclear how a mapping between the EO Application Packages and the openEO data types can be achieved and communicated to users. Related process: run_udf_externally
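As a rough sketch of the run_udf idea above, the node could be assembled in Python and serialized to valid JSON as shown here. The runtime name "cwl", the helper name build_cwl_udf_node, the node key "cwl1" and the example.com URL are assumptions for illustration only; no back-end is known to expose such a runtime. Like the snippet above, it passes no data argument.

```python
import json


def build_cwl_udf_node(cwl, params, version=None):
    """Build a run_udf node that targets a hypothetical "cwl" UDF runtime."""
    arguments = {
        "udf": cwl,                # CWL as YAML text or as a URL
        "runtime": "cwl",          # assumed runtime name, taken from the comment above
        "context": dict(params),   # CWL input parameters
    }
    if version is not None:
        # Omitted by default so the back-end could fall back to its default version.
        arguments["version"] = version
    return {"process_id": "run_udf", "arguments": arguments}


# A process graph with a single node; "result": True marks it as the result node.
cwl_udf_graph = {
    "cwl1": {
        **build_cwl_udf_node(
            "https://example.com/workflow.cwl",  # placeholder URL
            {"cwl_param1": True, "cwl_param2": 99},
        ),
        "result": True,
    }
}

print(json.dumps(cwl_udf_graph, indent=2))
```

Leaving version out by default mirrors the remark that it could be omitted when the runtime declares a default.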
run_ogc_application_package
Okay, the first part looks really neat with defining the workflow and inputs. in the second The last paragraph is interesting. I mean, the Application Packages are fully standalone applications, right? From this point of view a new process makes sense, because the application and its execution are outside of the traditional process graph scope. All that we do is bind it together with the rest of the openEO process chain using a process graph (however, in theory we don't need any other process to use it, so it really can be a standalone process). I do like the UDF idea, and if UDFs can support this with some minor best practices update or a general UDF use case extension, then that is good, I suppose.
Notes from the meeting today: Input/Output in CWL:
Ways of interacting with CWL in openEO:
The open questions to OGC have been posted here: opengeospatial/ogcapi-processes#428
See PR #520 for a proposal, please discuss further issues in the PR.
run_ogc_application_package
Context
For the InterTwin project (and soon others), we would like to run an OGC Application Package inside an openEO process graph. The documentation for OGC Application Package is here: https://docs.ogc.org/bp/20-089r1.html
We see it as a process similar to run_udf.
Summary
Description
Parameters
data
Optional: yes
Description
The data to be passed to the OGC Application Package execution engine. Optional since the input data could be already defined in the CWL file and therefore it wouldn't need any other inputs.
Data Type
Datacube
cwl
Optional: no
Description
Currently it's a YAML file. Either we pass it as plain text/string like for UDFs, or we pass a URL to it and the back-end loads it.
The schema could be the same as for the udf parameter of run_udf, with some changes.
Data Type
string
cwl_params
Optional: no
Description
It's either a YAML or JSON file. Again, it could be passed in the same ways described for the previous one.
Data Type
string
Return Value
Description
The result should be made available as a STAC object, serialized as a JSON string. This way the back-end can continue the process graph using load_stac.
Data Type
string
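To make the parameter table above more concrete, here is a minimal Python sketch of how a node for the proposed process might be built. The process run_ogc_application_package is only a proposal in this issue, and the helper name, the example.com URL, the "threshold" input and the "load1" node reference are hypothetical. Only the {"from_node": ...} reference syntax is regular openEO process graph notation.

```python
import json


def build_application_package_node(cwl, cwl_params, data=None):
    """Build a node for the proposed (not yet specified) process."""
    # cwl and cwl_params are listed as non-optional strings in the proposal.
    for name, value in (("cwl", cwl), ("cwl_params", cwl_params)):
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{name}' must be a non-empty string")
    arguments = {"cwl": cwl, "cwl_params": cwl_params}
    if data is not None:
        # data is optional: the CWL file may already define its own inputs.
        arguments["data"] = data
    return {"process_id": "run_ogc_application_package", "arguments": arguments}


# cwl_params passed as a JSON string; "threshold" is a made-up CWL input.
app_package_node = build_application_package_node(
    cwl="https://example.com/workflow.cwl",      # placeholder URL
    cwl_params=json.dumps({"threshold": 0.5}),
    data={"from_node": "load1"},                  # reference to an earlier datacube node
)

print(json.dumps(app_package_node, indent=2))
```

The required/optional checks follow the table above; how the returned STAC JSON string would be handed to load_stac is left open here, as it is in the proposal.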
Links to additional resources (optional)
Examples
Currently in development:
interTwin-eu/HyDroForM: Hydrological Drought Forecasting Model with HydroMT and Wflow (github.com)
OR something like this:
(very experimental, uses sapporo service: sapporo-wes/sapporo-service: A standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification. (github.com) )
I'm cc'ing the people from Eurac working on this: @jzvolensky @iacopoff @aljacob
And I am aware VITO is also interested: @jdries @soxofaan
EODC @christophreimer