Processor parameters validator overwrites #1025
Or you could just make a shallow copy. BTW, for workflow validation also have a look at TaskSequence.
I agree – the error should be reported as early as possible. But keep in mind that sometimes only the processor instance itself really knows whether parameters actually work. For example, there might be some semantic constraints (not schema-backed) between parameters, or resource files may have to be loaded (or at least attempted). So early validation is good, but not sufficient – errors can still arise. I don't see the problem with querying the tool JSON from the processors at runtime, though. Caching such tool JSONs can indeed cause inconsistency when processors get re-installed during the server runtime. But that's nothing that should keep us from doing it that way, IMO.
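To illustrate the point about semantic constraints, here is a minimal sketch (all parameter names are made up) of checks that a JSON schema alone cannot express and that only the processor itself can perform:

```python
from pathlib import Path

# Hypothetical semantic checks between parameters that a JSON schema
# cannot express declaratively – only the processor can verify these.
def check_semantics(parameter):
    errors = []
    # a lower bound must not exceed the corresponding upper bound
    if parameter['min_confidence'] > parameter['max_confidence']:
        errors.append('min_confidence must not exceed max_confidence')
    # a resource file must actually be resolvable at runtime
    model = parameter.get('model')
    if model and not Path(model).exists():
        errors.append(f"model file '{model}' not found")
    return errors
```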
Yes, that's also possible and even more efficient.
This is exactly where I looked as an entry point from the CLI of
This said, there are of course drawbacks, such as not being able to validate an input file group if it was not mentioned in a previous step as an output group. This is problematic for
Sure, the idea is to cover as much as possible and fail early where possible – which may never be 100%. At run-time the processor validates the parameters against the tool JSON anyway.
Since we're not trying to cover 100%, this could still be a potential solution for the Processing Server.
Indeed, we should probably rewrite this class to support both modes, online (against a concrete METS) and offline.
True. But still, in offline mode you can check for things like multiple steps trying to write to the same fileGrp, or the overall workflow reading from multiple input fileGrps (which should not be allowed, except in partial workflows). BTW, in the future we might add more granular checks to workflow dependencies. For example, currently processors have requirements on the kind of image features they consume. The same could be done for structural features (e.g. "needs
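A minimal sketch of what such an offline consistency check could look like, assuming a workflow is given as a list of (input fileGrps, output fileGrps) pairs – the data shape and function name are illustrative, not the core API:

```python
# Offline fileGrp consistency check over a workflow, where each step is
# a tuple (input_file_grps, output_file_grps) of sets of fileGrp names.
def check_file_grps(steps):
    errors = []
    produced = set()
    external_inputs = set()
    for inputs, outputs in steps:
        for grp in inputs:
            if grp not in produced:
                # input not produced by any earlier step
                external_inputs.add(grp)
        for grp in outputs:
            if grp in produced:
                errors.append(f"fileGrp '{grp}' written by more than one step")
            produced.add(grp)
    if len(external_inputs) > 1:
        errors.append(f"workflow reads multiple external input fileGrps: {sorted(external_inputs)}")
    return errors

# Example:
# check_file_grps([({'OCR-D-IMG'}, {'OCR-D-BIN'}), ({'OCR-D-BIN'}, {'OCR-D-SEG'})])
```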
The processor parameter validator overwrites the parameters of the passed object with defaults when they are missing. This seems to be the expected (judging from the written tests) and desired behavior, to simplify the CLI. However, this is not ideal when only validation of the passed arguments is required. The simplest (though not obvious to me) solution would be to have a way to disable the default overwriting.
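For illustration, a minimal sketch of the behavior in question, using ParameterValidator from ocrd_validators (exact constructor and report attributes per the core version at the time of writing – treat as a sketch, not a spec):

```python
from ocrd_validators import ParameterValidator

# A tool description with one parameter that has a default:
ocrd_tool = {
    'executable': 'ocrd-dummy',
    'parameters': {
        'dpi': {'type': 'number', 'default': 300},
    },
}
params = {}
report = ParameterValidator(ocrd_tool).validate(params)
print(report.is_valid)  # True
print(params)           # {'dpi': 300} – the default was written back
```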
Context:
I was experimenting with adding more validation to the OtoN converter (currently very outdated in the repo itself) based on the core validators:
I got:
Since this is going to populate the produced Nextflow files with undesired outputs, I have to work around it by deep-copying the parameters and restoring them after the validation (which is not a big deal and doable).
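The workaround, as a sketch (assuming the same ParameterValidator as above): validate a throwaway deep copy so the caller's dict is never mutated.

```python
from copy import deepcopy
from ocrd_validators import ParameterValidator

def validate_only(ocrd_tool, params):
    """Validate without mutating the caller's dict: run the validator
    on a deep copy and discard the (default-filled) copy afterwards."""
    scratch = deepcopy(params)
    return ParameterValidator(ocrd_tool).validate(scratch)
```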
Processing Server #974 could also benefit from having just a validation step of the received input against the ocrd tool schema, without having to increase the size of the OcrdProcessingMessage. Moreover, the Processing Server could right away prevent sending processing messages that would potentially fail inside the Processing Worker due to wrong processor parameters.

Another point: I am also using a simpler approach in the OtoN converter to validate processor parameters against their ocrd tool JSON without having to run them, or even have them installed in the environment, by simply relying on the ocrd_all_tool.json file. IMO, this would potentially also be good for Processing Server #974! The drawback, however, is that ocrd_all_tool.json may not always be the latest version available for all processors and has to be updated with every ocrd_all release. IMO, this is still optimal compared to executing processors to get their tool JSON or even having them installed in the environment.
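A sketch of that offline approach – the layout of ocrd_all_tool.json as a dict keyed by executable name is an assumption here, as is reusing ParameterValidator from ocrd_validators:

```python
import json
from ocrd_validators import ParameterValidator

# Load the aggregate tool descriptions once (assumed layout:
# {"ocrd-some-processor": {...ocrd-tool section...}, ...}).
with open('ocrd_all_tool.json', encoding='utf-8') as f:
    ALL_TOOLS = json.load(f)

def validate_offline(executable, params):
    """Validate params for an executable without running or installing it.
    Validates a copy so the caller's dict is not filled with defaults
    (a shallow copy suffices if defaults are only set at the top level)."""
    ocrd_tool = ALL_TOOLS[executable]
    return ParameterValidator(ocrd_tool).validate(dict(params))

report = validate_offline('ocrd-cis-ocropy-binarize', {'method': 'ocropy'})
print(report.is_valid, report.errors)
```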