Iterations not shown explicitly #2
In email with Khalid: You are right that the provenance is incomplete if a processor consumes a list that was created by list iteration, or consumes a single input when the upstream processor produces a list. In the current provenance these values appear to come out of nowhere, PROV-wise.
They would however be members of the list, through prov:hadMember and prov:hadDictionaryMember (and that list does have provenance).
A naive way of fixing this is to look within a whole Taverna provenance graph and find any list members which are not already generated by a processor. They must then (Taverna-wise) have been generated in the processor that generated the list.

A philosophical way to solve this is to say that prov:hadMember is a subproperty of dct:hasPart, so the members were already generated by the activity that generated the list. Note that http://www.w3.org/TR/prov-dc/ does not mention prov:hadMember.

A proper way to fix it is to inspect the workflow structure, detect those places where there is a mismatch, and assign generation of those lists to an activity which has the corresponding element of the iteration strategy as its prov:hadPlan, e.g. a Taverna cross product.

Doing this properly would also mean that invisible lists can be captured, e.g. when there is an iteration over values from a previous iteration and no processor or workflow port consumes the list. These lists never actually do anything within Taverna, but without them we no longer have the index of the value within the iteration.

So the processor iterations themselves should ideally also be represented, somehow grouping the processor activities (processor runs) with their index. Janus just propagated the index string naively, like "[1,12,4]". This required some nasty parsing to get anything sensible out of it, even if just to sort invocations. I am not sure if this should be done as a new activity, or perhaps just as a more specific plan that the Agent (e.g. the workflow engine) is also using when executing the processor plan.
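As a rough illustration of the naive fix and of the index parsing mentioned above, here is a minimal Python sketch. It assumes the provenance graph has already been flattened into plain (subject, predicate, object) tuples with prefixed names as strings; the function names `infer_member_generation` and `parse_index` are hypothetical and not part of Taverna or Janus. It only follows prov:hadMember, since prov:hadDictionaryMember goes through an intermediate key-entity pair and would need an extra step.

```python
import json


def infer_member_generation(triples):
    """Naive fix: attribute ungenerated list members to the list's generator.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    This flat representation is an assumption made for illustration.
    """
    triples = list(triples)
    # entity -> activity, for every entity that already has a generation
    generated_by = {s: o for s, p, o in triples if p == "prov:wasGeneratedBy"}
    # (list, member) pairs
    members = [(s, o) for s, p, o in triples if p == "prov:hadMember"]

    inferred = []
    changed = True
    # Repeat until nothing new is found, so that members of nested lists
    # can inherit a generation that was itself inferred in an earlier pass.
    while changed:
        changed = False
        for lst, member in members:
            if member not in generated_by and lst in generated_by:
                generated_by[member] = generated_by[lst]
                inferred.append((member, "prov:wasGeneratedBy", generated_by[lst]))
                changed = True
    return inferred


def parse_index(index):
    """Turn a Janus-style index string such as "[1,12,4]" into a tuple of ints."""
    return tuple(json.loads(index))


# Hypothetical example data: list1 was generated by run1, but its member
# "a" has no generation of its own, while "b" was generated by run2.
example = [
    ("list1", "prov:wasGeneratedBy", "run1"),
    ("list1", "prov:hadMember", "a"),
    ("list1", "prov:hadMember", "b"),
    ("b", "prov:wasGeneratedBy", "run2"),
]
inferred_example = infer_member_generation(example)

# Sorting by the parsed tuple orders invocations numerically,
# which a plain string sort of the index would not do.
sorted_indexes = sorted(["[1,12,4]", "[1,2,0]"], key=parse_index)
```

Keeping the index as a tuple of integers rather than a string is one way to avoid the sorting problem, although it does not by itself say how the index should be expressed in PROV.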
Taverna's list handling/iterations are implicit, but should be explicitly listed in the PROV output, to avoid list content appearing out of nowhere.