
Iterations not shown explicitly #2

Open
stain opened this issue Jul 20, 2012 · 1 comment

stain (Member) commented Jul 20, 2012

Taverna's list handling and iterations are implicit, but they should be listed explicitly in the PROV output, so that list contents do not appear to come out of nowhere.

ghost assigned stain Sep 7, 2012
stain (Member, Author) commented Dec 10, 2013

In email with Khalid:

You are right that the provenance is incomplete when a processor consumes a list that was created by list iteration, or consumes a single value when the upstream processor produces a list.

In the current provenance output, these values appear, PROV-wise, to come out of nowhere:

```turtle
<http://ns.taverna.org.uk/2011/data/1fc12b45-ff08-4364-b964-b732874399fc/ref/52867c2b-da01-4484-830d-1af6cc7cec8a>
    tavernaprov:content  <outputs/Flatten_List_outputlist/3.txt> ;
    rdf:type             wfprov:Artifact ;
    rdf:type             prov:Entity .
```

However, they are members of a list through prov:hadMember and prov:hadDictionaryMember, and that list does have provenance:

```turtle
<http://ns.taverna.org.uk/2011/data/1fc12b45-ff08-4364-b964-b732874399fc/list/443a7f0f-e081-4085-9b0e-a9f08385f82b/false/1>
    rdf:type                     prov:Entity ;
    wfprov:wasOutputFrom         <http://ns.taverna.org.uk/2011/run/1fc12b45-ff08-4364-b964-b732874399fc/> ;
    prov:wasGeneratedBy          <http://ns.taverna.org.uk/2011/run/1fc12b45-ff08-4364-b964-b732874399fc/process/47f79978-4dce-4264-9869-aa87552dff5d/> ;
    prov:hadMember               <http://ns.taverna.org.uk/2011/data/1fc12b45-ff08-4364-b964-b732874399fc/ref/aa88bb43-14b1-4aac-b4a9-3ebddc65790b> ;
    prov:hadDictionaryMember     _:b386 ;
    prov:hadMember               <http://ns.taverna.org.uk/2011/data/1fc12b45-ff08-4364-b964-b732874399fc/ref/52867c2b-da01-4484-830d-1af6cc7cec8a> .

_:b386  prov:pairKey     "3"^^xsd:long ;
    prov:pairEntity  <http://ns.taverna.org.uk/2011/data/1fc12b45-ff08-4364-b964-b732874399fc/ref/52867c2b-da01-4484-830d-1af6cc7cec8a> ;
    rdf:type         prov:KeyEntityPair .
```

A naive way to fix this is to search the whole Taverna provenance graph for list members that were not generated by any processor; Taverna-wise, these must have been generated by the processor that generated the list.
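A minimal sketch of that naive fix, assuming the provenance graph has already been loaded as plain (subject, predicate, object) string triples with prefixed predicates. It only looks at prov:hadMember and prov:wasGeneratedBy, and it repeats until nothing new is inferred, so members of nested lists are covered too. The short identifiers in the usage part are stand-ins for the full URIs above, and `infer_member_generation` is an illustrative name, not part of any Taverna tool.

```python
from typing import Dict, Iterable, Set, Tuple

# (subject, predicate, object) using prefixed names, e.g. "prov:hadMember"
Triple = Tuple[str, str, str]

HAD_MEMBER = "prov:hadMember"
WAS_GENERATED_BY = "prov:wasGeneratedBy"


def infer_member_generation(triples: Iterable[Triple]) -> Set[Triple]:
    """Return new prov:wasGeneratedBy triples for list members that have none,
    attributing them to whichever activity generated the containing list."""
    known: Set[Triple] = set(triples)
    inferred: Set[Triple] = set()
    while True:
        # Which activities generated each entity, as far as we know so far.
        generators: Dict[str, Set[str]] = {}
        for s, p, o in known:
            if p == WAS_GENERATED_BY:
                generators.setdefault(s, set()).add(o)
        new: Set[Triple] = set()
        for lst, p, member in known:
            # Only members that no processor has been recorded as generating.
            if p == HAD_MEMBER and member not in generators:
                for activity in generators.get(lst, set()):
                    new.add((member, WAS_GENERATED_BY, activity))
        if not new:
            return inferred
        # Newly attributed lists may themselves contain ungenerated members,
        # so loop again with the new triples included.
        inferred |= new
        known |= new


if __name__ == "__main__":
    graph = {
        ("list-443a7f0f", WAS_GENERATED_BY, "process-47f79978"),
        ("list-443a7f0f", HAD_MEMBER, "ref-aa88bb43"),
        ("list-443a7f0f", HAD_MEMBER, "ref-52867c2b"),
        ("ref-aa88bb43", WAS_GENERATED_BY, "process-other"),  # hypothetical
    }
    for triple in sorted(infer_member_generation(graph)):
        print(triple)
    # prints ('ref-52867c2b', 'prov:wasGeneratedBy', 'process-47f79978')
```

The sketch is only as good as the assumption it encodes: any member that lacks prov:wasGeneratedBy for some other reason would also be attributed to the list's activity.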

A philosophical way to solve this is to declare that prov:hadMember is a subproperty of dct:hasPart, so the members were already generated by the activity that generated the list. Note, however, that http://www.w3.org/TR/prov-dc/ does not mention prov:hadMember.

A proper way to fix it is to inspect the workflow structure, detect the places where the list depth arriving at a port does not match the depth that port expects, and assign generation of those lists to an activity whose prov:qualifiedAssociation has the corresponding element of the iteration strategy (e.g. a Taverna cross product) as its prov:hadPlan.
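A rough sketch of the depth check and the extra PROV it could emit. `PortBinding`, the depth values, and the URI scheme for the iteration activity are all hypothetical, not Taverna's own model; only the PROV-O terms (prov:Activity, prov:qualifiedAssociation, prov:Association, prov:agent, prov:hadPlan, prov:wasGeneratedBy) are real. Since prov:hadPlan is a property of a qualified prov:Association rather than of the activity itself, the plan is attached through one.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class PortBinding:
    """Hypothetical view of one processor input port and the data on its link."""
    processor: str
    port: str
    declared_depth: int  # list depth the port says it accepts
    actual_depth: int    # list depth of the value actually arriving


def depth_mismatch(binding: PortBinding) -> int:
    """> 0: value is deeper than declared, so the processor is iterated over it;
    < 0: value is shallower than declared; 0: no implicit list handling."""
    return binding.actual_depth - binding.declared_depth


def iteration_provenance(processor_run: str, output_list: str,
                         plan: str, engine: str) -> List[Triple]:
    """PROV-O triples making an implicit iteration explicit: a separate activity,
    associated with the engine under the iteration-strategy plan, is recorded
    as the generator of the collected output list."""
    activity = processor_run + "iteration"    # hypothetical URI scheme
    association = activity + "#association"   # hypothetical URI scheme
    return [
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:qualifiedAssociation", association),
        (association, "rdf:type", "prov:Association"),
        (association, "prov:agent", engine),
        (association, "prov:hadPlan", plan),
        (output_list, "prov:wasGeneratedBy", activity),
    ]


if __name__ == "__main__":
    # Made-up depths for illustration only.
    bindings = [
        PortBinding("Flatten_List", "inputlist", declared_depth=2, actual_depth=3),
        PortBinding("Concatenate", "string1", declared_depth=0, actual_depth=0),
    ]
    for b in bindings:
        if depth_mismatch(b) > 0:
            for t in iteration_provenance("run/process/flatten/", "list/flatten-out",
                                          "plan/flatten-crossproduct", "agent/engine"):
                print(t)
```

Keeping the iteration as its own activity leaves the individual processor runs untouched, which fits the open question further down about whether iterations deserve a new activity or just a more specific plan.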

Doing this properly would also let us capture invisible lists, e.g. when a processor iterates over values from a previous iteration and no processor or workflow port consumes the resulting list. These lists never do anything within Taverna, but without them we lose the index of each value within the iteration.
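A small illustration of why those otherwise unused lists matter, assuming a processor `f` is iterated over the cross product of two input lists. The nested result list is where each output's index lives, and walking it recovers the index path of every value. Indices are 0-based here, which is an assumption of this sketch, and `cross_product` and `index_paths` are illustrative names, not Taverna APIs.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple

Index = Tuple[int, ...]


def cross_product(f: Callable[[Any, Any], Any], xs: Sequence[Any],
                  ys: Sequence[Any]) -> Tuple[Dict[Index, Any], List[List[Any]]]:
    """Invoke f once per (x, y) pair. Returns the outputs keyed by iteration
    index and the nested output list the engine would assemble implicitly."""
    outputs: Dict[Index, Any] = {}
    for (i, x), (j, y) in product(enumerate(xs), enumerate(ys)):
        outputs[(i, j)] = f(x, y)
    nested = [[outputs[(i, j)] for j in range(len(ys))] for i in range(len(xs))]
    return outputs, nested


def index_paths(value: Any, path: Index = ()) -> Iterator[Tuple[Index, Any]]:
    """Yield (index path, leaf) for every non-list value in a nested list."""
    if not isinstance(value, list):
        yield path, value
        return
    for k, item in enumerate(value):
        yield from index_paths(item, path + (k,))


if __name__ == "__main__":
    outputs, nested = cross_product(lambda x, y: x + y, ["a", "b"], ["1", "2", "3"])
    print(nested)                                # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
    print(dict(index_paths(nested)) == outputs)  # True
    # A downstream processor iterating over the previous iteration's values:
    # its collected list may never be consumed, yet it still carries (i, j).
    second = [[v.upper() for v in row] for row in nested]
    print(dict(index_paths(second))[(1, 2)])     # B3
```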

So the processor iterations themselves should ideally also be represented, grouping each processor activity (processor run) with its iteration index. Janus simply propagated the index as a string, like "[1,12,4]", which required some nasty parsing to get anything sensible out of it, even just to sort invocations. I am not sure whether this should be modelled as a new activity, or just as a more specific plan that the Agent (e.g. the workflow engine) also uses when executing the processor plan.
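To make the sorting problem concrete: a string index such as "[1,12,4]" sorts lexicographically, so it lands before "[1,2,4]". A minimal sketch, assuming the Janus-style strings are JSON-compatible integer lists: parse once into a tuple, which sorts numerically, and emit the index as a structured RDF collection instead of a string. The `ex:iterationIndex` property is hypothetical and not part of PROV-O.

```python
import json
from typing import Tuple

Index = Tuple[int, ...]


def parse_janus_index(text: str) -> Index:
    """Parse a Janus-style index string such as "[1,12,4]" into a tuple."""
    values = json.loads(text)
    # Reject anything that is not a flat list of plain integers (bools excluded).
    if not isinstance(values, list) or not all(type(v) is int for v in values):
        raise ValueError(f"not an integer index list: {text!r}")
    return tuple(values)


def index_as_turtle(invocation_uri: str, index: Index) -> str:
    """Render the index as a Turtle collection on a hypothetical property."""
    items = " ".join(str(i) for i in index)
    return f"<{invocation_uri}> ex:iterationIndex ( {items} ) ."


if __name__ == "__main__":
    janus = ["[1,12,4]", "[1,2,4]", "[0,3,1]"]
    print(sorted(janus))                         # ['[0,3,1]', '[1,12,4]', '[1,2,4]']
    print(sorted(janus, key=parse_janus_index))  # ['[0,3,1]', '[1,2,4]', '[1,12,4]']
    print(index_as_turtle("http://example.org/invocation/7", (1, 12, 4)))
    # <http://example.org/invocation/7> ex:iterationIndex ( 1 12 4 ) .
```

Whether the index ends up on a new activity or on a more specific plan, storing it as structured integers avoids the parsing step entirely.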
