Introduction
The main goal of this showcase is to express the workflow run traces of a scientific experiment according to the RO specification, and to upload some of the workflows already available on other platforms (mainly Taverna and Wings) in order to populate the ROs at Wf4Ever.
A secondary goal of this showcase is to identify possible uses of workflow run provenance for the integrity and authenticity functionalities of the Wf4Ever project. The Jira number associated with this showcase is: Jira-204
The general overview of the process is:
- Taverna -> PROV-O -> wfprov (main focus)
- Wings -> OPMW -> wfprov (main focus)
Identified sub-goals or sub-tasks:
- Conversion from Taverna workflow to prov-o
- Agent that recognizes prov-o provenance in RODL
- Agent to convert prov-o to wfprov
- Agent to upload wfprov to RODL
- Alignment between prov-o and wfprov [1]
- Taverna to embed checksums in prov-o
- Taverna to export provenance and workflow outputs in one go (solving the problem related to IDs pointed out by Daniel)
- Identification of uses of provenance of a RO for integrity and authenticity
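The PROV-O to wfprov conversion step listed above can be sketched as a simple term rewrite over triples. This is only a minimal illustration: the term pairs in `CLASS_MAP` and `PREDICATE_MAP` are an assumed, simplified mapping and not the agreed alignment, the namespace URIs are assumptions, and the function name `prov_to_wfprov` is hypothetical. In particular, it naively treats every activity in the trace as a process run.

```python
# Assumed namespace URIs for the vocabularies involved.
PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative mapping only (an assumption, not the showcase's alignment).
CLASS_MAP = {
    PROV + "Activity": WFPROV + "ProcessRun",
    PROV + "Entity": WFPROV + "Artifact",
}
PREDICATE_MAP = {
    PROV + "used": WFPROV + "usedInput",
    PROV + "wasGeneratedBy": WFPROV + "wasOutputFrom",
}


def prov_to_wfprov(triples):
    """Rewrite (subject, predicate, object) triples from PROV-O to wfprov terms.

    Terms without an entry in the mapping are passed through unchanged.
    """
    converted = []
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            # Class membership: rewrite the class in object position.
            converted.append((subject, predicate, CLASS_MAP.get(obj, obj)))
        else:
            # Any other statement: rewrite the predicate if it is mapped.
            converted.append((subject, PREDICATE_MAP.get(predicate, predicate), obj))
    return converted


# Hypothetical trace: one step that reads an input and produces an output.
trace = [
    ("ex:step1", RDF_TYPE, PROV + "Activity"),
    ("ex:step1", PROV + "used", "ex:input1"),
    ("ex:output1", PROV + "wasGeneratedBy", "ex:step1"),
]
wfprov_trace = prov_to_wfprov(trace)
```

Keeping the mapping in plain dictionaries means the rewrite logic does not have to change when the alignment is revised; only the tables do.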
Example of use case
As a scientist I want to use a workflow in the future, but by then it may not be as healthy as it is supposed to be. This can be because some parts of the workflow no longer work and were not curated, because the workflow has been upgraded and the results are no longer the same, because it is executed on a different machine, etc. Depending on which of these things happened, the RO, and more specifically the workflow, will be reproducible, repeatable, replayable, reusable, etc., or none of those.
This example involves the following functionalities: curation of services (using the workflow template), reproducibility (using the instantiation and the provenance of the workflow results to check that it performs in the same way as when it was stored), adaptability (being able to replace some parts), etc.
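The reproducibility check described above could, for instance, compare checksums recorded in the stored provenance against the outputs of a new run. The following is a minimal sketch under stated assumptions: the use of SHA-256, the function names `checksum` and `compare_runs`, and the dictionary layout are all hypothetical and not taken from Taverna or the RO model.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of an output value."""
    return hashlib.sha256(data).hexdigest()


def compare_runs(stored: Dict[str, str], rerun: Dict[str, bytes]) -> Dict[str, str]:
    """Compare stored output checksums with the outputs of a new run.

    `stored` maps output names to checksums taken from the provenance.
    `rerun` maps output names to the raw bytes produced by the new run.
    Each stored output is reported as 'same', 'changed' or 'missing'.
    """
    report = {}
    for name, expected in stored.items():
        if name not in rerun:
            report[name] = "missing"
        elif checksum(rerun[name]) == expected:
            report[name] = "same"
        else:
            report[name] = "changed"
    return report


# Hypothetical example: one output is unchanged, one differs, one is gone.
stored_checksums = {
    "alignment": checksum(b"ACGT"),
    "plot": checksum(b"old image bytes"),
    "summary": checksum(b"42 hits"),
}
new_outputs = {"alignment": b"ACGT", "plot": b"new image bytes"}
report = compare_runs(stored_checksums, new_outputs)
```

A byte-level comparison like this only covers the strict case where results must be identical; deciding whether differing results are still equivalent would need a domain-specific comparison.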
Dependencies
This showcase does not depend on any other.
The following showcases depend on this one:
- showcase 3: Mining Decay from MyExp
- showcase 8: Implement Khalid's service substitution
Workflow Terminology
The following terminology has been agreed by all the participants of the showcase and can be considered stable.
- Workflow template: Description of the dependencies and the steps of the workflow, the type of each step, etc. No inputs are bound. Modeled with wfdesc. Types of templates:
  - Abstract workflow template: Workflow template where some or all of the steps of the workflow are not bound to a specific implementation.
  - Concrete workflow template: Workflow template which has all steps instantiated with a service, script or tool.
- Workflow instance (or execution-ready workflow): Concrete workflow template with the inputs bound to data and the steps bound to services. The workflow instance is what is sent to the engine to be run. It could be modeled with wfdesc + wfprov (wfdesc describes the workflow template part and wfprov is used for the assignment of inputs to the first step of the workflow).
- Workflow run/Workflow Execution: The activity of running a workflow instance. The purpose of the workflow run is to run the workflow and get the final results. The workflow run generates the provenance of the workflow results (see below). A Workflow Run is equivalent to a Workflow Execution, but it has been decided to use Workflow Run for consensus. Modeled with wfprov.
- Provenance of the workflow results: provenance of the run of the workflow. It records some or all of the actions which occur during the workflow run. It can include inputs, intermediate results, generated outputs and how they relate to each other, who activated the run, on which system it was executed, when it started and ended, etc. Modeled with wfprov.
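The distinctions above can be illustrated with a small data model. This is only a sketch: the class and field names are hypothetical and do not correspond to wfdesc or wfprov terms, and the example values (step names, URLs, file names) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    name: str
    # Service, script or tool implementing the step; None means "not bound".
    implementation: Optional[str] = None


@dataclass
class WorkflowTemplate:
    steps: List[Step]

    def is_concrete(self) -> bool:
        """A template is concrete when every step is bound to an implementation."""
        return all(step.implementation is not None for step in self.steps)


@dataclass
class WorkflowInstance:
    """A concrete template with its inputs bound to data."""
    template: WorkflowTemplate
    inputs: Dict[str, str]

    def __post_init__(self) -> None:
        if not self.template.is_concrete():
            raise ValueError("a workflow instance needs a concrete template")


@dataclass
class WorkflowRun:
    """The activity of running an instance; captures inputs and final outputs."""
    instance: WorkflowInstance
    outputs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Provenance:
    """Record of a run that additionally captures intermediate results."""
    run: WorkflowRun
    intermediate_results: Dict[str, str] = field(default_factory=dict)


# Hypothetical example: the abstract template leaves the "align" step unbound.
abstract = WorkflowTemplate([Step("align"), Step("plot", "plot.py")])
concrete = WorkflowTemplate(
    [Step("align", "http://example.org/align"), Step("plot", "plot.py")]
)
instance = WorkflowInstance(concrete, {"sequences": "seqs.fasta"})
run = WorkflowRun(instance, outputs={"figure": "figure.png"})
```

Trying to build a `WorkflowInstance` from `abstract` raises a `ValueError`, which mirrors the rule that only a concrete template can be made execution-ready.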
A summary of the terminology can be found in the next table:
| Term | Abstract workflow template | Concrete workflow template | Workflow instance | Workflow run | Provenance of workflow results |
|---|---|---|---|---|---|
| Workflow Input | No inputs bound | No inputs bound | Inputs bound and captured | Inputs bound and captured | Inputs bound and captured |
| Workflow Steps | Some steps are not bound to an implementation | All steps are bound to an implementation | All steps are bound to an implementation | Intermediate steps exist, but are not captured | The instances of the processes are captured |
| Workflow Intermediate Results | Not produced | Not produced | Not produced | Exist, but are not captured | Produced and captured |
| Workflow Output | Not produced | Not produced | Not produced | Produced and captured | Produced and captured |
Some examples have also been created to give a better understanding of the terminology; see Workflow+templates+instances+runs.
RO workflow provenance for integrity and authenticity
This section provides the results of the task of identifying possible uses of provenance information related to the execution of a workflow. Three main stories were identified:
- Reproducibility: being able to replicate the experiment by following the same steps as the authors did and using the same methods (verifying that the scientific experiment does what it is expected to do and that the results are the same).
- Replayability: the opportunity to follow the different steps of the execution of the scientific experiment in order to understand it better and see its step-by-step process.
- Repeatability: being able to execute a scientific experiment five years from now. The results don't have to be the same, due to changes in the inputs or curation of services, but they must be equivalent.
Other possible stories, although they are not very clear at this point:
- Adaptability: looking automatically for possible changes and adapting to them
- Curation: the provenance of the results can be used to verify that, once the workflow is curated, its results are still the same
- Stability: whether the workflow is stable over time
While identifying I&A functionalities for the Wf4Ever project, two groups of stories have also been identified:
- Social stories: they are related to the social uses of the ROs inside a scientific community. For example, giving credit to the author of a RO/WF whenever it is used by other people, or for citation purposes.
- Technical/Instrumental stories: these are the stories introduced above; they treat a workflow as an instrument which is going to be used later on for different purposes (reproducibility, replayability, repeatability)
Taverna and Wings workflows into Wf4ever portal
Some prototypes have been developed for PROV-O export and for PROV-O to wfprov conversion. However, they are not at production level, and the PROV-O export/conversion needs to be updated to use the latest OWL.
The translation from Wings to wfprov has also been developed, so that it can be included in a RO.
Related Links and references
[1].- http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceFormalModel.html
[2].- https://github.com/wf4ever/ro/tree/master/mapping/prov-o/test
[3].- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+Vocabulary+Specification
[4].- http://www.wf4ever-project.org/wiki/display/docs/RO+interoperability
[6].- https://lists.isoco.net/pipermail/wf4ever/2012-February/003101.html
[7].- http://www.wf4ever-project.org/wiki/display/docs/RO+Provenance+integration+discussions
[8].- http://www.wf4ever-project.org/wiki/display/docs/RO+evolution
Summary of the showcase (08/03/2012)
Proposed next actions:
- Description of the different stories which have appeared during the evolution of this showcase
- Complete the transfer from Taverna and Wings to Wf4Ever portal (includes some updates due to new terminology)
- Make the provenance of workflow results available as queries
- Evaluation of I&A functionalities using the provenance of workflow results
Logs
Khalid's last open question (see the complete conversation at [6]):
From the above, there is I think a question that applies not only to the example that Daniel highlighted, on provenance or workflows, but rather is more fundamental in the sense that it applies to the whole RO model, which can be formulated as follows:
"Consider a research object ro, and a resource r that is aggregated within ro. Consider now that r is composed of other resources, say r_1, r_2 and r_3. The question is, should r_1, r_2 and r_3 be defined as aggregated resources of ro, or would that simply introduce redundancy, since these resources, i.e., r_1,r_2 and r_3, can be accessed through the parent resource r?"
Comments (1)
mar 06, 2012
Marco Roos says:
Perhaps a motivation to take into consideration for use case examples is that of checking how the results of a workflow were obtained. When these results led to a new scientific insight, it is important to be able to gain insight in the method that led to it.
I suppose this means that I find it important to see the link between (annotated) results (data) and the methods, and in general to also consider this case from a data-oriented perspective.