Showcase RO workflow integration and provenance

Introduction

The main goal of this showcase is to get the workflow run traces of a scientific experiment into the RO specification, and to upload some of the workflows already available on other platforms (mainly Taverna and Wings) to populate the ROs at Wf4Ever.

A secondary goal of this showcase is to identify possible uses of workflow run provenance for the integrity and authenticity functionalities of the Wf4Ever project. The Jira number associated with this showcase is Jira-204.

The general overview of the process is:

  • Taverna -> PROV-O -> wfprov (main focus)
  • Wings -> OPMW -> wfprov (main focus)

Identified sub-goals or sub-tasks:

  • Conversion from Taverna workflow to prov-o
    • Agent that recognizes prov-o provenance in RODL
    • Agent to convert prov-o to wfprov
    • Agent to upload wfprov to RODL
  • Alignment between prov-o and wfprov [1]
  • Taverna to embed checksums in prov-o
  • Taverna to export provenance and workflow outputs in one go (solving the ID-related problem pointed out by Daniel)
  • Identification of uses of provenance of a RO for integrity and authenticity
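The "convert prov-o to wfprov" sub-task can be pictured as a rewrite of RDF triples. The following is only a minimal sketch, not the agent developed in the showcase: triples are plain Python tuples, and the two mapping tables are a simplified, assumed alignment (the agreed alignment is the subject of the sub-task above). The names `PROPERTY_MAP`, `CLASS_MAP` and `prov_to_wfprov` are hypothetical.

```python
# Namespaces (assumed to be the published PROV-O and wfprov namespaces).
PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed, simplified alignment: each PROV-O term is replaced by the
# wfprov term that specialises it. A real alignment may need more context.
PROPERTY_MAP = {
    PROV + "used": WFPROV + "usedInput",
    PROV + "wasGeneratedBy": WFPROV + "wasOutputFrom",
}
CLASS_MAP = {
    PROV + "Activity": WFPROV + "ProcessRun",
    PROV + "Entity": WFPROV + "Artifact",
}


def prov_to_wfprov(triples):
    """Return a new list of (subject, predicate, object) triples in which
    mapped PROV-O classes and properties are replaced by wfprov terms.
    Triples that use unmapped terms are copied unchanged."""
    converted = []
    for s, p, o in triples:
        if p == RDF_TYPE and o in CLASS_MAP:
            converted.append((s, p, CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            converted.append((s, PROPERTY_MAP[p], o))
        else:
            converted.append((s, p, o))
    return converted
```

Keeping the alignment in two lookup tables means it can be revised as the terminology changes without touching the conversion loop.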

Example of use case

As a scientist, I want to use a workflow in the future, but by then it may not be as healthy as it is supposed to be. This can happen because some parts of the workflow no longer work and were not curated, because the workflow has been upgraded and the results are no longer the same, because it is executed on a different machine, etc. Which of these things happened determines whether the RO, and more specifically the workflow, will be reproducible, repeatable, replayable, reusable, etc., or none of those.

This example involves the following functionalities: curation of services (using the workflow template), reproducibility (using the instantiation and the provenance of the workflow results to check that it performs in the same way as when it was stored), adaptability (being able to replace some parts), etc.

Dependencies

This showcase does not depend on any other.

The following showcases depend on this one:

  • showcase 3: Mining Decay from MyExp
  • showcase 8: Implement Khalid's service substitution

Workflow Terminology

The following terminology has been agreed on by all the participants of the showcase and can be considered stable.

  • Workflow template: Description of the dependencies and steps of the workflow, the types of the steps, etc. No inputs are bound. Modeled with wfdesc. Types of templates:
    • Abstract workflow template: Workflow template where some or all of the steps of the workflow are not bound to a specific implementation.
    • Concrete workflow template: Workflow template which has all steps instantiated with a service, script or tool.
  • Workflow instance (or execution-ready workflow): Concrete workflow template with the inputs bound to data and the steps bound to services. The workflow instance is what is sent to the engine to be run. It could be modeled with wfdesc + wfprov (wfdesc describes the workflow template part and wfprov is used for the assignment of inputs to the first step of the workflow).
  • Workflow run/Workflow execution: The activity of running a workflow instance. The purpose of the workflow run is to run the workflow and get the final results. The workflow run generates the provenance of the workflow results (see below). A Workflow Run is equivalent to a Workflow Execution, but it has been decided to use Workflow Run for consensus. Modeled with wfprov.
  • Provenance of the workflow results: Provenance of the run of the workflow. It records some or all of the actions which occur during the workflow run. It can include inputs, intermediate results, generated outputs and how they relate to each other, who activated the run, on which system it was executed, when it started and ended, etc. Modeled with wfprov.

A summary of the terminology can be found in the next table:

Term | Abstract workflow template | Concrete workflow template | Workflow instance | Workflow run | Provenance of workflow results
Workflow Input | No inputs bound | No inputs bound | Inputs bound and captured | Inputs bound and captured | Inputs bound and captured
Workflow Steps | Some steps are not bound to an implementation | All steps are bound to an implementation | All steps bound to an implementation | Intermediate steps exist, but not captured | The instances of the processes are captured
Workflow Intermediate Results | Not produced | Not produced | Not produced | Exist, but not captured | Produced and captured
Workflow Output | Not produced | Not produced | Not produced | Produced and captured | Produced and captured

Some examples have also been created for a better understanding of the terminology; see the page Workflow+templates+instances+runs.

RO workflow provenance for integrity and authenticity

This section provides the results of the task of identifying possible uses of provenance information related to the execution of a workflow. Three main stories were identified:

  • Reproducibility: being able to replicate the experiment by following the same steps as the authors did and using the same methods (verifying that the scientific experiment does what it is expected to do and that the results are the same).
  • Replayability: the opportunity to follow the different steps of the execution of the scientific experiment in order to understand it better and see its step-by-step process.
  • Repeatability: being able to execute a scientific experiment 5 years from now. The results don't have to be the same, due to changes in the inputs or curation of services, but they must be equivalent.
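
One way the reproducibility story could use provenance is by comparing checksums of the outputs recorded for a stored run with the outputs of a later re-run. The sketch below assumes that the provenance holds one SHA-256 checksum per named output; the choice of hash and the function names `sha256_of` and `compare_runs` are assumptions made for illustration, not something fixed by the showcase.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an output's bytes."""
    return hashlib.sha256(data).hexdigest()


def compare_runs(recorded: dict, rerun_outputs: dict) -> dict:
    """Compare a re-run against the checksums recorded in the provenance.

    recorded:      output name -> checksum stored with the original run
    rerun_outputs: output name -> raw bytes produced by the re-run
    Returns output name -> "same", "changed" or "missing".
    """
    report = {}
    for name, checksum in recorded.items():
        if name not in rerun_outputs:
            report[name] = "missing"
        elif sha256_of(rerun_outputs[name]) == checksum:
            report[name] = "same"
        else:
            report[name] = "changed"
    return report
```

A byte-level checksum can only confirm that results are identical, which fits reproducibility. Deciding whether changed results are still equivalent, as repeatability requires, would need a domain-specific comparison.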

Other possible stories, though they are not very clear at this point:

  • Adaptability: automatically looking for possible changes and adapting to them
  • Curation: the provenance of the results can be used to verify that, once the workflow is curated, its results are still the same
  • Stability: whether the workflow remains stable over time

While identifying I&A functionalities for the Wf4Ever project, two groups of stories have also been identified:

  • Social stories: they are related to the social uses of ROs inside a scientific community. For example, giving credit to the author of an RO/WF whenever it is used by other people, or for citation purposes.
  • Technical/Instrumental stories: these are the stories introduced above. They treat a workflow as an instrument which is going to be used later on for different purposes (reproducibility, replayability, repeatability).

Taverna and Wings workflows into Wf4ever portal

Some prototypes have been developed for PROV-O export and for PROV-O to wfprov conversion. However, these are not at production level, and the PROV-O export/conversion needs to be updated to use the latest OWL.

The translation from Wings to wfprov has also been developed, so that it can be included in an RO.

Related Links and references

[1].- http://dvcs.w3.org/hg/prov/raw-file/tip/ontology/ProvenanceFormalModel.html
[2].- https://github.com/wf4ever/ro/tree/master/mapping/prov-o/test
[3].- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+Vocabulary+Specification
[4].- http://www.wf4ever-project.org/wiki/display/docs/RO+interoperability
[5].- http://www.wf4ever-project.org/wiki/display/docs/Research+Object+Vocabulary+Specification+v0.1#ResearchObjectVocabularySpecificationv0.1-Workflowdefinition%28wfdesc%29
[6].- https://lists.isoco.net/pipermail/wf4ever/2012-February/003101.html
[7].- http://www.wf4ever-project.org/wiki/display/docs/RO+Provenance+integration+discussions
[8].- http://www.wf4ever-project.org/wiki/display/docs/RO+evolution

Summary of the showcase (08/03/2012)

Proposed next actions:

  • Description of the different stories which have appeared during the evolution of this showcase
  • Complete the transfer from Taverna and Wings to Wf4Ever portal (includes some updates due to new terminology)
  • Make the provenance of workflow results available as queries
  • Evaluation of I&A functionalities using the provenance of workflow results

Logs

Khalid's last open question (see the complete conversation at [6]):

From the above, there is I think a question that applies not only to the example that Daniel highlighted, on provenance or workflows, but rather is more fundamental in the sense that it applies to the whole RO model, which can be formulated as follows:

"Consider a research object ro, and a resource r that is aggregated within ro. Consider now that r is composed of other resources, say r_1, r_2 and r_3. The question is, should r_1, r_2 and r_3 be defined as aggregated resources of ro, or would that simply introduce redundancy, since these resources, i.e., r_1,r_2 and r_3, can be accessed through the parent resource r?"

  1. mar 06, 2012

    Marco Roos says:

    Perhaps a motivation to take into consideration for use case examples is that of checking how the results of a workflow were obtained. When these results led to a new scientific insight, it is important to be able to gain insight in the method that led to it.

    I suppose this means that I find it important to see the link between (annotated) results (data) and the methods, and in general to also consider this case from a data-oriented perspective.