- Main Goals
- Main Outcomes
- SPARQL endpoint
- Demo1: Algorithm to order a provenance trace by querying wfprov
- Demo2: Algorithm to query provenance and obtain workflows with a common process
- Demo3: Some general SPARQL queries
- Adding data to the 4Store endpoint
- Generating a RO including wfprov information from Wings (taken from Graham's mail)
- Discussion review/reflection
This page is a summary of Showcase 22 of the Sprint 2 phase, which corresponds to the Jira story (WFE-204).
Different types of provenance are being worked on in the project . Among others, it is important to capture the trace of a workflow run in order to find out characteristics or properties of the research object, or to allow its review. The terminology related to this type of provenance can be found at  and it relies on the wfdesc and wfprov ontologies . Some examples can also be found at , indicating where each of the ontologies is applied and how.
The main purpose of this showcase is to provide a way of consulting the provenance of workflow results through a SPARQL endpoint. This data will later be used, among other functionalities, to address some of the issues related to evaluating the quality of a research object and some of its properties (e.g. reproducibility), or to provide deeper information about a specific execution of a scientific experiment (e.g. to provide other workflows which also use some of its processes).
There are also relations between this provenance and other types of provenance, but studying those relations is out of the scope of this showcase.
Regarding the conversion from the two main types of workflow sources , PROV-O  has been chosen as the bridge for the Taverna -> wfprov conversion, and a direct OPMW -> wfprov conversion is used for WINGS, though the alignment between OPMW and PROV-O has also been studied.
- Export 1 example of provenance of a workflow execution/run from Taverna and Wings. (A longer-term goal would be to populate the wf4ever portal massively with such data)
- Allow querying the imported examples through a SPARQL endpoint
- Identify the examples to be exported
- Selection/identification of the models to be used
- Incorporate the selected provenance into a concrete RO
- Allow a simple visualization format of the provenance of workflow execution/run
- WINGS -> OPMW -> wfprov 
- Taverna -> PROV-O -> wfprov (set of tools for conversion )
- Set of tools for ordering wfprov data 
- Identification of a provenance scenario 
- Populate the repository with 2 examples of wfprov and 1 complete RO (populating the wf4ever portal should now be more straightforward)
- Initial SPARQL endpoint 
- Queries over the examples 
- Algorithm to order the traces of provenance using wfprov format 
A workflow example that has been used in many demonstrations is a Protein Discovery workflow. It follows the basic text mining procedure to produce proteins that were found together with the terms in the input query in the abstracts of biomedical papers. It may be a useful example for proofs of concept: http://www.myexperiment.org/workflows/74.html. Make sure to set the maximum number of abstracts to parse to a low number for quick results. An example input is given in the input description.
Story: As a researcher, I will be able to select the output of a workflow run, and then obtain the information that shows that this output was used to provide output on the 12th of December 2011, that it did so as part of protein discovery workflow, that the purpose of this service was to suggest biological concepts involved in Metabolic Syndrome, and the steps taken to achieve those results were optimize_for_medline, aida_retrieve_documents_in_parts, etc. This could for instance follow after clicking on a workflow run reference in uvp1.1 
Technical implementation: The different items of the RO are listed showing their URIs in a human-readable form. One of the items of the RO is the provenance of the workflow execution, which is an annotation including the associated file. The provenance information is captured with the wfprov ontology and describes the run of the experiment. As a user I would like to be able to see the different steps of the execution in an ordered form, as well as the generated intermediate and final results. All the processes and results have to be accessible through a SPARQL endpoint and also have to be displayed in a printable format (or similar) for review by the end user (showing some information related to the run, such as its timestamp and the agent who executed it).
We have installed and used a SPARQL endpoint (4store ) in order to test different content related to the showcase (especially wfprov). The endpoint holds information about three different workflow executions: two of them come from WINGS  and the other one is a Taverna workflow. The SPARQL endpoint is accessible at , and users can test some queries using its UI .
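As an illustration, the endpoint can also be queried programmatically over HTTP. Below is a minimal sketch using only the Python standard library; the endpoint URL is a placeholder (the real address is the one linked above), and the query shown is simply "get all the wfRuns" expressed against the wfprov vocabulary:

```python
import urllib.parse
import urllib.request

# Placeholder address -- substitute the real showcase endpoint URL.
ENDPOINT = "http://example.org/sparql/"

# SPARQL query: list all workflow runs described with wfprov.
LIST_RUNS = """
PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>
SELECT ?run WHERE { ?run a wfprov:WorkflowRun . }
"""

def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def run_query(endpoint: str, query: str) -> str:
    """Send the query and return the raw response body (requires network access)."""
    with urllib.request.urlopen(build_query_url(endpoint, query)) as resp:
        return resp.read().decode("utf-8")
```

The same URL can of course be fetched with cURL or a browser; the endpoint's UI is just a form around the same request.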
1) As a researcher, I want to get the information about the ROs and select one of them to see/review its provenance; then I want to choose one of its executed processes and see which other workflows also contain that process.
In order to test this story, which makes use of wfprov, we have developed an algorithm and a small Java app that shows a trace of provenance ordered by execution process.
The algorithm retrieves all the executed processes of a workflow run. Once we have them, we get the inputs and outputs of each process. Then we establish the relations between them and build a provenance trace based on the inputs that are available (an executed process generates outputs that become available inputs for subsequent processes that need them, and so on). The source code is stored at github . The app allows the user to select one of the three execution cases stored in the repository and shows the results step by step. The executable app is also available at github .
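The ordering pass can be sketched in a few lines. The following is a minimal illustration (not the actual Java code), assuming the inputs and outputs of each executed process have already been retrieved from the endpoint; the artifact names are invented, and the two step names are taken from the protein discovery story above:

```python
def order_trace(processes):
    """Order executed processes so each one appears only after all of its
    inputs have become available.

    `processes` maps a process name to a pair (inputs, outputs) of
    artifact-name sets.  Workflow-level inputs are those no process produces.
    """
    produced = {out for _, outs in processes.values() for out in outs}
    # Initially available artifacts: inputs that nothing in the trace produces.
    available = {i for ins, _ in processes.values() for i in ins} - produced
    ordered, pending = [], dict(processes)
    while pending:
        # Processes whose inputs are all available can "run" now.
        ready = sorted(p for p, (ins, _) in pending.items()
                       if set(ins) <= available)
        if not ready:
            raise ValueError("cycle or missing input in provenance trace")
        for p in ready:
            ordered.append(p)
            available |= set(pending.pop(p)[1])   # outputs become available
    return ordered

# Two steps named after the protein discovery example; artifacts are invented.
example = {
    "aida_retrieve_documents_in_parts": ({"medline_query"}, {"abstracts"}),
    "optimize_for_medline": ({"user_query"}, {"medline_query"}),
}
```

For these two steps, `order_trace(example)` yields `["optimize_for_medline", "aida_retrieve_documents_in_parts"]`: the second process only becomes runnable once the first has produced `medline_query`.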
The queries and results for the protein case obtained from Taverna can be found at  and , but the output of following the workflow run step by step is also shown below. Each step is a process, and for each one the trace also gives information about the inputs it used and the outputs it produced.
The demo shows the following scenario: as a researcher, I want to get the information about the ROs and select one of them to see/review its provenance; then I want to choose one of its executed processes and see which other workflows also contain that process. The queries and one example can be found at .
Here we show some SPARQL queries (which can also be consulted at ) that might be useful for users to test the content of the repository regarding the uploaded provenance data.
1) Get all the wfRuns stored at the endpoint
2) Get all the processes used in a wfRun
3) Get all the inputs/outputs that take part in a wfRun
4) Get all the inputs of a specific executed process
5) Get all the outputs of a specific executed process
6) Get the inputs necessary to obtain a specific output
7) Get the wfRuns where a specific input has been involved
8) Get the wfRuns and their executed processes where a specific input has been involved
9) Get the wfRun and its executed process which generated a specific output
10) Get the process that describes an executed process
11) Get all the wfRuns where a process has been involved
12) Get the workflows that have a process
13) Get all the processes that are part of a workflow
14) Get all the processes and their executions from a workflow
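As an illustration, two of the queries above can be expressed against the wfprov/wfdesc vocabularies roughly as follows; the resource URIs are placeholders. Query 2 (all the processes used in a given wfRun):

```sparql
PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>

SELECT ?process WHERE {
  ?process a wfprov:ProcessRun ;
           wfprov:wasPartOfWorkflowRun <http://example.org/run1> .
}
```

and query 12 (workflows that contain a given abstract process):

```sparql
PREFIX wfdesc: <http://purl.org/wf4ever/wfdesc#>

SELECT ?workflow WHERE {
  ?workflow a wfdesc:Workflow ;
            wfdesc:hasSubProcess <http://example.org/someProcess> .
}
```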
How to add RDF files to the 4store repository:
Given RDF files that contain the needed information about a workflow (wfprov and wfdesc), we can update our database by adding them to the repository.
Stop the currently running 4store server and stop the backend:
Add the RDF files:
Rerun the backend and the server:
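Assuming a knowledge base named `wf4ever` (the KB name and file names here are placeholders, not necessarily the ones used in the showcase), the steps above map onto the standard 4store command-line tools roughly as follows:

```shell
# 1) Stop the SPARQL HTTP server and the backend:
pkill -f 4s-httpd
pkill -f 4s-backend

# 2) Restart the backend and import the RDF files
#    (4s-import needs a running backend but no running 4s-httpd):
4s-backend wf4ever
4s-import wf4ever wfprov-example.rdf

# 3) Restart the SPARQL HTTP server (example port):
4s-httpd -p 8000 wf4ever
```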
More info about the repository at [x1] and about the SPARQL server at [x2].
The sequence of queries used to extract wfdesc/wfprov information for the Wings workflow example has been automated using a bash script + cURL.
It's all part of the ro-catalogue entry at https://github.com/wf4ever/ro-catalogue/tree/master/v0.1/WingsProvenanceExample
The files needed to re-run the process are:
- getWingsData.sh (shell script)
- prefixes.sparql (file of common prefixes for SPARQL queries)
The scripts used to create an RO structure with these data are:
- makeresourcelists.sh (uses Jena tools for querying local RDF)
- make.sh (uses ro-manager to create RO structure for the wings example data)
These files are all part of the resulting RO.
 are two examples created by this procedure.
The validation and feedback provided by Marco can be found at: http://www.wf4ever-project.org/wiki/display/docs/2012/04/08/RO+provenance+query+tests+by+users+%28Showcase+22+validation+by+Marco%29
The following is a summary of the review/reflection:
In general, all the team members agreed on the following:
- Need for better coordination and task definition at the beginning (some initial lack of communication)
- End-users stories more difficult to match than expected
- Towards the end, sprinting seemed to be working a lot better
- Problem of scarce resources over-committed elsewhere (Stian, Marco).
- Stand-ups worked well, though some people were not able to attend (overcommitted) but caught up later.
- The open skype chat seemed to work very well for updates and occasional questions. It turned out to sometimes be the best way of sharing knowledge.
- Very good overall collaboration and team work.
Identified points that can be applied in the next sprints as improvements:
- Stand-ups, optionally using a telco too if it seems necessary
- Keep the skype chat open to allow updates, catching up, and occasional questions
- Add the stand-up schedule to google-calendar
Some technical issues:
- Should resources of resources of a RO be linked directly, including them as part of the RO, or indirectly through the resource? (Open question from Khalid.) IMO indirectly, to avoid duplicates and overloading the RO with extra data that might be too big.
- Taverna RO links for the I/O of the executed processes are not created (a more technical approach should be adopted here)
- Sharing is very ad-hoc (inconsistencies in Wings data extraction - suggest more use of scripting for repeatability)
- More use of github would be desirable
Agreed post-sprint actions:
- Finish the user-feedback work by developing a simple visual interface to show the two demos
- Wrap up everything, especially the technical tools which have been used to create the dataset of provenance of workflow results and the procedures to run the demos/testing
- Gather proposals by team members for the next possible actions to be done in order to achieve:
- Assisting detection and explanation of workflow decay
- RO checklisting (including wfprov possibilities)
- Provenance support for first steps towards replayability and reproducibility