- The Integrity & Authenticity component
- High level API
- RO Content
- Specific evaluation functions
- IQ Dimensions: implementation targets
- Matrix of relations between R's and IQ Dimensions
- Demonstrator
- Coordination with other Wf4Ever task groups
- References
- Appendix: relationship to overall Wf4Ever architecture
The Integrity & Authenticity component
The purpose of this wiki page is to document the design of the functionalities, API, use cases, implementation ideas, etc., of the Integrity and Authenticity component. We are working towards creating an Integrity & Authenticity evaluation service, which can be used to provide elements of information quality evaluation for Research Objects.
High level API
The main idea is to create a service that will expose the functionalities via a (probably RESTful, but implementation details to be agreed upon) API. The service is presented here as a suite of evaluation functions described informally in a technology-neutral fashion.
The service would be able to evaluate a given RO (something like "RO health" or a "RO assessment service") in terms of different metrics that stem from the IQ dimensions covered in D4.1, and from other criteria that come up for evaluation. (See dimensions below, and how we plan to tackle them.) From a high-level perspective, the service would reply with a value summarising its analysis, along with an indication of the factors considered for analysis. (Initially, this analysis may be done dynamically, while in the long term some of it could be done in batch mode.) Especially interesting would be a proactive functionality that continuously monitors RO health as a background process and sends a notice (integrity or authenticity breach alarm) to interested scientists.
The API is described initially in terms of individual metrics to be evaluated, but the general pattern should also extend to composite metrics which may reflect specific "Quality views" (cf. [5]) that reflect research domain or individual researchers' requirements.
[Lourdes]: As a user I would like to have details on the factors considered for the analysis. In fact, it would be a real added value to be able to customize this final value (e.g. if I don't mind about reputation, I would like to get an I&A value that ignores this dimension). I think this is also necessary to provide what is indicated in "Functions": "Possible to follow overall RO health or to select the dimensions you are more interested in".
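The customizable overall value discussed here could be pictured as a weighted combination of per-dimension scores in which a user can switch dimensions off. This is a minimal sketch only, assuming every dimension yields a score in the range 0 to 1; the function name `composite_score`, the `weights` argument and the sample scores are illustrative assumptions, not part of any agreed API.

```python
from typing import Dict, Optional


def composite_score(scores: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> Dict[str, object]:
    """Combine per-dimension scores (assumed to lie in [0, 1]) into one value.

    A weight of 0 drops a dimension; dimensions without an explicit weight
    default to 1. The reply also lists the factors that were considered.
    """
    weights = weights or {}
    used = {dim: weights.get(dim, 1.0) for dim in scores}
    used = {dim: w for dim, w in used.items() if w > 0}
    total = sum(used.values())
    if total == 0:
        raise ValueError("at least one dimension must have a positive weight")
    value = sum(scores[dim] * w for dim, w in used.items()) / total
    return {"value": value, "factors": sorted(used)}


# Hypothetical per-dimension results for one RO.
scores = {"completeness": 1.0, "stability": 0.5, "reputation": 0.0}
overall = composite_score(scores)
without_reputation = composite_score(scores, {"reputation": 0.0})
```

Returning the list of factors alongside the value keeps the reply explainable, which is the point raised in the comment above.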
Functions:
- EvaluateRO<foo> - separate functions for evaluating different RO quality metrics, all following the same basic pattern. E.g. EvaluateROCompleteness, EvaluateROTimeliness, etc. Some of these will correspond to IQ dimensions noted below, and others may be domain specific and/or subjective (per observations by Matt Gamble - cf. [1]).
- RO status-change notification - subscription service. A higher-level functionality built over the evaluation service. Scientists can subscribe to this service to be updated on the I&A of a particular RO. Scientists become followers (a la Twitter) of such ROs. It should be possible to subscribe to notifications of overall RO health, or to select the dimensions you are more interested in. [GK] Might this not be a more general kind of RO function - a kind of change-notification service - not limited to I&A? I'm also wondering if this is the sort of capability that would most smoothly be provided via myExperiment. But Lourdes notes: I think that it will be good to provide also dimensions that many scientists might not have considered, in a way to "educate" them. This will force scientists into best practices.
- Quality-aware search. Another high-level service built over the basic evaluation service may be search; e.g. when performing a search, to select ROs that satisfy some quality evaluation criterion, or to prioritize higher quality ROs in search results.
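The status-change notification idea in the list above can be sketched as a small publish/subscribe layer over the evaluation functions. This is an illustrative sketch under stated assumptions: the class name `HealthMonitor`, its methods and the callback signature are hypothetical, evaluation results are handed in from outside rather than computed here, and the first result seen for a dimension is treated as a baseline that triggers no notice.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# callback(ro_uri, dimension, previous_value, new_value)
Callback = Callable[[str, str, object, object], None]


class HealthMonitor:
    """Notify followers when an evaluation result for an RO changes."""

    def __init__(self) -> None:
        # ro_uri -> list of (dimensions of interest, or None for all; callback)
        self._subs: Dict[str, List[Tuple[Optional[Set[str]], Callback]]] = defaultdict(list)
        # (ro_uri, dimension) -> last reported value
        self._last: Dict[Tuple[str, str], object] = {}

    def subscribe(self, ro_uri: str, callback: Callback,
                  dimensions: Optional[Iterable[str]] = None) -> None:
        """Follow an RO, optionally restricted to selected dimensions."""
        dims = set(dimensions) if dimensions is not None else None
        self._subs[ro_uri].append((dims, callback))

    def report(self, ro_uri: str, dimension: str, value: object) -> None:
        """Record a fresh evaluation result and notify followers on change."""
        key = (ro_uri, dimension)
        had_previous = key in self._last
        previous = self._last.get(key)
        self._last[key] = value
        if not had_previous or previous == value:
            return  # first observation or no change: nothing to announce
        for dims, callback in self._subs[ro_uri]:
            if dims is None or dimension in dims:
                callback(ro_uri, dimension, previous, value)
```

A follower interested only in completeness would call `subscribe(ro_uri, callback, ["completeness"])`; passing no dimensions follows overall RO health.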
Inputs:
- Research Object - (required) probably supplied by reference using its URI. See below for discussion of information that may be required in the RO.
- Selection - (optional) where evaluation is required for a particular component of a RO rather than the entire RO. E.g. to perform a liveness evaluation of a specific workflow within a RO.
- Context - (optional) details of evaluation criteria that may vary, for example, between users. These values could affect the computation of Trust and Utility metrics per [1]. (This replaces "user" in the original proposal, as I think there may be other factors that cannot be tied to a specific user. E.g. publishers or institutions may have their own quality assessment criteria, or execution environment information that is not user-specific.) Exactly: the relevant IQ dimensions are one thing, and the kind of conclusions a particular user can draw from their evaluation is another. Probably, this should be an application tier on top of sheer I&A evaluation. Might be worth looking into this with users. [GK] Er... chicken and egg? The immediate push is to have *something* to show to users to elicit comment. I'd rather build up from component evaluations; then we can decide what the overall "sheer" I&A evaluation might look like. I think the key point we agree on is that there are intrinsic factors (from the RO) and contextual/utility factors (supplied separately) - which reflects Matt G's work in [1].
- Detail requested - (optional) as noted below, outputs may include a summary assessment or score, and details of the elements that contributed to that value. The specific detail options available would depend on the particular dimension being evaluated.
Outputs:
- Evaluation summary - this may be a simple True/False or OK/not OK indicator, a numerical score, or some other simple value. Depends on the context and selected set of dimensions? [GK] Not sure about context. But does depend on dimension evaluated. The "selected set of dimensions" is handled here as a suite of functions (though as Guillermo has noted, the overall evaluation can be layered later).
- Evaluation detail - additional details about how the summary was arrived at. The particular details available will depend on the dimension or feature being evaluated. For example, a completeness assessment may be a simple True/False value, but in the case of a False result the detail information may indicate which expected components were missing. This is critical in order to engage with users and gain their trust in the I&A service (otherwise they may just not use it!). [GK] Ack.
- In particular, the evaluation detail should indicate the inputs upon which the analysis is based.
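The inputs and outputs listed above suggest a common shape for every EvaluateRO function. The following is a rough sketch of that shape only; the class names, field names and the placeholder evaluator are assumptions made for illustration, and the real interface (probably RESTful) is still to be agreed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class EvaluationRequest:
    ro_uri: str                          # required: the RO, supplied by reference
    selection: Optional[str] = None      # optional: a component within the RO
    context: Dict[str, Any] = field(default_factory=dict)  # optional criteria
    detail: str = "summary"              # requested level of detail


@dataclass
class EvaluationResult:
    summary: Any                         # e.g. True/False or a numerical score
    detail: Dict[str, Any] = field(default_factory=dict)
    inputs_used: List[str] = field(default_factory=list)   # basis of the analysis


# Every evaluation function follows the same basic pattern.
Evaluator = Callable[[EvaluationRequest], EvaluationResult]


def evaluate_ro_placeholder(request: EvaluationRequest) -> EvaluationResult:
    """Placeholder showing the shape of a reply; it performs no real check."""
    target = request.selection or request.ro_uri
    result = EvaluationResult(summary=True, inputs_used=[target])
    if request.detail != "summary":
        result.detail = {"note": "no checks performed by this placeholder"}
    return result
```

Keeping `inputs_used` in the result reflects the requirement that the evaluation detail indicates the inputs upon which the analysis is based.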
Notes:
- one requirement for quality evaluation was to be able to assess reproducibility of a workflow result in various execution environments. My current thinking is that a reference to an execution environment description may be part of the context information provided. Certain evaluation functions may use this to return, for example, an indication of whether a given workflow or result could be obtained in a different environment. This suggests that a resource usage record may be useful additional information that is provided alongside provenance. (cf. MalariaGen discussions - http://www.wf4ever-project.org/wiki/display/docs/MalariaGen+use-case+and+suggested+requirements)
- Question: How to convey the meaning of each IQ dimension to user scientists? examples? showing what happens if any such dimension breaks? [GK] I'd turn this around: if we focus on providing information that scientists tell us they want, this should be a non-issue.
RO Content
The intent of the above API is that as much as reasonably possible of the source information used for quality assessment is obtained, directly or indirectly, from the RO itself. This in turn may drive requirements on the RO content. Quality evaluation inputs not included in the RO are those values that may reasonably vary between different evaluation requests, or are in some sense independent of the RO itself. (Note: this is an API-level distinction, and does not preclude references within the RO to external information that is used in an evaluation.)
Details of exactly what the RO must contain will depend on the particular evaluation function being invoked. Some anticipated RO content used for quality evaluation are:
- Minimum information model - used to indicate what kind of information should be present in the RO, and in particular used as a guide for completeness evaluation. The MIM would typically be defined for some domain of use or some well-understood research purpose. E.g. see http://mibbi.org/. Should this be supported by the expressivity of the RO model through axioms like e.g. "if an RO has at least one associated workflow and dataset, then completeness of the RO holds"? And, consequently, should we extend the RO model with an I&A ontology module? [GK] without excluding what you suggest, I'd favour an approach that builds on accepted standards (MIMs) rather than ad-hoc axioms. That is, these axioms may be used, but would preferably be part of a package tuned to a particular research need. [EG adds an interesting idea taken from LV mail] It would be possible to define a set of specific domain classes (i.e. data + workflow + conclusions + ...) and evaluate the completeness based on the class to which the RO belongs. This would be domain-specific information that could be embedded into the more general standard MIM model? [GK] This idea of having a domain-specific class against which completeness evaluation is performed is implicit in the approach of using MIMs.
- Identification of workflow(s) within a RO - needed as input to a liveness evaluation.
- External dependencies associated with each workflow within an RO - needed as input to a liveness evaluation.
- Provenance graph - needed for integrity (?) evaluation, and also for other kinds of quality evaluation.
- Quality and rating annotations - used as inputs to some overall quality evaluation functions. Per [1] quality annotations should be based on an identified quality standard, while ratings may be subjective. Also known as Quality Evidence, per [5] p51.
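As an illustration of how the workflow identification and external dependency information listed above might feed a liveness evaluation, here is a minimal sketch. It assumes that the dependencies are available as plain URIs and that a successful HTTP HEAD request is an acceptable stand-in for "available and working"; both are assumptions, as are the function names `http_probe` and `evaluate_liveness`. The probe is injectable so that a richer service check can replace it.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List


def http_probe(uri: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to the URI succeeds (assumed liveness test)."""
    try:
        request = urllib.request.Request(uri, method="HEAD")
        # urlopen raises an error for 4xx/5xx replies, so opening means success.
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        return False


def evaluate_liveness(dependencies: Dict[str, Iterable[str]],
                      probe: Callable[[str], bool] = http_probe) -> Dict[str, object]:
    """dependencies maps each workflow URI to the external URIs it relies on."""
    dead: Dict[str, List[str]] = {}
    for workflow, uris in dependencies.items():
        unreachable = [uri for uri in uris if not probe(uri)]
        if unreachable:
            dead[workflow] = unreachable
    # Summary value plus the detail needed to explain a negative result.
    return {"live": not dead, "unreachable": dead}
```

A reachable endpoint does not prove that a service still behaves as the workflow expects, so this covers only the "available" half of the liveness idea.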
RO content, especially for quality evaluation, is related to the metadata generation mechanisms in the preservation platform, i.e. at what moments can we get provenance information from the user and system: 1) ingestion of a new RO in the system, 2) upon reuse by another user, 3) upon modification... [GK] Not only in the preservation platform, but also in the execution environment. E.g. I see a key future direction is to use Taverna-generated provenance traces (and resource usage traces - see above). I think your point here is that we should be trying to focus on what we can usefully do with available information?
Specific evaluation functions
EvaluateROCompleteness
The purpose of this function is to evaluate completeness of data included in or attached to a Research Object with reference to a minimum information model (MIM) along the lines of the MIBBI models [9]. Sufficiency of data is a first step to reproducibility of results in a Research Object.
Inputs:
- Research object - this is expected to contain:
- a manifest describing the contained data sets, workflows, workflow execution accounts, annotations and other components.
- component elements according to the minimum information description (see Context below)
- Selection - (not used)
- Context - a reference to the MIM description that is to be satisfied. The MIM may be part of the RO itself, or external. See minim.rdf.
- Detail required - can request summary-only, or one of the output detail options indicated below.
Outputs:
- Evaluation summary - a simple complete/not-complete indication
- Evaluation detail options:
- a list of required components that are not present
- a complete summary of required and allowed components according to the minimum information description, with reference to any corresponding entity in the research object.
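The inputs and outputs above can be sketched as follows. This is a deliberately simplified illustration: it assumes the manifest can be reduced to a mapping from resource URI to component type and the minimum information model to lists of required and allowed component types. It does not read minim.rdf or any real MIM format, and the function and argument names are hypothetical.

```python
from typing import Dict, Iterable, List


def evaluate_ro_completeness(manifest: Dict[str, str],
                             required: Iterable[str],
                             allowed: Iterable[str] = (),
                             detail: str = "summary") -> Dict[str, object]:
    """Check an RO manifest against a (much simplified) minimum information model.

    manifest maps each resource URI in the RO to a component type;
    required/allowed list the component types named by the model.
    """
    by_type: Dict[str, List[str]] = {}
    for uri, component_type in manifest.items():
        by_type.setdefault(component_type, []).append(uri)

    required = list(required)
    missing = [t for t in required if t not in by_type]
    result: Dict[str, object] = {"complete": not missing}
    if detail == "missing":
        # Detail option 1: required components that are not present.
        result["missing"] = missing
    elif detail == "full":
        # Detail option 2: every required and allowed component, with the
        # corresponding entities found in the research object.
        listed = required + [a for a in allowed if a not in required]
        result["components"] = {
            t: {"required": t in required, "found": sorted(by_type.get(t, []))}
            for t in listed
        }
    return result
```

For example, a manifest with one dataset and one workflow checked against a model requiring `workflow`, `dataset` and `conclusions` would be reported as not complete, with `conclusions` in the missing list.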
Notes:
- ...
(More functions to follow...)
IQ Dimensions: implementation targets
[GA: New after Poznań meeting] We should focus on the famous R's, hence linking the low-level calculations for the dimensions we're working at with the high-level R's, which should be the final outcome of the service. Matrix table to be created with the relations.
[GA: New after Poznań meeting] Paolo noted that our provenance-related scenarios would be handy for the W3C Prov WG.
The methodology for this step is the following: In order to kickstart it, based on the previous work on user and technical requirements, and the analysis done in D4.1, we will try to describe how we plan to tackle each of the IQ dimensions, explaining how we are going to obtain the metric values, also relating these with use cases that we can think of with the previous work in mind (for example, trying to link these with the notion of (re-)executable ROs). Then we will share this with the Users, in order to refine our approaches and obtain information for the cases/dimensions we haven't been able to think of a way to calculate yet.
There was a consensus to focus on the following four IQ dimensions for a first implementation because of their relevance to the needs of the users. We will focus initially on Completeness and Stability, as these are evaluations we can understand how to perform using available data, and will need us to put in place a metadata handling framework that will be needed for most if not all quality evaluations. With these implemented, we will have a tool that we can demonstrate with research users to elicit and prioritize further IQ dimensions to evaluate.
| Dimension | Idea(s) for calculation | Use case(s) | Could it be done now? | Provenance information used |
|---|---|---|---|---|
| Liveness | Check that the data and services that are included in the RO/Wf are available and working. | Can a workflow be repeated now? | Would need common mechanism to evaluate service liveness from description. | None? |
| Completeness | Compare the information contained in the RO with a template. | A user has ROs that are used with some given analysis tool, and wishes to know if all information required to perform an analysis is present. (Note that the tool the user handles would be the client invoking the I&A service, not the user himself. ???) | With minimum information model | None. |
| Stability | Use provenance information to detect if something important has been removed. | A RO contains an interesting result; a researcher wants to know if all the original information within the RO used to obtain that result is still available. (This is a variation of liveness and completeness, but focusing on specific internal values?) | Would need full provenance graph for RO or specified component. | Provenance graph (inputs, executions, outputs). |
| Timeliness | (Context-dependent, depending on needs of a particular user) - Check user ratings and analyse if there has been a drop in them (used to be good, and not anymore) @@TODO: think this through more clearly | A user has found several ROs which might be relevant for his work, and he would like to use one which is currently useful, so the tool he's using might help him by indicating which ones haven't "decayed". | Would need access to rating information. | Timestamp information. |
| ... | | | | |
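The Stability row above could be sketched as a walk over the provenance graph, starting from the result of interest and checking that everything it was derived from is still present in the RO. This is a minimal sketch assuming the provenance can be flattened into a "derived from" mapping between resource URIs; the function name and the data layout are illustrative assumptions rather than a chosen provenance model.

```python
from typing import Dict, Iterable, List, Set


def evaluate_stability(result_uri: str,
                       derived_from: Dict[str, Iterable[str]],
                       present: Set[str]) -> Dict[str, object]:
    """Walk the provenance graph back from a result and report lost inputs.

    derived_from maps a resource to the resources (inputs, executions,
    workflows) it was obtained from; present is the set of resources that
    are still available in the RO.
    """
    seen: Set[str] = set()
    stack: List[str] = [result_uri]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(derived_from.get(node, ()))
    missing = sorted(uri for uri in seen if uri not in present)
    return {"stable": not missing, "missing": missing, "checked": sorted(seen)}
```

The `checked` list records which resources the verdict is based on, in line with the requirement that evaluation detail names its inputs.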
Matrix of relations between R's and IQ Dimensions
[Aleix] In the following matrix we can see all the IQ dimensions of D4.1. As we've said before, we're now focusing on just 4 IQ dimensions. This matrix is simply an attempt to relate IQ dimensions with the R's. It may help to decide which IQ dimensions are most relevant to achieve users' needs (but it's only an intuitive mapping). It should be discussed whether IQ dimensions really fit the R's or whether the R's were a tool to help us interact with users and set up the scenarios.
This is a first draft of the "R-IQ Matrix", which relates the R's to the IQ dimensions defined in deliverable D4.1 (M6) and in R's.
| | Reusable | Repeatable | Reproducible | Replayable | Repurposable | Reliable | Referenceable | Re-interpretable | Respectful and Respectable | Retrievable | Refreshable | Recoverable and Reparable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Believability | X | | | X | | X | | | | | | |
| Objectivity | | | | | | X | | | X | | | |
| Reputation | X | | | X | X | X | | X | X | | | |
| Stability | X | X | X | X | | | | | | | X | |
| Verifiability | | | X | | | | | | | | X | X |
| Accuracy | X | | | X | | | | | | | | |
| Completeness | X | X | | X | | | | | | | X | X |
| Timeliness | | | | | X | | | | | | X | |
| Relevancy | X | X | | | X | | | | | | | |
| Amount of data | X | | X | | | | | | | | | |
| Understandability | | X | X | X | | | | X | | | | |
| Interpretability | | | | | | | | X | | X | | |
| Accessibility | | X | X | X | X | | X | | | X | | X |
| Security | | | | | | | | | X | | | |
Explanation
- Reusable (Idea: Use of the RO as a whole. Black-box)
- Believability: True and credible info so that the RO can be reused.
- Reputation: Credibility of the RO.
- Stability: Use of the RO without any inner conflicts.
- Completeness: Avoid missing information in the RO that is going to be reused.
- Accuracy: Data quality of the RO.
- Amount of data: Appropriate volume of data.
- Relevancy: Helpful for task at hand. Applicability.
- Repeatable (Idea: Run the RO with new data)
- Stability: Values do not conflict with each other.
- Understandability: Easy-to-follow parts of the RO.
- Completeness: The more complete the better.
- Relevancy: Helpful for task at hand.
- Accessibility: Easy access to different parts of the RO.
- Reproducible (Idea: Rerun to validate some data)
- Stability: Rerun without conflicts.
- Verifiability: The info can be checked.
- Amount of data: Appropriate volume of data to check.
- Accessibility: Easy access to the information.
- Understandability: Follow the process easily.
- Replayable (Idea: See what has happened. Follow the steps taken)
- Believability: Trustworthiness.
- Reputation: Source in high standing.
- Stability: No conflicts.
- Completeness: The RO is still complete.
- Understandability: Data easily comprehended by the consumer.
- Accessibility: Availability of the information.
- Accuracy: Correctness and precision.
- Repurposable (Idea: Reuse parts of a RO)
- Reputation: Good methods, good info, etc.
- Relevancy: Helpful for task at hand.
- Timeliness: Reusable parts are up to date.
- Accessibility: Easy access to useful parts of the RO.
- Reliable (Idea: Trustworthy RO)
- Believability: True and credible information.
- Objectivity: Fair and honest information.
- Reputation: Credibility of the source.
- Referenceable (Idea: The RO can be cited)
- Accessibility: Access to the different parts of a RO.
- Re-interpretable (Idea: Cross-Boundary)
- Interpretability: Clarity of definition.
- Understandability: Data easily comprehended by the consumer.
- Reputation: Credibility.
- Respectful and respectable (Idea: Intellectual property, policies for reuse)
- Security: Guarantees privacy.
- Reputation: High standing of information. Known sources.
- Objectivity: Information should be unbiased and impartial.
- Retrievable (Idea: Find, discover, retrieve, etc.)
- Accessibility: Availability of the information. Quickly retrievable.
- Interpretability: Clarity of definition.
- Refreshable (Idea: RO remains valid even when its parts evolve in time)
- Verifiability: The data can be checked.
- Timeliness: The data is up-to-date.
- Stability: Compatibility between components of the RO throughout time.
- Completeness: The RO should not miss information during its evolution.
- Recoverable and reparable (Idea: Roll-back, diagnosis)
- Completeness: No information is missing.
- Verifiability: Traceability of the RO.
- Accessibility: Quickly retrievable.
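To help decide which IQ dimensions matter most for the R's, the explanation list above can be held as data and queried. The mapping below is transcribed from that list; the helper names `r_coverage` and `rs_for` are illustrative, and counting R's per dimension is only one possible (and equally intuitive) way of ranking them.

```python
from collections import Counter
from typing import Dict, List

# Transcribed from the explanation list: each R and the IQ dimensions it relies on.
R_TO_IQ: Dict[str, List[str]] = {
    "Reusable": ["Believability", "Reputation", "Stability", "Completeness",
                 "Accuracy", "Amount of data", "Relevancy"],
    "Repeatable": ["Stability", "Understandability", "Completeness",
                   "Relevancy", "Accessibility"],
    "Reproducible": ["Stability", "Verifiability", "Amount of data",
                     "Accessibility", "Understandability"],
    "Replayable": ["Believability", "Reputation", "Stability", "Completeness",
                   "Understandability", "Accessibility", "Accuracy"],
    "Repurposable": ["Reputation", "Relevancy", "Timeliness", "Accessibility"],
    "Reliable": ["Believability", "Objectivity", "Reputation"],
    "Referenceable": ["Accessibility"],
    "Re-interpretable": ["Interpretability", "Understandability", "Reputation"],
    "Respectful and respectable": ["Security", "Reputation", "Objectivity"],
    "Retrievable": ["Accessibility", "Interpretability"],
    "Refreshable": ["Verifiability", "Timeliness", "Stability", "Completeness"],
    "Recoverable and reparable": ["Completeness", "Verifiability", "Accessibility"],
}


def r_coverage(mapping: Dict[str, List[str]] = R_TO_IQ) -> Counter:
    """Count, for each IQ dimension, how many R's it contributes to."""
    return Counter(dim for dims in mapping.values() for dim in dims)


def rs_for(dimension: str, mapping: Dict[str, List[str]] = R_TO_IQ) -> List[str]:
    """List the R's that depend on a given IQ dimension."""
    return [r for r, dims in mapping.items() if dimension in dims]
```

Calling `r_coverage().most_common()` orders the dimensions by the number of R's they support, which could feed the discussion on prioritisation.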
Demonstrator
Some ideas to demonstrate this service:
- Linking it from the RO cmd line tool. I (GK) propose to start by implementing a completeness evaluation as part of the RO manager command line tool [7] (agreed).
- Invoking it from a dedicated website...
- I would start with supporting a particular dimension at the RO cmd line tool now to touch base. [GK] OK - that's the intent of focusing on completeness, as it seems tractable. For the review, we should aim for a number of them at cmd line and pick one for a demo with a visually attractive GUI. [GK] I'm not sure how many I'll be able to do by then - in many ways, the first one will be the hardest as we'll need to figure how to connect with the other strands of the project (at last!). Doing pretty GUIs can also be time consuming. How much of Guillermo's effort is to be involved in this?
TODO
Plan for implementing completeness evaluation prototype:
- Identify representation for MIM - Matt G is working on something.
- Identify needed annotations for component identification - probably depends on MIM representation
- Read latest RO model - done
- Agree representation (ontology) for RO manifest and additional metadata (with RO TF)
- Set up test case RO in consultation with RO TF
- Question: does RO manager need to create ORE manifest?
- Enhance RO manager to create annotations required by test case RO
- Decide what to do with result of annotation function: store in RO or return separately?
- Add evaluate command to RO manager, with pluggable annotation function
- Implement completeness evaluation function
- Capture provenance of evaluation function execution
- In addition to collecting provenance for explanation of evaluation results, we should get provenance records about the ROs and workflow executions themselves. Any ideas about this? [GK] As a future direction, I agree that's crucial. This TODO is focused on a near-term development, and was predicated on the idea that the evaluation code itself should be able to easily record some basic provenance trace information. I'm expecting that Stian will be providing a suitable provenance trace from Taverna executions at some point in the not-too-distant future.
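Two of the steps in this plan, an evaluate command with pluggable functions and a basic provenance trace of each evaluation run, could look roughly like the sketch below. It is not taken from the RO manager code base [7]; the command layout, the registry and all function names are assumptions made for illustration, and the completeness function is a stub.

```python
import argparse
import datetime
from typing import Callable, Dict, List, Optional

# Registry of pluggable evaluation functions, keyed by metric name.
EVALUATORS: Dict[str, Callable[[str], dict]] = {}


def evaluator(name: str):
    """Register a function under a metric name so the command can find it."""
    def register(func):
        EVALUATORS[name] = func
        return func
    return register


@evaluator("completeness")
def completeness_stub(ro_dir: str) -> dict:
    # Placeholder: a real function would inspect the RO found at ro_dir.
    return {"summary": None, "detail": "not implemented in this sketch"}


def run_with_provenance(name: str, ro_dir: str) -> dict:
    """Run one evaluator and attach a basic trace of the execution."""
    started = datetime.datetime.now(datetime.timezone.utc)
    result = EVALUATORS[name](ro_dir)
    ended = datetime.datetime.now(datetime.timezone.utc)
    trace = {"function": name, "ro": ro_dir,
             "started": started.isoformat(), "ended": ended.isoformat()}
    return {"result": result, "provenance": trace}


def main(argv: Optional[List[str]] = None) -> dict:
    parser = argparse.ArgumentParser(prog="ro")
    commands = parser.add_subparsers(dest="command", required=True)
    evaluate = commands.add_parser("evaluate")
    evaluate.add_argument("metric", choices=sorted(EVALUATORS))
    evaluate.add_argument("ro_dir")
    args = parser.parse_args(argv)
    return run_with_provenance(args.metric, args.ro_dir)
```

Whether the returned result and trace are stored in the RO or returned separately is left open here, as in the plan above.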
Coordination with other Wf4Ever task groups
This section summarizes some design issues that should be coordinated with other working groups within the project.
Research Objects (RO)
- Matt G is working on a vocabulary/ontology for describing minimum information models.
- What vocabulary might be used to refer to minimum information models that the RO is intended to reflect? (This is separate from the MIM itself that Matt G is working on.)
- What vocabulary would be used to describe the component data elements of an RO sufficiently to identify their role with reference to a MIM? (Probably the MIM vocabulary + ORE?)
- Is there a plan/schedule to produce an OWL ontology for the RO manifest? (This allows use of OWLDoc as reference manual)
- I'd like to create a "test case reference RO" for the purposes of completeness evaluation testing. I think this should be in collaboration with RO TF, at least to the extent of review. I think doing this would force a number of devilish details to be resolved, so would hopefully be useful for you.
- Though not strictly intertwined there will also be connections with evolution work by Raúl in WP3. I propose to start thinking about this once we have some initial prototype and conclusions. [GK] WP3 = reputation/recommendation etc.? This is allowed for in the overall model proposed by Matt G [1], as the rating functions would be "annotation function" values (or, approximately, "quality evidence" in the terms of Paolo's thesis). Again, it's not part of my immediate focus, though I'm not completely ignoring it.
- Provenance information also has to be considered by the RO model.
Architecture (ARCH)
- Should the local RO manager create a local ORE manifest, where it currently just uses the local file system content as a kind of manifest? This would introduce a "staging" operation to the RO manager ("ro add" was proposed), and might be useful for performing and/or recording the result of a local quality assessment operation.
References
(Mostly copied from Matt's email for now)
[1] Gamble, Matthew and Goble, Carole (2011). Quality, Trust, and Utility of Scientific Data on the Web: Towards a Joint Model. pp. 1-8. In: Proceedings of the ACM WebSci'11, June 14-17 2011, Koblenz, Germany. http://journal.webscience.org/443/
[2] Bizer, Christian. 2007. Quality-Driven Information Filtering in the Context of Web-Based Information Systems. Freie Universität Berlin.
[3] Gamble, Matthew and Goble, Carole (2011). Quality, Trust, and Utility of Scientific Data on the Web: Towards a Joint Model. In: Proceedings of the International Conference on Web Science 2011 (WebSci11).
[4] Naumann, Felix, and Claudia Rolker. 2000. Assessment methods for information quality criteria. Information Systems.
[5] Missier, Paolo. 2008. Modelling and Computing the Quality of Information in e-Science, PhD Thesis. http://www.cs.man.ac.uk/%7Epmissier/docs/Missier-Thesis.pdf
[6] Naumann, F, and C Rolker. 1999. Do Metadata Models meet IQ Requirements? Conference on Information Quality (IQ).
[7] RO command line tool. https://github.com/wf4ever/ro-manager
[8] Qurator data quality ontology in OWL (referenced by [5]): http://users.cs.cf.ac.uk/A.D.Preece/qurator/resources/DQOntology.owl
[9] MIBBI: Minimum Information for Biological and Biomedical Investigations. http://mibbi.org/
[10] JERM: Just Enough Results Model. http://www.sysmo-db.org/jerm
Appendix: relationship to overall Wf4Ever architecture
The "Integrity & Authenticity evaluation service", according to the project's decoupled architecture, corresponds to one of the red boxes in the image below, which was captured at the project Architecture face-to-face meeting in Oxford:
