2011-09-26 Annotation model considerations

Archived considerations
This page summarises the considerations behind choosing a model for annotations. In particular, it is an archive of the Annotating section of the RO model wiki page as it stood before the recommendation to adopt AO was made.

Annotating Research objects

Instead of designing a new model for annotating research objects and their constituent resources, we are investigating two RDF-based models, namely the Open Annotation Collaboration (OAC) model and the Annotation Ontology (AO).

Open Annotation Collaboration (OAC)

The description below mainly focuses on OAC as this is the model we investigated first. A comparison with a similar approach using AO follows.

OAC specifies an approach for associating resources with annotations. The annotation model adopted by OAC is illustrated below. An annotation is defined as a document, identified by a URI, which describes the association created between two resources, a body and a target. The annotation body is a resource that "is somehow about" the resource designated by the annotation target.


Figure extracted from http://www.openannotation.org/spec/beta/. A1 is an annotation linking the annotation body B1 with the annotation target T1, meaning that B1 describes T1.

The OAC specification places no requirements on the body; it can be any kind of resource (such as a text document or a video) that in some way describes or talks about the target resource.

Using OAC, one can annotate the research object and its content. In particular, such annotations can be used to describe the aggregated resources as well as to specify relationships between those resources.

In the beta OAC specification, the subclass oac:DataAnnotation is introduced to indicate that the annotation body is a structured data annotation meant for computer consumption. As research objects require structured annotations with relationships and use of controlled vocabularies, we utilize OAC data annotations for denoting annotation bodies containing RDF graphs.

Other "classic OAC" annotations could still be added to the research object, for instance "This PDF talks about this spreadsheet". OAC also allows the use of constraints and fragments to narrow the annotation target to a particular selection of the document. Although such annotations would be limited to "talks about" relationships compared to using rich RO vocabularies in a data annotation, they would allow clients with an understanding of only OAC to relate the two resources, in addition to providing a mechanism for relating to sub-selections of a resource. It is currently out of scope for this document to explore this aspect further.

As an example of a data annotation, the following illustrates how the research object itself can be associated with an annotation body specifying a title. The example also includes metadata about who created the annotation body, i.e. who gave the title.

# Default graph of all annotations
{
 :title a oac:Annotation, oac:DataAnnotation ;
	oac:hasTarget manifest:ro ;
	oac:hasBody ann:titleByStian .

 ann:titleByStian dcterms:created "2011-07-14T15:05:43"^^xsd:dateTime ;
	dcterms:creator _:stian .
}

# Named graph with the actual annotation body.
# Retrieved or embedded separately in formats lacking support for named graphs.
ann:titleByStian {
	manifest:ro dc:title "My research object"@en .
	manifest:ro dc:description "This RO showcases my example workflow"@en .
}

Note that the body of the annotation is here a named graph containing two statements used to associate the research object with a title and a description. Using a named graph representation like TriG allows us to include the annotation body directly, while other representations like RDF/XML would require a separate retrieval of the annotation body resource, RDF reification or Content in RDF.

This separation of annotation metadata and annotation bodies allows multiple structured (and potentially inconsistent) annotations to be asserted about resources in the research object. The annotation bodies are not limited to describing only aggregated resources, but may also relate them to other resources and controlled vocabularies.
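
The separation between annotation metadata and annotation bodies can be sketched in Python. This is a minimal illustration using plain tuples rather than a real RDF library. The quad layout, the helper name bodies_about and the second annotation (:title2 with body ann:titleByOther) are assumptions made for the example and are not part of the RO model.

```python
# Quads are (graph, subject, predicate, object); None stands for the default graph.
DEFAULT = None

quads = [
    # Annotation metadata lives in the default graph.
    (DEFAULT, ":title", "rdf:type", "oac:DataAnnotation"),
    (DEFAULT, ":title", "oac:hasTarget", "manifest:ro"),
    (DEFAULT, ":title", "oac:hasBody", "ann:titleByStian"),
    # Hypothetical second annotation by someone else (not in the original example).
    (DEFAULT, ":title2", "rdf:type", "oac:DataAnnotation"),
    (DEFAULT, ":title2", "oac:hasTarget", "manifest:ro"),
    (DEFAULT, ":title2", "oac:hasBody", "ann:titleByOther"),
    # Each annotation body is its own named graph.
    ("ann:titleByStian", "manifest:ro", "dc:title", "My research object"),
    ("ann:titleByOther", "manifest:ro", "dc:title", "A different title"),
]


def bodies_about(quads, target):
    """Map each body graph of an annotation targeting `target` to its statements."""
    annotations = {s for g, s, p, o in quads
                   if g is DEFAULT and p == "oac:hasTarget" and o == target}
    body_graphs = {o for g, s, p, o in quads
                   if g is DEFAULT and p == "oac:hasBody" and s in annotations}
    return {body: [(s, p, o) for g, s, p, o in quads if g == body]
            for body in body_graphs}


claims = bodies_about(quads, "manifest:ro")
# Titles stay attributed to the body that asserted them, so conflicting
# claims can coexist without being merged into one graph.
titles = {body: o for body, stmts in claims.items()
          for s, p, o in stmts if p == "dc:title"}
```

Because each body stays in its own graph, a client can decide which annotation to trust (for instance based on dcterms:creator) instead of receiving two contradictory titles in a single merged graph.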

OAC allows an annotation to have multiple targets, which enables us to specify relationships between the resources that compose the research object. As an example, the following specifies the relationship between the proxies manifest:inputProxy, manifest:outputProxy and manifest:workflowProxy. It states that there exists a run of the workflow manifest:workflowProxy that consumed manifest:inputProxy and produced manifest:outputProxy. In this example we don't know much more about the workflow run itself (it might be described in depth using provenance ontologies), so it is given only as an anonymous node within the annotation body ("there exists a run such that...").

{
 :roRelation a oac:Annotation, oac:DataAnnotation ;
	oac:hasTarget manifest:inputProxy, manifest:workflowProxy,
	              manifest:outputProxy ;
	oac:hasBody ann:roRelation.
}

ann:roRelation {

 # Classifies the resources within the research object
 manifest:inputProxy a rel:Input .
 manifest:outputProxy a rel:Output .
 manifest:workflowProxy a rel:Workflow .

 # An (anonymous) workflow run relating them to each other
 _:run a rel:Run ;
	rel:hasInput manifest:inputProxy ;
	rel:hasOutput manifest:outputProxy ;
	rel:hasWorkflow manifest:workflowProxy .
}
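
To illustrate how a client might read the "there exists a run such that..." pattern from the annotation body, here is a small Python sketch. It copies the triples of ann:roRelation into plain tuples. The function name runs_of and the shape of its result are assumptions for illustration only.

```python
# Triples of the annotation body ann:roRelation, copied from the example above.
body = [
    ("manifest:inputProxy", "rdf:type", "rel:Input"),
    ("manifest:outputProxy", "rdf:type", "rel:Output"),
    ("manifest:workflowProxy", "rdf:type", "rel:Workflow"),
    ("_:run", "rdf:type", "rel:Run"),
    ("_:run", "rel:hasInput", "manifest:inputProxy"),
    ("_:run", "rel:hasOutput", "manifest:outputProxy"),
    ("_:run", "rel:hasWorkflow", "manifest:workflowProxy"),
]


def runs_of(triples, workflow):
    """Return {run: {"inputs": [...], "outputs": [...]}} for runs of `workflow`."""
    linked = {s for s, p, o in triples
              if p == "rel:hasWorkflow" and o == workflow}
    typed = {s for s, p, o in triples
             if p == "rdf:type" and o == "rel:Run"}
    result = {}
    for run in sorted(linked & typed):
        result[run] = {
            "inputs": [o for s, p, o in triples
                       if s == run and p == "rel:hasInput"],
            "outputs": [o for s, p, o in triples
                        if s == run and p == "rel:hasOutput"],
        }
    return result


found = runs_of(body, "manifest:workflowProxy")
```

The anonymous node _:run only has meaning inside this body graph, which is why the lookup works on the body's triples alone.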


Example RDF Graph of Research Object using OAC data annotations. Also available as PDF, OmniGraffle

Annotation Ontology (AO)

As an alternative to OAC, annotations could be specified using the Annotation Ontology. The Annotation Ontology provides a common model for document metadata derived from text mining and manual annotation of scientific papers. Specifically, it provides the means for annotating electronic documents or parts of electronic documents. Different kinds of annotations can be classified, e.g. comments, notes, examples and errata.

AO is normally applied such that an annotation is the creation of a link between an annotated document and an annotation topic.


Figure from http://code.google.com/p/annotation-ontology/wiki/Annotation. A document has been annotated as having the topic of the enzyme beta-secretase 1. Additional metadata about when the document was retrieved is provided using PAV.

Note that the statements within the "annotation topic" box (i.e. the name) are not part of the annotation; they are just additional information about a term from a controlled vocabulary. In AO, the annotation is viewed as a "yellow marker"-type entity that links a (previously known) topic with a document.

From the example above (using aot:Qualifier) one could strictly argue that for our annotation bodies, AO should be applied 'opposite' to how we used OAC, as the annotation bodies have the aggregated resources as their topics. We feel that this is somewhat counter-intuitive, as our motivation was to find a mechanism for attaching rich descriptions to aggregated resources. However, AO encourages specialisation through subclassing ao:Annotation. For instance, an aot:Note uses ao:body to attach a free-text note describing (a sub-selection of) the annotated document.

AO can therefore be used in a manner similar to OAC data annotations, by introducing a new subclass ro:DataAnnotation. In this style, ao:annotatesResource (rather than aof:annotatesDocument, as we're not sure if our resource can be considered a foaf:Document) is used to indicate the annotated resource, and ao:body is used to indicate the content of the annotation, again referenced as a named graph or separate resource.

No ao:context is provided unless the annotation is about a selection of the resource. An advantage over OAC here is that clients that don't understand AO selectors can always simply follow ao:annotatesResource to find the aggregated resource. In OAC, such selections are expressed by indirection: oac:hasTarget points to a constrained target or a fragment selector rather than to the resource itself.

AO example assuming ro:DataAnnotation using named graphs in TriG format:

{
 :description a ao:Annotation, ro:DataAnnotation ;
    ao:annotatesResource manifest:ro ;
    ao:body ann:titleByStian ;
    pav:createdBy _:stian ;
    pav:createdOn "2011-07-14T15:02:14Z"^^xsd:dateTime .
}

ann:titleByStian {
	manifest:ro dc:title "My research object"@en .
	manifest:ro dc:description "This RO showcases my example workflow"@en .
}


Example RDF Graph of Research Object using AO annotations as nested graphs. Also available as PDF, OmniGraffle

Update:

2012-03-15 Open Annotation

An effort to merge the AO and OAC models into the "Open Annotation" model has started as part of the W3C Open Annotation Community Group.

On 2012-03-15 the AO and OAC communities met in Boston for an Open Annotation Technical Meeting. Many agreements were reached on merging the two models.

Here is a picture of Robert Sanderson presenting the outcome of the meeting, the "OA" model:

  • A1 is the oa:Annotation. It has at least one oa:hasTarget, the resource(s) this annotation is about.
  • oa:hasBody points to an (optional) resource which is somehow about the target(s). Thus this annotation says that S1 is about ST. S1 might be a retrievable web resource, or be identified by a UUID with the (typically lightweight) body embedded using Content in RDF.
  • The annotation can carry any number of oa:hasSemanticTag statements. This is a scruffy way of saying that an ontological term is related to the target.
  • The annotation can be an instance of the subclass oa:DataAnnotation, which indicates that the (now required) body is intended for computational processing.
  • oa:GraphAnnotation is a subclass of oa:DataAnnotation, indicating that the body can be seen as an RDF graph. The graph can either be retrieved from the URI of the body, be embedded as a known RDF serialisation using Content in RDF, or be a named graph (if the annotation is within a quad serialisation or quad store).
  • The target can be given as any resource directly (oa:hasTarget <http://example.com/>) or through an intermediate oa:SpecificTarget node, which indicates the resource using the required oa:hasSource. The annotation (and its body) can then talk about the specific target rather than the whole resource, for instance a part of an image instead of the whole image.
  • oa:hasSetup can specify 0 or more alternate oa:Setups describing how the resource is to be retrieved and prepared before applying the selectors. For instance, a subclass of oa:Setup could specify HTTP accept and language headers for retrieval, while a more specific subclass could specify how to rotate a 3D molecule. Multiple setups indicate alternatives - applications SHOULD try to apply one of the setups, but choose between them according to their own preferences.
  • oa:hasSelector indicates 0 or more alternate oa:Selectors for specifying a subset of the resource - for instance a paragraph in an HTML page or a rectangular cutout from an image. This is an extension point; the OA core does not specify any selectors, except possibly the oa:AllSelector (?). Multiple selectors indicate alternative ways to select the same intended selection, although they are not required to match one-to-one (for instance, a rectangle selector vs. a polygon selector). As this is an extension point, different domains will define different selectors, and applications can pick the selectors they understand (if any), applying their own preferences when more than one is usable.
  • oa:hasStyle indicates 0 or more alternate oa:Styles for specifying how the annotator would prefer the target to be rendered. For instance, a CSS-based style specifying border: red can make a red border appear around the selection.
  • The application is free to find which combination of setup, selector and style it can apply, in particular for rendering an annotation.
  • Annotation metadata should be standardized to specify the generator, creator and creation time (created), but no decision was made on whether these should be taken from existing vocabularies like Dublin Core Terms or PAV. The distinction between creator and generator is that the generator is what made the RDF (typically software), while the creator is who made the annotation (typically a person). These could also be specified on the body if they differ from the creator, generator and creation time of the annotation.
  • The oa:modelVersion (called 'version' in the picture) specifies the version of OA used.
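
The rules in the list above can be sketched as a small Python model. This is only an illustration of how a client might check an annotation, resolve a target to its source and choose among alternative selectors. The class layout, the function names and the selector kinds ("rectangle", "polygon") are assumptions, since the OA core leaves selectors as an extension point.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Selector:
    kind: str    # hypothetical selector type name, e.g. "rectangle"
    value: str   # selector-specific payload, defined by the extension


@dataclass
class SpecificTarget:
    has_source: str                                          # required oa:hasSource
    selectors: List[Selector] = field(default_factory=list)  # alternatives


@dataclass
class Annotation:
    targets: List[Union[str, SpecificTarget]]  # oa:hasTarget, at least one
    body: Optional[str] = None                 # oa:hasBody, optional
    is_data_annotation: bool = False           # typed as oa:DataAnnotation


def validate(ann):
    """Return a list of problems according to the rules listed above."""
    problems = []
    if not ann.targets:
        problems.append("annotation needs at least one oa:hasTarget")
    if ann.is_data_annotation and ann.body is None:
        problems.append("oa:DataAnnotation requires a body")
    return problems


def source_of(target):
    """Resolve a direct or specific target to the underlying resource."""
    return target.has_source if isinstance(target, SpecificTarget) else target


def pick_selector(target, understood):
    """Pick an alternative selector the application understands.

    `understood` lists selector kinds in the application's own order of
    preference; None is returned if no selector can be applied.
    """
    if not isinstance(target, SpecificTarget):
        return None
    for kind in understood:
        for selector in target.selectors:
            if selector.kind == kind:
                return selector
    return None


region = SpecificTarget(
    "http://example.com/image.png",
    [Selector("polygon", "0,0 40,0 20,30"), Selector("rectangle", "10,10,50,50")],
)
ann = Annotation(targets=[region, "http://example.com/"],
                 body="ann1.rdf", is_data_annotation=True)
chosen = pick_selector(region, ["rectangle", "polygon"])
```

A client that understands no selector at all still gets a usable answer from source_of, which mirrors the point that oa:hasSource is required on an oa:SpecificTarget.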

Overall the new proposed model should be a good fit for our case, in particular because oa:GraphAnnotation matches our ro:GraphAnnotation.

Only minimal changes to our current use of AO would be needed to use the new OA instead.

For example, our current RO 0.2 approach using AO 2:

:ann1 a ao:GraphAnnotation ;
  ao:annotatesResource <aworkflow.t2flow> ;
  ao:body <ann1.rdf> .

becomes:

:ann1 a oa:GraphAnnotation ;
  oa:hasTarget <aworkflow.t2flow> ;
  oa:hasBody <ann1.rdf> .
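
The rewrite shown above is mechanical enough to sketch as a term mapping in Python. The mapping table is taken from the before/after example; the function name to_oa and the tuple representation of triples are assumptions for illustration.

```python
# Term mapping taken from the before/after example above.
AO_TO_OA = {
    "ao:GraphAnnotation": "oa:GraphAnnotation",
    "ao:annotatesResource": "oa:hasTarget",
    "ao:body": "oa:hasBody",
}


def to_oa(triples):
    """Rewrite mapped predicates and class names; leave other terms untouched."""
    return [(s, AO_TO_OA.get(p, p), AO_TO_OA.get(o, o)) for s, p, o in triples]


ao_triples = [
    (":ann1", "rdf:type", "ao:GraphAnnotation"),
    (":ann1", "ao:annotatesResource", "<aworkflow.t2flow>"),
    (":ann1", "ao:body", "<ann1.rdf>"),
    # Metadata has no agreed OA equivalent yet, so it passes through unchanged.
    (":ann1", "pav:createdBy", "_:stian"),
]
oa_triples = to_oa(ao_triples)
```

Unmapped terms such as pav:createdBy are deliberately left as they are, since no OA vocabulary for creator and date metadata had been agreed on.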

But the devil is in the detail, and the DC/PAV style metadata (creator, date) has not yet been agreed on.
