OLiA core concepts for linguistic annotations.
2008-01-13 created
2010-04-06 removed deprecated Category (equiv UnitOfAnnotation) category
2010-04-14 added AnnotationProcess (cf. DCR process directory)
2011-07-15 replaced base url by purl
2011-07-27 added hasTagMatching with full support for XSLT-style regular expressions
2013-06-27 added ISOcat reference for LinguisticAnnotation
Christian Chiarcos, chiarcos@uni-potsdam.de
TODO: LinguisticAnnotation disjoint
Linguistic annotations pertain to either structural entities (words, tokens, constituents, sentences => UnitOfAnnotation), relations between these (dependency, dominance, coreference, etc. => Relation), or grammatical features attached to these (case, gender, number, agreement, tense, mood, aspect, ... => Feature).
http://www.isocat.org/datcat/DC-1857
label: Text attached to an element
A UnitOfAnnotation is a structural entity that can be annotated, e.g., a word, token, constituent, or another type of markable.
Word classes, etc., are then modelled as indirect children of UnitOfAnnotation. The division between Features and classes of UnitsOfAnnotation follows conventional standards.
UnitsOfAnnotation and Relations can be described in more detail by the features that are attached to them, e.g., case, number, or aspect. Features themselves, however, are not subject to further annotation (by means of hasFeature); they are thus disjoint from Relations and UnitsOfAnnotation.
1
1
Between instances of two Categories, a Relation can be established that links them together, e.g., a dominance relation (constituent X is grammatical subject of sentence Y), a dependency relation (token X is a modifier of token Y), a discourse relation (discourse unit X is in contrast to discourse unit Y), an anaphoric relation (referring expression X is coreferent with referring expression Y), or an alignment relation (word X expresses the same meaning as word Y).
Note that Relation and UnitOfAnnotation are not disjoint, because in some approaches, establishing a Relation between two concepts entails that a structural entity is formed, consisting of Relation and the Categories connected by the Relation, e.g., in Rhetorical Structure Theory (Mann and Thompson 1987).
1
1
A Relation is a directed edge between a source and a target concept.
A UnitOfAnnotation or a Relation can carry Features that express annotations attached to them. By convention, (tags that represent) Features can be linked with Feature individuals as well. Because of this reflexivity, a predicate like hasDegree(positive) also makes it possible to retrieve the individual positive itself. (This is necessary if positive represents a tag on its own, rather than being a property of a complex tag.)
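The division into UnitOfAnnotation, Relation, and Feature described above can be sketched as plain data structures. This is only an illustration, not part of the ontology: the Python class layout, the field names (label, kind, name, value), and the reading of "token X is a modifier of token Y" as source X and target Y are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    # A Feature deliberately has no `features` field: Features are not
    # subject to further annotation by means of hasFeature.
    name: str   # hypothetical field, e.g. "degree"
    value: str  # hypothetical field, e.g. "positive"


@dataclass
class UnitOfAnnotation:
    label: str  # hypothetical field, e.g. the token string
    features: List[Feature] = field(default_factory=list)


@dataclass
class Relation:
    # A Relation is a directed edge between a source and a target concept.
    kind: str  # hypothetical field, e.g. "dependency"
    source: UnitOfAnnotation
    target: UnitOfAnnotation
    features: List[Feature] = field(default_factory=list)


# "token X is a modifier of token Y" (direction is an assumption here)
token_x = UnitOfAnnotation("big", [Feature("degree", "positive")])
token_y = UnitOfAnnotation("house")
dep = Relation("dependency", source=token_x, target=token_y)
```

Both UnitOfAnnotation and Relation carry a features list, while Feature does not, which mirrors the disjointness stated above.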
Assigns a Linguistic Annotation a String representation, e.g., a particular Part of Speech tag, the respective abbreviation of the grammatical cases used in an annotation scheme, etc.
implicit semantics: hasTag is to be used if the tag is equal to the string value, otherwise use hasTagContaining, hasTagStartingWith, hasTagEndingWith
As opposed to hasTag proper, the string representation defines a substring of a concrete annotation.
The respective linguistic annotation then applies to every element whose annotation (tag) contains this substring.
As opposed to hasTag proper, the string representation specifies only the beginning of a concrete annotation.
The respective linguistic annotation then applies to every element whose annotation (tag) starts with this substring.
As opposed to hasTag proper, the string representation defines the final subsequence of a concrete annotation.
The respective linguistic annotation then applies to every element whose annotation (tag) ends with this substring.
hasTagMatching is a subproperty of hasTag, so that results can still be retrieved when a regular expression match is requested but the tag is defined as an exact value that contains reserved characters.
hasTagMatching accepts regular expressions such as those used in XSLT and XPath, see http://www.w3.org/TR/xquery-operators/#func-matches
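The implicit semantics of the hasTag family can be summarized in a minimal sketch. The function name tag_applies and its signature are hypothetical and not part of OLiA. Python's re module stands in for XPath regular expressions here; the two dialects are similar but not identical, so this is an approximation. Like fn:matches, re.search succeeds when any part of the tag matches unless the pattern is anchored with ^ and $.

```python
import re


def tag_applies(tag: str, prop: str, value: str) -> bool:
    """Check whether a concrete annotation `tag` satisfies `prop` with `value`.

    Hypothetical helper: `prop` is the name of one of the hasTag properties.
    """
    if prop == "hasTag":
        # the tag is equal to the string value
        return tag == value
    if prop == "hasTagContaining":
        # the string value is a substring of the tag
        return value in tag
    if prop == "hasTagStartingWith":
        # the string value is the beginning of the tag
        return tag.startswith(value)
    if prop == "hasTagEndingWith":
        # the string value is the final subsequence of the tag
        return tag.endswith(value)
    if prop == "hasTagMatching":
        # approximation of XPath fn:matches using Python's regex dialect
        return re.search(value, tag) is not None
    raise ValueError("unknown property: " + prop)
```

For example, tag_applies("VVFIN", "hasTagStartingWith", "VV") holds, whereas tag_applies("NNS", "hasTag", "NN") does not, since hasTag requires equality.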
Assigns a linguistic annotation a string representation of the annotation layer ("tier", "level") where it is to be found, e.g., "pos" for Part of Speech annotation, "gloss" for linguistic glosses, etc.
DCR annotation and editing operations ignored, e.g., add first vowel http://www.isocat.org/datcat/DC-2199
categories for annotation and editing operations added to account for the "Processes" profile in the DCR