A flexible representation of heterogeneous annotation data

Richard Johansson, Alessandro Moschitti

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper describes a new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of media such as images or audio files. We argue that existing frameworks are not suitable for this purpose, most importantly because they do not easily generalize to multi-document and multimodal corpora, and because they often require the use of particular software frameworks. In the paper, we define a data model to represent such structured data over multimodal collections. Furthermore, we define a surface realization of the data structure as a simple and readable XML format. We present two examples of annotation tasks to illustrate how the representation and format work for complex structures involving multimodal annotation and cross-document links. The representation described here has been used in a large-scale project focusing on the annotation of a wide range of information - from low-level features to high-level semantics - in a multimodal data collection containing both text and images.

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 3712-3715
Number of pages: 4
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - 1 Jan 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 to 23 May 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country: Malta
City: Valletta
Period: 17/5/10 to 23/5/10

Fingerprint

semantics
Annotation
Data Collection
software
File
Metadata

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Johansson, R., & Moschitti, A. (2010). A flexible representation of heterogeneous annotation data. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 3712-3715). European Language Resources Association (ELRA).

@inproceedings{01cff9d2fa184b10bef1d6ade9e9c121,
title = "A flexible representation of heterogeneous annotation data",
author = "Richard Johansson and Alessandro Moschitti",
year = "2010",
month = "1",
day = "1",
language = "English",
pages = "3712--3715",
booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN

T1 - A flexible representation of heterogeneous annotation data

AU - Johansson, Richard

AU - Moschitti, Alessandro

PY - 2010/1/1

Y1 - 2010/1/1

AB - This paper describes a new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of media such as images or audio files. We argue that existing frameworks are not suitable for this purpose, most importantly because they do not easily generalize to multi-document and multimodal corpora, and because they often require the use of particular software frameworks. In the paper, we define a data model to represent such structured data over multimodal collections. Furthermore, we define a surface realization of the data structure as a simple and readable XML format. We present two examples of annotation tasks to illustrate how the representation and format work for complex structures involving multimodal annotation and cross-document links. The representation described here has been used in a large-scale project focusing on the annotation of a wide range of information - from low-level features to high-level semantics - in a multimodal data collection containing both text and images.

UR - http://www.scopus.com/inward/record.url?scp=84889939664&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84889939664&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84889939664

SP - 3712

EP - 3715

BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

PB - European Language Resources Association (ELRA)

ER -