Towards robust linguistic analysis using OntoNotes

Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, Zhi Zhong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Citations (Scopus)

Abstract

Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Up till now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank, the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

Original language: English
Title of host publication: CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 143-152
Number of pages: 10
ISBN (Electronic): 9781937284701
Publication status: Published - 1 Jan 2013
Event: 17th Conference on Computational Natural Language Learning, CoNLL 2013 - Sofia, Bulgaria
Duration: 8 Aug 2013 to 9 Aug 2013

Publication series

Name: CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 17th Conference on Computational Natural Language Learning, CoNLL 2013
Country: Bulgaria
City: Sofia
Period: 8/8/13 to 9/8/13

Fingerprint

Linguistics
Semantics
Syntactics
genre
evaluation
discourse
integrated system
language
syntax
performance
bank

ASJC Scopus subject areas

  • Linguistics and Language
  • Artificial Intelligence
  • Human-Computer Interaction

Cite this

Pradhan, S., Moschitti, A., Xue, N., Ng, H. T., Björkelund, A., Uryupina, O., ... Zhong, Z. (2013). Towards robust linguistic analysis using OntoNotes. In CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings (pp. 143-152). (CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings). Association for Computational Linguistics (ACL).

Towards robust linguistic analysis using OntoNotes. / Pradhan, Sameer; Moschitti, Alessandro; Xue, Nianwen; Ng, Hwee Tou; Björkelund, Anders; Uryupina, Olga; Zhang, Yuchen; Zhong, Zhi.

CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings. Association for Computational Linguistics (ACL), 2013. p. 143-152 (CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings).


Pradhan, S, Moschitti, A, Xue, N, Ng, HT, Björkelund, A, Uryupina, O, Zhang, Y & Zhong, Z 2013, Towards robust linguistic analysis using OntoNotes. in CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings. CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings, Association for Computational Linguistics (ACL), pp. 143-152, 17th Conference on Computational Natural Language Learning, CoNLL 2013, Sofia, Bulgaria, 8/8/13.
Pradhan S, Moschitti A, Xue N, Ng HT, Björkelund A, Uryupina O et al. Towards robust linguistic analysis using OntoNotes. In CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings. Association for Computational Linguistics (ACL). 2013. p. 143-152. (CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings).
Pradhan, Sameer ; Moschitti, Alessandro ; Xue, Nianwen ; Ng, Hwee Tou ; Björkelund, Anders ; Uryupina, Olga ; Zhang, Yuchen ; Zhong, Zhi. / Towards robust linguistic analysis using OntoNotes. CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings. Association for Computational Linguistics (ACL), 2013. pp. 143-152 (CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings).
@inproceedings{6a057681c7d144bc9b910aeff18c0a58,
title = "Towards robust linguistic analysis using {OntoNotes}",
abstract = "Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Up till now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank, the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.",
author = "Sameer Pradhan and Alessandro Moschitti and Nianwen Xue and Ng, {Hwee Tou} and Anders Bj{\"o}rkelund and Olga Uryupina and Yuchen Zhang and Zhi Zhong",
year = "2013",
month = "1",
day = "1",
language = "English",
series = "CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings",
publisher = "Association for Computational Linguistics (ACL)",
pages = "143--152",
booktitle = "CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings",
}

TY - GEN

T1 - Towards robust linguistic analysis using OntoNotes

AU - Pradhan, Sameer

AU - Moschitti, Alessandro

AU - Xue, Nianwen

AU - Ng, Hwee Tou

AU - Björkelund, Anders

AU - Uryupina, Olga

AU - Zhang, Yuchen

AU - Zhong, Zhi

PY - 2013/1/1

Y1 - 2013/1/1

N2 - Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Up till now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank, the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

AB - Large-scale linguistically annotated corpora have played a crucial role in advancing the state of the art of key natural language technologies such as syntactic, semantic and discourse analyzers, and they serve as training data as well as evaluation benchmarks. Up till now, however, most of the evaluation has been done on monolithic corpora such as the Penn Treebank, the Proposition Bank. As a result, it is still unclear how the state-of-the-art analyzers perform in general on data from a variety of genres or domains. The completion of the OntoNotes corpus, a large-scale, multi-genre, multilingual corpus manually annotated with syntactic, semantic and discourse information, makes it possible to perform such an evaluation. This paper presents an analysis of the performance of publicly available, state-of-the-art tools on all layers and languages in the OntoNotes v5.0 corpus. This should set the benchmark for future development of various NLP components in syntax and semantics, and possibly encourage research towards an integrated system that makes use of the various layers jointly to improve overall performance.

UR - http://www.scopus.com/inward/record.url?scp=85072757969&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072757969&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85072757969

T3 - CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings

SP - 143

EP - 152

BT - CoNLL 2013 - 17th Conference on Computational Natural Language Learning, Proceedings

PB - Association for Computational Linguistics (ACL)

ER -