Sentence-based active learning strategies for information extraction

Andrea Esuli, Diego Marcheggiani, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Given a classifier trained on relatively few training examples, active learning (AL) consists in ranking a set of unlabeled examples in terms of how informative they would be, if manually labeled, for retraining a (hopefully) better classifier. An important text learning task in which AL is potentially useful is information extraction (IE), namely, the task of identifying within a text the expressions that instantiate a given concept. We contend that, unlike in other text learning tasks, IE is unique in that it does not make sense to rank individual items (i.e., word occurrences) for annotation, and that the minimal unit of text that is presented to the annotator should be an entire sentence. In this paper we propose a range of active learning strategies for IE that are based on ranking individual sentences, and experimentally compare them on a standard dataset for named entity extraction. Copyright owned by the authors.
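The abstract does not say which ranking strategies the paper proposes, so the following is only an illustrative sketch of the general idea of sentence-level ranking, not the authors' method. It assumes a hypothetical classifier that outputs, for each token, the probability that the token belongs to the target concept. Token uncertainties are combined into a single sentence score, and sentences are ranked by that score. All names (`token_uncertainty`, `score_sentence`, `rank_sentences`) and the choice of `max` or `mean` as combination functions are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: each sentence is represented by one probability per token,
# i.e. the classifier's confidence that the token instantiates the concept.
TokenProbs = Sequence[float]


def token_uncertainty(p: float) -> float:
    """Uncertainty of a binary token decision: 1.0 at p=0.5, 0.0 at p=0 or p=1."""
    return 1.0 - abs(2.0 * p - 1.0)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def score_sentence(probs: TokenProbs,
                   combine: Callable[[Sequence[float]], float]) -> float:
    """Combine token-level uncertainties into one sentence-level score."""
    if not probs:
        return 0.0  # an empty sentence carries no information
    return combine([token_uncertainty(p) for p in probs])


def rank_sentences(sentences: Dict[str, TokenProbs],
                   combine: Callable[[Sequence[float]], float] = max) -> List[str]:
    """Return sentence ids ordered from most to least informative."""
    return sorted(sentences,
                  key=lambda sid: score_sentence(sentences[sid], combine),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical token probabilities for three unlabeled sentences.
    unlabeled = {
        "s1": [0.9, 0.95, 0.99],
        "s2": [0.5, 0.9],
        "s3": [0.7, 0.6, 0.65],
    }
    print(rank_sentences(unlabeled, max))
    print(rank_sentences(unlabeled, mean))
```

The two combination functions can disagree: `max` favours a sentence containing one very uncertain token, while `mean` favours a sentence whose tokens are uncertain on average. The annotator is shown the whole sentence in either case.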

Original language: English
Title of host publication: CEUR Workshop Proceedings
Publisher: CEUR-WS
Pages: 41-45
Number of pages: 5
Volume: 560
Publication status: Published - 2010
Externally published: Yes
Event: 1st Italian Information Retrieval Workshop, IIR 2010 - Padua, Italy
Duration: 27 Jan 2010 to 28 Jan 2010

Keywords

  • Active learning
  • Information extraction
  • Named entity recognition
  • Selective sampling

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Esuli, A., Marcheggiani, D., & Sebastiani, F. (2010). Sentence-based active learning strategies for information extraction. In CEUR Workshop Proceedings (Vol. 560, pp. 41-45). CEUR-WS.

Sentence-based active learning strategies for information extraction. / Esuli, Andrea; Marcheggiani, Diego; Sebastiani, Fabrizio.

CEUR Workshop Proceedings. Vol. 560 CEUR-WS, 2010. p. 41-45.

Esuli, A, Marcheggiani, D & Sebastiani, F 2010, Sentence-based active learning strategies for information extraction. in CEUR Workshop Proceedings. vol. 560, CEUR-WS, pp. 41-45, 1st Italian Information Retrieval Workshop, IIR 2010, Padua, Italy, 27/1/10.
Esuli A, Marcheggiani D, Sebastiani F. Sentence-based active learning strategies for information extraction. In CEUR Workshop Proceedings. Vol. 560. CEUR-WS. 2010. p. 41-45
Esuli, Andrea ; Marcheggiani, Diego ; Sebastiani, Fabrizio. / Sentence-based active learning strategies for information extraction. CEUR Workshop Proceedings. Vol. 560 CEUR-WS, 2010. pp. 41-45
@inproceedings{f7ec4b87b4b941edbfc853a21b32c2c6,
title = "Sentence-based active learning strategies for information extraction",
abstract = "Given a classifier trained on relatively few training examples, active learning (AL) consists in ranking a set of unlabeled examples in terms of how informative they would be, if manually labeled, for retraining a (hopefully) better classifier. An important text learning task in which AL is potentially useful is information extraction (IE), namely, the task of identifying within a text the expressions that instantiate a given concept. We contend that, unlike in other text learning tasks, IE is unique in that it does not make sense to rank individual items (i.e., word occurrences) for annotation, and that the minimal unit of text that is presented to the annotator should be an entire sentence. In this paper we propose a range of active learning strategies for IE that are based on ranking individual sentences, and experimentally compare them on a standard dataset for named entity extraction. Copyright owned by the authors.",
keywords = "Active learning, Information extraction, Named entity recognition, Selective sampling",
author = "Andrea Esuli and Diego Marcheggiani and Fabrizio Sebastiani",
year = "2010",
language = "English",
volume = "560",
pages = "41--45",
booktitle = "CEUR Workshop Proceedings",
publisher = "CEUR-WS",

}

TY - GEN

T1 - Sentence-based active learning strategies for information extraction

AU - Esuli, Andrea

AU - Marcheggiani, Diego

AU - Sebastiani, Fabrizio

PY - 2010

Y1 - 2010

N2 - Given a classifier trained on relatively few training examples, active learning (AL) consists in ranking a set of unlabeled examples in terms of how informative they would be, if manually labeled, for retraining a (hopefully) better classifier. An important text learning task in which AL is potentially useful is information extraction (IE), namely, the task of identifying within a text the expressions that instantiate a given concept. We contend that, unlike in other text learning tasks, IE is unique in that it does not make sense to rank individual items (i.e., word occurrences) for annotation, and that the minimal unit of text that is presented to the annotator should be an entire sentence. In this paper we propose a range of active learning strategies for IE that are based on ranking individual sentences, and experimentally compare them on a standard dataset for named entity extraction. Copyright owned by the authors.

KW - Active learning

KW - Information extraction

KW - Named entity recognition

KW - Selective sampling

UR - http://www.scopus.com/inward/record.url?scp=84888269328&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84888269328&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84888269328

VL - 560

SP - 41

EP - 45

BT - CEUR Workshop Proceedings

PB - CEUR-WS

ER -