Wide-coverage Spanish named entity extraction

Xavier Carreras, Lluis Marques, Lluís Padró

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This paper presents a proposal for wide-coverage Named Entity Extraction for Spanish. The extraction of named entities is treated using robust Machine Learning techniques (AdaBoost) and simple attributes requiring non-linguistically processed corpora, complemented with external information sources (a list of trigger words and a gazetteer). A thorough evaluation of the task on real corpora is presented in order to validate the appropriateness of the approach. The non-linguistic nature of the features used makes the approach easily portable to other languages.
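The abstract names the ingredients of the approach: simple token attributes computed from raw text, a trigger-word list, a gazetteer, and AdaBoost as the learner. A minimal Python sketch of how such pieces can fit together follows, assuming a binary entity/non-entity decision per token and discrete AdaBoost over one-feature decision stumps. It is not the authors' system: their exact attributes, tagging scheme, and AdaBoost variant are not described in this record, and the trigger words, gazetteer entries, feature names, helper functions, and toy sentences are all hypothetical.

```python
import math

# Hypothetical external resources, for illustration only (not the paper's lists).
TRIGGER_WORDS = {"presidente", "señor", "ciudad", "universidad"}
GAZETTEER = {"sevilla", "madrid", "españa"}


def token_features(tokens, i):
    """Binary attributes computed from raw tokens only (no POS tags or parsing)."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return {
        "capitalized": word[:1].isupper(),
        "all_caps": len(word) > 1 and word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "sentence_initial": i == 0,
        "prev_is_trigger": prev in TRIGGER_WORDS,  # trigger-word list lookup
        "in_gazetteer": word.lower() in GAZETTEER,  # gazetteer lookup
    }


def stump(x, feature, polarity):
    """One-feature decision stump: returns +1 or -1."""
    return polarity if x[feature] else -polarity


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost over decision stumps; labels y are +1 (entity) or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    features = sorted(X[0])
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for f in features:
            for polarity in (1, -1):
                err = sum(w for x, t, w in zip(X, y, weights)
                          if stump(x, f, polarity) != t)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        if err >= 0.5:  # no stump beats chance on the current weights
            break
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, polarity))
        if err <= 1e-10:
            break
        # Up-weight misclassified tokens, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * t * stump(x, f, polarity))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the weighted vote of the selected stumps."""
    score = sum(alpha * stump(x, f, p) for alpha, f, p in model)
    return 1 if score >= 0 else -1


# Hypothetical toy training data: +1 marks tokens that belong to a named entity.
sentences = [["El", "presidente", "García", "visitó", "Sevilla"],
             ["la", "universidad", "abre", "hoy"]]
labels = [[-1, -1, 1, -1, 1], [-1, -1, -1, -1]]

X = [token_features(s, i) for s in sentences for i in range(len(s))]
y = [lab for seq in labels for lab in seq]
model = train_adaboost(X, y, rounds=3)

new_tokens = ["Visitó", "Madrid", "ayer"]
tags = [predict(model, token_features(new_tokens, i)) for i in range(len(new_tokens))]
print(tags)  # [-1, 1, -1]
```

Each boosting round picks the single attribute whose stump has the lowest weighted error and then re-weights the tokens it got wrong, so later rounds concentrate on cases such as a capitalized, sentence-initial word that is not an entity. Because every attribute is a surface test or a list lookup, porting the sketch to another language would mainly mean swapping the two word lists, in line with the portability claim in the abstract.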

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 674-683
Number of pages: 10
Volume: 2527 LNAI
Publication status: Published - 1 Dec 2002
Externally published: Yes
Event: 8th Ibero-American Conference on Artificial Intelligence, IBERAMIA 2002 - Seville, Spain
Duration: 12 Nov 2002 to 15 Nov 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2527 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th Ibero-American Conference on Artificial Intelligence, IBERAMIA 2002
Country: Spain
City: Seville
Period: 12/11/02 to 15/11/02

Fingerprint

Adaptive boosting
Linguistics
Learning systems
Coverage
AdaBoost
Trigger
Machine Learning
Attribute
Evaluation
Corpus
Language

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Carreras, X., Marques, L., & Padró, L. (2002). Wide-coverage Spanish named entity extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2527 LNAI, pp. 674-683). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2527 LNAI).

Wide-coverage Spanish named entity extraction. / Carreras, Xavier; Marques, Lluis; Padró, Lluís.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2527 LNAI 2002. p. 674-683 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2527 LNAI).


Carreras, X, Marques, L & Padró, L 2002, Wide-coverage Spanish named entity extraction. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 2527 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2527 LNAI, pp. 674-683, 8th Ibero-American Conference on Artificial Intelligence, IBERAMIA 2002, Seville, Spain, 12/11/02.
Carreras X, Marques L, Padró L. Wide-coverage Spanish named entity extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2527 LNAI. 2002. p. 674-683. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Carreras, Xavier ; Marques, Lluis ; Padró, Lluís. / Wide-coverage Spanish named entity extraction. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 2527 LNAI 2002. pp. 674-683 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{dc3b5908135f4892842802b8c0516570,
title = "Wide-coverage {S}panish named entity extraction",
abstract = "This paper presents a proposal for wide-coverage Named Entity Extraction for Spanish. The extraction of named entities is treated using robust Machine Learning techniques (AdaBoost) and simple attributes requiring non-linguistically processed corpora, complemented with external information sources (a list of trigger words and a gazetteer). A thorough evaluation of the task on real corpora is presented in order to validate the appropriateness of the approach. The non-linguistic nature of the features used makes the approach easily portable to other languages.",
author = "Xavier Carreras and Lluis Marques and Llu{\'i}s Padr{\'o}",
year = "2002",
month = "12",
day = "1",
language = "English",
isbn = "354000131X",
volume = "2527",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "674--683",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Wide-coverage Spanish named entity extraction

AU - Carreras, Xavier

AU - Marques, Lluis

AU - Padró, Lluís

PY - 2002/12/1

Y1 - 2002/12/1

N2 - This paper presents a proposal for wide-coverage Named Entity Extraction for Spanish. The extraction of named entities is treated using robust Machine Learning techniques (AdaBoost) and simple attributes requiring non-linguistically processed corpora, complemented with external information sources (a list of trigger words and a gazetteer). A thorough evaluation of the task on real corpora is presented in order to validate the appropriateness of the approach. The non-linguistic nature of the features used makes the approach easily portable to other languages.

AB - This paper presents a proposal for wide-coverage Named Entity Extraction for Spanish. The extraction of named entities is treated using robust Machine Learning techniques (AdaBoost) and simple attributes requiring non-linguistically processed corpora, complemented with external information sources (a list of trigger words and a gazetteer). A thorough evaluation of the task on real corpora is presented in order to validate the appropriateness of the approach. The non-linguistic nature of the features used makes the approach easily portable to other languages.

UR - http://www.scopus.com/inward/record.url?scp=79952269721&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952269721&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:79952269721

SN - 354000131X

SN - 9783540001317

VL - 2527 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 674

EP - 683

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -