An improved automatic term recognition method for Spanish

Alberto Barron, Gerardo Sierra, Patrick Drouin, Sophia Ananiadou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates with a new approach to the stop-list application process. The second modification adapts the C-value calculation formula to consider single-word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of a candidate's context window is variable. We also show the linguistic modifications necessary to apply this algorithm to the recognition of term candidates in Spanish.
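To make the second and third modifications more concrete, here is a minimal Python sketch of a C-value-style score computed over lemmatised candidates. It follows the commonly cited form of the C-value formula (length weight times frequency, discounted by the mean frequency of the longer candidates that contain the term). The abstract does not state the adapted formula, so the `single_word_offset` parameter, the function name `c_value`, and the toy frequencies are all assumptions made for illustration, not the authors' method.

```python
import math


def c_value(freq, single_word_offset=1.0):
    """Score term candidates with a C-value-style measure.

    freq maps each candidate (a tuple of lemmas) to its corpus frequency.
    single_word_offset is a hypothetical constant added to log2(length) so
    that one-word candidates do not score zero; 0.0 gives the classic formula.
    """
    def is_nested(shorter, longer):
        # True when `shorter` occurs as a contiguous sub-sequence of `longer`.
        m, n = len(shorter), len(longer)
        return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

    scores = {}
    for a, f_a in freq.items():
        weight = single_word_offset + math.log2(len(a))
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Discount by the mean frequency of the longer candidates containing `a`.
            nested_freq = sum(freq[b] for b in containers) / len(containers)
            scores[a] = weight * (f_a - nested_freq)
        else:
            scores[a] = weight * f_a
    return scores


# Toy frequencies over lemmatised Spanish candidates (invented for illustration).
freq = {
    ("sistema",): 10,
    ("sistema", "nervioso"): 6,
    ("sistema", "nervioso", "central"): 2,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

With `single_word_offset=0.0` the single-word candidate scores zero because log2(1) = 0, which is why some adaptation of the formula is needed before single-word terms can be ranked at all. Keying the dictionary on lemma tuples means inflected variants of a candidate share one entry, in the spirit of the lemma-based grouping the abstract describes.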

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 125-136
Number of pages: 12
Volume: 5449 LNCS
DOI: https://doi.org/10.1007/978-3-642-00382-0_10
Publication status: Published - 2009
Externally published: Yes
Event: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 - Mexico City, Mexico
Duration: 1 Mar 2009 to 7 Mar 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5449 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Country: Mexico
City: Mexico City
Period: 1/3/09 to 7/3/09

Fingerprint

Term
Linguistics
Hybrid Approach
Maximise
Necessary
Output

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Barron, A., Sierra, G., Drouin, P., & Ananiadou, S. (2009). An improved automatic term recognition method for Spanish. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5449 LNCS, pp. 125-136). https://doi.org/10.1007/978-3-642-00382-0_10

@inproceedings{607311825a2946358105862e22f75313,
title = "An improved automatic term recognition method for {S}panish",
abstract = "The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates with a new approach to the stop-list application process. The second modification adapts the C-value calculation formula to consider single-word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of a candidate's context window is variable. We also show the linguistic modifications necessary to apply this algorithm to the recognition of term candidates in Spanish.",
author = "Alberto Barron and Gerardo Sierra and Patrick Drouin and Sophia Ananiadou",
year = "2009",
doi = "10.1007/978-3-642-00382-0_10",
language = "English",
isbn = "3642003818",
volume = "5449 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "125--136",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - An improved automatic term recognition method for Spanish

AU - Barron, Alberto

AU - Sierra, Gerardo

AU - Drouin, Patrick

AU - Ananiadou, Sophia

PY - 2009

Y1 - 2009

AB - The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates with a new approach to the stop-list application process. The second modification adapts the C-value calculation formula to consider single-word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of a candidate's context window is variable. We also show the linguistic modifications necessary to apply this algorithm to the recognition of term candidates in Spanish.

UR - http://www.scopus.com/inward/record.url?scp=67650561675&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650561675&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-00382-0_10

DO - 10.1007/978-3-642-00382-0_10

M3 - Conference contribution

SN - 3642003818

SN - 9783642003813

VL - 5449 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 125

EP - 136

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -