Automatic expansion of domain-specific lexicons by term categorization

Henri Avancini, Alberto Lavelli, Fabrizio Sebastiani, Roberto Zanoli

Research output: Contribution to journal, Article

12 Citations (Scopus)

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L_0^i$ into a larger lexicon $L_1^i$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
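The abstract's central idea, representing terms as vectors in a space of documents and training a boosted classifier on the initial lexicon, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the four toy documents, the seed lexicons, the binary occurrence weighting, and the plain AdaBoost with one-document decision stumps that stands in for the authors' boosting-based learner. The helper names (`term_vectors`, `stump`, `train_adaboost`, `score`) are hypothetical.

```python
import math

def term_vectors(documents):
    """Represent each term as a binary vector with one slot per document."""
    vectors = {}
    for j, doc in enumerate(documents):
        for term in set(doc.lower().split()):
            vectors.setdefault(term, [0] * len(documents))[j] = 1
    return vectors

def stump(x, j, polarity):
    """Weak hypothesis: vote `polarity` if the term occurs in document j."""
    return polarity if x[j] == 1 else -polarity

def train_adaboost(X, y, rounds=3):
    """Plain discrete AdaBoost over single-document stumps (illustrative)."""
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                # Weighted error of this stump on the training terms.
                err = sum(w for w, x, label in zip(weights, X, y)
                          if stump(x, j, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Increase the weight of misclassified terms, then renormalize.
        weights = [w * math.exp(-alpha * label * stump(x, j, polarity))
                   for w, x, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def score(ensemble, x):
    """Signed confidence that a term belongs to the target domain."""
    return sum(alpha * stump(x, j, polarity) for alpha, j, polarity in ensemble)

# Toy corpus (assumed); the article uses Reuters Corpus Volume 1 instead.
documents = [
    "striker scored a goal in the football match",
    "goalkeeper saved a penalty in the football match",
    "bank raised interest rates on loans and profit fell",
    "investors sold shares as the market and interest rates fell",
]
vectors = term_vectors(documents)

# Initial lexicon for one domain, plus terms from another domain as negatives.
sport_seed = ["goal", "football", "penalty"]
other_seed = ["interest", "loans", "investors", "market"]
X = [vectors[t] for t in sport_seed + other_seed]
y = [1] * len(sport_seed) + [-1] * len(other_seed)

ensemble = train_adaboost(X, y, rounds=3)

# Previously unlabeled terms that the classifier accepts extend the lexicon.
candidates = ["striker", "goalkeeper", "match", "shares", "profit", "bank"]
expanded = [t for t in candidates if score(ensemble, vectors[t]) > 0]
print(sorted(sport_seed + expanded))
```

In this sketch one binary classifier is trained for a single domain; handling a full set of domains would mean repeating the procedure once per domain, with the other domains' seed terms serving as negative examples.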

Original language: English
Pages (from-to): 1-30
Number of pages: 30
Journal: ACM Transactions on Speech and Language Processing
Volume: 3
Issue number: 1
DOI: 10.1145/1138379.1138380
Publication status: Published - 2006
Externally published: Yes

Fingerprint

Categorization
Classifiers
Supervised learning
Term
Labeling
Experiments
Semantics
Classifier
Text Categorization
Supervised Learning
Boosting
Large Set
Experiment
Benchmark

Keywords

  • Lexicons
  • Machine learning
  • Text classification

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computational Mathematics

Cite this

Automatic expansion of domain-specific lexicons by term categorization. / Avancini, Henri; Lavelli, Alberto; Sebastiani, Fabrizio; Zanoli, Roberto.

In: ACM Transactions on Speech and Language Processing, Vol. 3, No. 1, 2006, p. 1-30.

@article{dddf51dd323c4889ac494368a3d23cc8,
title = "Automatic expansion of domain-specific lexicons by term categorization",
abstract = "We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L_0^i$ into a larger lexicon $L_1^i$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.",
keywords = "Lexicons, Machine learning, Text classification",
author = "Henri Avancini and Alberto Lavelli and Fabrizio Sebastiani and Roberto Zanoli",
year = "2006",
doi = "10.1145/1138379.1138380",
language = "English",
volume = "3",
pages = "1--30",
journal = "ACM Transactions on Speech and Language Processing",
issn = "1550-4875",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

TY - JOUR

T1 - Automatic expansion of domain-specific lexicons by term categorization

AU - Avancini, Henri

AU - Lavelli, Alberto

AU - Sebastiani, Fabrizio

AU - Zanoli, Roberto

PY - 2006

Y1 - 2006

N2 - We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L_0^i$ into a larger lexicon $L_1^i$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.

KW - Lexicons

KW - Machine learning

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=33746443840&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746443840&partnerID=8YFLogxK

U2 - 10.1145/1138379.1138380

DO - 10.1145/1138379.1138380

M3 - Article

VL - 3

SP - 1

EP - 30

JO - ACM Transactions on Speech and Language Processing

JF - ACM Transactions on Speech and Language Processing

SN - 1550-4875

IS - 1

ER -