Automatic expansion of domain-specific lexicons by term categorization

Henri Avancini, Alberto Lavelli, Fabrizio Sebastiani, Roberto Zanoli

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L^i_0$ into a larger lexicon $L^i_1$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
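The dual representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the seed lexicons, and the helper name `term_vectors` are hypothetical, raw occurrence counts stand in for whatever term weighting the article uses, and the boosting-based learner itself is omitted. The sketch only shows how each term becomes a vector with one dimension per document, and how the initial lexicons supply the labeled training examples.

```python
from collections import Counter


def term_vectors(documents):
    """Map each term to a vector of its counts, one dimension per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms that do not occur in a document.
    return {term: [c[term] for c in counts] for term in vocabulary}


# Hypothetical toy corpus and seed lexicons (assumptions, not data from the article).
documents = [
    "the bank raised the interest rate",
    "the striker scored a late goal",
    "interest on the loan fell as the bank cut its rate",
]
seed_lexicons = {"economy": {"bank", "loan"}, "sport": {"goal"}}

vectors = term_vectors(documents)

# Terms already in an initial lexicon become labeled training examples.
training = [
    (vectors[term], domain)
    for domain, terms in seed_lexicons.items()
    for term in sorted(terms)
    if term in vectors
]

# All remaining terms are candidates for categorization by a trained classifier.
unlabeled = sorted(
    term
    for term in vectors
    if not any(term in terms for terms in seed_lexicons.values())
)
```

In this toy setting, "interest" is unlabeled but occurs in the same documents as "bank", which is the kind of distributional evidence a term classifier trained on `training` could use to assign it to a domain.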

Original language: English
Pages (from-to): 1-30
Number of pages: 30
Journal: ACM Transactions on Speech and Language Processing
Volume: 3
Issue number: 1
Publication status: Published - 2006
Externally published: Yes

Keywords

  • Lexicons
  • Machine learning
  • Text classification

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computational Mathematics
