Expanding domain-specific lexicons by term categorization

Henri Avancini, Alberto Lavelli, Bernardo Magnini, Fabrizio Sebastiani, Roberto Zanoli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Citations (Scopus)

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, ..., c_m} of domains, a lexicon L_1^i, bootstrapping from an initial lexicon L_0^i and a set of documents θ given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
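
The dual representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name a learning algorithm, so a simple nearest-centroid rule with cosine similarity stands in for the learner, and the function names (`term_vectors`, `expand_lexicons`), the `threshold` parameter, and the toy documents are all assumptions made for illustration. Each term becomes a vector indexed by documents, and any unlabelled term close enough to the centroid of a domain's seed terms is added to that domain's lexicon.

```python
from collections import defaultdict
from math import sqrt


def term_vectors(docs):
    """Represent each term as a sparse vector over documents: {term: {doc_index: count}}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for j, doc in enumerate(docs):
        for term in doc.lower().split():
            vectors[term][j] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(j, 0) for j, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(terms, vectors):
    """Average the document-space vectors of the seed terms that occur in the corpus."""
    known = [t for t in terms if t in vectors]
    mean = defaultdict(float)
    for t in known:
        for j, w in vectors[t].items():
            mean[j] += w / len(known)
    return mean


def expand_lexicons(seed, docs, threshold=0.5):
    """Return expanded lexicons: seed terms plus unlabelled terms similar to a domain centroid.

    Assumption: nearest-centroid matching is a stand-in learner, chosen for brevity.
    A term may be assigned to more than one domain.
    """
    vectors = term_vectors(docs)
    centroids = {domain: centroid(terms, vectors) for domain, terms in seed.items()}
    expanded = {domain: set(terms) for domain, terms in seed.items()}
    labelled = set().union(*seed.values())
    for term, vec in vectors.items():
        if term in labelled:
            continue
        for domain, c in centroids.items():
            if cosine(vec, c) >= threshold:
                expanded[domain].add(term)
    return expanded


# Toy input (hypothetical): four short documents and two seed lexicons.
docs = [
    "doctor nurse hospital surgeon",
    "nurse hospital surgeon",
    "goal striker football referee",
    "striker football referee",
]
seed = {"medicine": {"doctor", "nurse"}, "sport": {"goal", "striker"}}
lexicons = expand_lexicons(seed, docs)
```

On this toy input, `hospital` and `surgeon` share documents with the medicine seeds and `football` and `referee` with the sport seeds, so each pair joins the corresponding lexicon. A real system would weight the term vectors (for example with tf-idf) and train a proper classifier per domain; those choices are outside what the abstract specifies.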

Original language: English
Title of host publication: Proceedings of the ACM Symposium on Applied Computing
Editors: G. Lamont
Pages: 793-797
Number of pages: 5
Publication status: Published - 2003
Externally published: Yes
Event: Proceedings of the 2003 ACM Symposium on Applied Computing - Melbourne, FL
Duration: 9 Mar 2003 to 12 Mar 2003

Other

Other: Proceedings of the 2003 ACM Symposium on Applied Computing
City: Melbourne, FL
Period: 9/3/03 to 12/3/03

Fingerprint

Information retrieval
Labeling
Learning systems
Labels

Keywords

  • Lexicon generation
  • Term categorization
  • WordNet

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Avancini, H., Lavelli, A., Magnini, B., Sebastiani, F., & Zanoli, R. (2003). Expanding domain-specific lexicons by term categorization. In G. Lamont (Ed.), Proceedings of the ACM Symposium on Applied Computing (pp. 793-797)

@inproceedings{a412b7b37d3c4c22bd840cfeaf2ad0c3,
title = "Expanding domain-specific lexicons by term categorization",
keywords = "Lexicon generation, Term categorization, WordNet",
author = "Henri Avancini and Alberto Lavelli and Bernardo Magnini and Fabrizio Sebastiani and Roberto Zanoli",
year = "2003",
language = "English",
pages = "793--797",
editor = "G. Lamont",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",

}

TY - GEN

T1 - Expanding domain-specific lexicons by term categorization

AU - Avancini, Henri

AU - Lavelli, Alberto

AU - Magnini, Bernardo

AU - Sebastiani, Fabrizio

AU - Zanoli, Roberto

PY - 2003

Y1 - 2003

KW - Lexicon generation

KW - Term categorization

KW - WordNet

UR - http://www.scopus.com/inward/record.url?scp=0038675278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038675278&partnerID=8YFLogxK

M3 - Conference contribution

SP - 793

EP - 797

BT - Proceedings of the ACM Symposium on Applied Computing

A2 - Lamont, G.

ER -