Experiments on the use of feature selection and negative evidence in automated text categorization

Luigi Galavotti, Fabrizio Sebastiani, Maria Simi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

123 Citations (Scopus)

Abstract

We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation of these two methods performed on the standard Reuters-21578 benchmark.
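
The abstract names the two techniques but gives neither the formula of the simplified variant nor the exact scoring rule of the modified k-NN. The following is therefore only a minimal sketch under stated assumptions, not the authors' method: it scores terms with the standard chi-square statistic computed from document counts, and it shows one plausible reading of "negative evidence" in which neighbours that are negative examples of a category subtract their similarity from the score. All function names and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(n_tc, n_t, n_c, n):
    """Standard chi-square for term t and category c (NOT the paper's simplified variant).

    n_tc: documents that contain t and belong to c
    n_t:  documents that contain t
    n_c:  documents that belong to c
    n:    all documents
    """
    a = n_tc                    # t present, in c
    b = n_t - n_tc              # t present, not in c
    c = n_c - n_tc              # t absent, in c
    d = n - n_t - n_c + n_tc    # t absent, not in c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, category, k):
    """Keep the k terms with the highest chi-square score for one category.

    docs:   list of sets of terms
    labels: list of sets of category names, aligned with docs
    """
    n = len(docs)
    n_c = sum(1 for cats in labels if category in cats)
    df, df_c = Counter(), Counter()
    for terms, cats in zip(docs, labels):
        for t in terms:
            df[t] += 1
            if category in cats:
                df_c[t] += 1
    scores = {t: chi_square(df_c[t], df[t], n_c, n) for t in df}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def knn_score(neighbours, k, use_negative=True):
    """Category score from the k most similar training documents.

    neighbours: list of (similarity, is_positive_example) pairs.
    Assumption: with use_negative=True, negative examples subtract
    their similarity; with False they are ignored, as in plain k-NN.
    """
    top = sorted(neighbours, key=lambda pair: -pair[0])[:k]
    score = 0.0
    for sim, is_positive in top:
        if is_positive:
            score += sim
        elif use_negative:
            score -= sim
    return score


if __name__ == "__main__":
    docs = [{"wheat", "price"}, {"wheat", "farm"}, {"oil", "price"}, {"oil", "barrel"}]
    labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]
    print(select_features(docs, labels, "grain", 2))
    print(knn_score([(0.9, False), (0.8, True), (0.7, True), (0.1, True)], k=3))
```

In this toy collection the standard statistic ranks a term that only occurs outside the category as highly as one that only occurs inside it, since it measures dependence in either direction.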

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 59-68
Number of pages: 10
Volume: 1923
ISBN (Print): 3540410236, 9783540410232
Publication status: Published - 2000
Externally published: Yes
Event: 4th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2000 - Lisbon, Portugal
Duration: 18 Sep 2000 - 20 Sep 2000

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1923
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2000
Country: Portugal
City: Lisbon
Period: 18/9/00 - 20/9/00

Fingerprint

Text Categorization
Feature Selection
Feature extraction
Classifiers
Classifier
Proof by induction
Experiment
Experiments
Experimentation
Exploitation
Statistics
Benchmark
Distinct
Subset
Evidence

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Galavotti, L., Sebastiani, F., & Simi, M. (2000). Experiments on the use of feature selection and negative evidence in automated text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1923, pp. 59-68). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1923). Springer Verlag.

Experiments on the use of feature selection and negative evidence in automated text categorization. / Galavotti, Luigi; Sebastiani, Fabrizio; Simi, Maria.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1923 Springer Verlag, 2000. p. 59-68 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1923).

Galavotti, L, Sebastiani, F & Simi, M 2000, Experiments on the use of feature selection and negative evidence in automated text categorization. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 1923, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1923, Springer Verlag, pp. 59-68, 4th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2000, Lisbon, Portugal, 18/9/00.
Galavotti L, Sebastiani F, Simi M. Experiments on the use of feature selection and negative evidence in automated text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1923. Springer Verlag. 2000. p. 59-68. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Galavotti, Luigi ; Sebastiani, Fabrizio ; Simi, Maria. / Experiments on the use of feature selection and negative evidence in automated text categorization. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1923 Springer Verlag, 2000. pp. 59-68 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{a26d131a78c443d5878580608e5fab67,
title = "Experiments on the use of feature selection and negative evidence in automated text categorization",
abstract = "We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' $\ll$ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the $\chi^2$ statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation of these two methods performed on the standard Reuters-21578 benchmark.",
author = "Luigi Galavotti and Fabrizio Sebastiani and Maria Simi",
year = "2000",
language = "English",
isbn = "3540410236",
volume = "1923",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "59--68",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Experiments on the use of feature selection and negative evidence in automated text categorization

AU - Galavotti, Luigi

AU - Sebastiani, Fabrizio

AU - Simi, Maria

PY - 2000

Y1 - 2000

N2 - We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation of these two methods performed on the standard Reuters-21578 benchmark.

AB - We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r' ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation of these two methods performed on the standard Reuters-21578 benchmark.

UR - http://www.scopus.com/inward/record.url?scp=84937392228&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937392228&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540410236

SN - 9783540410232

VL - 1923

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 59

EP - 68

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -