Using micro-documents for feature selection: The case of ordinal text classification

Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Most popular feature selection methods for text classification, such as information gain (also known as "mutual information"), chi-square, and odds ratio, are based on binary information indicating the presence/absence of the feature (or "term") in each training document. As such, these methods do not exploit a rich source of information, namely, how frequently the feature occurs in each training document (term frequency). In order to overcome this drawback, when doing feature selection we logically break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the same class information as the original training document. This move has the double effect of (a) allowing all the original feature selection methods based on binary information to remain straightforwardly applicable, and (b) making them sensitive to term frequency information. We study the impact of this strategy in the case of ordinal text classification, a type of text classification dealing with classes lying on an ordinal scale, and recently made popular by applications in customer relationship management, market research, and Web 2.0 mining. We run experiments using four recently introduced feature selection functions, two learning methods of the support vector machines family, and two large datasets of product reviews. The experiments show that the use of this strategy substantially improves the accuracy of ordinal text classification.
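
The core idea described in the abstract lends itself to a short illustration. The following is a minimal sketch, not the authors' code: documents are assumed to be (token list, ordinal label) pairs, a plain binary-presence information gain is used as the feature-scoring function, and all names (expand_to_micro_documents, information_gain, the toy reviews) are illustrative. Scoring the same term once over the original documents and once over their micro-documents shows how the expansion makes a presence/absence measure sensitive to term frequency.

from collections import Counter
import math

def expand_to_micro_documents(docs):
    # Break each (tokens, label) training document of length k into k
    # single-token "micro-documents", each inheriting the original class label.
    micro_docs = []
    for tokens, label in docs:
        for token in tokens:
            micro_docs.append(([token], label))
    return micro_docs

def information_gain(docs, term):
    # Standard binary-presence information gain, computed over whatever
    # "documents" are passed in (full documents or micro-documents).
    def entropy(labels):
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())
    n = len(docs)
    all_labels = [label for _, label in docs]
    pos = [label for tokens, label in docs if term in tokens]
    neg = [label for tokens, label in docs if term not in tokens]
    conditional = 0.0
    if pos:
        conditional += (len(pos) / n) * entropy(pos)
    if neg:
        conditional += (len(neg) / n) * entropy(neg)
    return entropy(all_labels) - conditional

# Toy example: product reviews labelled with 1-5 star ratings (ordinal classes).
train = [
    ("great camera great lens".split(), 5),
    ("great value".split(), 4),
    ("poor battery poor screen".split(), 1),
]
micro = expand_to_micro_documents(train)
# On micro-documents, the score of "great" also reflects that it occurs twice
# in the first review, which the document-level (binary) computation ignores.
print(information_gain(train, "great"), information_gain(micro, "great"))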

Original language: English
Pages (from-to): 4687-4696
Number of pages: 10
Journal: Expert Systems with Applications
Volume: 40
Issue number: 11
DOIs: 10.1016/j.eswa.2013.02.010
Publication status: Published - 1 Sep 2013
Externally published: Yes

Fingerprint

  • Feature extraction
  • Support vector machines
  • Experiments

Keywords

  • Feature selection
  • Ordinal regression
  • Supervised learning
  • Text classification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering (all)

Cite this

Baccianella, Stefano; Esuli, Andrea; Sebastiani, Fabrizio. Using micro-documents for feature selection: The case of ordinal text classification. In: Expert Systems with Applications, Vol. 40, No. 11, 01.09.2013, pp. 4687-4696.

@article{acbba15832a24b14b114f66bd8bdf8c6,
title = "Using micro-documents for feature selection: The case of ordinal text classification",
keywords = "Feature selection, Ordinal regression, Supervised learning, Text classification",
author = "Stefano Baccianella and Andrea Esuli and Fabrizio Sebastiani",
year = "2013",
month = "9",
day = "1",
doi = "10.1016/j.eswa.2013.02.010",
language = "English",
volume = "40",
pages = "4687--4696",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "11",

}
