Encoding ordinal features into binary features for text classification

Andrea Esuli, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We propose a method by means of which supervised learning algorithms that only accept binary input can be extended to use ordinal (i.e., integer-valued) input. This is much needed in text classification, since it thus becomes possible to endow these learning devices with term frequency information, rather than just information on the presence/absence of the term in the document. We test two different learners based on "boosting", and show that the use of our method allows them to obtain effectiveness gains. We also show that one of these boosting methods, once endowed with the representations generated by our method, outperforms an SVM learner with tfidf-weighted input.
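
The abstract does not spell out the encoding itself, so the following is only an illustrative sketch of one common way to turn an integer term frequency into binary features: a "thermometer" (threshold) encoding, in which a count c is replaced by indicators of the form "c >= k". The function name `thermometer_encode`, the `max_level` parameter, and the feature-naming scheme are assumptions made for this example and are not taken from the paper; the authors' actual method may differ.

```python
from typing import Dict


def thermometer_encode(term_counts: Dict[str, int], max_level: int = 3) -> Dict[str, int]:
    """Map integer term frequencies to binary 'count >= k' indicators.

    Hypothetical helper for illustration only; not the authors' algorithm.
    """
    binary: Dict[str, int] = {}
    for term, count in term_counts.items():
        # One binary feature per threshold k = 1..max_level.
        for k in range(1, max_level + 1):
            binary[f"{term}>={k}"] = 1 if count >= k else 0
    return binary


# Toy document: term -> number of occurrences.
doc = {"boosting": 3, "svm": 1, "tfidf": 0}
encoded = thermometer_encode(doc, max_level=3)
```

With this encoding, the feature for threshold 1 is exactly the usual presence/absence feature, so a learner restricted to binary input keeps what it had before and additionally sees coarse frequency information through the higher thresholds.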

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 771-775
Number of pages: 5
Volume: 5478 LNCS
DOIs: https://doi.org/10.1007/978-3-642-00958-7_83
Publication status: Published - 2009
Externally published: Yes
Event: 31th European Conference on Information Retrieval, ECIR 2009 - Toulouse
Duration: 6 Apr 2009 → 9 Apr 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5478 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 31th European Conference on Information Retrieval, ECIR 2009
City: Toulouse
Period: 6/4/09 → 9/4/09

Fingerprint

Text Classification
Supervised learning
Learning algorithms
Encoding
Binary
Boosting
Supervised Learning
Term
Learning Algorithm
Integer

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Esuli, A., & Sebastiani, F. (2009). Encoding ordinal features into binary features for text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5478 LNCS, pp. 771-775). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5478 LNCS). https://doi.org/10.1007/978-3-642-00958-7_83

Encoding ordinal features into binary features for text classification. / Esuli, Andrea; Sebastiani, Fabrizio.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5478 LNCS 2009. p. 771-775 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5478 LNCS).

Esuli, A & Sebastiani, F 2009, Encoding ordinal features into binary features for text classification. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 5478 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5478 LNCS, pp. 771-775, 31th European Conference on Information Retrieval, ECIR 2009, Toulouse, 6/4/09. https://doi.org/10.1007/978-3-642-00958-7_83
Esuli A, Sebastiani F. Encoding ordinal features into binary features for text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5478 LNCS. 2009. p. 771-775. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-00958-7_83
Esuli, Andrea ; Sebastiani, Fabrizio. / Encoding ordinal features into binary features for text classification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5478 LNCS 2009. pp. 771-775 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{6fb6d9d4efc14d5b9799c69378f7e74d,
title = "Encoding ordinal features into binary features for text classification",
abstract = "We propose a method by means of which supervised learning algorithms that only accept binary input can be extended to use ordinal (i.e., integer-valued) input. This is much needed in text classification, since it thus becomes possible to endow these learning devices with term frequency information, rather than just information on the presence/absence of the term in the document. We test two different learners based on {"}boosting{"}, and show that the use of our method allows them to obtain effectiveness gains. We also show that one of these boosting methods, once endowed with the representations generated by our method, outperforms an SVM learner with tfidf-weighted input.",
author = "Andrea Esuli and Fabrizio Sebastiani",
year = "2009",
doi = "10.1007/978-3-642-00958-7_83",
language = "English",
isbn = "3642009573",
volume = "5478",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "771--775",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Encoding ordinal features into binary features for text classification

AU - Esuli, Andrea

AU - Sebastiani, Fabrizio

PY - 2009

Y1 - 2009

N2 - We propose a method by means of which supervised learning algorithms that only accept binary input can be extended to use ordinal (i.e., integer-valued) input. This is much needed in text classification, since it thus becomes possible to endow these learning devices with term frequency information, rather than just information on the presence/absence of the term in the document. We test two different learners based on "boosting", and show that the use of our method allows them to obtain effectiveness gains. We also show that one of these boosting methods, once endowed with the representations generated by our method, outperforms an SVM learner with tfidf-weighted input.

AB - We propose a method by means of which supervised learning algorithms that only accept binary input can be extended to use ordinal (i.e., integer-valued) input. This is much needed in text classification, since it thus becomes possible to endow these learning devices with term frequency information, rather than just information on the presence/absence of the term in the document. We test two different learners based on "boosting", and show that the use of our method allows them to obtain effectiveness gains. We also show that one of these boosting methods, once endowed with the representations generated by our method, outperforms an SVM learner with tfidf-weighted input.

UR - http://www.scopus.com/inward/record.url?scp=67650700765&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650700765&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-00958-7_83

DO - 10.1007/978-3-642-00958-7_83

M3 - Conference contribution

SN - 3642009573

SN - 9783642009570

VL - 5478 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 771

EP - 775

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -