An analysis of the relative hardness of Reuters-21578 subsets

Franca Debole, Fabrizio Sebastiani

Research output: Contribution to journal (Article)

105 Citations (Scopus)

Abstract

The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have "carved" different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.

Original language: English
Pages (from-to): 584-596
Number of pages: 13
Journal: Journal of the American Society for Information Science and Technology
Volume: 56
Issue number: 6
DOI: 10.1002/asi.20147
Publication status: Published - Apr 2005
Externally published: Yes

Fingerprint

Hardness
Information retrieval
Availability
Acceptance
Benchmark
Text categorization

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

An analysis of the relative hardness of Reuters-21578 subsets. / Debole, Franca; Sebastiani, Fabrizio.

In: Journal of the American Society for Information Science and Technology, Vol. 56, No. 6, 04.2005, p. 584-596.

@article{562e0ab6a5ec4f199f27c7e446ea854b,
title = "An analysis of the relative hardness of {Reuters-21578} subsets",
abstract = "The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have {"}carved{"} different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.",
author = "Franca Debole and Fabrizio Sebastiani",
year = "2005",
month = "4",
doi = "10.1002/asi.20147",
language = "English",
volume = "56",
pages = "584--596",
journal = "Journal of the American Society for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "6",

}

TY - JOUR

T1 - An analysis of the relative hardness of Reuters-21578 subsets

AU - Debole, Franca

AU - Sebastiani, Fabrizio

PY - 2005/4

Y1 - 2005/4

AB - The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have "carved" different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.

UR - http://www.scopus.com/inward/record.url?scp=17644390231&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17644390231&partnerID=8YFLogxK

U2 - 10.1002/asi.20147

DO - 10.1002/asi.20147

M3 - Article

VL - 56

SP - 584

EP - 596

JO - Journal of the American Society for Information Science and Technology

JF - Journal of the American Society for Information Science and Technology

SN - 2330-1635

IS - 6

ER -