A utility-theoretic ranking method for semi-automated text classification

Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)

Abstract

In Semi-Automated Text Classification (SATC) an automatic classifier F labels a set of unlabelled documents D, following which a human annotator inspects (and corrects when appropriate) the labels attributed by F to a subset D' of D, with the aim of improving the overall quality of the labelling. An automated system can support this process by ranking the automatically labelled documents in a way that maximizes the expected increase in effectiveness that derives from inspecting D. An obvious strategy is to rank D so that the documents that F has classified with the lowest confidence are top-ranked. In this work we show that this strategy is suboptimal. We develop a new utility-theoretic ranking method based on the notion of inspection gain, defined as the improvement in classification effectiveness that would derive from inspecting and correcting a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially inspecting a list generated by a given ranking method. We report the results of experiments showing that, with respect to the baseline method above, and according to the proposed measure, our ranking method can achieve substantially higher expected reductions in classification error.
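The contrast between the two ranking strategies can be illustrated with a toy sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration only: effectiveness is taken to be binary F1, the contingency counts `tp`, `fp`, `fn` are treated as known estimates, each document carries a hypothetical estimated error probability `p_error`, and the gain of a correction is approximated as the F1 change from fixing a single false positive or false negative. It is not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    predicted_positive: bool  # label assigned by the classifier F
    p_error: float            # assumed estimate of P(label is wrong)


def f1(tp: float, fp: float, fn: float) -> float:
    """Binary F1 from contingency counts (1.0 when there is nothing to find)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def rank_by_low_confidence(docs):
    """Baseline: least confident (highest error probability) first."""
    return sorted(docs, key=lambda d: d.p_error, reverse=True)


def rank_by_expected_gain(docs, tp, fp, fn):
    """Toy utility ranking: error probability times the F1 gain of a fix."""
    base = f1(tp, fp, fn)
    # Correcting a false positive turns it into a true negative.
    gain_fp = f1(tp, max(fp - 1, 0), fn) - base
    # Correcting a false negative turns it into a true positive.
    gain_fn = f1(tp + 1, fp, max(fn - 1, 0)) - base

    def utility(d):
        gain = gain_fp if d.predicted_positive else gain_fn
        return d.p_error * gain

    return sorted(docs, key=utility, reverse=True)


docs = [
    Doc("a", predicted_positive=True, p_error=0.40),
    Doc("b", predicted_positive=False, p_error=0.30),
]
print([d.doc_id for d in rank_by_low_confidence(docs)])
print([d.doc_id for d in rank_by_expected_gain(docs, tp=10, fp=10, fn=2)])
```

With these made-up counts, fixing a false negative raises F1 roughly twice as much as fixing a false positive, so the less uncertain document `b` overtakes `a` once the gain is taken into account. This is the kind of reordering that a purely confidence-based ranking cannot produce.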

Original language: English
Title of host publication: SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 961-970
Number of pages: 10
DOI: https://doi.org/10.1145/2348283.2348411
Publication status: Published - 2012
Externally published: Yes
Event: 35th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012 - Portland, OR, United States
Duration: 12 Aug 2012 to 16 Aug 2012

Other

Other: 35th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012
Country: United States
City: Portland, OR
Period: 12/8/12 to 16/8/12

Keywords

  • cost-sensitive learning
  • ranking
  • semi-automated text classification
  • supervised learning
  • text classification

ASJC Scopus subject areas

  • Information Systems

Cite this

Berardi, G., Esuli, A., & Sebastiani, F. (2012). A utility-theoretic ranking method for semi-automated text classification. In SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 961-970) https://doi.org/10.1145/2348283.2348411