On the statistical consistency of algorithms for binary classification under class imbalance

Aditya Krishna Menon, Harikrishna Narasimhan, Shivani Agarwal, Sanjay Chawla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Citations (Scopus)

Abstract

Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some practically popular approaches, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Experimental results confirm our consistency theorems.
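As a rough illustration of the quantities named in the abstract, the sketch below computes the AM (the arithmetic mean of the true positive and true negative rates) and applies an empirically determined threshold to class probability estimates. The function names, the toy data, and the choice of the empirical fraction of positives as the threshold are assumptions made for this example; the abstract does not spell out the exact procedure.

```python
def am_score(y_true, y_pred):
    """Arithmetic mean of the true positive rate and the true negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute the AM")
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return 0.5 * (tpr + tnr)


def empirical_threshold(y_train):
    """Assumed threshold choice: the fraction of positive labels in the sample."""
    return sum(y_train) / len(y_train)


def plug_in_predict(probs, threshold):
    """Predict the positive class when the probability estimate exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


# Hypothetical imbalanced sample: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical class probability estimates from some fitted model.
probs = [0.6, 0.3, 0.25, 0.1, 0.1, 0.05, 0.1, 0.15, 0.1, 0.05]

t_hat = empirical_threshold(y_true)
am_default = am_score(y_true, plug_in_predict(probs, 0.5))
am_empirical = am_score(y_true, plug_in_predict(probs, t_hat))
print(f"threshold 0.5   -> AM = {am_default}")
print(f"threshold {t_hat} -> AM = {am_empirical}")
```

On this toy sample both thresholds misclassify exactly one example, so the plain misclassification error cannot tell them apart, while the AM favors the lower, empirically chosen threshold because it recovers both rare positives.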

Original language: English
Title of host publication: 30th International Conference on Machine Learning, ICML 2013
Publisher: International Machine Learning Society (IMLS)
Pages: 1640-1648
Number of pages: 9
Edition: PART 2
Publication status: Published - 2013
Externally published: Yes
Event: 30th International Conference on Machine Learning, ICML 2013 - Atlanta, GA
Duration: 16 Jun 2013 → 21 Jun 2013

Other

Other: 30th International Conference on Machine Learning, ICML 2013
City: Atlanta, GA
Period: 16/6/13 → 21/6/13

Fingerprint

Learning systems
performance
learning

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Sociology and Political Science

Cite this

Menon, A. K., Narasimhan, H., Agarwal, S., & Chawla, S. (2013). On the statistical consistency of algorithms for binary classification under class imbalance. In 30th International Conference on Machine Learning, ICML 2013 (PART 2 ed., pp. 1640-1648). International Machine Learning Society (IMLS).

@inproceedings{fb24dd59b6724dca993ae01f52e91218,
title = "On the statistical consistency of algorithms for binary classification under class imbalance",
abstract = "Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some practically popular approaches, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Experimental results confirm our consistency theorems.",
author = "Menon, {Aditya Krishna} and Harikrishna Narasimhan and Shivani Agarwal and Sanjay Chawla",
year = "2013",
language = "English",
pages = "1640--1648",
booktitle = "30th International Conference on Machine Learning, ICML 2013",
publisher = "International Machine Learning Society (IMLS)",
edition = "PART 2",

}

TY - GEN

T1 - On the statistical consistency of algorithms for binary classification under class imbalance

AU - Menon, Aditya Krishna

AU - Narasimhan, Harikrishna

AU - Agarwal, Shivani

AU - Chawla, Sanjay

PY - 2013

Y1 - 2013

N2 - Class imbalance situations, where one class is rare compared to the other, arise frequently in machine learning applications. It is well known that the usual misclassification error is ill-suited for measuring performance in such settings. A wide range of performance measures have been proposed for this problem. However, despite the large number of studies on this problem, little is understood about the statistical consistency of the algorithms proposed with respect to the performance measures of interest. In this paper, we study consistency with respect to one such performance measure, namely the arithmetic mean of the true positive and true negative rates (AM), and establish that some practically popular approaches, such as applying an empirically determined threshold to a suitable class probability estimate or performing an empirically balanced form of risk minimization, are in fact consistent with respect to the AM (under mild conditions on the underlying distribution). Experimental results confirm our consistency theorems.

UR - http://www.scopus.com/inward/record.url?scp=84897506611&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897506611&partnerID=8YFLogxK

M3 - Conference contribution

SP - 1640

EP - 1648

BT - 30th International Conference on Machine Learning, ICML 2013

PB - International Machine Learning Society (IMLS)

ER -