Estimation of false negatives in classification

Sandeep Mane, Jaideep Srivastava, San Yih Hwang, Jamshid Vayghan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, high costs and constraints on an expert's time prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is therefore required to estimate the number of false negatives. When two or more independent classifiers are available, a capture-recapture method can be used to obtain a maximum-likelihood (ML) estimate of the number of false negatives. When independence does not hold, log-linear models can be applied to obtain the estimate. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the resulting estimate of false negatives. Ideally, therefore, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI Machine Learning Repository are presented.
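The two estimators mentioned in the abstract can be illustrated with standard closed-form capture-recapture formulas. The sketch below is not the authors' code. It assumes an expert has verified the instances each classifier predicted positive, so the number of true positives caught by each combination of classifiers is known, and it treats the positives missed by every classifier as the unobserved cell to be estimated. The names `Counts`, `missed_by_two_independent`, and `missed_by_three_pairwise` are hypothetical. The two-classifier function uses the independence model (the Lincoln-Petersen estimator). The three-classifier function uses a log-linear model with all pairwise interactions, which is one way to allow for dependence among classifiers.

```python
from typing import Dict, Tuple

# Hypothetical layout: counts of expert-verified positives keyed by which
# classifiers flagged them (1 = predicted positive, 0 = predicted negative).
# The all-zero cell (positives every classifier missed) is never observed;
# it is the quantity being estimated.
Counts = Dict[Tuple[int, ...], int]


def missed_by_two_independent(counts: Counts) -> float:
    """Estimate positives missed by both of two classifiers, assuming they
    flag positives independently (Lincoln-Petersen / independence log-linear model).

    N_hat = n_A * n_B / n_AB, and N_hat minus the observed positives
    simplifies to n10 * n01 / n11.
    """
    n10, n01, n11 = counts[(1, 0)], counts[(0, 1)], counts[(1, 1)]
    if n11 == 0:
        raise ValueError("no positive is flagged by both classifiers; estimate undefined")
    return n10 * n01 / n11


def missed_by_three_pairwise(counts: Counts) -> float:
    """Estimate positives missed by all three classifiers under a log-linear
    model with every two-way interaction but no three-way interaction.

    That model forces the odd/even cross-product ratio of the full 2x2x2 table
    to equal 1, which yields this closed form for the missing (0, 0, 0) cell.
    """
    num = counts[(1, 1, 1)] * counts[(1, 0, 0)] * counts[(0, 1, 0)] * counts[(0, 0, 1)]
    den = counts[(1, 1, 0)] * counts[(1, 0, 1)] * counts[(0, 1, 1)]
    if den == 0:
        raise ValueError("a cell flagged by exactly two classifiers is empty; estimate undefined")
    return num / den


if __name__ == "__main__":
    # Two classifiers: A flags 60 verified positives, B flags 50, 30 overlap.
    print(missed_by_two_independent({(1, 0): 30, (0, 1): 20, (1, 1): 30}))  # 20.0
    # Three classifiers (illustrative counts, not from the paper).
    three = {(1, 1, 1): 20, (1, 0, 0): 10, (0, 1, 0): 8, (0, 0, 1): 6,
             (1, 1, 0): 10, (1, 0, 1): 8, (0, 1, 1): 6}
    print(missed_by_three_pairwise(three))  # 20.0
```

In the two-classifier example the estimated total is 60 × 50 / 30 = 100 positives; 80 were observed, so 20 are estimated to be missed by both classifiers, and 100 - 60 = 40 are estimated false negatives of classifier A alone. The pairwise-interaction model has as many parameters as observed cells, so it reproduces them exactly and its dependence assumptions cannot be checked against the observed data.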

Original language: English
Title of host publication: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004
Editors: R. Rastogi, K. Morik, M. Bramer, X. Wu
Pages: 475-478
Number of pages: 4
DOI: 10.1109/ICDM.2004.10048
Publication status: Published - 2004
Externally published: Yes
Event: Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004 - Brighton, United Kingdom
Duration: 1 Nov 2004 to 4 Nov 2004

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Mane, S., Srivastava, J., Hwang, S. Y., & Vayghan, J. (2004). Estimation of false negatives in classification. In R. Rastogi, K. Morik, M. Bramer, & X. Wu (Eds.), Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004 (pp. 475-478). https://doi.org/10.1109/ICDM.2004.10048

@inproceedings{631366d579f54df3b1668fb1d50da7e0,
title = "Estimation of false negatives in classification",
abstract = "In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, high costs and constraints on an expert's time prevent further analysis of the {"}predicted false{"} class instances to separate the false negatives from the true negatives. A systematic method is therefore required to estimate the number of false negatives. When two or more independent classifiers are available, a capture-recapture method can be used to obtain a maximum-likelihood (ML) estimate of the number of false negatives. When independence does not hold, log-linear models can be applied to obtain the estimate. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the resulting estimate of false negatives. Ideally, therefore, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI Machine Learning Repository are presented.",
author = "Sandeep Mane and Jaideep Srivastava and Hwang, {San Yih} and Jamshid Vayghan",
year = "2004",
doi = "10.1109/ICDM.2004.10048",
language = "English",
isbn = "0769521428",
pages = "475--478",
editor = "R. Rastogi and K. Morik and M. Bramer and X. Wu",
booktitle = "Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004",

}

TY - GEN

T1 - Estimation of false negatives in classification

AU - Mane, Sandeep

AU - Srivastava, Jaideep

AU - Hwang, San Yih

AU - Vayghan, Jamshid

PY - 2004

Y1 - 2004

N2 - In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, high costs and constraints on an expert's time prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is therefore required to estimate the number of false negatives. When two or more independent classifiers are available, a capture-recapture method can be used to obtain a maximum-likelihood (ML) estimate of the number of false negatives. When independence does not hold, log-linear models can be applied to obtain the estimate. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the resulting estimate of false negatives. Ideally, therefore, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI Machine Learning Repository are presented.

AB - In many classification problems, such as spam detection and network intrusion detection, a large number of unlabeled test instances are predicted negative by the classifier. However, high costs and constraints on an expert's time prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is therefore required to estimate the number of false negatives. When two or more independent classifiers are available, a capture-recapture method can be used to obtain a maximum-likelihood (ML) estimate of the number of false negatives. When independence does not hold, log-linear models can be applied to obtain the estimate. However, as shown in this paper, the weaker the dependencies among the classifiers, the better the resulting estimate of false negatives. Ideally, therefore, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI Machine Learning Repository are presented.

UR - http://www.scopus.com/inward/record.url?scp=19544363935&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=19544363935&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2004.10048

DO - 10.1109/ICDM.2004.10048

M3 - Conference contribution

SN - 0769521428

SN - 9780769521428

SP - 475

EP - 478

BT - Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004

A2 - Rastogi, R.

A2 - Morik, K.

A2 - Bramer, M.

A2 - Wu, X.

ER -