Class confidence weighted kNN algorithms for imbalanced data sets

Wei Liu, Sanjay Chawla

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

73 Citations (Scopus)

Abstract

In this paper, a novel k-nearest neighbors (kNN) weighting strategy is proposed for handling the problem of class imbalance. When dealing with highly imbalanced data, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance in spite of distance measurements, which leads to suboptimal classification performance on the minority class. To solve this problem, we propose CCW (class confidence weights) that uses the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it is able to correct the inherent bias to majority class in existing kNN algorithms on any distance measurement. Theoretical analysis and comprehensive experiments confirm our claims.
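
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that each prototype is weighted by the probability of its attribute values given its class label, so the density estimator used here (independent per-attribute Gaussians with a small variance floor), the Euclidean distance, and all function names (fit_gaussians, class_confidence, ccw_knn_predict) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussians(X, y):
    """Per-class, per-attribute (mean, variance). Assumed estimator, not from the paper."""
    rows_by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        rows_by_class[yi].append(xi)
    params = {}
    for label, rows in rows_by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small floor keeps the variance positive for tiny classes.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        params[label] = stats
    return params


def class_confidence(x, label, params):
    """Estimate p(x | label) as a product of per-attribute Gaussian densities."""
    p = 1.0
    for v, (mean, var) in zip(x, params[label]):
        p *= math.exp(-((v - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p


def ccw_knn_predict(X, y, query, k=3, weighted=True):
    """Vote among the k nearest prototypes; each vote counts p(x_i | y_i) if weighted."""
    params = fit_gaussians(X, y)
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y[i]] += class_confidence(X[i], y[i], params) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy imbalanced data: 8 spread-out majority points (class 0),
# 2 tightly clustered minority points (class 1).
X = [[float(v)] for v in range(8)] + [[3.5], [3.6]]
y = [0] * 8 + [1] * 2
print(ccw_knn_predict(X, y, [3.2], k=5, weighted=False))  # plain majority vote
print(ccw_knn_predict(X, y, [3.2], k=5))                  # class-confidence-weighted vote
```

On this toy set the five nearest neighbors of the query contain three majority and two minority prototypes, so an unweighted vote favors the majority class. The two minority prototypes sit in a dense region of their own class and therefore carry much larger weights, which flips the weighted vote to the minority class.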

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 345-356
Number of pages: 12
Volume: 6635 LNAI
Edition: PART 2
Publication status: Published - 2011
Externally published: Yes
Event: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011 - Shenzhen
Duration: 24 May 2011 to 27 May 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6635 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011
City: Shenzhen
Period: 24/5/11 to 27/5/11

Fingerprint

Distance measurement
Confidence
Nearest Neighbor
Labels
Experiments
Class
Weighting
Theoretical Analysis
Attribute
Prototype
Tend
Experiment

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Liu, W., & Chawla, S. (2011). Class confidence weighted kNN algorithms for imbalanced data sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 2 ed., Vol. 6635 LNAI, pp. 345-356). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6635 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-20847-8_29

@inproceedings{9b3555a3aebc4b7a85b4685eae2685b1,
title = "Class confidence weighted kNN algorithms for imbalanced data sets",
abstract = "In this paper, a novel k-nearest neighbors (kNN) weighting strategy is proposed for handling the problem of class imbalance. When dealing with highly imbalanced data, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance in spite of distance measurements, which leads to suboptimal classification performance on the minority class. To solve this problem, we propose CCW (class confidence weights) that uses the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it is able to correct the inherent bias to majority class in existing kNN algorithms on any distance measurement. Theoretical analysis and comprehensive experiments confirm our claims.",
author = "Wei Liu and Sanjay Chawla",
year = "2011",
doi = "10.1007/978-3-642-20847-8_29",
language = "English",
isbn = "9783642208461",
volume = "6635 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 2",
pages = "345--356",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 2",

}

TY  - GEN
T1  - Class confidence weighted kNN algorithms for imbalanced data sets
AU  - Liu, Wei
AU  - Chawla, Sanjay
PY  - 2011
Y1  - 2011
N2  - In this paper, a novel k-nearest neighbors (kNN) weighting strategy is proposed for handling the problem of class imbalance. When dealing with highly imbalanced data, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance in spite of distance measurements, which leads to suboptimal classification performance on the minority class. To solve this problem, we propose CCW (class confidence weights) that uses the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it is able to correct the inherent bias to majority class in existing kNN algorithms on any distance measurement. Theoretical analysis and comprehensive experiments confirm our claims.
UR  - http://www.scopus.com/inward/record.url?scp=79957967238&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79957967238&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-20847-8_29
DO  - 10.1007/978-3-642-20847-8_29
M3  - Conference contribution
SN  - 9783642208461
VL  - 6635 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 345
EP  - 356
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -