Ensemble-based wrapper methods for feature selection and class imbalance learning

Pengyi Yang, Wei Liu, Bing B. Zhou, Sanjay Chawla, Albert Y. Zomaya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

The wrapper feature selection approach is useful in identifying informative feature subsets from high-dimensional datasets. Typically, an inductive algorithm "wrapped" in a search algorithm is used to evaluate the merit of the selected features. However, significant bias may be introduced when dealing with highly imbalanced datasets: the selected features may favour one class while being less useful for the other. In this paper, we propose an ensemble-based wrapper approach for feature selection from data with a highly imbalanced class distribution. The key idea is to create multiple balanced datasets from the original imbalanced dataset via sampling, and subsequently to evaluate feature subsets using an ensemble of base classifiers, each trained on one balanced dataset. The proposed approach provides a unified framework that incorporates ensemble feature selection and multiple sampling in a mutually beneficial way. The experimental results indicate that, overall, features selected by the ensemble-based wrapper are significantly better for imbalanced data classification than those selected by wrappers with a single inductive algorithm.
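The core idea described above, undersampling the majority class several times, training one base classifier per balanced sample, and scoring a candidate feature subset by the majority vote of the resulting ensemble, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the nearest-centroid base learner, the greedy forward search, the bag count, and all data parameters are assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_samples(X, y, n_bags=5):
    """Create multiple balanced datasets by undersampling the majority class."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    for _ in range(n_bags):
        picked = rng.choice(majority, size=minority.size, replace=False)
        idx = np.concatenate([minority, picked])
        yield X[idx], y[idx]

def centroid_fit(X, y):
    # Toy base learner: nearest class centroid (stand-in for any inductive algorithm).
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def centroid_predict(model, X):
    c0, c1 = model
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

def ensemble_score(X, y, features, n_bags=5):
    """Wrapper criterion: balanced accuracy of a majority-vote ensemble
    whose members are each trained on one balanced sample."""
    Xs = X[:, features]
    votes = np.zeros(len(y))
    for Xb, yb in balanced_samples(Xs, y, n_bags):
        votes += centroid_predict(centroid_fit(Xb, yb), Xs)
    pred = (votes * 2 > n_bags).astype(int)
    # Balanced accuracy: mean of the per-class recalls.
    rec1 = (pred[y == 1] == 1).mean()
    rec0 = (pred[y == 0] == 0).mean()
    return (rec0 + rec1) / 2

# Imbalanced toy data: 200 majority vs 20 minority samples, 10 features,
# of which only features 0 and 1 are informative.
n_maj, n_min = 200, 20
X = rng.normal(size=(n_maj + n_min, 10))
y = np.r_[np.zeros(n_maj, int), np.ones(n_min, int)]
X[y == 1, :2] += 2.0  # shift the minority class on the informative features

# Greedy forward search driven by the ensemble criterion.
selected, remaining = [], list(range(10))
for _ in range(2):
    best = max(remaining, key=lambda f: ensemble_score(X, y, selected + [f]))
    selected.append(best)
    remaining.remove(best)
print(sorted(selected))
```

Because every base classifier sees an equal number of minority and majority examples, the wrapper criterion cannot be satisfied by features that only help the majority class, which is the bias the paper targets.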

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 544-555
Number of pages: 12
Volume: 7818 LNAI
Edition: PART 1
DOI: 10.1007/978-3-642-37453-1_45
Publication status: Published - 2013
Externally published: Yes
Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD
Duration: 14 Apr 2013 - 17 Apr 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7818 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
City: Gold Coast, QLD
Period: 14/4/13 - 17/4/13

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Yang, P., Liu, W., Zhou, B. B., Chawla, S., & Zomaya, A. Y. (2013). Ensemble-based wrapper methods for feature selection and class imbalance learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 1 ed., Vol. 7818 LNAI, pp. 544-555). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7818 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-37453-1_45

@inproceedings{91bff3627e0e4d9fa386b94c637fbf87,
title = "Ensemble-based wrapper methods for feature selection and class imbalance learning",
abstract = "The wrapper feature selection approach is useful in identifying informative feature subsets from high-dimensional datasets. Typically, an inductive algorithm {"}wrapped{"} in a search algorithm is used to evaluate the merit of the selected features. However, significant bias may be introduced when dealing with highly imbalanced dataset. That is, the selected features may favour one class while being less useful to the adverse class. In this paper, we propose an ensemble-based wrapper approach for feature selection from data with highly imbalanced class distribution. The key idea is to create multiple balanced datasets from the original imbalanced dataset via sampling, and subsequently evaluate feature subsets using an ensemble of base classifiers each trained on a balanced dataset. The proposed approach provides a unified framework that incorporates ensemble feature selection and multiple sampling in a mutually beneficial way. The experimental results indicate that, overall, features selected by the ensemble-based wrapper are significantly better than those selected by wrappers with a single inductive algorithm in imbalanced data classification.",
author = "Pengyi Yang and Wei Liu and Zhou, {Bing B.} and Sanjay Chawla and Zomaya, {Albert Y.}",
year = "2013",
doi = "10.1007/978-3-642-37453-1_45",
language = "English",
isbn = "9783642374524",
volume = "7818 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "544--555",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 1",
}
