Robust feature selection technique using rank aggregation

Chandrima Sarkar, Sarah Cooley, Jaideep Srivastava

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can put researchers in a dilemma as to which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more effective solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, it improves classification accuracy by approximately 3-4% on datasets with fewer than 500 features and by more than 5% on datasets with more than 500 features, across a wide range of classifiers. © 2014
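The abstract describes aggregating the rankings produced by several feature selection methods into one consensus ranking. The paper's exact aggregation rule is not reproduced on this page; the sketch below uses a simple Borda count, one common rank-aggregation scheme, purely as an illustration. The function name `aggregate_rankings` and the example rankings are hypothetical.

```python
import numpy as np

def aggregate_rankings(rankings, k):
    """Borda-count aggregation of several feature rankings.

    rankings: list of 1-D arrays, each an ordering of feature indices
              from best to worst, one per feature-selection method.
    k: number of features to keep.
    Returns the indices of the top-k features by consensus score.
    """
    n_features = len(rankings[0])
    scores = np.zeros(n_features)
    for ranking in rankings:
        # A feature ranked first earns n_features points, the last earns 1,
        # so features ranked highly by many methods accumulate the most.
        for position, feature in enumerate(ranking):
            scores[feature] += n_features - position
    # Sort by descending consensus score and keep the top k.
    return np.argsort(scores)[::-1][:k]

# Example: three hypothetical selectors rank five features differently.
r1 = np.array([0, 2, 1, 3, 4])
r2 = np.array([2, 0, 3, 1, 4])
r3 = np.array([0, 2, 4, 1, 3])
top = aggregate_rankings([r1, r2, r3], k=2)  # consensus top-2: features 0 and 2
```

Because the consensus depends on all of the input rankings, no single method's bias dominates, which is the intuition behind the robustness the abstract claims.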

Original language: English
Pages (from-to): 243-257
Number of pages: 15
Journal: Applied Artificial Intelligence
Volume: 28
Issue number: 3
DOI: 10.1080/08839514.2014.883903
Publication status: Published - 16 Mar 2014
Externally published: Yes

Fingerprint

  • Feature extraction
  • Classifiers
  • Agglomeration
  • Tumors
  • Internet

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Robust feature selection technique using rank aggregation. / Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep.

In: Applied Artificial Intelligence, Vol. 28, No. 3, 16.03.2014, p. 243-257.

Sarkar, Chandrima ; Cooley, Sarah ; Srivastava, Jaideep. / Robust feature selection technique using rank aggregation. In: Applied Artificial Intelligence. 2014 ; Vol. 28, No. 3. pp. 243-257.
@article{557211e488244805b92e6e18c67032c9,
title = "Robust feature selection technique using rank aggregation",
abstract = "Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can put researchers in a dilemma as to which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more effective solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, it improves classification accuracy by approximately 3-4{\%} on datasets with fewer than 500 features and by more than 5{\%} on datasets with more than 500 features, across a wide range of classifiers. {\circledC} 2014",
author = "Chandrima Sarkar and Sarah Cooley and Jaideep Srivastava",
year = "2014",
month = "3",
day = "16",
doi = "10.1080/08839514.2014.883903",
language = "English",
volume = "28",
pages = "243--257",
journal = "Applied Artificial Intelligence",
issn = "0883-9514",
publisher = "Taylor and Francis Ltd.",
number = "3",

}

TY - JOUR

T1 - Robust feature selection technique using rank aggregation

AU - Sarkar, Chandrima

AU - Cooley, Sarah

AU - Srivastava, Jaideep

PY - 2014/3/16

Y1 - 2014/3/16

AB - Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique that produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases, whereas classifiers exploit different statistical properties of data for evaluation. In numerous situations, this can put researchers in a dilemma as to which feature selection method and classifier to choose from a vast range of options. In this article, we propose a technique that aggregates the consensus properties of various feature selection methods in order to develop a more effective solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable toward achieving similar and, ideally, higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the robustness index (RI). We perform an extensive empirical evaluation of our technique on eight datasets with different dimensions, including arrhythmia, lung cancer, Madelon, mfeat-fourier, Internet ads, leukemia-3c, embryonal tumor, and a real-world dataset, viz., acute myeloid leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that, compared with other techniques, it improves classification accuracy by approximately 3-4% on datasets with fewer than 500 features and by more than 5% on datasets with more than 500 features, across a wide range of classifiers. © 2014

UR - http://www.scopus.com/inward/record.url?scp=84896352368&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84896352368&partnerID=8YFLogxK

U2 - 10.1080/08839514.2014.883903

DO - 10.1080/08839514.2014.883903

M3 - Article

AN - SCOPUS:84896352368

VL - 28

SP - 243

EP - 257

JO - Applied Artificial Intelligence

JF - Applied Artificial Intelligence

SN - 0883-9514

IS - 3

ER -