Improving bagging performance through multi-algorithm ensembles

Kuo Wei Hsu, Jaideep Srivastava

Research output: Contribution to journal > Article

5 Citations (Scopus)

Abstract

Working as an ensemble method that establishes a committee of classifiers first and then aggregates their outcomes through majority voting, bagging has attracted considerable research interest and been applied in various application domains. It has demonstrated several advantages, but in its present form, bagging has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through the use of multi-algorithm ensembles. In a multi-algorithm ensemble, multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that compared to using different training sets alone, using heterogeneous algorithms together with different training sets increases diversity in ensembles, and hence we provide a fundamental explanation for research utilizing heterogeneous algorithms. In addition, we partially address the problem of the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, after realizing that the bootstrap procedure is the exclusive source of diversity in bagging, we use heterogeneity as another source of diversity and propose an approach utilizing heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in experiments and, in terms of mean margin, our approach is superior to all the others considered in experiments.
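The central idea of the abstract, a bagging committee whose members are trained by different algorithms on different bootstrap samples and combined by majority voting, can be illustrated with a minimal sketch. This is not the authors' implementation. The two toy base learners (nearest centroid and 1-nearest neighbour), the round-robin assignment of algorithms to committee members, and all function names are assumptions made purely for illustration.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement (the bootstrap step of bagging)."""
    return [rng.choice(data) for _ in data]


def train_nearest_centroid(sample):
    """Toy learner 1 (assumed): predict the class whose feature mean is closest."""
    sums, counts = {}, Counter()
    for features, label in sample:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {
        label: [total / counts[label] for total in acc]
        for label, acc in sums.items()
    }

    def predict(x):
        return min(centroids, key=lambda c: squared_distance(x, centroids[c]))

    return predict


def train_one_nearest_neighbour(sample):
    """Toy learner 2 (assumed): predict the label of the closest stored example."""
    stored = list(sample)

    def predict(x):
        _, label = min(stored, key=lambda ex: squared_distance(x, ex[0]))
        return label

    return predict


def train_heterogeneous_bagging(data, learners, n_members, seed=0):
    """Train each committee member on its own bootstrap sample.

    Algorithms are assigned round-robin (an assumption of this sketch), so
    diversity comes from both the resampling and the choice of algorithm.
    """
    rng = random.Random(seed)
    committee = []
    for i in range(n_members):
        learner = learners[i % len(learners)]
        committee.append(learner(bootstrap_sample(data, rng)))
    return committee


def majority_vote(committee, x):
    """Aggregate the members' predictions for x by majority voting."""
    votes = Counter(member(x) for member in committee)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-class toy data: (features, label) pairs.
    toy = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.1, 0.1), "a"),
           ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"), ((1.0, 0.9), "b")]
    members = train_heterogeneous_bagging(
        toy, [train_nearest_centroid, train_one_nearest_neighbour], n_members=5)
    print(majority_vote(members, (0.05, 0.05)))
```

Passing a single learner in `learners` reduces the sketch to ordinary bagging, in which the bootstrap procedure is the only source of diversity. An odd `n_members` avoids tied votes in the two-class case.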

Original language: English
Pages (from-to): 498-512
Number of pages: 15
Journal: Frontiers of Computer Science in China
Volume: 6
Issue number: 5
DOIs: 10.1007/s11704-012-1163-6
Publication status: Published - 2012
Externally published: Yes

Fingerprint

Bagging
Ensemble
Ensemble Methods
Majority Voting
Classification Algorithm
Nonlinear Function
Margin
Bootstrap
Expand
Experiment
Classifiers
Experiments
Classifier
Benchmark
Evaluation

Keywords

  • bagging
  • classification
  • diversity
  • ensemble

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Improving bagging performance through multi-algorithm ensembles. / Hsu, Kuo Wei; Srivastava, Jaideep.

In: Frontiers of Computer Science in China, Vol. 6, No. 5, 2012, p. 498-512.

@article{78d646d434fb41518f8d6cdaaa55d43c,
title = "Improving bagging performance through multi-algorithm ensembles",
abstract = "Working as an ensemble method that establishes a committee of classifiers first and then aggregates their outcomes through majority voting, bagging has attracted considerable research interest and been applied in various application domains. It has demonstrated several advantages, but in its present form, bagging has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through the use of multi-algorithm ensembles. In a multi-algorithm ensemble, multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that compared to using different training sets alone, using heterogeneous algorithms together with different training sets increases diversity in ensembles, and hence we provide a fundamental explanation for research utilizing heterogeneous algorithms. In addition, we partially address the problem of the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, after realizing that the bootstrap procedure is the exclusive source of diversity in bagging, we use heterogeneity as another source of diversity and propose an approach utilizing heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in experiments and, in terms of mean margin, our approach is superior to all the others considered in experiments.",
keywords = "bagging, classification, diversity, ensemble",
author = "Hsu, {Kuo Wei} and Srivastava, Jaideep",
year = "2012",
doi = "10.1007/s11704-012-1163-6",
language = "English",
volume = "6",
pages = "498--512",
journal = "Frontiers of Computer Science",
issn = "2095-2228",
publisher = "Springer Science + Business Media",
number = "5",
}

TY  - JOUR
T1  - Improving bagging performance through multi-algorithm ensembles
AU  - Hsu, Kuo Wei
AU  - Srivastava, Jaideep
PY  - 2012
Y1  - 2012
AB - Working as an ensemble method that establishes a committee of classifiers first and then aggregates their outcomes through majority voting, bagging has attracted considerable research interest and been applied in various application domains. It has demonstrated several advantages, but in its present form, bagging has been found to be less accurate than some other ensemble methods. To unlock its power and expand its user base, we propose an approach that improves bagging through the use of multi-algorithm ensembles. In a multi-algorithm ensemble, multiple classification algorithms are employed. Starting from a study of the nature of diversity, we show that compared to using different training sets alone, using heterogeneous algorithms together with different training sets increases diversity in ensembles, and hence we provide a fundamental explanation for research utilizing heterogeneous algorithms. In addition, we partially address the problem of the relationship between diversity and accuracy by providing a non-linear function that describes the relationship between diversity and correlation. Furthermore, after realizing that the bootstrap procedure is the exclusive source of diversity in bagging, we use heterogeneity as another source of diversity and propose an approach utilizing heterogeneous algorithms in bagging. For evaluation, we consider several benchmark data sets from various application domains. The results indicate that, in terms of F1-measure, our approach outperforms most of the other state-of-the-art ensemble methods considered in experiments and, in terms of mean margin, our approach is superior to all the others considered in experiments.
KW  - bagging
KW  - classification
KW  - diversity
KW  - ensemble
UR  - http://www.scopus.com/inward/record.url?scp=84867476478&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84867476478&partnerID=8YFLogxK
U2  - 10.1007/s11704-012-1163-6
DO  - 10.1007/s11704-012-1163-6
M3  - Article
AN  - SCOPUS:84867476478
VL  - 6
SP  - 498
EP  - 512
JO  - Frontiers of Computer Science
JF  - Frontiers of Computer Science
SN  - 2095-2228
IS  - 5
ER  -