A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data

J. Luo, M. Schumacher, A. Scherer, D. Sanoudou, D. Megherbi, T. Davison, T. Shi, W. Tong, L. Shi, H. Hong, C. Zhao, F. Elloumi, W. Shi, R. Thomas, S. Lin, G. Tillinghast, G. Liu, Y. Zhou, D. Herman, Y. Li, Y. Deng, H. Fang, P. Bushel, M. Woods, J. Zhang

Research output: Contribution to journal › Article

119 Citations (Scopus)

Abstract

Batch effects are systematic, non-biological differences between batches (groups) of samples in microarray experiments, arising from various causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on developing methods for effective batch effect removal. However, the impact of these methods on cross-batch prediction performance, one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform as well as or better than no batch effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that applying these methods is generally advisable and that ratio-based methods are preferred.
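The abstract names several per-batch adjustments and the MCC metric without defining them. The following is a minimal sketch, not the authors' implementation, of how mean-centering, standardization and ratio-based scaling might be applied to one gene's values within each batch, and how MCC is computed from a 2×2 confusion matrix. All function names are hypothetical. The definitions of Ratio-A and Ratio-G used here (divide raw-scale, positive intensities by the arithmetic or geometric mean of the gene within its own batch, then take log2) are our assumption and are not stated in this record; EJLR is omitted because it is not described here.

```python
import math
from statistics import fmean, geometric_mean, pstdev


def _batch_indices(batches):
    """Group sample positions by batch label."""
    groups = {}
    for i, label in enumerate(batches):
        groups.setdefault(label, []).append(i)
    return groups


def mean_center(values, batches):
    """Subtract the within-batch mean of one gene (values assumed log-scale)."""
    out = list(values)
    for idx in _batch_indices(batches).values():
        m = fmean(values[i] for i in idx)
        for i in idx:
            out[i] = values[i] - m
    return out


def standardize(values, batches):
    """Center one gene and scale it to unit population SD within each batch."""
    out = list(values)
    for idx in _batch_indices(batches).values():
        vals = [values[i] for i in idx]
        m, s = fmean(vals), pstdev(vals)
        for i in idx:
            # A gene that is constant within a batch is set to 0 (avoids division by zero).
            out[i] = (values[i] - m) / s if s > 0 else 0.0
    return out


def ratio_scale(values, batches, geometric=False):
    """ASSUMED definition: Ratio-A (arithmetic) or Ratio-G (geometric=True)
    divides raw-scale, positive intensities by the gene's within-batch mean,
    then applies log2."""
    out = list(values)
    for idx in _batch_indices(batches).values():
        vals = [values[i] for i in idx]
        ref = geometric_mean(vals) if geometric else fmean(vals)
        for i in idx:
            out[i] = math.log2(values[i] / ref)
    return out


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total
    is zero (a common convention, assumed here)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    gene = [1.0, 3.0, 10.0, 14.0]   # one gene, four samples
    batch = ["a", "a", "b", "b"]    # two batches of two samples each
    print(mean_center(gene, batch))  # [-1.0, 1.0, -2.0, 2.0]
    print(standardize(gene, batch))  # [-1.0, 1.0, -1.0, 1.0]
    print(mcc(tp=3, tn=3, fp=2, fn=2))  # 0.2
```

In a cross-batch prediction setting, each adjustment would be applied to the training and test batches separately before fitting a classifier on one and scoring it on the other with MCC; every adjustment here uses only statistics from the sample's own batch.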

Original language: English
Pages (from-to): 278-291
Number of pages: 14
Journal: Pharmacogenomics Journal
Volume: 10
Issue number: 4
DOI: https://doi.org/10.1038/tpj.2010.57
Publication status: Published - 1 Aug 2010
Externally published: Yes

Keywords

  • batch effect
  • batch effect removal
  • cross-batch prediction
  • gene expression
  • MAQC-II
  • microarray

ASJC Scopus subject areas

  • Pharmacology
  • Molecular Medicine
  • Genetics

Cite this

A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. / Luo, J.; Schumacher, M.; Scherer, A.; Sanoudou, D.; Megherbi, D.; Davison, T.; Shi, T.; Tong, W.; Shi, L.; Hong, H.; Zhao, C.; Elloumi, F.; Shi, W.; Thomas, R.; Lin, S.; Tillinghast, G.; Liu, G.; Zhou, Y.; Herman, D.; Li, Y.; Deng, Y.; Fang, H.; Bushel, P.; Woods, M.; Zhang, J.

In: Pharmacogenomics Journal, Vol. 10, No. 4, 01.08.2010, p. 278-291.

Luo, J, Schumacher, M, Scherer, A, Sanoudou, D, Megherbi, D, Davison, T, Shi, T, Tong, W, Shi, L, Hong, H, Zhao, C, Elloumi, F, Shi, W, Thomas, R, Lin, S, Tillinghast, G, Liu, G, Zhou, Y, Herman, D, Li, Y, Deng, Y, Fang, H, Bushel, P, Woods, M & Zhang, J 2010, 'A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data', Pharmacogenomics Journal, vol. 10, no. 4, pp. 278-291. https://doi.org/10.1038/tpj.2010.57
Luo, J. ; Schumacher, M. ; Scherer, A. ; Sanoudou, D. ; Megherbi, D. ; Davison, T. ; Shi, T. ; Tong, W. ; Shi, L. ; Hong, H. ; Zhao, C. ; Elloumi, F. ; Shi, W. ; Thomas, R. ; Lin, S. ; Tillinghast, G. ; Liu, G. ; Zhou, Y. ; Herman, D. ; Li, Y. ; Deng, Y. ; Fang, H. ; Bushel, P. ; Woods, M. ; Zhang, J. / A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. In: Pharmacogenomics Journal. 2010 ; Vol. 10, No. 4. pp. 278-291.
@article{c2a0c3b666b641b8aa8eade4dc01ebb1,
title = "A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data",
abstract = "Batch effects are systematic, non-biological differences between batches (groups) of samples in microarray experiments, arising from various causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on developing methods for effective batch effect removal. However, the impact of these methods on cross-batch prediction performance, one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform as well as or better than no batch effect removal in 89, 85, 83, 79 and 75{\%} of the cases, respectively, suggesting that applying these methods is generally advisable and that ratio-based methods are preferred.",
keywords = "batch effect, batch effect removal, cross-batch prediction, gene expression, MAQC-II, microarray",
author = "J. Luo and M. Schumacher and A. Scherer and D. Sanoudou and D. Megherbi and T. Davison and T. Shi and W. Tong and L. Shi and H. Hong and C. Zhao and F. Elloumi and W. Shi and R. Thomas and S. Lin and G. Tillinghast and G. Liu and Y. Zhou and D. Herman and Y. Li and Y. Deng and H. Fang and P. Bushel and M. Woods and J. Zhang",
year = "2010",
month = "8",
day = "1",
doi = "10.1038/tpj.2010.57",
language = "English",
volume = "10",
pages = "278--291",
journal = "Pharmacogenomics Journal",
issn = "1470-269X",
publisher = "Nature Publishing Group",
number = "4",

}

TY - JOUR

T1 - A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data

AU - Luo, J.

AU - Schumacher, M.

AU - Scherer, A.

AU - Sanoudou, D.

AU - Megherbi, D.

AU - Davison, T.

AU - Shi, T.

AU - Tong, W.

AU - Shi, L.

AU - Hong, H.

AU - Zhao, C.

AU - Elloumi, F.

AU - Shi, W.

AU - Thomas, R.

AU - Lin, S.

AU - Tillinghast, G.

AU - Liu, G.

AU - Zhou, Y.

AU - Herman, D.

AU - Li, Y.

AU - Deng, Y.

AU - Fang, H.

AU - Bushel, P.

AU - Woods, M.

AU - Zhang, J.

PY - 2010/8/1

Y1 - 2010/8/1

N2 - Batch effects are systematic, non-biological differences between batches (groups) of samples in microarray experiments, arising from various causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on developing methods for effective batch effect removal. However, the impact of these methods on cross-batch prediction performance, one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform as well as or better than no batch effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that applying these methods is generally advisable and that ratio-based methods are preferred.

AB - Batch effects are systematic, non-biological differences between batches (groups) of samples in microarray experiments, arising from various causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on developing methods for effective batch effect removal. However, the impact of these methods on cross-batch prediction performance, one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform as well as or better than no batch effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that applying these methods is generally advisable and that ratio-based methods are preferred.

KW - batch effect

KW - batch effect removal

KW - cross-batch prediction

KW - gene expression

KW - MAQC-II

KW - microarray

UR - http://www.scopus.com/inward/record.url?scp=77955131271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955131271&partnerID=8YFLogxK

U2 - 10.1038/tpj.2010.57

DO - 10.1038/tpj.2010.57

M3 - Article

C2 - 20676067

AN - SCOPUS:77955131271

VL - 10

SP - 278

EP - 291

JO - Pharmacogenomics Journal

JF - Pharmacogenomics Journal

SN - 1470-269X

IS - 4

ER -