Reducing confounding and suppression effects in TCGA data: an integrated analysis of chemotherapy response in ovarian cancer

Fang Han Hsu, Erchin Serpedin, Tzu Hung Hsiao, Alexander J R Bishop, Edward R. Dougherty, Yidong Chen

Research output: Contribution to journal > Article

15 Citations (Scopus)

Abstract

Despite an initial response to adjuvant chemotherapy, ovarian cancer patients treated with the combination of paclitaxel and carboplatin frequently suffer recurrence after a few cycles of treatment, and the mechanisms underlying this chemoresistance remain unclear. Recently, The Cancer Genome Atlas (TCGA) research network concluded an ovarian cancer study and released the dataset to the public. The TCGA dataset offers a large sample size, comprehensive molecular profiles, and clinical outcome information; however, because the molecular subtypes of ovarian cancer are unknown and TCGA patients received a great diversity of adjuvant treatments, studying chemotherapeutic response with the TCGA data is difficult. Additionally, factors such as sample batch, patient age, and tumor stage further confound or suppress the identification of relevant genes, and thus of the underlying biological functions and disease mechanisms. To address these issues, we propose an analysis procedure designed to reduce the suppression effect by focusing on a specific chemotherapeutic treatment, and to remove confounding effects such as batch effect, patient age, and tumor stage. The procedure starts with a batch effect adjustment, followed by a rigorous sample selection process. Then, the gene expression, copy number, and methylation profiles from the TCGA ovarian cancer dataset are analyzed using a semi-supervised clustering method combined with a novel scoring function. As a result, two molecular classifications enriched with unfavorable scores are identified, one with poor copy number profiles and one with poor methylation profiles. Compared with the samples enriched with favorable scores, these two classifications exhibit poor progression-free survival (PFS) and might be associated with poor chemotherapy response specifically to the combination of paclitaxel and carboplatin. Significant genes and biological processes are subsequently detected using classical statistical approaches and enrichment analysis. The proposed procedure for reducing confounding and suppression effects and the semi-supervised clustering method are essential steps in identifying genes associated with the chemotherapeutic response.
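To make the idea of removing confounding effects concrete, the following is a minimal sketch and not the authors' procedure: the abstract does not say which adjustment method was used. It assumes a simple ordinary least-squares residualization, in which each gene's values are regressed on batch, age, and stage and only the residuals are kept. The function name `residualize` and all argument names are hypothetical, and stage is assumed to be coded numerically.

```python
import numpy as np

def residualize(expr, batch, age, stage):
    """Remove linear effects of batch, age, and stage from each gene.

    Illustrative assumption only; not the method of the paper.
    expr  : (n_samples, n_genes) array of expression values
    batch : length-n_samples sequence of batch labels
    age   : length-n_samples numeric sequence
    stage : length-n_samples numeric sequence (e.g. an ordinal stage code)
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    # One-hot encode batch, dropping the first level so the columns are
    # not collinear with the intercept.
    levels = np.unique(batch)
    batch_cols = [(batch == lv).astype(float) for lv in levels[1:]]
    design = np.column_stack(
        [np.ones(len(batch)),
         np.asarray(age, dtype=float),
         np.asarray(stage, dtype=float)]
        + batch_cols
    )
    # Fit all genes on the covariates in a single least-squares solve.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    # Keep the residuals and add back each gene's mean to preserve scale.
    return expr - design @ coef + expr.mean(axis=0)
```

After this adjustment the values of each gene are linearly uncorrelated with age and stage and have equal means across batches, so a downstream clustering step is less likely to group samples by those covariates. Nonlinear confounding is not handled by this sketch.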

Original language: English
Article number: S13
Journal: BMC Genomics
Volume: 13 Suppl 6
Publication status: Published - 2012
Externally published: Yes

Fingerprint

Atlases
Ovarian Neoplasms
Genome
Drug Therapy
Neoplasms
Carboplatin
Paclitaxel
Methylation
Cluster Analysis
Genes
Biological Phenomena
Gene Dosage
Adjuvant Chemotherapy
Sample Size
Disease-Free Survival
Therapeutics
Gene Expression
Recurrence
Research
Datasets

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Hsu, F. H., Serpedin, E., Hsiao, T. H., Bishop, A. J. R., Dougherty, E. R., & Chen, Y. (2012). Reducing confounding and suppression effects in TCGA data: an integrated analysis of chemotherapy response in ovarian cancer. BMC Genomics, 13 Suppl 6, [S13].

@article{3e2f7922d9ad4127a3c05fad98efe381,
title = "Reducing confounding and suppression effects in TCGA data: an integrated analysis of chemotherapy response in ovarian cancer.",
author = "Hsu, {Fang Han} and Erchin Serpedin and Hsiao, {Tzu Hung} and Bishop, {Alexander J R} and Dougherty, {Edward R.} and Yidong Chen",
year = "2012",
language = "English",
volume = "13 Suppl 6",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Reducing confounding and suppression effects in TCGA data

T2 - an integrated analysis of chemotherapy response in ovarian cancer.

AU - Hsu, Fang Han

AU - Serpedin, Erchin

AU - Hsiao, Tzu Hung

AU - Bishop, Alexander J R

AU - Dougherty, Edward R.

AU - Chen, Yidong

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=84876082047&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876082047&partnerID=8YFLogxK

M3 - Article

VL - 13 Suppl 6

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - S13

ER -