CrossLink

A novel method for cross-condition classification of cancer subtypes

Chifeng Ma, Seetharama S. Konduru, Mario Flore, Salah Gehani, Issam Al-Bozom, Yusheng Feng, Erchin Serpedin, Lotfi Chouchane, Yidong Chen, Yufei Huang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: We considered the prediction of cancer classes (e.g. subtypes) from patient gene expression profiles that contain both systematic and condition-specific biases relative to the training reference dataset. Conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets have the same distribution under all conditions, because the class-specific gene signatures change with the condition. As a result, a trained classifier may work well under one condition but not under another.

Methods: To address this limitation of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that each signature is unique to its associated class under any condition, and it therefore employs an unsupervised clustering algorithm to discover this unique signature.

Results: We assessed the performance of CL for cross-condition prediction of PAM50 breast cancer subtypes using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, as well as datasets with known and unknown PAM50 classification. CL achieved a prediction accuracy above 73 %, the highest among the methods we evaluated. We also applied the algorithm to a set of breast tumors from an Arab population to assign a PAM50 subtype to each tumor based on its gene expression profile.

Conclusions: We proposed CrossLink, a novel algorithm for cross-condition prediction of cancer classes. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
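The abstract describes CrossLink only at a high level: rather than normalizing the prediction set onto the reference, it clusters the prediction samples without supervision and relies on each class signature being unique under any condition. The sketch below illustrates that general idea under our own assumptions; it is not the authors' algorithm. Assumed here: k-means (with k equal to the number of classes) stands in for the unsupervised clustering step, and clusters are linked to reference classes by correlating gene-centered centroids and picking the best one-to-one assignment. The function names `kmeans`, `link_clusters` and `predict_by_cluster_linking` are hypothetical, and the data are synthetic.

```python
import itertools

import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation.

    Rows of X are samples and columns are genes. Returns (labels, centroids).
    """
    centers = [X[0]]
    for _ in range(1, k):
        # Next centre: the sample farthest from every centre chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def link_clusters(ref_centroids, test_centroids):
    """Map each test cluster to one reference class (hypothetical linking rule).

    Centering each gene across centroids removes per-gene additive offsets, and
    Pearson correlation ignores a global scale factor. The one-to-one assignment
    with the largest total correlation is found by brute force over all k!
    permutations, which is cheap for small k such as the five PAM50 subtypes.
    """
    k = len(ref_centroids)
    R = ref_centroids - ref_centroids.mean(axis=0)
    T = test_centroids - test_centroids.mean(axis=0)
    corr = np.corrcoef(T, R)[:k, k:]  # corr[i, j]: test cluster i vs reference class j
    best = max(itertools.permutations(range(k)),
               key=lambda p: sum(corr[i, p[i]] for i in range(k)))
    return np.array(best)


def predict_by_cluster_linking(X_ref, y_ref, X_test, n_classes):
    """Cluster the prediction set, then transfer reference labels to the clusters."""
    ref_centroids = np.array([X_ref[y_ref == c].mean(axis=0) for c in range(n_classes)])
    labels, test_centroids = kmeans(X_test, n_classes)
    return link_clusters(ref_centroids, test_centroids)[labels]


# Toy usage with synthetic data: the prediction set carries a condition-specific
# per-gene offset and a global scale change relative to the reference set.
rng = np.random.default_rng(0)
n_classes, n_genes = 3, 20
signatures = rng.normal(0.0, 3.0, size=(n_classes, n_genes))
y_ref = np.arange(60) % n_classes
X_ref = signatures[y_ref] + rng.normal(0.0, 0.3, size=(60, n_genes))
y_test = np.arange(30) % n_classes
offset = rng.normal(0.0, 5.0, size=n_genes)
X_test = 1.5 * (signatures[y_test] + rng.normal(0.0, 0.3, size=(30, n_genes))) + offset
pred = predict_by_cluster_linking(X_ref, y_ref, X_test, n_classes)
print("accuracy:", (pred == y_test).mean())
```

The brute-force assignment only keeps the sketch dependency-free; SciPy's `linear_sum_assignment` would be the usual choice for larger k. Centering across cluster centroids also assumes that every class is present in the prediction set.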

Original language: English
Article number: 549
Journal: BMC Genomics
Volume: 17
DOI: 10.1186/s12864-016-2903-z
Publication status: Published - 22 Aug 2016

Fingerprint

Neoplasms
Breast Neoplasms
Transcriptome
Genes
Cluster Analysis
Datasets
Population

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

CrossLink: A novel method for cross-condition classification of cancer subtypes. / Ma, Chifeng; Konduru, Seetharama S.; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei.

In: BMC Genomics, Vol. 17, 549, 22.08.2016.

Research output: Contribution to journal › Article

Ma, Chifeng ; Konduru, Seetharama S. ; Flore, Mario ; Gehani, Salah ; Al-Bozom, Issam ; Feng, Yusheng ; Serpedin, Erchin ; Chouchane, Lotfi ; Chen, Yidong ; Huang, Yufei. / CrossLink: A novel method for cross-condition classification of cancer subtypes. In: BMC Genomics. 2016 ; Vol. 17.

@article{3258089f3f4a415ea754eafc2e4a12c1,
title = "CrossLink: A novel method for cross-condition classification of cancer subtypes",
abstract = "Background: We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. Methods: To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. Results: We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 {\%}, highest among other methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. Conclusions: A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.",
author = "Chifeng Ma and Konduru, {Seetharama S.} and Mario Flore and Salah Gehani and Issam Al-Bozom and Yusheng Feng and Erchin Serpedin and Lotfi Chouchane and Yidong Chen and Yufei Huang",
year = "2016",
month = "8",
day = "22",
doi = "10.1186/s12864-016-2903-z",
language = "English",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY  - JOUR
T1  - CrossLink: A novel method for cross-condition classification of cancer subtypes
AU  - Ma, Chifeng
AU  - Konduru, Seetharama S.
AU  - Flore, Mario
AU  - Gehani, Salah
AU  - Al-Bozom, Issam
AU  - Feng, Yusheng
AU  - Serpedin, Erchin
AU  - Chouchane, Lotfi
AU  - Chen, Yidong
AU  - Huang, Yufei
PY  - 2016/8/22
Y1  - 2016/8/22
AB  - Background: We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. Methods: To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. Results: We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 %, highest among other methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. Conclusions: A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
UR  - http://www.scopus.com/inward/record.url?scp=84983035921&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84983035921&partnerID=8YFLogxK
U2  - 10.1186/s12864-016-2903-z
DO  - 10.1186/s12864-016-2903-z
M3  - Article
VL  - 17
JO  - BMC Genomics
JF  - BMC Genomics
SN  - 1471-2164
M1  - 549
ER  - 