Model comparison for breast cancer prognosis based on clinical data

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

We compared the performance of several prediction techniques for breast cancer prognosis, based on the area under the ROC curve (AU-ROC) for different prognosis periods. The analyzed dataset contained 1,981 patients; from an initial 25 variables, the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely Generalized Linear Model (GLM), GLM-Net, Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Squares and GLM-Net have superior overall performance; however, their performance is only slightly higher than that of the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables are identified from this analysis. The first includes number of positive lymph nodes, tumor size, cancer grade and estrogen receptor, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term vs long-term contributions of the clinical variables are also analyzed from the comparative models. Among the various cancer treatment plans, the combination of Chemo/Radio therapy has the largest impact on cancer prognosis.
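The two comparison steps named in the abstract, a paired t-test on resampled performance differences and a relative variable importance averaged across models, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the example AU-ROC values, and the choice to scale each model's importances to sum to 1 before averaging are all assumptions made for illustration; the abstract does not state how importances were normalized.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-resample scores.

    scores_a[i] and scores_b[i] must come from the same resample.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two resamples")
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


def mean_relative_importance(importances_by_model):
    """Average variable importance across models.

    Assumption: each model's scores are first scaled to sum to 1 so that
    models with different importance scales contribute equally.
    """
    variables = list(next(iter(importances_by_model.values())))
    totals = {v: 0.0 for v in variables}
    for scores in importances_by_model.values():
        model_total = sum(scores.values())
        for v in variables:
            totals[v] += scores[v] / model_total
    k = len(importances_by_model)
    return {v: totals[v] / k for v in variables}


if __name__ == "__main__":
    # Hypothetical AU-ROC values from five shared resamples (not study data).
    auc_rf = [0.74, 0.76, 0.73, 0.77, 0.75]
    auc_knn = [0.71, 0.74, 0.72, 0.73, 0.72]
    t, dof = paired_t_statistic(auc_rf, auc_knn)
    print(f"t = {t:.3f} with {dof} degrees of freedom")

    # Hypothetical raw importances on two different scales.
    importances = {
        "rf": {"lymph_nodes": 40.0, "tumor_size": 30.0, "grade": 30.0},
        "glmnet": {"lymph_nodes": 0.5, "tumor_size": 0.3, "grade": 0.2},
    }
    print(mean_relative_importance(importances))
```

The t statistic would then be compared against a Student t distribution with the returned degrees of freedom to obtain a p-value; that step is left out here to keep the sketch free of third-party dependencies.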

Original language: English
Article number: e0146413
Journal: PLoS One
Volume: 11
Issue number: 1
DOI: 10.1371/journal.pone.0146413
Publication status: Published - 1 Jan 2016

Fingerprint

Breast Neoplasms
prognosis
Least-Squares Analysis
Linear Models
Neoplasms
Incus
Radio
Estrogen Receptors
Therapeutics
Lymph Nodes
forest trees
Oncology
neural networks
Support vector machines

ASJC Scopus subject areas

  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Model comparison for breast cancer prognosis based on clinical data. / Boughorbel, Sabri; Al-Ali, Rashid J.; Elkum, Naser.

In: PLoS One, Vol. 11, No. 1, e0146413, 01.01.2016.

@article{393d8be31f4e45b49ce3f6bb133589c7,
title = "Model comparison for breast cancer prognosis based on clinical data",
abstract = "We compared the performance of several prediction techniques for breast cancer prognosis, based on the area under the ROC curve (AU-ROC) for different prognosis periods. The analyzed dataset contained 1,981 patients; from an initial 25 variables, the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely Generalized Linear Model (GLM), GLM-Net, Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Squares and GLM-Net have superior overall performance; however, their performance is only slightly higher than that of the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables are identified from this analysis. The first includes number of positive lymph nodes, tumor size, cancer grade and estrogen receptor, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term vs long-term contributions of the clinical variables are also analyzed from the comparative models. Among the various cancer treatment plans, the combination of Chemo/Radio therapy has the largest impact on cancer prognosis.",
author = "Sabri Boughorbel and Al-Ali, {Rashid J.} and Naser Elkum",
year = "2016",
month = "1",
day = "1",
doi = "10.1371/journal.pone.0146413",
language = "English",
volume = "11",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "1",

}

TY - JOUR

T1 - Model comparison for breast cancer prognosis based on clinical data

AU - Boughorbel, Sabri

AU - Al-Ali, Rashid J.

AU - Elkum, Naser

PY - 2016/1/1

Y1 - 2016/1/1

N2 - We compared the performance of several prediction techniques for breast cancer prognosis, based on the area under the ROC curve (AU-ROC) for different prognosis periods. The analyzed dataset contained 1,981 patients; from an initial 25 variables, the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely Generalized Linear Model (GLM), GLM-Net, Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Squares and GLM-Net have superior overall performance; however, their performance is only slightly higher than that of the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables are identified from this analysis. The first includes number of positive lymph nodes, tumor size, cancer grade and estrogen receptor, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term vs long-term contributions of the clinical variables are also analyzed from the comparative models. Among the various cancer treatment plans, the combination of Chemo/Radio therapy has the largest impact on cancer prognosis.

UR - http://www.scopus.com/inward/record.url?scp=84955448634&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84955448634&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0146413

DO - 10.1371/journal.pone.0146413

M3 - Article

VL - 11

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 1

M1 - e0146413

ER -