Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities

Florian Mittag, Finja Büchel, Mohamad Saad, Andreas Jahn, Claudia Schulte, Zoltan Bochdanovits, Javier Simón-Sánchez, Mike A. Nalls, Margaux Keller, Dena G. Hernandez, J. Raphael Gibbs, Suzanne Lesage, Alexis Brice, Peter Heutink, Maria Martinez, Nicholas W. Wood, John Hardy, Andrew B. Singleton, Andreas Zell, Thomas Gasser, Manu Sharma

Research output: Contribution to journal > Article

24 Citations (Scopus)

Abstract

The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has fueled expectations that individual risk can also be quantified from that architecture. So far, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) has shown little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D) to show that, apart from the magnitude of the effect sizes of risk variants, the heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1-5%) as well as rare variants (frequency <1%) in the etiology of complex diseases. Using a cross-validation model, we were able to achieve predictions with an area under the receiver operating characteristic curve (AUC) of ∼0.88 for T1D, highlighting the strong heritable component (∼90%). This is in contrast to PD, where we were unable to achieve a satisfactory prediction (AUC ∼0.56; heritability ∼38%). Our simulations showed that simultaneous inclusion of uncommon and rare variants in GWAS would eventually lead to feasible disease risk prediction for complex diseases such as PD. The software used is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/.
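The AUC values quoted in the abstract can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The following is a minimal sketch of that calculation, not the authors' MACLEAPS software: the function name `auc` and the six scores and labels are hypothetical values made up for illustration, and a real study would compute the AUC on held-out folds of a cross-validation.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: classifier outputs, higher meaning "more likely a case".
    labels: 1 for cases, 0 for controls.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical decision scores for six individuals (not data from the study).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]

example_auc = auc(scores, labels)
print(f"AUC = {example_auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which is why the reported ∼0.56 for PD is described as unsatisfactory while ∼0.88 for T1D is not.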

Original language: English
Pages (from-to): 1708-1718
Number of pages: 11
Journal: Human Mutation
Volume: 33
Issue number: 12
DOI: https://doi.org/10.1002/humu.22161
Publication status: Published - 1 Dec 2012
Externally published: Yes

Fingerprint

Genome-Wide Association Study
Parkinson Disease
Type 1 Diabetes Mellitus
Area Under Curve
Software
Support Vector Machine
ROC Curve
Single Nucleotide Polymorphism

Keywords

  • Disease risk prediction
  • Genome-wide association studies
  • Machine learning
  • Parkinson disease
  • Support vector machines

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities. / Mittag, Florian; Büchel, Finja; Saad, Mohamad; Jahn, Andreas; Schulte, Claudia; Bochdanovits, Zoltan; Simón-Sánchez, Javier; Nalls, Mike A.; Keller, Margaux; Hernandez, Dena G.; Gibbs, J. Raphael; Lesage, Suzanne; Brice, Alexis; Heutink, Peter; Martinez, Maria; Wood, Nicholas W.; Hardy, John; Singleton, Andrew B.; Zell, Andreas; Gasser, Thomas; Sharma, Manu.

In: Human Mutation, Vol. 33, No. 12, 01.12.2012, p. 1708-1718.

Mittag, F, Büchel, F, Saad, M, Jahn, A, Schulte, C, Bochdanovits, Z, Simón-Sánchez, J, Nalls, MA, Keller, M, Hernandez, DG, Gibbs, JR, Lesage, S, Brice, A, Heutink, P, Martinez, M, Wood, NW, Hardy, J, Singleton, AB, Zell, A, Gasser, T & Sharma, M 2012, 'Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities', Human Mutation, vol. 33, no. 12, pp. 1708-1718. https://doi.org/10.1002/humu.22161
Mittag, Florian; Büchel, Finja; Saad, Mohamad; Jahn, Andreas; Schulte, Claudia; Bochdanovits, Zoltan; Simón-Sánchez, Javier; Nalls, Mike A.; Keller, Margaux; Hernandez, Dena G.; Gibbs, J. Raphael; Lesage, Suzanne; Brice, Alexis; Heutink, Peter; Martinez, Maria; Wood, Nicholas W.; Hardy, John; Singleton, Andrew B.; Zell, Andreas; Gasser, Thomas; Sharma, Manu. / Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities. In: Human Mutation. 2012; Vol. 33, No. 12. pp. 1708-1718.
@article{aef13d6665ab462286ef4cd3bb7afd3c,
title = "Use of support vector machines for disease risk prediction in genome-wide association studies: Concerns and opportunities",
abstract = "The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has fueled the expectations whether the individual risk can also be quantified based on the genetic architecture. So far, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) showed little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D), to show that apart from magnitude of effect size of risk variants, heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1-5{\%}) as well as rare variants (frequency <1{\%}) in disease etiology of complex diseases. Using a cross-validation model, we were able to achieve predictions with an area under the receiver operating characteristic curve (AUC) of ∼0.88 for T1D, highlighting the strong heritable component (∼90{\%}). This is in contrast to PD, where we were unable to achieve a satisfactory prediction (AUC ∼0.56; heritability ∼38{\%}). Our simulations showed that simultaneous inclusion of uncommon and rare variants in GWAS would eventually lead to feasible disease risk prediction for complex diseases such as PD. The used software is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/.",
keywords = "Disease risk prediction, Genome-wide association studies, Machine learning, Parkinson disease, Support vector machines",
author = "Florian Mittag and Finja B{\"u}chel and Mohamad Saad and Andreas Jahn and Claudia Schulte and Zoltan Bochdanovits and Javier Sim{\'o}n-S{\'a}nchez and Nalls, {Mike A.} and Margaux Keller and Hernandez, {Dena G.} and Gibbs, {J. Raphael} and Suzanne Lesage and Alexis Brice and Peter Heutink and Maria Martinez and Wood, {Nicholas W.} and John Hardy and Singleton, {Andrew B.} and Andreas Zell and Thomas Gasser and Manu Sharma",
year = "2012",
month = "12",
day = "1",
doi = "10.1002/humu.22161",
language = "English",
volume = "33",
pages = "1708--1718",
journal = "Human Mutation",
issn = "1059-7794",
publisher = "Wiley-Liss Inc.",
number = "12",

}

TY - JOUR

T1 - Use of support vector machines for disease risk prediction in genome-wide association studies

T2 - Concerns and opportunities

AU - Mittag, Florian

AU - Büchel, Finja

AU - Saad, Mohamad

AU - Jahn, Andreas

AU - Schulte, Claudia

AU - Bochdanovits, Zoltan

AU - Simón-Sánchez, Javier

AU - Nalls, Mike A.

AU - Keller, Margaux

AU - Hernandez, Dena G.

AU - Gibbs, J. Raphael

AU - Lesage, Suzanne

AU - Brice, Alexis

AU - Heutink, Peter

AU - Martinez, Maria

AU - Wood, Nicholas W.

AU - Hardy, John

AU - Singleton, Andrew B.

AU - Zell, Andreas

AU - Gasser, Thomas

AU - Sharma, Manu

PY - 2012/12/1

Y1 - 2012/12/1

AB - The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has fueled the expectations whether the individual risk can also be quantified based on the genetic architecture. So far, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) showed little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D), to show that apart from magnitude of effect size of risk variants, heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1-5%) as well as rare variants (frequency <1%) in disease etiology of complex diseases. Using a cross-validation model, we were able to achieve predictions with an area under the receiver operating characteristic curve (AUC) of ∼0.88 for T1D, highlighting the strong heritable component (∼90%). This is in contrast to PD, where we were unable to achieve a satisfactory prediction (AUC ∼0.56; heritability ∼38%). Our simulations showed that simultaneous inclusion of uncommon and rare variants in GWAS would eventually lead to feasible disease risk prediction for complex diseases such as PD. The used software is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/.

KW - Disease risk prediction

KW - Genome-wide association studies

KW - Machine learning

KW - Parkinson disease

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=84869085776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84869085776&partnerID=8YFLogxK

U2 - 10.1002/humu.22161

DO - 10.1002/humu.22161

M3 - Article

C2 - 22777693

AN - SCOPUS:84869085776

VL - 33

SP - 1708

EP - 1718

JO - Human Mutation

JF - Human Mutation

SN - 1059-7794

IS - 12

ER -