On the potential of models for location and scale for genome-wide DNA methylation data

Simone Wahl, Nora Fenske, Sonja Zeilinger, Karsten Suhre, Christian Gieger, Melanie Waldenberger, Harald Grallert, Matthias Schmid

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Background: With the help of epigenome-wide association studies (EWAS), increasing knowledge on the role of epigenetic mechanisms such as DNA methylation in disease processes is obtained. In addition, EWAS aid the understanding of behavioral and environmental effects on DNA methylation. In terms of statistical analysis, specific challenges arise from the characteristics of methylation data. First, methylation β-values represent proportions with skewed and heteroscedastic distributions. Thus, traditional modeling strategies assuming a normally distributed response might not be appropriate. Second, recent evidence suggests that not only mean differences but also variability in site-specific DNA methylation associates with diseases, including cancer. The purpose of this study was to compare different modeling strategies for methylation data in terms of model performance and performance of downstream hypothesis tests. Specifically, we used the generalized additive models for location, scale and shape (GAMLSS) framework to compare beta regression with Gaussian regression on raw, binary logit and arcsine square root transformed methylation data, with and without modeling a covariate effect on the scale parameter.

Results: Using simulated and real data from a large population-based study and an independent sample of cancer patients and healthy controls, we show that beta regression does not outperform competing strategies in terms of model performance. In addition, Gaussian models for location and scale showed an improved performance as compared to models for location only. The best performance was observed for the Gaussian model on binary logit transformed β-values, referred to as M-values. Our results further suggest that models for location and scale are specifically sensitive towards violations of the distribution assumption and towards outliers in the methylation data. Therefore, a resampling procedure is proposed as a mode of inference and shown to diminish type I error rate in practically relevant settings. We apply the proposed method in an EWAS of BMI and age and reveal strong associations of age with methylation variability that are validated in an independent sample.

Conclusions: Models for location and scale are promising tools for EWAS that may help to understand the influence of environmental factors and disease-related phenotypes on methylation variability and its role during disease development.
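
The transformations and the location-and-scale idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' GAMLSS analysis. It assumes a single binary covariate (for example case versus control), in which case the maximum-likelihood fit of a Gaussian model with mean b0 + b1*x and log standard deviation g0 + g1*x reduces to group means and group standard deviations. The function names and the toy β-values are hypothetical and chosen for illustration only.

```python
import math
from statistics import fmean


def beta_to_m(beta):
    """Binary logit transform of a beta-value: M = log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def beta_to_arcsine_sqrt(beta):
    """Arcsine square root transform of a beta-value: asin(sqrt(beta))."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    return math.asin(math.sqrt(beta))


def fit_location_scale(y, x):
    """Gaussian location-and-scale fit for a binary covariate x (0 or 1).

    Model (an assumption of this sketch):
        y ~ Normal(mu, sigma^2), mu = b0 + b1*x, log(sigma) = g0 + g1*x.
    With one binary covariate both predictors are saturated, so the
    maximum-likelihood estimates are the group means and the group
    standard deviations (divisor n, not n - 1).
    Each group needs at least two distinct values, otherwise log(0) fails.
    """
    groups = {0: [], 1: []}
    for yi, xi in zip(y, x):
        groups[xi].append(yi)
    means = {g: fmean(v) for g, v in groups.items()}
    sds = {
        g: math.sqrt(fmean([(yi - means[g]) ** 2 for yi in v]))
        for g, v in groups.items()
    }
    return {
        "b0": means[0],
        "b1": means[1] - means[0],                  # covariate effect on location
        "g0": math.log(sds[0]),
        "g1": math.log(sds[1]) - math.log(sds[0]),  # covariate effect on scale
    }


# Hypothetical beta-values at one CpG site: similar means, different spread.
toy_betas = [0.10, 0.12, 0.11, 0.13, 0.05, 0.20, 0.09, 0.25]
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]
toy_fit = fit_location_scale([beta_to_m(b) for b in toy_betas], toy_group)
```

In this sketch a positive `g1` means the M-values vary more in group 1 than in group 0, which is the kind of variability effect a location-only model cannot express. Testing whether `g1` differs from zero is a separate step that the sketch does not cover.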

Original language: English
Article number: 232
Journal: BMC Bioinformatics
Volume: 15
Issue number: 1
DOIs: https://doi.org/10.1186/1471-2105-15-232
Publication status: Published - 3 Jul 2014
Externally published: Yes

Fingerprint

DNA Methylation
Methylation
Genome
Genes
Logit
Regression
Gaussian Model
Performance Model
Association reactions
Cancer
Modeling
Binary
Generalized Additive Models
Test of Hypothesis
Model
Type I Error Rate
Environmental Factors
Scale Parameter
Resampling
Square root

Keywords

  • Beta regression
  • DNA methylation
  • EWAS
  • GAMLSS
  • Infinium HumanMethylation450k BeadChip
  • Model comparison
  • Model performance
  • Modeling variability
  • Models for location and scale
  • Resampling

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Wahl, S., Fenske, N., Zeilinger, S., Suhre, K., Gieger, C., Waldenberger, M., ... Schmid, M. (2014). On the potential of models for location and scale for genome-wide DNA methylation data. BMC Bioinformatics, 15(1), [232]. https://doi.org/10.1186/1471-2105-15-232

@article{4457114b2acc4e8f9745c7ac816d1328,
title = "On the potential of models for location and scale for genome-wide DNA methylation data",
keywords = "Beta regression, DNA methylation, EWAS, GAMLSS, Infinium HumanMethylation450k BeadChip, Model comparison, Model performance, Modeling variability, Models for location and scale, Resampling",
author = "Simone Wahl and Nora Fenske and Sonja Zeilinger and Karsten Suhre and Christian Gieger and Melanie Waldenberger and Harald Grallert and Matthias Schmid",
year = "2014",
month = "7",
day = "3",
doi = "10.1186/1471-2105-15-232",
language = "English",
volume = "15",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
}

TY - JOUR

T1 - On the potential of models for location and scale for genome-wide DNA methylation data

AU - Wahl, Simone

AU - Fenske, Nora

AU - Zeilinger, Sonja

AU - Suhre, Karsten

AU - Gieger, Christian

AU - Waldenberger, Melanie

AU - Grallert, Harald

AU - Schmid, Matthias

PY - 2014/7/3

Y1 - 2014/7/3

KW - Beta regression

KW - DNA methylation

KW - EWAS

KW - GAMLSS

KW - Infinium HumanMethylation450k BeadChip

KW - Model comparison

KW - Model performance

KW - Modeling variability

KW - Models for location and scale

KW - Resampling

UR - http://www.scopus.com/inward/record.url?scp=84907714293&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907714293&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-15-232

DO - 10.1186/1471-2105-15-232

M3 - Article

VL - 15

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 232

ER -