pulver

An R package for parallel ultra-rapid p-value computation for linear regression interaction terms

Sophie Molnos, Clemens Baumbach, Simone Wahl, Martina Müller-Nurasyid, Konstantin Strauch, Rui Wang-Sattler, Melanie Waldenberger, Thomas Meitinger, Jerzy Adamski, Gabi Kastenmüller, Karsten Suhre, Annette Peters, Harald Grallert, Fabian J. Theis, Christian Gieger

Research output: Contribution to journal (Article)

Abstract

Background: Genome-wide association studies help us understand the genetics of complex diseases. Human metabolism provides information about disease-causing mechanisms, so associations between genetic variants and metabolite levels are commonly investigated. However, considering only genetic variants and their effects on a single trait ignores the possible interplay between different "omics" layers. Existing tools consider only single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigation of interactions between pairs of arbitrary quantitative variables.

Results: We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This speed is achieved by using the correlation coefficient to test the null hypothesis, which avoids the costly computation of matrix inversions. Two further optimizations are a rearranged iteration order over the different "omics" layers and an implementation of the algorithm in C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels.

Conclusions: The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++ and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/.
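The correlation-based test mentioned in the Results can be illustrated with a minimal Python sketch. This is not pulver's code or API. It only shows the general statistical idea under stated assumptions: for the model y ~ 1 + x + z + x*z, the t-statistic of the interaction coefficient can be obtained from the partial correlation between y and x*z after both have been orthogonalized against the intercept, x, and z, so no matrix needs to be inverted. The function name `interaction_t` and the simulated data are hypothetical and chosen for illustration.

```python
import math
import random


def _center(v):
    """Subtract the mean, which removes the intercept's contribution."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _residualize(v, basis):
    """Remove from v its projection onto each mutually orthogonal basis vector.

    Assumes no basis vector is all zeros (e.g. x must not be constant).
    """
    out = list(v)
    for b in basis:
        coef = _dot(out, b) / _dot(b, b)
        out = [o - coef * q for o, q in zip(out, b)]
    return out


def interaction_t(y, x, z):
    """Return (r, t, df) for the x*z term in y ~ 1 + x + z + x*z.

    r is the partial correlation of y and x*z given 1, x, z.
    Assumes |r| < 1, i.e. the model does not fit the data perfectly.
    """
    n = len(y)
    xc = _center(x)
    zc = _residualize(_center(z), [xc])   # make z orthogonal to x
    basis = [xc, zc]
    ry = _residualize(_center(y), basis)
    rxz = _residualize(_center([a * b for a, b in zip(x, z)]), basis)
    r = _dot(ry, rxz) / math.sqrt(_dot(ry, ry) * _dot(rxz, rxz))
    df = n - 4                            # four fitted coefficients
    t = r * math.sqrt(df / (1.0 - r * r))
    return r, t, df


# Simulated example data (hypothetical, with a true interaction effect of 0.5).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
z = [rng.gauss(0, 1) for _ in range(200)]
y = [1 + 2 * a - b + 0.5 * a * b + rng.gauss(0, 0.05) for a, b in zip(x, z)]

r, t, df = interaction_t(y, x, z)
print(r, t, df)
```

The two-sided p-value follows from Student's t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. Because only dot products are involved, a screen over many variable pairs can reuse the orthogonalized vectors instead of refitting each model.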

Original language: English
Article number: 429
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
DOI: https://doi.org/10.1186/s12859-017-1838-y
Publication status: Published - 29 Sep 2017

Keywords

  • Algorithm
  • Linear regression interaction term
  • SNP-CpG interaction
  • Software

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Molnos, S., Baumbach, C., Wahl, S., Müller-Nurasyid, M., Strauch, K., Wang-Sattler, R., ... Gieger, C. (2017). pulver: An R package for parallel ultra-rapid p-value computation for linear regression interaction terms. BMC Bioinformatics, 18(1), [429]. https://doi.org/10.1186/s12859-017-1838-y

@article{0ece76ff973249d6a307b7d989e2ca18,
title = "pulver: An R package for parallel ultra-rapid p-value computation for linear regression interaction terms",
keywords = "Algorithm, Linear regression interaction term, SNP-CpG interaction, Software",
author = "Sophie Molnos and Clemens Baumbach and Simone Wahl and Martina M{\"u}ller-Nurasyid and Konstantin Strauch and Rui Wang-Sattler and Melanie Waldenberger and Thomas Meitinger and Jerzy Adamski and Gabi Kastenm{\"u}ller and Karsten Suhre and Annette Peters and Harald Grallert and Theis, {Fabian J.} and Christian Gieger",
year = "2017",
month = "9",
day = "29",
doi = "10.1186/s12859-017-1838-y",
language = "English",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}
