Bayesian independent component analysis recovers pathway signatures from blood metabolomics data

Jan Krumsiek, Karsten Suhre, Thomas Illig, Jerzy Adamski, Fabian J. Theis

Research output: Contribution to journal (Article)

17 Citations (Scopus)

Abstract

Interpreting the complex interplay of metabolites in heterogeneous biosamples remains a challenging task. In this study, we propose independent component analysis (ICA) as a multivariate analysis tool for the interpretation of large-scale metabolomics data. In particular, we employ a Bayesian ICA method based on a mean-field approach, which allows us to statistically infer the number of independent components to be reconstructed. The advantage of ICA over correlation-based methods like principal component analysis (PCA) is the utilization of higher-order statistical dependencies, which not only yield additional information but also allow a more meaningful representation of the data with fewer components. We performed the described ICA approach on a large-scale metabolomics data set of human serum samples, comprising a total of 1764 study probands with 218 measured metabolites. Inspecting the source matrix of statistically independent metabolite profiles using a weighted enrichment algorithm, we observe strong enrichment of specific metabolic pathways in all components. This includes signatures from amino acid metabolism, energy-related processes, carbohydrate metabolism, and lipid metabolism. Our results imply that the human blood metabolome is composed of a distinct set of overlaying, statistically independent signals. ICA furthermore produces a mixing matrix, describing the strength of each independent component for each of the study probands. Correlating these values with plasma high-density lipoprotein (HDL) levels, we establish a novel association between HDL plasma levels and the branched-chain amino acid pathway. We conclude that the Bayesian ICA methodology has the power and flexibility to replace many of the PCA and clustering-based analyses now common in the field.
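The decomposition described in the abstract can be illustrated with a small sketch. It is not the authors' Bayesian mean-field ICA: it swaps in plain FastICA (tanh contrast, symmetric decorrelation) with a fixed number of components, whereas the Bayesian method infers that number. The data are synthetic, and every name here (`fastica`, `S_true`, `A_true`, `hdl`, the noise levels, the 300 simulated probands) is an assumption made for illustration. The sketch factors a probands-by-metabolites matrix into a mixing matrix `A` and a source matrix `S`, then correlates each column of `A` with a simulated HDL-like phenotype.

```python
import numpy as np

def fastica(X, k, n_iter=200, seed=0):
    """Factor X (probands x metabolites) as Xc ~= A @ S with k components.

    Illustrative FastICA only; NOT the Bayesian mean-field ICA of the paper.
    Returns A (probands x k), S (k x metabolites) and the centered data Xc.
    """
    rng = np.random.default_rng(seed)
    n_metab = X.shape[1]
    # Center each proband's profile across metabolites (an assumption of
    # this sketch; real data would need its own preprocessing).
    Xc = X - X.mean(axis=1, keepdims=True)
    # PCA whitening via SVD: keep the top-k right singular vectors.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Vt[:k] * np.sqrt(n_metab)           # Z @ Z.T / n_metab == identity
    # Random orthogonal starting rotation.
    W = np.linalg.qr(rng.standard_normal((k, k)))[0]
    for _ in range(n_iter):
        Y = W @ Z
        G = np.tanh(Y)                      # non-Gaussianity contrast
        W = (G @ Z.T) / n_metab - np.diag((1.0 - G**2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W)         # symmetric decorrelation
        W = u @ vt
    S = W @ Z                               # independent metabolite profiles
    A = (U[:, :k] * d[:k]) @ W.T / np.sqrt(n_metab)  # per-proband strengths
    return A, S, Xc

# Synthetic stand-in for the cohort (hypothetical sizes and distributions).
rng = np.random.default_rng(42)
n_probands, n_metabolites, k = 300, 218, 3
S_true = rng.laplace(size=(k, n_metabolites))      # non-Gaussian sources
A_true = rng.standard_normal((n_probands, k))      # mixing strengths
X = A_true @ S_true + 0.1 * rng.standard_normal((n_probands, n_metabolites))

A, S, Xc = fastica(X, k)
rel_err = np.linalg.norm(Xc - A @ S) / np.linalg.norm(Xc)

# Absolute correlation between each true source and each recovered source.
match = np.abs(np.corrcoef(S_true, S)[:k, k:])

# Simulated phenotype driven by one true component, then correlated with
# every column of the recovered mixing matrix.
hdl = 0.8 * A_true[:, 1] + 0.6 * rng.standard_normal(n_probands)
hdl_corr = np.array([np.corrcoef(A[:, j], hdl)[0, 1] for j in range(k)])

print("relative reconstruction error:", rel_err)
print("best match per true source:", match.max(axis=1))
print("correlation of each component with simulated HDL:", hdl_corr)
```

ICA recovers components only up to sign, order, and scale, so the comparison uses absolute correlations and takes the best match per source. PCA would stop after the whitening step; the rotation `W` is where higher-order statistics enter.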

Original language: English
Pages (from-to): 4120-4131
Number of pages: 12
Journal: Journal of Proteome Research
Volume: 11
Issue number: 8
DOIs: 10.1021/pr300231n
Publication status: Published - 3 Aug 2012
Externally published: Yes

Fingerprint

Metabolomics
Independent component analysis
HDL Lipoproteins
Principal Component Analysis
Blood
Branched Chain Amino Acids
Metabolome
Carbohydrate Metabolism
Metabolites
Metabolic Networks and Pathways
Lipid Metabolism
Energy Metabolism
Cluster Analysis
Multivariate Analysis
Amino Acids
Serum
Research
Plasmas
Datasets

Keywords

  • Bayesian
  • bioinformatics
  • blood serum
  • independent component analysis
  • metabolomics
  • population cohorts
  • systems biology

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry(all)

Cite this

Bayesian independent component analysis recovers pathway signatures from blood metabolomics data. / Krumsiek, Jan; Suhre, Karsten; Illig, Thomas; Adamski, Jerzy; Theis, Fabian J.

In: Journal of Proteome Research, Vol. 11, No. 8, 03.08.2012, p. 4120-4131.

@article{b6ddf1d274744dfdbee28f9e6d8b153c,
title = "Bayesian independent component analysis recovers pathway signatures from blood metabolomics data",
keywords = "Bayesian, bioinformatics, blood serum, independent component analysis, metabolomics, population cohorts, systems biology",
author = "Jan Krumsiek and Karsten Suhre and Thomas Illig and Jerzy Adamski and Theis, {Fabian J.}",
year = "2012",
month = "8",
day = "3",
doi = "10.1021/pr300231n",
language = "English",
volume = "11",
pages = "4120--4131",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "8",

}

TY - JOUR

T1 - Bayesian independent component analysis recovers pathway signatures from blood metabolomics data

AU - Krumsiek, Jan

AU - Suhre, Karsten

AU - Illig, Thomas

AU - Adamski, Jerzy

AU - Theis, Fabian J.

PY - 2012/8/3

Y1 - 2012/8/3

KW - Bayesian

KW - bioinformatics

KW - blood serum

KW - independent component analysis

KW - metabolomics

KW - population cohorts

KW - systems biology

UR - http://www.scopus.com/inward/record.url?scp=84864595849&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864595849&partnerID=8YFLogxK

U2 - 10.1021/pr300231n

DO - 10.1021/pr300231n

M3 - Article

VL - 11

SP - 4120

EP - 4131

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 8

ER -