Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm

Mustafa Alshawaqfeh, Ahmad Bashaireh, Erchin Serpedin, Jan Suchodolski

Research output: Contribution to journal (Article)


Abstract

Background: Biomarker detection is a major means of translating biological data into clinical applications. Owing to recent advances in high-throughput sequencing technologies, a growing number of metagenomic studies have proposed dysbiosis in microbial communities as a potential biomarker for certain diseases. The reproducibility of results drawn from metagenomic data is crucial for clinical applications and for avoiding incorrect biological conclusions. Variability in the sample size and in the subjects participating in the experiments induces diversity, which may drastically change the outcome of biomarker detection algorithms. Therefore, a robust biomarker detection algorithm is needed that yields consistent results irrespective of the natural diversity present in the samples.
Results: To this end, this paper proposes a novel Regularized Low Rank-Sparse Decomposition (RegLRSD) algorithm. RegLRSD models the bacterial abundance data as the superposition of a sparse matrix and a low-rank matrix, which account for the differentially and non-differentially abundant microbes, respectively. The biomarker detection problem is thereby cast as a matrix decomposition problem. To yield more consistent and solid biological conclusions, RegLRSD incorporates the prior knowledge that irrelevant microbes do not exhibit significant variation between samples belonging to different phenotypes. Moreover, an efficient algorithm to extract the sparse matrix is proposed. RegLRSD is compared comprehensively with state-of-the-art algorithms on three realistic datasets. The results show that RegLRSD consistently outperforms the other algorithms in reproducibility and provides a marker list with high classification accuracy.
Conclusions: The proposed RegLRSD algorithm for biomarker detection provides high reproducibility and classification accuracy regardless of the dataset complexity and the number of selected biomarkers. This makes RegLRSD a reliable and powerful tool for identifying potential metagenomic biomarkers.
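
As a hedged illustration of the decomposition described in the abstract, the standard low-rank plus sparse model (robust PCA, also called principal component pursuit) can be written as below. The symbols are assumptions for illustration: X is an m × n abundance matrix (rows = microbes, columns = samples), L is the low-rank part, S is the sparse part, and λ > 0 is a trade-off weight. RegLRSD adds a further regularization term encoding the prior that non-differential microbes vary little across phenotypes; that term is defined in the paper and is not reproduced here.

```latex
X = L + S, \qquad X, L, S \in \mathbb{R}^{m \times n}

\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad X = L + S,
\qquad \|L\|_{*} = \sum_{k} \sigma_k(L), \quad \|S\|_{1} = \sum_{i,j} |S_{ij}|
```

The nuclear norm favors a low-rank L (shared, non-differential structure), while the entrywise ℓ1 norm favors a sparse S. One simple way to turn S into a candidate marker list would be to rank microbes by the row norm \|S_{i,:}\|_2; this ranking rule is an illustrative assumption, not necessarily the paper's selection criterion.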

Original language: English
Article number: 328
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
DOI: 10.1186/s12859-017-1738-1
Publication status: Published - 10 Jul 2017

Keywords

  • Alternating direction method of multipliers
  • Augmented Lagrangian
  • Biomarker detection
  • Matrix decomposition
  • Metagenomics
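
The keywords name the alternating direction method of multipliers (ADMM) and the augmented Lagrangian. The sketch below is a minimal, hedged illustration of how those tools solve the standard low-rank plus sparse problem, min ||L||_* + λ||S||_1 subject to X = L + S; it is not RegLRSD itself, because it omits the paper's phenotype-based regularizer. Each ADMM iteration minimizes the augmented Lagrangian over L (singular value thresholding), then over S (entrywise soft-thresholding), then takes a dual ascent step. Assumptions: NumPy is available, rows are microbes and columns are samples, λ defaults to 1/√max(m, n), and μ is a fixed heuristic penalty; the function names `rpca_admm` and `rank_candidate_markers` are hypothetical.

```python
import numpy as np


def soft_threshold(a: np.ndarray, tau: float) -> np.ndarray:
    """Entrywise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)


def singular_value_threshold(a: np.ndarray, tau: float) -> np.ndarray:
    """Shrink singular values by tau: the proximal operator of tau * ||.||_* (nuclear norm)."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def rpca_admm(x, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Hypothetical helper: standard PCP, min ||L||_* + lam*||S||_1 s.t. X = L + S, via ADMM.

    This is NOT RegLRSD: the phenotype-based regularizer of the paper is omitted.
    """
    x = np.asarray(x, dtype=float)
    m, n = x.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default weight in the robust PCA literature
    if mu is None:
        # Heuristic fixed penalty parameter (assumption); eps guards against an all-zero X.
        mu = m * n / (4.0 * max(np.abs(x).sum(), np.finfo(float).eps))
    low_rank = np.zeros_like(x)
    sparse = np.zeros_like(x)
    dual = np.zeros_like(x)  # Lagrange multiplier Y for the constraint X = L + S
    norm_x = np.linalg.norm(x)
    for _ in range(max_iter):
        # L-step: minimize the augmented Lagrangian over L (singular value thresholding).
        low_rank = singular_value_threshold(x - sparse + dual / mu, 1.0 / mu)
        # S-step: minimize over S (entrywise soft-thresholding).
        sparse = soft_threshold(x - low_rank + dual / mu, lam / mu)
        # Dual ascent on the constraint residual.
        residual = x - low_rank - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * norm_x:
            break
    return low_rank, sparse


def rank_candidate_markers(sparse: np.ndarray):
    """Rank rows (assumed to be microbes) by the l2 norm of their sparse component, largest first."""
    scores = np.linalg.norm(sparse, axis=1)
    return np.argsort(scores)[::-1], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in, not real abundance data: 60 "microbes" x 60 "samples"
    # with a rank-2 shared background.
    abundance = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
    abundance[5, :10] += 8.0     # planted differential signal in row 5
    abundance[11, 30:40] += 8.0  # and in row 11
    _, sparse_part = rpca_admm(abundance)
    order, _ = rank_candidate_markers(sparse_part)
    print("top candidate rows:", order[:5].tolist())
```

The fixed μ keeps the sketch short; practical solvers often increase μ over iterations to speed convergence, and real metagenomic abundances would normally be normalized or transformed before any such decomposition.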

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm. / Alshawaqfeh, Mustafa; Bashaireh, Ahmad; Serpedin, Erchin; Suchodolski, Jan.

In: BMC Bioinformatics, Vol. 18, No. 1, 328, 10.07.2017.


Alshawaqfeh, Mustafa ; Bashaireh, Ahmad ; Serpedin, Erchin ; Suchodolski, Jan. / Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm. In: BMC Bioinformatics. 2017 ; Vol. 18, No. 1.
@article{28ecd8e4fa88400ca0a04a3a8b161131,
title = "Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm",
keywords = "Alternating direction method of multipliers, Augmented Lagrangian, Biomarker detection, Matrix decomposition, Metagenomics",
author = "Mustafa Alshawaqfeh and Ahmad Bashaireh and Erchin Serpedin and Jan Suchodolski",
year = "2017",
month = "7",
day = "10",
doi = "10.1186/s12859-017-1738-1",
language = "English",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Reliable Biomarker discovery from Metagenomic data via RegLRSD algorithm

AU - Alshawaqfeh, Mustafa

AU - Bashaireh, Ahmad

AU - Serpedin, Erchin

AU - Suchodolski, Jan

PY - 2017/7/10

Y1 - 2017/7/10

KW - Alternating direction method of multipliers

KW - Augmented Lagrangian

KW - Biomarker detection

KW - Matrix decomposition

KW - Metagenomics

UR - http://www.scopus.com/inward/record.url?scp=85021905437&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021905437&partnerID=8YFLogxK

U2 - 10.1186/s12859-017-1738-1

DO - 10.1186/s12859-017-1738-1

M3 - Article

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 328

ER -