Biomarker discovery in inflammatory bowel diseases using network-based feature selection

Mostafa Abbas, John Matta, Thanh Le, Halima Bensmail, Tayo Obafemi-Ajayi, Vasant Honavar, Yasser EL-Manzalawy

Research output: Contribution to journal (Article)

Abstract

Reliable identification of inflammatory bowel disease (IBD) biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of IBD. We present an integrative approach to Network-Based Biomarker Discovery (NBBD) that combines network analysis methods for prioritizing potential biomarkers with machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected by a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as against a hybrid approach combining the best traditional and NBBD-based feature selection methods. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods, including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.
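The general idea of network-based feature prioritization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the five network inference tools or the nine prioritization methods, so the sketch substitutes thresholded Pearson correlation for network inference and node degree for prioritization. The function names, the taxon names, the toy abundance values, and the 0.9 threshold are all hypothetical. The Random Forest evaluation step is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_network(columns, threshold=0.5):
    """Assumed stand-in for network inference: connect two features
    when the absolute correlation of their abundances meets the threshold."""
    names = list(columns)
    adj = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def rank_by_degree(adj):
    """Assumed stand-in for prioritization: order features by node degree,
    highest first, breaking ties alphabetically."""
    return sorted(adj, key=lambda name: (-len(adj[name]), name))


def select_top_k(columns, k, threshold=0.5):
    """Return the k highest-ranked features as candidate biomarkers."""
    return rank_by_degree(build_network(columns, threshold))[:k]


# Hypothetical toy abundance table: feature name -> abundance per sample.
columns = {
    "taxon_a": [1, 2, 3, 4, 5],
    "taxon_b": [2, 4, 6, 8, 10],
    "taxon_c": [5, 4, 3, 2, 1],
    "taxon_d": [3, 1, 4, 1, 5],
}

adj = build_network(columns, threshold=0.9)
print(rank_by_degree(adj))
```

In a fuller pipeline, the selected features would then be passed to a classifier such as a Random Forest, and the selection would be repeated on training subsets of different sizes to study how the ranking behaves when few samples are available.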

Original language: English
Article number: e0225382
Journal: PLoS ONE
Volume: 14
Issue number: 11
DOI: https://doi.org/10.1371/journal.pone.0225382
Publication status: Published - 1 Jan 2019

Fingerprint

inflammatory bowel disease
Biomarkers
Inflammatory Bowel Diseases
Feature extraction
biomarkers
Metagenomics
selection methods
Pediatrics
Biopsy
artificial intelligence
early diagnosis
Learning systems
Early Diagnosis
biopsy
Classifiers
methodology

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)
  • General

Cite this

Abbas, M., Matta, J., Le, T., Bensmail, H., Obafemi-Ajayi, T., Honavar, V., & EL-Manzalawy, Y. (2019). Biomarker discovery in inflammatory bowel diseases using network-based feature selection. PLoS ONE, 14(11), Article e0225382. https://doi.org/10.1371/journal.pone.0225382

@article{0effefad2c8645d2973010de20080df3,
title = "Biomarker discovery in inflammatory bowel diseases using network-based feature selection",
author = "Mostafa Abbas and John Matta and Thanh Le and Halima Bensmail and Tayo Obafemi-Ajayi and Vasant Honavar and Yasser EL-Manzalawy",
year = "2019",
month = "1",
day = "1",
doi = "10.1371/journal.pone.0225382",
language = "English",
volume = "14",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "11",

}

TY  - JOUR
T1  - Biomarker discovery in inflammatory bowel diseases using network-based feature selection
AU  - Abbas, Mostafa
AU  - Matta, John
AU  - Le, Thanh
AU  - Bensmail, Halima
AU  - Obafemi-Ajayi, Tayo
AU  - Honavar, Vasant
AU  - EL-Manzalawy, Yasser
PY  - 2019/1/1
Y1  - 2019/1/1
UR  - http://www.scopus.com/inward/record.url?scp=85075461276&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85075461276&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0225382
DO  - 10.1371/journal.pone.0225382
M3  - Article
C2  - 31756219
AN  - SCOPUS:85075461276
VL  - 14
JO  - PLoS One
JF  - PLoS One
SN  - 1932-6203
IS  - 11
M1  - e0225382
ER  -