RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes

Raghvendra Mall, Luigi Cerulo, Luciano Garofano, Veronique Frattini, Khalid Kunji, Halima Bensmail, Thais S. Sabedot, Houtan Noushmehr, Anna Lasorella, Antonio Iavarone, Michele Ceccarelli

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using machine learning (ML) techniques are often dense, whereas real GRNs are rather sparse. We use a Tikhonov-regularization-inspired optimal L-curve criterion that utilizes the edge weight distribution for a given target gene to determine the optimal set of transcription factors (TFs) associated with it. Our proposed framework allows the incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random forests (GENIE), resulting in regularized feature-selection-based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.
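The thresholding idea in the abstract (use the distribution of edge weights for one target gene to decide which TFs to keep) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_tfs` and the example TF labels and weights are hypothetical, and the corner of the ranked weight curve is located with a simple "largest gap below the chord" heuristic that stands in for the paper's Tikhonov-inspired L-curve criterion.

```python
def select_tfs(weights):
    """Keep the TFs ranked before the corner of the sorted edge-weight curve.

    weights: dict mapping TF name -> non-negative edge weight for ONE target
    gene (for example, feature importances from a boosted-tree model).
    Hypothetical helper; the corner rule is an assumed stand-in heuristic.
    """
    # Rank TFs from strongest to weakest edge weight.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    if n < 3:
        return [tf for tf, _ in ranked]  # too few points to define a corner

    w_max, w_min = ranked[0][1], ranked[-1][1]
    span = w_max - w_min
    if span == 0:
        return [tf for tf, _ in ranked]  # flat curve: nothing to prune

    # Normalise rank and weight to [0, 1]; the chord then runs from (0, 1)
    # to (1, 0). The corner is the point lying furthest BELOW that chord.
    best_idx, best_gap = n, 0.0  # default: keep everything
    for i, (_, w) in enumerate(ranked):
        x = i / (n - 1)
        y = (w - w_min) / span
        gap = 1.0 - x - y  # proportional to signed distance below the chord
        if gap > best_gap:
            best_idx, best_gap = i, gap

    # TFs ranked before the corner are retained; the tail is discarded.
    return [tf for tf, _ in ranked[:best_idx]]


if __name__ == "__main__":
    # Hypothetical edge weights for a single target gene.
    example = {"TF_A": 0.90, "TF_B": 0.80, "TF_C": 0.05,
               "TF_D": 0.03, "TF_E": 0.02, "TF_F": 0.01}
    print(select_tfs(example))
```

With two dominant weights followed by a long low tail, the sketch retains only the two dominant TFs, which is the kind of sparsification the abstract describes. When no point falls below the chord (for instance, a plateau followed by a single drop), it deliberately keeps every TF rather than guessing at a cutoff.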

Original language: English
Pages (from-to): e39
Journal: Nucleic Acids Research
Volume: 46
Issue number: 7
DOI: 10.1093/nar/gky015
Publication status: Published - 20 Apr 2018

Fingerprint

Gene Regulatory Networks
Regulator Genes
Glioblastoma
Transcriptome
Brain Neoplasms
Glioma
Transcription Factors
Weights and Measures
Genes
Machine Learning
Forests

ASJC Scopus subject areas

  • Genetics

Cite this

RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes. / Mall, Raghvendra; Cerulo, Luigi; Garofano, Luciano; Frattini, Veronique; Kunji, Khalid; Bensmail, Halima; Sabedot, Thais S.; Noushmehr, Houtan; Lasorella, Anna; Iavarone, Antonio; Ceccarelli, Michele.

In: Nucleic Acids Research, Vol. 46, No. 7, 20.04.2018, p. e39.

@article{f70657e1659b49b8a0d6483ebdca9392,
title = "RGBM: regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes",
author = "Raghvendra Mall and Luigi Cerulo and Luciano Garofano and Veronique Frattini and Khalid Kunji and Halima Bensmail and Sabedot, {Thais S.} and Houtan Noushmehr and Anna Lasorella and Antonio Iavarone and Michele Ceccarelli",
year = "2018",
month = "4",
day = "20",
doi = "10.1093/nar/gky015",
language = "English",
volume = "46",
pages = "e39",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "7",

}

TY - JOUR

T1 - RGBM

T2 - regularized gradient boosting machines for identification of the transcriptional regulators of discrete glioma subtypes

AU - Mall, Raghvendra

AU - Cerulo, Luigi

AU - Garofano, Luciano

AU - Frattini, Veronique

AU - Kunji, Khalid

AU - Bensmail, Halima

AU - Sabedot, Thais S.

AU - Noushmehr, Houtan

AU - Lasorella, Anna

AU - Iavarone, Antonio

AU - Ceccarelli, Michele

PY - 2018/4/20

Y1 - 2018/4/20

UR - http://www.scopus.com/inward/record.url?scp=85045385400&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045385400&partnerID=8YFLogxK

U2 - 10.1093/nar/gky015

DO - 10.1093/nar/gky015

M3 - Article

C2 - 29361062

AN - SCOPUS:85045385400

VL - 46

SP - e39

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 7

ER -