A negative selection heuristic to predict new transcriptional targets

Luigi Cerulo, Vincenzo Paduano, Pietro Zoppoli, Michele Ceccarelli

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Background: Supervised machine learning approaches have recently been adopted for inferring transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements with respect to the state of the art of gene regulatory network reverse-engineering methods. Unlike traditional unsupervised techniques, a supervised classifier learns from known examples a function that can recognize new relationships in new data. In the context of gene regulatory inference, a supervised classifier is forced to learn from positive and unlabeled examples, because negative examples are unavailable or hard to collect. This condition can limit the performance of the classifier, especially when the number of training examples is low.

Results: In this paper we improve the supervised identification of transcriptional targets by selecting reliable negative examples from the unlabeled set. We introduce a heuristic, based on the known topology of transcriptional networks, that restores the conventional positive/negative training condition and yields a significant improvement in classification performance. We empirically evaluate the proposed heuristic on experimental datasets of Escherichia coli and show an example application: the prediction of BCL6 direct core targets in normal germinal center human B cells, obtaining a precision of 60%.

Conclusions: The availability of only positive examples when learning transcriptional relationships negatively affects the performance of supervised classifiers. We show that selecting reliable negative examples, a practice adopted in text mining, improves the performance of such classifiers, opening new perspectives in the identification of new transcriptional targets.
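The idea of turning a positive/unlabeled problem back into a positive/negative one can be illustrated with a minimal Python sketch. The abstract does not spell out the topology rule, so the rule used here is an assumption made only for illustration, not the authors' heuristic: when a regulation b -> a is known and a -> b is not, the reverse pair a -> b is taken as a candidate negative, on the premise that two-gene feedback loops are uncommon. The function names and the toy gene labels (`tfA`, `g1`, ...) are hypothetical.

```python
def select_reliable_negatives(known_edges):
    """Pick candidate negative (regulator, target) pairs from the unlabeled set.

    ASSUMED rule, for illustration only (not the paper's heuristic):
    if regulator -> target is known and target -> regulator is not,
    treat target -> regulator as a reliable negative.
    """
    known = set(known_edges)
    return sorted(
        (target, regulator)
        for regulator, target in known
        if regulator != target            # ignore self-regulation
        and (target, regulator) not in known  # keep known feedback pairs positive
    )


def build_training_set(known_edges):
    """Label known edges 1 and selected negatives 0; other pairs stay unlabeled."""
    positives = sorted(set(known_edges))
    negatives = select_reliable_negatives(known_edges)
    return [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]


# Toy network with made-up names: one plain edge, one feedback pair, one self-loop.
edges = [("tfA", "g1"), ("tfA", "tfB"), ("tfB", "tfA"), ("tfC", "tfC")]
training = build_training_set(edges)
for pair, label in training:
    print(pair, label)
```

The labeled pairs produced this way could then be passed, together with expression-derived features, to any conventional binary classifier; which features and classifier to use is left open here.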

Original language: English
Article number: S3
Journal: BMC Bioinformatics
Volume: 14
Issue number: SUPPL.1
DOI: 10.1186/1471-2105-14-S1-S3
Publication status: Published - 14 Jan 2013
Externally published: Yes

Fingerprint

Negative Selection
Gene Regulatory Networks
Classifiers
Classifier
Heuristics
Predict
Target
Germinal Center
Data Mining
Regulator Genes
Proteomics
B-Lymphocytes
Learning
Escherichia coli
Genes
B Cells
Gene Regulatory Network
Text Mining
Supervised Learning

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

A negative selection heuristic to predict new transcriptional targets. / Cerulo, Luigi; Paduano, Vincenzo; Zoppoli, Pietro; Ceccarelli, Michele.

In: BMC Bioinformatics, Vol. 14, No. SUPPL.1, S3, 14.01.2013.

@article{a84adbc63e5a417eb3448a972a7e0156,
title = "A negative selection heuristic to predict new transcriptional targets",
abstract = "Background: Supervised machine learning approaches have recently been adopted for inferring transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements with respect to the state of the art of gene regulatory network reverse-engineering methods. Unlike traditional unsupervised techniques, a supervised classifier learns from known examples a function that can recognize new relationships in new data. In the context of gene regulatory inference, a supervised classifier is forced to learn from positive and unlabeled examples, because negative examples are unavailable or hard to collect. This condition can limit the performance of the classifier, especially when the number of training examples is low. Results: In this paper we improve the supervised identification of transcriptional targets by selecting reliable negative examples from the unlabeled set. We introduce a heuristic, based on the known topology of transcriptional networks, that restores the conventional positive/negative training condition and yields a significant improvement in classification performance. We empirically evaluate the proposed heuristic on experimental datasets of Escherichia coli and show an example application: the prediction of BCL6 direct core targets in normal germinal center human B cells, obtaining a precision of 60{\%}. Conclusions: The availability of only positive examples when learning transcriptional relationships negatively affects the performance of supervised classifiers. We show that selecting reliable negative examples, a practice adopted in text mining, improves the performance of such classifiers, opening new perspectives in the identification of new transcriptional targets.",
author = "Luigi Cerulo and Vincenzo Paduano and Pietro Zoppoli and Michele Ceccarelli",
year = "2013",
month = "1",
day = "14",
doi = "10.1186/1471-2105-14-S1-S3",
language = "English",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL.1",

}

TY - JOUR

T1 - A negative selection heuristic to predict new transcriptional targets

AU - Cerulo, Luigi

AU - Paduano, Vincenzo

AU - Zoppoli, Pietro

AU - Ceccarelli, Michele

PY - 2013/1/14

Y1 - 2013/1/14

AB - Background: Supervised machine learning approaches have recently been adopted for inferring transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements with respect to the state of the art of gene regulatory network reverse-engineering methods. Unlike traditional unsupervised techniques, a supervised classifier learns from known examples a function that can recognize new relationships in new data. In the context of gene regulatory inference, a supervised classifier is forced to learn from positive and unlabeled examples, because negative examples are unavailable or hard to collect. This condition can limit the performance of the classifier, especially when the number of training examples is low. Results: In this paper we improve the supervised identification of transcriptional targets by selecting reliable negative examples from the unlabeled set. We introduce a heuristic, based on the known topology of transcriptional networks, that restores the conventional positive/negative training condition and yields a significant improvement in classification performance. We empirically evaluate the proposed heuristic on experimental datasets of Escherichia coli and show an example application: the prediction of BCL6 direct core targets in normal germinal center human B cells, obtaining a precision of 60%. Conclusions: The availability of only positive examples when learning transcriptional relationships negatively affects the performance of supervised classifiers. We show that selecting reliable negative examples, a practice adopted in text mining, improves the performance of such classifiers, opening new perspectives in the identification of new transcriptional targets.

UR - http://www.scopus.com/inward/record.url?scp=84872392660&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872392660&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-S1-S3

DO - 10.1186/1471-2105-14-S1-S3

M3 - Article

C2 - 23368951

AN - SCOPUS:84872392660

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.1

M1 - S3

ER -