Improving biomarker list stability by integration of biological knowledge in the learning process

Tiziana Sanavia, Fabio Aiolli, Giovanni Martino, Andrea Bisognin, Barbara Di Camillo

Research output: Contribution to journal, Article

10 Citations (Scopus)

Abstract

Background: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list stability is to integrate biological information from genomic databases in the learning process; however, a comprehensive assessment based on different types of biological information is still lacking in the literature. In this work we have compared the effect of using different types of biological information in the learning process, such as functional annotations, protein-protein interactions and expression correlation among genes.

Results: Biological knowledge has been codified by means of gene similarity matrices, and expression data have been linearly transformed in such a way that the more similar two features are, the more closely they are mapped. Two semantic similarity matrices, based on Biological Process and Molecular Function Gene Ontology annotation, and geodesic distance applied on protein-protein interaction networks, are the best performers in improving list stability while maintaining almost equal prediction accuracy.

Conclusions: The performed analysis supports the idea that when some features are strongly correlated to each other, for example because they are close in the protein-protein interaction network, then they might have similar importance and be equally relevant for the task at hand. The obtained results can be a starting point for additional experiments on combining similarity matrices in order to obtain even more stable lists of biomarkers. The implementation of the classification algorithm is available at the link: http://www.math.unipd.it/~dasan/biomarkers.html.
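The linear transformation mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' implementation (that is available at the link above); it assumes a symmetric, positive semidefinite gene similarity matrix S with unit diagonal and uses its symmetric square root P as the feature map, so that P^T P = S. The function names `feature_map` and `transform_expression` and the toy matrix are hypothetical. Under these assumptions, the squared distance between the images of genes i and j is 2 - 2*S[i, j], so more similar genes are mapped closer together.

```python
import numpy as np


def feature_map(S):
    """Return a matrix P with P.T @ P == S for a symmetric PSD matrix S.

    Assumption of this sketch: P is the symmetric square root of S.
    """
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def transform_expression(X, S):
    """Map a samples-by-genes matrix X so its linear kernel equals X @ S @ X.T."""
    return X @ feature_map(S)


# Toy similarity matrix (hypothetical): genes 0 and 1 are similar, gene 2 is not.
S = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
P = feature_map(S)

# Column i of P is the image of gene i under the feature map.
d01 = np.linalg.norm(P[:, 0] - P[:, 1])  # similar pair
d02 = np.linalg.norm(P[:, 0] - P[:, 2])  # dissimilar pair

# Toy expression data: 4 samples, 3 genes (random values, illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
X_mapped = transform_expression(X, S)
```

Any linear classifier trained on `X_mapped` then works with the kernel X S X^T, which is one way prior knowledge encoded in S could enter the learning process.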

Original language: English
Article number: S22
Journal: BMC Bioinformatics
Volume: 13
Issue number: SUPPL.4
DOIs: 10.1186/1471-2105-13-S4-S22
Publication status: Published - 28 Mar 2012
Externally published: Yes

Fingerprint

Biomarkers
Learning Process
Protein-protein Interaction
Learning
Proteins
Protein Interaction Maps
Molecular Sequence Annotation
Protein Interaction Networks
Genes
Gene
Annotation
Biological Phenomena
Geodesic Distance
Gene Ontology
Semantic Similarity
Neoplasm Genes
Classification Algorithm
Microarray Data
Semantics
Inconsistency

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Improving biomarker list stability by integration of biological knowledge in the learning process. / Sanavia, Tiziana; Aiolli, Fabio; Martino, Giovanni; Bisognin, Andrea; Di Camillo, Barbara.

In: BMC Bioinformatics, Vol. 13, No. SUPPL.4, S22, 28.03.2012.

@article{3d7a41ea67da4036b3085e760a2412c0,
title = "Improving biomarker list stability by integration of biological knowledge in the learning process",
abstract = "Background: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list stability is to integrate biological information from genomic databases in the learning process; however, a comprehensive assessment based on different types of biological information is still lacking in the literature. In this work we have compared the effect of using different types of biological information in the learning process, such as functional annotations, protein-protein interactions and expression correlation among genes. Results: Biological knowledge has been codified by means of gene similarity matrices, and expression data have been linearly transformed in such a way that the more similar two features are, the more closely they are mapped. Two semantic similarity matrices, based on Biological Process and Molecular Function Gene Ontology annotation, and geodesic distance applied on protein-protein interaction networks, are the best performers in improving list stability while maintaining almost equal prediction accuracy. Conclusions: The performed analysis supports the idea that when some features are strongly correlated to each other, for example because they are close in the protein-protein interaction network, then they might have similar importance and be equally relevant for the task at hand. The obtained results can be a starting point for additional experiments on combining similarity matrices in order to obtain even more stable lists of biomarkers. The implementation of the classification algorithm is available at the link: http://www.math.unipd.it/~dasan/biomarkers.html.",
author = "Tiziana Sanavia and Fabio Aiolli and Giovanni Martino and Andrea Bisognin and {Di Camillo}, Barbara",
year = "2012",
month = "3",
day = "28",
doi = "10.1186/1471-2105-13-S4-S22",
language = "English",
volume = "13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL.4",

}

TY - JOUR

T1 - Improving biomarker list stability by integration of biological knowledge in the learning process

AU - Sanavia, Tiziana

AU - Aiolli, Fabio

AU - Martino, Giovanni

AU - Bisognin, Andrea

AU - Di Camillo, Barbara

PY - 2012/3/28

Y1 - 2012/3/28

AB - Background: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list stability is to integrate biological information from genomic databases in the learning process; however, a comprehensive assessment based on different types of biological information is still lacking in the literature. In this work we have compared the effect of using different types of biological information in the learning process, such as functional annotations, protein-protein interactions and expression correlation among genes. Results: Biological knowledge has been codified by means of gene similarity matrices, and expression data have been linearly transformed in such a way that the more similar two features are, the more closely they are mapped. Two semantic similarity matrices, based on Biological Process and Molecular Function Gene Ontology annotation, and geodesic distance applied on protein-protein interaction networks, are the best performers in improving list stability while maintaining almost equal prediction accuracy. Conclusions: The performed analysis supports the idea that when some features are strongly correlated to each other, for example because they are close in the protein-protein interaction network, then they might have similar importance and be equally relevant for the task at hand. The obtained results can be a starting point for additional experiments on combining similarity matrices in order to obtain even more stable lists of biomarkers. The implementation of the classification algorithm is available at the link: http://www.math.unipd.it/~dasan/biomarkers.html.

UR - http://www.scopus.com/inward/record.url?scp=84865119436&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865119436&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-S4-S22

DO - 10.1186/1471-2105-13-S4-S22

M3 - Article

C2 - 22536969

AN - SCOPUS:84865119436

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.4

M1 - S22

ER -