Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics

Teresa M.R. Noviello, Antonella Di Liddo, Giovanna M. Ventola, Antonietta Spagnuolo, Salvatore D'Aniello, Michele Ceccarelli, Luigi Cerulo

Research output: Contribution to journal › Article

Abstract

Background: Long non-coding RNAs (lncRNAs) represent a novel class of non-coding RNAs that play crucial roles in many biological processes. Identifying lncRNA homologs across species is essential for investigating these roles in model organisms, because homologous genes tend to retain similar molecular and biological functions. Alignment-based metrics effectively capture the conservation of transcribed coding sequences and, hence, the homology of protein-coding genes. Unlike protein-coding genes, however, long non-coding genes show poor sequence conservation, which makes identifying their homologs a challenging task.

Results: In this study we compare alignment-based and alignment-free string similarity metrics and examine promoter regions as a possible source of conserved information. We show that promoter regions encode relevant information about the conservation of long non-coding genes across species, and that this information is better captured by alignment-free metrics. We perform a genome-wide test of this hypothesis in human, mouse, and zebrafish.

Conclusions: These results led us to postulate a new hypothesis: unlike protein-coding genes, long non-coding genes tend to preserve their regulatory machinery rather than their transcribed sequence. All datasets, scripts, and prediction tools used in this study are available at https://github.com/bioinformatics-sannio/lncrna-homologs.
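To make the distinction between the two metric families concrete, the Python sketch below contrasts an alignment-based score (Smith-Waterman local alignment with a linear gap penalty) with an alignment-free one (cosine similarity of k-mer count vectors). It is only an illustration, not the set of metrics evaluated in the paper; those are in the authors' repository linked above. The function names, the scoring parameters (match=2, mismatch=-1, gap=-2) and k=3 are hypothetical choices, and the toy sequences are invented: they mimic, as an assumption, two promoter-like fragments that share the same sequence blocks in a different order.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman, linear gaps).

    Scoring parameters are illustrative assumptions, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_cosine(a, b, k=3):
    """Alignment-free similarity: cosine of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical toy fragments: the same two blocks in swapped order.
    s = "ACGTTGCA" + "TTAGGCCA"
    t = "TTAGGCCA" + "ACGTTGCA"
    # Normalise the alignment score by the self-alignment score of s.
    print("alignment-based:", smith_waterman_score(s, t) / smith_waterman_score(s, s))
    print("alignment-free :", kmer_cosine(s, t, k=3))
```

In this toy case the block swap only changes the k-mers that span the junction, so the k-mer cosine stays high, while the local alignment can recover at most one block and its normalised score drops to about one half. Whether this intuition explains the behaviour observed on real promoter regions is not something the sketch can establish.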

Original language: English
Article number: 407
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
DOI: https://doi.org/10.1186/s12859-018-2441-6
Publication status: Published - 6 Nov 2018
Externally published: Yes

Keywords

  • Homology
  • Long ncRNA
  • String similarity

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Noviello, T. M. R., Di Liddo, A., Ventola, G. M., Spagnuolo, A., D'Aniello, S., Ceccarelli, M., & Cerulo, L. (2018). Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics. BMC Bioinformatics, 19(1), [407]. https://doi.org/10.1186/s12859-018-2441-6

@article{856aa4e164d244608cdd958c6184c3a8,
title = "Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics",
keywords = "Homology, Long ncRNA, String similarity",
author = "Noviello, {Teresa M.R.} and {Di Liddo}, Antonella and Ventola, {Giovanna M.} and Antonietta Spagnuolo and Salvatore D'Aniello and Michele Ceccarelli and Luigi Cerulo",
year = "2018",
month = "11",
day = "6",
doi = "10.1186/s12859-018-2441-6",
language = "English",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
}

TY - JOUR

T1 - Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics

AU - Noviello, Teresa M.R.

AU - Di Liddo, Antonella

AU - Ventola, Giovanna M.

AU - Spagnuolo, Antonietta

AU - D'Aniello, Salvatore

AU - Ceccarelli, Michele

AU - Cerulo, Luigi

PY - 2018/11/6

Y1 - 2018/11/6

KW - Homology

KW - Long ncRNA

KW - String similarity

UR - http://www.scopus.com/inward/record.url?scp=85056133228&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056133228&partnerID=8YFLogxK

U2 - 10.1186/s12859-018-2441-6

DO - 10.1186/s12859-018-2441-6

M3 - Article

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 407

ER -