Virtual genetic coding and time series analysis for alternative splicing prediction in C. elegans

Michele Ceccarelli, Antonio Maratea

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Motivation: Prediction of alternative splicing has traditionally been based on the study of expressed sequences, aided by homology considerations and the analysis of local discriminative features. More recently, machine learning algorithms have been developed that try to avoid or reduce the use of a priori information, with partial success. Objective and method: With the aim of developing a fully automatic procedure for recognizing alternative splicing events based only on the genomic sequence, we first introduce a virtual genetic coding scheme to numerically model the information content of sequences in an effective way, then we use time series analysis to extract a fixed-length set of features from each sequence, and finally we adopt a supervised learning method, namely the support vector machine, to predict alternative splicing events. Results: On the basis of real C. elegans data, we show that it is possible within this purely numeric framework to obtain results better than the state of the art, without any explicit modeling of homology or positions in the splice site, or any use of other local features. Conclusion: The virtual genetic coding together with time series analysis allows us to introduce an effective and powerful sequence coding scheme that may be useful in various areas of genomics and transcriptomics.
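
The pipeline outlined in the abstract (numeric coding, then fixed-length time series features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nucleotide-to-number mapping `HYPOTHETICAL_CODE` is a placeholder and not the paper's virtual genetic code, the function names are invented, and the autoregressive fit via the Levinson-Durbin recursion is one common way to obtain such features, suggested only by the "Autoregressive model" keyword.

```python
# Illustrative sketch only: this mapping is a placeholder,
# NOT the virtual genetic code defined in the paper.
HYPOTHETICAL_CODE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(sequence):
    """Turn a nucleotide string into a numeric series; unknown symbols are skipped."""
    return [HYPOTHETICAL_CODE[b] for b in sequence.upper() if b in HYPOTHETICAL_CODE]


def autocovariance(series, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def ar_features(series, order):
    """Return `order` AR coefficients (Yule-Walker, solved by Levinson-Durbin)."""
    if len(series) <= order:
        raise ValueError("series must be longer than the AR order")
    r = autocovariance(series, order)
    if r[0] == 0.0:
        return [0.0] * order  # constant series: nothing to model
    coeffs = []
    error = r[0]  # prediction error variance of the order-0 model
    for k in range(1, order + 1):
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / error
        # Update the lower-order coefficients, then append the new one.
        coeffs = [coeffs[j] - reflection * coeffs[k - 2 - j] for j in range(k - 1)]
        coeffs.append(reflection)
        error *= 1.0 - reflection ** 2
    return coeffs


if __name__ == "__main__":
    # Sequences of different lengths yield feature vectors of the same length.
    for seq in ("ACGTTGCAACGGTTAC", "ACACACACGTGTGTGTAAAC"):
        print(len(seq), ar_features(encode(seq), order=3))
```

Because every sequence maps to a vector of `order` coefficients regardless of its length, such vectors could be passed to a support vector machine classifier (for example scikit-learn's `SVC`, not shown here) as the supervised learning step.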

Original language: English
Pages (from-to): 109-115
Number of pages: 7
Journal: Artificial Intelligence in Medicine
Volume: 45
Issue number: 2-3
DOI: 10.1016/j.artmed.2008.08.013
Publication status: Published - 1 Feb 2009
Externally published: Yes

Fingerprint

Time series analysis
Alternative Splicing
Supervised learning
Learning algorithms
Support vector machines
Learning systems
Sequence Homology
Genomics
Learning

Keywords

  • Alternative splicing
  • Autoregressive model
  • Support vector machine
  • Virtual genetic code

ASJC Scopus subject areas

  • Artificial Intelligence
  • Medicine (miscellaneous)

Cite this

Virtual genetic coding and time series analysis for alternative splicing prediction in C. elegans. / Ceccarelli, Michele; Maratea, Antonio.

In: Artificial Intelligence in Medicine, Vol. 45, No. 2-3, 01.02.2009, p. 109-115.

@article{f65062a239ee4164be2dc94d31fad9df,
title = "Virtual genetic coding and time series analysis for alternative splicing prediction in {C. elegans}",
abstract = "Motivation: Prediction of alternative splicing has traditionally been based on the study of expressed sequences, aided by homology considerations and the analysis of local discriminative features. More recently, machine learning algorithms have been developed that try to avoid or reduce the use of a priori information, with partial success. Objective and method: With the aim of developing a fully automatic procedure for recognizing alternative splicing events based only on the genomic sequence, we first introduce a virtual genetic coding scheme to numerically model the information content of sequences in an effective way, then we use time series analysis to extract a fixed-length set of features from each sequence, and finally we adopt a supervised learning method, namely the support vector machine, to predict alternative splicing events. Results: On the basis of real C. elegans data, we show that it is possible within this purely numeric framework to obtain results better than the state of the art, without any explicit modeling of homology or positions in the splice site, or any use of other local features. Conclusion: The virtual genetic coding together with time series analysis allows us to introduce an effective and powerful sequence coding scheme that may be useful in various areas of genomics and transcriptomics.",
keywords = "Alternative splicing, Autoregressive model, Support vector machine, Virtual genetic code",
author = "Michele Ceccarelli and Antonio Maratea",
year = "2009",
month = "2",
day = "1",
doi = "10.1016/j.artmed.2008.08.013",
language = "English",
volume = "45",
pages = "109--115",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",
number = "2-3",

}

TY - JOUR

T1 - Virtual genetic coding and time series analysis for alternative splicing prediction in C. elegans

AU - Ceccarelli, Michele

AU - Maratea, Antonio

PY - 2009/2/1

Y1 - 2009/2/1

AB - Motivation: Prediction of alternative splicing has traditionally been based on the study of expressed sequences, aided by homology considerations and the analysis of local discriminative features. More recently, machine learning algorithms have been developed that try to avoid or reduce the use of a priori information, with partial success. Objective and method: With the aim of developing a fully automatic procedure for recognizing alternative splicing events based only on the genomic sequence, we first introduce a virtual genetic coding scheme to numerically model the information content of sequences in an effective way, then we use time series analysis to extract a fixed-length set of features from each sequence, and finally we adopt a supervised learning method, namely the support vector machine, to predict alternative splicing events. Results: On the basis of real C. elegans data, we show that it is possible within this purely numeric framework to obtain results better than the state of the art, without any explicit modeling of homology or positions in the splice site, or any use of other local features. Conclusion: The virtual genetic coding together with time series analysis allows us to introduce an effective and powerful sequence coding scheme that may be useful in various areas of genomics and transcriptomics.

KW - Alternative splicing

KW - Autoregressive model

KW - Support vector machine

KW - Virtual genetic code

UR - http://www.scopus.com/inward/record.url?scp=61449229277&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=61449229277&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2008.08.013

DO - 10.1016/j.artmed.2008.08.013

M3 - Article

VL - 45

SP - 109

EP - 115

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

IS - 2-3

ER -