Virtual genetic coding and time series analysis for alternative splicing prediction in C. elegans

Michele Ceccarelli, Antonio Maratea

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Motivation: Prediction of alternative splicing has traditionally been based on the study of expressed sequences, helped by homology considerations and the analysis of local discriminative features. More recently, machine learning algorithms have been developed that try to avoid or reduce the use of a priori information, with partial success. Objective and method: With the aim of developing a fully automatic procedure for recognizing alternative splicing events based only on the genomic sequence, we first introduce a virtual genetic coding scheme to numerically model the information content of sequences in an effective way, then we use time series analysis to extract a fixed-length set of features from each sequence, and finally we adopt a supervised learning method, namely the support vector machine, to predict alternative splicing events. Results: On the basis of real C. elegans data, we show that it is possible within this purely numeric framework to obtain results better than the state of the art, without any explicit modeling of homology or positions in the splice site, and without any use of other local features. Conclusion: The virtual genetic coding together with time series analysis allows us to introduce an effective and powerful sequence coding scheme that may be useful in various areas of genomics and transcriptomics.
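
The feature-extraction step described in the abstract (numeric coding of a sequence, then a fixed-length feature set from time series analysis) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the virtual genetic code, so the mapping `HYPOTHETICAL_CODE` is a placeholder, and the choice of Yule-Walker autoregressive coefficients (computed with the Levinson-Durbin recursion) as features is an assumption suggested only by the "Autoregressive model" keyword. The function names `encode`, `autocovariance`, and `ar_features` are likewise illustrative.

```python
# Illustrative sketch only. The numeric mapping is a placeholder,
# NOT the paper's virtual genetic code, which the abstract does not specify.
HYPOTHETICAL_CODE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(seq):
    """Map a nucleotide string to a list of floats (placeholder mapping)."""
    return [HYPOTHETICAL_CODE[base] for base in seq.upper()]


def autocovariance(x, max_lag):
    """Biased sample autocovariance of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [
        sum(d[t] * d[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def ar_features(x, order):
    """Return `order` AR coefficients (Yule-Walker, Levinson-Durbin recursion).

    The output length depends only on `order`, not on len(x), so sequences
    of different lengths yield feature vectors of the same size.
    """
    r = autocovariance(x, order)
    a = [0.0] * (order + 1)  # a[1..order] hold the AR coefficients
    err = r[0]               # prediction error variance at the current order
    for k in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. constant) series: remaining coefficients stay 0
        acc = r[k] - sum(a[j] * r[k - j] for j in range(1, k))
        refl = acc / err     # reflection coefficient for order k
        new_a = a[:]
        new_a[k] = refl
        for j in range(1, k):
            new_a[j] = a[j] - refl * a[k - j]
        a = new_a
        err *= 1.0 - refl * refl
    return a[1:]


# Two sequences of different length give feature vectors of equal length.
features_short = ar_features(encode("ACGTTGCAACGT"), order=3)
features_long = ar_features(encode("ACGTTGCAACGT" * 5 + "GGA"), order=3)
```

In a pipeline like the one the abstract outlines, vectors such as `features_short` and `features_long` would then be passed, together with class labels, to a support vector machine classifier; that step is omitted here to keep the sketch dependency-free.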

Original language: English
Pages (from-to): 109-115
Number of pages: 7
Journal: Artificial Intelligence in Medicine
Volume: 45
Issue number: 2-3
Publication status: Published - 1 Feb 2009
Externally published: Yes

Keywords

  • Alternative splicing
  • Autoregressive model
  • Support vector machine
  • Virtual genetic code

ASJC Scopus subject areas

  • Artificial Intelligence
  • Medicine (miscellaneous)
