Minimum description length based selection of reference sequences for comparative assemblers

Bilal Wajid, Erchin Serpedin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Genome sequences are the most basic, yet most essential, pieces of data in all biological analysis. A genome sequence is the solution to the genome assembly problem, which reconstructs the entire sequence from a set of reads that are unordered and very short. The genome assembly problem is therefore quite complex, and is broadly divided into de novo and comparative assembly. Comparative assembly uses a reference sequence, closely related to the unassembled genome, to determine the relative order of the reads with respect to one another, and then joins them together to form the sequence. This paper explores all variants of Minimum Description Length (MDL) to find the best reference sequence for comparative assembly. It examines two-part MDL, Sophisticated MDL, and MiniMax Regret, and finds that Sophisticated MDL performs better than two-part MDL, whereas MiniMax Regret, owing to the nature of the problem, is unsuitable. The proposed scheme is prior-free and can be incorporated into the data preprocessing stage of any comparative assembler, allowing the assembly process to use the best reference sequence available.
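The two-part MDL idea mentioned in the abstract can be illustrated with a toy sketch: each candidate reference is scored by the bits needed to describe the reference itself plus the bits needed to describe the reads given that reference, and the candidate with the smallest total is chosen. The encoding below (2 bits per base, ungapped placement, a simple mismatch code) and all function names are assumptions made for illustration only; the record does not state the encoding used in the paper.

```python
from math import comb, log2

def read_cost_bits(read, ref):
    """Bits to encode one read given ref under a toy code (an assumption)."""
    n_pos = len(ref) - len(read) + 1
    if n_pos <= 0:
        # Read longer than the reference: encode it raw at 2 bits per base.
        return 2.0 * len(read)
    # Smallest Hamming distance over all ungapped placements of the read.
    k = min(
        sum(a != b for a, b in zip(read, ref[i:i + len(read)]))
        for i in range(n_pos)
    )
    return (
        log2(n_pos)                  # where the read is placed
        + log2(len(read) + 1)        # how many mismatches (0..len(read))
        + log2(comb(len(read), k))   # which positions mismatch
        + k * log2(3)                # replacement base at each mismatch
    )

def two_part_mdl_bits(ref, reads):
    """Total description length: L(model) + L(data | model)."""
    model_bits = 2.0 * len(ref)      # uniform code over A, C, G, T
    data_bits = sum(read_cost_bits(r, ref) for r in reads)
    return model_bits + data_bits

def select_reference(candidates, reads):
    """Return the name of the candidate with the shortest total description."""
    return min(candidates, key=lambda name: two_part_mdl_bits(candidates[name], reads))

# Hypothetical toy data, not taken from the paper.
candidates = {"refA": "ACGTACGTAC", "refB": "TTTTGGGGCC"}
reads = ["ACGTA", "GTACG", "TACGT"]
best = select_reference(candidates, reads)
```

With these toy inputs both candidates cost the same to describe, so the choice is driven by how cheaply the reads can be encoded against each one. Sophisticated MDL and MiniMax Regret replace this two-part code with other codes and are not sketched here.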

Original language: English
Title of host publication: Proceedings 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
Pages: 230-233
Number of pages: 4
Publication status: Published - 2011
Externally published: Yes
Event: 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11 - San Antonio, TX, United States
Duration: 4 Dec 2011 to 6 Dec 2011

Other

Other: 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
Country: United States
City: San Antonio, TX
Period: 4/12/11 to 6/12/11

Keywords

  • Comparative assembly
  • Genome assembly
  • MiniMax regret
  • Minimum description length
  • Sophisticated MDL
  • Two-part MDL

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Signal Processing
  • Biomedical Engineering

Cite this

Wajid, B., & Serpedin, E. (2011). Minimum description length based selection of reference sequences for comparative assemblers. In Proceedings 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11 (pp. 230-233). [6169487]