5 Citations (Scopus)

Abstract

This paper introduces MiB, a comparative genome assembly pipeline built from three stages. The first stage chooses the best reference sequence using the Minimum Description Length (MDL) principle. The MDL principle not only selects the best reference sequence (the model) but also fine-tunes that model for a better assembly by rectifying all inversions and removing most insertions from the reference sequence. The MDL principle also identifies the set of reads that could align to the reference sequence. The second stage passes the reads that did not align to the reference sequence to a de-Bruijn graph based algorithm that Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places (IDITAP). The last stage uses Bayesian Estimation for Comparative Assembly (BECA). BECA uses quality (Q-) values to obtain the probabilities of the base calls in every read and then exploits the Q-values to find the best alignments and the consensus sequence. MiB, named after MDL-IDITAP-BECA, therefore aims to take the optimal reference sequence and the set of reads from the unassembled genome and transform the reference sequence into the novel genome by removing or rectifying four types of mutation: inversions and insertions using MDL, deletions using IDITAP, and Single Nucleotide Polymorphisms (SNPs) using BECA. Preliminary tests of the proposed framework gave promising results.
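
To illustrate how Q-values can be turned into base-call probabilities and combined into a consensus call, the sketch below may help. It is not the authors' BECA implementation, which the abstract does not specify. It assumes the standard Phred convention (error probability 10^(-Q/10)), a uniform prior over the four bases, and that a miscalled base is equally likely to be any of the other three. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error_prob(q):
    """Standard Phred convention: P(base call is wrong) = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def column_posterior(calls):
    """Posterior over the true base at one aligned position.

    calls: list of (base, phred_q) pairs, one per read covering the position.
    Assumptions (illustrative only): uniform prior over A, C, G, T, and an
    erroneous call is equally likely to be any of the three other bases.
    """
    log_post = {b: 0.0 for b in BASES}
    for base, q in calls:
        # Cap the error at 0.75 so a Q of 0 is treated as uninformative
        # instead of producing log(0).
        e = min(phred_to_error_prob(q), 0.75)
        for b in BASES:
            log_post[b] += math.log(1.0 - e if b == base else e / 3.0)
    # Normalise in a numerically stable way.
    peak = max(log_post.values())
    weights = {b: math.exp(v - peak) for b, v in log_post.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def consensus_base(calls):
    """Return the base with the highest posterior probability."""
    posterior = column_posterior(calls)
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    # Two low-quality 'A' calls against one high-quality 'C' call.
    print(consensus_base([("A", 2), ("A", 2), ("C", 40)]))
```

Under these assumptions a single high-quality call can outweigh several low-quality ones, which is the general motivation for weighting by Q-values rather than taking a simple majority vote.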

Original language: English
Title of host publication: Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012
Publisher: IEEE Computer Society
Pages: 86-89
Number of pages: 4
ISBN (Print): 9781467352369
DOI: https://doi.org/10.1109/GENSIPS.2012.6507733
Publication status: Published - 1 Jan 2012
Event: 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012 - Washington, DC, United States
Duration: 2 Dec 2012 to 4 Dec 2012

Publication series

Name: Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
ISSN (Print): 2150-3001
ISSN (Electronic): 2150-301X

Other

Other: 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012
Country: United States
City: Washington, DC
Period: 2/12/12 to 4/12/12

Keywords

  • Bayesian Estimation
  • Genome Assembly
  • Minimum Description Length
  • de-Bruijn Graphs

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Signal Processing
  • Biomedical Engineering


  • Cite this

    Wajid, B., Serpedin, E., Nounou, M., & Nounou, H. (2012). MiB: A comparative assembly processing pipeline. In Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012 (pp. 86-89). [6507733] (Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics). IEEE Computer Society. https://doi.org/10.1109/GENSIPS.2012.6507733