MiB: A comparative assembly processing pipeline

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

This paper introduces MiB, a comparative genome assembly pipeline built from three stages. The first stage chooses the best reference sequence using the Minimum Description Length (MDL) principle. The MDL principle not only chooses the best reference sequence (the model) but also fine-tunes that model for a better assembly by rectifying all inversions and removing most insertions from the reference sequence. The MDL principle also identifies the set of reads that could align to the reference sequence. The second stage takes the reads that did not align to the reference sequence as input to a de-Bruijn graph-based algorithm that Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places (IDITAP). The last stage uses Bayesian Estimation for Comparative Assembly (BECA). BECA uses quality (Q-) values to obtain the probability of each base call in every read, and then exploits the Q-values to find the best alignments and the consensus sequence. MiB, named after MDL-IDITAP-BECA, thus aims to take the optimal reference sequence and the set of reads from the unassembled genome and transform the reference sequence into the novel genome by removing or rectifying four sets of mutations: inversions and insertions using MDL, deletions using IDITAP, and Single Nucleotide Polymorphisms (SNPs) using BECA. Preliminary tests of the proposed framework gave promising results.
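
As a rough illustration of how Q-values translate into base-call probabilities (a minimal sketch, not the BECA algorithm itself): under the standard Phred convention, a quality value Q corresponds to an error probability of 10^(-Q/10). The Python below combines the calls covering one reference position into a posterior over the true base. The uniform prior, the independence of reads, the even split of errors over the three other bases, and all function names are illustrative assumptions that do not come from the paper.

```python
import math

BASES = "ACGT"


def phred_to_error(q):
    """Phred convention: probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)


def column_posterior(calls):
    """Posterior over the true base at one position.

    calls: list of (base, q) pairs, one per read covering the position.
    Assumes a uniform prior, independent reads, and errors spread evenly
    over the three alternative bases (illustrative assumptions).
    """
    log_like = {b: 0.0 for b in BASES}
    for base, q in calls:
        # Clamp so that Q=0 becomes uninformative instead of giving log(0).
        e = min(max(phred_to_error(q), 1e-12), 0.75)
        for b in BASES:
            log_like[b] += math.log(1 - e if b == base else e / 3)
    # Normalise in log space for numerical stability.
    m = max(log_like.values())
    weights = {b: math.exp(v - m) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def consensus_base(calls):
    """Most probable base under the posterior above."""
    post = column_posterior(calls)
    return max(post, key=post.get)
```

For example, `consensus_base([("A", 2), ("A", 2), ("C", 40)])` selects `C`: one high-quality call outweighs two very low-quality ones, which a simple majority vote would miss.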

Original language: English
Title of host publication: Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012
Pages: 86-89
Number of pages: 4
DOIs: https://doi.org/10.1109/GENSIPS.2012.6507733
Publication status: Published - 2012
Event: 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012 - Washington, DC, United States
Duration: 2 Dec 2012 to 4 Dec 2012

Other

Other: 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012
Country: United States
City: Washington, DC
Period: 2/12/12 to 4/12/12

Fingerprint

Pipelines
Insertional Mutagenesis
Genome
Processing
Genes
Consensus Sequence
Single Nucleotide Polymorphism
Nucleotides
Polymorphism

Keywords

  • Bayesian Estimation
  • de-Bruijn Graphs
  • Genome Assembly
  • Minimum Description Length

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Signal Processing
  • Biomedical Engineering

Cite this

Wajid, B., Serpedin, E., Nounou, M., & Nounou, H. (2012). MiB: A comparative assembly processing pipeline. In Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012 (pp. 86-89). [6507733] https://doi.org/10.1109/GENSIPS.2012.6507733

@inproceedings{6bca6e1bf7d944a69884dcdee4e4cd33,
title = "MiB: A comparative assembly processing pipeline",
abstract = "This paper introduces MiB, a comparative genome assembly pipeline built from three stages. The first stage chooses the best reference sequence using the Minimum Description Length (MDL) principle. The MDL principle not only chooses the best reference sequence (the model) but also fine-tunes that model for a better assembly by rectifying all inversions and removing most insertions from the reference sequence. The MDL principle also identifies the set of reads that could align to the reference sequence. The second stage takes the reads that did not align to the reference sequence as input to a de-Bruijn graph-based algorithm that Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places (IDITAP). The last stage uses Bayesian Estimation for Comparative Assembly (BECA). BECA uses quality (Q-) values to obtain the probability of each base call in every read, and then exploits the Q-values to find the best alignments and the consensus sequence. MiB, named after MDL-IDITAP-BECA, thus aims to take the optimal reference sequence and the set of reads from the unassembled genome and transform the reference sequence into the novel genome by removing or rectifying four sets of mutations: inversions and insertions using MDL, deletions using IDITAP, and Single Nucleotide Polymorphisms (SNPs) using BECA. Preliminary tests of the proposed framework gave promising results.",
keywords = "Bayesian Estimation, de-Bruijn Graphs, Genome Assembly, Minimum Description Length",
author = "Bilal Wajid and Erchin Serpedin and Mohamed Nounou and Hazem Nounou",
year = "2012",
doi = "10.1109/GENSIPS.2012.6507733",
language = "English",
isbn = "9781467352369",
pages = "86--89",
booktitle = "Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012",

}

TY - GEN

T1 - MiB

T2 - A comparative assembly processing pipeline

AU - Wajid, Bilal

AU - Serpedin, Erchin

AU - Nounou, Mohamed

AU - Nounou, Hazem

PY - 2012

Y1 - 2012

AB - This paper introduces MiB, a comparative genome assembly pipeline built from three stages. The first stage chooses the best reference sequence using the Minimum Description Length (MDL) principle. The MDL principle not only chooses the best reference sequence (the model) but also fine-tunes that model for a better assembly by rectifying all inversions and removing most insertions from the reference sequence. The MDL principle also identifies the set of reads that could align to the reference sequence. The second stage takes the reads that did not align to the reference sequence as input to a de-Bruijn graph-based algorithm that Identifies the Deletions in the reference sequence and then Inserts Them at Appropriate Places (IDITAP). The last stage uses Bayesian Estimation for Comparative Assembly (BECA). BECA uses quality (Q-) values to obtain the probability of each base call in every read, and then exploits the Q-values to find the best alignments and the consensus sequence. MiB, named after MDL-IDITAP-BECA, thus aims to take the optimal reference sequence and the set of reads from the unassembled genome and transform the reference sequence into the novel genome by removing or rectifying four sets of mutations: inversions and insertions using MDL, deletions using IDITAP, and Single Nucleotide Polymorphisms (SNPs) using BECA. Preliminary tests of the proposed framework gave promising results.

KW - Bayesian Estimation

KW - de-Bruijn Graphs

KW - Genome Assembly

KW - Minimum Description Length

UR - http://www.scopus.com/inward/record.url?scp=84877814394&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877814394&partnerID=8YFLogxK

U2 - 10.1109/GENSIPS.2012.6507733

DO - 10.1109/GENSIPS.2012.6507733

M3 - Conference contribution

AN - SCOPUS:84877814394

SN - 9781467352369

SP - 86

EP - 89

BT - Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012

ER -