MARAGAP

A modular approach to reference assisted genome assembly pipeline

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

This paper presents MARAGAP, a modular approach to a reference-assisted genome assembly pipeline. MARAGAP uses the Minimum Description Length principle to determine the optimal reference sequence for the assembly. This reference is then used as a template to infer inversions, insertions, deletions and single nucleotide polymorphisms (SNPs) in the target genome. MARAGAP uses an algorithmic approach to detect and correct inversions and deletions, a de Bruijn graph-based approach to infer the insertions, an affine-match affine-gap local alignment tool to estimate the locations of insertions, and a Bayesian estimation framework to detect SNPs.
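
To make the de Bruijn graph step mentioned in the abstract more concrete, the following is a minimal, generic sketch of building such a graph from short reads and walking an unambiguous path to recover a sequence. It is not MARAGAP's implementation: the function names `build_de_bruijn` and `walk_unambiguous`, the toy reads, and the choice of k = 4 are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix.

    Nodes are (k-1)-mers. Hypothetical helper, not taken from the paper.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the next node is unambiguous.

    Stops at a dead end, at a branch (more than one distinct successor),
    or when a node would be revisited (a cycle).
    """
    contig, node, seen = start, start, {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break
        node = successors.pop()
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Toy reads (assumed data) overlapping along the sequence ACGTCAG.
reads = ["ACGTC", "CGTCA", "GTCAG"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ACG")
print(contig)
```

A real assembler would also have to handle sequencing errors, reverse complements and repeated regions, none of which this sketch attempts.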

Original language: English
Pages (from-to): 226-250
Number of pages: 25
Journal: International Journal of Computational Biology and Drug Design
Volume: 8
Issue number: 3
DOI: 10.1504/IJCBDD.2015.072073
Publication status: Published - 2015

Fingerprint

Single Nucleotide Polymorphism
Pipelines
Genes
Genome

Keywords

  • Bayesian statistics
  • De-Bruijn graph
  • Genome assembly
  • Graph theory
  • Local alignment
  • Minimum description length principle
  • Mutations
  • Next generation sequencing
  • Reference assisted assembly
  • Single nucleotide polymorphisms
  • SNPs

ASJC Scopus subject areas

  • Drug Discovery
  • Computer Science Applications

Cite this

@article{0ef4315009e44775b181611f62ce709d,
title = "MARAGAP: A modular approach to reference assisted genome assembly pipeline",
abstract = "This paper presents MARAGAP, a modular approach to reference assisted genome assembly pipeline. MARAGAP uses the principle of Minimum Description Length to determine the optimal reference sequence for the assembly. The optimal reference sequence is used as a template to infer inversions, insertions, deletions and SNPs in the target genome. MARAGAP uses an algorithmic approach to detect and correct inversions and deletions, a De-Bruijn graph based approach to infer the insertions, an affine-match affine-gap local alignment tool to estimate the locations of insertions and a Bayesian estimation framework for detecting SNPs.",
keywords = "Bayesian statistics, De-Bruijn graph, Genome assembly, Graph theory, Local alignment, Minimum description length principle, Mutations, Next generation sequencing, Reference assisted assembly, Single nucleotide polymorphisms, SNPs",
author = "Bilal Wajid and Erchin Serpedin and Mohamed Nounou and Hazem Nounou",
year = "2015",
doi = "10.1504/IJCBDD.2015.072073",
language = "English",
volume = "8",
pages = "226--250",
journal = "International Journal of Computational Biology and Drug Design",
issn = "1756-0756",
publisher = "Inderscience Enterprises Ltd",
number = "3",

}

TY - JOUR

T1 - MARAGAP

T2 - A modular approach to reference assisted genome assembly pipeline

AU - Wajid, Bilal

AU - Serpedin, Erchin

AU - Nounou, Mohamed

AU - Nounou, Hazem

PY - 2015

Y1 - 2015

AB - This paper presents MARAGAP, a modular approach to reference assisted genome assembly pipeline. MARAGAP uses the principle of Minimum Description Length to determine the optimal reference sequence for the assembly. The optimal reference sequence is used as a template to infer inversions, insertions, deletions and SNPs in the target genome. MARAGAP uses an algorithmic approach to detect and correct inversions and deletions, a De-Bruijn graph based approach to infer the insertions, an affine-match affine-gap local alignment tool to estimate the locations of insertions and a Bayesian estimation framework for detecting SNPs.

KW - Bayesian statistics

KW - De-Bruijn graph

KW - Genome assembly

KW - Graph theory

KW - Local alignment

KW - Minimum description length principle

KW - Mutations

KW - Next generation sequencing

KW - Reference assisted assembly

KW - Single nucleotide polymorphisms

KW - SNPs

UR - http://www.scopus.com/inward/record.url?scp=84943398658&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84943398658&partnerID=8YFLogxK

U2 - 10.1504/IJCBDD.2015.072073

DO - 10.1504/IJCBDD.2015.072073

M3 - Article

VL - 8

SP - 226

EP - 250

JO - International Journal of Computational Biology and Drug Design

JF - International Journal of Computational Biology and Drug Design

SN - 1756-0756

IS - 3

ER -