CAPRG

Sequence assembling pipeline for next generation sequencing of non-model organisms

Arun Rawat, Mohamed O. Elasri, Kurt A. Gust, Glover George, Don Pham, Leona D. Scanlan, Chris Vulpe, Edward J. Perkins

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Our goal is to introduce and describe the utility of a new pipeline, "Contigs Assembly Pipeline using Reference Genome" (CAPRG), which has been developed to assemble "long sequence reads" for non-model organisms by leveraging the reference genome of a close phylogenetic relative. To facilitate this effort, we used two avian transcriptomic datasets generated with Roche/454 technology as test cases for CAPRG assembly. We compared the results of CAPRG assembly using a reference genome with the results of existing methods that use de novo strategies, such as VELVET, PAVE, and MIRA, by employing parameter space comparisons (intra-assembling comparison). CAPRG performed as well as or better than the existing assembly methods on various benchmarks for "gene-hunting." Further, CAPRG completed the assemblies in a fraction of the time required by the existing assembly algorithms. Additional advantages of CAPRG included reduced contig inflation, which lowers the computational resources needed for annotation, and functional identification of contigs that may be categorized as "unknowns" by de novo methods. In addition to evaluating CAPRG performance, we observed that the results of different assemblies (inter-assembly) could be integrated to enhance the putative gene coverage for any transcriptomics study.
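The closing observation of the abstract, that results from different assemblies can be integrated to enhance putative gene coverage, can be illustrated with a minimal sketch. This is not CAPRG's implementation: the function name `integrate_gene_hits`, the input layout (one set of matched reference gene identifiers per assembly), and the toy identifiers are all assumptions made here for illustration only.

```python
from typing import Dict, Set


def integrate_gene_hits(hits_by_assembly: Dict[str, Set[str]]) -> dict:
    """Combine putative gene hits from several assemblies.

    hits_by_assembly maps an assembly label to the set of reference gene
    identifiers matched by at least one of its contigs. How those matches
    are produced is outside this sketch (hypothetical input).
    """
    # Union of all hits: the putative gene coverage of the integrated result.
    combined: Set[str] = set()
    for genes in hits_by_assembly.values():
        combined |= genes

    # Genes that only one assembly recovered: its unique contribution.
    unique_to: Dict[str, Set[str]] = {}
    for name, genes in hits_by_assembly.items():
        others: Set[str] = set()
        for other_name, other_genes in hits_by_assembly.items():
            if other_name != name:
                others |= other_genes
        unique_to[name] = genes - others

    return {
        "combined": combined,
        "per_assembly_count": {n: len(g) for n, g in hits_by_assembly.items()},
        "unique_to": unique_to,
    }


if __name__ == "__main__":
    # Placeholder identifiers, not data from the study.
    toy = {
        "reference_guided": {"g1", "g2", "g3"},
        "de_novo_a": {"g2", "g4"},
        "de_novo_b": {"g3", "g4", "g5"},
    }
    summary = integrate_gene_hits(toy)
    print("combined coverage:", len(summary["combined"]))
    for name, genes in summary["unique_to"].items():
        print(name, "contributes uniquely:", sorted(genes))
```

The combined set is never smaller than the largest single assembly's set, which is the sense in which integration can only maintain or increase putative gene coverage under this simplified model.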

Original language: English
Article number: e30370
Journal: PLoS One
Volume: 7
Issue number: 2
DOI: https://doi.org/10.1371/journal.pone.0030370
Publication status: Published - 3 Feb 2012
Externally published: Yes

ASJC Scopus subject areas

  • Medicine(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Rawat, A., Elasri, M. O., Gust, K. A., George, G., Pham, D., Scanlan, L. D., ... Perkins, E. J. (2012). CAPRG: Sequence assembling pipeline for next generation sequencing of non-model organisms. PLoS One, 7(2), [e30370]. https://doi.org/10.1371/journal.pone.0030370

@article{1ee2e02c13294705b94ce17ee43230e6,
title = "CAPRG: Sequence assembling pipeline for next generation sequencing of non-model organisms",
author = "Arun Rawat and Elasri, {Mohamed O.} and Gust, {Kurt A.} and Glover George and Don Pham and Scanlan, {Leona D.} and Chris Vulpe and Perkins, {Edward J.}",
year = "2012",
month = "2",
day = "3",
doi = "10.1371/journal.pone.0030370",
language = "English",
volume = "7",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "2",

}

TY - JOUR

T1 - CAPRG

T2 - Sequence assembling pipeline for next generation sequencing of non-model organisms

AU - Rawat, Arun

AU - Elasri, Mohamed O.

AU - Gust, Kurt A.

AU - George, Glover

AU - Pham, Don

AU - Scanlan, Leona D.

AU - Vulpe, Chris

AU - Perkins, Edward J.

PY - 2012/2/3

Y1 - 2012/2/3

UR - http://www.scopus.com/inward/record.url?scp=84856511197&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84856511197&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0030370

DO - 10.1371/journal.pone.0030370

M3 - Article

VL - 7

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 2

M1 - e30370

ER -