Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding

Kevin Judd McKernan, Heather E. Peckham, Gina L. Costa, Stephen F. McLaughlin, Yutao Fu, Eric F. Tsung, Christopher R. Clouser, Cisyla Duncan, Jeffrey K. Ichikawa, Clarence C. Lee, Zheng Zhang, Swati S. Ranade, Eileen T. Dimalanta, Fiona C. Hyland, Tanya D. Sokolsky, Lei Zhang, Andrew Sheridan, Haoning Fu, Cynthia L. Hendrickson, Bin Li, Lev Kotler, Jeremy R. Stuart, Joel Malek, Jonathan M. Manning, Alena A. Antipova, Damon S. Perez, Michael P. Moore, Kathleen C. Hayashibara, Michael R. Lyons, Robert E. Beaudoin, Brittany E. Coleman, Michael W. Laptewicz, Adam E. Sannicandro, Michael D. Rhodes, Rajesh K. Gottimukkala, Shan Yang, Vineet Bafna, Ali Bashir, Andrew MacBride, Can Alkan, Jeffrey M. Kidd, Evan E. Eichler, Martin G. Reese, Francisco M. De La Vega, Alan P. Blanchard

Research output: Contribution to journal (Article)

363 Citations (Scopus)

Abstract

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding ∼18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
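
The abstract does not spell out how two-base encoding supports error correction. As a minimal sketch, assuming the commonly described di-base color scheme (each color is the XOR of the 2-bit codes of two adjacent bases), the Python below shows why a true SNP changes two adjacent colors while an isolated measurement error changes only one. The base-to-bit mapping and the function names `encode`, `decode`, and `color_mismatches` are illustrative assumptions, not taken from the paper.

```python
# Two-base (color-space) encoding sketch.
# ASSUMPTION: A=0, C=1, G=2, T=3 with color = XOR of adjacent base codes.
# This is the conventional di-base mapping, not a detail stated in the abstract.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence by walking the colors from the first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


def color_mismatches(ref, colors):
    """Indices where the read colors differ from the reference colors."""
    _, ref_colors = encode(ref)
    return [i for i, (r, c) in enumerate(zip(ref_colors, colors)) if r != c]


reference = "ATGGCA"
_, snp_colors = encode("ATGTCA")      # one real base change (G -> T)
error_colors = [3, 1, 2, 3, 1]        # reference colors with one color misread

print(color_mismatches(reference, snp_colors))    # two adjacent mismatches
print(color_mismatches(reference, error_colors))  # one isolated mismatch
```

Under this scheme a lone color mismatch against the reference can be treated as a likely measurement error, whereas a candidate SNP must be supported by two adjacent, mutually consistent color changes.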

Original language: English
Pages (from-to): 1527-1541
Number of pages: 15
Journal: Genome Research
Volume: 19
Issue number: 9
DOI: https://doi.org/10.1101/gr.091868.109
Publication status: Published - Sep 2009

Fingerprint

High-Throughput Nucleotide Sequencing
Human Genome
Single Nucleotide Polymorphism
Ligation
Clone Cells
Genome
Genetic Databases
Gene Fusion
Haploidy
Haplotypes
Libraries
Nucleotides
Alleles
Genotype
Biopsy
Mutation
Population
Genes
Neoplasms
Surveys and Questionnaires

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

McKernan, K. J., Peckham, H. E., Costa, G. L., McLaughlin, S. F., Fu, Y., Tsung, E. F., ... Blanchard, A. P. (2009). Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Research, 19(9), 1527-1541. https://doi.org/10.1101/gr.091868.109

@article{475484e9ee874c1ca48fae61171a66d2,
title = "Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding",
abstract = "We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9{\%}, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding ∼18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98{\%} of the reference genome is covered with at least one uniquely placed read, and 99.65{\%} is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19{\%} of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.",
author = "McKernan, {Kevin Judd} and Peckham, {Heather E.} and Costa, {Gina L.} and McLaughlin, {Stephen F.} and Yutao Fu and Tsung, {Eric F.} and Clouser, {Christopher R.} and Cisyla Duncan and Ichikawa, {Jeffrey K.} and Lee, {Clarence C.} and Zheng Zhang and Ranade, {Swati S.} and Dimalanta, {Eileen T.} and Hyland, {Fiona C.} and Sokolsky, {Tanya D.} and Lei Zhang and Andrew Sheridan and Haoning Fu and Hendrickson, {Cynthia L.} and Bin Li and Lev Kotler and Stuart, {Jeremy R.} and Joel Malek and Manning, {Jonathan M.} and Antipova, {Alena A.} and Perez, {Damon S.} and Moore, {Michael P.} and Hayashibara, {Kathleen C.} and Lyons, {Michael R.} and Beaudoin, {Robert E.} and Coleman, {Brittany E.} and Laptewicz, {Michael W.} and Sannicandro, {Adam E.} and Rhodes, {Michael D.} and Gottimukkala, {Rajesh K.} and Shan Yang and Vineet Bafna and Ali Bashir and Andrew MacBride and Can Alkan and Kidd, {Jeffrey M.} and Eichler, {Evan E.} and Reese, {Martin G.} and {De La Vega}, {Francisco M.} and Blanchard, {Alan P.}",
year = "2009",
month = "9",
doi = "10.1101/gr.091868.109",
language = "English",
volume = "19",
pages = "1527--1541",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "9",

}

TY - JOUR
T1 - Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding
AU - McKernan, Kevin Judd
AU - Peckham, Heather E.
AU - Costa, Gina L.
AU - McLaughlin, Stephen F.
AU - Fu, Yutao
AU - Tsung, Eric F.
AU - Clouser, Christopher R.
AU - Duncan, Cisyla
AU - Ichikawa, Jeffrey K.
AU - Lee, Clarence C.
AU - Zhang, Zheng
AU - Ranade, Swati S.
AU - Dimalanta, Eileen T.
AU - Hyland, Fiona C.
AU - Sokolsky, Tanya D.
AU - Zhang, Lei
AU - Sheridan, Andrew
AU - Fu, Haoning
AU - Hendrickson, Cynthia L.
AU - Li, Bin
AU - Kotler, Lev
AU - Stuart, Jeremy R.
AU - Malek, Joel
AU - Manning, Jonathan M.
AU - Antipova, Alena A.
AU - Perez, Damon S.
AU - Moore, Michael P.
AU - Hayashibara, Kathleen C.
AU - Lyons, Michael R.
AU - Beaudoin, Robert E.
AU - Coleman, Brittany E.
AU - Laptewicz, Michael W.
AU - Sannicandro, Adam E.
AU - Rhodes, Michael D.
AU - Gottimukkala, Rajesh K.
AU - Yang, Shan
AU - Bafna, Vineet
AU - Bashir, Ali
AU - MacBride, Andrew
AU - Alkan, Can
AU - Kidd, Jeffrey M.
AU - Eichler, Evan E.
AU - Reese, Martin G.
AU - De La Vega, Francisco M.
AU - Blanchard, Alan P.
PY - 2009/9
Y1 - 2009/9
N2 - We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding ∼18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
UR - http://www.scopus.com/inward/record.url?scp=69749090013&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69749090013&partnerID=8YFLogxK
U2 - 10.1101/gr.091868.109
DO - 10.1101/gr.091868.109
M3 - Article
VL - 19
SP - 1527
EP - 1541
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 9
ER -