Mouse BAC ends quality assessment and sequence analyses

S. Zhao, S. Shatsman, B. Ayodeji, K. Geer, G. Tsegaye, M. Krol, E. Gebregeorgis, A. Shvartsbeyn, D. Russell, L. Overton, L. Jiang, G. Dimitrov, K. Tran, J. Shetty, Joel Malek, T. Feldblyum, W. C. Nierman, C. M. Fraser

Research output: Contribution to journal (Article)

44 Citations (Scopus)

Abstract

A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. With a sequencing success rate of >80%, an average read length of 485 bp, and ABI3700 capillary sequencers, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) with 218 Mb total from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15× clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends providing 12× genome coverage. The average Q20 length is 406 bp and 84% of the bases have phred quality scores ≥ 20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that > 95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contains L1 repeats and ∼48% of the clones have both ends with ≥ 100 bp contiguous unique Q20 bases. About 3% mBESs match ESTs and > 70% of matches were conserved between the mouse and the human or the rat. Approximately 0.1% mBESs contain STSs. About 0.2% mBESs match human finished sequences and > 70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.
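
The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the genome size of roughly 3 Gb is an assumption, since the abstract does not state the value used. The read counts, read length, and Q20 length come from the abstract itself.

```python
import math

# Figures quoted in the abstract.
N_MBES = 449_234           # nonredundant mouse BAC end sequences
N_CLONES = 257_318         # clones with at least one end sequence
N_PAIRED_CLONES = 191_916  # clones sequenced from both ends
MEAN_READ_BP = 485         # average read length
MEAN_Q20_BP = 406          # average Q20 length

# ASSUMPTION (not stated in the abstract): a mouse genome of about 3 Gb.
ASSUMED_GENOME_BP = 3_000_000_000


def reads_from_clones(n_clones: int, n_paired: int) -> int:
    """Paired clones contribute two reads, the remaining clones one."""
    return 2 * n_paired + (n_clones - n_paired)


def total_bases(n_reads: int, mean_len_bp: int) -> int:
    """Total sequenced bases implied by read count and mean length."""
    return n_reads * mean_len_bp


def sequence_coverage(total_bp: int, genome_bp: int) -> float:
    """Fraction of the genome represented by the sequenced bases."""
    return total_bp / genome_bp


def marker_spacing_bp(genome_bp: int, n_markers: int) -> float:
    """Average distance between markers if spread evenly."""
    return genome_bp / n_markers


def phred_error_probability(q: float) -> float:
    """Standard phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    total = total_bases(N_MBES, MEAN_READ_BP)
    print("reads implied by clone counts:", reads_from_clones(N_CLONES, N_PAIRED_CLONES))
    print("total sequence (Mb):", round(total / 1e6, 1))
    print("sequence coverage (%):", round(100 * sequence_coverage(total, ASSUMED_GENOME_BP), 1))
    print("marker spacing (kb):", round(marker_spacing_bp(ASSUMED_GENOME_BP, N_MBES) / 1e3, 1))
    print("Q20 share of mean read (%):", round(100 * MEAN_Q20_BP / MEAN_READ_BP, 1))
    print("error probability at Q20:", phred_error_probability(20))
```

Under these assumptions the clone counts reproduce the stated number of end sequences, and the total, coverage, and marker spacing agree with the rounded values in the abstract. A different genome-size estimate would shift the coverage and spacing figures slightly.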

Original language: English
Pages (from-to): 1736-1745
Number of pages: 10
Journal: Genome Research
Volume: 11
Issue number: 10
DOI: https://doi.org/10.1101/gr.179201
Publication status: Published - 2001
Externally published: Yes

Fingerprint

Sequence Analysis
Clone Cells
Expressed Sequence Tags
Genome
Libraries

ASJC Scopus subject areas

  • Genetics

Cite this

Zhao, S., Shatsman, S., Ayodeji, B., Geer, K., Tsegaye, G., Krol, M., ... Fraser, C. M. (2001). Mouse BAC ends quality assessment and sequence analyses. Genome Research, 11(10), 1736-1745. https://doi.org/10.1101/gr.179201

Mouse BAC ends quality assessment and sequence analyses. / Zhao, S.; Shatsman, S.; Ayodeji, B.; Geer, K.; Tsegaye, G.; Krol, M.; Gebregeorgis, E.; Shvartsbeyn, A.; Russell, D.; Overton, L.; Jiang, L.; Dimitrov, G.; Tran, K.; Shetty, J.; Malek, Joel; Feldblyum, T.; Nierman, W. C.; Fraser, C. M.

In: Genome Research, Vol. 11, No. 10, 2001, p. 1736-1745.

Zhao, S, Shatsman, S, Ayodeji, B, Geer, K, Tsegaye, G, Krol, M, Gebregeorgis, E, Shvartsbeyn, A, Russell, D, Overton, L, Jiang, L, Dimitrov, G, Tran, K, Shetty, J, Malek, J, Feldblyum, T, Nierman, WC & Fraser, CM 2001, 'Mouse BAC ends quality assessment and sequence analyses', Genome Research, vol. 11, no. 10, pp. 1736-1745. https://doi.org/10.1101/gr.179201
Zhao S, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Krol M et al. Mouse BAC ends quality assessment and sequence analyses. Genome Research. 2001;11(10):1736-1745. https://doi.org/10.1101/gr.179201
Zhao, S. ; Shatsman, S. ; Ayodeji, B. ; Geer, K. ; Tsegaye, G. ; Krol, M. ; Gebregeorgis, E. ; Shvartsbeyn, A. ; Russell, D. ; Overton, L. ; Jiang, L. ; Dimitrov, G. ; Tran, K. ; Shetty, J. ; Malek, Joel ; Feldblyum, T. ; Nierman, W. C. ; Fraser, C. M. / Mouse BAC ends quality assessment and sequence analyses. In: Genome Research. 2001 ; Vol. 11, No. 10. pp. 1736-1745.
@article{fb6239cb2dfa4230ad52128ed73b0ce6,
title = "Mouse BAC ends quality assessment and sequence analyses",
abstract = "A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. With a sequencing success rate of >80{\%}, an average read length of 485 bp, and ABI3700 capillary sequencers, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) with 218 Mb total from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15× clone coverage, 7{\%} sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends providing 12× genome coverage. The average Q20 length is 406 bp and 84{\%} of the bases have phred quality scores ≥ 20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that > 95{\%} of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contains L1 repeats and ∼48{\%} of the clones have both ends with ≥ 100 bp contiguous unique Q20 bases. About 3{\%} mBESs match ESTs and > 70{\%} of matches were conserved between the mouse and the human or the rat. Approximately 0.1{\%} mBESs contain STSs. About 0.2{\%} mBESs match human finished sequences and > 70{\%} of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.",
author = "S. Zhao and S. Shatsman and B. Ayodeji and K. Geer and G. Tsegaye and M. Krol and E. Gebregeorgis and A. Shvartsbeyn and D. Russell and L. Overton and L. Jiang and G. Dimitrov and K. Tran and J. Shetty and Joel Malek and T. Feldblyum and Nierman, {W. C.} and Fraser, {C. M.}",
year = "2001",
doi = "10.1101/gr.179201",
language = "English",
volume = "11",
pages = "1736--1745",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "10",
}

TY - JOUR

T1 - Mouse BAC ends quality assessment and sequence analyses

AU - Zhao, S.

AU - Shatsman, S.

AU - Ayodeji, B.

AU - Geer, K.

AU - Tsegaye, G.

AU - Krol, M.

AU - Gebregeorgis, E.

AU - Shvartsbeyn, A.

AU - Russell, D.

AU - Overton, L.

AU - Jiang, L.

AU - Dimitrov, G.

AU - Tran, K.

AU - Shetty, J.

AU - Malek, Joel

AU - Feldblyum, T.

AU - Nierman, W. C.

AU - Fraser, C. M.

PY - 2001

Y1 - 2001

AB - A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. With a sequencing success rate of >80%, an average read length of 485 bp, and ABI3700 capillary sequencers, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) with 218 Mb total from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15× clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends providing 12× genome coverage. The average Q20 length is 406 bp and 84% of the bases have phred quality scores ≥ 20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that > 95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contains L1 repeats and ∼48% of the clones have both ends with ≥ 100 bp contiguous unique Q20 bases. About 3% mBESs match ESTs and > 70% of matches were conserved between the mouse and the human or the rat. Approximately 0.1% mBESs contain STSs. About 0.2% mBESs match human finished sequences and > 70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.

UR - http://www.scopus.com/inward/record.url?scp=0034767179&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034767179&partnerID=8YFLogxK

U2 - 10.1101/gr.179201

DO - 10.1101/gr.179201

M3 - Article

C2 - 11591651

AN - SCOPUS:0034767179

VL - 11

SP - 1736

EP - 1745

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 10

ER -