Copy number variations in the genome of the Qatari population

Khalid Adnan Mohamed A. Fakhro, Noha Yousri, Juan L. Rodriguez-Flores, Amal Robay, Michelle R. Staudt, Francisco Agosto-Perez, Jacqueline Salit, Joel Malek, Karsten Suhre, Amin Jayyousi, Mahmoud Zirie, Dora Stadler, Jason G. Mezey, Ronald Crystal

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. We present the first high-resolution copy number variation (CNV) map for a Gulf Arab population, using a hybrid approach that integrates array genotyping intensity data and next-generation sequencing reads to call CNVs in the Qatari population.

Methods: CNVs were detected in 97 unrelated Qatari individuals by running two calling algorithms on each of two primary datasets: high-resolution genotyping (Illumina Omni 2.5M) and high-depth whole-genome sequencing (Illumina paired-end, 100 bp). The four call-sets were integrated to identify high-confidence CNV regions, which were subsequently annotated for putative functional effect and compared to public databases of CNVs in other populations. The availability of genome sequence was leveraged to identify tagging SNPs in high linkage disequilibrium (LD) with common deletions in this population, enabling their imputation from genotyping experiments in the future.

Results: Genotyping intensities and genome sequencing data from 97 Qataris were analyzed with four different algorithms and integrated to discover 16,660 high-confidence CNV regions (CNVRs) in the total population, affecting ~28 Mb in the median Qatari genome. Up to 40% of all CNVs affected genes, including novel CNVs affecting Mendelian disease genes, and segregated at different frequencies in the 3 major Qatari subpopulations: those with Bedouin, Persian/South Asian, and African ancestry. Consistent with high consanguinity levels in the Bedouin subpopulation, we found an increased burden of homozygous deletions in this group. Compared with known CNVs in the comprehensive Database of Genomic Variants, 5% of all CNVRs in Qataris were completely novel, and the Qatari cohort showed an enrichment of CNVs affecting several known chromosomal disorder loci and genes known to regulate sugar metabolism and type 2 diabetes. Finally, we leveraged the availability of genome sequence to find suitable tagging SNPs for common deletions in this population.

Conclusion: We combine four independently generated datasets from 97 individuals to study CNVs for the first time at high resolution in a Gulf Arab population.
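As a rough illustration of the integration step described in the Methods, the sketch below merges overlapping CNV calls from several call-sets into regions and keeps only regions supported by more than one call-set. The abstract does not state the merging rule or support threshold that was actually used, so the `Call` tuple, the `merge_calls` function, the call-set labels, and `min_support=2` are all hypothetical choices made for illustration, not the authors' pipeline. For brevity the sketch also ignores CNV type (deletion versus duplication).

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int   # inclusive coordinate
    end: int     # exclusive coordinate
    caller: str  # hypothetical call-set label


def merge_calls(calls, min_support=2):
    """Chain overlapping calls into regions and keep regions
    reported by at least `min_support` distinct call-sets."""
    regions = []
    current = None  # [chrom, start, end, set of call-set labels]

    def flush():
        # Emit the open region only if enough call-sets support it.
        if current is not None and len(current[3]) >= min_support:
            regions.append(
                (current[0], current[1], current[2], sorted(current[3]))
            )

    for c in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        overlaps = (
            current is not None
            and c.chrom == current[0]
            and c.start < current[2]
        )
        if overlaps:
            # Extend the open region and record the supporting call-set.
            current[2] = max(current[2], c.end)
            current[3].add(c.caller)
        else:
            flush()
            current = [c.chrom, c.start, c.end, {c.caller}]
    flush()
    return regions


# Toy input: labels and coordinates are invented for the example.
calls = [
    Call("chr1", 100, 500, "array_A"),
    Call("chr1", 450, 900, "seq_A"),
    Call("chr1", 2000, 2500, "seq_B"),   # single call-set, dropped
    Call("chr2", 100, 300, "array_B"),
    Call("chr2", 150, 280, "seq_A"),
]
for region in merge_calls(calls):
    print(region)
```

Chaining any overlap into one region is the simplest possible rule; a reciprocal-overlap threshold (for example, requiring two calls to cover at least half of each other) would be a stricter alternative, at the cost of splitting regions with imprecise breakpoints.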

Original language: English
Article number: 834
Journal: BMC Genomics
Volume: 16
Issue number: 1
DOI: https://doi.org/10.1186/s12864-015-1991-5
Publication status: Published - 22 Oct 2015

Fingerprint

Genome
Population
Single Nucleotide Polymorphism
Chromosome Disorders
Databases
Genes
Genetic Databases
Consanguinity
Type 2 Diabetes Mellitus
Nucleotides
Mutation
Datasets

Keywords

  • Copy number variation
  • Genomics
  • Genotyping
  • Mendelian disease
  • Next-generation sequencing
  • Qatar

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Copy number variations in the genome of the Qatari population. / Fakhro, Khalid Adnan Mohamed A.; Yousri, Noha; Rodriguez-Flores, Juan L.; Robay, Amal; Staudt, Michelle R.; Agosto-Perez, Francisco; Salit, Jacqueline; Malek, Joel; Suhre, Karsten; Jayyousi, Amin; Zirie, Mahmoud; Stadler, Dora; Mezey, Jason G.; Crystal, Ronald.

In: BMC Genomics, Vol. 16, No. 1, 834, 22.10.2015.

Fakhro, KAMA, Yousri, N, Rodriguez-Flores, JL, Robay, A, Staudt, MR, Agosto-Perez, F, Salit, J, Malek, J, Suhre, K, Jayyousi, A, Zirie, M, Stadler, D, Mezey, JG & Crystal, R 2015, 'Copy number variations in the genome of the Qatari population', BMC Genomics, vol. 16, no. 1, 834. https://doi.org/10.1186/s12864-015-1991-5
@article{e6e99d0bb678488fa92441a4f5403156,
title = "Copy number variations in the genome of the Qatari population",
keywords = "Copy number variation, Genomics, Genotyping, Mendelian disease, Next-generation sequencing, Qatar",
author = "Fakhro, {Khalid Adnan Mohamed A.} and Noha Yousri and Rodriguez-Flores, {Juan L.} and Amal Robay and Staudt, {Michelle R.} and Francisco Agosto-Perez and Jacqueline Salit and Joel Malek and Karsten Suhre and Amin Jayyousi and Mahmoud Zirie and Dora Stadler and Mezey, {Jason G.} and Ronald Crystal",
year = "2015",
month = "10",
day = "22",
doi = "10.1186/s12864-015-1991-5",
language = "English",
volume = "16",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Copy number variations in the genome of the Qatari population

AU - Fakhro, Khalid Adnan Mohamed A.

AU - Yousri, Noha

AU - Rodriguez-Flores, Juan L.

AU - Robay, Amal

AU - Staudt, Michelle R.

AU - Agosto-Perez, Francisco

AU - Salit, Jacqueline

AU - Malek, Joel

AU - Suhre, Karsten

AU - Jayyousi, Amin

AU - Zirie, Mahmoud

AU - Stadler, Dora

AU - Mezey, Jason G.

AU - Crystal, Ronald

PY - 2015/10/22

Y1 - 2015/10/22

KW - Copy number variation

KW - Genomics

KW - Genotyping

KW - Mendelian disease

KW - Next-generation sequencing

KW - Qatar

UR - http://www.scopus.com/inward/record.url?scp=84944741795&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944741795&partnerID=8YFLogxK

U2 - 10.1186/s12864-015-1991-5

DO - 10.1186/s12864-015-1991-5

M3 - Article

VL - 16

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 834

ER -