Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees

Ehsan Ullah, Raghvendra Mall, Mostafa M. Abbas, Khalid Kunji, Alejandro Q. Nato, Halima Bensmail, Ellen M. Wijsman, Mohamad Saad

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Genotype imputation is widely used in genome-wide association studies to boost variant density, which increases power in association testing. Many studies now include pedigree data, driven by growing interest in rare variants and by the availability of appropriate analysis tools. The performance of population-based imputation methods (in which subjects are unrelated) is well established. However, the performance of family- and population-based imputation methods on family data has received much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on data from large pedigrees of European and African ancestry. Our comparison includes many widely used family- and population-based tools, as well as Ped_Pop, a method that combines family- and population-based imputation results. We also compare four strategies for selecting subjects for full sequencing to serve as the imputation reference panel: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Finally, we compare two imputation accuracy metrics, the Imputation Quality Score (IQS) and Pearson’s squared correlation (R²), by how well each predicts the power of association analysis based on imputed data. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all other imputation approaches at all minor allele frequencies; (4) GIGI-Pick is the best selection strategy under the R² criterion; and (5) R² is the better of the two measures of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data, and it provides guidelines for future studies.
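The two accuracy metrics compared in the abstract can be illustrated with a short sketch. This is not the authors' code, and the paper's exact computation may differ. It assumes that genotypes are coded as 0/1/2 copies of the alternate allele, that imputation output is given as per-genotype posterior probabilities, that R² is the squared Pearson correlation between true genotypes and imputed dosages, and that IQS takes its usual kappa-style form, (P_o - P_c) / (1 - P_c), computed from a 3×3 table of true genotype versus posterior probability. The function names `dosage`, `pearson_r2`, and `iqs` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical input format: posterior probabilities of genotypes 0, 1, 2
# (copies of the alternate allele) for one individual at one variant.
Probs = Tuple[float, float, float]


def dosage(p: Probs) -> float:
    """Expected alternate-allele count: P(1) + 2 * P(2)."""
    return p[1] + 2.0 * p[2]


def pearson_r2(true_geno: Sequence[int], probs: Sequence[Probs]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns NaN when either vector is constant (e.g. a variant that is
    monomorphic in the sample), because the correlation is then undefined.
    """
    if len(true_geno) != len(probs):
        raise ValueError("true_geno and probs must have the same length")
    x = [float(g) for g in true_geno]
    y = [dosage(p) for p in probs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy * sxy / (sxx * syy)


def iqs(true_geno: Sequence[int], probs: Sequence[Probs]) -> float:
    """Kappa-style Imputation Quality Score.

    Row i of the 3x3 table collects, over individuals whose true genotype
    is i, the summed posterior probability assigned to each genotype j.
    P_o is the diagonal share; P_c is the agreement expected by chance
    from the row and column margins.
    """
    if len(true_geno) != len(probs):
        raise ValueError("true_geno and probs must have the same length")
    table = [[0.0] * 3 for _ in range(3)]
    for g, p in zip(true_geno, probs):
        for j in range(3):
            table[g][j] += p[j]
    total = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(3)) / total
    rows = [sum(table[i]) for i in range(3)]
    cols = [sum(table[i][j] for i in range(3)) for j in range(3)]
    p_chance = sum(rows[k] * cols[k] for k in range(3)) / total ** 2
    if p_chance == 1.0:
        return math.nan  # all mass in one cell: chance correction undefined
    return (p_obs - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Toy data (invented for illustration): six individuals at one variant.
    truth = [0, 0, 1, 2, 0, 1]
    post = [(0.9, 0.1, 0.0), (1.0, 0.0, 0.0), (0.2, 0.7, 0.1),
            (0.0, 0.1, 0.9), (0.8, 0.2, 0.0), (0.1, 0.8, 0.1)]
    print("R2 :", round(pearson_r2(truth, post), 3))
    print("IQS:", round(iqs(truth, post), 3))
```

One design difference shows up at uninformative imputations: if every individual receives the same posterior probabilities, the dosages are constant, so R² is undefined (NaN), whereas IQS is still defined and equals 0, meaning agreement no better than chance.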

Original language: English
Pages (from-to): 125-134
Number of pages: 10
Journal: Genome Research
Volume: 29
Issue number: 1
DOI: 10.1101/gr.236315.118
Publication status: Published - 1 Jan 2019

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees. / Ullah, Ehsan; Mall, Raghvendra; Abbas, Mostafa M.; Kunji, Khalid; Nato, Alejandro Q.; Bensmail, Halima; Wijsman, Ellen M.; Saad, Mohamad.

In: Genome Research, Vol. 29, No. 1, 01.01.2019, p. 125-134.

Ullah, Ehsan ; Mall, Raghvendra ; Abbas, Mostafa M. ; Kunji, Khalid ; Nato, Alejandro Q. ; Bensmail, Halima ; Wijsman, Ellen M. ; Saad, Mohamad. / Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees. In: Genome Research. 2019 ; Vol. 29, No. 1. pp. 125-134.
@article{9f930433334c4e14ad48bf9e158ec558,
title = "Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees",
abstract = "Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson’s correlation R2 for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R2 criterion; and (5) R2 is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.",
author = "Ehsan Ullah and Raghvendra Mall and Abbas, {Mostafa M.} and Khalid Kunji and Nato, {Alejandro Q.} and Halima Bensmail and Wijsman, {Ellen M.} and Mohamad Saad",
year = "2019",
month = "1",
day = "1",
doi = "10.1101/gr.236315.118",
language = "English",
volume = "29",
pages = "125--134",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "1",

}

TY - JOUR

T1 - Comparison and assessment of family- and population-based genotype imputation methods in large pedigrees

AU - Ullah, Ehsan

AU - Mall, Raghvendra

AU - Abbas, Mostafa M.

AU - Kunji, Khalid

AU - Nato, Alejandro Q.

AU - Bensmail, Halima

AU - Wijsman, Ellen M.

AU - Saad, Mohamad

PY - 2019/1/1

Y1 - 2019/1/1

AB - Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson’s correlation R2 for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R2 criterion; and (5) R2 is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.

UR - http://www.scopus.com/inward/record.url?scp=85059497856&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85059497856&partnerID=8YFLogxK

U2 - 10.1101/gr.236315.118

DO - 10.1101/gr.236315.118

M3 - Article

VL - 29

SP - 125

EP - 134

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 1

ER -