Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes

Mohamad Saad, Ellen M. Wijsman

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with the current availability of next-generation sequencing. Family-based designs are well suited for the discovery of rare variants, because large and carefully selected pedigrees are enriched for multiple copies of such variants. However, sequencing a large number of samples is still prohibitively expensive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which efficiently handles large pedigrees and accurately imputes rare variants. We used a burden test (famWS) and a kernel association test (famSKAT); both account for family relationships, and famSKAT also accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of the association tests for three data sets: the pseudosequence data, the subset of sequence data used for imputation, and sequence data on all subjects. Within the pseudosequence data, we also compared the power of the association tests using best-guess genotypes versus allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that using allelic dosages yields much higher power than using best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
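The abstract contrasts allelic dosages with best-guess genotypes, and a burden test with a kernel test, both adjusted for relatedness. The Python sketch below illustrates those ideas under stated assumptions; it is not GIGI, famWS, or famSKAT, and every function name in it (dosage_and_best_guess, solve, family_rv_tests) is hypothetical. It assumes per-genotype posterior probabilities from an imputation step, an intercept-only null mixed model with covariance sigma_g2 * 2 * kinship + sigma_e2 * I whose variance components are treated as known (in practice they would be estimated), SKAT's default Beta(1, 25) variant weights (assumed here for the burden test too), and allele frequencies estimated naively from all subjects. It returns a normal-approximation p-value for the burden statistic but only the kernel statistic Q itself, since Q's null distribution is a weighted mixture of 1-df chi-squares that needs a dedicated method such as Davies'.

```python
import math


def dosage_and_best_guess(p):
    """p = (P(0), P(1), P(2)): probabilities of 0, 1, 2 copies of the coded allele."""
    dosage = p[1] + 2.0 * p[2]                      # expected allele count
    best_guess = max(range(3), key=lambda g: p[g])  # single most probable genotype
    return dosage, best_guess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def family_rv_tests(probs, y, kinship, sigma_g2, sigma_e2, use_dosage=True):
    """Hypothetical burden and SKAT-style statistics under a known-variance mixed model.

    probs[i][j] holds the genotype probabilities of person i at variant j.
    Null model: y = mu * 1 + g + e, Cov(y) = sigma_g2 * 2 * kinship + sigma_e2 * I.
    """
    n, m = len(probs), len(probs[0])
    pick = 0 if use_dosage else 1  # 0 -> dosage, 1 -> best-guess genotype
    G = [[dosage_and_best_guess(probs[i][j])[pick] for j in range(m)] for i in range(n)]
    Sigma = [[sigma_g2 * 2.0 * kinship[i][k] + (sigma_e2 if i == k else 0.0)
              for k in range(n)] for i in range(n)]
    a = solve(Sigma, [1.0] * n)  # Sigma^{-1} 1

    def project(v):
        # P v = Sigma^{-1} (v - 1 * GLS mean of v); removes the intercept.
        s = solve(Sigma, v)
        return [si - ai * dot(a, v) / sum(a) for si, ai in zip(s, a)]

    cols = [[G[i][j] for i in range(n)] for j in range(m)]
    r = project(y)                                 # null-model residuals
    U = [dot(c, r) for c in cols]                  # per-variant score
    PG = [project(c) for c in cols]
    V = [[dot(cols[j], PG[k]) for k in range(m)] for j in range(m)]  # Var(U) under H0
    freq = [sum(c) / (2.0 * n) for c in cols]      # naive allele frequency
    w = [25.0 * (1.0 - q) ** 24 for q in freq]     # Beta(1, 25) density weights
    var_b = sum(w[j] * V[j][k] * w[k] for j in range(m) for k in range(m))
    z = dot(w, U) / math.sqrt(var_b)
    burden_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    Q = sum((w[j] * U[j]) ** 2 for j in range(m))  # kernel (SKAT-type) statistic
    return {"burden_z": z, "burden_p": burden_p, "skat_Q": Q}


if __name__ == "__main__":
    # Hypothetical nuclear family: father, mother, two children.
    kinship = [[0.5, 0.0, 0.25, 0.25],
               [0.0, 0.5, 0.25, 0.25],
               [0.25, 0.25, 0.5, 0.25],
               [0.25, 0.25, 0.25, 0.5]]
    # Illustrative imputed genotype probabilities: 4 people x 2 rare variants.
    probs = [[(0.1, 0.9, 0.0), (1.0, 0.0, 0.0)],
             [(1.0, 0.0, 0.0), (0.3, 0.7, 0.0)],
             [(0.2, 0.8, 0.0), (0.6, 0.4, 0.0)],
             [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    y = [1.2, 0.4, 1.0, -0.5]
    for flag in (True, False):
        print("dosage" if flag else "best-guess",
              family_rv_tests(probs, y, kinship, 0.5, 0.5, use_dosage=flag))
```

Setting use_dosage=False replaces each dosage with the single most probable genotype, which discards the imputation uncertainty; that is the contrast the abstract reports, although a toy example like this one says nothing about which choice is more powerful.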

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: Genetic Epidemiology
Volume: 38
Issue number: 1
DOI: 10.1002/gepi.21776
Publication status: Published - 1 Jan 2014
Externally published: Yes

Keywords

  • Burden test
  • Inheritance vectors
  • Kernel statistic
  • MCMC
  • Mixed linear model
  • Sequence data

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. / Saad, Mohamad; Wijsman, Ellen M.

In: Genetic Epidemiology, Vol. 38, No. 1, 01.01.2014, p. 1-9.

@article{806102e5d00d4163b73b365fa80ca98f,
title = "Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes",
abstract = "Recently, the {"}Common Disease-Multiple Rare Variants{"} hypothesis has received much attention, especially with the current availability of next-generation sequencing. Family-based designs are well suited for the discovery of rare variants, because large and carefully selected pedigrees are enriched for multiple copies of such variants. However, sequencing a large number of samples is still prohibitively expensive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which efficiently handles large pedigrees and accurately imputes rare variants. We used a burden test (famWS) and a kernel association test (famSKAT); both account for family relationships, and famSKAT also accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of the association tests for three data sets: the pseudosequence data, the subset of sequence data used for imputation, and sequence data on all subjects. Within the pseudosequence data, we also compared the power of the association tests using best-guess genotypes versus allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that using allelic dosages yields much higher power than using best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.",
keywords = "Burden test, Inheritance vectors, Kernel statistic, MCMC, Mixed linear model, Sequence data",
author = "Saad, Mohamad and Wijsman, {Ellen M.}",
year = "2014",
month = jan,
day = "1",
doi = "10.1002/gepi.21776",
language = "English",
volume = "38",
pages = "1--9",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "1",

}

TY  - JOUR
T1  - Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes
AU  - Saad, Mohamad
AU  - Wijsman, Ellen M.
PY  - 2014/1/1
Y1  - 2014/1/1
N2  - Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with the current availability of next-generation sequencing. Family-based designs are well suited for the discovery of rare variants, because large and carefully selected pedigrees are enriched for multiple copies of such variants. However, sequencing a large number of samples is still prohibitively expensive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which efficiently handles large pedigrees and accurately imputes rare variants. We used a burden test (famWS) and a kernel association test (famSKAT); both account for family relationships, and famSKAT also accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of the association tests for three data sets: the pseudosequence data, the subset of sequence data used for imputation, and sequence data on all subjects. Within the pseudosequence data, we also compared the power of the association tests using best-guess genotypes versus allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that using allelic dosages yields much higher power than using best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
AB  - Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with the current availability of next-generation sequencing. Family-based designs are well suited for the discovery of rare variants, because large and carefully selected pedigrees are enriched for multiple copies of such variants. However, sequencing a large number of samples is still prohibitively expensive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which efficiently handles large pedigrees and accurately imputes rare variants. We used a burden test (famWS) and a kernel association test (famSKAT); both account for family relationships, and famSKAT also accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of the association tests for three data sets: the pseudosequence data, the subset of sequence data used for imputation, and sequence data on all subjects. Within the pseudosequence data, we also compared the power of the association tests using best-guess genotypes versus allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that using allelic dosages yields much higher power than using best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
KW  - Burden test
KW  - Inheritance vectors
KW  - Kernel statistic
KW  - MCMC
KW  - Mixed linear model
KW  - Sequence data
UR  - http://www.scopus.com/inward/record.url?scp=84890160000&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84890160000&partnerID=8YFLogxK
U2  - 10.1002/gepi.21776
DO  - 10.1002/gepi.21776
M3  - Article
VL  - 38
SP  - 1
EP  - 9
JO  - Genetic Epidemiology
JF  - Genetic Epidemiology
SN  - 0741-0395
IS  - 1
ER  - 