HEFT

eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors

Chuan Gao, Nicole L. Tignor, Jacqueline Salit, Yael Strulovici-Barel, Neil R. Hackett, Ronald Crystal, Jason G. Mezey

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

Motivation: Identification of expression Quantitative Trait Loci (eQTL), the genetic loci that contribute to heritable variation in gene expression, can be obstructed by factors that produce variation in expression profiles if these factors are unmeasured or hidden from direct analysis.

Methods: We have developed a method for Hidden Expression Factor analysis (HEFT) that identifies individual and pleiotropic effects of eQTL in the presence of hidden factors. The HEFT model is a combined multivariate regression and factor analysis, where the complete likelihood of the model is used to derive a ridge estimator for simultaneous factor learning and detection of eQTL. HEFT requires no pre-estimation of hidden factor effects; it provides P-values and is extremely fast, requiring just a few hours to complete an eQTL analysis of thousands of expression variables when analyzing hundreds of thousands of single nucleotide polymorphisms on a standard 8 core 2.6 G desktop.

Results: By analyzing simulated data, we demonstrate that HEFT can correct for an unknown number of hidden factors and significantly outperforms all related hidden factor methods for eQTL analysis when there are eQTL with univariate and multivariate (pleiotropic) effects. To demonstrate a real-world application, we applied HEFT to identify eQTL affecting gene expression in the human lung for a study that included presumptive hidden factors. HEFT identified all of the cis-eQTL found by other hidden factor methods and 91 additional cis-eQTL. HEFT also identified a number of eQTLs with direct relevance to lung disease that could not be found without a hidden factor analysis, including cis-eQTL for GTF2H1 and MTRR, genes that have been independently associated with lung cancer.

Availability: Software is available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx.

Supplementary information: Supplementary data are available at Bioinformatics online.
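To make the motivation concrete, here is a minimal sketch of why an unmeasured factor matters for eQTL detection. It is not HEFT: the abstract describes a joint ridge estimator derived from the complete likelihood, whereas this sketch uses the simpler two-step baseline (estimate one factor as the first principal component of the expression matrix, then include it as a covariate). The simulation settings and all function names (`simulate`, `first_pc_scores`, `snp_t_stat`) are assumptions made for illustration only.

```python
import math
import random


def center(v):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def simulate(n=300, genes=30, beta=1.0, seed=7):
    """Toy data: one SNP affects gene 0; one hidden factor affects every gene."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]  # genotype 0/1/2
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]                    # hidden factor
    cols = []
    for g in range(genes):
        load = rng.uniform(2.0, 3.0)          # hypothetical factor loading
        effect = beta if g == 0 else 0.0      # only gene 0 has an eQTL
        cols.append([effect * xi + load * zi + rng.gauss(0.0, 1.0)
                     for xi, zi in zip(x, z)])
    return x, z, cols


def first_pc_scores(cols, iters=50):
    """Scores on the first principal component, by power iteration on Y'Y."""
    g = len(cols)
    cov = [[dot(cols[i], cols[j]) for j in range(g)] for i in range(g)]
    v = [1.0] * g
    for _ in range(iters):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        v = [a / norm for a in w]
    n = len(cols[0])
    return [sum(v[j] * cols[j][i] for j in range(g)) for i in range(n)]


def snp_t_stat(y, x, covariate=None):
    """t statistic of the SNP slope in OLS of y on x, optionally with one covariate.

    All inputs are assumed to be centered already.
    """
    n = len(y)
    if covariate is None:
        beta = dot(x, y) / dot(x, x)
        resid = [yi - beta * xi for yi, xi in zip(y, x)]
        sigma2 = dot(resid, resid) / (n - 2)
        return beta / math.sqrt(sigma2 / dot(x, x))
    f = covariate
    sxx, sff, sxf = dot(x, x), dot(f, f), dot(x, f)
    det = sxx * sff - sxf * sxf               # determinant of the 2x2 normal matrix
    beta = (sff * dot(x, y) - sxf * dot(f, y)) / det
    gamma = (sxx * dot(f, y) - sxf * dot(x, y)) / det
    resid = [yi - beta * xi - gamma * fi for yi, xi, fi in zip(y, x, f)]
    sigma2 = dot(resid, resid) / (n - 3)
    return beta / math.sqrt(sigma2 * sff / det)


x, z, cols = simulate()
x = center(x)
cols = [center(c) for c in cols]
factor = first_pc_scores(cols)                # stand-in for the unobserved z

t_naive = snp_t_stat(cols[0], x)
t_adjusted = snp_t_stat(cols[0], x, covariate=factor)
print(f"t without factor adjustment: {t_naive:.2f}")
print(f"t with estimated factor:     {t_adjusted:.2f}")
```

In this toy setting the hidden factor inflates the residual variance of gene 0, so the unadjusted test statistic is weaker than the adjusted one. A two-step approach like this requires the pre-estimation of factor effects that the abstract says HEFT avoids.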

Original language: English
Pages (from-to): 369-376
Number of pages: 8
Journal: Bioinformatics
Volume: 30
Issue number: 3
DOI: 10.1093/bioinformatics/btt690
Publication status: Published - 1 Feb 2014
Externally published: Yes

Fingerprint

Quantitative Trait Loci
Factor analysis
Factor Analysis
Statistical Factor Analysis
Genes
Gene
Gene expression
Software
Pulmonary diseases
Gene Expression
Genetic Loci
Bioinformatics
Nucleotides
Polymorphism
Regression analysis
Lung
Computational Biology
Lung Diseases
Single Nucleotide Polymorphism
Lung Neoplasms

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors. / Gao, Chuan; Tignor, Nicole L.; Salit, Jacqueline; Strulovici-Barel, Yael; Hackett, Neil R.; Crystal, Ronald; Mezey, Jason G.

In: Bioinformatics, Vol. 30, No. 3, 01.02.2014, p. 369-376.

Gao, Chuan; Tignor, Nicole L.; Salit, Jacqueline; Strulovici-Barel, Yael; Hackett, Neil R.; Crystal, Ronald; Mezey, Jason G. / HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors. In: Bioinformatics. 2014; Vol. 30, No. 3. pp. 369-376.
@article{e1be34127bec4c4caee38fef533d8988,
title = "HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors",
abstract = "Motivation: Identification of expression Quantitative Trait Loci (eQTL), the genetic loci that contribute to heritable variation in gene expression, can be obstructed by factors that produce variation in expression profiles if these factors are unmeasured or hidden from direct analysis. Methods: We have developed a method for Hidden Expression Factor analysis (HEFT) that identifies individual and pleiotropic effects of eQTL in the presence of hidden factors. The HEFT model is a combined multivariate regression and factor analysis, where the complete likelihood of the model is used to derive a ridge estimator for simultaneous factor learning and detection of eQTL. HEFT requires no pre-estimation of hidden factor effects; it provides P-values and is extremely fast, requiring just a few hours to complete an eQTL analysis of thousands of expression variables when analyzing hundreds of thousands of single nucleotide polymorphisms on a standard 8 core 2.6 G desktop. Results: By analyzing simulated data, we demonstrate that HEFT can correct for an unknown number of hidden factors and significantly outperforms all related hidden factor methods for eQTL analysis when there are eQTL with univariate and multivariate (pleiotropic) effects. To demonstrate a real-world application, we applied HEFT to identify eQTL affecting gene expression in the human lung for a study that included presumptive hidden factors. HEFT identified all of the cis-eQTL found by other hidden factor methods and 91 additional cis-eQTL. HEFT also identified a number of eQTLs with direct relevance to lung disease that could not be found without a hidden factor analysis, including cis-eQTL for GTF2H1 and MTRR, genes that have been independently associated with lung cancer. Availability: Software is available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Chuan Gao and Tignor, {Nicole L.} and Jacqueline Salit and Yael Strulovici-Barel and Hackett, {Neil R.} and Ronald Crystal and Mezey, {Jason G.}",
year = "2014",
month = "2",
day = "1",
doi = "10.1093/bioinformatics/btt690",
language = "English",
volume = "30",
pages = "369--376",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "3",

}

TY - JOUR

T1 - HEFT

T2 - eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors

AU - Gao, Chuan

AU - Tignor, Nicole L.

AU - Salit, Jacqueline

AU - Strulovici-Barel, Yael

AU - Hackett, Neil R.

AU - Crystal, Ronald

AU - Mezey, Jason G.

PY - 2014/2/1

Y1 - 2014/2/1

AB - Motivation: Identification of expression Quantitative Trait Loci (eQTL), the genetic loci that contribute to heritable variation in gene expression, can be obstructed by factors that produce variation in expression profiles if these factors are unmeasured or hidden from direct analysis. Methods: We have developed a method for Hidden Expression Factor analysis (HEFT) that identifies individual and pleiotropic effects of eQTL in the presence of hidden factors. The HEFT model is a combined multivariate regression and factor analysis, where the complete likelihood of the model is used to derive a ridge estimator for simultaneous factor learning and detection of eQTL. HEFT requires no pre-estimation of hidden factor effects; it provides P-values and is extremely fast, requiring just a few hours to complete an eQTL analysis of thousands of expression variables when analyzing hundreds of thousands of single nucleotide polymorphisms on a standard 8 core 2.6 G desktop. Results: By analyzing simulated data, we demonstrate that HEFT can correct for an unknown number of hidden factors and significantly outperforms all related hidden factor methods for eQTL analysis when there are eQTL with univariate and multivariate (pleiotropic) effects. To demonstrate a real-world application, we applied HEFT to identify eQTL affecting gene expression in the human lung for a study that included presumptive hidden factors. HEFT identified all of the cis-eQTL found by other hidden factor methods and 91 additional cis-eQTL. HEFT also identified a number of eQTLs with direct relevance to lung disease that could not be found without a hidden factor analysis, including cis-eQTL for GTF2H1 and MTRR, genes that have been independently associated with lung cancer. Availability: Software is available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx. Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=84893270241&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893270241&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btt690

DO - 10.1093/bioinformatics/btt690

M3 - Article

VL - 30

SP - 369

EP - 376

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 3

ER -