Accounting for nonlinear effects of gene expression identifies additional associated genes in transcriptome-wide association studies

Transcriptome-wide association studies (TWAS) integrate genome-wide association study (GWAS) data with gene expression (GE) data to identify (putative) causal genes for complex traits. There are two stages in TWAS: in Stage 1, a model is built to impute gene expression from genotypes, and in Stage 2, gene-trait association is tested using imputed gene expression. Despite many successes with TWAS, in the current practice, one only assumes a linear relationship between GE and the trait, which however may not hold, leading to loss of power. In this study, we extend the standard TWAS by considering a quadratic effect of GE, in addition to the usual linear effect. We train imputation models for both linear and quadratic gene expression levels in Stage 1, then include both the imputed linear and quadratic expression levels in Stage 2. We applied both the standard TWAS and our approach first to the ADNI gene expression data and the IGAP Alzheimer’s disease GWAS summary data, then to the GTEx (V8) gene expression data and the UK Biobank individual-level GWAS data for lipids, followed by validation with different GWAS data, suitable model checking and more robust TWAS methods. In all these applications, the new TWAS approach was able to identify additional genes associated with Alzheimer’s disease, LDL and HDL cholesterol levels, suggesting its likely power gains and thus the need to account for potentially nonlinear effects of gene expression on complex traits.