A Rare Variant Nonparametric Linkage Method for Nuclear and Extended Pedigrees with Application to Late-Onset Alzheimer Disease via WGS Data

To analyze family-based whole-genome sequence (WGS) data for complex traits, we developed a rare variant (RV) nonparametric linkage (NPL) analysis method, which has advantages over association methods. RV-NPL differs from traditional NPL in that RVs are analyzed and allele sharing among affected relative pairs is estimated only for minor alleles. Analyzing families can increase power because causal variants with familial aggregation usually have larger effect sizes than those underlying sporadic diseases. Unlike association analysis, NPL analyzes only affected individuals, which can increase power because unaffected family members can carry susceptibility variants. RV-NPL is robust to population substructure and admixture, to the inclusion of nonpathogenic variants, and to allelic and locus heterogeneity, and it can readily be applied outside of coding regions. In contrast to NPL analysis of common variants, where loci localize to large genomic regions (e.g., >50 Mb), regions mapped with RV-NPL are well defined. Using simulation studies, we demonstrate that RV-NPL is substantially more powerful than applying traditional NPL methods to analyze RVs. RV-NPL was applied to 107 late-onset Alzheimer disease (LOAD) pedigrees of Caribbean Hispanic and European ancestry with WGS data, and statistically significant linkage (LOD ≥ 3.8) was found with RVs in PSMF1 and PTPN21, which have been shown to be involved in LOAD etiology. Additionally, nominally significant linkage was observed with RVs in ABCA7, ACE, EPHA1, and SORL1, genes that were previously reported to be associated with LOAD. RV-NPL is an ideal method to elucidate the genetic etiology of complex familial diseases.
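
As a rough illustration of scoring sharing only for minor alleles and only among affected relatives, the toy sketch below sums, over all pairs of affected family members, the number of minor-allele copies each pair has in common at a single variant. This is not the RV-NPL statistic itself: it counts identical-by-state sharing from genotypes alone, whereas linkage analysis relies on sharing inferred from the pedigree structure. The function name, the genotype encoding (minor-allele count per individual), and the example family are all hypothetical.

```python
from itertools import combinations


def minor_allele_sharing(genotypes, affected):
    """Toy pairwise sharing score at one variant (assumption: IBS, not IBD).

    genotypes: dict mapping individual ID -> minor-allele count (0, 1, or 2)
    affected:  iterable of IDs of affected family members
    """
    # Only affected individuals with a genotype contribute to the score.
    ids = sorted(i for i in affected if i in genotypes)
    total = 0
    for a, b in combinations(ids, 2):
        # A pair shares at most the smaller of their two minor-allele counts.
        total += min(genotypes[a], genotypes[b])
    return total


# Hypothetical family: three affected carriers and one unaffected non-carrier.
family = {"p1": 1, "p2": 1, "p3": 0, "p4": 1}
affected_ids = ["p1", "p2", "p4"]
score = minor_allele_sharing(family, affected_ids)
```

In this example each of the three affected pairs shares one minor allele, so the score is 3, and the unaffected member p3 does not enter the calculation. Major-allele sharing contributes nothing, which is the restriction the abstract describes.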