Quick Guide to the Alzheimer’s Disease Sequencing Project

What is the ADSP?

  • Study Design
  • ADSP Website
  • Functional Genomics Consortium
    (Under Construction)
  • Machine Learning / Artificial Intelligence
    (Under Construction)
  • Phenotype Harmonization Consortium
    (Under Construction)

Where is the dataset hosted?

What data are available and what’s the expected timeline for future releases?

Current timelines for ADSP data production and release

Release 2 WES:
20,503 whole exomes from 28 cohorts.

  • Population breakdown: 4,349 African Ancestry, 13,904 Non-Hispanic White (NHW), 2,235 Hispanic, 15 Unknown/Other
  • February 2020: Raw genomes (CRAMs/gVCFs), Basic Phenotypes
  • September 2020: quality-controlled project level genotype VCF for bi-allelic autosomal variants
  • February 2021: quality-controlled project level genotype VCF for bi-allelic chrX variants
  • October 2021: quality-controlled project level genotype VCF for bi-allelic chrX PAR variants
  • Planned winter 2021: project level genotype VCF for multi-allelic variants

Release 3 WGS:
16,905 genomes from 24 cohorts.

  • Population breakdown: 3,018 African Ancestry, 10,517 Non-Hispanic White (NHW), 3,296 Hispanic, 74 Unknown/Other
  • February 2021: Raw genomes (CRAMs/gVCFs), Basic Phenotypes, Preview project level VCF
  • October 2021: quality-controlled project level VCF for bi-allelic autosomal variants; individual level structural variant calls
  • Planned December 2021: quality-controlled project level VCF for bi-allelic chrX and chrX PAR variants
  • Planned winter 2021: project level VCF for multi-allelic autosomal and chrX variants with full quality control

Release 4 WGS:
34,814 genomes from 42 cohorts planned.

  • Population breakdown: 4,740 African Ancestry, 16,158 Non-Hispanic White (NHW), 11,058 Hispanic, 2,700 Indian (South Asian), 158 Unknown/Other
  • Planned Spring 2022: Raw genomes (CRAMs/gVCFs), Basic Phenotypes, Preview project level genotype VCF
  • Planned winter 2022: project level VCF with full quality control, individual level structural variant calls

Sequence Data Releases

* A subset of these participants will have additional harmonized endophenotypes released in phases by the Phenotype Harmonization Consortium.

Sequence Data Availability by Cohort

How to access these data?

Apply for data access

Additional Instructions

  • How to use the Data Access Request Management (DARM) system for data requests: DARM PI User Guide
  • Files smaller than 5 GB can be downloaded directly through the web portal.
  • Files larger than 5 GB must be downloaded directly from Amazon.
  • How to set up an Amazon account and download the data: Amazon Instructions

Is there a cost associated with downloading data?

NIAGADS covers the cost for investigators to download most of the ADSP data, including joint genotype-called project level VCFs, phenotypes, and associated metadata.

CRAMs, gVCFs, and SV VCFs can be downloaded using the Amazon Requester Pays option, which means that the requesting institution will incur the cost of downloading the data.

Options for AWS download using the Requester Pays option:

  1. AWS resource: you are not charged for transfers from our S3 bucket in US-East (N. Virginia) to another AWS resource in the same region.
  2. Local download: an affordable transfer option is an Amazon Snowball. DSS would send the data to your S3 bucket, then you can create an AWS Snowball export. The device costs $250 to transfer 80TB of data (plus additional fees).
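
As a rough illustration of a Requester Pays download, the sketch below uses the third-party boto3 SDK. It assumes boto3 is installed and AWS credentials are already configured. The bucket and key shown are hypothetical placeholders, not real ADSP locations; the actual values come with your approved request.

```python
from pathlib import Path

# Hypothetical placeholders: substitute the bucket and key from your approved request.
BUCKET = "example-adsp-bucket"
KEY = "path/to/sample.cram"


def requester_pays_args() -> dict:
    """Extra arguments telling S3 that the caller accepts the transfer charges."""
    return {"RequestPayer": "requester"}


def download_requester_pays(bucket: str, key: str, dest_dir: str = ".") -> Path:
    """Download one object from a Requester Pays bucket into dest_dir."""
    # boto3 is imported here so the helper above can be used without it installed.
    import boto3

    dest = Path(dest_dir) / Path(key).name
    # us-east-1 is the API name for US-East (N. Virginia).
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.download_file(bucket, key, str(dest), ExtraArgs=requester_pays_args())
    return dest
```

Calling download_requester_pays(BUCKET, KEY) from a machine running in US-East (N. Virginia) would correspond to option 1 above; run anywhere else, the same call is billed to the requester's AWS account.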

Can I run programs on S3 without downloading the data?

Certain tools can use S3 URLs to read data without downloading the file. Our CRAM files can be read by S3-aware alignment readers such as samtools. This allows users to read either a portion or all of the data in a file without saving the entire file to a local drive. Although file access may be slower, there are cost savings.
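
Partial reads of this kind generally rely on byte-range requests. The sketch below is not samtools itself. It illustrates the same idea with the third-party boto3 SDK, assuming boto3 is installed, credentials are configured, and the bucket uses Requester Pays. The function names, and any bucket or key passed to them, are hypothetical.

```python
def byte_range_header(start: int, length: int) -> str:
    """Build an HTTP Range header value covering `length` bytes from offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={start}-{start + length - 1}"


def read_slice(bucket: str, key: str, start: int, length: int) -> bytes:
    """Fetch only the requested bytes of an S3 object, without saving the file."""
    # boto3 is imported here so byte_range_header works without it installed.
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    response = s3.get_object(
        Bucket=bucket,
        Key=key,
        Range=byte_range_header(start, length),
        RequestPayer="requester",  # the caller accepts the transfer charges
    )
    return response["Body"].read()
```

For example, read_slice(bucket, key, 0, 1024) would transfer only the first 1,024 bytes of the object, so the requester pays for that slice rather than for the whole file.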

What are the limitations in using the data?

You must have an approved Data Access Request through NIAGADS DSS. The data may only be used in accordance with your approved research use statement and local IRB approval.

What are the limitations in exchanging data and information from ADSP with other investigators?

Internal Investigators

Investigators can share data with other investigators within their institution if (1) all research being conducted is in accordance with the submitting investigators’ approved research use and (2) all investigators accessing the data have read and signed the University of Pennsylvania Data Transfer Agreement. The primary investigator will be responsible for the conduct of anyone accessing the data under their approved DAR.

External Investigators

Investigators can share data with collaborators outside of the PI’s institution. The collaborators must submit parallel project requests with (1) the same project title and (2) the same Research Use Statement and Cloud Use Statement, if applicable, that references the collaboration (for smaller collaborations, the name and institution of the collaborating PI(s) or for larger efforts, the consortium name). Note: for-profit and non-profit institutions may not be able to share all data across sites as some ADSP data cannot be used by for-profit institutions.

What type of phenotype data do you collect and where can I find additional phenotypes?

ADSP Minimal Dataset

Accompanying the sequencing data is a basic set of phenotypes collected from each of the submitting cohorts and harmonized into the ADSP format. There are currently four data dictionaries for the samples included in the ADSP:

  1. Family Based Phenotypes
  2. Case/Control Phenotypes
  3. ADNI Phenotypes
  4. PSP/CBD Phenotypes

Phenotype Harmonization Consortium

Funded in mid-2021, the consortium will harmonize the following endophenotype domains: Cognition, Fluid Biomarkers (CSF), Amyloid PET, Structural MRI (T1), Diffusion MRI, Vascular Risk Factors, and Autopsy (neuropathology). The group plans to conduct yearly releases with the first release occurring in spring of 2022. The exact timeline is contingent on the availability of raw data and the data being of sufficient quality to allow for harmonization.

NACC

Additional phenotypes for the participants recruited from the Alzheimer’s Disease Research Centers (ADRCs) sequenced by ADSP can be requested directly from the National Alzheimer’s Coordinating Center (NACC).

ADNI

Additional phenotypes for the participants recruited by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) can be requested directly from LONI.

Individual cohorts

Contact the cohort PIs for more information about the phenotypes collected by their cohorts.

RUSH

Additional phenotypes for the participants recruited from the ROS, MAP, or MARS studies can be requested directly from the Rush Alzheimer’s Disease Center (RADC).

Can I request a biospecimen for a sequenced sample on which I want to run additional assays?

Yes, if the sample is stored at the National Centralized Repository for Alzheimer’s Disease and Related Dementias (NCRAD), you can find details and apply for access by visiting this page of the NCRAD website.

How do I cite the ADSP in my publications or presentations?

To cite NIAGADS:

“Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.”

Please cite/reference the use of ADSP data distributed through NIAGADS by including the accession NG00067.

To cite the ADSP Umbrella dataset (NG00067):

The acknowledgment statements for the full ADSP Umbrella dataset can be found in two places:

  • Within your approved Data Access Request PDF as part of the Data Use Certification agreement
  • On the dataset webpage, ng00067, navigate to the “Acknowledgment” tab

Each study included within the ADSP Umbrella dataset provides an acknowledgment statement. If you are using a subset of samples from the full dataset, locate the study of the samples used within your analysis in the “Study_DSS” column in the supplied “Sample_Manifest_DS__.txt” file. Use the study accession numbers to narrow down the acknowledgment list for your specific project needs.
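
One way to narrow the list is sketched below. It assumes the manifest is tab-delimited and that samples are identified by a column named "SampleID"; both are assumptions, and only the "Study_DSS" column name comes from the description above. The demo rows are made up for illustration.

```python
import csv
import io


def studies_for_samples(manifest_text, sample_ids, id_column="SampleID"):
    """Return the sorted, unique Study_DSS values for the given sample IDs."""
    wanted = set(sample_ids)
    # Assumption: the manifest is tab-delimited with a header row.
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return sorted({row["Study_DSS"] for row in reader if row[id_column] in wanted})


# Made-up rows for illustration only; these are not real ADSP samples or accessions.
demo_manifest = (
    "SampleID\tStudy_DSS\n"
    "S1\tstudy-A\n"
    "S2\tstudy-B\n"
    "S3\tstudy-A\n"
)

print(studies_for_samples(demo_manifest, ["S1", "S3"]))  # ['study-A']
```

With a real manifest, read the file's text first and pass it in along with the sample IDs used in your analysis. The returned values can then be matched against the acknowledgment list.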

How do I contact NIAGADS?

For questions about the ADSP dataset or how to access it, contact NIAGADS at niagads@pennmedicine.upenn.edu. NIAGADS can forward questions to the appropriate parties.