NIMH Data Archive - Data

Genomics

Neuroimaging

Phenotype¹

New Trial
Clinical Trial

¹ Numbers reported are subjects by age

New Project
Grant/Project Number

Enter the grant number as Activity Code, Institute Abbreviation, and Serial Number, in that order. Exclude the Grant Type, Support Year, and Suffix. For example, grant 1R01MH123456-01A1 should be entered as R01MH123456.
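
As a hedged illustration of the format rule above, the following Python sketch reduces a full grant number to the form this field expects. The regular expression, the function name normalize_grant_number, and the assumption that the serial number is always six digits are illustrative choices, not part of NDA's documentation.

```python
import re

# Illustrative pattern for a full NIH grant number such as 1R01MH123456-01A1.
# The group names and the six-digit serial length are assumptions made for this sketch.
GRANT_RE = re.compile(
    r"""^
    (?P<type>\d)?                        # optional Grant Type, e.g. 1
    (?P<activity>[A-Z][A-Z0-9]{2})       # Activity Code, e.g. R01
    (?P<institute>[A-Z]{2})              # Institute Abbreviation, e.g. MH
    (?P<serial>\d{6})                    # Serial Number, e.g. 123456
    (?:-(?P<year>\d{2})(?P<suffix>[A-Z0-9]*))?   # optional Support Year and Suffix
    $""",
    re.VERBOSE | re.IGNORECASE,
)


def normalize_grant_number(raw: str) -> str:
    """Return Activity Code + Institute Abbreviation + Serial Number only."""
    match = GRANT_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"Unrecognized grant number: {raw!r}")
    # Grant Type, Support Year, and Suffix are deliberately dropped.
    return (match["activity"] + match["institute"] + match["serial"]).upper()


if __name__ == "__main__":
    print(normalize_grant_number("1R01MH123456-01A1"))
```

A number that is already in the short form (for example R01MH123456) passes through unchanged, so the function can be applied to any input before it is pasted into the field.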

Collection - Use Existing Experiment

To associate an experiment with the current collection, select an experiment from the table below, then click the associate experiment button to persist your changes (saving the collection is not required). Note that once an experiment has been associated with two or more collections, the experiment will no longer be editable.

The table search feature is case insensitive and targets the experiment id, experiment name and experiment type columns. The experiment id is searched only when the search term entered is a number, and filtered using a startsWith comparison. When the search term is not numeric the experiment name is used to filter the results.

Select	Experiment Id	Experiment Name	Experiment Type	Created On

24	HI-NGS_R1	Omics	02/16/2011
475	MB1-10 (CHOP)	Omics	06/07/2016
490	Discovery and CRISPR validation of genetic factors associated with antipsychotic-induced weight gain and cardiometabolic risk	Omics	07/07/2016
501	PharmacoBOLD Resting State	fMRI	07/27/2016
506	PVPREF	Omics	08/05/2016
509	ABC-CT Resting v2	EEG	08/18/2016
13	Comparison of FI expression in Autistic and Neurotypical Homo Sapiens	Omics	12/28/2010
18	AGRE/Broad Affymetrix 5.0 Genotype Experiment	Omics	01/06/2011
22	Stitching PCR Sequencing	Omics	02/14/2011
26	ASD_Methylation	Omics	03/01/2011
29	Microarray family 03 (father, mother, sibling)	Omics	03/24/2011
37	Standard paired-end sequencing of BCRs	Omics	04/19/2011
38	Illumina Mate-Pair BCR sequencing	Omics	04/19/2011
39	Custom Jumping Libraries	Omics	04/19/2011
40	Custom CapBP	Omics	04/19/2011
41	Immunofluorescence	Omics	05/11/2011
43	Autism brain sample genotyping, Illumina	Omics	05/16/2011
47	ARRA Autism Sequencing Collaboration at Baylor. SOLiD 4 System	Omics	08/01/2011
53	AGRE Omni1-quad	Omics	10/11/2011
59	AGP genotyping	Omics	04/03/2012
60	Ultradeep 454 sequencing of synaptic genes from postmortem cerebella of individuals with ASD and neurotypical controls	Omics	06/23/2012
63	Microemulsion PCR and Targeted Resequencing for Variant Detection in ASD	Omics	07/20/2012
76	Whole Genome Sequencing in Autism Families	Omics	01/03/2013
519	Resting	fMRI	11/08/2016
90	Genotyped IAN Samples	Omics	07/09/2013
91	NJLAGS Axiom Genotyping Array	Omics	07/16/2013
93	AGP genotyping (CNV)	Omics	09/06/2013
106	Longitudinal Sleep Study. H20 200. Channel set 2	EEG	11/07/2013
107	Longitudinal Sleep Study. H20 200. Channel set 3	EEG	11/07/2013
108	Longitudinal Sleep Study. AURA 200	EEG	11/07/2013
105	Longitudinal Sleep Study. H20 200. Channel set 1	EEG	11/07/2013
109	Longitudinal Sleep Study. AURA 400	EEG	11/07/2013
116	Gene Expression Analysis WG-6	Omics	01/07/2014
131	Jeste Lab UCLA ACEii: Charlie Brown and Sesame Street - Project 1	Eye Tracking	02/27/2014
132	Jeste Lab UCLA ACEii: Animacy - Project 1	Eye Tracking	02/27/2014
133	Jeste Lab UCLA ACEii: Mom Stranger - Project 2	Eye Tracking	02/27/2014
134	Jeste Lab UCLA ACEii: Face Emotion - Project 3	Eye Tracking	02/27/2014
145	AGRE/FMR1_Illumina.JHU	Omics	04/14/2014
146	AGRE/MECP2_Sanger.JHU	Omics	04/14/2014
147	AGRE/MECP2_Junior.JHU	Omics	04/14/2014
151	Candidate Gene Identification in familial Autism	Omics	06/09/2014
152	NJLAGS Whole Genome Sequencing	Omics	07/01/2014
154	Math Autism Study - Vinod Menon	fMRI	07/15/2014
155	Resting	fMRI	07/25/2014
156	Speech	fMRI	07/25/2014
159	Emotion	fMRI	07/25/2014
160	syllable contrast	EEG	07/29/2014
167	School-age naturalistic stimuli	Eye Tracking	09/19/2014
44	AGRE/Broad Affymetrix 5.0 Genotype Experiment	Omics	06/27/2011
45	Exome Sequencing of 20 Sporadic Cases of Autism Spectrum Disorder	Omics	07/15/2011
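
The table search rules described above the table can be sketched in Python as follows. This is a minimal illustration, not NDA's implementation: the Experiment class and search_experiments function are hypothetical names, and matching non-numeric terms as a case-insensitive substring of the experiment name is an assumption, since the page does not say whether the name filter is a prefix or substring match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    experiment_id: int
    name: str
    experiment_type: str


def search_experiments(experiments, term):
    """Filter experiments following the search rules described on this page."""
    term = term.strip()
    if not term:
        return list(experiments)
    if term.isdecimal():
        # Numeric term: startsWith comparison against the experiment id.
        return [e for e in experiments if str(e.experiment_id).startswith(term)]
    # Non-numeric term: case-insensitive match on the experiment name
    # (substring matching is an assumption of this sketch).
    needle = term.casefold()
    return [e for e in experiments if needle in e.name.casefold()]


# A few rows copied from the table above.
SAMPLE = [
    Experiment(24, "HI-NGS_R1", "Omics"),
    Experiment(501, "PharmacoBOLD Resting State", "fMRI"),
    Experiment(506, "PVPREF", "Omics"),
    Experiment(509, "ABC-CT Resting v2", "EEG"),
    Experiment(519, "Resting", "fMRI"),
    Experiment(155, "Resting", "fMRI"),
]

if __name__ == "__main__":
    for hit in search_experiments(SAMPLE, "50"):
        print(hit.experiment_id, hit.name)
```

With these sample rows, the term 50 selects experiments 501, 506, and 509 by id prefix, while the term resting selects the four experiments whose names contain that word.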

Collection - Add Experiment

Add Supporting Documentation

Funding Source:
URL:

To add an existing Data Structure, enter its title in the search bar. If you need to request changes, select the indicator "No, it requires changes to meet research needs" after selecting the Structure, and upload the file with the requested changes specific to the selected Data Structure. Your file should follow the Request Changes Procedure. If the Data Structure does not exist, select "Request New Data Structure" and upload the appropriate zip file.

Use/Modify Existing Data Structure

Request New Data Structure

Targeted Enrollment:

Initial Submission Date:

Initial Share Date:

Data Structure Search:

Data Structures:

Submit

Request Submission Exemption

Not Eligible

The Data Expected list for this Collection shows some raw data as missing. Contact the NDA Help Desk with any questions.

Please confirm that you will not be enrolling any more subjects and that all raw data has been collected and submitted.

Collection Updated

Your Collection is now in Data Analysis phase and exempt from biannual submissions. Analyzed data is still expected prior to publication or no later than the project end date.

Error

Unable to change collection phase where targeted enrollment is less than 90%

You have requested to move the sharing dates for the following assessments:

Data Expected Item	Original Sharing Date	New Sharing Date

Please provide a reason for this change, which will be sent to the Program Officers listed within this collection:

Explanation must be between 20 and 200 characters in length.

Please press Save or Cancel

SSC total recall project #2042

General
Experiments (0)
Shared Data
Publications (0)
Data Expected (1)
Associated Studies (15)

Collection Title	Collection Investigators	Collection Description
Collection Title:	SSC total recall project
Collection Investigators:	Eichler, Evan
Collection Description:	This collection consists of sequencing and variation data resulting from the reanalysis of Whole Exome Sequences from 9047 individual subjects belonging to the Simons Simplex Collection (SSC). Original data were contributed by a collaboration between NDAR Collections 1878 (Eichler Lab, University of Washington), 1936 (Wigler Lab, Cold Spring Harbor Laboratories), and 1985 (State Lab, UCSF). Reanalysis of this data was done by members of the Eichler Lab, sequences were realigned to a common reference genome (human_g1k_v37) and analyzed for possible genomic variants (SNVs, InDels, and CNVs). Details on the analysis/methods can be found in the following individual NDAR Studies: 1)realigned BAM files - NDAR Study 334 (http://ndar.nih.gov/study.html?id=334); 2)unfiltered SNV/InDel variant calls made using GATK with and without annotations - NDAR Study 348 (http://ndar.nih.gov/study.html?id=348); 3)unfiltered SNV/InDel variant calls made using FreeBayes with and without annotations - NDAR Study 349 (http://ndar.nih.gov/study.html?id=349); 4)CNV variant calls made using XHMM and CoNIFER - NDAR Study 361 (http://ndar.nih.gov/study.html?id=361).
Data Repository:	NIMH Data Archive
Permission Group:
Collection Creation Date:	08/22/2013
Collection Phase:	Funding Completed
Collection Sub-Phase:	Close Out
Blinded Clinical Trial:	No
Subjects Shared:	9,047
Collection DOI:	10.15154/wtpt-qn32

{"values":[["Next Generation Sequencing: sequencing",10363]]}

{"values":[]}

{"values":[["Autism Spectrum Mildly Affected",126],["Autism Spectrum Severely Affected",3793],["Not Defined",1],["Parental Control",7613],["Sibling Control",4186],["Neurological Control",17],["Typical Control",8],["Autism Spectrum Affected",176]]}


Funding Sources:

Funding Source Name	Funding Source URL
NIH - Contract	None

Supporting Documentation:

Grant Information:

Clinical Trials:


Collection - General Tab

Fields available for edit on the top portion of the page include:

Collection Title
Investigators
Collection Description
Collection Phase
Funding Source
Clinical Trials

Collection Phase: The current status of a research project submitting data to an NDA Collection, based on the timing of the award and/or the data that have been submitted.

Pre-Enrollment: The default entry made when the NDA Collection is created.
Enrolling: Data have been submitted to the NDA Collection or the NDA Data Expected initial submission date has been reached for at least one data structure category in the NDA Collection.
Data Analysis: Subject level data collection for the research project is completed and has been submitted to the NDA Collection. The NDA Collection owner or the NDA Help Desk may set this phase when they’ve confirmed data submission is complete and submitted subject counts match at least 90% of the target enrollment numbers in the NDA Data Expected. Data submission reminders will be turned off for the NDA Collection.
Funding Completed: The NIH grant award (or awards) associated with the NDA Collection has reached its end date. NDA Collections in Funding Completed phase are assigned a subphase to indicate the status of data submission.
- The Data Expected Subphase indicates that NDA expects more data will be submitted
- The Closeout Subphase indicates the data submission is complete.
- The Sharing Not Met Subphase indicates that data submission was not completed as expected.
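
The 90% condition for the Data Analysis phase described above can be expressed as a small check. This is a sketch under stated assumptions: the function name meets_enrollment_threshold is hypothetical, and comparing a single submitted count against a single targeted enrollment figure is a simplification, since the page does not say how NDA aggregates counts across Data Expected items.

```python
def meets_enrollment_threshold(subjects_submitted: int,
                               targeted_enrollment: int,
                               threshold_percent: int = 90) -> bool:
    """Return True when submitted subjects reach the threshold share of the target."""
    if targeted_enrollment <= 0:
        raise ValueError("targeted_enrollment must be positive")
    if subjects_submitted < 0:
        raise ValueError("subjects_submitted must not be negative")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return subjects_submitted * 100 >= threshold_percent * targeted_enrollment


if __name__ == "__main__":
    # Figures taken from this Collection's Data Expected table.
    print(meets_enrollment_threshold(9047, 10060))
```

With a targeted enrollment of 10,060 and 9,047 subjects submitted (the figures shown in this Collection's Data Expected table), the check returns False, because 9,047 is about 89.9% of 10,060.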

Blinded Clinical Trial Status:

This status is set by a Collection Owner and indicates the research project is a double blinded clinical trial. When selected, the public view of Data Expected will show the Data Expected items and the Submission Dates, but the targeted enrollment and subjects submitted counts will not be displayed.
Targeted enrollment and subjects submitted counts are visible only to NDA Administrators and to the NDA Collection Owner.
When an NDA Collection that is flagged Blinded Clinical Trial reaches the maximum data sharing date for that Data Repository (see https://nda.nih.gov/nda/sharing-regimen.html), the embargo on Data Expected information is released.

Funding Source

The organization(s) responsible for providing the funding is listed here.

Supporting Documentation

Users with Submission privileges, as well as Collection Owners, Program Officers, and those with Administrator privileges, may upload and attach supporting documentation. By default, supporting documentation is shared with the general public; however, the option is also available to limit this information to qualified researchers only.

Grant Information

Identifying details are displayed about the Project from which the Collection was derived. You may click the Project Number to view a full report of the Project captured by the NIH.

Clinical Trials

Any data that is collected to support or further the research of clinical studies will be available here. Collection Owners and those with Administrator privileges may add new clinical trials.

Frequently Asked Questions

How does the NIMH Data Archive (NDA) determine which Permission Group data are submitted into?

During Collection creation, NDA staff determine the appropriate Permission Group based on the type of data to be submitted, the type of access that will be available to data access users, and the information provided by the Program Officer during grant award.
How do I know when an NDA Collection has been created?

When a Collection is created by NDA staff, an email notification will automatically be sent to the PI(s) of the grant(s) associated with the Collection to notify them.
Is a single grant number ever associated with more than one Collection?

The NDA system does not allow a single grant to be associated with more than one Collection; therefore, a single grant will not be listed in the Grant Information section of more than one Collection.
Why is there sometimes more than one grant included in a Collection?

In general, each Collection is associated with only one grant; however, multiple grants may be associated if the grant has multiple competing segments for the same grant number or if multiple different grants are all working on the same project and it makes sense to hold the data in one Collection (e.g., Cooperative Agreements).

Glossary

Administrator Privilege

A privilege provided to a user associated with an NDA Collection or NDA Study whereby that user can perform a full range of actions including providing privileges to other users.
Collection Owner

Generally, the Collection Owner is the contact PI listed on a grant. Only one NDA user is listed as the Collection owner. Most automated emails are primarily sent to the Collection Owner.
Collection Phase
The Collection Phase provides information on data submission as opposed to grant/project completion, so while the Collection Phase and the grant/project phase may be closely related, they are often different. Collection users with Administrative Privileges are encouraged to edit the Collection Phase. The Program Officer as listed in eRA (for NIH funded grants) may also edit this field. Changes must be saved by clicking the Save button at the bottom of the page. This field is sortable alphabetically in ascending or descending order. Collection Phase options include:
- Pre-Enrollment: A grant/project has started, but has not yet enrolled subjects.
- Enrolling: A grant/project has begun enrolling subjects. Data submission is likely ongoing at this point.
- Data Analysis: A grant/project has completed enrolling subjects and has completed all data submissions.
- Funding Completed: A grant/project has reached the project end date.
Collection Title

An editable field with the title of the Collection, which is often the title of the grant associated with the Collection.
Grant

Provides the grant number(s) for the grant(s) associated with the Collection. The field is a hyperlink so clicking on the Grant number will direct the user to the grant information in the NIH Research Portfolio Online Reporting Tools (RePORT) page.
Supporting Documentation

Various documents and materials that enable efficient use of the data by investigators unfamiliar with the project. These may include the research protocol, questionnaires, and study manuals.
NIH Research Initiative

NDA Collections may be organized by scientific similarity into NIH Research Initiatives, to facilitate query tool user experience. NIH Research Initiatives map to one or multiple Funding Opportunity Announcements.
Permission Group

Access to shared record-level data in NDA is provisioned at the level of a Permission Group. NDA Permission Groups consist of one or multiple NDA Collections that contain data with the same subject consents.
Planned Enrollment

Number of human subject participants to be enrolled in an NIH-funded clinical research study. The data is provided in competing applications and annual progress reports.
Actual Enrollment

Number of human subjects enrolled in an NIH-funded clinical research study. The data is provided in annual progress reports.
NDA Collection

A virtual container and organization structure for data and associated documentation from one grant or one large project/consortium. It contains tools for tracking data submission and allows investigators to define a wide array of other elements that provide context for the data, including all general information regarding the data and source project, experimental parameters used to collect any event-based data contained in the Collection, methods, and other supporting documentation. They also allow investigators to link underlying data to an NDA Study, defining populations and subpopulations specific to research aims.
Data Use Limitations

Data Use Limitations (DULs) describe the appropriate secondary use of a dataset and are based on the original informed consent of a research participant. NDA only accepts consent-based data use limitations defined by the NIH Office of Science Policy.
Total Subjects Shared

The total number of unique subjects for whom data have been shared and are available for users with permission to access data.

Contact NDA Help Desk

ID	Name	Created Date	Status	Type
No records found.


Collection - Experiments

The number of Experiments included is displayed in parentheses next to the tab name. You may download all experiments associated with the Collection via the Download button. You may view individual experiments by clicking the Experiment Name and add them to the Filter Cart via the Add to Cart button.

Collection Owners, Program Officers, and users with Submission or Administrative Privileges for the Collection may create or edit an Experiment.

Please note: The creation of an NDA Experiment does not necessarily mean that data collected according to the defined Experiment have been submitted or shared.

Frequently Asked Questions

Can an Experiment be associated with more than one Collection?
Yes. See the “Copy” button in the bottom left when viewing an experiment. There are two actions that can be performed via this button:
1. Copy the experiment with intent for modifications.
2. Associate the experiment to the collection. No modifications can be made to the experiment.

Glossary

Experiment Status

An Experiment must be Approved before data using the associated Experiment_ID may be uploaded.
Experiment ID

The ID number automatically generated by NDA which must be included in the appropriate file when uploading data to link the Experiment Definition to the subject record.


Shared Data:

Title	Type	Number of Subjects
Genomics Sample	Genomics	9047
Genomics Subject	Genomics	9047


Collection - Shared Data

This tab provides a quick overview of the Data Structure title, Data Type, and Number of Subjects that are currently Shared for the Collection. The information presented in this tab is automatically generated by NDA and cannot be edited. If no information is visible on this tab, this would indicate the Collection does not have shared data or the data is private.

The shared data is available to other researchers who have permission to access data in the Collection's designated Permission Group(s). Use the Download button to get all shared data from the Collection to the Filter Cart.

Frequently Asked Questions

How will I know if another researcher uses data that I shared through the NIMH Data Archive (NDA)?

To see which data submitted by your project are being used by a study, go to the Associated Studies tab of your Collection. Alternatively, you may review an NDA Study Attribution Report available on the General tab.
Can I get a supplement to share data from a completed research project?

Often it becomes more difficult to organize and format data electronically after the project has been completed and the information needed to create a GUID may not be available; however, you may still contact a program staff member at the appropriate funding institution for more information.
Can I get a supplement to share data from a research project that is still ongoing?

Unlike completed projects where researchers may not have the information needed to create a GUID and/or where the effort needed to organize and format data becomes prohibitive, ongoing projects have more of an opportunity to overcome these challenges. Please contact a program staff member at the appropriate funding institution for more information.

Glossary

Data Structure

A defined organization and group of Data Elements to represent an electronic definition of a measure, assessment, questionnaire, or collection of data points. Data structures that have been defined in the NDA Data Dictionary are available at https://nda.nih.gov/general-query.html?q=query=data-structure
Data Type

A grouping of data by similar characteristics such as Clinical Assessments, Omics, or Neurosignal data.
Shared

The term 'Shared' generally means available to others; however, there are some slightly different meanings based on what is Shared. A Shared NDA Study is viewable and searchable publicly regardless of the user's role or whether the user has an NDA account. A Shared NDA Study does not necessarily mean that data used in the NDA Study have been shared, as this is independently determined. Data are shared according to the schedule defined in a Collection's Data Expected Tab and/or in accordance with data sharing expectations in the NDA Data Sharing Terms and Conditions. Additionally, Supporting Documentation uploaded to a Collection may be shared independent of whether data are shared.


Collection Owners and those with Collection Administrator permission may edit a collection. The following is currently available for editing on this page:

Publications

Publications relevant to NDA data are listed below. Most displayed publications have been associated with the grant within PubMed. Use the "+ New Publication" button to add new publications. Publications are categorized as relevant or not relevant to the data expected. Relevant publications are then linked to the underlying data by selecting the Create Study link. A Study provides the ability to define cohorts, assign subjects, and define outcome measures, and it lists the study type, data analysis, and results. Analyzed data and results are expected to be shared in this way.

PubMed ID	Study	Title	Journal	Authors	Date	Status
No records found.


Collection - Publications

The number of Publications is displayed in parentheses next to the tab name. Clicking on any of the Publication Titles will open the Publication in a new internet browsing tab.

Collection Owners, Program Officers, and users with Submission or Administrative Privileges for the Collection may mark a publication as either Relevant or Not Relevant in the Status column.

Frequently Asked Questions

How can I determine if a publication is relevant?

Publications are considered relevant to a collection when the data shared is directly related to the project or collection.
Where does the NDA get the publications?

Publications come from PubMed, an online library containing journals, articles, and medical research, sponsored by the NIH and the National Library of Medicine (NLM).

Glossary

Create Study

A link to the Create an NDA Study page that can be clicked to start creating an NDA Study with information such as the title, journal and authors automatically populated.
Not Determined Publication

Indicates that the publication has not yet been reviewed and/or marked as Relevant or Not Relevant so it has not been determined whether an NDA Study is expected.
Not Relevant Publication

A publication that is not based on data related to the aims of the grant/project associated with the Collection or not based on any data such as a review article and, therefore, an NDA Study is not expected to be created.
PubMed

PubMed provides citation information for biomedical and life sciences publications and is managed by the U.S. National Institutes of Health's National Library of Medicine.
PubMed ID

The PubMed ID is the unique ID number for the publication as recorded in the PubMed database.
Relevant Publication

A publication that is based on data related to the aims of the grant/project associated with the Collection and, therefore, an NDA Study is expected to be created.


Data Expected List: Mandatory Data Structures

These data structures are mandatory for your NDA Collection. Please update the Targeted Enrollment number to accurately represent the number of subjects you expect to submit for the entire study.

For NIMH HIV-related research that involves human research participants: Select the dictionary or dictionaries most appropriate for your research. If your research does not require all three data dictionaries, just ignore the ones you do not need. There is no need to delete extra data dictionaries from your NDA Collection. You can adjust the Targeted Enrollment column in the Data Expected tab to “0” for those unnecessary data dictionaries. At least one of the three data dictionaries must have a non-zero value.

Data Expected	Targeted Enrollment	Initial Submission	Subjects Submitted	Initial Share	Subjects Shared	Status
Genomics/omics	10,060	05/13/2015	9,047	05/13/2015	9,047	Approved

To create your project's Data Expected list, use the "+New Data Expected" button to add or request existing structures and to request new Data Structures that are not in the NDA Data Dictionary.

If the Structure you need already exists, locate it and specify your dates and enrollment when adding it to your Data Expected list. If you require changes to the Structure you need, select the indicator stating "No, it requires changes to meet research needs," and upload a file containing your requested changes.

If the structure you need is not yet defined in the Data Dictionary, you can select "Upload Definition" and attach the necessary materials to request its creation.

When selecting the expected dates for your data, make sure to follow the standard Data Sharing Regimen and choose dates within the date ranges that correspond to your project start and end dates.

Please visit the Completing Your Data Expected Tutorial for more information.

Data Expected List: Data Structures per Research Aims

These data structures are specific to your research aims and should list all data structures in which data will be collected and submitted for this NDA Collection. Please update the Targeted Enrollment number to accurately represent the number of subjects you expect to submit for the entire study.

Data Expected	Targeted Enrollment	Initial Submission	Subjects Submitted	Initial Share	Subjects Shared	Status
No Data Expected

Structure not yet defined

No Status history for this Data Expected has been recorded yet


Collection - Data Expected

The Data Expected tab displays the list of all data that NDA expects to receive in association with the Collection as defined by the contributing researcher, as well as the dates for the expected initial upload of the data and when it is first expected to be shared with the research community. Above the primary table of Data Expected, any publications determined to be relevant to the data within the Collection are also displayed; members of the contributing research group can use these to define NDA Studies, connecting those papers to underlying data in NDA.

The tab is used both as a reference for those accessing shared data, providing information on what is expected and when it will be shared, and as the primary tracking mechanism for contributing projects. It is used by contributing primary researchers, secondary researchers, and NIH Program and Grants Management staff.

Researchers who are starting their project need to update their Data Expected list to include all the Data Structures they are collecting under their grant and set their initial submission and sharing schedule according to the NDA Data Sharing Regimen.

To add existing Data Structures from the Data Dictionary, to request new Data Structures that are not in the Dictionary, or to request changes to existing Data Structures, click "+New Data Expected".

For step-by-step instructions on how to add existing Data Structures, request changes to an existing Structure, or request a new Data Structure, please visit the Completing Your Data Expected Tutorial.

If you are a contributing researcher creating this list for the first time, or making changes to the list as your project progresses, please note the following:

Although items you add to the list and changes you make are displayed, they are not committed to the system until you Save the entire page using the "Save" button at the bottom of your screen. Please Save after every change to ensure none of your work is lost.
If you attempt to add a new structure, the title you provide must be unique - if another structure exists with the same name your change will fail.
Adding a new structure to this list is the only way to request the creation of a new Data Dictionary definition.

Frequently Asked Questions

What is an NDA Data Structure?

An NDA Data Structure is composed of multiple Data Elements that together make up an electronic definition of an assessment, measure, questionnaire, etc. Each such instrument will have a corresponding Data Structure.
What is the NDA Data Dictionary?

The NDA Data Dictionary is comprised of electronic definitions known as Data Structures.

Glossary

Analyzed Data

Data specific to the primary aims of the research being conducted (e.g. outcome measures, other dependent variables, observations, laboratory results, analyzed images, volumetric data, etc.) including processed images.
Data Item

Items listed on the Data Expected list in the Collection which may be an individual and discrete Data Structure, Data Structure Category, or Data Structure Group.
Data Structure

A defined organization and group of Data Elements to represent an electronic definition of a measure, assessment, questionnaire, or collection of data points. Data structures that have been defined in the NDA Data Dictionary are available at https://nda.nih.gov/general-query.html?q=query=data-structure
Data Structure Category

An NDA term describing the affiliation of a Data Structure to a Category, which may be disease/disorder or diagnosis related (Depression, ADHD, Psychosis), specific to data type (MRI, eye tracking, omics), or type of data (physical exam, IQ).
Data Structure Group

A Data Item listed on the Data Expected tab of a Collection that indicates a group of Data Structures (e.g., ADOS or SCID) for which data may be submitted instead of a specific Data Structure identified by version, module, edition, etc. For example, the ADOS Data Structure Group includes every ADOS Data Structure such as ADOS Module 1, ADOS Module 2, ADOS Module 1 - 2nd Edition, etc. The SCID Data Structure Group includes every SCID Data Structure such as SCID Mania, SCID V Mania, SCID PTSD, SCID-V Diagnosis, and more.
Evaluated Data

A new Data Structure category, Evaluated Data is analyzed data resulting from the use of computational pipelines in the Cloud and can be uploaded directly back to a miNDAR database. Evaluated Data is expected to be listed as a Data Item in the Collection's Data Expected Tab.
Imaging Data

Imaging+ is an NDA term which encompasses all imaging related data including, but not limited to, images (DTI, MRI, PET, Structural, Spectroscopy, etc.) as well as neurosignal data (EEG, fMRI, MEG, EGG, eye tracking, etc.) and Evaluated Data.
Initial Share Date

Initial Submission and Initial Share dates should be populated according to the NDA Data Sharing Terms and Conditions. Any modifications to these will go through the approval processes outlined above. Data will be shared with authorized users upon publication (via an NDA Study) or 1-2 years after the grant end date specified on the first Notice of Award, as defined in the applicable Data Sharing Terms and Conditions.
Initial Submission Date

Initial Submission and Initial Share dates should be populated according to these NDA Data Sharing Terms and Conditions. Any modifications to these will go through the approval processes outlined above. Data for all subjects is not expected on the Initial Submission Date and modifications may be made as necessary based on the project's conduct.
Research Subject and Pedigree

An NDA created Data Structure used to convey basic information about the subject such as demographics, pedigree (links family GUIDs), diagnosis/phenotype, and sample location that are critical to allow for easier querying of shared data.
Submission Cycle

The NDA has two Submission Cycles per year - January 15 and July 15.
Submission Exemption

An interface to notify NDA that data may not be submitted during the upcoming/current submission cycle.
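
The two Submission Cycles listed in the glossary above (January 15 and July 15) lend themselves to a small date calculation. The sketch below is illustrative only: the function name next_submission_deadline is hypothetical, and treating a cycle date that falls on the current day as still upcoming is an assumption.

```python
from datetime import date

# Month/day pairs for the two yearly Submission Cycles named in the glossary.
SUBMISSION_DEADLINES = ((1, 15), (7, 15))


def next_submission_deadline(today: date) -> date:
    """Return the first Submission Cycle date on or after the given day."""
    for month, day in SUBMISSION_DEADLINES:
        candidate = date(today.year, month, day)
        if candidate >= today:
            return candidate
    # Both cycle dates of this year have passed; roll over to next January 15.
    first_month, first_day = SUBMISSION_DEADLINES[0]
    return date(today.year + 1, first_month, first_day)


if __name__ == "__main__":
    print(next_submission_deadline(date.today()))
```

A project deciding whether to request a Submission Exemption could use such a helper to see which cycle date the request would apply to.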


Collection Owners and those with Collection Administrator permission may edit a collection. The following is currently available for editing on this page:

Associated Studies

Studies that have been defined using data from a Collection are important criteria for determining the value of the data shared. The Collection/Study Subjects column displays the number of subjects from this Collection that are included in a Study, out of the total number of subjects in that Study. The Data Usage column indicates whether the Study is a primary or a secondary analysis of the data. The State column indicates whether the Study is private or shared with the research community.

Study Name	DOI	Abstract	Collection/Study Subjects	Data Usage	State
Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains	10.15154/1361507	Although de novo missense mutations have been predicted to account for more cases of autism than gene-truncating mutations, most research has focused on the latter. We identified the properties of de novo missense mutations in patients with neurodevelopmental disorders (NDDs) and highlight 35 genes with excess missense mutations. Additionally, 40 amino acid sites were recurrently mutated in 36 genes, and targeted sequencing of 20 sites in 17,600 NDD patients identified 21 new patients with identical missense mutations. One recurrent site (p.Ala636Thr) occurs in a glutamate receptor subunit, GRIA1. This same amino acid substitution in the homologous but distinct mouse glutamate receptor subunit Grid2 is associated with Lurcher ataxia. Phenotypic follow-up in five individuals with GRIA1 mutations shows evidence of specific learning disabilities and autism. Overall, we find significant clustering of de novo mutations in 200 genes, highlighting specific functional domains and synaptic candidate genes important in NDD pathology.	13/18812	Primary Analysis	Shared
Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci	10.15154/1334312	Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection (SSC) (N = 2,591 families) replicates prior findings of strong association with autism spectrum disorders (ASDs) and confirms six risk loci (1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2). The addition of published CNV data from the Autism Genome Project (AGP) and exome sequencing data from the SSC and the Autism Sequencing Consortium (ASC) shows that genes within small de novo deletions, but not within large dnCNVs, significantly overlap the high-effect risk genes identified by sequencing. Alternatively, large dnCNVs are found likely to contain multiple modest-effect risk genes. Overall, we find strong evidence that de novo mutations are associated with ASD apart from the risk for intellectual disability. Extending the transmission and de novo association test (TADA) to include small de novo deletions reveals 71 ASD risk loci, including 6 CNV regions (noted above) and 65 risk genes (FDR ≤ 0.1).	8190/9975	Secondary Analysis	Shared
Complete Realignment of Whole Exome Sequencing data from 2415 families in SSC Collection	10.15154/1169193	Whole Exome Sequencing has been completed for ~ 2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to realign sequencing data from all three collections in a uniform manner using the latest toolchains and algorithms available, which can be used as a resource for the entire ASD Community. Original sequence data has been realigned to a single reference genome (1000 Genomes / GRCh37) using BWA, Picardtools, Samtools, and some custom python scripts. QC summary data were generated as part of the realignment process using the aforementioned tools in addition to QPLOT and some custom scripts. Complete methods, including source code for pipeline and custom scripts can be found at: https://github.com/nkrumm/asd-jre-public. The data package for this study represents the genomics_subject02, genomics_sample03, and omics_qa01 data structures which include realigned BAM files and QC files (i.e., QPLOT output and BAM header files). Variant calling and annotation for these data are provided in NDAR Studies 348 (https://ndar.nih.gov/study.html?id=348) and 349 (https://ndar.nih.gov/study.html?id=349).	9047/9047	Secondary Analysis	Shared
The contribution of mosaic variants to autism spectrum disorder	10.15154/1247692	De novo mutation is highly implicated in autism spectrum disorder (ASD). However, the contribution of post-zygotic mutation to ASD is poorly characterized. We performed both exome sequencing of paired samples and analysis of de novo variants from whole-exome sequencing of 2,388 families. While we find little evidence for tissue-specific mosaic mutation, multi-tissue post-zygotic mutation (i.e. mosaicism) is frequent, with detectable mosaic variation comprising 5.4% of all de novo mutations. We identify three mosaic missense and likely-gene disrupting mutations in genes previously implicated in ASD (KMT2C, NCKAP1, and MYH10) in probands but none in siblings. We find a strong ascertainment bias for mosaic mutations in probands relative to their unaffected siblings (p = 0.003). We build a model of de novo variation incorporating mosaic variants and errors in classification of mosaic status and from this model we estimate that 33% of mosaic mutations in probands contribute to 5.1% of simplex ASD diagnoses (95% credible interval 1.3% to 8.9%). Our results indicate a contributory role for multi-tissue mosaic mutation in some individuals with an ASD diagnosis.	9047/9047	Secondary Analysis	Shared
Copy Number Variants from SSC Collection ~ 2500 families by two Methods (XHMM and Conifer)	10.15154/1169318	XHMM was run on a set of realigned BAM files from the SSC collection (see NDAR Study 334 for BAM files) using the attached scripts. These scripts calculate depth of coverage using GATK, pull the GATK output from an instance on NDAR's cloud, merge the output of GATK into a single matrix, process the read depth matrix (filter, center), normalize the matrix using principal component analysis (PCA), process the normalized read depth matrix (filter, z-score), run a hidden markov model (HMM) on this matrix to identify CNVs in the normalized data, and generate family level vcfs from the xhmm data. XHMM produces as output coverage summary tables produced by GATK (sample_interval_statistics, sample_interval_summary, sample_summary, sample_statistics), principal component data files, a genotyped CNV output VCF file, and some example plots and graphics. For this study, the GATK output is available. Additional information about XHMM is available here: http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml	9041/9041	Secondary Analysis	Shared
Variant Recalling (FreeBayes) from Whole Exome Sequencing data for 2415 families in SSC Collection	10.15154/1169197	Whole Exome Sequencing has been completed for ~ 2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to call and annotate SNPs and Indels on data from all three collections in a uniform manner using the latest toolchains and algorithms available. Variant calls from this study were generated using FreeBayes, Famseq, and some custom scripts; annotation was provided by SnpEff, dbNSFP, and vcftools. Note that variants were called in batches with ~ 20 families per batch. Complete methods, including source code for pipeline and custom scripts can be found at: https://github.com/nkrumm/asd-jre-public The data package for this study includes the genomics_sample02, genomics_sample03 structures with annotated and un-annotated VCF files for each family. Another NDAR Study (348) is available with VCF files generated using GATK (https://ndar.nih.gov/study.html?id=348), and the complete set of BAM files used for variant calling are available in NDAR Study 334 (https://ndar.nih.gov/study.html?id=334)	8976/8976	Secondary Analysis	Shared
Variant Recalling (GATK) from Whole Exome Sequencing data for 2415 families in SSC Collection	10.15154/1169195	Whole Exome Sequencing has been completed for ~ 2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to call and annotate SNPs and Indels on data from all three collections in a uniform manner using the latest toolchains and algorithms available. Variant calls from this study were generated using GATK, Famseq, and some custom scripts; annotation was provided by SnpEff, dbNSFP, and vcftools. Note that variants were called in batches with ~ 20 families per batch. Complete methods, including source code for pipeline and custom scripts can be found at: https://github.com/nkrumm/asd-jre-public The data package for this study represents the genomics_subject02, genomics_sample03 structures which include annotated and un-annotated VCF files for each family. Another NDAR Study (349) is available with VCF files generated using FreeBayes (https://ndar.nih.gov/study.html?id=349), and the complete set of BAM files used for variant calling are available in NDAR Study 334 (https://ndar.nih.gov/study.html?id=334)	8976/8976	Secondary Analysis	Shared
Excess of rare inherited truncating mutations in autism	10.15154/1151812	In order to quantify the effect of private, inherited mutations on autism risk, we generated a callset of both inherited and de novo single nucleotide variants (SNVs) and copy number variants (CNVs) across 2,377 Simons Simplex Collection families. The publically deposited dataset includes 1,786 parents-child-unaffected sibling "quads" allowing us to compare burden of inherited and de novo mutations between affected and unaffected siblings in simplex autism families. We find that private, inherited truncating SNV mutations in conserved genes are significantly enriched in probands (odds ratio = 1.14, p = 0.0002) and more likely to be transmitted to children with autism when compared to their unaffected siblings (p < 0.0001). We find that this effect becomes more pronounced with increasing gene conservation (Residual Variation Intolerance Score, RVIS). Likewise, we observe a similar bias for inherited CNVs specifically for small (<100 kbp), maternally inherited events (p = 9.6x10^-3) that are enriched in CHD8 target genes (OR = 3.6, p = 2.0x10^-3). We quantified autism spectrum disorder (ASD) risk for de novo and inherited CNVs and SNVs by using a conditional logistic regression model. Independent from de novo mutations, private truncating SNVs and rare, inherited CNVs contribute an increase in risk with an odds ratio 1.11 (p = 0.0002) and 1.23 (p = 0.01), respectively. Our results indicate a statistically independent role for inherited mutations in ASD risk and identify additional high-impact risk candidate genes (e.g., RIMS1, CUL7, LZTR1 and CC2D2A) where transmitted mutations may create a sensitized background for autism but are unlikely to be necessary and sufficient for the disorder.	8911/8911	Secondary Analysis	Shared
Evolutionary and Genetic Analysis of Synonymous Nucleotide Substitutions in Subjects with Autism Spectrum Disorders	10.15154/1462716	The director of the project, Dr. Igor Rogozin, analyzed a modest collection of synonymous nucleotide substitutions from two small databases of mutations observed in autistic subjects [1]. Dr. Rogozin and his colleagues found that there was a statistically significant tendency for these synonymous nucleotide substitutions to replace a reference codon supportive of faster protein translation with a non-reference codon that is known to be associated with slower translation [1]. In the proposed study, we wish to test the codon replacement properties of synonymous substitutions reported in the much larger NDAR database, including whether the property of propensity to slower translation holds in a much larger data set of mutations. We also wish to compare the characteristics of the synonymous and nonsynonymous substitutions, using established techniques in genetics. [1] Poliakov E, Koonin EV, Rogozin IB. Impairment of translation in neurons as a putative causative factor for autism. Biology Direct. 2014; 9:16.	7200/7200	Secondary Analysis	Shared
The evolution and population diversity of human-specific segmental duplications	10.15154/1338620	Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (n=80 genes/33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed “core duplicons”, and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (e.g., TCAF1/2), we highlight ten gene families (e.g., ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing, and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.	1536/6360	Primary Analysis	Shared
Mitochondrial DNA mutations in Autism Spectrum Disorder	10.15154/1407407	Mitochondrial dysfunction is frequently observed in Autism Spectrum Disorders (ASD). Thus, variations in the mitochondrial DNA (mtDNA) sequences may contribute to increased ASD risks. In the current study, we evaluated mtDNA variations, including homoplasmy and heteroplasmy, in 903 ASD individuals along with their mothers and non-ASD siblings by using off-target reads from whole-exome sequencing data sets of Simons Foundation Autism Research Initiative (SFARI) Simons Collection available on NDAR. We found that heteroplasmic mutations in ASD individuals were enriched at non-polymorphic mtDNA sites (P = 0.0015) compared to their non-ASD siblings, which were more likely to confer deleterious effects than heteroplasmies at polymorphic mtDNA sites. Accordingly, we observed a ~1.5-fold enrichment of nonsynonymous mutations as well as a ~2.2-fold enrichment of predicted pathogenic mutations (P < 0.003) in ASD individuals compared to their non-ASD siblings. Our genetic findings substantiate pathogenic mtDNA mutations as a potential cause for ASD and synergize with recent work calling attention to their unique metabolic phenotypes for diagnosis and treatment of ASD.	2479/2709	Secondary Analysis	Shared
The striatal matrix compartment is expanded in autism spectrum disorder.	10.15154/khn8-jf08	Background: Autism spectrum disorder (ASD) is the second-most common neurodevelopmental disorder in childhood. This complex developmental disorder manifests with restricted interests, repetitive behaviors, and difficulties in communication and social awareness. The inherited and acquired causes of ASD impact many and diverse brain regions, challenging efforts to identify a shared neuroanatomical substrate for this range of symptoms. The striatum and its connections are among the most implicated sites of abnormal structure and/or function in ASD. Striatal projection neurons develop in segregated tissue compartments, the matrix and striosome, that are histochemically, pharmacologically, and functionally distinct. Immunohistochemical assessment of ASD and animal models of autism described abnormal matrix:striosome volume ratios, with a possible shift from striosome to matrix volume. Shifting the matrix:striosome ratio could result from expansion in matrix, reduction in striosome, spatial redistribution of the compartments, or a combination of these changes. Each type of ratio-shifting abnormality may predispose to ASD but yield different combinations of ASD features. Methods: We developed a cohort of 426 children and adults (213 matched ASD-control pairs) and performed connectivity-based parcellation (diffusion tractography) of the striatum. This identified voxels with matrix-like and striosome-like patterns of structural connectivity. Results: Matrix-like volume was increased in ASD, with no evident change in the volume or organization of the striosome-like compartment. The inter-compartment volume difference (matrix minus striosome) within each individual was 31% larger in ASD. Matrix-like volume was increased in both caudate and putamen, and in somatotopic zones throughout the rostral-caudal extent of the striatum. Subjects with moderate elevations in ADOS (Autism Diagnostic Observation Schedule) scores had increased matrix-like volume, but those with highly elevated ADOS scores had 3.7-fold larger increases in matrix-like volume. Conclusions: Matrix and striosome are embedded in distinct structural and functional networks, suggesting that compartment-selective injury or maldevelopment may mediate specific and distinct clinical features. Previously, assessing the striatal compartments in humans required post mortem tissue. Striatal parcellation provides a means to assess neuropsychiatric diseases for compartment-specific abnormalities in vivo. While this ASD cohort had increased matrix-like volume, other mechanisms that shift the matrix:striosome ratio may also increase the chance of developing the diverse social, sensory, and motor phenotypes of ASD.	20/2166	Secondary Analysis	Shared
Identification of differentially methylated regions (DMRs) and cytosine sites (DMCs) in DNA methylation data of autism cases and unaffected siblings	10.15154/vpbk-fy21	We compared blood-based DNA methylation profiles between children with autism spectrum disorder (ASD) and carefully matched, unrelated neurotypical control children. Using a sequencing-based method, we identified ASD-specific differentially methylated regions (DMRs) and cytosine sites (DMCs). We carried out comparative analyses with datasets from the NDA Collection 1650 (SFARI - DNA Methylation Analysis Cohort) that measured blood DNA methylation in ASD using microarray technology. We also identified DMRs and DMCs using metilene and minfi pipelines in the DNAm datasets from the NDA Collection 1650.	601/728	Secondary Analysis	Shared
Phenotypic subtyping and re-analysis of existing methylation data from autistic probands in simplex families reveal ASD subtype-associated differentially methylated genes and biological functions	10.15154/1522603	Autism spectrum disorder (ASD) describes a group of neurodevelopmental disorders with core deficits in social communication and manifestation of restricted, repetitive, and stereotyped behaviors. Despite the core symptomatology, ASD is extremely heterogeneous with respect to the severity of symptoms and behaviors. This heterogeneity presents an inherent challenge to all large-scale genome-wide 'omics analyses. In the present study, we address this heterogeneity by stratifying ASD probands from simplex families according to severity of behavioral scores on the Autism Diagnostic Interview-Revised diagnostic instrument, followed by re-analysis of existing DNA methylation data from individuals in three ASD subphenotypes in comparison to that of their respective unaffected siblings. We demonstrate that subphenotyping of cases enables the identification of over 1.6 times the number of statistically significant differentially methylated genes (DMGs) between cases and controls, compared to that identified when all cases are combined. Our analyses also reveal ASD-related neurological functions and comorbidities that are enriched among DMGs in each phenotypic subgroup but not in the combined case group. These findings may aid in the development of subtype-directed diagnostics and therapeutics.	129/584	Secondary Analysis	Shared
Embryonic lethal genetic variants and chromosomally normal pregnancy loss	10.15154/1521342	Objective: To examine whether rare potentially damaging genetic variants are associated with chromosomally normal pregnancy loss and estimate the magnitude of the association. Design: Case-control. Setting: Cases comprise 19 chromosomally normal loss conceptus-parent trios. They derive from a consecutive series of karyotyped losses at one hospital. Controls comprise 547 unaffected siblings of autism cases-parent trios from the National Database for Autism Research. Main outcome measures: The rate of predicted damaging variants in the exome (loss of function and missense–damaging) and the proportions of probands with at least one such variant among cases versus controls. Results: The proportions of probands with at least one rare predicted damaging variant were 36.8% among cases and 22.9% among controls (odds ratio (OR)=2.0, 99% CI 0.5-7.3). No case has a variant in a fetal anomaly gene. The proportion with variants in possibly embryonic lethal genes was increased in case probands (OR=14.5, 99% CI 1.5-89.7); variants occurred in BAZ1A, FBN2 and TIMP2. Conclusion: Rare genetic variants in the conceptus may be a cause of chromosomally normal loss. A larger sample is needed to estimate the magnitude of the association with precision and to identify relevant biological pathways.	547/547	Secondary Analysis	Shared

* Data not on individual level
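
The Collection/Study Subjects values in the table above pair a count from this Collection with the Study's total (for example, `13/18812`). As a rough illustration of how such a cell could be read programmatically, the sketch below splits the value and computes the share of the Study's subjects that come from this Collection. The names `SubjectCounts` and `parse_subject_counts` are hypothetical and do not come from any NDA tool, and the `count/total` cell format is assumed from the rows shown here.

```python
from typing import NamedTuple


class SubjectCounts(NamedTuple):
    from_collection: int  # subjects from this Collection included in the Study
    study_total: int      # total number of subjects in the Study

    @property
    def fraction(self) -> float:
        """Share of the Study's subjects drawn from this Collection."""
        if self.study_total == 0:
            return 0.0
        return self.from_collection / self.study_total


def parse_subject_counts(cell: str) -> SubjectCounts:
    """Parse a 'count/total' cell such as '13/18812'."""
    left, right = cell.strip().split("/")
    # Tolerate thousands separators, in case a cell is written as '1,536/6,360'.
    return SubjectCounts(int(left.replace(",", "")), int(right.replace(",", "")))
```

A row such as `547/547` yields a fraction of 1.0, meaning every subject in that Study comes from this Collection.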


Collection - Associated Studies

Clicking on the Study Title will open the study details in a new internet browser tab. The Abstract is available for viewing, providing the background explanation of the study, as provided by the Collection Owner.

Primary v. Secondary Analysis: The Data Usage column will have one of these two values. Primary Analysis indicates that at least some, and potentially all, of the data used was originally collected by the creator of the NDA Study. Secondary Analysis indicates that the Study owner was not involved in collecting the data, which may be used as supporting data.

Private v. Shared State: A private study is available only to users who are able to access the collection. A shared study is accessible to the general public.

Frequently Asked Questions

How do I associate a study to my collection?

Studies are associated to the Collection automatically when the data is defined in the Study.

Glossary

Associated Studies Tab

A tab in a Collection that lists the NDA Studies that have been created using data from that Collection including both Primary and Secondary Analysis NDA Studies.
