

Enter the grant number as Activity Code, Institute Abbreviation, and Serial Number. Grant Type, Support Year, and Suffix should be excluded. For example, grant 1R01MH123456-01A1 should be entered as R01MH123456.
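The entry format above can be illustrated with a minimal sketch. The regular expression and the function name `normalize_grant_number` are assumptions made for illustration (they are not part of NDA); the pattern assumes a one-digit grant type, a three-character activity code, a two-letter institute abbreviation, and a six-digit serial number, as in the example.

```python
import re

# Assumed layout of a full grant number such as 1R01MH123456-01A1.
# This pattern is an illustration, not an official NIH or NDA definition.
GRANT_PATTERN = re.compile(
    r"""^\s*
        (?P<grant_type>\d)?              # grant type, e.g. 1 (excluded)
        (?P<activity>[A-Z][A-Z0-9]{2})   # activity code, e.g. R01
        (?P<institute>[A-Z]{2})          # institute abbreviation, e.g. MH
        (?P<serial>\d{6})                # serial number, e.g. 123456
        (?:-(?P<year>\d{2})(?P<suffix>[A-Z0-9]*))?  # support year and suffix (excluded)
        \s*$""",
    re.VERBOSE,
)


def normalize_grant_number(raw: str) -> str:
    """Return activity code + institute abbreviation + serial number."""
    match = GRANT_PATTERN.match(raw.upper())
    if match is None:
        raise ValueError(f"Unrecognized grant number: {raw!r}")
    return match["activity"] + match["institute"] + match["serial"]


print(normalize_grant_number("1R01MH123456-01A1"))
```

Grant numbers that do not fit the assumed layout raise a `ValueError` rather than being silently truncated.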


Collection - Use Existing Experiment
To associate an experiment with the current collection, select an experiment from the table below, then click the associate experiment button to persist your changes (saving the collection is not required). Note that once an experiment has been associated with two or more collections, the experiment will no longer be editable.

The table search feature is case insensitive and targets the experiment id, experiment name and experiment type columns. The experiment id is searched only when the search term entered is a number, and filtered using a startsWith comparison. When the search term is not numeric the experiment name is used to filter the results.
Experiment Id | Experiment Name | Experiment Type | Created On
475 | MB1-10 (CHOP) | Omics | 06/07/2016
490 | Illumina Infinium PsychArray BeadChip Assay | Omics | 07/07/2016
501 | PharmacoBOLD Resting State | fMRI | 07/27/2016
509 | ABC-CT Resting v2 | EEG | 08/18/2016
13 | Comparison of FI expression in Autistic and Neurotypical Homo Sapiens | Omics | 12/28/2010
18 | AGRE/Broad Affymetrix 5.0 Genotype Experiment | Omics | 01/06/2011
22 | Stitching PCR Sequencing | Omics | 02/14/2011
29 | Microarray family 03 (father, mother, sibling) | Omics | 03/24/2011
37 | Standard paired-end sequencing of BCRs | Omics | 04/19/2011
38 | Illumina Mate-Pair BCR sequencing | Omics | 04/19/2011
39 | Custom Jumping Libraries | Omics | 04/19/2011
40 | Custom CapBP | Omics | 04/19/2011
43 | Autism brain sample genotyping, Illumina | Omics | 05/16/2011
53 | AGRE Omni1-quad | Omics | 10/11/2011
59 | AGP genotyping | Omics | 04/03/2012
60 | Ultradeep 454 sequencing of synaptic genes from postmortem cerebella of individuals with ASD and neurotypical controls | Omics | 06/23/2012
63 | Microemulsion PCR and Targeted Resequencing for Variant Detection in ASD | Omics | 07/20/2012
76 | Whole Genome Sequencing in Autism Families | Omics | 01/03/2013
90 | Genotyped IAN Samples | Omics | 07/09/2013
91 | NJLAGS Axiom Genotyping Array | Omics | 07/16/2013
93 | AGP genotyping (CNV) | Omics | 09/06/2013
106 | Longitudinal Sleep Study. H20 200. Channel set 2 | EEG | 11/07/2013
107 | Longitudinal Sleep Study. H20 200. Channel set 3 | EEG | 11/07/2013
108 | Longitudinal Sleep Study. AURA 200 | EEG | 11/07/2013
105 | Longitudinal Sleep Study. H20 200. Channel set 1 | EEG | 11/07/2013
109 | Longitudinal Sleep Study. AURA 400 | EEG | 11/07/2013
116 | Gene Expression Analysis WG-6 | Omics | 01/07/2014
131 | Jeste Lab UCLA ACEii: Charlie Brown and Sesame Street - Project 1 | Eye Tracking | 02/27/2014
132 | Jeste Lab UCLA ACEii: Animacy - Project 1 | Eye Tracking | 02/27/2014
133 | Jeste Lab UCLA ACEii: Mom Stranger - Project 2 | Eye Tracking | 02/27/2014
134 | Jeste Lab UCLA ACEii: Face Emotion - Project 3 | Eye Tracking | 02/27/2014
151 | Candidate Gene Identification in familial Autism | Omics | 06/09/2014
152 | NJLAGS Whole Genome Sequencing | Omics | 07/01/2014
154 | Math Autism Study - Vinod Menon | fMRI | 07/15/2014
160 | syllable contrast | EEG | 07/29/2014
167 | School-age naturalistic stimuli | Eye Tracking | 09/19/2014
44 | AGRE/Broad Affymetrix 5.0 Genotype Experiment | Omics | 06/27/2011
45 | Exome Sequencing of 20 Sporadic Cases of Autism Spectrum Disorder | Omics | 07/15/2011
78 | MET Genotypes | Omics | 03/18/2013
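
The search rule described above the table can be sketched as follows. This is a minimal illustration, not NDA's implementation: the `Experiment` class and `search_experiments` function are hypothetical names, and the non-numeric case is assumed to be a case-insensitive substring match on the experiment name, since the text does not state the matching mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    experiment_id: int
    name: str
    experiment_type: str


def search_experiments(experiments, term):
    """Filter experiments following the rule described for the table search."""
    term = term.strip()
    if not term:
        return list(experiments)
    if term.isdigit():
        # Numeric term: startsWith comparison on the experiment id.
        return [e for e in experiments if str(e.experiment_id).startswith(term)]
    # Non-numeric term: case-insensitive match on the name
    # (substring matching is an assumption).
    needle = term.casefold()
    return [e for e in experiments if needle in e.name.casefold()]


# A few rows taken from the table above.
rows = [
    Experiment(13, "Comparison of FI expression in Autistic and Neurotypical Homo Sapiens", "Omics"),
    Experiment(131, "Jeste Lab UCLA ACEii: Charlie Brown and Sesame Street - Project 1", "Eye Tracking"),
    Experiment(501, "PharmacoBOLD Resting State", "fMRI"),
    Experiment(509, "ABC-CT Resting v2", "EEG"),
]

print([e.experiment_id for e in search_experiments(rows, "13")])
print([e.experiment_id for e in search_experiments(rows, "resting")])
```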

To add an existing Data Structure, enter its title in the search bar. If you need to request changes, select the indicator "No, it requires changes to meet research needs" after selecting the Structure, and upload the file with the requested changes specific to the selected Data Structure. Your file should follow the Request Changes Procedure. If the Data Structure does not exist, select "Request New Data Structure" and upload the appropriate zip file.


The Data Expected list for this Collection shows some raw data as missing. Contact the NDA Help Desk with any questions.

Please confirm that you will not be enrolling any more subjects and that all raw data has been collected and submitted.

Collection Updated

Your Collection is now in Data Analysis phase and exempt from biannual submissions. Analyzed data is still expected prior to publication or no later than the project end date.



Unable to change collection phase where targeted enrollment is less than 90%

Collection Title: 1/5 to 5/5 Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing
Collection Investigators: Mark Daly, Richard Gibbs, Joseph Buxbaum, Gerard Schellenberg and James Sutcliffe
Collection Description: ARRA Autism Sequencing Collaboration
NIMH Data Archive
Funding Completed
Close Out
NIH - Extramural None

https://software.broadinstitute.org/gatk/ | Software | Genome Analysis Toolkit | Qualified Researchers
http://picard.sourceforge.net/ | Software | Picard | Qualified Researchers
http://samtools.sourceforge.net/ | Software | SAM Tools | Qualified Researchers

R01MH089175-01 | 1/5: Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing | 09/30/2009 | 08/31/2011 | Not Reported | Not Reported | BAYLOR COLLEGE OF MEDICINE | $2,998,515.00
R01MH089208-01 | 2/5-Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing | 09/30/2009 | 12/31/2012 | Not Reported | Not Reported | BROAD INSTITUTE, INC. | $4,165,764.00
R01MH089482-01 | 5/5 - Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing | 09/30/2009 | 08/31/2012 | Not Reported | Not Reported | VANDERBILT UNIVERSITY | $5,196,989.00
R01MH089025-01 | 3/5-Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing | 09/30/2009 | 08/31/2012 | Not Reported | Not Reported | ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI | $1,412,032.00
R01MH089004-01 | 4/5-Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing | 09/30/2009 | 08/31/2012 | Not Reported | Not Reported | UNIVERSITY OF PENNSYLVANIA | $1,208,739.00


NDA Help Center

Collection - General Tab

Fields available for edit on the top portion of the page include:

  • Collection Title
  • Investigators
  • Collection Description
  • Collection Phase
  • Funding Source
  • Clinical Trials

Collection Phase: The current status of a research project submitting data to an NDA Collection, based on the timing of the award and/or the data that have been submitted.

  • Pre-Enrollment: The default entry made when the NDA Collection is created.
  • Enrolling: Data have been submitted to the NDA Collection or the NDA Data Expected initial submission date has been reached for at least one data structure category in the NDA Collection.
  • Data Analysis: Subject level data collection for the research project is completed and has been submitted to the NDA Collection. The NDA Collection owner or the NDA Help Desk may set this phase when they’ve confirmed data submission is complete and submitted subject counts match at least 90% of the target enrollment numbers in the NDA Data Expected. Data submission reminders will be turned off for the NDA Collection.
  • Funding Completed: The NIH grant award (or awards) associated with the NDA Collection has reached its end date. NDA Collections in Funding Completed phase are assigned a subphase to indicate the status of data submission.
    • The Data Expected Subphase indicates that NDA expects more data will be submitted
    • The Closeout Subphase indicates the data submission is complete.
    • The Sharing Not Met Subphase indicates that data submission was not completed as expected.
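
The 90% condition for the Data Analysis phase can be sketched as a simple check. This is a hedged illustration only: the function name `can_set_data_analysis` and its parameters are hypothetical, and treating the comparison as a single Collection-wide ratio (rather than per Data Expected item) is an assumption.

```python
def can_set_data_analysis(subjects_submitted: int,
                          target_enrollment: int,
                          submission_confirmed_complete: bool) -> bool:
    """Return True when the Data Analysis phase could be set.

    Mirrors the description above: data submission has been confirmed
    complete and submitted subject counts reach at least 90% of the
    target enrollment. Integer arithmetic avoids rounding at the
    90% boundary.
    """
    if target_enrollment <= 0:
        # Assumption: without a positive target the check cannot pass.
        return False
    meets_threshold = subjects_submitted * 10 >= target_enrollment * 9
    return submission_confirmed_complete and meets_threshold


print(can_set_data_analysis(90, 100, True))
print(can_set_data_analysis(89, 100, True))
```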

Blinded Clinical Trial Status:

  • This status is set by a Collection Owner and indicates the research project is a double blinded clinical trial. When selected, the public view of Data Expected will show the Data Expected items and the Submission Dates, but the targeted enrollment and subjects submitted counts will not be displayed.
  • Targeted enrollment and subjects submitted counts are visible only to NDA Administrators and to the NDA Collection Owner.
  • When an NDA Collection that is flagged Blinded Clinical Trial reaches the maximum data sharing date for that Data Repository (see https://nda.nih.gov/nda/sharing-regimen.html), the embargo on Data Expected information is released.
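
As a rough sketch of the display rule above, the following shows which Data Expected fields a public view would include. The `DataExpectedItem` class, the `public_view` function, and the example values are hypothetical and are not NDA's data model.

```python
from dataclasses import dataclass


@dataclass
class DataExpectedItem:
    title: str
    submission_date: str
    targeted_enrollment: int
    subjects_submitted: int


def public_view(item: DataExpectedItem, blinded: bool,
                embargo_released: bool = False) -> dict:
    """Return the fields shown publicly for one Data Expected item."""
    # Items and submission dates are always shown.
    view = {"title": item.title, "submission_date": item.submission_date}
    # Counts are hidden only while a blinded trial is still under embargo.
    if not blinded or embargo_released:
        view["targeted_enrollment"] = item.targeted_enrollment
        view["subjects_submitted"] = item.subjects_submitted
    return view


# Hypothetical example values.
item = DataExpectedItem("Example Assessment", "01/15/2020", 200, 150)
print(public_view(item, blinded=True))
print(public_view(item, blinded=True, embargo_released=True))
```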

Funding Source

The organization(s) responsible for providing the funding is listed here.

Supporting Documentation

Users with Submission privileges, as well as Collection Owners, Program Officers, and those with Administrator privileges, may upload and attach supporting documentation. By default, supporting documentation is shared with the general public; however, the option is also available to limit this information to qualified researchers only.

Grant Information

Identifying details are displayed about the Project from which the Collection was derived. You may click the Project Number to view a full report of the Project captured by the NIH.

Clinical Trials

Any data that is collected to support or further the research of clinical studies will be available here. Collection Owners and those with Administrator privileges may add new clinical trials.

Frequently Asked Questions

  • How does the NIMH Data Archive (NDA) determine which Permission Group data are submitted into?
    During Collection creation, NDA staff determine the appropriate Permission Group based on the type of data to be submitted, the type of access that will be available to data access users, and the information provided by the Program Officer during grant award.
  • How do I know when an NDA Collection has been created?
    When a Collection is created by NDA staff, an email notification will automatically be sent to the PI(s) of the grant(s) associated with the Collection to notify them.
  • Is a single grant number ever associated with more than one Collection?
    The NDA system does not allow a single grant to be associated with more than one Collection; therefore, a single grant will not be listed in the Grant Information section of more than one Collection.
  • Why is there sometimes more than one grant included in a Collection?
    In general, each Collection is associated with only one grant; however, multiple grants may be associated if the grant has multiple competing segments for the same grant number or if multiple different grants are all working on the same project and it makes sense to hold the data in one Collection (e.g., Cooperative Agreements).
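
The relationship described in the last two answers (a grant belongs to at most one Collection, while a Collection may hold several grants) can be sketched as a small data structure. The `CollectionRegistry` class and the collection identifiers are hypothetical and only illustrate the constraint; the grant numbers are those listed in the Grant Information above.

```python
class CollectionRegistry:
    """Maps each grant to at most one Collection."""

    def __init__(self):
        self._collection_by_grant = {}

    def associate(self, grant: str, collection_id: str) -> None:
        existing = self._collection_by_grant.get(grant)
        if existing is not None and existing != collection_id:
            raise ValueError(
                f"{grant} is already associated with {existing}"
            )
        self._collection_by_grant[grant] = collection_id

    def grants_for(self, collection_id: str) -> list:
        # A Collection may list several grants.
        return sorted(
            grant
            for grant, collection in self._collection_by_grant.items()
            if collection == collection_id
        )


registry = CollectionRegistry()
for grant in ("R01MH089175", "R01MH089208", "R01MH089482",
              "R01MH089025", "R01MH089004"):
    registry.associate(grant, "collection-A")  # hypothetical identifier

print(registry.grants_for("collection-A"))
```

Attempting to associate one of these grants with a second, different Collection raises a `ValueError` in this sketch.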


  • Administrator Privilege
    A privilege provided to a user associated with an NDA Collection or NDA Study whereby that user can perform a full range of actions including providing privileges to other users.
  • Collection Owner
    Generally, the Collection Owner is the contact PI listed on a grant. Only one NDA user is listed as the Collection owner. Most automated emails are primarily sent to the Collection Owner.
  • Collection Phase
    The Collection Phase provides information on data submission as opposed to grant/project completion so while the Collection phase and grant/project phase may be closely related they are often different. Collection users with Administrative Privileges are encouraged to edit the Collection Phase. The Program Officer as listed in eRA (for NIH funded grants) may also edit this field. Changes must be saved by clicking the Save button at the bottom of the page. This field is sortable alphabetically in ascending or descending order. Collection Phase options include:
    • Pre-Enrollment: A grant/project has started, but has not yet enrolled subjects.
    • Enrolling: A grant/project has begun enrolling subjects. Data submission is likely ongoing at this point.
    • Data Analysis: A grant/project has completed enrolling subjects and has completed all data submissions.
    • Funding Completed: A grant/project has reached the project end date.
  • Collection Title
    An editable field with the title of the Collection, which is often the title of the grant associated with the Collection.
  • Grant
    Provides the grant number(s) for the grant(s) associated with the Collection. The field is a hyperlink so clicking on the Grant number will direct the user to the grant information in the NIH Research Portfolio Online Reporting Tools (RePORT) page.
  • Supporting Documentation
    Various documents and materials that enable efficient use of the data by investigators unfamiliar with the project; these may include the research protocol, questionnaires, and study manuals.
  • NIH Research Initiative
    NDA Collections may be organized by scientific similarity into NIH Research Initiatives, to facilitate query tool user experience. NIH Research Initiatives map to one or multiple Funding Opportunity Announcements.
  • Permission Group
    Access to shared record-level data in NDA is provisioned at the level of a Permission Group. NDA Permission Groups consist of one or multiple NDA Collections that contain data with the same subject consents.
  • Planned Enrollment
    Number of human subject participants to be enrolled in an NIH-funded clinical research study. The data is provided in competing applications and annual progress reports.
  • Actual Enrollment
    Number of human subjects enrolled in an NIH-funded clinical research study. The data is provided in annual progress reports.
  • NDA Collection
    A virtual container and organization structure for data and associated documentation from one grant or one large project/consortium. It contains tools for tracking data submission and allows investigators to define a wide array of other elements that provide context for the data, including all general information regarding the data and source project, experimental parameters used to collect any event-based data contained in the Collection, methods, and other supporting documentation. They also allow investigators to link underlying data to an NDA Study, defining populations and subpopulations specific to research aims.
  • Data Use Limitations
    Data Use Limitations (DULs) describe the appropriate secondary use of a dataset and are based on the original informed consent of a research participant. NDA only accepts consent-based data use limitations defined by the NIH Office of Science Policy.
  • Total Subjects Shared
    The total number of unique subjects for whom data have been shared and are available for users with permission to access data.


Collection - Experiments

The number of Experiments included is displayed in parentheses next to the tab name. You may download all experiments associated with the Collection via the Download button. You may view individual experiments by clicking the Experiment Name and add them to the Filter Cart via the Add to Cart button.

Collection Owners, Program Officers, and users with Submission or Administrative Privileges for the Collection may create or edit an Experiment.

Please note: The creation of an NDA Experiment does not necessarily mean that data collected, according to the defined Experiment, has been submitted or shared.

Frequently Asked Questions

  • Can an Experiment be associated with more than one Collection?

    Yes. See the “Copy” button in the bottom left when viewing an experiment. There are two actions that can be performed via this button:

    1. Copy the experiment with intent for modifications.
    2. Associate the experiment to the collection. No modifications can be made to the experiment.


  • Experiment Status
    An Experiment must be Approved before data using the associated Experiment_ID may be uploaded.
  • Experiment ID
    The ID number automatically generated by NDA which must be included in the appropriate file when uploading data to link the Experiment Definition to the subject record.
Data Structure Title | Data Type | Number of Subjects
Autism Diagnostic Interview - Cumulative | Clinical Assessments | 435
Autism Diagnostic Observation Schedule (ADOS) - Module 4 | Clinical Assessments | 9
Autism Diagnostic Observation Schedule (ADOS)- Module 1 | Clinical Assessments | 154
Autism Diagnostic Observation Schedule (ADOS)- Module 2 | Clinical Assessments | 68
Autism Diagnostic Observation Schedule (ADOS)- Module 3 | Clinical Assessments | 110
CPEA STAART PPVT SUMMARY 2004 | Clinical Assessments | 322
Genomics Sample | Genomics | 2094
Genomics Subject | Genomics | 2097
Processed MRI Data | Imaging | 2
Ravens Coloured Progressive Matrices (CPM) | Clinical Assessments | 322
Stanford-Binet Intelligence Scales, Fifth Edition (SB5) | Clinical Assessments | 18


Collection - Shared Data

This tab provides a quick overview of the Data Structure title, Data Type, and Number of Subjects that are currently Shared for the Collection. The information presented in this tab is automatically generated by NDA and cannot be edited. If no information is visible on this tab, this would indicate the Collection does not have shared data or the data is private.

The shared data is available to other researchers who have permission to access data in the Collection's designated Permission Group(s). Use the Download button to get all shared data from the Collection to the Filter Cart.

Frequently Asked Questions

  • How will I know if another researcher uses data that I shared through the NIMH Data Archive (NDA)?
    To see which of the data your project has submitted are being used by a study, go to the Associated Studies tab of your Collection. Alternatively, you may review an NDA Study Attribution Report available on the General tab.
  • Can I get a supplement to share data from a completed research project?
    Often it becomes more difficult to organize and format data electronically after the project has been completed and the information needed to create a GUID may not be available; however, you may still contact a program staff member at the appropriate funding institution for more information.
  • Can I get a supplement to share data from a research project that is still ongoing?
    Unlike completed projects where researchers may not have the information needed to create a GUID and/or where the effort needed to organize and format data becomes prohibitive, ongoing projects have more of an opportunity to overcome these challenges. Please contact a program staff member at the appropriate funding institution for more information.


  • Data Structure
    A defined organization and group of Data Elements to represent an electronic definition of a measure, assessment, questionnaire, or collection of data points. Data structures that have been defined in the NDA Data Dictionary are available at https://nda.nih.gov/general-query.html?q=query=data-structure
  • Data Type
    A grouping of data by similar characteristics such as Clinical Assessments, Omics, or Neurosignal data.
  • Shared
    The term 'Shared' generally means available to others; however, there are some slightly different meanings based on what is Shared. A Shared NDA Study is viewable and searchable publicly regardless of the user's role or whether the user has an NDA account. A Shared NDA Study does not necessarily mean that data used in the NDA Study have been shared, as this is independently determined. Data are shared according to the schedule defined in a Collection's Data Expected Tab and/or in accordance with data sharing expectations in the NDA Data Sharing Terms and Conditions. Additionally, Supporting Documentation uploaded to a Collection may be shared independent of whether data are shared.

Collection Owners and those with Collection Administrator permission may edit a collection. The following is currently available for Edit on this page:


Publications relevant to NDA data are listed below. Most displayed publications have been associated with the grant within PubMed. Use the "+ New Publication" button to add new publications. Publications are categorized as relevant or not relevant to data expected. Relevant publications are then linked to the underlying data by selecting the Create Study link. A Study provides the ability to define cohorts, assign subjects, and define outcome measures, and it lists the study type, data analysis, and results. Analyzed data and results are expected to be provided in this way.

PubMed IDStudyTitleJournalAuthorsDateStatus
38001974Create StudyIdentification of Neurotransmission and Synaptic Biological Processes Disrupted in Autism Spectrum Disorder Using Interaction Networks and Community Detection Analysis.BiomedicinesVilela, Joana; Martiniano, Hugo; Marques, Ana Rita; Santos, João Xavier; Asif, Muhammad; Rasga, Célia; Oliveira, Guiomar; Vicente, Astrid MouraNovember 4, 2023Not Determined
29593342Create StudyGenetic variants and pathways implicated in a pediatric inflammatory bowel disease cohort.Genes and immunityShaw KA, Cutler DJ, Okou D, Dodd A, Aronow BJ, Haberman Y, Stevens C, Walters TD, Griffiths A, Baldassano RN, Noe JD, Hyams JS, Crandall WV, Kirschner BS, Heyman MB, Snapper S, Guthery S, Dubinsky MC, Shapiro JM, Otley AR, Daly M, Denson LA, Kugathasan S, Zwick MEMarch 2018Not Determined
29358944Create StudyA Powerful Gene-Based Test Accommodating Common and Low-Frequency Variants to Detect Both Main Effects and Gene-Gene Interaction Effects in Case-Control Studies.Frontiers in geneticsChung, Ren-Hua; Kang, Chen-YuJanuary 2017Not Determined
28344757Create StudyLeveraging blood serotonin as an endophenotype to identify de novo and rare variants involved in autism.Molecular autismChen R, Davis LK, Guter S, Wei Q, Jacob S, Potter MH, Cox NJ, Cook EH, Sutcliffe JS, Li BJanuary 2017Not Determined
26439716Create StudyInterpreting de novo Variation in Human Disease Using denovolyzeR.Current protocols in human geneticsWare, James S; Samocha, Kaitlin E; Homsy, Jason; Daly, Mark J2015Not Determined
25363760Create StudySynaptic, transcriptional and chromatin genes disrupted in autism.NatureDe Rubeis, Silvia; He, Xin; Goldberg, Arthur P; Poultney, Christopher S; Samocha, Kaitlin; Cicek, A Erucment; Kou, Yan; Liu, Li; Fromer, Menachem; Walker, Susan; Singh, Tarinder; Klei, Lambertus; Kosmicki, Jack; Shih-Chen, Fu; Aleksic, Branko; Biscaldi, Monica; Bolton, Patrick F; Brownfeld, Jessica M; Cai, Jinlu; Campbell, Nicholas G; Carracedo, Angel; Chahrour, Maria H; Chiocchetti, Andreas G; Coon, Hilary; Crawford, Emily L; Curran, Sarah R; Dawson, Geraldine; Duketis, Eftichia; Fernandez, Bridget A; Gallagher, Louise; Geller, Evan; Guter, Stephen J; Hill, R Sean; Ionita-Laza, Juliana; Jimenz Gonzalez, Patricia; Kilpinen, Helena; Klauck, Sabine M; Kolevzon, Alexander; Lee, Irene; Lei, Irene; Lei, Jing; Lehtimäki, Terho; Lin, Chiao-Feng; Ma'ayan, Avi; Marshall, Christian R; McInnes, Alison L; Neale, Benjamin; Owen, Michael J; Ozaki, Noriio; Parellada, Mara; Parr, Jeremy R; Purcell, Shaun; Puura, Kaija; Rajagopalan, Deepthi; Rehnström, Karola; Reichenberg, Abraham; Sabo, Aniko; Sachse, Michael; Sanders, Stephan J; Schafer, Chad; Schulte-Rüther, Martin; Skuse, David; Stevens, Christine; Szatmari, Peter; Tammimies, Kristiina; Valladares, Otto; Voran, Annette; Li-San, Wang; Weiss, Lauren A; Willsey, A Jeremy; Yu, Timothy W; Yuen, Ryan K C; DDD Study; Homozygosity Mapping Collaborative for Autism; UK10K Consortium; Cook, Edwin H; Freitag, Christine M; Gill, Michael; Hultman, Christina M; Lehner, Thomas; Palotie, Aaarno; Schellenberg, Gerard D; Sklar, Pamela; State, Matthew W; Sutcliffe, James S; Walsh, Christiopher A; Scherer, Stephen W; Zwick, Michael E; Barett, Jeffrey C; Cutler, David J; Roeder, Kathryn; Devlin, Bernie; Daly, Mark J; Buxbaum, Joseph DNovember 13, 2014Not Relevant
25270638Create StudyConsensus Genotyper for Exome Sequencing (CGES): improving the quality of exome variant genotypes.Bioinformatics (Oxford, England)Trubetskoy, Vassily; Rodriguez, Alex; Dave, Uptal; Campbell, Nicholas; Crawford, Emily L; Cook, Edwin H; Sutcliffe, James S; Foster, Ian; Madduri, Ravi; Cox, Nancy J; Davis, Lea KJanuary 15, 2015Not Relevant
25086666Create StudyA framework for the interpretation of de novo mutation in human disease.Nature geneticsSamocha, Kaitlin E; Robinson, Elise B; Sanders, Stephan J; Stevens, Christine; Sabo, Aniko; McGrath, Lauren M; Kosmicki, Jack A; Rehnström, Karola; Mallick, Swapan; Kirby, Andrew; Wall, Dennis P; MacArthur, Daniel G; Gabriel, Stacey B; DePristo, Mark; Purcell, Shaun M; Palotie, Aarno; Boerwinkle, Eric; Buxbaum, Joseph D; Cook Jr, Edwin H; Gibbs, Richard A; Schellenberg, Gerard D; Sutcliffe, James S; Devlin, Bernie; Roeder, Kathryn; Neale, Benjamin M; Daly, Mark JSeptember 2014Not Determined
24094742Create StudyIdentification of small exonic CNV from whole-exome sequence data and application to autism spectrum disorder.American journal of human geneticsPoultney, Christopher S; Goldberg, Arthur P; Drapeau, Elodie; Kou, Yan; Harony-Nicolas, Hala; Kajiwara, Yuji; De Rubeis, Silvia; Durand, Simon; Stevens, Christine; Rehnström, Karola; Palotie, Aarno; Daly, Mark J; Ma'ayan, Avi; Fromer, Menachem; Buxbaum, Joseph DOctober 3, 2013Not Relevant
23979605Create StudyDe novo mutation in the dopamine transporter gene associates dopamine dysfunction with autism spectrum disorder.Molecular psychiatryHamilton PJ, Campbell NG, Sharma S, Erreger K, Herborg Hansen F, Saunders C, Belovich AN, , Sahai MA, Cook EH, Gether U, McHaourab HS, Matthies HJ, Sutcliffe JS, Galli ADaly MJGibbs RABoerwinkle EBuxbaum JDCook EHDevlin BLim ETNeale BMRoeder KSabo ASchellenberg GDStevens CSutcliffe JSDecember 2013Not Determined
23966865Create StudyIntegrated model of de novo and inherited genetic variants yields greater power to identify risk genes.PLoS geneticsHe, Xin; Sanders, Stephan J; Liu, Li; De Rubeis, Silvia; Lim, Elaine T; Sutcliffe, James S; Schellenberg, Gerard D; Gibbs, Richard A; Daly, Mark J; Buxbaum, Joseph D; State, Matthew W; Devlin, Bernie; Roeder, Kathryn2013Not Determined
23943636Create StudyDRAW+SneakPeek: analysis workflow and quality metric management for DNA-seq experiments.Bioinformatics (Oxford, England)Lin CF, Valladares O, Childress DM, Klevak E, Geller ET, Hwang YC, Tsai EA, Schellenberg GD, Wang LSOctober 1, 2013Not Determined
23743231Create StudyWhole exome sequencing reveals minimal differences between cell line and whole blood derived DNA.GenomicsSchafer CM, Campbell NG, Cai G, Yu F, Makarov V, Yoon S, Daly MJ, Gibbs RA, Schellenberg GD, Devlin B, Sutcliffe JS, Buxbaum JD, Roeder KOctober 2013Not Determined
23711981Create StudyDisruption of the non-canonical Wnt gene PRICKLE2 leads to autism-like behaviors with evidence for hippocampal synaptic dysfunction.Molecular psychiatrySowers, L P; Loo, L; Wu, Y; Campbell, E; Ulrich, J D; Wu, S; Paemka, L; Wassink, T; Meyer, K; Bing, X; El-Shanti, H; Usachev, Y M; Ueno, N; Manak, J R; Manak, R J; Shepherd, A J; Ferguson, P J; Darbro, B W; Richerson, G B; Mohapatra, D P; Wemmie, J A; Bassuk, A GOctober 2013Not Determined
23684009Create StudySequence kernel association tests for the combined effect of rare and common variants.American journal of human geneticsIonita-Laza I, Lee S, Makarov V, Buxbaum JD, Lin XJune 6, 2013Not Determined
23593035Create StudyAnalysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls.PLoS geneticsLiu, Li; Sabo, Aniko; Neale, Benjamin M; Nagaswamy, Uma; Stevens, Christine; Lim, Elaine; Bodea, Corneliu A; Muzny, Donna; Reid, Jeffrey G; Banks, Eric; Coon, Hillary; Depristo, Mark; Dinh, Huyen; Fennel, Tim; Flannick, Jason; Gabriel, Stacey; Garimella, Kiran; Gross, Shannon; Hawes, Alicia; Lewis, Lora; Makarov, Vladimir; Maguire, Jared; Newsham, Irene; Poplin, Ryan; Ripke, Stephan; Shakir, Khalid; Samocha, Kaitlin E; Wu, Yuanqing; Boerwinkle, Eric; Buxbaum, Joseph D; Cook Jr, Edwin H; Devlin, Bernie; Schellenberg, Gerard D; Sutcliffe, James S; Daly, Mark J; Gibbs, Richard A; Roeder, KathrynApril 2013Not Relevant
23386037Create StudyFamily-based association tests for sequence data, and comparisons with population-based association tests.European journal of human genetics : EJHGIonita-Laza I, Lee S, Makarov V, Buxbaum JD, Lin XOctober 2013Not Relevant
23352160Create StudyRare complete knockouts in humans: population distribution and significant role in autism spectrum disorders.NeuronLim ET, Raychaudhuri S, Sanders SJ, Stevens C, Sabo A, MacArthur DG, Neale BM, Kirby A, Ruderfer DM, Fromer M, Lek M, Liu L, Flannick J, Ripke S, Nagaswamy U, Muzny D, Reid JG, Hawes A, Newsham I, Wu Y, Lewis L, Dinh H, Gross S, Wang LS, Lin CF, et al.January 23, 2013Not Relevant
23216583Create StudyCharacterizing polymorphisms and allelic diversity of von Willebrand factor gene in the 1000 Genomes.Journal of thrombosis and haemostasis : JTHWang, Q Y; Song, J; Gibbs, R A; Boerwinkle, E; Dong, J F; Yu, F LFebruary 2013Not Relevant
22843986Create StudyzCall: a rare variant caller for array-based genotyping: genetics and population analysis.Bioinformatics (Oxford, England)Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, O'Dushlaine C, Moran JL, Chambert K, Stevens C, , , Sklar P, Hultman CM, Purcell S, McCarroll SA, Sullivan PF, Daly MJ, Neale BMOctober 1, 2012Not Relevant
22641211Create StudyExome sequencing and the genetic basis of complex traits.Nature geneticsKiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, Gupta N, Sklar P, Sullivan PF, Moran JL, Hultman CM, Lichtenstein P, Magnusson P, Lehner T, Shugart YY, Price AL, de Bakker PI, Purcell SM, Sunyaev SRJune 2012Not Relevant
22610117Create StudyExtremely low-coverage sequencing and imputation increases power for genome-wide association studies.Nature geneticsPasaniuc B, Rohland N, McLaren PJ, Garimella K, Zaitlen N, Li H, Gupta N, Neale BM, Daly MJ, Sklar P, Sullivan PF, Bergen S, Moran JL, Hultman CM, Lichtenstein P, Magnusson P, Purcell SM, Haas DW, Liang L, Sunyaev S, Patterson N, de Bakker PI, Reich D, Price ALJune 2012Not Relevant
22578327Create StudyScan-statistic approach identifies clusters of rare disease variants in LRP2, a gene linked and associated with autism spectrum disorders, in three datasets.American journal of human geneticsIonita-Laza I, Makarov V, , Buxbaum JD, Boerwinkle E, Buxbaum JD, Cook EH, Daly MJ, Devlin B, Gibbs R, Roeder K, Sabo A, Schellenberg GD, Sutcliffe JSJune 8, 2012Not Relevant
22511880Study (293)Whole-exome sequencing and homozygosity analysis implicate depolarization-regulated neuronal genes in autism.PLoS geneticsChahrour, Maria H; Yu, Timothy W; Lim, Elaine T; Ataman, Bulent; Coulter, Michael E; Hill, R Sean; Stevens, Christine R; Schubert, Christian R; ARRA Autism Sequencing Collaboration; Greenberg, Michael E; Gabriel, Stacey B; Walsh, Christopher A2012Relevant
22499558Create StudyNetwork- and attribute-based classifiers can prioritize genes and pathways for autism spectrum disorders and intellectual disability.American journal of medical genetics. Part C, Seminars in medical geneticsKou, Yan; Betancur, Catalina; Xu, Huilei; Buxbaum, Joseph D; Ma'ayan, AviMay 15, 2012Not Relevant
22495311Study (317)Patterns and rates of exonic de novo mutations in autism spectrum disorders.NatureNeale, Benjamin M; Kou, Yan; Liu, Li; Ma'ayan, Avi; Samocha, Kaitlin E; Sabo, Aniko; Lin, Chiao-Feng; Stevens, Christine; Wang, Li-San; Makarov, Vladimir; Polak, Paz; Yoon, Seungtai; Maguire, Jared; Crawford, Emily L; Campbell, Nicholas G; Geller, Evan T; Valladares, Otto; Schafer, Chad; Liu, Han; Zhao, Tuo; Cai, Guiqing; Lihm, Jayon; Dannenfelser, Ruth; Jabado, Omar; Peralta, Zuleyma; Nagaswamy, Uma; Muzny, Donna; Reid, Jeffrey G; Newsham, Irene; Wu, Yuanqing; Lewis, Lora; Han, Yi; Voight, Benjamin F; Lim, Elaine; Rossin, Elizabeth; Kirby, Andrew; Flannick, Jason; Fromer, Menachem; Shakir, Khalid; Fennell, Tim; Garimella, Kiran; Banks, Eric; Poplin, Ryan; Gabriel, Stacey; DePristo, Mark; Wimbish, Jack R; Boone, Braden E; Levy, Shawn E; Betancur, Catalina; Sunyaev, Shamil; Boerwinkle, Eric; Buxbaum, Joseph D; Cook Jr, Edwin H; Devlin, Bernie; Gibbs, Richard A; Roeder, Kathryn; Schellenberg, Gerard D; Sutcliffe, James S; Daly, Mark JApril 4, 2012Relevant
22257670Create StudyAnnTools: a comprehensive and versatile annotation toolkit for genomic variants.Bioinformatics (Oxford, England)Makarov V, O'Grady T, Cai G, Lihm J, Buxbaum JD, Yoon SMarch 1, 2012Not Relevant
22137099Create StudyFinding disease variants in Mendelian disorders by using sequence data: methods and applications.American journal of human geneticsIonita-Laza I, Makarov V, Yoon S, Raby B, Buxbaum J, Nicolae DL, Lin XDecember 9, 2011Not Relevant
21408211Create StudyTesting for an unusual distribution of rare variants.PLoS geneticsNeale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJMarch 2011Not Relevant
20876472Create StudyA comprehensive analysis of deletions, multiplications, and copy number variations in PARK2.NeurologyKay DM, Stevens CF, Hamza TH, Montimurro JS, Zabetian CP, Factor SA, Samii A, Griffith A, Roberts JW, Molho ES, Higgins DS, Gancher S, Moses L, Zareparsi S, Poorkaj P, Bird T, Nutt J, Schellenberg GD, Payami HSeptember 28, 2010Not Relevant

NDA Help Center

Collection - Publications

The number of Publications is displayed in parentheses next to the tab name. Clicking on any Publication Title opens that Publication in a new browser tab.

Collection Owners, Program Officers, and users with Submission or Administrative Privileges for the Collection may mark a publication as either Relevant or Not Relevant in the Status column.

Frequently Asked Questions

  • How can I determine if a publication is relevant?
    Publications are considered relevant to a collection when the data shared is directly related to the project or collection.
  • Where does the NDA get the publications?
    From PubMed, an online library of journals, articles, and medical research sponsored by the NIH and the National Library of Medicine (NLM).


  • Create Study
    A link to the Create an NDA Study page that can be clicked to start creating an NDA Study with information such as the title, journal and authors automatically populated.
  • Not Determined Publication
    Indicates that the publication has not yet been reviewed and/or marked as Relevant or Not Relevant so it has not been determined whether an NDA Study is expected.
  • Not Relevant Publication
    A publication that is not based on data related to the aims of the grant/project associated with the Collection, or that is not based on any data (such as a review article). An NDA Study is therefore not expected to be created.
  • PubMed
    PubMed provides citation information for biomedical and life sciences publications and is managed by the U.S. National Institutes of Health's National Library of Medicine.
  • PubMed ID
    The PubMed ID is the unique ID number for the publication as recorded in the PubMed database.
  • Relevant Publication
    A publication that is based on data related to the aims of the grant/project associated with the Collection and, therefore, an NDA Study is expected to be created.
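
The status definitions above can be read as a small decision rule: a Relevant publication is expected to get an NDA Study, a Not Relevant one is not, and a Not Determined one is still undecided. The sketch below is only an illustration of that rule applied to a flattened publication row as it appears on this page (PubMed ID first, status last). The names `parse_publication_row` and `PublicationRow` are hypothetical, and the PubMed link pattern is an assumption that is not stated on this page.

```python
import re
from typing import NamedTuple, Optional

# Longer labels come first so "Not Relevant" is not mistaken for "Relevant".
STATUSES = ("Not Relevant", "Not Determined", "Relevant")


class PublicationRow(NamedTuple):
    pubmed_id: str
    status: str
    study_expected: Optional[bool]  # None means "not yet determined"
    pubmed_url: str


def parse_publication_row(row: str) -> PublicationRow:
    """Pull the leading PubMed ID and trailing status out of one table row."""
    row = row.strip()
    id_match = re.match(r"\d+", row)
    if id_match is None:
        raise ValueError("row does not start with a PubMed ID")
    status = next((s for s in STATUSES if row.endswith(s)), None)
    if status is None:
        raise ValueError("row does not end with a known status")
    expected = {"Relevant": True, "Not Relevant": False, "Not Determined": None}
    pubmed_id = id_match.group(0)
    return PublicationRow(
        pubmed_id=pubmed_id,
        status=status,
        study_expected=expected[status],
        # Assumed link pattern for PubMed records.
        pubmed_url=f"https://pubmed.ncbi.nlm.nih.gov/{pubmed_id}/",
    )
```

For example, calling `parse_publication_row` on the AnnTools row in the table above yields the PubMed ID `22257670` with status Not Relevant, so no NDA Study is expected for it.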
Data Expected List: Mandatory Data Structures

These data structures are mandatory for your NDA Collection. Please update the Targeted Enrollment number to accurately represent the number of subjects you expect to submit for the entire study.

For NIMH HIV-related research that involves human research participants: Select the dictionary or dictionaries most appropriate for your research. If your research does not require all three data dictionaries, just ignore the ones you do not need. There is no need to delete extra data dictionaries from your NDA Collection. You can adjust the Targeted Enrollment column in the Data Expected tab to “0” for those unnecessary data dictionaries. At least one of the three data dictionaries must have a non-zero value.

Data Expected | Targeted Enrollment | Initial Submission | Subjects Shared | Status
Research Subject and Pedigree
To create your project's Data Expected list, use "+New Data Expected" to add or request existing structures and to request new Data Structures that are not in the NDA Data Dictionary.

If the Structure you need already exists, locate it and specify your dates and enrollment when adding it to your Data Expected list. If you require changes to the Structure you need, select the indicator stating "No, it requires changes to meet research needs," and upload a file containing your requested changes.

If the structure you need is not yet defined in the Data Dictionary, you can select "Upload Definition" and attach the necessary materials to request its creation.

When selecting the expected dates for your data, make sure to follow the standard Data Sharing Regimen and choose dates within the date ranges that correspond to your project start and end dates.

Please visit the Completing Your Data Expected Tutorial for more information.
Data Expected List: Data Structures per Research Aims

These data structures are specific to your research aims and should list all data structures in which data will be collected and submitted for this NDA Collection. Please update the Targeted Enrollment number to accurately represent the number of subjects you expect to submit for the entire study.

Data Expected | Targeted Enrollment | Initial Submission | Subjects Shared | Status
Genomics/omics
Structure not yet defined
No Status history for this Data Expected has been recorded yet

NDA Help Center

Collection - Data Expected

The Data Expected tab displays the list of all data that NDA expects to receive in association with the Collection, as defined by the contributing researcher, as well as the dates for the expected initial upload of the data and when it is first expected to be shared with the research community. Above the primary table of Data Expected, any publications determined to be relevant to the data within the Collection are also displayed - members of the contributing research group can use these to define NDA Studies, connecting those papers to underlying data in NDA.

The tab is used both as a reference for those accessing shared data, providing information on what is expected and when it will be shared, and as the primary tracking mechanism for contributing projects. It is used by contributing primary researchers, secondary researchers, and NIH Program and Grants Management staff.

Researchers who are starting their project need to update their Data Expected list to include all the Data Structures they are collecting under their grant and set their initial submission and sharing schedule according to the NDA Data Sharing Regimen.

To add existing Data Structures from the Data Dictionary, to request new Data Structures that are not in the Dictionary, or to request changes to existing Data Structures, click "+New Data Expected".

For step-by-step instructions on how to add existing Data Structures, request changes to an existing Structure, or request a new Data Structure, please visit the Completing Your Data Expected Tutorial.

If you are a contributing researcher creating this list for the first time, or making changes to the list as your project progresses, please note the following:

  • Although items you add to the list and changes you make are displayed, they are not committed to the system until you Save the entire page using the "Save" button at the bottom of your screen. Please Save after every change to ensure none of your work is lost.
  • If you attempt to add a new structure, the title you provide must be unique - if another structure exists with the same name your change will fail.
  • Adding a new structure to this list is the only way to request the creation of a new Data Dictionary definition.

Frequently Asked Questions

  • What is an NDA Data Structure?
    An NDA Data Structure is composed of multiple Data Elements that together make up an electronic definition of an assessment, measure, questionnaire, etc. Each such instrument will have a corresponding Data Structure.
  • What is the NDA Data Dictionary?
    The NDA Data Dictionary is comprised of electronic definitions known as Data Structures.


  • Analyzed Data
    Data specific to the primary aims of the research being conducted (e.g. outcome measures, other dependent variables, observations, laboratory results, analyzed images, volumetric data, etc.) including processed images.
  • Data Item
    Items listed on the Data Expected list in the Collection which may be an individual and discrete Data Structure, Data Structure Category, or Data Structure Group.
  • Data Structure
    A defined organization and group of Data Elements to represent an electronic definition of a measure, assessment, questionnaire, or collection of data points. Data structures that have been defined in the NDA Data Dictionary are available at https://nda.nih.gov/general-query.html?q=query=data-structure
  • Data Structure Category
    An NDA term describing the affiliation of a Data Structure to a Category, which may be disease/disorder or diagnosis related (Depression, ADHD, Psychosis), specific to data type (MRI, eye tracking, omics), or type of data (physical exam, IQ).
  • Data Structure Group
    A Data Item listed on the Data Expected tab of a Collection that indicates a group of Data Structures (e.g., ADOS or SCID) for which data may be submitted instead of a specific Data Structure identified by version, module, edition, etc. For example, the ADOS Data Structure Group includes every ADOS Data Structure such as ADOS Module 1, ADOS Module 2, ADOS Module 1 - 2nd Edition, etc. The SCID Data Structure Group includes every SCID Data Structure such as SCID Mania, SCID V Mania, SCID PTSD, SCID-V Diagnosis, and more.
  • Evaluated Data
    A new Data Structure category, Evaluated Data is analyzed data resulting from the use of computational pipelines in the Cloud and can be uploaded directly back to a miNDAR database. Evaluated Data is expected to be listed as a Data Item in the Collection's Data Expected Tab.
  • Imaging Data
    Imaging+ is an NDA term which encompasses all imaging related data including, but not limited to, images (DTI, MRI, PET, Structural, Spectroscopy, etc.) as well as neurosignal data (EEG, fMRI, MEG, EGG, eye tracking, etc.) and Evaluated Data.
  • Initial Share Date
    Initial Submission and Initial Share dates should be populated according to the NDA Data Sharing Terms and Conditions. Any modifications to these will go through the approval processes outlined above. Data will be shared with authorized users upon publication (via an NDA Study) or 1-2 years after the grant end date specified on the first Notice of Award, as defined in the applicable Data Sharing Terms and Conditions.
  • Initial Submission Date
    Initial Submission and Initial Share dates should be populated according to the NDA Data Sharing Terms and Conditions. Any modifications to these will go through the approval processes outlined above. Data for all subjects is not expected on the Initial Submission Date and modifications may be made as necessary based on the project's conduct.
  • Research Subject and Pedigree
    An NDA created Data Structure used to convey basic information about the subject such as demographics, pedigree (links family GUIDs), diagnosis/phenotype, and sample location that are critical to allow for easier querying of shared data.
  • Submission Cycle
    The NDA has two Submission Cycles per year - January 15 and July 15.
  • Submission Exemption
    An interface to notify NDA that data may not be submitted during the upcoming/current submission cycle.
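
The Submission Cycle entry above gives two fixed dates per year, January 15 and July 15. As a minimal sketch of how a project might work out which cycle comes next, the function below returns the first of those dates on or after a given day. The name `next_submission_cycle` is hypothetical, and treating a date that falls exactly on a cycle date as belonging to that cycle is an assumption, not an NDA rule.

```python
from datetime import date

# The two Submission Cycle dates named in the glossary, as (month, day).
CYCLE_DATES = ((1, 15), (7, 15))


def next_submission_cycle(today: date) -> date:
    """Return the first Submission Cycle date on or after `today`."""
    for month, day in CYCLE_DATES:
        candidate = date(today.year, month, day)
        # Assumption: a cycle date that is today still counts as upcoming.
        if candidate >= today:
            return candidate
    # Past July 15, so the next cycle is January 15 of the following year.
    return date(today.year + 1, 1, 15)
```

Under these assumptions, a date in March maps to July 15 of the same year, and a date in December maps to January 15 of the next year.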

Collection Owners and those with Collection Administrator permission may edit a collection. The following is currently available for Edit on this page:

Associated Studies

Studies that have been defined using data from a Collection are important criteria to determine the value of data shared. The Collection/Study Subjects column displays the number of subjects from this Collection that are included in a Study, out of the total number of subjects in that Study. The Data Usage column indicates whether the study is a primary analysis of the data or a secondary analysis. State indicates whether the study is private or shared with the research community.

Study Name | Abstract | Collection/Study Subjects | Data Usage | State
Prenatal substance exposure and child health: Understanding the role of environmental factors, genetics, and brain development | This current study examined the interactions of 4 most common prenatal substance exposures (PSE), which are coffee, alcohol, tobacco and cannabis with poly-environmental and genetic factors and is the first to establish the comprehensive pathway map of the PSE in the context of both poly-environmental and genetic factors and adolescent brain structure. Using the large cohort from the ABCD study, this study not only demonstrates that PSE is widely associated with offspring health, but also elucidates the existence of possible modifiable factors for those who have already had the PSE. Associations of prenatal alcohol exposure with more physical and psychological impairment, impulsivity, and better cognitive performance, and associations of prenatal coffee exposure with more externalizing and total problems remained significant after adjustment for poly-environmental and -genetic risks while associations of prenatal marijuana/tobacco exposure diminished. While prenatal alcohol exposure was associated with a larger total cortical volume and regional volumes, prenatal tobacco exposure was associated with a smaller total cortical volume and surface area, smaller regional volumes and surface areas. Prenatal coffee exposure was associated with larger postcentral gyrus volume, smaller regional volume and surface area in the pericalcarine cortex, thinner superior frontal gyrus and rostral middle frontal gyrus, thicker right lingual gyrus and isthmus-cingulate cortex. Of the four PSE, environmental factors contributed to more health associations of prenatal tobacco exposure via moderation and mediation, while genetic factors confounded more health associations of prenatal marijuana exposure. The brain mediation analysis found that at baseline, cortical volume in the right middle temporal gyrus mediated the association of prenatal alcohol exposure with externalizing problems, whereas cortical volume in the right postcentral gyrus mediated the association of prenatal coffee exposure with externalizing problems. | 2/11878 | Secondary Analysis | Shared
No Evidence for Association of Autism with Rare Heterozygous Point Mutations in Contactin-Associated Protein-Like 2 (CNTNAP2), or in Other Contactin-Associated Proteins or Contactins | Contactins and Contactin-Associated Proteins, and Contactin-Associated Protein-Like 2 (CNTNAP2) in particular, have been widely cited as autism risk genes based on findings from homozygosity mapping, molecular cytogenetics, copy number variation analyses, and both common and rare single nucleotide association studies. However, data specifically with regard to the contribution of heterozygous single nucleotide variants (SNVs) have been inconsistent. In an effort to clarify the role of rare point mutations in CNTNAP2 and related gene families, we have conducted targeted next-generation sequencing and evaluated existing sequence data in cohorts totaling 2704 cases and 2747 controls. We find no evidence for statistically significant association of rare heterozygous mutations in any of the CNTN or CNTNAP genes, including CNTNAP2, placing marked limits on the scale of their plausible contribution to risk. | 2094/4118 | Primary Analysis | Shared
Elucidating the Genetic Architecture of Autism by Deep Genomic Sequencing | ARRA Autism Sequencing Collaboration The VCF files provided as Study Results for this study are what was provided at the time the study was created and consist of the Autism Only consent group. There is an additional General Research Use cohort, but those data are not provided here in this study. To obtain data from the General Research Use cohort please visit the dbGaP Study phs000298. It should be noted that the dbGaP study has been updated since the time this study was created, and the update includes genomics data on additional subjects. http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000298.v2.p2 | 2095/2095 | Primary Analysis | Shared
Unravelling the Collective Diagnostic Power Behind the Features in the Autism Diagnostic Observation Schedule | Background: Autism is a group of heterogeneous disorders defined by deficits in social interaction and communication. Typically, diagnosis depends on the results of a behavioural examination called the Autism Diagnostic Observation Schedule (ADOS). Unfortunately, administration of the ADOS exam is time-consuming and requires a significant amount of expert intervention, leading to delays in diagnosis and access to early intervention programs. The diagnostic power of each feature in the ADOS exam is currently unknown. Our hypothesis is that certain features could be removed from the exam without a significant reduction in diagnostic accuracy, sensitivity or specificity. Objective: Determine the smallest subset of predictive features in ADOS module-1 (an exam variant for patients with minimal verbal skills). Methodology: ADOS module-1 datasets were acquired from the Autism Genetic Resource Exchange and the National Database for Autism Research. The datasets contained 2572 samples with the following labels: autism (1763), autism spectrum (513), and non-autism (296). The datasets were used as input to 4 different cost-sensitive classifiers in Weka (functional trees, LADTree, logistic model trees, and PART). For each classifier, a 10-fold cross validation was performed and the number of predictive features, accuracy, sensitivity, and specificity was recorded. Results & Conclusion: Each classifier resulted in a reduction of the number of ADOS features required for autism diagnosis. The LADTree classifier was able to obtain the largest reduction, utilizing only 10 of 29 ADOS module-1 features (96.8% accuracy, 96.9% sensitivity, and 95.9% specificity). Overall, these results are a step towards a more efficient behavioural exam for autism diagnosis. | 121/1832 | Secondary Analysis | Shared
Automated Autism Diagnosis using Phenotypic and Genotypic Attributes: Phase I | The ultimate goal of this project is to develop a predictive system that can automate the diagnosis process for autism using phenotypic and genotypic attributes for classification. At this time, only a first phase is being pursued: starting with scores from Autism Diagnostic Observation Schedule (ADOS) reports, use data-mining techniques to select the smallest set of the most informative evaluation points that can lead to similar behavioral diagnoses as using all report features. The effort began in March, 2016 after data access to NDAR was granted. This report describes the results from that date through the end of December 2016. | 121/1045 | Secondary Analysis | Shared
Germline Mutations in Predisposition Genes in Pediatric Cancer | Background The prevalence and spectrum of predisposing mutations among children and adolescents with cancer are largely unknown. Knowledge of such mutations may improve the understanding of tumorigenesis, direct patient care, and enable genetic counseling of patients and families. Methods In 1120 patients younger than 20 years of age, we sequenced the whole genomes (in 595 patients), whole exomes (in 456), or both (in 69). We analyzed the DNA sequences of 565 genes, including 60 that have been associated with autosomal dominant cancer-predisposition syndromes, for the presence of germline mutations. The pathogenicity of the mutations was determined by a panel of medical experts with the use of cancer-specific and locus-specific genetic databases, the medical literature, computational predictions, and second hits identified in the tumor genome. The same approach was used to analyze data from 966 persons who did not have known cancer in the 1000 Genomes Project, and a similar approach was used to analyze data from an autism study (from 515 persons with autism and 208 persons without autism). Results Mutations that were deemed to be pathogenic or probably pathogenic were identified in 95 patients with cancer (8.5%), as compared with 1.1% of the persons in the 1000 Genomes Project and 0.6% of the participants in the autism study. The most commonly mutated genes in the affected patients were TP53 (in 50 patients), APC (in 6), BRCA2 (in 6), NF1 (in 4), PMS2 (in 4), RB1 (in 3), and RUNX1 (in 3). A total of 18 additional patients had protein-truncating mutations in tumor-suppressor genes. Of the 58 patients with a predisposing mutation and available information on family history, 23 (40%) had a family history of cancer. Conclusions Germline mutations in cancer-predisposing genes were identified in 8.5% of the children and adolescents with cancer. Family history did not predict the presence of an underlying predisposition syndrome in most patients. | 723/723 | Secondary Analysis | Shared
Data-Driven Generation of Synthetic Behavioral Feature Vectors Modeling Children with Autism Spectrum Disorders | Behavioral data on children with Autism Spectrum Disorders (ASD) are available thanks to standardized diagnostic tools, such as the Autism Diagnostic Observation Schedule (ADOS). This data can be of great use to enhance the learning and reasoning of agents interacting with children with ASD. However, the amount of such available data is limited and may not prove useful by itself to inform the algorithms of complex agents. To address this data scarcity problem, we present a method for generating synthetic behavioral data in the form of feature vectors characterizing a wide range of children with ASD. Our method relies on a thorough analysis and partition of the feature space based on a real dataset containing the ADOS scores of 279 children. We first analyze the real dataset using dimensionality reduction techniques, then introduce data-driven descriptors that partition the feature space into regions naturally arising from the data. We end by presenting a descriptor-based sampling method to generate synthetic feature vectors that successfully preserves the correlation structure of the real dataset. | 47/173 | Secondary Analysis | Shared
Patterns and rates of exonic de novo mutations in autism spectrum disorders. | Notes: Data submitted to NDAR did not include interview age. Publication Abstract: Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors. | 104/104 | Primary Analysis | Shared
* Data not on individual level
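
Each Collection/Study Subjects value in the table above, such as 2094/4118, pairs the number of subjects drawn from this Collection with the total number of subjects in the Study. The sketch below shows one way to split such a value and express it as a share of the Study. The function names are hypothetical, and the check that the Collection count cannot exceed the Study total is an assumption made here for illustration, not a documented NDA constraint.

```python
from typing import Tuple


def parse_subject_counts(cell: str) -> Tuple[int, int]:
    """Split a 'collection/study' cell into two integer subject counts."""
    collection_text, study_text = cell.strip().split("/")
    collection_subjects = int(collection_text)
    study_subjects = int(study_text)
    # Assumed sanity checks: a positive study total that bounds the count.
    if study_subjects <= 0 or collection_subjects > study_subjects:
        raise ValueError(f"implausible subject counts: {cell!r}")
    return collection_subjects, study_subjects


def collection_share(cell: str) -> float:
    """Fraction of the Study's subjects that come from this Collection."""
    collection_subjects, study_subjects = parse_subject_counts(cell)
    return collection_subjects / study_subjects
```

A value of 104/104 gives a share of 1.0, meaning every subject in that Study comes from this Collection, while 2/11878 gives a share well below one percent.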

NDA Help Center

Collection - Associated Studies

Clicking on the Study Title will open the study details in a new internet browser tab. The Abstract is available for viewing, providing the background explanation of the study, as provided by the Collection Owner.

Primary v. Secondary Analysis: The Data Usage column will have one of these two choices. An associated study listed as Primary Analysis indicates that at least some, and potentially all, of the data used was originally collected by the creator of the NDA Study. Secondary Analysis indicates that the Study owner was not involved in the collection of the data, which may be used as supporting data.

Private v. Shared State: A private study is available only to users who are able to access the collection. A shared study is accessible to the general public.

Frequently Asked Questions

  • How do I associate a study to my collection?
    Studies are associated to the Collection automatically when the data is defined in the Study.


  • Associated Studies Tab
    A tab in a Collection that lists the NDA Studies that have been created using data from that Collection including both Primary and Secondary Analysis NDA Studies.