Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains | Although de novo missense mutations have been predicted to account for more cases of autism than gene-truncating mutations, most research has focused on the latter. We identified the properties of de novo missense mutations in patients with neurodevelopmental disorders (NDDs) and highlight 35 genes with excess missense mutations. Additionally, 40 amino acid sites were recurrently mutated in 36 genes, and targeted sequencing of 20 sites in 17,600 NDD patients identified 21 new patients with identical missense mutations. One recurrent site (p.Ala636Thr) occurs in a glutamate receptor subunit, GRIA1. This same amino acid substitution in the homologous but distinct mouse glutamate receptor subunit Grid2 is associated with Lurcher ataxia. Phenotypic follow-up in five individuals with GRIA1 mutations shows evidence of specific learning disabilities and autism. Overall, we find significant clustering of de novo mutations in 200 genes, highlighting specific functional domains and synaptic candidate genes important in NDD pathology. | 13/18812 | Primary Analysis | Shared |
Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci | Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection (SSC) (N = 2,591 families) replicates prior findings of strong association with autism spectrum disorders (ASDs) and confirms six risk loci (1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2). The addition of published CNV data from the Autism Genome Project (AGP) and exome sequencing data from the SSC and the Autism Sequencing Consortium (ASC) shows that genes within small de novo deletions, but not within large dnCNVs, significantly overlap the high-effect risk genes identified by sequencing. Alternatively, large dnCNVs are found likely to contain multiple modest-effect risk genes. Overall, we find strong evidence that de novo mutations are associated with ASD apart from the risk for intellectual disability. Extending the transmission and de novo association test (TADA) to include small de novo deletions reveals 71 ASD risk loci, including 6 CNV regions (noted above) and 65 risk genes (FDR ≤ 0.1). | 8190/9975 | Secondary Analysis | Shared |
Complete Realignment of Whole Exome Sequencing data from 2415 families in SSC Collection | Whole Exome Sequencing has been completed for ~ 2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to realign sequencing data from all three collections in a uniform manner using the latest toolchains and algorithms available, which can be used as a resource for the entire ASD Community. Original sequence data have been realigned to a single reference genome (1000 Genomes / GRCh37) using BWA, Picardtools, Samtools, and some custom Python scripts. QC summary data were generated as part of the realignment process using the aforementioned tools in addition to QPLOT and some custom scripts. Complete methods, including source code for pipeline and custom scripts, can be found at: https://github.com/nkrumm/asd-jre-public. The data package for this study represents the genomics_subject02, genomics_sample03, and omics_qa01 data structures which include realigned BAM files and QC files (i.e., QPLOT output and BAM header files). Variant calling and annotation for these data are provided in NDAR Studies 348 (https://ndar.nih.gov/study.html?id=348) and 349 (https://ndar.nih.gov/study.html?id=349). | 9047/9047 | Secondary Analysis | Shared |
The contribution of mosaic variants to autism spectrum disorder | De novo mutation is highly implicated in autism spectrum disorder (ASD). However, the contribution of post-zygotic mutation to ASD is poorly characterized. We performed both exome sequencing of paired samples and analysis of de novo variants from whole-exome sequencing of 2,388 families. While we find little evidence for tissue-specific mosaic mutation, multi-tissue post-zygotic mutation (i.e. mosaicism) is frequent, with detectable mosaic variation comprising 5.4% of all de novo mutations. We identify three mosaic missense and likely-gene disrupting mutations in genes previously implicated in ASD (KMT2C, NCKAP1, and MYH10) in probands but none in siblings. We find a strong ascertainment bias for mosaic mutations in probands relative to their unaffected siblings (p = 0.003). We build a model of de novo variation incorporating mosaic variants and errors in classification of mosaic status and from this model we estimate that 33% of mosaic mutations in probands contribute to 5.1% of simplex ASD diagnoses (95% credible interval 1.3% to 8.9%). Our results indicate a contributory role for multi-tissue mosaic mutation in some individuals with an ASD diagnosis. | 9047/9047 | Secondary Analysis | Shared |
Copy Number Variants from SSC Collection ~ 2500 families by two Methods (XHMM and Conifer) | XHMM was run on a set of realigned BAM files from the SSC collection (see NDAR Study 334 for BAM files) using the attached scripts. These scripts calculate depth of coverage using GATK, pull the GATK output from an instance on NDAR's cloud, merge the output of GATK into a single matrix, process the read depth matrix (filter, center), normalize the matrix using principal component analysis (PCA), process the normalized read depth matrix (filter, z-score), run a hidden Markov model (HMM) on this matrix to identify CNVs in the normalized data, and generate family-level VCFs from the XHMM data. The XHMM output comprises the coverage summary tables produced by GATK (sample_interval_statistics, sample_interval_summary, sample_summary, sample_statistics), principal component data files, a genotyped CNV output VCF file, and some example plots and graphics. For this study, the GATK output is available.
Additional information about XHMM is available here: http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml | 9041/9041 | Secondary Analysis | Shared |
Variant Recalling (FreeBayes) from Whole Exome Sequencing data for 2415 families in SSC Collection | Whole Exome Sequencing has been completed for ~ 2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to call and annotate SNPs and Indels on data from all three collections in a uniform manner using the latest toolchains and algorithms available.
Variant calls from this study were generated using FreeBayes, Famseq, and some custom scripts; annotation was provided by SnpEff, dbNSFP, and vcftools. Note that variants were called in batches with ~ 20 families per batch. Complete methods, including source code for pipeline and custom scripts can be found at: https://github.com/nkrumm/asd-jre-public
The data package for this study includes the genomics_sample02 and genomics_sample03 structures with annotated and un-annotated VCF files for each family. Another NDAR Study (348) is available with VCF files generated using GATK (https://ndar.nih.gov/study.html?id=348), and the complete set of BAM files used for variant calling is available in NDAR Study 334 (https://ndar.nih.gov/study.html?id=334) | 8976/8976 | Secondary Analysis | Shared |
Variant Recalling (GATK) from Whole Exome Sequencing data for 2415 families in SSC Collection | Whole Exome Sequencing has been completed for ~ 2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to call and annotate SNPs and Indels on data from all three collections in a uniform manner using the latest toolchains and algorithms available.
Variant calls from this study were generated using GATK, Famseq, and some custom scripts; annotation was provided by SnpEff, dbNSFP, and vcftools. Note that variants were called in batches with ~ 20 families per batch. Complete methods, including source code for pipeline and custom scripts can be found at: https://github.com/nkrumm/asd-jre-public
The data package for this study represents the genomics_subject02 and genomics_sample03 structures, which include annotated and un-annotated VCF files for each family. Another NDAR Study (349) is available with VCF files generated using FreeBayes (https://ndar.nih.gov/study.html?id=349), and the complete set of BAM files used for variant calling is available in NDAR Study 334 (https://ndar.nih.gov/study.html?id=334) | 8976/8976 | Secondary Analysis | Shared |
Excess of rare inherited truncating mutations in autism | In order to quantify the effect of private, inherited mutations on autism risk, we generated a callset of both inherited and de novo single nucleotide variants (SNVs) and copy number variants (CNVs) across 2,377 Simons Simplex Collection families. The publicly deposited dataset includes 1,786 parents-child-unaffected sibling "quads" allowing us to compare burden of inherited and de novo mutations between affected and unaffected siblings in simplex autism families. We find that private, inherited truncating SNV mutations in conserved genes are significantly enriched in probands (odds ratio = 1.14, p = 0.0002) and more likely to be transmitted to children with autism when compared to their unaffected siblings (p < 0.0001). We find that this effect becomes more pronounced with increasing gene conservation (Residual Variation Intolerance Score, RVIS). Likewise, we observe a similar bias for inherited CNVs specifically for small (<100 kbp), maternally inherited events (p = 9.6x10^-3) that are enriched in CHD8 target genes (OR = 3.6, p = 2.0x10^-3). We quantified autism spectrum disorder (ASD) risk for de novo and inherited CNVs and SNVs by using a conditional logistic regression model. Independent from de novo mutations, private truncating SNVs and rare, inherited CNVs contribute an increase in risk with odds ratios of 1.11 (p = 0.0002) and 1.23 (p = 0.01), respectively. Our results indicate a statistically independent role for inherited mutations in ASD risk and identify additional high-impact risk candidate genes (e.g., RIMS1, CUL7, LZTR1 and CC2D2A) where transmitted mutations may create a sensitized background for autism but are unlikely to be necessary and sufficient for the disorder. | 8911/8911 | Secondary Analysis | Shared |
Evolutionary and Genetic Analysis of Synonymous Nucleotide Substitutions in Subjects with Autism Spectrum Disorders | The director of the project, Dr. Igor Rogozin, analyzed a modest collection of synonymous nucleotide substitutions from two small databases of mutations observed in autistic subjects [1]. Dr. Rogozin and his colleagues found that there was a statistically significant tendency for these synonymous nucleotide substitutions to replace a reference codon supportive of faster protein translation with a non-reference codon that is known to be associated with slower translation [1]. In the proposed study, we wish to test the codon replacement properties of synonymous substitutions reported in the much larger NDAR database, including whether the property of propensity to slower translation holds in a much larger data set of mutations. We also wish to compare the characteristics of the synonymous and nonsynonymous substitutions, using established techniques in genetics.
[1] Poliakov E, Koonin EV, Rogozin IB. Impairment of translation in neurons as a putative causative factor for autism. Biology Direct. 2014; 9:16. | 7200/7200 | Secondary Analysis | Shared |
The evolution and population diversity of human-specific segmental duplications | Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (n=80 genes/33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed “core duplicons”, and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (e.g., TCAF1/2), we highlight ten gene families (e.g., ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing, and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits. | 1536/6360 | Primary Analysis | Shared |
Mitochondrial DNA mutations in Autism Spectrum Disorder | Mitochondrial dysfunction is frequently observed in Autism Spectrum Disorders (ASD). Thus, variations in the mitochondrial DNA (mtDNA) sequences may contribute to increased ASD risks. In the current study, we evaluated mtDNA variations, including homoplasmy and heteroplasmy, in 903 ASD individuals along with their mothers and non-ASD siblings by using off-target reads from whole-exome sequencing data sets of Simons Foundation Autism Research Initiative (SFARI) Simons Collection available on NDAR. We found that heteroplasmic mutations in ASD individuals were enriched at non-polymorphic mtDNA sites (P = 0.0015) compared to their non-ASD siblings, which were more likely to confer deleterious effects than heteroplasmies at polymorphic mtDNA sites. Accordingly, we observed a ~1.5-fold enrichment of nonsynonymous mutations as well as a ~2.2-fold enrichment of predicted pathogenic mutations (P < 0.003) in ASD individuals compared to their non-ASD siblings. Our genetic findings substantiate pathogenic mtDNA mutations as a potential cause for ASD and synergize with recent work calling attention to their unique metabolic phenotypes for diagnosis and treatment of ASD. | 2479/2709 | Secondary Analysis | Shared |
Identification of differentially methylated regions (DMRs) and cytosine sites (DMCs) in DNA methylation data of autism cases and unaffected siblings | We compared blood-based DNA methylation profiles between children with autism spectrum disorder (ASD) and carefully matched, unrelated neurotypical control children. Using a sequencing-based method, we identified ASD-specific differentially methylated regions (DMRs) and cytosine sites (DMCs). We carried out comparative analyses with datasets from the NDA Collection 1650 (SFARI - DNA Methylation Analysis Cohort) that measured blood DNA methylation in ASD using microarray technology. We also identified DMRs and DMCs using metilene and minfi pipelines in the DNAm datasets from the NDA Collection 1650. | 601/728 | Secondary Analysis | Shared |
Phenotypic subtyping and re-analysis of existing methylation data from autistic probands in simplex families reveal ASD subtype-associated differentially methylated genes and biological functions | Autism spectrum disorder (ASD) describes a group of neurodevelopmental disorders with core deficits in social communication and manifestation of restricted, repetitive, and stereotyped behaviors. Despite the core symptomatology, ASD is extremely heterogeneous with respect to the severity of symptoms and behaviors. This heterogeneity presents an inherent challenge to all large-scale genome-wide 'omics analyses. In the present study, we address this heterogeneity by stratifying ASD probands from simplex families according to severity of behavioral scores on the Autism Diagnostic Interview-Revised diagnostic instrument, followed by re-analysis of existing DNA methylation data from individuals in three ASD subphenotypes in comparison to that of their respective unaffected siblings. We demonstrate that subphenotyping of cases enables the identification of over 1.6 times the number of statistically significant differentially methylated genes (DMGs) between cases and controls, compared to that identified when all cases are combined. Our analyses also reveal ASD-related neurological functions and comorbidities that are enriched among DMGs in each phenotypic subgroup but not in the combined case group. These findings may aid in the development of subtype-directed diagnostics and therapeutics. | 129/584 | Secondary Analysis | Shared |
Embryonic lethal genetic variants and chromosomally normal pregnancy loss | Objective: To examine whether rare potentially damaging genetic variants are associated with chromosomally normal pregnancy loss and estimate the magnitude of the association.
Design: Case-control.
Setting: Cases comprise 19 chromosomally normal loss conceptus-parent trios. They derive from a consecutive series of karyotyped losses at one hospital. Controls comprise 547 unaffected siblings of autism cases-parent trios from the National Database for Autism Research.
Main outcome measures: The rate of predicted damaging variants in the exome (loss of function and missense–damaging) and the proportions of probands with at least one such variant among cases versus controls.
Results: The proportions of probands with at least one rare predicted damaging variant were 36.8% among cases and 22.9% among controls (odds ratio (OR)=2.0, 99% CI 0.5-7.3). No case has a variant in a fetal anomaly gene. The proportion with variants in possibly embryonic lethal genes was increased in case probands (OR=14.5, 99% CI 1.5-89.7); variants occurred in BAZ1A, FBN2 and TIMP2.
Conclusion: Rare genetic variants in the conceptus may be a cause of chromosomally normal loss. A larger sample is needed to estimate the magnitude of the association with precision and to identify relevant biological pathways.
| 547/547 | Secondary Analysis | Shared |
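The odds ratio reported in the pregnancy-loss entry above can be checked with a short calculation. This is a minimal sketch, not the study's own analysis: the carrier counts (7 of 19 cases, 125 of 547 controls) are assumptions back-calculated from the reported 36.8% and 22.9%, and the function name `odds_ratio` is hypothetical. The reported 99% confidence interval is not reproduced here because the listing does not state the method used to compute it.

```python
def odds_ratio(carrier_cases, total_cases, carrier_controls, total_controls):
    """Odds ratio from a 2x2 table, given carrier counts and group sizes."""
    noncarrier_cases = total_cases - carrier_cases
    noncarrier_controls = total_controls - carrier_controls
    # Cross-product ratio: (a * d) / (b * c)
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


# Assumed counts, inferred from the reported percentages (not stated in the listing).
CASE_CARRIERS, CASES = 7, 19
CONTROL_CARRIERS, CONTROLS = 125, 547

case_pct = round(100 * CASE_CARRIERS / CASES, 1)
control_pct = round(100 * CONTROL_CARRIERS / CONTROLS, 1)
estimate = odds_ratio(CASE_CARRIERS, CASES, CONTROL_CARRIERS, CONTROLS)

print(f"cases: {case_pct}%, controls: {control_pct}%, OR = {estimate:.2f}")
```

Under these assumed counts the percentages match the reported values and the point estimate rounds to the reported odds ratio of 2.0.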