Cloud Access Overview
The mission of the National Institute of Mental Health Data Archive (NDA) is to make research data available for reuse. Data collected across projects can be aggregated using the GUID, so that clinical data and the results of imaging, genomics, and other experiments collected from the same participants can be made available together. In this way, separate experiments on genotypes and brain volumes can inform the research community about the more than one hundred thousand subjects now contained in the NDA. The NDA’s cloud computation capability provides a framework in support of this infrastructure.
How does it work?
The NDA holds and protects rich datasets (fastq, brain imaging) in object-based storage (Amazon S3). To facilitate access, the NDA supports the deployment of data packages (created through the NDA Query tools) to an Oracle database hosted on Amazon Web Services (AWS). These databases, called miNDARs, contain a table for each data structure in a package. Associated data files are available via read-only access to the NDA’s S3 objects. Addresses for the objects in a package are provided in the miNDAR table titled S3_LINKS. By providing this interface, the NDA envisions real-time computation against rich datasets that can be initiated without the need to download full data packages. Furthermore, a new category of data structure has been created called "evaluated data." Tables for these structures will be created for each miNDAR, allowing researchers who use NDA cloud capabilities and computational pipelines to write analyzed data directly back to the miNDAR database. This will enable the NDA to make these data available to the general research community when appropriate.
miNDARs can also be populated with your own data, which can then be uploaded back into the NDA, streamlining data submission from a hosted database.
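As an illustration of the S3_LINKS lookup described above, the following is a minimal sketch that collects object addresses from that table. It assumes only that the connection follows the Python DB-API 2.0 (as Oracle drivers do) and that addresses are stored as strings beginning with "s3://". The function name list_s3_uris, the demo column names, and the example bucket are hypothetical, and an in-memory SQLite database stands in for a real miNDAR.

```python
import sqlite3


def list_s3_uris(conn, table="S3_LINKS"):
    """Return every string value in `table` that looks like an S3 URI.

    `conn` is any DB-API 2.0 connection. The table's column names are not
    assumed: each row is scanned for values starting with "s3://".
    """
    # Table names cannot be bound as query parameters, so validate first.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")
    uris = []
    for row in cur.fetchall():
        for value in row:
            if isinstance(value, str) and value.startswith("s3://"):
                uris.append(value)
    return uris


# Stand-in for a miNDAR: an in-memory SQLite table with hypothetical columns.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE S3_LINKS (ID INTEGER, LINK TEXT)")
demo.executemany(
    "INSERT INTO S3_LINKS VALUES (?, ?)",
    [
        (1, "s3://example-bucket/sub-01/image.nii.gz"),
        (2, "s3://example-bucket/sub-02/reads.fastq"),
    ],
)
demo_uris = list_s3_uris(demo)
```

Against a real miNDAR, the same function would be passed a connection opened with the emailed connection details in place of the SQLite stand-in.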
How do I get started?
The option to launch data packages to a cloud-hosted database will be available during package creation. You can deploy previously generated data packages as well as new ones.
To move data to Oracle, first create a data package in the NDA. Then, following registration, enter the package ID and the credentials requested on the miNDAR tab. This starts the miNDAR creation process, which takes approximately 10 minutes. Once the miNDAR is created, its connection details will be emailed to you and can be used with your credentials to establish a connection.
Access to download data to non-AWS internet addresses is limited; please read about our user download threshold. Access from AWS internet addresses is unlimited.
Files included in a data package are accessible from Amazon Web Services (AWS) S3 Object Storage. Each miNDAR package has a table named “S3_LINKS” containing URIs for all objects in that package. Data from these objects can be streamed or downloaded using direct web service calls to the Amazon S3 API, a third-party tool, or client libraries.
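As one way of using a client library for this, the sketch below splits an S3 URI into its bucket and key and then downloads the object with boto3. It assumes the URIs take the usual s3://bucket/key form and that temporary credentials have already been obtained. The helper names split_s3_uri and download_object and the example bucket are hypothetical, and boto3 is a third-party package that is imported only when a download is actually requested.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/object' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI is missing a bucket or key: {uri!r}")
    return bucket, key


def download_object(uri, dest_path, access_key, secret_key, session_token):
    """Download one object to `dest_path` using temporary AWS credentials."""
    import boto3  # third-party; imported here so the parser works without it

    bucket, key = split_s3_uri(uri)
    client = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token,
    )
    client.download_file(bucket, key, dest_path)


# Parsing alone needs no network access or credentials.
example_bucket, example_key = split_s3_uri(
    "s3://example-bucket/sub-01/image.nii.gz"
)
```

Splitting on the first "/" rather than using a general URL parser keeps characters such as "?" or "#" inside the object key intact.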
Authorization to access S3 objects requires authentication with AWS using either temporary AWS credentials or a presigned URL. Both forms of authorization are time-limited. To obtain them, each user must authenticate with the package service and request credentials or presigned URLs for one or more files within a specific package.
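Because both forms of authorization expire, a long-running job needs to request fresh ones from time to time. The sketch below shows one possible pattern: cache the temporary credentials and fetch new ones shortly before they expire. The class names, the five-minute safety margin, and the fetch callable are all hypothetical; the actual request to the package service is deliberately left to the caller, since its interface is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TemporaryCredentials:
    access_key: str
    secret_key: str
    session_token: str
    expires_at: datetime  # timezone-aware, UTC

    def is_expired(self, now=None, margin=timedelta(minutes=5)):
        """True once `now` is within `margin` of the expiry time."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires_at - margin


class CredentialCache:
    """Reuse credentials until they are close to expiry, then refetch."""

    def __init__(self, fetch):
        # `fetch` is a caller-supplied callable (hypothetical) that contacts
        # the package service and returns a TemporaryCredentials instance.
        self._fetch = fetch
        self._current = None

    def get(self, now=None):
        if self._current is None or self._current.is_expired(now):
            self._current = self._fetch()
        return self._current
```

A caller would construct CredentialCache with its own function for requesting credentials and call get() before each S3 request.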
Users may access the web service through the Swagger user interface or by writing their own tool.
Examples are provided on the NDA GitHub Page.