Getting Access to Shared Data
Summary information on the data shared in NDA is available in the NDA Query Tool without the need for an NDA user account. To request access to record-level human subject data, you must submit a Data Access Request.
NDA Permission Groups consist of one or multiple NDA Collections that contain data with the same subject consents. An NDA Collection is a virtual container for data and other information related to a project/grant. It provides important information about the project, funding amounts, reported enrollment, data sharing schedule, and results. Broad Use Permission Groups consist of one or multiple NDA Collections that contain data that all have been consented for broad research use. Controlled Access Permission Groups consist of one or multiple NDA Collections that contain data with the same subject consent-based data use limitations. Open Access Permission Groups consist of one or multiple NDA Collections that contain data that all have been consented for broad research use and can be accessed by users who are not affiliated with an NIH-recognized research institution.
To get started, create an NDA user account or log in to your existing one. Users with NDA credentials may submit Data Access Requests for one permission group at a time from the NDA Permissions Dashboard. Each request includes an NDA Data Use Certification signed by the lead recipient and an authorized Signing Official from the recipient’s research institution. If you do not know your institution's Signing Official, contact us at the NDA Help Desk. Additional research staff from the same institution may be added to the Data Use Certification by completing the Senior/Key Person Profile (Collaborating Investigator) section. There is no limit to the number of additional staff members who can be added to a Data Access Request, so only one request is needed for your entire lab. All recipients on a Data Use Certification must be affiliated with the recipient’s research institution. Data Access Requests for a given NDA Permission Group are reviewed by an NIH-staffed Data Access Committee.
NDA users submitting Data Access Requests for Broad Use and Controlled Access Permission Groups must be sponsored by an NIH-recognized institution with a Federalwide Assurance and have a research-related need to access NDA data. NDA users submitting Data Access Requests for Controlled Access Permission Groups must also adhere to consent-based data use limitations. Data Access Requests for Open Access Permission Groups do not require institutional sponsorship. NDA Federated Repositories have their own access requirements. Detailed information about NDA Permission Groups is maintained at https://nda.nih.gov/about/about-us.html.
The mission of the National Institute of Mental Health Data Archive (NDA) is to make research data available for reuse. Data collected across projects can be aggregated and made available using the GUID, including clinical data and the results of imaging, genomic, and other experiments on the same participants. In this way, separate experiments on genotypes and brain volumes can inform the research community about the more than one hundred thousand subjects now contained in the NDA. The NDA’s cloud computation capability provides a framework in support of this infrastructure.
How does it work?
NDA holds and protects rich datasets (fastq, brain imaging) in object-based storage (Amazon S3). To facilitate access, the NDA supports the deployment of data packages (created through the NDA Query tools) to an Oracle database hosted on Amazon Web Services, called a miNDAR. Each miNDAR contains a table for each data structure in the package. Associated data files are available via read-only access to NDA’s S3 objects. Addresses for the objects in a package are provided in the miNDAR table titled S3_LINKS. By providing this interface, the NDA envisions real-time computation against rich datasets that can be initiated without the need to download full data packages. Furthermore, a new category of data structure has been created called "evaluated data." Tables for these structures will be created for each miNDAR, allowing researchers using NDA cloud capabilities and computational pipelines to write data directly back to the miNDAR database. This will enable the NDA to make this data available to the general research community when appropriate.
miNDARs can also be populated with your own data and uploaded directly back into the NDA for a streamlined data submission directly from a hosted database.
How do I get started?
The option to launch data packages to a cloud hosted database will be available during package creation. You can deploy previously generated data packages as well as new ones.
To move data to Oracle, first create a package in the NDA. Then, following registration, enter the package ID and the credentials requested on the miNDAR tab. This will start the miNDAR creation process, which takes approximately 10 minutes. Once the miNDAR is created, its connection details will be emailed to you and can be used to establish a connection with your credentials.
Access to download data to non-AWS internet addresses is limited to 20 Terabytes over 30 days. For more detail, including examples, please read about our user download threshold.
Files included in a package are accessible from Amazon Web Services (AWS) S3 Object Storage. Each miNDAR package will have a table titled “S3_LINKS” containing URIs for all objects in that package. Using direct web service calls to the AWS S3 API, a third-party tool, or client libraries, data from these objects can be streamed or downloaded.
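As a rough illustration of working with the addresses in S3_LINKS, the sketch below splits an S3 URI into the bucket and key that S3 client libraries generally expect as separate arguments. It assumes the addresses follow the common s3://bucket/key form; the function name and the example address are hypothetical and are not taken from NDA.

```python
def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) pair."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI is missing a bucket or key: {uri!r}")
    return bucket, key


# Hypothetical address standing in for one row read from S3_LINKS.
bucket, key = split_s3_uri("s3://example-bucket/submission_1/sub-01/scan.nii.gz")
print(bucket)
print(key)
```

The resulting pair can then be passed to whichever S3 client or tool you use to stream or download the object.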
For security purposes, temporary AWS credentials are needed to access the S3 objects. Temporary credentials are issued by authenticating with a web service using your NDA username and password. AWS credentials can be obtained directly from the web service (see examples on our GitHub page) or from the download manager, which is available in both a GUI and a command line version.
For the GUI version, go to the 'Tools' menu and select 'Generate AWS Credentials'.
For the command line download manager, use the following syntax:
java -jar downloadmanager.jar --username user --password pass --g
For help with the command line download manager, use the following switches: -h, --help
The web service provides temporary credentials in three parts:
- an access key,
- a secret key,
- and a session token
All three parts are needed in order to authenticate properly with S3 and retrieve data.
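One common way to hand all three parts to AWS tooling is through the standard AWS environment variables, which the AWS SDKs and the AWS command line interface read by default. The sketch below is only an illustration: the function name and the placeholder values are hypothetical, and real values would come from the NDA web service or the download manager.

```python
import os


def aws_env_from_credentials(access_key, secret_key, session_token):
    """Map the three credential parts to the standard AWS environment variable names."""
    parts = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_SESSION_TOKEN": session_token,
    }
    # Temporary credentials do not work unless all three parts are present.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError("missing credential parts: " + ", ".join(missing))
    return parts


# Placeholder values for illustration only.
creds = aws_env_from_credentials("EXAMPLEACCESSKEY", "example-secret", "example-token")

# Environment for a child process (for example, an S3 command line tool).
child_env = {**os.environ, **creds}
```

Building a separate environment for the child process, rather than changing the current one, keeps short-lived keys from lingering in a long-running session.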
Additionally, the web service returns an expiration timestamp for the token in YYYY-MM-DDTHH:MM:SS-TZ format (TZ=HH:MM). New keys can be retrieved at any time. A service-oriented approach allows for implementation of pipeline procedures which can request new keys at the appropriate stage of data processing.
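A pipeline stage might decide when to request new keys along the lines of the sketch below. It assumes the expiration value looks like 2030-01-01T12:00:00-05:00, which is one reading of the format described above; the function name and the five-minute safety margin are illustrative choices, not NDA recommendations.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expiration, now=None, margin=timedelta(minutes=5)):
    """Return True when the token expires within `margin` of `now`."""
    # Assumed format: ISO 8601 with a UTC offset, e.g. 2030-01-01T12:00:00-05:00.
    expires_at = datetime.fromisoformat(expiration)
    if expires_at.tzinfo is None:
        raise ValueError("expiration timestamp has no UTC offset")
    if now is None:
        now = datetime.now(timezone.utc)
    return now + margin >= expires_at


# Hypothetical timestamp; 12:00 at UTC-05:00 is 17:00 UTC.
expiry = "2030-01-01T12:00:00-05:00"
early = datetime(2030, 1, 1, 16, 0, tzinfo=timezone.utc)
late = datetime(2030, 1, 1, 16, 58, tzinfo=timezone.utc)
print(needs_refresh(expiry, now=early))
print(needs_refresh(expiry, now=late))
```

Checking before each long-running step, rather than only at startup, lets a pipeline fetch fresh keys before a download or computation would otherwise fail partway through.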