Accessing Files in the Cloud

The NDA holds and protects rich datasets (fastq, brain imaging, etc.) in object-based storage (Amazon S3). To facilitate access to these datasets, the NDA supports two things: deployment of data packages (created through the standard NDA query tools) to cloud-hosted Oracle databases called miNDARs (mini-NDARs), and read-only access to associated data files, such as omics or imaging data, directly where they are stored in S3. By providing this access, the NDA envisions real-time computation against rich datasets that can be initiated without first downloading the data. Direct download is also currently supported if necessary.

This tutorial series demonstrates the basic process to deploy a data package to a miNDAR, use it to obtain the S3 endpoints of data files, and use nda-tools to download the files.
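As a small illustration of working with the S3 locations mentioned above, the following Python sketch splits an S3 URI into its bucket and object key using only the standard library. The function name split_s3_uri and the example URI are hypothetical and not taken from the NDA; the actual locations returned by a miNDAR may be laid out differently.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into a (bucket, key) pair."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the network location; the key is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical location, for illustration only (not a real NDA path).
example = "s3://example-bucket/submission_0000/sub-01/image.nii.gz"
bucket, key = split_s3_uri(example)
print(bucket)
print(key)
```

Separating the bucket from the key is usually the first step before handing a location to any S3 client or download tool.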

Creating a Package

Last Modified on Oct 21, 2022

NOTE: This text has been updated since the tutorial was created. The recording is not yet updated.

Once you have used one or more query tools to add one or more filters to your cart for download, you can view and edit those filters in the cart panel in the upper-right corner of the page. Clicking Create Data Package/Add Data to Study will take you to the Data Packaging Page, which shows the data currently returned by your query. The left panel displays the source Collections of all the data, and the right panel displays a list of all the data structures included. You can check or uncheck Collections and structures to include them in, or remove them from, this particular package. An individual must be in both a checked Collection and a checked structure to be included.
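The inclusion rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not the NDA's implementation: the Record type, its field names, and the sample values are all hypothetical, and the sketch assumes that an individual is included when at least one of their records falls in both a checked Collection and a checked structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    guid: str        # subject identifier (hypothetical field name)
    collection: str  # source Collection (hypothetical field name)
    structure: str   # data structure (hypothetical field name)


def included_subjects(records, checked_collections, checked_structures):
    """Return the GUIDs that have a record in a checked Collection
    AND a checked structure."""
    return {
        r.guid
        for r in records
        if r.collection in checked_collections
        and r.structure in checked_structures
    }


# Made-up sample data, for illustration only.
records = [
    Record("GUID_A", "Collection 1", "image03"),
    Record("GUID_B", "Collection 2", "image03"),
    Record("GUID_C", "Collection 1", "eeg_sample"),
]

selected = included_subjects(
    records,
    checked_collections={"Collection 1"},
    checked_structures={"image03"},
)
print(sorted(selected))
```

In this sample, unchecking either the Collection or the structure of a record is enough to drop that record from the package, which mirrors the "both must be checked" rule described above.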

You can also click Find All Subject Data to drop the existing query and replace it with a Query by GUID filter of all data for all currently included subjects. Once your selections are made, you can click Create Package to name and begin creating your download package. You will have the option to include or exclude associated files. These are associated data files such as omics, EEG, images, etc.