Policy for the NIMH Data Archive (NDA)
The National Institute of Mental Health (NIMH) is interested in advancing research to identify factors that influence the prevention, cause, diagnosis, and treatment of a variety of diseases/disorders. Serving this aim, the NIMH Data Archive (NDA) now provides infrastructure for data collection, retrieval, and results reporting related to human subjects research. The NIMH hopes that the broad research use of the NDA will accelerate the advancement of research.
A central function of the NDA is to store and to link together clinical, omics, neuro-signal recordings, and other data derived from individuals who participate in research studies. Depending on the goals of the research, investigators may collect any combination of data from research participants during a single study. The NDA provides the infrastructure to store, search across, and analyze these varied types of data. In addition, the NDA provides longitudinal storage of a research participant’s de-identified information that is generated by one or more research studies. In other words, the NDA is able to associate a single research participant’s data even if those data were collected at different locations or through different studies. By doing so, the NDA gives researchers access to more data than they can collect on their own, making it easier and faster for researchers to gather, evaluate, and share research information and results from a variety of sources.
The NIMH has given considerable thought to the policy issues surrounding human subjects research data and their deposition into federally controlled databases. Because the NDA will contain information derived from genome-wide association studies, in addition to other clinical data, the principles contained in this policy were developed to be consistent with existing NIH policies. This policy addresses (1) data sharing procedures, (2) data access principles, and (3) issues regarding the protection of research participants during the submission of, storage of, and access to data within the NDA.
The potential for public benefit through the sharing of research data is significant; however, data related to the presence or risk of developing a disease/disorder, and information regarding paternity or ancestry, may be sensitive. Therefore, protecting the privacy of research participants and the confidentiality of their data is critically important. Risks to individuals, groups, or communities should be balanced carefully with potential benefits of the knowledge to be gained. The sensitive nature of information about participants and the broad data distribution goals of the NDA highlight the importance of the informed consent process.
The NDA Policy applies to research data collected both prospectively and retrospectively. For prospective studies, in which data sharing through the NDA is conceived within the study design at the time research participants provide their consent, the NIMH expects specific discussion within the informed consent process and documentation that participants’ data will be shared for research purposes through the NDA. For retrospectively collected data, the NIMH anticipates considerable variation in the extent to which data sharing and future research have been addressed within the informed consent documents; therefore, the NIMH expects the submitting institution to determine whether a retrospective study is appropriate for submission to the NDA (including an IRB and/or Privacy Board review of specific study elements, such as participant consent). The NIMH may give programmatic consideration to requests for funds or other resources needed to conduct additional participant consent, when appropriate. Furthermore, the NIMH intends to adapt the NDA Policy to be consistent with best practices for the consideration and risk-benefit analysis of participant research data sharing under this policy.
In the event that participants withdraw consent to share their individual-level data through the NDA, the submitting institution will be responsible for alerting the NDA to request that the specific research participant’s data be removed from future data distributions. However, any data that have been distributed to researchers will not be retracted.
As an agency of the federal government, the NIMH is required to release government records in response to a request under the Freedom of Information Act (FOIA), unless they are exempt from release under one of the FOIA exemptions. Although NIMH-held data will be de-identified and not hold direct identifiers to individuals within the NDA, the agency recognizes the personal and potentially sensitive nature of the data being stored. The NIMH takes the position that technologies available within the public domain today, and technological advances expected over the next few years, make the identification of specific individuals from raw data feasible and increasingly straightforward. Therefore, the agency believes that the release of un-redacted NDA data in response to a FOIA request would constitute an unreasonable invasion of personal privacy under FOIA Exemption 6, 5 U.S.C. § 552 (b)(6). Accordingly, among the safeguards that the NIMH foresees using to preserve the privacy of research participants are the redaction of individual-level research data from disclosures made in response to FOIA requests and the denial of requests for un-redacted datasets.
Additionally, the NIMH acknowledges that legitimate requests for access to data may be made by law enforcement offices. The NDA will not possess direct identifiers, nor will the NIMH have access to the link between the code and the identifiable information that may reside with the primary investigator and institution for particular studies. The release of identifiable information may be protected from compelled disclosure if the primary investigator’s institution obtained a Certificate of Confidentiality for the original study. Researchers submitting data are encouraged to consider whether a Certificate of Confidentiality might be appropriate for their data as an additional safeguard with regard to involuntary disclosure of research participant identities. For its data, the NDA will hold a Certificate of Confidentiality. Further information about Certificates of Confidentiality is available by clicking here.
The NIMH believes that the full value of data in the NDA to the public is best realized if de-identified data related to a finding or publication are made broadly available as rapidly as possible to a wide range of scientific investigators (see the following NIMH Guide Notices: NOT-MH-15-012 (Clinical Research), NOT-MH-09-005 (Sharing Autism Related Human Subject Research), and NOT-MH-14-015 (Clinical Trials)).
II. Data Management
To facilitate broad and consistent access to supported datasets, the NDA provides a single point of access to basic information. Although the NIMH envisions that access to many datasets will be possible through the NDA, the intention is not for the NDA to become the exclusive point of data sharing for these data or for the NDA to delimit the structures or tools that may be appropriate for other similar databases. The NDA will, however, accept human subjects research datasets contributed from other relevant sources.
To ensure the security of the data held by the NDA, the NIMH employs multiple tiers of data security based on the content and level of risk associated with the data. The NIMH has established and will maintain operating policies and procedures to address issues including, but not limited to, the privacy and confidentiality of research participants, the interests of individuals and groups, data access procedures, and data security mechanisms. Procedures safeguarding the NDA and its data will be reviewed periodically.
III. Data Submission
Data submitted to the NDA will be de-identified such that the identities of subjects cannot be readily ascertained or otherwise associated with the data by NDA staff or secondary data users (45 CFR 46). In addition, de-identified data will be coded using a unique code known as a Global Unique Identifier (GUID). Use of the GUID minimizes risks to study participants because it keeps one individual’s information separate from that of another person without using names, addresses, or other identifying information. The unique code also allows the NDA to link together all submitted information on a single participant, giving researchers access to information that may have been collected elsewhere. The GUID is a computer-generated alphanumeric code [example: NDA-1A462BS] that is unique to each research participant (i.e., each person’s information in the NDA—or each subject’s record—has a different GUID). For those studies that do not have the information needed to create a GUID (typically retrospective studies seeking to submit data to the NDA), a unique random identifier called a pseudo-GUID can be used.
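The linking role of the GUID described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the NDA's implementation: the record fields, the study names, the second GUID value, and the `make_pseudo_guid` helper with its `PSEUDO-` format are all hypothetical. Only the example code NDA-1A462BS comes from this policy.

```python
import secrets
from collections import defaultdict


def make_pseudo_guid(prefix="PSEUDO", length=8):
    """Return a random identifier that is not derived from any personal data.

    The prefix and length are hypothetical; this policy does not specify
    the format of a pseudo-GUID.
    """
    alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"
    return prefix + "-" + "".join(secrets.choice(alphabet) for _ in range(length))


def link_by_guid(records):
    """Group de-identified records by GUID so one participant's data line up."""
    linked = defaultdict(list)
    for record in records:
        linked[record["guid"]].append(record)
    return dict(linked)


# Illustrative records from two hypothetical studies. No names or addresses
# appear anywhere; the GUID is the only key shared across studies.
records = [
    {"guid": "NDA-1A462BS", "study": "study_a", "data_type": "clinical"},
    {"guid": "NDA-1A462BS", "study": "study_b", "data_type": "imaging"},
    {"guid": "NDA-9ZX81QT", "study": "study_a", "data_type": "clinical"},
]

linked = link_by_guid(records)
```

In this sketch, the two records carrying NDA-1A462BS end up grouped together even though they come from different studies, which is the association the GUID is meant to allow. A pseudo-GUID generated at random would keep participants distinct from one another but, being random, would not be expected to match the same person across studies.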
The process of assigning a GUID prevents direct identifiers from ever being transmitted or stored in the NDA. However, although no direct identifiers are transmitted, entities covered by the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule may wish to consider whether transmission of the hash codes that are generated based on personally identifiable information and sent to the NIMH as well as the subsequent disclosure of the GUID with health information constitute a disclosure of protected health information. Researchers should direct questions to their institutions or contact legal counsel about how the Privacy Rule may apply to a specific research project or organization. Covered entities may also wish to visit the Department of Health and Human Services, Office for Civil Rights’ Web site for more information on the Privacy Rule and review educational materials, here. Additional information on the GUID is available by clicking here.
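To make the idea of hash codes generated from personally identifiable information more concrete, the following Python sketch shows one generic way a one-way hash can be computed locally so that only the hash, and not the underlying values, is transmitted. It is an assumption-laden illustration: the choice of SHA-256, the input fields, the normalization rules, and the function names are hypothetical and are not taken from the NDA's actual GUID procedure.

```python
import hashlib
import unicodedata


def normalize(value):
    """Strip accents, spacing, and punctuation, then upper-case the value.

    Normalizing first means the same person entered slightly differently
    at two sites still yields the same hash code.
    """
    text = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in text if ch.isalnum()).upper()


def hash_code(fields, salt=""):
    """Return a SHA-256 hex digest of the normalized fields.

    Only this digest would leave the submitting site in this sketch;
    the raw field values stay local.
    """
    joined = "|".join(normalize(f) for f in fields)
    return hashlib.sha256((salt + joined).encode("utf-8")).hexdigest()


# Hypothetical inputs: first name, last name, date of birth.
site_one = hash_code(["Jane", "Doe", "2001-02-03"])
site_two = hash_code([" jane ", "DOE", "2001 02 03"])
other = hash_code(["John", "Doe", "2001-02-03"])
```

Here `site_one` and `site_two` are identical because normalization removes the formatting differences, while `other` differs. Whether transmitting such a hash counts as a disclosure of protected health information remains a question for the covered entity and its counsel, as noted above.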
Tools for analysis of genomic and other complex data types are increasingly able to make inferences about some individual traits (e.g., height; weight; and skin, hair, and eye color) and to identify predilections for characteristics (e.g., risk of developing some diseases) and behaviors with social stigma. In recognition of these risks, the NDA Policy includes steps to protect the interests and privacy concerns of individuals, families, and identifiable groups who participate in research. The NIMH requests that institutions submitting datasets to the NDA certify that an Institutional Review Board (IRB) has considered such risks and that investigators have de-identified the data in accordance with 45 CFR 46.102(f) before the data are submitted. Prior to submitting data to the NDA, institutions must complete the NDA Data Submission Agreement - OMB Control Number: 0925-0667. Completed NDA Data Submission Agreements will only be accepted from institutions recognized by the NIH in the eRA Commons and signed by an Authorized Institutional Business Official – commonly called a signing official – as registered in eRA Commons by the institution.
IV. Data Access
The NDA provides basic descriptive and aggregate summary information for general public use. Such summary information may include summary counts and general statistics on completed assessment instruments. Access to subject level datasets submitted and stored in the NDA will only be provided for research purposes through the completion of the NDA Data Use Certification: OMB Control Number: 0925-0667. For the majority of the data available in the NDA, Data Use Certifications will only be accepted from researchers who are sponsored by an institution registered in the NIH’s eRA Commons with an active Federal-wide Assurance issued through the Office for Human Research Protections (OHRP). Additionally, the application must include a reason for access related to scientific investigation, scholarship or teaching, or other form of research. Based upon this information, an appropriate Data Access Committee (DAC) with relevant expertise may authorize access.
The established Data Access Committees or their designees will review requests for access to determine whether the proposed use of the dataset is scientifically and ethically appropriate. In the event that requests raise concerns related to privacy and confidentiality, risks to populations or groups, or other concerns, the DAC will consult with other experts, as appropriate.
V. Large Dataset Protection
The NDA is set up to accept omics, imaging, and neuro-signal recordings data and results. Raw or nearly raw data received from research instrumentation (sequencers, MR scanners, EEG headsets) are also accepted and expected. These data are large, making them costly to copy and secure. Therefore, for those interested in using these data, the NDA will support access for just-in-time computation; however, the data are not to be persisted (i.e., stored) beyond the time necessary for computation.
VI. Collaboration Policy
A goal of the NDA is to provide human subjects researchers with the capability to share data and collaborate at the earliest possible opportunity in order to speed scientific discoveries; however, data sharing for secondary or collaborative research may be limited when a study is ongoing. For example, an incomplete dataset may not be appropriate to answer certain research questions and it may engender bias or error. Furthermore, in many cases, new or secondary uses of a dataset will be most effectively undertaken in collaboration with the lab that originated the data. Synergies may be developed to improve both the ongoing study and the new research. The NIMH wishes to encourage collaboration whenever appropriate and to help facilitate such sharing.
Those who are conducting the experiment are in the best position to determine whether new uses of data are appropriate during the time that a study is ongoing. As a result, potential collaboration requires communication between the data submitter and the potential data recipient. Therefore, the NIMH recommends that recipients who are seeking access to data from an ongoing study coordinate with the submitters. This coordination may result in a decision to collaborate, a decision to authorize access absent collaboration, or a decision to delay access until a study is complete. The NDA is set up to support each of these scenarios. For those researchers intending to use the NDA as a means for collection, submitters should notify NDA staff of their agreement to authorize specific individuals to access data from an ongoing study while those data remain embargoed to others, and should specify when the data will become broadly shared. Once approved by the NDA, those data will be unavailable to others until the agreed-upon time or event milestone (e.g., publication).
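The embargo arrangement described above can be summarized as a simple rule, sketched here in Python. This is only an illustration under assumptions: the `Embargo` class, its fields, and the `may_access` function are hypothetical names, the release milestone is modeled as a calendar date for simplicity, and the separate Data Access Committee review described in Section IV is not modeled.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Embargo:
    """Hypothetical record of a submitter's agreement for an ongoing study."""
    release_date: date                          # when data become broadly shared
    authorized_users: set = field(default_factory=set)  # individuals named by the submitter


def may_access(user, today, embargo):
    """Return True if the embargo does not block this user on this day.

    Before the release date, only individuals named by the submitter pass.
    On or after the release date, the embargo no longer restricts anyone.
    """
    if today >= embargo.release_date:
        return True
    return user in embargo.authorized_users


embargo = Embargo(release_date=date(2030, 1, 1), authorized_users={"collaborator_a"})

during_collaborator = may_access("collaborator_a", date(2029, 6, 1), embargo)
during_other = may_access("researcher_b", date(2029, 6, 1), embargo)
after_other = may_access("researcher_b", date(2030, 1, 1), embargo)
```

In this sketch the named collaborator passes the check during the study, another researcher does not, and the restriction lifts for everyone once the agreed date arrives. An event milestone such as publication could be modeled by setting the release date when the event occurs.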
Additional information and detailed implementation guidance related to the NDA can be found here. Specific questions about this policy should be directed to:
Office of the Director
National Institute of Mental Health, National Institutes of Health
6001 Executive Boulevard, Room 8252, MSC 9649
Rockville, Maryland 20892-9649
(if overnight delivery): Rockville, Maryland 20852
Phone: (301) 443-3265