Using the NDA GUID
The GUID, or Global Unique Identifier, is an alphanumeric code that is created by the NIMH Data Archive (NDA) GUID Tool and used as an identifier for a research participant. The GUID provides a secure mechanism to link research participants within and across research project datasets in NDA. The NDA GUID Tool and underlying algorithms were released in 2010 as the result of a collaboration between NIH, the Simons Foundation, and the informatics researcher community. Review the publication Using Global Unique Identifiers to Link Autism Collections (Johnson et al. 2010) for more information. Note that the GUID matching sensitivity has been reduced from the original design described in this paper.
The GUID itself is not personally identifiable information or protected health information.
Originally implemented to support the autism research community, the GUID is now available for other research communities needing a common subject identifier across research laboratories and repositories. Contact us at the NDA Help Desk for any questions regarding the GUID, the NDA GUID Tool, or to request the use of GUIDs in your research.
- How does the NDA GUID Tool work?
- What information will I need to create a GUID?
- GUID Creation FAQ
- How do I use the NDA GUID Tool?
- Data Submission with GUIDs
- Additional GUID Features & Functions
- Security Controls
How does the NDA GUID Tool work?
- An authorized member of the research project team (user) downloads the NDA GUID Tool.
- The user enters participant PII into the tool.
- In a local computation, the tool generates a series of one-way hash codes based on the PII entered, without the PII ever leaving your computer.
- The hash codes are encrypted and securely sent to the GUID system at NDA.
- If the hash codes match an existing hash code, the GUID associated with that hash code is sent back to the researcher. The GUID is an alphanumeric code that is randomly and persistently linked to the hash codes within the secure NDA GUID system and cannot be traced back to the PII entered by the research project team member.
- If the hash codes do not match an existing hash code in the NDA GUID system, a new GUID is created and sent back.
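The steps above can be sketched in a few lines of Python. This is a simplified illustration only, not the NDA GUID Tool's actual implementation. The field normalization, the single hash (the real tool generates a series of hash codes), the `GuidRegistry` class, and the `EXAMPLE-` identifier format are all assumptions made for the example.

```python
import hashlib
import secrets


def local_hash(first, middle, last, birth_date, birth_city):
    """Hash the PII locally; only this digest would ever be transmitted.

    Assumption: fields are trimmed, upper-cased, and joined with "|".
    The real tool's normalization and its series of hashes are not shown here.
    """
    canonical = "|".join(
        part.strip().upper()
        for part in (first, middle, last, birth_date, birth_city)
    )
    return hashlib.sha512(canonical.encode("utf-8")).hexdigest()


class GuidRegistry:
    """Hypothetical stand-in for the GUID system; it stores hashes, never PII."""

    def __init__(self):
        self._guid_by_hash = {}

    def match_or_create(self, hash_code):
        # Existing hash: return the GUID already linked to it.
        # New hash: link it to a fresh random identifier (made-up format).
        if hash_code not in self._guid_by_hash:
            self._guid_by_hash[hash_code] = "EXAMPLE-" + secrets.token_hex(6).upper()
        return self._guid_by_hash[hash_code]


registry = GuidRegistry()
# Two sites enter the same participant (differing only in case and spacing).
site_a = registry.match_or_create(
    local_hash("Jane", "Ann", "Doe", "1990-01-31", "Springfield"))
site_b = registry.match_or_create(
    local_hash("jane", "ann", "doe", "1990-01-31", "springfield "))
# A different participant receives a different identifier.
other = registry.match_or_create(
    local_hash("John", "Lee", "Roe", "1985-07-04", "Shelbyville"))
```

In this sketch the registry only ever receives the digest, and the identifier it returns is random, so nothing in the identifier is derived from the PII.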
With the NDA GUID Tool, the same participant information will return the same GUID whenever or wherever it is entered. This allows NDA to anonymously link participant data records across time and locations, without ever receiving identifying information. The ability to link participant records and the protection of participant confidentiality are both critical components of data sharing. This system has the following advantages:
- No PII ever leaves your computer.
- There is nothing about a GUID that would allow someone to infer the identity of the individual to whom it belongs.
- The same individual's information will result in the same GUID across time, location, and research study. This allows authorized researchers to match shared data from that participant regardless of source, without ever sharing or viewing PII.
Additional information can be found in the Security Controls section.
What information will I need to create a GUID?
Investigators must strip all personally identifiable information (PII) before submitting any data to NDA. GUIDs, which are created using the NDA GUID Tool, identify research participants and are required for all submitted data.
All of the following information is required to create a GUID:
- First Name
- Middle Name
- Last Name
- Date of Birth
- City/Municipality of Birth
This information should be recorded exactly as it appears on the birth certificate, to ensure that it does not change over the course of the participant’s life. If adequate information is not available to fully create a valid GUID, please view pseudoGUIDs for more information.
GUID Creation FAQ
|What if I only have the middle initial or nickname?||Only full legal names as they appear on the birth certificate should be used. Initials, nicknames, or unknowns/blanks should not be used.|
|What if I don't know the middle name or they left "middle name" blank?||If you know that the participant has no legal middle name on their birth certificate, the tool allows you to specify that the person has no middle name. If you do not have the middle name, or if you do not know whether they have one, you cannot create a GUID.|
|What should I do with suffixes like Jr., III, etc.?||Suffixes such as these should be omitted from the names when creating a GUID.|
|Should I include state/country when entering the city/municipality of birth?||No. Only the city/municipality name as it appears on the birth certificate should be entered.|
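A site that collects this information in its own system might pre-check records before entering them into the tool. The sketch below is one possible way to encode the rules above; it is not part of the NDA GUID Tool. The field names, the suffix list, and the single-letter heuristic for spotting initials are assumptions chosen for illustration.

```python
import re

# Hypothetical field names for a site's own intake records.
REQUIRED = ("first_name", "last_name", "date_of_birth", "city_of_birth")

# Illustrative, non-exhaustive list of suffixes to omit.
SUFFIX_RE = re.compile(r"[,\s]+(JR|SR|II|III|IV)\.?$", re.IGNORECASE)


def strip_suffix(name):
    """Drop a trailing suffix such as 'Jr.' or 'III' from a name."""
    return SUFFIX_RE.sub("", name.strip())


def looks_like_initial(value):
    """Heuristic: a single letter, optionally followed by a period."""
    return re.fullmatch(r"[A-Za-z]\.?", value.strip()) is not None


def check_guid_inputs(record, has_no_middle_name=False):
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for field in REQUIRED:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    middle = str(record.get("middle_name", "")).strip()
    # A blank middle name is acceptable only when it is known that none exists.
    if not middle and not has_no_middle_name:
        problems.append("missing middle_name (or confirm the participant has none)")
    for field in ("first_name", "middle_name", "last_name"):
        value = str(record.get(field, "")).strip()
        if value and looks_like_initial(value):
            problems.append(f"{field} looks like an initial")
    return problems


record = {
    "first_name": "Jane",
    "middle_name": "Q.",
    "last_name": strip_suffix("Doe Jr."),
    "date_of_birth": "1990-01-31",
    "city_of_birth": "Springfield",
}
problems = check_guid_inputs(record)
```

For this sample record, `problems` contains a single entry flagging the middle initial, and the last name has been reduced to "Doe".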
pseudoGUIDs
If any of the information necessary to create a GUID is missing, a pseudoGUID can be created. This is a random ID that can be used as a placeholder where this information is not available, and "promoted" to a real GUID when the information is obtained at a future date.
To request pseudoGUIDs, users must have NDA GUID Tool access and must submit a justification request to the NDA Help Desk explaining why GUIDs cannot be generated. The Principal Investigator should submit the request for pseudoGUIDs. Requests are reviewed by NIMH on a case-by-case basis. NDA does not provision pseudoGUIDs to third-party NDA GUID Tool users.
Please review SOP-08 GUID Generation Permission Request for more information.
How do I use the NDA GUID Tool?
Create an NDA account and request NDA GUID Tool access by emailing the NDA Help Desk stating your purpose for needing access. This may be to create GUIDs for a project submitting data to an NDA data repository or to use GUIDs as subject identifiers in your own research community.
Your NDA GUID Tool credentials are the same as your NDA login credentials. If you do not remember your NDA credentials or need a password reset, please visit https://nda.nih.gov/reset_password.html. Users resetting their passwords should complete the password reset process and set a new permanent password before attempting to log in to the NDA GUID Tool.
Using the NDA GUID Tool
Launch the NDA GUID Tool software using the appropriate executable found on the NDA Tools page.
Review the NDA GUID Tool User Manual below for installation instructions and a guide to using the NDA GUID Tool.
Last modified on Apr 29, 2021
Data Submission with GUIDs
GUIDs make it possible to match participants across labs and research data repositories. Every data structure in the NDA Data Dictionary includes this identifier (labeled as the element, "subjectkey"). Once a GUID has been generated for all participants enrolled in your study, the GUID, or the pseudoGUID, should be used to identify the same study participant throughout your project. It will be entered as the subject identifier data element “subjectkey,” which is required in every submission file. Once a project has integrated these preparatory concepts into its plan for enrollment and data collection, NDA user accounts can be created and used to access the project’s NDA Collection. Please view NDA Harmonization Standards for more information on data submission standards for NDA Collections.
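Because "subjectkey" is required in every submission file, a quick local check for blank values can catch problems before submission. The following is a minimal sketch that assumes a simple comma-separated file with a single header row; the actual NDA submission templates may be laid out differently, and the `score` column and `EXAMPLE-` identifiers are made up for the example.

```python
import csv
import io


def rows_missing_subjectkey(csv_text):
    """Return the file line numbers of data rows whose subjectkey is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "subjectkey" not in (reader.fieldnames or []):
        raise ValueError("file has no subjectkey column")
    # The header is line 1, so the first data row is line 2.
    return [
        line_number
        for line_number, row in enumerate(reader, start=2)
        if not (row["subjectkey"] or "").strip()
    ]


sample = "subjectkey,score\nEXAMPLE-0001,12\n,15\nEXAMPLE-0002,9\n"
missing = rows_missing_subjectkey(sample)
```

Here `missing` lists line 3, the one row in the sample with no identifier.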
Additionally, the GUID is used by researchers who are publishing the results of primary or secondary analyses on data shared through NDA to associate subjects to cohorts in an NDA Study. This allows a researcher to link publications directly to data in NDA (see NDA Study).
Additional GUID Features & Functions
- Create GUIDs in Batch - The GUID interface requires double entry, which makes it best suited to entering a few subjects at a time. For research sites that have already collected participant PII, the software can generate GUIDs for multiple subjects at once using the GUID Batch Template, which is linked in the tool's interface and described above.
- Promote pseudoGUID(s) - This links a pseudoGUID to a standard GUID, recognizing the two identifiers as the same individual. It is the method used to change a pseudoGUID into a full GUID once consent or the necessary PII has been obtained after the initial creation. The function lets you specify a pseudoGUID along with the PII. Multiple pseudoGUIDs can be linked using the Promote pseudoGUID template. This feature is accessible only to authorized users who have demonstrated a real need. Please view pseudoGUIDs for more information.
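Conceptually, promotion records that a pseudoGUID and a GUID refer to the same individual, so that data filed under either identifier can be resolved to one participant. The sketch below illustrates that idea with an in-memory mapping. The `PseudoGuidLinks` class and the identifier formats are hypothetical and do not describe how NDA stores these links.

```python
class PseudoGuidLinks:
    """Hypothetical record of which pseudoGUIDs have been promoted to GUIDs."""

    def __init__(self):
        self._guid_by_pseudo = {}

    def promote(self, pseudo_guid, guid):
        """Link a pseudoGUID to the GUID generated once the PII was available."""
        existing = self._guid_by_pseudo.get(pseudo_guid)
        if existing is not None and existing != guid:
            raise ValueError(f"{pseudo_guid} is already linked to {existing}")
        self._guid_by_pseudo[pseudo_guid] = guid

    def resolve(self, identifier):
        """Return the promoted GUID if one exists, else the identifier itself."""
        return self._guid_by_pseudo.get(identifier, identifier)


links = PseudoGuidLinks()
links.promote("PSEUDO-0001", "EXAMPLE-AAAA")

# Records filed under either identifier now resolve to the same participant.
resolved = [links.resolve(key) for key in ("PSEUDO-0001", "EXAMPLE-AAAA", "PSEUDO-0002")]
```

A pseudoGUID that has not been promoted, such as `PSEUDO-0002` here, simply resolves to itself.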
Security Controls
The GUID algorithm developed in 2010 is a cryptographic hashing algorithm that creates unique hash codes based on the data commonly found on a person’s birth certificate. Hash codes generated from the same birth certificate data (personally identifiable information, or PII) are identical. The NDA GUID Tool uses the SHA-512 hashing algorithm to generate a one-way hash. No practical attack that reverses SHA-512 is publicly known, so the hash should not be reversible (https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions).
The hashing algorithm is run inside the NDA GUID Tool software that researchers must run on their own computer. The PII input into the NDA GUID Tool never leaves the researcher’s local environment. Instead, one-way hashes are securely transmitted to the NDA cloud database, which uses the same secure transmission protocol (via a secure web service) to return random alphanumeric strings called GUIDs.
GUIDs and their matched hash codes are only ever stored in the secure database maintained by NDA’s security team in the Amazon cloud.
The hash codes generated by the algorithm depend on the spelling of the words (PII) entered, so there’s no way to account for/predict typos or data entry errors. To address possible data entry errors, it is possible to reconcile GUIDs at a later date.
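The sensitivity to spelling can be seen directly with SHA-512 from Python's standard library. In this small demonstration the "|"-joined input format is an assumption made for illustration and is not the tool's real input encoding.

```python
import hashlib


def digest(text):
    """Return the SHA-512 digest of the text as a hexadecimal string."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


# Same participant, entered once correctly and once with two letters swapped.
correct = digest("JANE|ANN|DOE|1990-01-31|SPRINGFIELD")
typo = digest("JANE|ANN|DOE|1990-01-31|SPRINGFEILD")

same_input_matches = digest("JANE|ANN|DOE|1990-01-31|SPRINGFIELD") == correct
typo_matches = typo == correct
```

Identical input always reproduces the same 128-character digest, while the transposed letters produce an unrelated one. The two entries would therefore not match, which is why reconciliation at a later date may be needed.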
Account sharing is a direct violation of any NDA user agreement.
Researchers with NDA data access who are not submitting data will not have access to the NDA GUID Tool. They are unable to use birth certificate data to generate GUIDs and identify themselves or others in the NDA database.
Researchers who are submitting data to NDA and have NDA data access require additional controls to ensure appropriate data use. NDA uses GUIDs to link data submitted from one laboratory to data from the same individual submitted from a second laboratory. In the case where two laboratories have measured data from the same research participant, both laboratories could use the NDA Query Tool to determine if the GUID in their study was submitted by the other laboratory. However, researchers in each laboratory would need to have approved NDA data access to access additional, individual-level information about that subject beyond what they measured. The researchers and their institution would have signed an NDA Data Use Certification for NDA data access, which states “If Recipients access data on individuals for whom they, themselves, have previously submitted data to the NIMH Data Archive, Recipients may gain access to more data about an individual participant than they, themselves, collected. Consequently, these research activities may be considered “human subjects research” within the scope of 45 C.F.R. 46. Recipients must comply with the requirements contained in 45 C.F.R. 46, as applicable, which may require Institutional Review Board (IRB) approval of the Research Data Use Statement.” Further, researchers outside of either of those laboratories will not have access to any PII in NDA.
Access to NDA data is governed by Data Access Committees for specific research use and requires that researchers and their institution sign an NDA Data Use Certification agreement and have an IRB review if appropriate. Any attempt to reidentify a subject in NDA is a direct violation of the Data Use Certification agreement.
NDA audits all operations performed by users accessing the NDA GUID Tool. Suspicious activity can be rapidly identified, and user access can be turned off immediately if appropriate. Users must log into the NDA GUID Tool for every new session.