Using the NDA GUID

The GUID, or Global Unique Identifier, is an alphanumeric code that is created by the NIMH Data Archive (NDA) GUID Tool and used as an identifier for a research participant. The GUID provides a secure mechanism to link research participants within and across research project datasets in NDA. The NDA GUID Tool and underlying algorithms were released in 2010 as the result of a collaboration between NIH, the Simons Foundation, and the informatics research community. For more information, see the publication "Using Global Unique Identifiers to Link Autism Collections" (Johnson et al. 2010). Note that the GUID matching sensitivity has been reduced from the original design described in that paper.

The GUID itself is not personally identifiable information or protected health information.

Originally implemented to support the autism research community, the GUID is now available for other research communities needing a common subject identifier across research laboratories and repositories. Contact the NDA Help Desk with any questions regarding the GUID or the NDA GUID Tool, or to request the use of GUIDs in your research.

How does the NDA GUID Tool work?

  1. An authorized member of the research project team (user) requests and is approved for access to the GUID Tool.
  2. The user downloads the NDA GUID Tool.
  3. The user enters participant PII into the tool.
  4. In a local computation, the tool generates a series of one-way hash codes based on the PII entered, without the PII ever leaving your computer.
  5. The hash codes are encrypted and securely sent to the GUID system at NDA.
  6. If the hash codes match an existing hash code, the GUID associated with that hash code is sent back to the researcher. The GUID is an alphanumeric code that is randomly and persistently linked to the hash codes within the secure NDA GUID system and cannot be traced back to the PII entered by the research project team member.
  7. If the hash codes do not match an existing hash code in the NDA GUID system, a new GUID is created and sent back.
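
The steps above can be sketched in Python as follows. This is a minimal illustration and not the NDA implementation: the field combinations in `FIELD_SETS`, the normalization, the `GuidRegistry` class, the exact-match rule, and the `EX_` identifier format are all assumptions made for the example. Only the overall flow (one-way hashes computed locally, then a match-or-create lookup that returns a random identifier) follows the description in this document.

```python
import hashlib
import secrets

# Hypothetical field combinations; the real tool's combinations are not described here.
FIELD_SETS = [
    ("first", "last", "dob", "sex"),
    ("first", "middle", "last", "dob"),
    ("last", "dob", "city", "sex"),
]


def hash_codes(pii):
    """Compute a series of one-way SHA-512 hash codes locally from PII fields."""
    codes = []
    for field_names in FIELD_SETS:
        joined = "|".join(pii[name].strip().upper() for name in field_names)
        codes.append(hashlib.sha512(joined.encode("utf-8")).hexdigest())
    return codes


class GuidRegistry:
    """In-memory stand-in for the server side; it only ever sees hash codes."""

    def __init__(self):
        self._guid_by_codes = {}

    def lookup_or_create(self, codes):
        key = tuple(codes)  # simplification: all hash codes must match exactly
        if key not in self._guid_by_codes:
            # Random identifier, so it carries no information about the PII.
            self._guid_by_codes[key] = "EX_" + secrets.token_hex(4).upper()
        return self._guid_by_codes[key]


registry = GuidRegistry()
participant = {
    "first": "Maria", "middle": "Elena", "last": "Lopez",
    "dob": "2001-04-09", "sex": "F", "city": "Springfield",
}
first_guid = registry.lookup_or_create(hash_codes(participant))
second_guid = registry.lookup_or_create(hash_codes(participant))
print(first_guid == second_guid)  # True: same PII, same identifier
```

Only the output of `hash_codes` would cross the network in this sketch; the `participant` dictionary stays in the local process.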

With the NDA GUID Tool, the same participant information will return the same GUID whenever or wherever it is entered. This allows NDA to link participant data records across time and locations, without ever receiving identifying information. The ability to link participant records and the protection of participant confidentiality are both critical components of data sharing. This system has the following advantages:

  • No PII ever leaves your computer.
  • There is nothing about a GUID that would allow someone to infer the identity of the individual to whom it belongs.
  • The same individual's information will result in the same GUID across time, location, and research study. This allows authorized researchers to match shared data from that participant regardless of source, without ever sharing or viewing PII.

Additional information can be found in the Security Controls section.

What information will I need to create a GUID?

Investigators must strip all personally identifiable information (PII) before submitting any data to the NDA. GUIDs, which are created using the NDA GUID Tool, identify research participants and are required for all submitted data.

All of the following information is required to create a GUID:

  • First Name
  • Middle Name
  • Last Name
  • Sex
  • Date of Birth
  • City/Municipality of Birth

This information should be recorded exactly as it appears on the birth certificate, to ensure that it does not change over the course of the participant’s life. If adequate information is not available to fully create a valid GUID, please view pseudoGUIDs for more information.
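
A site might check its own records for completeness before opening the tool. The sketch below is an assumption-laden illustration: the `BirthRecord` class, its field names, and the `missing_fields` helper are hypothetical and are not part of the NDA GUID Tool. It treats any absent or blank field as missing and does not model the tool's option for a confirmed absence of a middle name.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BirthRecord:
    """Hypothetical container for the six required items."""
    first_name: Optional[str]
    middle_name: Optional[str]
    last_name: Optional[str]
    sex: Optional[str]
    date_of_birth: Optional[str]
    city_of_birth: Optional[str]


def missing_fields(record):
    """Return the names of required fields that are absent or blank."""
    return [
        f.name for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


complete = BirthRecord("Maria", "Elena", "Lopez", "F", "2001-04-09", "Springfield")
partial = BirthRecord("Maria", None, "Lopez", "F", "2001-04-09", " ")

print(missing_fields(complete))  # []
print(missing_fields(partial))   # ['middle_name', 'city_of_birth']
```

A non-empty result would indicate that a GUID cannot be created from the record as it stands.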

Below are some commonly encountered questions regarding the specifics of GUID creation.

Question: What if I only have the middle initial or nickname?
Answer: Only full legal names as they appear on the birth certificate should be used. Initials, nicknames, or unknowns/blanks should not be used.

Question: What if I don't know the middle name or they left "middle name" blank?
Answer: If you know that the participant has no legal middle name on their birth certificate, the tool allows you to specify that the person has no middle name. If you do not have the middle name, or if you do not know whether they have one, you cannot create a GUID.

Question: What should I do with suffixes like Jr., III, etc.?
Answer: Suffixes such as these should be omitted from the names when creating a GUID.

Question: Should I include state/country when entering the city/municipality of birth?
Answer: No. Only the city/municipality name as it appears on the birth certificate should be entered.

Question: What should I do if a participant identifies as non-binary?
Answer: When creating GUIDs, you should always use the information listed on the most recent birth certificate. Therefore, you should use the assigned sex at birth when a participant identifies as non-binary.

Question: What should I do if a participant was born intersex?
Answer: The information listed on the participant's most recent birth certificate should be used when creating GUIDs. If the birth certificate lists neither male nor female, pseudoGUIDs will need to be created. Please view pseudoGUIDs for how to request pseudoGUIDs.

Question: What should I do if a participant is transgender?
Answer: The information listed on the participant's most recent birth certificate should be used when creating GUIDs. If their birth certificate has not yet been officially updated, then you must use the information listed on the current birth certificate.

Question: I promoted pseudoGUIDs to real GUIDs. Do I have to resubmit my data?
Answer: If you promote pseudoGUIDs to real GUIDs using the NDA GUID Tool, you don't need to resubmit your data. The NDA GUID Tool automatically promotes pseudoGUIDs to real GUIDs.
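
Several of the naming rules above lend themselves to a pre-entry check. The following Python sketch is illustrative only: the helper names, the `SUFFIXES` list (which is not exhaustive), and the single-character test for initials are assumptions, and none of this reflects how the NDA GUID Tool itself validates input.

```python
# Illustrative, non-exhaustive list of suffixes to omit.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def strip_suffix(name):
    """Drop trailing suffixes such as 'Jr.' or 'III' from a name."""
    parts = name.replace(",", " ").split()
    while parts and parts[-1].rstrip(".").upper() in SUFFIXES:
        parts.pop()
    return " ".join(parts)


def looks_like_initial(name):
    """Treat a blank or single-character entry (with or without a period) as an initial."""
    return len(name.strip().rstrip(".")) <= 1


def check_middle_name(middle_name, has_no_middle_name=False):
    """Return the middle name to use, '' if confirmed absent, or raise if unknown."""
    if has_no_middle_name:
        return ""
    if middle_name is None or looks_like_initial(middle_name):
        raise ValueError("full legal middle name required, or confirm there is none")
    return middle_name.strip()


print(strip_suffix("Smith Jr."))                           # Smith
print(strip_suffix("Smith, III"))                          # Smith
print(check_middle_name("Elena"))                          # Elena
print(check_middle_name(None, has_no_middle_name=True) == "")  # True
```

The `has_no_middle_name` flag separates "confirmed to have no middle name" from "middle name unknown", which is the distinction the second question above draws.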

pseudoGUIDs

If any of the information necessary to create a GUID is missing, a pseudoGUID can be requested. This is a random ID that can be used as a placeholder for a real GUID, and "promoted" to a real GUID when the information is obtained at a future date.

Before pseudoGUIDs can be requested, you need:

  • Access to an active NDA Collection,
  • A Data Submission Agreement (DSA) submitted by the PI to the Collection,
  • Active NDA GUID Tool access.

To request pseudoGUIDs, the Principal Investigator (PI) should email the NDA Help Desk, copying (cc) the NIH Program Official and the institutional Signing Official (SO), and include the following:

  • The NDA Collection number related to the request,
  • An explanation of why real GUIDs cannot be generated for the research project,
  • The number of pseudoGUIDs needed.

Requests are reviewed by NIMH on a case-by-case basis. NDA does not provide pseudoGUIDs to third-party NDA GUID Tool users. For more details, see SOP-08 GUID Generation Permission Request.

How do I use the NDA GUID Tool?

Get Access

  1. Create an NDA account.
  2. Obtain at least submission-level permissions.
  3. Request NDA GUID Tool access by emailing the NDA Help Desk, stating your purpose for needing access.
  4. NDA staff will grant you NDA GUID Tool access once they confirm that you have the appropriate permission level.

Using the NDA GUID Tool

Launch the NDA GUID Tool software using the appropriate executable found on the NDA Tools page.

Review the NDA GUID Tool User Manual below for installation instructions and a guide to using the NDA GUID Tool.

NDA GUID Tool User Manual (last modified May 2024)

Data Submission with GUIDs

GUIDs make it possible to match participants across labs and research data repositories. Every data structure in the NDA Data Dictionary includes this identifier as the data element “subjectkey,” which is required in every submission file. Once GUIDs have been generated for the participants enrolled in your study, use each participant’s GUID (or pseudoGUID) as the “subjectkey” value to identify that participant throughout your project.

Once a project has integrated these preparatory concepts into its plan for enrollment and data collection, NDA user accounts can be created by first logging into NDA using one of the three Researcher Auth Service (RAS) identities. NDA accounts are used to access the project’s NDA Collection. See NDA Harmonization Standards for more information on data submission standards for NDA Collections.
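
Because "subjectkey" is required in every submission file, a quick local check can catch rows that lack it before submission. The sketch below assumes a simple CSV layout with a single header row. The `visit` and `score` columns, the `EX_` identifier, and the helper name are hypothetical, and real NDA submission templates may impose requirements that this check does not cover.

```python
import csv
import io


def rows_missing_subjectkey(csv_text):
    """Return 1-based data-row numbers whose 'subjectkey' value is absent or blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "subjectkey" not in (reader.fieldnames or []):
        raise ValueError("missing required column: subjectkey")
    return [
        row_number
        for row_number, row in enumerate(reader, start=1)
        if not (row["subjectkey"] or "").strip()
    ]


# Hypothetical file contents: the second data row has no identifier.
sample = (
    "subjectkey,visit,score\n"
    "EX_1A2B3C4D,baseline,12\n"
    ",followup,15\n"
)
print(rows_missing_subjectkey(sample))  # [2]
```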

Additionally, the GUID is used by researchers who are publishing the results of primary or secondary analyses on data shared through NDA to associate subjects to cohorts in an NDA Study. This allows a researcher to link publications directly to data in NDA (see NDA Study).

Additional GUID Features and Functions

  • Create GUIDs in Batch - The GUID interface requires double entry, which makes it practical when entering a few subjects at a time. For research sites that have already collected participant PII, the software can generate GUIDs for multiple subjects at once using the GUID Batch Template linked in the tool's interface (see the NDA GUID Tool User Manual above).
  • Promote pseudoGUID(s) - This function links a pseudoGUID to a standard GUID so that the two identifiers are recognized as the same individual. It is the method used to change a pseudoGUID into a full GUID once consent or the necessary PII has been obtained after the pseudoGUID was created. To use it, you specify the pseudoGUID along with the PII. Multiple pseudoGUIDs can be promoted at once using the Promote pseudoGUID template. This feature is only accessible to authorized users who have demonstrated a real need. Please view pseudoGUIDs for more information.
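
Conceptually, promotion records that two identifiers refer to the same individual. The Python sketch below illustrates that idea with an in-memory mapping. The `SubjectIndex` class, its method names, and the `EX_` identifiers are hypothetical and say nothing about how NDA stores or resolves promoted pseudoGUIDs.

```python
class SubjectIndex:
    """Stand-in for an index that links pseudoGUIDs to real GUIDs."""

    def __init__(self):
        self._promoted = {}  # pseudoGUID -> real GUID

    def promote(self, pseudo_guid, real_guid):
        """Link a pseudoGUID to a real GUID; refuse conflicting links."""
        existing = self._promoted.get(pseudo_guid)
        if existing is not None and existing != real_guid:
            raise ValueError(f"{pseudo_guid} is already linked to {existing}")
        self._promoted[pseudo_guid] = real_guid

    def promote_batch(self, pairs):
        """Promote several (pseudoGUID, real GUID) pairs, as a template might list them."""
        for pseudo_guid, real_guid in pairs:
            self.promote(pseudo_guid, real_guid)

    def resolve(self, identifier):
        """Return the real GUID for a promoted pseudoGUID, else the identifier itself."""
        return self._promoted.get(identifier, identifier)


index = SubjectIndex()
index.promote_batch([("EX_PSEUDO_01", "EX_REAL_AA"), ("EX_PSEUDO_02", "EX_REAL_BB")])

print(index.resolve("EX_PSEUDO_01"))  # EX_REAL_AA
print(index.resolve("EX_REAL_AA"))    # EX_REAL_AA
print(index.resolve("EX_PSEUDO_99"))  # EX_PSEUDO_99 (never promoted)
```

Because lookups go through `resolve`, records filed under the pseudoGUID and records filed under the real GUID end up attributed to the same individual in this sketch.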

Security Controls

Technical Controls

The GUID algorithm developed in 2010 is a cryptographic hashing algorithm that creates unique hash codes based on the data commonly found on a person’s birth certificate. Hash codes generated from the same birth certificate data (personally identifiable information, or PII) are identical. The NDA GUID Tool uses the SHA-512 hashing algorithm to generate each one-way hash. SHA-512 has no known practical attack that reverses it, so the hash codes cannot feasibly be converted back into the PII that produced them (https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions).

The hashing algorithm is run inside the NDA GUID Tool software that researchers must run on their own computer. The PII input into the NDA GUID Tool never leaves the researcher’s local environment. Instead, one-way hashes are securely transmitted to the NDA cloud database, which uses the same secure transmission protocol (via a secure web service) to return random alphanumeric strings called GUIDs.

GUIDs and their matched hash codes are only ever stored in the secure database maintained by NDA’s security team in the Amazon cloud.

The hash codes generated by the algorithm depend on the exact spelling of the PII entered, so the algorithm cannot detect or compensate for typos and other data entry errors. To address such errors, GUIDs can be reconciled at a later date.
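
The sensitivity to spelling is a general property of SHA-512 and can be demonstrated with Python's standard `hashlib` module. The input strings and the `|` separator below are invented for the illustration and do not reflect how the NDA GUID Tool combines or normalizes fields.

```python
import hashlib


def sha512_hex(text):
    """Return the SHA-512 digest of a string as hexadecimal text."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


original = sha512_hex("KATHERINE|SMITH|1990-01-31")
repeat = sha512_hex("KATHERINE|SMITH|1990-01-31")
typo = sha512_hex("KATHERINE|SMYTH|1990-01-31")  # one letter differs

print(original == repeat)  # True: identical input gives an identical hash
print(original == typo)    # False: a single-letter change gives a different hash
print(len(original))       # 128 hexadecimal characters, i.e. 512 bits
```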

[Figure: GUID diagram]

Administrative Controls

Only two types of NDA users are given access to the NDA GUID Tool: (1) researchers submitting data to an NDA collection and (2) researchers who are not submitting data to NDA but have been approved by the NDA Program Lead to use the Tool after a third-party risk assessment that includes institutional sign-off on the NDA GUID Tool Terms of Use.

Account sharing is a direct violation of any NDA user agreement.

Researchers with NDA data access who are not submitting data will not have access to the NDA GUID Tool. They are therefore unable to use birth certificate data to generate GUIDs and identify themselves or others in the NDA database.

Researchers who are submitting data to NDA and have NDA data access require additional controls to ensure appropriate data use. NDA uses GUIDs to link data submitted from one laboratory to data from the same individual submitted from a second laboratory. In the case where two laboratories have measured data from the same research participant, both laboratories could use the NDA Query Tool to determine if the GUID in their study was submitted by the other laboratory. However, researchers in each laboratory would need to have approved NDA data access to access additional, individual-level information about that subject beyond what they measured. The researchers and their institution would have signed an NDA Data Use Certification for NDA data access, which states: “If Recipients access data on individuals for whom they, themselves, have previously submitted data to the NIMH Data Archive, Recipients may gain access to more data about an individual participant than they, themselves, collected. Consequently, these research activities may be considered ‘human subjects research’ within the scope of 45 C.F.R. 46. Recipients must comply with the requirements contained in 45 C.F.R. 46, as applicable, which may require Institutional Review Board (IRB) approval of the Research Data Use Statement.” Further, researchers outside of either of those laboratories will not have access to any PII in NDA.

Access to NDA data is governed by Data Access Committees for specific research use and requires that researchers and their institution sign an NDA Data Use Certification agreement and have an IRB review if appropriate. Any attempt to reidentify a subject in NDA is a direct violation of the Data Use Certification agreement.

NDA audits all operations performed by users accessing the NDA GUID Tool. Suspicious activity can be rapidly identified, and user access can be turned off immediately if appropriate. Users must log into the NDA GUID Tool for every new session.
