The GUID

GUID Creation

Upon receipt of award, you will need to complete certain Data Harmonization Standards. One of these NDA standards is stripping all Personally Identifiable Information (PII) from your data and replacing it with a GUID. The GUID is a universal subject ID that allows researchers to share data specific to a study participant without exposing PII, and it makes it possible to match participants across labs and research data repositories. Every data structure in the NDA Data Dictionary includes this identifier (labelled as the element subjectkey). Additionally, researchers publishing the results of primary or secondary analyses on data shared through NDA use the GUID to associate subjects with cohorts in an NDA Study. This allows a researcher to link publications directly to data in NDA (see NDA Study).

The tool itself is a GUI or command line Java web-start application that you launch directly from the NDA website. The GUID Tool supports single subject data entry or bulk GUID generation. Email us at NDAHelp@mail.nih.gov for information on the command line tool.

Creating a GUID requires an individual's legal name at birth, date of birth, sex, and city/municipality of birth. Because information on the birth certificate is constant over an individual's life, it is imperative to enter the information as it appears on the birth certificate. Otherwise, a subject mismatch will occur if the research subject enrolls in other research studies and another source is used. When generating GUIDs for twin subjects, the Get GUIDs for Multiple Subjects function in the GUID Tool must be used as described below to prevent a false positive match.

If you are submitting data to NDA, you can check the box to request access to the GUID Tool when creating your account. Please contact us if you already have an account and need access to the GUID Tool.


NDA GUID Plan

The GUID Tool is a piece of software that accepts the personal information of study participants and uses it to create a series of hash codes. These codes are sent to our system and checked against the GUID database. If these codes have been seen before, that means the information matches an existing GUID, and this GUID is sent back. If no match is found, a new GUID is created and sent back. If someone else enters the same information later, the tool will detect this match and send back the same GUID. The GUID itself is a series of alpha-numeric characters. This system has the following advantages:

  1. No PII ever leaves your computer.
  2. There is nothing about a GUID that would allow someone to infer the identity of the individual to whom it belongs.
  3. The same individual's information will result in the same GUID across time, location, and research study. This allows researchers to match shared data from that participant regardless of source, without ever sharing or viewing PII.

All of the following information is required to create a GUID: first name, middle name, last name, sex, date of birth, and city/municipality of birth. In all cases, this information should be obtained and entered as it appears on the birth certificate. Using the birth certificate ensures that this information cannot change throughout an individual's lifespan. If you have any questions about the GUID Tool or creation of a GUID, please contact the NDA Help Desk.
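The lookup-or-create flow described above can be sketched in a few lines of Python. This is an illustration only, not the GUID Tool's actual algorithm: the normalization rule, the use of a single SHA-256 digest, the in-memory `GuidRegistry` class, and the `DEMO_` identifier format are all assumptions made for the example.

```python
import hashlib
import secrets


def hash_codes(first, middle, last, dob, sex, city):
    """Derive one-way hash codes from birth-certificate fields.

    Assumption: fields are trimmed and upper-cased, then hashed with
    SHA-256. The real tool's normalization and hashing are not
    described in this document.
    """
    normalized = [value.strip().upper() for value in (first, middle, last, dob, sex, city)]
    digest = hashlib.sha256("|".join(normalized).encode("utf-8")).hexdigest()
    return (digest,)  # a tuple, since the text speaks of a series of codes


class GuidRegistry:
    """Hypothetical stand-in for the server-side GUID database."""

    def __init__(self):
        self._guid_by_codes = {}

    def lookup_or_create(self, codes):
        # Codes seen before: return the existing GUID.
        # Codes not seen before: mint a new random identifier.
        if codes not in self._guid_by_codes:
            # "DEMO_" plus random hex is a made-up format for this sketch.
            self._guid_by_codes[codes] = "DEMO_" + secrets.token_hex(4).upper()
        return self._guid_by_codes[codes]


registry = GuidRegistry()
# Only the hash codes reach the registry; the PII stays with the caller.
lab_a = registry.lookup_or_create(
    hash_codes("Jane", "Ann", "Doe", "2001-02-03", "F", "Springfield"))
lab_b = registry.lookup_or_create(
    hash_codes("jane", "ann", "doe", "2001-02-03", "f", "springfield"))
print(lab_a == lab_b)
```

In this sketch two labs that enter the same birth-certificate details receive the same identifier, and the identifier itself is random, so it carries no information about the person. It also shows why entering a value from a different source (for example, a married name) would produce different codes and therefore a mismatch.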

All personnel in your NDA Collection who are responsible for creating GUIDs must have access to the GUID Tool under their own NDA user profile. Access to the GUID Tool does not require approval: users can enable it themselves by checking the GUID Tool Access checkbox in their NDA Account Profile.


GUID Process

The other project consideration to address before enrollment is collecting the subject information needed to take advantage of NDA's primary de-identification tool, the GUID. If the informed consents do not support GUID creation (see below), or the data necessary to generate a GUID are not available, the lab may consider using a random identifier called the Pseudo-GUID instead.

The ability to match participants across studies is critical to the added value of broad data sharing, so make sure your enrollment process collects the information needed to create a GUID before data collection starts:

  • First name
  • Middle name
  • Last name
  • Date of birth
  • Sex
  • City/municipality of birth

In all cases, this information is needed as it appears on the birth certificate. The birth certificate is referenced as the authoritative source for all of this information, as this ensures it does not change through the individual's life. If any of this information has already changed since birth, the information should still be collected as it appears on the birth certificate.
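One way to build this requirement into an enrollment workflow is to check each record for completeness before attempting GUID creation. The following is a minimal sketch; the `BirthCertificateRecord` class, its field names, and the `missing_fields` helper are hypothetical and are not part of any NDA tool. It simply flags any empty field, including a blank middle name, since this document does not describe how such cases are handled.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BirthCertificateRecord:
    """Hypothetical container for the six values, as on the birth certificate."""
    first_name: Optional[str]
    middle_name: Optional[str]
    last_name: Optional[str]
    date_of_birth: Optional[date]
    sex: Optional[str]
    city_of_birth: Optional[str]


def missing_fields(record: BirthCertificateRecord) -> List[str]:
    """Return the names of fields that are None or blank, in declaration order."""
    missing = []
    for field in fields(record):
        value = getattr(record, field.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field.name)
    return missing


record = BirthCertificateRecord(
    first_name="Jane",
    middle_name="Ann",
    last_name="Doe",
    date_of_birth=date(2001, 2, 3),
    sex="F",
    city_of_birth="",  # not yet collected at enrollment
)
print(missing_fields(record))  # ['city_of_birth']
```

Running a check like this at enrollment, rather than at submission time, gives the study team a chance to collect the missing value while the participant is still available.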

Once a GUID (or Pseudo-GUID) has been generated for each participant enrolled in your study, use it to identify that participant throughout your project. It is entered as the subject identifier data element “subjectkey,” which is required in every submission file. Once a project has integrated these preparatory steps into its plan for enrollment and data collection, NDA user accounts can be requested and used to access the project’s NDA Collection.