NIMH Data Archive Post-Submission QA/QC
The NDA ensures that all data are harmonized to a common definition prior to submission, and validation confirms that all submitted data conform to that definition. After submission, Quality Assurance and Quality Control (QA/QC) checks are run across records to confirm that basic information is accurate. The results of these checks are collated by NDA Collection and sent to the people who submitted the data, typically data managers, so that any issues can be fixed before the next scheduled submission. The email provides a link to a report containing Summary and Detailed QA/QC issues. This page describes the checks that are run and, when issues are found, how the labs that submitted the data can interpret and resolve them.
In addition to the QA/QC reports, users with Admin access to an NDA Collection can download the records identified with errors. Navigate to the Collection, click the Submissions tab (authentication is required), and select the "Submissions with QA Errors" radio button to see whether any submissions contain errors. By adding specific error types to your filter cart and downloading them, you can view only the affected submissions, which makes the records in error easier to find. You will still need to review all data structures to see where a given error conflicts with data submitted previously.
Clause 3 of the NIMH Data Archive (NDA) Data Submission Agreement (DSA) terms and conditions states that submitters agree that all submitted data have been de-identified so that the identities of subjects cannot be readily ascertained or otherwise associated with the data by NIMH Data Archive staff or secondary data users (https://nda.nih.gov/about/standard-operating-procedures.html#sop5).
When neuroimaging data are submitted, NDA often finds personally identifiable information (PII) in the header fields of DICOM or NIfTI files. It is helpful to work with your scanner technician to understand their normal process for entering information at the time of data collection; this can give you hints about where information is most likely to be entered by mistake, and about fields that only occasionally contain data but should still be checked.
NDA's recommended approach when submitting these types of files is to inspect the header information before submitting image files and to clean only the fields that contain PII, ensuring that all data are de-identified. Please inspect all fields before submitting images, because PII is sometimes entered in non-standard fields, either by accident or by local convention. Wiping all header information before submission can limit other researchers' ability to conduct secondary analyses of the submitted data.
A variety of third-party software tools are available to facilitate this process. Dr. Chris Rorden, at the University of South Carolina, has compiled a list of stand-alone tools that are free to use. In addition, the following libraries and built-in functions are available for working with DICOM files in several common programming languages:
- Java https://github.com/dcm4che/dcm4che
- Julia https://github.com/JuliaHealth/DICOM.jl
- MATLAB https://www.mathworks.com/help/images/ref/dicominfo.html
- Python https://pydicom.github.io/
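As a rough illustration of the "inspect, then clean only what is needed" approach, the minimal Python sketch below works on a header that has already been read into a plain dictionary of keyword/value pairs. (With pydicom, for example, iterating over a dataset returned by `pydicom.dcmread` yields data elements with `keyword` and `value` attributes.) The helper names and the `COMMON_PII_KEYWORDS` set are hypothetical and non-exhaustive; they are not an NDA or pydicom API, and a keyword list is no substitute for inspecting every field.

```python
from typing import Dict, Iterable, List

# Hypothetical, non-exhaustive set of standard DICOM keywords that often carry PII.
COMMON_PII_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "OtherPatientNames", "OtherPatientIDs", "ReferringPhysicianName",
    "OperatorsName", "InstitutionName", "InstitutionAddress",
}


def flag_pii_fields(header: Dict[str, object], known_identifiers: Iterable[str]) -> List[str]:
    """Return keywords of non-empty fields that are known PII fields or that
    contain one of the subject's known identifiers (e.g. name, local ID)."""
    needles = [s.lower() for s in known_identifiers if s]
    flagged = []
    for keyword, value in header.items():
        text = "" if value is None else str(value).strip()
        if not text:
            continue  # empty fields need no cleaning
        if keyword in COMMON_PII_KEYWORDS or any(n in text.lower() for n in needles):
            flagged.append(keyword)
    return flagged


def clean_flagged(header: Dict[str, object], flagged: Iterable[str]) -> Dict[str, object]:
    """Blank only the flagged fields and keep every other header value intact."""
    flagged = set(flagged)
    return {k: ("" if k in flagged else v) for k, v in header.items()}


header = {
    "PatientName": "Doe^Jane",
    "PatientID": "",
    "StudyDescription": "Brain MRI for Jane Doe",  # PII in a non-standard field
    "RepetitionTime": 2000,
}
flagged = flag_pii_fields(header, ["Jane", "Doe"])
print(flagged)  # ['PatientName', 'StudyDescription']
print(clean_flagged(header, flagged))
```

Substring matching can produce false positives, which is acceptable here because the flagged list is meant for human review; the design point is that only the flagged fields are blanked, so the rest of the header stays available for secondary analyses.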
Use the NDA Validation and Upload Tool to submit corrected data files. All data quality issues must be corrected promptly to avoid delaying the next submission cycle.
If you already submitted new data before correcting your QA issues:
- Please notify the NDA Help Desk.
- Correct the identified QA errors.
- Resubmit all data.
If your submissions include associated files that do not need to be replaced, please contact the NDA Help Desk for assistance from our Data Curation team.
Please view our How to Provide Corrected Files guide to see full instructions on how to resubmit your corrected data.
Downloads of submissions with potential QA issues made through the Submissions tab of your Collection will automatically include a QA_into.txt file with a report on these issues.
The goal is to resolve these issues as soon as possible, helping to ensure that accurate data are shared in a timely manner. Feel free to contact us (NDAHelp@mail.nih.gov) if you need additional information or assistance in resolving these outstanding issues.
The following summary discrepancies are provided in the NDA QA Report.
MISSING SUBJECTS FOR DATA STRUCTURE PROVIDED
Clinical data are expected to be submitted cumulatively. This error is returned when some percentage of previously submitted subjects are not included in subsequent submissions. Please note that this rule checks combinations of subjectkey, age, and date, and forgives resubmissions that are off by +/-2 months. It is therefore possible for a subject to be flagged even though they were resubmitted, because they were not resubmitted with that age and date combination within the data structure of interest.
- Suggested Action: Review the submission to confirm all subjects collected so far were included in your cumulative submission. If not, reupload your cumulative submission. Note that neurosignaling and omics subject data are excluded from this report as those data are only expected to be submitted once.
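The check described above can be approximated locally before you resubmit. This is a minimal Python sketch, not NDA's implementation: it assumes each record is reduced to a (subjectkey, interview_age, interview_date) tuple, that it is run separately for each data structure, and, as an illustrative assumption, that the +/-2 month tolerance applies to interview_age. The function name and the GUID-like values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subjectkey, interview_age in months, interview_date)
Record = Tuple[str, int, str]


def missing_subjects(previous: List[Record], current: List[Record],
                     tolerance_months: int = 2) -> List[Record]:
    """Return previously submitted records with no match in the current submission.

    ASSUMPTION: a record counts as resubmitted if the current submission has the
    same subjectkey with an interview_age within +/- tolerance_months.
    """
    ages_by_subject: Dict[str, List[int]] = defaultdict(list)
    for key, age, _date in current:
        ages_by_subject[key].append(age)
    return [
        (key, age, date)
        for key, age, date in previous
        if not any(abs(a - age) <= tolerance_months for a in ages_by_subject.get(key, []))
    ]


previous = [("NDAR_A", 120, "01/15/2020"), ("NDAR_A", 132, "01/15/2021"),
            ("NDAR_B", 100, "03/01/2020")]
current = [("NDAR_A", 121, "02/10/2020"), ("NDAR_B", 110, "01/01/2021")]
missing = missing_subjects(previous, current)
percent_missing = 100 * len(missing) / len(previous)
print(missing)                    # [('NDAR_A', 132, '01/15/2021'), ('NDAR_B', 100, '03/01/2020')]
print(f"{percent_missing:.1f}%")  # 66.7%
```

Here NDAR_A's 120-month record is matched by the 121-month resubmission, while the 132-month record and NDAR_B's 100-month record have no resubmission within the assumed tolerance.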
MISSING DATA STRUCTURE
Clinical data are expected to be submitted cumulatively. This error indicates a structure submitted previously is not found in subsequent submissions. You will be provided with the shortname of the structure in question.
- Suggested Action: Confirm that all cumulative data were submitted. Upload any missing structures, or confirm with NDA to resolve the issue.
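Conceptually, this check is a set difference between the structure shortnames in earlier submissions and those in the latest one. The Python sketch below is a minimal illustration under that assumption; the function name and the shortnames are hypothetical.

```python
from typing import Iterable, List


def missing_structures(previous_shortnames: Iterable[str],
                       current_shortnames: Iterable[str]) -> List[str]:
    """Return shortnames submitted previously but absent from the current submission."""
    return sorted(set(previous_shortnames) - set(current_shortnames))


# Hypothetical shortnames for illustration.
previous = ["example_demo01", "example_scale01", "example_survey01"]
current = ["example_demo01", "example_survey01"]
print(missing_structures(previous, current))  # ['example_scale01']
```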
The following errors are included in the QA/QC report. In addition, submissions with these errors can be added to your filter cart and downloaded to help you identify and resolve the issues. To download them, go to your Collection, open the Submissions tab (authentication is required to see it), select the "QA Submissions with errors" radio button, and then add the submission(s) to the NDA filter cart.
MISCALCULATED AGE - INTERVIEW DATE INCONSISTENT WITH AGE
This error indicates that multiple ages (in months) are listed for a given subject for the same relative interview_date. It is very important that age be consistent across measures.
- Suggested Action: Review the ages (in months) recorded for the subjects across the data structures noted on the error report. Ensure that the correct age is recorded. Age in months should be determined using the following rule: if the interview falls 15 or fewer days into the next month, round the age down; if it falls more than 15 days into the next month, round the age up to the next month. Once records have been corrected, resubmit the data.
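To make the rounding rule concrete, here is a minimal Python sketch that computes interview_age from a date of birth and an interview date. It interprets "days into the next month" as the days elapsed since the subject's last completed month of age; that interpretation, and the function names, are assumptions for illustration rather than NDA's own code.

```python
import calendar
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, clamping the day to the end of the target month."""
    total = d.month - 1 + n
    year, month = d.year + total // 12, total % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def interview_age_months(birth: date, interview: date) -> int:
    """Age in months using the 15-day rule: <=15 extra days rounds down, >15 rounds up."""
    if interview < birth:
        raise ValueError("interview date precedes date of birth")
    # Completed months between birth and interview.
    months = (interview.year - birth.year) * 12 + (interview.month - birth.month)
    if add_months(birth, months) > interview:
        months -= 1
    # Days elapsed since the last completed month of age.
    extra_days = (interview - add_months(birth, months)).days
    return months + 1 if extra_days > 15 else months


birth = date(2010, 1, 10)
print(interview_age_months(birth, date(2010, 3, 25)))  # 2 (15 extra days, round down)
print(interview_age_months(birth, date(2010, 3, 26)))  # 3 (16 extra days, round up)
```

Computing age from one shared function, rather than by hand in each data structure, is one way to keep ages consistent across measures for the same interview date.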
MISCALCULATED AGE - INTERVIEW AGE INCONSISTENT WITH DATE
This error indicates that multiple dates are listed for a given subject for the same relative interview_age. It is very important that age is consistent across measures.
- Suggested Action: Review the ages (in months) recorded for the subjects across the data structures noted on the error report. Ensure that the correct age is recorded. Age in months should be determined using the following rule: If <=15 days into the next month, round the age down in months. If >15 days into the next month, round the age up to the next month. Once records have been corrected, resubmit the data.
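Both age checks amount to grouping records and looking for more than one distinct value within a group. The Python sketch below assumes records pooled across all of a subject's data structures and represented as dictionaries keyed by the element names used above (subjectkey, interview_date, interview_age); `find_conflicts` is a hypothetical helper, and NDA's actual checks may apply additional rules. A flagged group is a prompt for review, not proof of an error.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_conflicts(records: Iterable[dict], key_fields: Tuple[str, ...],
                   value_field: str) -> Dict[tuple, List]:
    """Group records by key_fields and return groups with more than one distinct value_field."""
    groups = defaultdict(set)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].add(rec[value_field])
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


records = [
    {"subjectkey": "NDAR_A", "interview_date": "01/15/2020", "interview_age": 120},
    {"subjectkey": "NDAR_A", "interview_date": "01/15/2020", "interview_age": 121},
    {"subjectkey": "NDAR_B", "interview_date": "03/01/2020", "interview_age": 100},
    {"subjectkey": "NDAR_B", "interview_date": "09/01/2020", "interview_age": 100},
]
# Interview date inconsistent with age: one date, several ages.
print(find_conflicts(records, ("subjectkey", "interview_date"), "interview_age"))
# {('NDAR_A', '01/15/2020'): [120, 121]}
# Interview age inconsistent with date: one age, several dates.
print(find_conflicts(records, ("subjectkey", "interview_age"), "interview_date"))
# {('NDAR_B', 100): ['03/01/2020', '09/01/2020']}
```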
INCONSISTENT SEX PROVIDED
This error indicates that a given subject has existing records in which sex is recorded with more than one valid value (e.g., male (M) in one record and female (F) in another).
- Suggested Action: Review the source documents for the correct sex. Record the appropriate code (male (M) or female (F)) on all subject records locally and resubmit. If the change in sex is valid, note this information in the Supporting Documentation section of the Collection.
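When a subject is flagged, it helps to know which data structures carry each conflicting value. The minimal Python sketch below assumes records as dictionaries with subjectkey, sex, and a hypothetical structure field holding the structure shortname; the function name and sample values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sex_by_structure(records: Iterable[dict]) -> Dict[str, Dict[str, List[str]]]:
    """For subjects with more than one sex value, map each value to the structures using it."""
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        seen[rec["subjectkey"]][rec["sex"]].add(rec["structure"])
    return {
        subject: {sex: sorted(structs) for sex, structs in values.items()}
        for subject, values in seen.items()
        if len(values) > 1
    }


records = [
    {"subjectkey": "NDAR_A", "sex": "F", "structure": "example_scale01"},
    {"subjectkey": "NDAR_A", "sex": "M", "structure": "example_survey01"},
    {"subjectkey": "NDAR_B", "sex": "M", "structure": "example_scale01"},
]
print(sex_by_structure(records))
# {'NDAR_A': {'F': ['example_scale01'], 'M': ['example_survey01']}}
```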
INCONSISTENT GUID FOR SAME SRC_SUBJECT_ID
This error indicates that a src_subject_id (locally used ID) within a data set is ascribed to multiple GUIDs. A SRC_SUBJECT_ID and a GUID should refer to a unique individual consistently across measures.
- Suggested Action: Review the source documents for the subject’s correct GUID. If necessary, use the GUID Tool to validate the GUID. Correct data locally and resubmit.
INCONSISTENT SRC_SUBJECT_ID FOR SAME GUID
This error indicates that a given GUID has multiple src_subject_ids (locally used IDs) within a single data set. A SRC_SUBJECT_ID and a GUID should refer to a unique individual consistently across measures.
- Suggested Action: Review the source documents for the subject’s correct src_subject_id (locally used ID). Correct data locally and resubmit. This change in the data should be documented and uploaded to the Supporting Documentation in the Collection.
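Both GUID checks ask whether the mapping between the two identifiers is one-to-one. The Python sketch below runs the same hypothetical helper in each direction; it assumes the GUID is stored in the subjectkey element, and the identifier values are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def one_to_many(records: Iterable[dict], from_field: str, to_field: str) -> Dict[str, List[str]]:
    """Return from_field values that map to more than one distinct to_field value."""
    mapping = defaultdict(set)
    for rec in records:
        mapping[rec[from_field]].add(rec[to_field])
    return {k: sorted(v) for k, v in mapping.items() if len(v) > 1}


records = [
    {"subjectkey": "NDAR_A", "src_subject_id": "S01"},
    {"subjectkey": "NDAR_B", "src_subject_id": "S01"},  # same local ID, second GUID
    {"subjectkey": "NDAR_C", "src_subject_id": "S03"},
    {"subjectkey": "NDAR_C", "src_subject_id": "S3"},   # same GUID, second local ID
]
print(one_to_many(records, "src_subject_id", "subjectkey"))  # {'S01': ['NDAR_A', 'NDAR_B']}
print(one_to_many(records, "subjectkey", "src_subject_id"))  # {'NDAR_C': ['S03', 'S3']}
```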
SEX VALUE THRESHOLD WARNING
This warning indicates that the percentage of other (O) and/or not reported (NR) values for the data element 'sex' exceeds the acceptable thresholds for an NDA Collection. The acceptable threshold for the value other (O) is 25%, and the acceptable threshold for the value not reported (NR) is 50%. NDA strongly encourages data submitters to review datasets that exceed these thresholds to ensure that data are accurately reported. Please note that when this warning occurs, submission can continue without further action.
- Suggested Action: Review the data to confirm that the values reported in the sex element are accurate or update the data accordingly.
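You can check these thresholds yourself before submitting. The minimal Python sketch below takes the two limits from the text above and assumes, for illustration, that the percentage is computed over a list of sex values (for example, one per subject); NDA's exact denominator may differ, and the function name is hypothetical.

```python
from typing import Dict, Iterable

# Thresholds from the text above (percent of values).
THRESHOLDS = {"O": 25.0, "NR": 50.0}


def sex_threshold_warnings(sex_values: Iterable[str]) -> Dict[str, float]:
    """Return {value: percent} for each threshold value whose share exceeds its limit."""
    values = list(sex_values)
    if not values:
        return {}
    warnings = {}
    for code, limit in THRESHOLDS.items():
        percent = 100 * values.count(code) / len(values)
        if percent > limit:
            warnings[code] = percent
    return warnings


sex_values = ["M"] * 5 + ["F"] * 2 + ["O"] * 3
print(sex_threshold_warnings(sex_values))  # {'O': 30.0}
```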
DUPLICATE RECORDS WITHIN A SUBMISSION
This error indicates that identical data have been submitted for a subject within a dataset.
- Suggested Action: Identify the duplicate subject using the SUBMITTED_SRC_SUBJECT_ID value, remove the duplicate record from the dataset, and resubmit.
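Exact duplicates can be found by treating each row as a tuple of its column/value pairs. The Python sketch below assumes a plain CSV with a single header row (if your files include an extra leading line, such as a structure name and version row, skip it before parsing); the function names, columns, and values are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def find_duplicates(rows: List[Dict[str, str]], id_field: str = "src_subject_id") -> List[str]:
    """Return IDs of subjects that have two or more identical rows."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sorted({dict(key)[id_field] for key, n in counts.items() if n > 1})


def drop_exact_duplicates(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each identical row, preserving order."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


data = """src_subject_id,interview_age,score
S01,120,5
S01,120,5
S02,118,7
"""
rows = list(csv.DictReader(io.StringIO(data)))
print(find_duplicates(rows))             # ['S01']
print(len(drop_exact_duplicates(rows)))  # 2
```

Only fully identical rows are treated as duplicates here; two rows for the same subject that differ in any value are kept, since they may be legitimate separate records.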