Data Structure
Name | Type | Required | Description | Value Range | Notes |
---|---|---|---|---|---|
subjectkey | GUID | Required | The NDAR Global Unique Identifier (GUID) for research subject | NDAR* | |
src_subject_id | String | Required | Subject ID as defined in the lab/project | ||
interview_date | Date | Required | Date on which the interview/genetic test/sampling/imaging/biospecimen was completed, in MM/DD/YYYY format | ||
interview_age | Integer | Required | Age in months at the time of the interview/test/sampling/imaging. | 0::1440 | Age is rounded to chronological month. If the research participant is 15-days-old at time of interview, the appropriate value would be 0 months. If the participant is 16-days-old, the value would be 1 month. |
sex | String | Required | Sex of subject at birth | M; F; O; NR | M = Male; F = Female; O = Other; NR = Not reported |
experiment_id | Integer | Required | ID for the Experiment/settings/run | ||
cellid | String | Recommended | Unique identifier of cells | ||
samplesubtype | Integer | Recommended | Subtype of sample, whether it is cells, nuclei, or bulk | 1::3 | 1=Cell; 2=Nucleus; 3=Bulk |
libraryid | String | Recommended | Library ID as provided by the lab | ||
gen_software | String | Recommended | Name of the software used on sequencing platform | ||
softwareversion | String | Recommended | Version number of the software used | ||
referenceset | Integer | Recommended | A set of references (e.g., canonical assembled contigs) which defines a common coordinate space for comparing reference-aligned experimental data | 1::5; -99 | 1=1000 Genomes phase 3; 2=GRCh38; 3=GRCh37; 4=MMUL1.0; 5=HRC; -99=Other |
otherreferenceset | String | Recommended | A set of references (e.g., canonical assembled contigs) which defines a common coordinate space for comparing reference-aligned experimental data | ||
librarybatch | String | Recommended | Batch in which the library was prepared | ||
sequencingbatch | String | Recommended | Batch in which the library was sequenced | ||
libraryselection | Integer | Recommended | The general strategy by which the library was prepared | 1::37; -88; -99 | 1=Random; 2=PCR Random PCR; 3=RT-PCR; 4=HMPR; 5=MF; 6=repeat fractionation; 7=size fractionation; 8=MSLL; 9=cDNA; 10=cDNA random priming; 11=cDNA oligo-dT; 12=PolyA; 13=Oligo-dT; 14=Inverse rRNA; 15=ChIP; 16=MNase; 17=DNase; 18=Hybrid selection; 19=Reduced representation; 20=Restriction digest; 21=5-methylcytidine antibody; 22=MBD2 protein methyl-CpG binding domain; 23=CAGE; 24=RACE; 25=MDA; 26=padlock probes capture method; 27=cell hashing; 28=DHA library construction; 29=EndItDNAEndRepairKit; 30=KapaHyperPrep; 31=IncRNAenrichment; 32=MULTIseq; 33=PCR-free; 34=rRNA depletion; 35=SPLITseq; 36=STARRseq; 37=SureCell; -99=Other; -88=Unspecified |
libraryconstructionprotocol | Integer | Recommended | Method used to construct the sequence library | 1::31; -99 | 1=SMART-seq; 2=SMART-seq2; 3=SMART-seq3; 4=SMART-seq4; 5=STRT-seq; 6=STRT-seq-C1; 7=STRT-seq-2i; 8=Quartz-seq; 9=Quartz-seq2; 10=CEL-seq; 11=CEL-seq2; 12=10x Chromium Single Cell 3' V3 FeatureBarcoding; 13=10x Chromium Single Cell 3' V2 and V3 GE; 14=10x Chromium Single Cell 3' V1 GE; 15=10x Chromium Single Cell 5' VDJ; 16=10x Chromium Single Cell 5' GE; 17=SureCell 3' WTA for ddSEQ; 18=MARS-seq / MARS-seq2.0; SCRB-seq / mcSCRB-seq; 18=Drop-seq / Seq-Well; 19=scifi-RNA-seq; 20=Microwell-seq; 21=BD Rhapsody; 22=sci-RNA-seq3; 23=sci-RNA-seq; 24=Seq-Well S3; 25=Tang 2009; 26=SPLiT-seq; 27=inDrop; 28=NEBNext; 29=NexteraXT; 30=Omni-ATAC; 31=10x Visium Gene Expression protocol; -99=Other |
otherlibraryconstructprotocol | String | Recommended | Other type of library construction protocol not listed | ||
librarysource | Integer | Recommended | The type of source material that is being sequenced | 1; 2; -99 | 1=Genomic; 2=Genomic single cell; -99=Other |
otherlibrarysource | String | Recommended | Other library source not listed | ||
readlength | Float | Recommended | The length of the read | ||
librarylayout | Integer | Recommended | Whether the library is paired-end or single-end | 1; 2 | 1=Single; 2=Paired-end |
totalreads | Integer | Recommended | Total number of sequencing reads from a library | ||
numbercells | Integer | Recommended | Number of cells or nuclei sequenced | ||
readstrandorigin | Integer | Recommended | The strand from which the read originates in a strand-specific protocol | 1::4; 999 | 1=Forward; 2=Reverse; 3=Unstranded; 4=First-strand; 999=Missing |
isstranded | Integer | Recommended | Whether or not the library is stranded. | 0; 1 | 0=No; 1=Yes |
libraryversion | String | Recommended | Library Version: for example, rnaSeq 10x library version | ||
validbarcodereads | String | Recommended | Fraction of reads with cell barcodes that match the whitelist | ||
mediangenes | Integer | Recommended | The median number of genes detected (with nonzero UMI counts) across all cell-associated barcodes | ||
medianumis | Integer | Recommended | The median number of total Unique Molecular Identifier (UMI) counts across all cell-associated barcodes | ||
gen_rin | Float | Recommended | RNA Integrity Number | ||
rnabatch | String | Recommended | Batch in which RNA sample was isolated | ||
ribozero_batch | String | Recommended | Ribozero treatment batch | ||
data_file1 | File | Recommended | Data file 1 | ||
data_file1_type | String | Recommended | Type of data file 1 | ||
hcdi_tissue | Integer | Recommended | Organ or tissue of origin for cells if not listed | 1::76; -77; -88; -99 | 1=Amygdala; 2=Amygdaloid complex; 3=Anterior cingulate cortex; 4=Blood; 5=Bone marrow; 6=Buccal Mucosa; 7=Buffy Coat; 8=Caudate nucleus; 9=Cecum derived fecal material; 10=Cerebellar cortex; 11=Cerebellum; 12=Cerebral cortex; 13=Cortical plate; 14=Dorsal pallium; 15=Dorsal Root Ganglion; 16=Dorsolateral prefrontal cortex; 17=Dorsomedial prefrontal cortex; 18=Embryonic tissue; 19=Fecal material; 20=Forebrain; 21=Frontal cortex; 22=Frontal lobe; 23=Frontal pole; 24=Hippocampus; 25=Head of caudate nucleus; 26=Inferior frontal gyrus; 27=Inferior temporal cortex; 28=Inferior temporal gyrus; 29=Inferolateral temporal cortex; 30=Insular cortex; 31=Left cerebral hemisphere; 32=Livermedial dorsal nucleus of thalamus; 33=Medial frontal cortex; 34=Medial ganglionic eminence; 35=Medial orbital frontal cortex; 36=Medial prefrontal cortex; 37=Meninges; 38=Midbrain; 39=Middle frontal gyrus; 40=Middle temporal gyrus; 41=Nerve tissue; 42=Nucleus accumbens; 43=Occipital lobe; 44=Occipital visual cortex; 45=Olfactory neuroepithelium; 46=Orbitofrontal cortex; 47=Parahippocampal gyrus; 48=Parietal cortex; 49=Plasma; 50=Posterior cingulate cortex; 51=Posteroinferior parietal cortex; 52=Posterior inferior parietal cortex; 53=Posterior superior temporal cortex; 54=Precentral gyrus; 55=Prefrontal cortex; 56=Primary auditory cortex; 57=Primary motor cortex; 58=Primary somatosensory cortex; 59=Primary tumor; 60=Primary visual cortex; 61=Putamen; 62=Right cerebral hemisphere; 63=Serum; 64=Splenocyte; 65=Striatum; 66=Subgenual anterior cingulate cortex; 67=Subgenual cingulate cortex; 68=Superior parietal lobe; 69=Superior temporal gyrus; 70=Temporal cortex; 71=Temporal pole; 72=Thalamus; 73=Ventricular zone; 74=Ventrolateral prefrontal cortex; 75=VZ/SVZ; 76=Whole brain; -77=Unspecified; -88=Not Applicable; -99=Other |
dlpfc_rna_isola_prepoperator | String | Recommended | Operator for RNA Isolation Preparation | ||
flowcell_batch | String | Recommended | Multiplex batch | ||
flowcell_lane_a | String | Recommended | Flowcell lane | ||
flowcell_lane_b | String | Recommended | Flowcell lane | ||
flowcell_name | String | Recommended | Flowcell identifier | ||
hemisphere | Float | Recommended | Hemisphere: 1 right / 0 left | 0;1;999 | 0=Left; 1=Right; 999=Unknown |
rat280 | Float | Recommended | Ratio of absorbance at 260 nm to absorbance at 280 nm | ||
sample_id_biorepository | String | Recommended | Biorepository Sample ID | ||
psych_enc_exclude_reason | String | Recommended | Reason subject was excluded from study | ||
flowcell_2 | String | Recommended | CMC_HBCC-specific flowcell identifier | ||
flowcell_given_to_core | String | Recommended | Flowcell identifier | ||
flowcell_id | String | Recommended | Flowcell identifier | multiple flowcells separated by :: | |
flowcell_name_2 | String | Recommended | CMC_HBCC-specific flowcell name | ||
study | String | Recommended | Study; The code for each individual study | ||
brodmann_area | String | Recommended | A segmentation of the cerebral cortex on the basis of cytoarchitecture | ||
psych_enc_exclude | Integer | Recommended | Subject excluded from study | 0;1;999 | 0=No; 1=Yes; 999=Missing |
ercc_added | Integer | Recommended | Indicates if ERCC spike-in pools were added to the background RNA | 0;1; 999 | 0=No; 1=Yes; 999=missing |
librarykit | Integer | Recommended | Illumina kit catalogue number | 0;1; 999 | 0=Illumina RS-122-2301; 1=Kapa Hyper Prep Kit; 999=missing |
librarytype | String | Recommended | The type of library, in assays where samples are barcoded or hashed for multiplexing or each sample has multiple libraries amplified separately before pooled sequencing. | ||
mappedreads_multimapped | Integer | Recommended | Mapped reads multimapped | ||
mappedreads_primary | Integer | Recommended | Mapped reads primary | ||
nucleicacidsource | Integer | Recommended | Subtype of sample, whether it is bulk cell, bulk nuclei, single cell, single nucleus, sorted cells, or sorted nuclei. | 0::5; 999 | 0=bulk cell; 1=bulk nuclei; 2=single cell; 3=single nucleus; 4=sorted cells; 5=sorted nuclei; 999=missing |
readlength_max | Float | Recommended | The maximum length of the read | ||
readlength_min | Float | Recommended | The minimum length of the read | ||
rnaseqid | String | Recommended | ID for RNAseq Assay | ||
rrnarate | Integer | Recommended | rRNA Rate | 0::11; 999 | 0=0.00%; 1=0.01%; 2=0.02%; 3=0.03%; 4=0.04%; 5=0.05%; 6=0.06%; 7=0.11%; 8=DNE; 9=average; 10=max; 11=std; 999=missing |
samplebarcode | String | Recommended | The nucleotide sequence of the sample barcode used to identify cells from a single sample in cell hashing or multiplexing assays. | ||
sequencingplatform | Integer | Recommended | Sequencing platform | 0::2; 999 | 0=HiSeq2000; 1=HiSeq2500; 2=HiSeq4000; 999=Missing |
tissuestate | Integer | Recommended | State of tissue preservation | 0;1; 999 | 0=Flash frozen chunk; 1=Flash frozen chunk then isolated in trizol; 999=missing |
celltype | String | Recommended | Cell type | ||
externalreference | String | Recommended | External Data Reference | ||
filename | String | Recommended | File name | ||
library_prep_batch | String | Recommended | Sequencing library batch | ||
platform | String | Recommended | Name of particular experiment platform | ||
assay | Integer | Recommended | The technology used to generate the data in this file | 0::15; 999 | 0=ATACSeq; 1=CUT(and)Tag; 2=ChIPSeq; 3=GO-CaRT; 4=HI-C; 5=RNA-seq; 6=TMT quantitation; 7=bisulfiteSeq; 8=errBisulfiteSeq; 9=label free mass spectrometry; 10=methylationArray; 11=mirnaSeq; 12=oxBS-Seq; 13=scrnaSeq; 14=snpArray; 15=wholeGenomeSeq; 999=Missing |
hcdi_organ | Integer | Recommended | Organ of origin for cells if applicable | 1::19; -99 | 1=Lymph node; 2=Kidney; 3=Skin; 4=Mammary gland; 5=Nerves; 6=Brain; 7=Blood; 8=Breast; 9=Colon; 10=Lung; 11=Prostate; 12=Pancreas; 13=Ovary; 14=Spleen; 15=Bone marrow; 16=Bursa Of Fabricius; 17=Nose; 18=Cerebrospinal fluid; 19=Liver; -99=Other |
ethnicity | String | Recommended | Ethnicity of participant | Hispanic or Latino; Not Hispanic or Latino; Unknown | |
psych_enc_datatype | Integer | Recommended | RNA data type | 0;1;999 | 0=lncRNA; 1=mRNA; 999=missing |
rna_type | Integer | Recommended | RNA type | 0;1;999 | 0=rRNA-depleted; 1=total RNA; 999=missing |
sequencing_assay | Integer | Recommended | Sequencing assay | 0::2;999 | 0=IsoSeq; 1=RNA-Seq; 2=SeqCap; 999=missing |
submission_file_name | String | Recommended | Submission file name | ||
file_status | Integer | Recommended | Status of files | 0;1;999 | 0=Processed; 1=Raw; 999=Missing |
data_file5_type | String | Recommended | Type/description of data file 5 | ||
data_file5 | File | Recommended | Data file 5 | ||
visium_protocol_version | String | Recommended | Protocol version of the Visium slides used | e.g., V1, V2, or other | |
loupe_version | String | Recommended | Loupe alignment version | ||
fetal_age | Integer | Recommended | Age of the fetus (i.e., gestational age) in days | 0::322 | |
fetal_age_type | Integer | Recommended | Type of gestational age used for fetal_age | 1;2 | 1 = Postovulatory gestational age, defined as days since the last ovulation, expected term of 266 days; 2 = Postmenstrual gestational age, days since the last menstrual period, expected term of 280 days |
differentiationdaysinculture | Integer | Recommended | Number of days in culture |
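
The Value Range column above uses a compact notation: `a::b` for an inclusive integer range and `;` to separate alternatives (for example `1::5; -99`). The following is a minimal sketch of how a submitter might check integer fields against that notation before upload. The function names `parse_value_range` and `is_valid` are hypothetical and are not part of any NDA tool; the sketch handles integer ranges only, not string ranges such as `M; F; O; NR` or patterns such as `NDAR*`.

```python
def parse_value_range(spec):
    """Expand an integer range spec such as '1::5; -99' into a set of allowed ints.

    Assumption: 'a::b' means every integer from a to b inclusive, and ';'
    separates alternatives, as the Value Range column appears to use them.
    """
    allowed = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate stray separators or trailing ';'
        if "::" in part:
            low, high = (int(x) for x in part.split("::"))
            allowed.update(range(low, high + 1))
        else:
            allowed.add(int(part))
    return allowed


def is_valid(value, spec):
    """Return True if an integer value falls inside the range spec."""
    return int(value) in parse_value_range(spec)


# Example: referenceset allows 1 through 5, plus -99 for "Other".
print(is_valid(2, "1::5; -99"))    # True
print(is_valid(-99, "1::5; -99"))  # True
print(is_valid(6, "1::5; -99"))    # False
```

Expanding each range into a set keeps the membership check simple; for very wide ranges a list of `(low, high)` pairs would use less memory, but the largest range in this structure (`0::1440`) is small enough that it does not matter.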