Naming convention
For the 6.0 release, the complete ABCD tabulated data resource has been recurated using a standardized naming convention. As a result, all variables in the tabulated dataset have been renamed using the new convention. We acknowledge that this change will require adjustments to existing analysis pipelines and may introduce some friction. We nevertheless hope that the recuration effort will benefit all users of the ABCD tabulated data resource going forward: the new data curation standard resolves many inconsistencies that existed in previous releases and implements a clear structure that is easier to search and process across the whole dataset.
General design
dm_s_tab_item
Variable names consist of four main components that are separated by a single underscore:
- Domain
- Source/recipient
- Table
- Item
The table and item components can have additional subcomponents that are separated using multiple underscores to indicate nesting within the four main components.
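As an illustration of the general design, the following minimal sketch splits a variable name into its four main components. The function and class names are hypothetical and not part of any ABCD tooling, and the sketch assumes that none of the four main components contains a single underscore of its own.

```python
import re
from typing import NamedTuple

# A single underscore that is not part of a longer run of underscores.
# Double and triple underscores stay inside the table/item components.
_MAIN_SEPARATOR = re.compile(r"(?<!_)_(?!_)")


class VariableName(NamedTuple):
    """Hypothetical container for the four main components."""
    domain: str
    source: str
    table: str
    item: str


def parse_variable_name(name: str) -> VariableName:
    """Split a variable name of the form dm_s_tab_item into its components."""
    parts = _MAIN_SEPARATOR.split(name)
    if len(parts) != 4:
        raise ValueError(f"expected 4 main components, got {len(parts)}: {name!r}")
    return VariableName(*parts)


print(parse_variable_name("ph_y_bp__dia__r01_001"))
# VariableName(domain='ph', source='y', table='bp__dia__r01', item='001')
```

Raising an error for names that do not yield exactly four parts is a design choice of this sketch; it makes names that deviate from the assumed pattern visible instead of silently mis-parsing them.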
Components
Domain
dm_s_tab_item
Domain: Keyword for the domain that the given variable belongs to. Domains in the core ABCD study have keywords with two letters while domains within the substudies (where “domain” refers to the substudy name) use more than two letters.
Source
dm_s_tab_item
Source/recipient: Keyword (one letter) for the source / recipient type that provided the data for the given variable.
Table
dm_s_tab_item
Table: Name of the table/form the given variable is a part of.
dm_s_tab__kw_item
Keyword: Keyword for a subsection / group of questions within the table the given variable is a part of (e.g., ph_y_meds__otc_001 for questions related to over-the-counter medications, represented by the keyword otc).
dm_s_tab__kw__kw_item
Additional keywords: Whenever a table has more levels of nesting/grouping, one or more additional keywords are added (e.g., ph_y_bp__dia__r01_001 uses a second keyword, r01, to differentiate the first round of diastolic blood pressure readings, represented by the keyword dia, from later rounds of readings).
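The nesting of keywords within the table component can be unpacked with a simple split on double underscores. This is a minimal sketch; the function name is hypothetical, and the input is assumed to be the table component only (for example, bp__dia__r01 taken from ph_y_bp__dia__r01_001).

```python
def split_table_component(table: str) -> tuple[str, list[str]]:
    """Return the table name and its nested keywords, outermost first."""
    name, *keywords = table.split("__")
    return name, keywords


# Table component of ph_y_meds__otc_001: one keyword.
print(split_table_component("meds__otc"))
# Table component of ph_y_bp__dia__r01_001: two nested keywords.
print(split_table_component("bp__dia__r01"))
```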
A filterable keyword glossary for the ‘table’ component of all variables
Item
dm_s_tab_item
Item: A three-digit, zero-padded number, e.g., 001, is used for all variables with the variable type “item”, i.e., typically individual questions in a questionnaire/table distinct from “administrative” variables or “summary scores” (see below).
dm_s_tab_admin
dm_s_tab_score
Administrative variables & summary scores: Administrative variables (e.g., language or date of administration) and summary scores (e.g., sums or means of individual items in a table) are marked by letters (e.g., dtt, lang, mean, pc (principal component), etc.) instead of the three-digit number used for variables of variable type “item” (see above).
dm_s_tab_item__subitem
Subitem: A two-digit, zero-padded number, e.g., 01, is used to indicate a subitem’s relationship to the main item. This is used to indicate items that are dependent on previous questions through branching logic or to indicate another direct relationship between two questions (e.g., ab_p_demo__empl__prtnr_001, “Does your partner work?”, has the follow-up question ab_p_demo__empl__prtnr_001__01, “Full or part-time?”; 001__01 is only presented if 001 is endorsed). Sometimes, variables have more than two levels of dependencies, in which case more than one level of subitems is used, e.g., 001__01__01.
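Because a subitem name extends the name of the question it depends on, the parent question can be recovered by removing the trailing two-digit suffix. The sketch below is illustrative only; the function name is hypothetical, and it assumes the subitem suffix is the last element of the name (no version indicator or tag after it).

```python
import re
from typing import Optional

# A double underscore (not a triple one) followed by exactly two digits
# at the end of the name.
_SUBITEM_SUFFIX = re.compile(r"(?<!_)__\d{2}$")


def parent_variable(name: str) -> Optional[str]:
    """Return the name of the question a subitem depends on.

    Returns None if the name does not end in a subitem suffix.
    """
    if _SUBITEM_SUFFIX.search(name) is None:
        return None
    return _SUBITEM_SUFFIX.sub("", name)


print(parent_variable("ab_p_demo__empl__prtnr_001__01"))
# ab_p_demo__empl__prtnr_001
```

Applying the function repeatedly walks up a multi-level dependency such as 001__01__01 one level at a time.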
dm_s_tab_itema
Component: Indicator used to mark questions that have multiple components or to indicate that two questions are the inverse of each other (e.g., “When did the effects begin?”, 001a, and “When did the effects end?”, 001b).
dm_s_tab_item__v01
Version: Indicator used to mark a new version of the same question/variable. Generally, questions with the same label are collapsed under one variable, even if they were collected under different variable names. The version indicator is only used in cases where a question has been replaced with one that is very similar but different enough that it needs to be distinguished from the original question (e.g., another version of the education variable was added to include additional response options after baseline).
dm_s_tab_item__subitem__v1
Subitem version: Indicator for a new, substantially different, version of a subitem question/variable.
dm_s_tab_item__l
Longitudinal marker: Indicator used for questions that repeat an earlier question with the wording slightly altered to account for the fact that the question is being asked at a follow-up visit. Typically, this indicator is used in cases where the first time a question was asked, it referred to the lifetime up to that point, e.g., “Have you ever done X?”, while the version of the question asked at later visits refers to the time since the last time the question was asked, e.g., “Since we last saw you, have you done X?”.
dm_s_tab_item__tag
Tags: Tags are additional keywords appended to variable names to provide additional context or categorization. Variables may include one or more tags, separated by double underscores (e.g., the tag __dk indicates a “don’t know” response, and __rmt indicates a remote visit question in the variable mh_y_cb_dev__rmt).
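The suffixes that can follow the item (subitem, version, longitudinal marker, and tags) are all attached with double underscores, so they can be told apart by their shape. The following sketch is one possible classification, not an official parser: the names are hypothetical, any suffix that is not a two-digit number, a v-number, or the letter l is assumed to be a tag, and names carrying a triple-underscore multi-select suffix are assumed to have had that suffix removed beforehand.

```python
import re
from typing import NamedTuple, Optional


class ItemParts(NamedTuple):
    """Hypothetical breakdown of the item component."""
    base: str                # e.g. "001", "001a", "mean"
    subitems: list[str]      # e.g. ["01", "01"] for 001__01__01
    version: Optional[str]   # e.g. "v01"
    longitudinal: bool       # True if the __l marker is present
    tags: list[str]          # e.g. ["dk"], ["rmt"]


def parse_item_component(item: str) -> ItemParts:
    """Classify the double-underscore suffixes of an item component."""
    base, *suffixes = item.split("__")
    subitems: list[str] = []
    tags: list[str] = []
    version: Optional[str] = None
    longitudinal = False
    for suffix in suffixes:
        if re.fullmatch(r"\d{2}", suffix):
            subitems.append(suffix)
        elif re.fullmatch(r"v\d+", suffix):
            version = suffix
        elif suffix == "l":
            longitudinal = True
        else:
            # Assumption: everything else is treated as a tag.
            tags.append(suffix)
    return ItemParts(base, subitems, version, longitudinal, tags)


# Item component of mh_y_cb_dev__rmt.
print(parse_item_component("dev__rmt"))
```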
dm_s_tab_item___1
Multi-select response options: Some variable names include triple underscores followed by a number. For example, in a question like “Which animals do you like? (check all that apply)”, variable names might include ___1 for “cats”, ___2 for “dogs”, and ___3 for “fish”. If a participant selects multiple options, each corresponding variable (e.g., dm_s_tab_item___1, dm_s_tab_item___3) will be marked to indicate the selected responses.
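When working with a table's column names, the triple-underscore suffix makes it possible to collect all response-option variables that belong to the same multi-select question. The sketch below shows one way to do this with the standard library; the function name is hypothetical and the column names are the placeholder names used above, not real variables.

```python
import re
from collections import defaultdict
from typing import Iterable

# Question name, triple underscore, then the response option number.
_MULTI_SELECT = re.compile(r"^(?P<question>.+?)___(?P<option>\d+)$")


def group_multi_select(columns: Iterable[str]) -> dict[str, list[int]]:
    """Map each multi-select question to its sorted response option numbers."""
    groups: dict[str, list[int]] = defaultdict(list)
    for column in columns:
        match = _MULTI_SELECT.match(column)
        if match:
            groups[match["question"]].append(int(match["option"]))
    return {question: sorted(options) for question, options in groups.items()}


columns = ["dm_s_tab_item___1", "dm_s_tab_item___3", "dm_s_tab_other"]
print(group_multi_select(columns))
# {'dm_s_tab_item': [1, 3]}
```

Columns without the triple-underscore suffix are ignored, so the function can be applied to a full list of column names.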
A filterable keyword glossary for the ‘item’ component of all variables
Glossary
Below you can find a searchable/filterable table with the complete glossary containing all keywords used in the ABCD naming convention, or you can download it as a .csv file.
A searchable and filterable keyword glossary for the complete ABCD glossary