ABCD (General)
Domain overview
“ABCD (General)” which provides standard variables, as well as demographic information and surveys. In this new domain we introduce two new tables: ab_g_dyn
and ab_g_stc
. These tables use a new source g
(for General), introduced specifically for these summary and administrative fields. The data in these tables are often derived or computed from multiple sources, including REDCap timestamps, research administrator (RA) input, and participant responses (youth and parent). The variables in these tables are cohort descriptors & useful nesting variables–participant demographic information, visit characteristics (site, date, etc), relational variables (family ID, principal genetic ancestry), etc–that are useful for cohort characterization and modeling. We anticipate that these tables will be of great importance for most research questions using ABCD data.
Parent tables
Demographics
Measure description: Extensive lists of parent-reported demographics about the youth, themselves, and family. Note that this table includes the American Community Survey (ACS) Ranked Propensity score that imputed ranked propensity weight. The ranked propensity weight merges the ACS and ABCD data (with missing data imputed), estimates the propensity model, computes and scales/trims the propensity weights and finally rakes the scaled weights to final ACS control totals by age, sex, and race/ethnicity.
ab_p_demo__income__hhold_001
, ab_p_demo__income__hhold_001__v01
Researchers should carefully evaluate whether "Do Not Know"
or "Decline to Answer"
responses (coded as 777
and 999
) are ignorable or informative when deciding how to handle them in analyses, as they correlate with other key demographic variables across study visits (see Table below).
Current evidence suggests not treating these responses as simply missing. Researchers are encouraged to explore how these response types relate to other variables and consider potential impacts on findings. Not accounting for these responses appropriately could introduce bias and compromise validity of results.
The table below shows that these response types correlate with other important demographic variables across the first four full study visits (Baseline through Three year follow-up) and the partial four year follow-up visit. For instance, participants from households that responded “Do not know” were disproportionately Hispanic and from households with less than some college education. Participants from households that responded “Decline to answer” were disproportionately White and from households with at least a bachelor’s degree. These responses may reflect contextual, cultural, or social factors, such as who manages household finances or potential language barriers, rather than lack of financial knowledge.
Distribution of Income Categories and Non-Response Patterns Across Study Visits by Key Demographics
<50K | ≥50K & <100K | ≥100K | Do not know | Decline to answer | |
Study Visit | |||||
Baseline (n=11,868) | 3222 (25.2) | 3068 (23.7) | 4561 (23.4) | 504 (25.8) | 511 (25.6) |
1 year (n=11,220) | 2932 (22.9) | 2940 (22.7) | 4432 (22.8) | 454 (23.3) | 461 (23.1) |
2 year (n=10,973) | 2859 (22.3) | 2893 (22.3) | 4343 (22.3) | 428 (21.9) | 450 (22.5) |
3 year (n=10,336) | 2608 (20.4) | 2745 (21.2) | 4193 (21.5) | 391 (20.1) | 399 (20.0) |
4 year (n=4,754) | 1171 (9.2) | 1300 (10.0) | 1933 (9.9) | 173 (8.9) | 177 (8.9) |
Child race/ethnicity | |||||
White | 3072 (24.0) | 7686 (59.4) | 14323 (73.6) | 496 (25.4) | 716 (35.8) |
Black | 3829 (29.9) | 1298 (10.0) | 708 (3.6) | 502 (25.7) | 472 (23.6) |
Hispanic | 4301 (33.6) | 2540 (19.6) | 1807 (9.3) | 705 (36.2) | 496 (24.8) |
Asian | 114 (0.9) | 207 (1.6) | 594 (3.1) | 48 (2.5) | 92 (4.6) |
Other | 1476 (11.5) | 1215 (9.4) | 2027 (10.4) | 199 (10.2) | 222 (11.1) |
Household Education | |||||
< HS Diploma | 2004 (15.7) | 245 (1.9) | 23 (0.1) | 531 (27.2) | 182 (9.1) |
HS Diploma/GED | 2902 (22.7) | 934 (7.2) | 316 (1.6) | 426 (21.8) | 335 (16.8) |
Some College | 5675 (44.4) | 4717 (36.4) | 2599 (13.4) | 620 (31.8) | 596 (29.8) |
Bachelor | 1524 (11.9) | 4333 (33.5) | 7617 (39.1) | 226 (11.6) | 490 (24.5) |
Post Graduate Degree | 659 (5.2) | 2717 (21.0) | 8907 (45.8) | 144 (7.4) | 359 (18.0) |
Marital Status | |||||
Yes | 4410 (34.5) | 9391 (72.5) | 17538 (90.1) | 850 (43.6) | 1188 (59.5) |
No | 8188 (64.0) | 3533 (27.3) | 1916 (9.8) | 1057 (54.2) | 731 (36.6) |
Note: Values in parentheses are percentages calculated within each study visit unless otherwise noted
*Race/ethnicity of the child was parent-reported
The total household income measure in ABCD refers to the “total combined family income for the past 12 months.” The non-summary variable includes the following response options:
- Less than $5,000
- $5,000 through $11,999
- $12,000 through $15,999
- $16,000 through $24,999
- $25,000 through $34,999
- $35,000 through $49,999
- $50,000 through $74,999
- $75,000 through $99,999
- $100,000 through $199,999
- $200,000 and greater
Additionally, two special response categories are included: - 999
= “Don’t know” - 777
= “Decline to answer”
In earlier data releases, 999
and 777
responses were coded as missing. As a result, any researcher using the household income variable— for statistical adjustment or as a variable of interest—would have their number of observations reduced by ~10%. More recent releases retained 999
and 777
as distinct levels in the variable rather than coding them as missing.
** Recommendation for handling 777
and 999
responses**
Instead of re-coding 777
(“Decline to answer”) and 999
(“Don’t know”) as missing, consider the following strategies:
Retain as distinct values
Keep777
and999
as valid categories. This allows for analysis of their distribution and potential associations with other variables.Document your approach
Clearly describe in your methods how these values were treated, along with the rationale and implications for interpretation.
These practices help minimize bias and enhance the transparency and validity of research findings.
Reference: Li et al. (2025)
Occupation Survey
Measure description: Asks caregivers about their own and their partner’s occupation and job category according to the ACS job classifications.
Screener (Study eligibility)
Measure description: Data from the Screener Instrument that was used to determine study eligibility prior to enrollment. This also includes estimates of twin participants’ physical similarity.
General tables
ABCD Dynamic variables
Measure description: The ABCD Dynamic Variables table includes dynamic (dyn
), i.e., longitudinally changing variables. This table contains important visit-level (e.g., event dates, visit types, participant age), design/nesting (e.g., school identifiers, MRI scanner details), and cohort description variables (e.g., school grade, caregiver education, household income). The table includes several variables that should be considered when accounting for clustering of participants into different groups in statistical analyses:
The ab_g_dyn__design_site
variable describes the site at which a participant completed a given event.
Using data from the National Center for Education Statistics (NCES), pseudo-identifiers were generated and assigned to each school and school district:
ab_g_dyn__design_id__district
: Pseudo ID for clustering participants by school districtab_g_dyn__design_id__school
: Pseudo ID for clustering participants by school
School Assignment Process:
- Primary Method: Match visit completion date with school calendar dates to identify the school
- Secondary Method: If no match, compare reported grade with school grade ranges
- Fallback Method: Use event name (e.g.,
ses-01A
) to infer attendance - No ID Assigned: If no NCES ID is found, no school or district ID is assigned
Notes: NCES IDs may correspond to educational environments beyond public or public charter schools. Researchers may want to distinguish among them (e.g., public vs. homeschool). Additional information can be found in:
mh_p_kbi__school_004
(baseline)mh_p_kbi__school_004_l
(follow-up)
Pseudo school IDs for specific school settings can be excluded if needed.
ab_g_dyn__cohort_income__hhold__3lvl
, ab_g_dyn__cohort_income__hhold__6lvl
ab_g_dyn__cohort_income__hhold__3lvl
, ab_g_dyn__cohort_income__hhold__6lvl
ab_g_dyn__cohort_prtnrshp__employ
This variable combines marital/partner status with labor force participation to provide a more comprehensive measure of household structure and economic engagement. It is designed to enhance contextualization of household dynamics and improve statistical modeling by addressing both relationship status and economic activity.
Description
- Marital/Partner Status
- Partnered: Includes married individuals and those living with a partner.
- Not Partnered: Includes widowed, divorced, separated, and never married individuals.
- Missing Data: Decline to answer (777) or don’t know (999).
- Labor Force Participation
- In the Labor Force: Includes individuals who are working (full-or part-time), temporarily laid off, looking for work, on sick leave, or on parental leave.
- Not in the Labor Force: Includes retired individuals, those permanently or temporarily disabled, stay-at-home parents, students not seeking work, and those unemployed but not seeking work.
- Missing Data: Decline to answer (777) or don’t know (999).
Labor force classification follows definitions from the American Community Survey and integrates ABCD Study employment data.
Why Combine These Variables - Contextualizing household structure by linking relationship and employment status. - Improves statistical power by reducing missingness and integrating multiple household dimensions. - Provides predictive utility—e.g., improves model fit for longitudinal outcomes.
Construction Details
Source variables
ab_p_demo__prtnr_001
: Marital/partner statusab_p_demo__empl__prtnr_001
: Partner employment statusSkip pattern Only respondents who reported having a partner (≥40% involvement in the child’s activities) were asked about partner employment.
Resulting classifications
Partnered and in the labor force
Partnered and not in the labor force
Not partnered and in the labor force
Not partnered and not in the labor force
Use in Baseline Propensity Scores
A variant of this variable was used in baseline propensity score modeling to balance study groups and improve causal inference by accounting for household context.
Reference: Heeringa, Steven G., and Patricia A. Berglund. “A guide for population-based analysis of the Adolescent Brain Cognitive Development (ABCD) Study baseline data.” BioRxiv (2020): 2020-02.
Potential Uses
- As a confounder: Adjust for household structure and economic activity in outcome models.
- As a predictor: Examine how household dynamics influence longitudinal outcomes (e.g., child mental health).
ABCD Static variables
Measure description: The ABCD Static table includes static (stc
) variables that do not change over time (as such, this and other static tables do not contain a session_id
column which leads to their values being duplicated to every event when they are joined with “dynamic”/longitudinal tables). This table contains participant-level descriptors such as derived family IDs, birth event groupings, ethno-racial identity variables, participant sex, imputed propensity weights, and principal components of genetic ancestry.
Participants belonging to the same family share a family ID (ab_g_stc__design_id__fam
). The table additionally includes a birth ID (ab_g_stc__design_id__birth
), which provides a unique ID for individuals within the same family who share the same birthdate.
Participants who share a birthdate—or whose birthdates are on consecutive days—are assigned the same ab_g_stc__design_id__birth
. This variable can be used to differentiate between siblings and twins/triplets within the sample.
ab_g_stc__design_id__fam__gen
and ab_g_stc__design_id__birth__gen
indicate individuals who are in the same family (as indicated by genetics) and individuals who were born within three months of each other (derived based on genetic information), respectively.
These fields are cross listed from variables in the Genetics domain: - gn_y_renrel_id__fam
- gn_y_genrel_id__birth
For additional documentation, see: Genetics data documentation.
ab_g_stc__cohort_ethn
, ab_g_stc__cohort_ethnrace__leg
, ab_g_stc__cohort_ethnrace__mblack
, ab_g_stc__cohort_ethnrace__meim
, ab_g_stc__cohort_ethnrace__mhisp
, ab_g_stc__cohort_race__nih
A 5-level race/ethnicity variable (ab_g_stc__cohort_ethnrace__leg
) was constructed based on parent or caregiver report of the youth’s race and ethnicity at baseline. The algorithm prioritizes Hispanic ethnicity first and then categorizes based on the selected race(s). The categories are:
- Hispanic If Hispanic/Latino/Latina is selected
- White If White is the only race selected and Hispanic/Latino/Latina is not selected
- Black If Black is the only race selected and Hispanic/Latino/Latina is not selected
- Asian If Asian is the only race selected and Hispanic/Latino/Latina is not selected
- Other If Other race is selected and Hispanic/Latino/Latina is not selected OR If more than one race is selected and Hispanic/Latino/Latina is not selected
Using a large-population dataset like ABCD requires careful consideration of race and ethnicity variables. A 5-level race/ethnicity variable is included in the dataset, but usage may differ depending on study goals. For example:
- In one paper using ABCD data, race/ethnicity categories included White, Black, Hispanic, Asian, and Other, with racial subgroups identified within the “Other” category (Fadus et al. 2021).
- Another study used non-Hispanic White, non-Hispanic Black, Hispanic, Asian, Native American/Alaska Native, Multi-racial, and additional race groups (Barch et al. 2021).
For guidance on appropriate use of race/ethnicity data, see (Cardenas-Iniguez et al. 2024).
ab_g_stc__design_id__fam__gen
This variable indicates participants who were enrolled with the same guardian (i.e. siblings) at the baseline visit. It serves as a clustering variable to capture overlapping environmental and genetic influences, making these participants more similar to each other (on average) than to unrelated participants.
Recommended uses:
Account for nesting within families using random effects in mixed models.
Maintain independence across test and replication sets by assigning all members of a family to the same set.
Variables with stems that include: ab_g_stc__gen_pc
We encourage users to consult the NASEM report for guidance on selecting appropriate population descriptors in analyses. The report emphasizes that terms such as “race”—whether self-identified or otherwise—do not adequately represent the full continuum of human genetic diversity and should not be used as proxies for genetic variation.
The data release includes genetic principal components (PCs) of ancestry, which are unlabeled variables that capture genetic variation. It does not include genetic ancestry factors, which label variation in terms of continental groupings (e.g., European, African, Asian).
Bates, Chin, and Becker (2022)