R/Python Packages

The ABCD Data Analysis, Informatics, & Resource Center (DAIRC) develops and releases software that enhances the transparency and reproducibility of the ABCD data resource and gives users tools to streamline their analysis workflows. This page provides a short overview of the software packages that can be installed from the nbdc-datahub GitHub organization. We encourage interested users to explore the documentation websites of the respective packages for more details.

ABCDscores

Documentation website: https://software.nbdc-datahub.org/ABCDscores
GitHub repository: https://github.com/nbdc-datahub/ABCDscores

The ABCDscores R package provides functions to compute all non-proprietary summary scores included in the ABCD tabulated data resource, starting with the 6.0 data release. The package is accompanied by a documentation website that provides extensive details on how to use the package to compute any of the hundreds of summary scores across the different research domains.

One goal of the package is to support the transparency and reproducibility of ABCD release data by making available the exact algorithms and code used to compute the released summary scores, so that a given data release version can be tied to a specific version of the codebase. Versioning also decouples the codebase from the release timelines: updated versions of the package can be released at any time to fix errors and/or to compute new scores from the raw/item-level data.

Furthermore, while clearly specifying how the official ABCD summary scores were computed, all functions in the package allow users some level of flexibility to change the computation and retrieve alternative scores. For example, while most functions specify a certain level of missingness that is allowed to still compute a given summary score (typically >=80% completion is required), users can change a function parameter to apply a stricter or more lenient criterion. For some types of scores, the package even provides a set of basic functions that can be used flexibly to compute a variety of different scores (see, e.g., the functions to compute Timeline Followback summary scores).
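The missingness rule described above can be sketched as follows. This is a minimal illustration in Python, not the ABCDscores implementation (ABCDscores is an R package with its own function signatures). The function name `mean_score` and the parameter `max_na_frac` are hypothetical, and the default of 0.2 simply mirrors the typical requirement of at least 80% completed items.

```python
from typing import Optional, Sequence


def mean_score(
    items: Sequence[Optional[float]], max_na_frac: float = 0.2
) -> Optional[float]:
    """Mean of the non-missing items, or None if too many items are missing.

    Hypothetical helper for illustration only; not part of ABCDscores.
    """
    if not items:
        return None
    n_missing = sum(1 for value in items if value is None)
    # Refuse to compute the score when the share of missing items
    # exceeds the allowed threshold.
    if n_missing / len(items) > max_na_frac:
        return None
    observed = [value for value in items if value is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


responses = [1, 2, None, 3, 2]  # 1 of 5 items missing (20%)

# Default rule: up to 20% missing items are tolerated.
default_score = mean_score(responses)

# Stricter rule: no missing items allowed, so no score is returned.
strict_score = mean_score(responses, max_na_frac=0.0)
```

Exposing the threshold as a parameter is what lets the same function reproduce the official score under the default setting while still supporting stricter or more lenient alternatives.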

NBDCtools

R Package:
Documentation website: https://software.nbdc-datahub.org/NBDCtools
GitHub repository: https://github.com/nbdc-datahub/NBDCtools

Python Package:
Documentation website: https://software.nbdc-datahub.org/nbdctools-py
GitHub repository: https://github.com/nbdc-datahub/nbdctools-py

To accommodate dynamic creation of datasets from studies released through the NBDC Data Hub (currently the ABCD and HBCD¹ studies), the DAIRC team developed the NBDCtools R package and the nbdctools Python package. The packages leverage the regular structure of NBDC datasets, especially standardized metadata (data dictionary and levels table; see here) and the organization of tabulated data as one file per table in the BIDS phenotype/ directory (see here).

The packages assume that users have downloaded the complete tabulated dataset as file-based data and saved the files in a local directory. Using functions from the packages, users can then create custom datasets by specifying the study name and any set of variable names and/or table names defined in the data dictionary. By making use of the study’s metadata, the functions automatically retrieve the required columns from different files on disk and join them into a single dataset in memory (e.g., an R tibble or pandas/polars DataFrame). This provides a fast, storage- and memory-efficient, and highly reproducible workflow for working with NBDC data, offering an alternative to creating and downloading multiple datasets (and maintaining on-disk copies) through the DEAP or the NBDC Data Access Platform (see here).
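The general idea behind this workflow (look up which table holds each requested variable, read only the needed columns from each file, and join on the identifier columns) can be sketched as follows. This is not the API of NBDCtools or nbdctools. The helper `create_dataset`, the `<table>.tsv` file naming, the identifier columns `participant_id` and `session_id`, and the reduction of the data dictionary to a variable-to-table mapping are all assumptions made for illustration. The sketch uses only the Python standard library and returns a list of dictionaries rather than a DataFrame.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed identifier columns shared by every table (an assumption of this sketch).
ID_COLS = ("participant_id", "session_id")


def create_dataset(dir_data, dictionary, variables):
    """Join the requested variables from one-file-per-table data.

    Hypothetical helper: `dictionary` maps variable name -> table name, and
    each table is assumed to be stored as `<dir_data>/<table>.tsv`.
    """
    # Group the requested variables by the table that holds them.
    by_table = defaultdict(list)
    for var in variables:
        by_table[dictionary[var]].append(var)

    # Keep only the needed columns from each file and merge rows by identifier.
    joined = {}
    for table, cols in by_table.items():
        with open(Path(dir_data) / f"{table}.tsv", newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                key = tuple(row[c] for c in ID_COLS)
                record = joined.setdefault(key, dict(zip(ID_COLS, key)))
                record.update({c: row[c] for c in cols})

    # Full outer join: variables without data for a row are filled with None.
    columns = ID_COLS + tuple(variables)
    return [
        {**dict.fromkeys(columns), **record}
        for _, record in sorted(joined.items())
    ]


# Tiny demonstration with two made-up tables written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "tab_a.tsv").write_text(
        "participant_id\tsession_id\tage\tsite\n"
        "sub-01\tses-00\t10\tA\n"
        "sub-02\tses-00\t11\tB\n"
    )
    Path(tmp, "tab_b.tsv").write_text(
        "participant_id\tsession_id\tscore\n"
        "sub-01\tses-00\t5\n"
    )
    dictionary = {"age": "tab_a", "site": "tab_a", "score": "tab_b"}
    rows = create_dataset(tmp, dictionary, ["age", "score"])
```

Because only the requested columns are kept and nothing is written back to disk, the same call can be repeated with a different variable list instead of maintaining several exported copies of the data. Values are kept as strings here; type conversion is left out of the sketch.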

Furthermore, both packages provide several additional functions that assist users in creating analysis-ready datasets from NBDC studies. These include:

transformation functions, such as converting categorical columns to factors (R) or categorical dtypes (Python) based on the data dictionary and levels table, and assigning variable and value labels;
filtering and sub-setting functions, such as selecting specific participants or events, filtering ABCD events using shorthand conventions, or excluding rows or columns containing only missing data; and
a set of utility functions supporting a range of common workflows.
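The kinds of transformation and filtering steps listed above can be illustrated with a small sketch. The function names `label_values`, `filter_events`, and `drop_empty_rows`, the variable `handedness` with its codes, and the session labels are hypothetical and chosen only for this example. They are not taken from NBDCtools, nbdctools, or the ABCD data dictionary. The levels table is reduced to a mapping from variable name to a code-to-label dictionary.

```python
# Assumed identifier columns (an assumption of this sketch).
ID_COLS = ("participant_id", "session_id")


def label_values(rows, levels):
    """Replace coded values with labels; `levels` maps variable -> {code: label}."""
    return [
        {col: levels.get(col, {}).get(val, val) for col, val in row.items()}
        for row in rows
    ]


def filter_events(rows, sessions):
    """Keep only rows whose session_id is one of `sessions`."""
    keep = set(sessions)
    return [row for row in rows if row["session_id"] in keep]


def drop_empty_rows(rows):
    """Drop rows in which every non-identifier column is missing (None)."""
    return [
        row
        for row in rows
        if any(val is not None for col, val in row.items() if col not in ID_COLS)
    ]


rows = [
    {"participant_id": "sub-01", "session_id": "ses-00", "handedness": "1", "score": "5"},
    {"participant_id": "sub-01", "session_id": "ses-01", "handedness": "1", "score": None},
    {"participant_id": "sub-02", "session_id": "ses-00", "handedness": None, "score": None},
]
levels = {"handedness": {"1": "Right", "2": "Left"}}

labelled = label_values(rows, levels)           # codes -> labels
baseline = filter_events(labelled, ["ses-00"])  # keep one event only
analysis_ready = drop_empty_rows(baseline)      # remove all-missing rows
```

Keeping each step as a separate function means the steps can be chained in whatever order an analysis needs, with the metadata (here, `levels`) passed in explicitly rather than hard-coded.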

Lastly, both the R and Python packages provide functionality to retrieve, inspect, and utilize study metadata, enabling more transparent and reproducible data analysis workflows across programming environments.

Footnotes

¹ For more details about the data released by the HBCD study, see here.