File | Documentation |
---|---|
README (A2CPS file) | modality agnostic files |
participants.tsv (dictionary) | participants file |
sub-[record_id]_ses-[ses_id]_scans.tsv tables (dictionary) | scans file |
sub-[record_id]_sessions.tsv tables (dictionary) | sessions file |
sub-[record_id]_ses-[ses_id]_task-cuff_run-[run_id]_events.tsv tables (dictionary) | task events |
dataset_description.json (A2CPS file) | dataset descriptions |
4 Raw MRI Data Starter Kit
This page describes how to work with the raw MRI data from A2CPS. For background on collection and availability, see the “Imaging Data” section of a2cps.org. The data are provided in a format called the Brain Imaging Data Structure (BIDS)1. The raw data are most useful when you need to run custom preprocessing pipelines. If you do not need custom pipelines, please review the preface for a list of imaging products that have already been calculated.
1 For a generic introduction to BIDS, see: Unsure how to get started using BIDS with your current data? - The Brain Imaging Data Structure]
4.1 On Starting Project
Once downloaded, the data will be stored deep inside the image03 folder.
$ ls image03/corral-secure/projects/A2CPS/products/consortium-data/pre-surgery/mris/bids/ | head
dataset_description.json
participants.json
participants.tsv
README
scans.json
sessions.json
sub-10003
sub-10005
sub-10008
sub-10010
4.1.1 Extract Data
The raw MRI data are organized according to v1.9.0 of the BIDS standard.
In BIDS data for individual subjects are stored in folders named “sub-[record_id]”. For an example A2CPS session this format results in the following2:
2 The tree
command is just used here to display the folder structure, and is not required for you to have.
$ tree sub-10003
sub-10003
├── ses-V1
│ ├── anat
│ │ ├── sub-10003_ses-V1_T1w.json
│ │ └── sub-10003_ses-V1_T1w.nii.gz
│ ├── dwi
│ │ ├── sub-10003_ses-V1_dwi.bval
│ │ ├── sub-10003_ses-V1_dwi.bvec
│ │ ├── sub-10003_ses-V1_dwi.json
│ │ └── sub-10003_ses-V1_dwi.nii.gz
│ ├── fmap
│ │ ├── sub-10003_ses-V1_acq-dwib0_dir-AP_epi.json
│ │ ├── sub-10003_ses-V1_acq-dwib0_dir-AP_epi.nii.gz
│ │ ├── sub-10003_ses-V1_acq-dwib0_dir-PA_epi.json
│ │ ├── sub-10003_ses-V1_acq-dwib0_dir-PA_epi.nii.gz
│ │ ├── sub-10003_ses-V1_acq-dwib0_epi.bval
│ │ ├── sub-10003_ses-V1_acq-dwib0_epi.bvec
│ │ ├── sub-10003_ses-V1_acq-fmrib0_dir-AP_epi.json
│ │ ├── sub-10003_ses-V1_acq-fmrib0_dir-AP_epi.nii.gz
│ │ ├── sub-10003_ses-V1_acq-fmrib0_dir-PA_epi.json
│ │ └── sub-10003_ses-V1_acq-fmrib0_dir-PA_epi.nii.gz
│ ├── func
│ │ ├── sub-10003_ses-V1_task-cuff_run-01_bold.json
│ │ ├── sub-10003_ses-V1_task-cuff_run-01_bold.nii.gz
│ │ ├── sub-10003_ses-V1_task-cuff_run-01_events.tsv
│ │ ├── sub-10003_ses-V1_task-rest_run-01_bold.json
│ │ ├── sub-10003_ses-V1_task-rest_run-01_bold.nii.gz
│ │ ├── sub-10003_ses-V1_task-rest_run-02_bold.json
│ │ └── sub-10003_ses-V1_task-rest_run-02_bold.nii.gz
│ └── sub-10003_ses-V1_scans.tsv
└── sub-10003_sessions.tsv
5 directories, 25 files
The raw imaging data are in the (gzip
compressed) NIfTI files, and each image is associated with a .json
sidecar containing metadata about the scan and acquisition parameters (metadata that subsumes the NIfTI header).
The “session” refers to the visit. Currently only the baseline (pre-surgery) data is included, so all sessions have the label “V1”. Information about each participant’s visit is stored in the sub-[record_id]/sub-[record_id]_sessions.tsv
file.
The pressures that were applied during the CUFF scans are recorded in the sub-[record_id]_ses-[ses_id]_task-cuff_run-[run_id]_events.tsv
tables.
BIDS requires that the diffusion gradient information is stored according to the FSL format, in *bval
and *bvec
files in the /dwi
directory. For details see: Magnetic Resonance Imaging - Brain Imaging Data Structure.
Files at the top level of the bids folder contain information that applies to multiple participants. This includes
4.1.2 Data Quality
All raw data have undergone quality review. For details on the review process see A2CPS Imaging Quality Assurance.
The resulting overall quality ratings (red/yellow/green) are included in the *scans.tsv
files. For example
$ cat sub-10003/ses-V1/sub-10003_ses-V1_scans.tsv
filename rating
func/sub-10003_ses-V1_task-cuff_run-01_bold.nii.gz green
dwi/sub-10003_ses-V1_dwi.nii.gz green
func/sub-10003_ses-V1_task-rest_run-01_bold.nii.gz green
func/sub-10003_ses-V1_task-rest_run-02_bold.nii.gz green
anat/sub-10003_ses-V1_T1w.nii.gz green
fmap/sub-10003_ses-V1_acq-fmrib0_dir-AP_epi.nii.gz n/a
fmap/sub-10003_ses-V1_acq-fmrib0_dir-PA_epi.nii.gz n/a
fmap/sub-10003_ses-V1_acq-dwib0_dir-AP_epi.nii.gz n/a
fmap/sub-10003_ses-V1_acq-dwib0_dir-PA_epi.nii.gz n/a
For a description of the reviews, see the scans.json data dictionary. Re-evaluation of quality and inclusion/exclusion criteria in the context of a specific analysis is always reasonable, but we would advise against including scans labeled “red” in most analyses.
Note that some “cuff” scans were collected without cuff inflation. These scans can be identified by the “applied_pressure” column of the *events.tsv
file. For example
$ grep -b1 -HP "0\t450\t0" sub-*/ses-V1/func/*events.tsv | head -n 9
sub-10077/ses-V1/func/sub-10077_ses-V1_task-cuff_run-01_events.tsv-0-onset duration applied_pressure
sub-10077/ses-V1/func/sub-10077_ses-V1_task-cuff_run-01_events.tsv:32:0 450 0
--
sub-10077/ses-V1/func/sub-10077_ses-V1_task-cuff_run-02_events.tsv-0-onset duration applied_pressure
sub-10077/ses-V1/func/sub-10077_ses-V1_task-cuff_run-02_events.tsv:32:0 450 0
--
sub-10103/ses-V1/func/sub-10103_ses-V1_task-cuff_run-01_events.tsv-0-onset duration applied_pressure
sub-10103/ses-V1/func/sub-10103_ses-V1_task-cuff_run-01_events.tsv:32:0 450 0
4.2 Considerations While Working on Project
4.2.1 Data Generation
Each scanner’s manufacturer-specific algorithms and software are used for the reconstruction of image data from k-space (k-space data were not saved). The reconstructed images are exported in the Digital Imaging and Communications in Medicine (DICOM) format and sent electronically from participating sites to the Texas Advanced Computing Center (TACC). Following the Organization for Human Brain Mapping Committee on Best Practices in Data Analysis and Sharing report (Nichols et al., 2017), the DICOM files are converted into the Neuroimaging Informatics Technology Initiative (NIfTI) 1 format (Cox et al., 2004) and organized according to the BIDS specification (Gorgolewski et al., 2016). For this conversion, the pipeline uses HeuDiConv (Halchenko et al., 2024), which in turn uses dcm2niix
(Li et al., 2016). Conversion relies on the ReproIn heuristic file (Visconti di Oleggio Castello et al., 2023), modified to encode the sequence and file naming conventions developed for the A2CPS project. The original DICOMs are stored for posterity but are not included in data releases. For additional details, see Sadil et al. (2024).
All MRI pipeline code is available through github: https://github.com/a2cps/mri_imaging_pipeline. Note that the code used to generate the BIDS derivatives has been updated several times (e.g., the same version of dcm2niix
was not used to convert all participants). The specific version of the conversion code is stored in the GeneratedBy
field of the dataset_description.json
, with entries like
{
"CodeUrl": "https://github.com/a2cps/mri_imaging_pipeline",
"Container": {
"Tag": "221121",
"Type": "Docker",
"URI": "docker://psadil/heudiconv:221121"
},
"Description": "WS20207V1",
"Name": "heudiconv_app",
"Version": "urrutia-heudiconv-0.11.6"
}
This indicates that the presurgery scan session for participant 20207 (who happens to have been collected at Wayne State) was converted with the heudiconv_app
version 0.11.6. That version is recorded here. This version of the app used the docker image psadil/heudiconv:221121.
4.2.2 Skull-Stripping for Privacy
Prior to release, all T1w images are skull-stripped using brain masks from the fMRIPrep pipeline. All DIRC processing is carried out using unmodified raw images, which is recommended for many measures like cortical thickness.
4.2.3 Acquisition Protocol Variation
Over the course of the project, individual scanners have occasionally needed upgrading. For a timeline of acquisition changes, see: Timeline of Imaging Acquisition Changes.
Occasionally, scans were collected with acquisition parameters that differed from the standards listed in the A2CPS Tech Manual. For a review of the kinds of deviations that have been observed, see Imaging Log - Problem Cases. In general, preprocessing pipelines should not assume that there is consistency in the acquisition parameters across participants (even in participants from the same site), and in all cases, the metadata within the json sidecars should be referenced as the best source of information on these parameters.
4.2.4 Software Packages for Working with BIDS
Several packages in multiple languages have been developed to simplify interacting with BIDS. For a complete list, see: Benefits, a subset of which is copied below
- bids-matlab: MATLAB/Octave tools to interact with datasets conforming to the BIDS format
- bidser: Working with Brain Imaging Data Structure in R
- PyBIDS: Python package to quickly parse / search the components of a BIDS dataset. It also contains functionality for running analyses on your data
4.2.5 Joining with Other Modalities
A2CPS relies on several different schemes for labeling and describing metadata entities. For example, across modalities, participants are identified by Record ID, Subject GUID, Individual ID, and NDA GUID. For more details on how these identifiers relate, see Appendix B.
4.2.6 Citations
If you use the imaging data in your research, please cite the imaging pipeline preprint: Sadil et al. (2024).
In publications or presentations including data from A2CPS, please include the following statement as attribution:
Data were provided [in part] by the A2CPS Consortium funded by the National Institutes of Health (NIH) Common Fund, which is managed by the Office of the Director (OD)/ Office of Strategic Coordination (OSC). Consortium components and their associated funding sources include Clinical Coordinating Center (U24NS112873), Data Integration and Resource Center (U54DA049110), Omics Data Generation Centers (U54DA049116, U54DA049115, U54DA049113), Multi-site Clinical Center 1 (MCC1) (UM1NS112874), and Multi-site Clinical Center 2 (MCC2) (UM1NS118922).