[[18]{.chapter-number}  [Manual Derivative QC]{.chapter-title}]{#mri-deriv-qc .quarto-section-identifier}

18 Manual Derivative QC

library(dplyr)
library(duckplyr)
library(purrr)
library(ggplot2)
library(png)
library(terrainr)

Most preprocessing pipelines fail on a subset of scans. For large-scale studies, failure rates of even a few percentage points can amount to dozens of scans. Although there are automated ways of detecting some pipeline failures, the data used to train failure detection algorithms are limited, so manual review remains essential. As of Release 3.0, a subset of the neuroimaging derivatives have undergone manual QC. This kit shows how the ratings are packaged.

18.1 Starting Project

On TACC, the neuroimaging data are stored underneath the releases. For example, data release v2.#.# is underneath

pre-surgery/mris

The Derivative QC metrics are underneath derivatives/qc. They consist of four tables.

$ ls mris/derivatives/qc/
coordinate.parquet  image.parquet  rating.parquet  session.parquet

18.1.1 Extract Data

Two of these tables contain quality ratings (rating.parquet and coordinate.parquet), and the others contain metadata about the ratings (e.g., identity of the rater). Of the tables with quality information rating.parquet stores “overall” rating information. These are for ratings that reviewers can provide a categorical and judgement, identifying whether a target was created “successfully”, whether target creation “failed”, or whether the reviewer is “unsure”. These kinds of ratings are used to assess coregistration and fractional anisotropy maps. Another kind of rating is stored in coordinate.parquet. These ratings refer not to the overall image but rather to specific points within the image that indicate a failure. Reviewers “click” on parts of the target that seem off (e.g., a misplaced tissue boundary), and the coordinates of those clicks are recorded.

Let’s look through the schema of each table.

18.1.1.1 session

This table contains information about the rating session (e.g., helpful for determining rater consistency).

read_parquet_duckdb("data/qc/session.parquet") |>
  head()

id	step	user	created
1	0	NA	2025-07-28 13:25:14
2	3	NA	2025-07-28 13:25:37
3	2	NA	2025-07-28 18:47:04
4	1	NA	2025-07-29 01:09:28
5	2	NA	2025-07-29 01:09:40
6	2	NA	2025-07-29 12:37:20

id: ID of session
step: ID indicating the kind of target being rated (https://github.com/psadil/django-qcapp-ratings/blob/1080ae40304c067fec283fc04bd956834c61d99c/src/django_qcapp_ratings/models.py#L9-L14)
- MASK = 0
- SPATIAL_NORMALIZATION = 1
- SURFACE_LOCALIZATION = 2
- FMAP_COREGISTRATION = 3
- DTIFIT = 4
user: ID user who was doing to rating (note: user id capturing currently exhibits some errors)
created: datetime stamp of when the session started

18.1.1.2 image

This table contains information about the to-be rated targets.

read_parquet_duckdb("data/qc/image.parquet") |>
  select(-img) |> # just for display purposes, so this webpage is not too big
  head()

id	slice	file1	file2	created
1	0	sub-10003_ses-V1_desc-brain_mask.nii.gz	sub-10003_ses-V1_T1w.nii.gz	2025-07-26 02:38:11
2	1	sub-10003_ses-V1_desc-brain_mask.nii.gz	sub-10003_ses-V1_T1w.nii.gz	2025-07-26 02:38:15
3	2	sub-10003_ses-V1_desc-brain_mask.nii.gz	sub-10003_ses-V1_T1w.nii.gz	2025-07-26 02:38:20
4	3	sub-10003_ses-V1_desc-brain_mask.nii.gz	sub-10003_ses-V1_T1w.nii.gz	2025-07-26 02:38:25
5	4	sub-10003_ses-V1_desc-brain_mask.nii.gz	sub-10003_ses-V1_T1w.nii.gz	2025-07-26 02:38:29
6	5	sub-10003_ses-V1_desc-brain_mask.nii.gz	sub-10003_ses-V1_T1w.nii.gz	2025-07-26 02:38:34

id: ID of the to-be-rated image
img: binary large object (BLOB) containing the image
slice: indicator for which part of the image was displayed
file1: the main source file from which the image was derived (e.g., the name of the mask)
file2: supplementary file used to make the to-be-rated image (e.g., anatomical underlay)
display: ID indicating the orientation of the image (https://github.com/psadil/django-qcapp-ratings/blob/1080ae40304c067fec283fc04bd956834c61d99c/src/django_qcapp_ratings/models.py#L23-L26)
- X = 0
- Y = 1
- Z = 2
step: ID indicating the kind of target (https://github.com/psadil/django-qcapp-ratings/blob/1080ae40304c067fec283fc04bd956834c61d99c/src/django_qcapp_ratings/models.py#L9-L14)
- MASK = 0
- SPATIAL_NORMALIZATION = 1
- SURFACE_LOCALIZATION = 2
- FMAP_COREGISTRATION = 3
- DTIFIT = 4
created: when the image was created (helpful for determining version of the release)

18.1.1.3 rating

This table contains the ratings given as “overal” scores. They apply to ratings for targets fmap coregistration and dtifit.

read_parquet_duckdb("data/qc/rating.parquet") |>
  head()

id	source_data_issue	img_id	session	notes	created
387	FALSE	90142	54	NA	2025-08-14 15:23:35
391	FALSE	106095	55	NA	2025-08-15 15:34:33
395	FALSE	63618	57	NA	2025-08-15 17:56:53
399	FALSE	79030	64	NA	2025-10-31 15:48:42
403	FALSE	86580	64	NA	2025-10-31 15:48:53
407	FALSE	103443	64	NA	2025-10-31 15:49:04

id: ID of each rating
source_data_issue: boolean indicating whether the reviewer noticed an issue with the raw data quality
rating: the rating provided (https://github.com/psadil/django-qcapp-ratings/blob/1080ae40304c067fec283fc04bd956834c61d99c/src/django_qcapp_ratings/models.py#L17-L20)
- PASS = 0
- UNSURE = 1
- FAIL = 2
img_id: ID of the image that was rated (see image.parquet)
session: ID of the rating session (see session.parquet)
notes: freeform text used to provide details about the rating
created: datetime stamp of when the rating was produced

18.1.1.4 coordinate

This table contains the coordinates that were clicked during reviews, locations indicating some source data issue. These were used to rate the mask, spatial normalization, and surface localization.

read_parquet_duckdb("data/qc/coordinate.parquet") |>
  head()

id	source_data_issue	x	y	img_id	session	notes	created
1	FALSE	NA	NA	8561	1	NA	2025-07-28 13:25:22
2	FALSE	NA	NA	5856	1	NA	2025-07-28 13:25:26
3	FALSE	NA	NA	11379	1	NA	2025-07-28 13:25:31
4	FALSE	NA	NA	4790	3	NA	2025-07-28 18:47:11
5	FALSE	NA	NA	11719	3	NA	2025-07-28 18:47:21
6	FALSE	NA	NA	23626	3	NA	2025-07-28 18:47:25

id: ID of each clicked coordinate
source_data_issue: boolean indicating whether the reviewer noticed an issue with the raw data quality
x: horizontal coordinate of the target issue
y: vertical coordinate of the target issue
img_id: ID of the image that was rated (see image.parquet)
session: ID of the rating session (see session.parquet)
notes: freeform text used to provide details about the rating
created: datetime stamp of when the rating was produced

Note: rows where the x and y coordinates are NA indicate that the reviewer did not observe any issues with the target (that is, these images were reviewed, but no problems were found).

18.1.2 Example Image Plotting

Let’s see how we can display the rated images. As an example, we’ll look one of the spatial normalization images.

# grab images associated with spatial normalization
spatial_norm_images <- read_parquet_duckdb("data/qc/image.parquet") |>
  filter(step == 1, display == 0) |>
  select(id, img)

# grab image with lots of clicks
img_with_many_clicks <- read_parquet_duckdb("data/qc/coordinate.parquet") |>
  filter(!is.na(x)) |>
  semi_join(spatial_norm_images, by = join_by(img_id == id)) |>
  summarize(n_clicks = n(), .by = c(img_id)) |>
  slice_head(n = 1, by = n_clicks)
img_with_many_clicks

img_id	n_clicks
15820	8

We’re going to need a helper function to convert the img BLOBs into a tibble for display with ggplot.

to_tbl <- function(.i) {
  apply(.i, 2, rgb) |>
    as_tibble() |>
    mutate(y = row_number()) |>
    tidyr::pivot_longer(-y, names_to = "x") |>
    mutate(
      x = stringr::str_extract(x, "[[:digit:]]+") |> as.integer(),
      y = max(y) - y + 1,
      rgb = map(value, col2rgb),
      r = map_dbl(rgb, pluck, 1),
      g = map_dbl(rgb, pluck, 2),
      b = map_dbl(rgb, pluck, 3)
    )
}

Now, return to the image.parquet file, do a filter join to get just a single row, select it, apply the helper, and plot.

# now show the associated blob
read_parquet_duckdb("data/qc/image.parquet") |>
  semi_join(img_with_many_clicks, by = join_by(id == img_id)) |>
  pluck("img", 1) |>
  readPNG() |>
  to_tbl() |>
  ggplot(aes(x = x, y = y)) +
  geom_spatial_rgb(aes(r = r, g = g, b = b)) +
  coord_fixed() +
  theme_void()

By referring back to img.parquet file, we can see metadata about the source file.

read_parquet_duckdb("data/qc/image.parquet") |>
  semi_join(img_with_many_clicks, by = join_by(id == img_id)) |>
  select(-img)

id	slice	file1	file2	display	step	created
15820	1	sub-20045_ses-V1_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz	NA	0	1	2025-07-28 13:44:04

18.1.3 Use of Ratings to Exclude Images

These ratings can be used to exclude images from analyses. Here, we show how to exclude all functional images that were generated from fieldmaps that were out of register.

# grab all fieldmap images
fmap_coregistration_images <- read_parquet_duckdb("data/qc/image.parquet") |>
  filter(step == 3) |>
  select(id, file1, file2, display)

# search for all coregistrations that were marked as having failed
definite_coregistration_failures <- read_parquet_duckdb(
  "data/qc/rating.parquet"
) |>
  inner_join(fmap_coregistration_images, by = join_by(img_id == id)) |>
  filter(rating == 2) |>
  select(file1, file2, display) |>
  # pluck out useful metadata from the image
  mutate(
    sub = stringr::str_extract(file1, "(?<=sub-)[[:digit:]]{5}") |>
      as.integer(),
    task = stringr::str_extract(file1, "cuff|rest"),
    run = stringr::str_extract(file1, "(?<=run-0)[1|2]")
  ) |>
  distinct(sub, task, run) |>
  collect()

How many failed coregistrations were there?

nrow(definite_coregistration_failures)

[1] 126

This is the table that you’d use to exclude files from analyses.

definite_coregistration_failures |>
  head()

sub	task	run
10165	rest	2
10222	rest	2
10516	cuff	2
20045	rest	2
20078	rest	2
10609	rest	1

18.2 Considerations While Working on the Project

18.2.1 Data Generation

Derivative QC is a work in progress, and so we expect substantial improvements in subsequent releases. As of this release, only fieldmaps coregistrations have been exhaustively checked. In future releases, we expect more targets to be reviewed.

Quality control was done with the A2CPS Derivative Image Review Tool (DIRT), which is under active development here: https://github.com/psadil/django-qcapp-ratings/tree/main/src/django_qcapp_ratings.

18.2.2 Citations

In publications or presentations including data from A2CPS, please include the following statement as attribution:

Data were provided [in part] by the A2CPS Consortium funded by the National Institutes of Health (NIH) Common Fund, which is managed by the Office of the Director (OD)/ Office of Strategic Coordination (OSC). Consortium components and their associated funding sources include Clinical Coordinating Center (U24NS112873), Data Integration and Resource Center (U54DA049110), Omics Data Generation Centers (U54DA049116, U54DA049115, U54DA049113), Multi-site Clinical Center 1 (MCC1) (UM1NS112874), and Multi-site Clinical Center 2 (MCC2) (UM1NS118922).

Note

The following published papers should be cited when referring to A2CPS Protocol and Biomarkers: Sluka et al. (2023) Berardi et al. (2022)

Berardi, G., Frey-Law, L., Sluka, K. A., Bayman, E. O., Coffey, C. S., Ecklund, D., Vance, C. G. T., Dailey, D. L., Burns, J., Buvanendran, A., McCarthy, R. J., Jacobs, J., Zhou, X. J., Wixson, R., Balach, T., Brummett, C. M., Clauw, D., Colquhoun, D., Harte, S. E., … Wandner, L. D. (2022). Multi-site observational study to assess biomarkers for susceptibility or resilience to chronic pain: The acute to chronic pain signatures (A2CPS) study protocol. Frontiers in Medicine, 9. https://doi.org/10.3389/fmed.2022.849214

Sluka, K. A., Wager, T. D., Sutherland, S. P., Labosky, P. A., Balach, T., Bayman, E. O., Berardi, G., Brummett, C. M., Burns, J., Buvanendran, A., et al. (2023). Predicting chronic postsurgical pain: Current evidence and a novel program to develop predictive biomarker signatures. Pain, 164(9), 1912–1926. https://doi.org/10.1097/j.pain.0000000000002938

Sadil, P., Arfanakis, K., Bhuiyan, E. H., Caffo, B., Calhoun, V. D., Clauw, D. J., DeLano, M. C., Ford, J. C., Gattu, R., Guo, X., Harris, R. E., Ichesco, E., Johnson, M. A., Jung, H., Kahn, A. B., Kaplan, C. M., Leloudas, N., Lindquist, M. A., Luo, Q., … Chronic Pain Signatures Consortium, T. A. to. (2024). Image processing in the acute to chronic pain signatures (A2CPS) project. bioRxiv. https://doi.org/10.1101/2024.12.19.627509

When using neuroimaging derivatives, please also cite Sadil et al. (2024).