$ /corral-secure/projects/A2CPS/products/consortium-data/pre-surgery-release-2-0-0/psychosocial/reformatted
Psychosocial data help researchers understand the complex relationship between psychological, emotional, behavioral, and social aspects of life and their influence on health outcomes. The Acute to Chronic Pain Signatures Study (A2CPS) includes several psychosocial assessments and associated Helping End Addiction Long-term (HEAL) Common Data Elements (Wandner et al., 2022) to gain a comprehensive understanding of chronic pain. These assessments include:
- Pain related psychosocial constructs
- These include pain resilience, catastrophizing, kinesiophobia, and multisensory sensitivity. Pain resilience is measured using the Pain Resilience Scale (PRS) (Slepian et al., 2016). Catastrophizing (negative appraisal of pain) is measured using the Pain Catastrophizing Scale-6 (PCS-6) (McWilliams et al., 2015). Kinesiophobia (fear of movement) is measured using the Fear Avoidance Beliefs Questionnaire Physical Activity subscale (FABQ-PA) (Waddell et al., 1993). Sensitivity to sensory stimulation (light, sound, odor, flavor, touch) and interoception (balance, nausea, heart rate) are measured using the 8-item General Sensory Sensitivity (GSS) scale (Schrepf et al., 2018).
- Mood disorders (Depression and Anxiety)
- Depression and anxiety are measured using the 8-item Patient Health Questionnaire-8 (PHQ-8) (Kroenke et al., 2009) and the 7-item Generalized Anxiety Disorder (GAD-7) instrument (Spitzer et al., 2006), respectively.
- Social Support
- PROMIS SFv2.0 Instrumental Support 6a questionnaire is used to measure the quality of social support received by the subjects, and PROMIS SFv2.0 Emotional Support 6a questionnaire measures subjects’ perception about being valued and cared for (Hahn et al., 2010).
- Cognitive Function and Sleep
- Multidimensional Inventory of Subjective Cognitive Impairment (MISCI) (Kratz et al., 2016) measures perceived cognitive dysfunction in patients with fibromyalgia. Sleep is assessed using the PROMIS Short Form v1.0 Sleep Disturbance 6a measure (Yu et al., 2012) and Pain-Sleep Duration (Sleep II) form (Buysse et al., 1989).
- Other Assessments (Demographics, Self-Administered Comorbidity Questionnaire (SCQ), Adverse Childhood Experience questionnaire (ACE), Big Five Inventory-2 short form (BFI-2-S))
- The demographic information form captures age, sex, gender, ethnicity, race, level of education, employment status, relationship status, household income, and disability status. Chronic health conditions are captured using the Self-Administered Comorbidity Questionnaire (SCQ) (Sangha et al., 2003). The Adverse Childhood Experience questionnaire (ACE) (Felitti et al., 1998) is used to retrospectively assess the history of childhood abuse and dysfunctional family dynamics. Also at baseline, five personality domains (extraversion, agreeableness, conscientiousness, negative emotionality, and open-mindedness) are measured using the Big Five Inventory-2 short form (BFI-2-S) (Soto & John, 2017).
1.1 Starting Project
1.1.1 Locate Data
Where are the relevant files?
1.1.2 Example workflow with GAD-7 (Anxiety) and PHQ-8 (Depression)
Load libraries (Uncomment to install libraries if not already installed)
#install.packages(c("tidyverse", "ggplot2"))
#load libraries
library(tidyverse)
library(ggplot2)
Load Data: GAD-7 (Anxiety) and PHQ-8 (Depression)
gad <- read_csv("data/reformatted_gad.csv")
phq <- read_csv("data/reformatted_phq.csv")
Join the GAD-7 (Anxiety) and PHQ-8 (Depression) datasets on the unique identifiers record_id and guid, plus the other common data elements. We will call the combined dataset gad_phq. A left join keeps all rows from the left table and matches data from the right. A right join keeps all rows from the right table and matches data from the left. An inner join keeps only rows that exist in both tables. A full join keeps all rows from both tables, leaving missing values where there is no match. The type of join depends on the researchers' goals. Here we use an inner join for exploratory data analysis. When the datasets share several columns, the following code is convenient because it joins on every column name the two datasets have in common rather than listing each key by hand.
gad_phq <- inner_join(gad, phq, by = intersect(names(gad), names(phq)))
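To make the difference between the four join types concrete, here is a minimal sketch on two made-up tables. The names gad_toy and phq_toy and all values are hypothetical and only stand in for the real files; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-ins for the GAD and PHQ tables (values are made up)
gad_toy <- data.frame(
  record_id = c(1, 2, 3),
  guid = c("A", "B", "C"),
  imputed_gad_score = c(8, 6, 0)
)
phq_toy <- data.frame(
  record_id = c(2, 3, 4),
  guid = c("B", "C", "D"),
  imputed_phq_score = c(6, 0, 2)
)

# Columns present in both tables; these become the join keys
shared_cols <- intersect(names(gad_toy), names(phq_toy))

# Same keys, four join types: only the number of retained rows differs
n_inner <- nrow(inner_join(gad_toy, phq_toy, by = shared_cols))
n_left  <- nrow(left_join(gad_toy, phq_toy, by = shared_cols))
n_right <- nrow(right_join(gad_toy, phq_toy, by = shared_cols))
n_full  <- nrow(full_join(gad_toy, phq_toy, by = shared_cols))

c(inner = n_inner, left = n_left, right = n_right, full = n_full)
```

Only records 2 and 3 appear in both toy tables, so the inner join is the smallest result and the full join the largest.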
View columns available after merging the two datasets
head(gad_phq)
record_id | guid | redcap_event_name | redcap_data_access_group | cohort | gad2feelnervscl | gad2notstopwryscl | gad7wrytoomchscl | gad7troubrelxscl | gad7rstlessscl | gad7easyannoyedscl | gad7feelafrdscl | gad7totscore | gad7difficulttowork | generalized_anxiety_disorder_7_item_gad7_scale_sco_complete | gad_diff | imputed_gad_score | phqlitintrstscore | phqdeprssnscore | phqsleepimpairscore | phqtirdlittleenrgyscore | phqabnrmldietscore | phqflngfailrscore | phqconcntrtnimprmntscore | phqmovmntspchimprmntscore | phqtotalscore | patient_health_questionnaire_depression_scale_phq_complete | phq_diff | imputed_phq_score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10001 | 90F8FC45-5D53-0DE4-6853-284607A8C4E6 | baseline_visit_arm_1 | rush_university_me | TKA | 1 | 1 | 1 | 2 | 1 | 2 | 0 | 8 | 1 | 2 | 1 | 8 | 1 | 1 | 2 | 2 | 0 | 2 | 2 | 0 | 10 | 2 | 1 | 10 |
10003 | 3D64236C-04A5-6B6B-D67B-B8DCE42DCE24 | baseline_visit_arm_1 | rush_university_me | TKA | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 6 | 1 | 2 | 1 | 6 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 6 | 2 | 1 | 6 |
10004 | 3D18263F-58E6-0421-355F-0B40D4B49772 | baseline_visit_arm_1 | rush_university_me | TKA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 |
10005 | FB37DE84-DCC7-5F80-0044-EFA96200E30D | baseline_visit_arm_1 | rush_university_me | TKA | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 1 | 2 | 1 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 1 | 2 |
10006 | BE03636F-BBB4-424E-875D-1D230C76EC63 | baseline_visit_arm_1 | rush_university_me | TKA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 2 | 2 | 1 | 2 |
10007 | DBFF8148-0AA2-EBC1-26E3-4F3E2B0E59C9 | baseline_visit_arm_1 | rush_university_me | TKA | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 5 | 0 | 2 | 1 | 5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 2 | 1 | 2 |
Check data types and ensure that character variables are not stored as numeric, and that numeric variables are stored with the correct numeric type—not as character or factor.
glimpse(gad_phq)
Rows: 1,362
Columns: 29
$ record_id <dbl> 10001, 100…
$ guid <chr> "90F8FC45-…
$ redcap_event_name <chr> "baseline_…
$ redcap_data_access_group <chr> "rush_univ…
$ cohort <chr> "TKA", "TK…
$ gad2feelnervscl <dbl> 1, 2, 0, 1…
$ gad2notstopwryscl <dbl> 1, 1, 0, 0…
$ gad7wrytoomchscl <dbl> 1, 1, 0, 0…
$ gad7troubrelxscl <dbl> 2, 1, 0, 1…
$ gad7rstlessscl <dbl> 1, 0, 0, 0…
$ gad7easyannoyedscl <dbl> 2, 0, 0, 0…
$ gad7feelafrdscl <dbl> 0, 1, 0, 0…
$ gad7totscore <dbl> 8, 6, 0, 2…
$ gad7difficulttowork <dbl> 1, 1, 0, 1…
$ generalized_anxiety_disorder_7_item_gad7_scale_sco_complete <dbl> 2, 2, 2, 2…
$ gad_diff <dbl> 1, 1, 1, 1…
$ imputed_gad_score <dbl> 8, 6, 0, 2…
$ phqlitintrstscore <dbl> 1, 1, 0, 0…
$ phqdeprssnscore <dbl> 1, 1, 0, 1…
$ phqsleepimpairscore <dbl> 2, 1, 0, 1…
$ phqtirdlittleenrgyscore <dbl> 2, 1, 0, 0…
$ phqabnrmldietscore <dbl> 0, 0, 0, 0…
$ phqflngfailrscore <dbl> 2, 1, 0, 0…
$ phqconcntrtnimprmntscore <dbl> 2, 1, 0, 0…
$ phqmovmntspchimprmntscore <dbl> 0, 0, 0, 0…
$ phqtotalscore <dbl> 10, 6, 0, …
$ patient_health_questionnaire_depression_scale_phq_complete <dbl> 2, 2, 2, 2…
$ phq_diff <dbl> 1, 1, 1, 1…
$ imputed_phq_score <dbl> 10, 6, 0, …
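If a score column does turn out to be stored as text or as a factor, it can be converted before analysis. The following is a minimal sketch on a made-up data frame; the column names score_as_text and score_as_factor are hypothetical and do not exist in the released files.

```r
# Hypothetical data frame with scores stored in the wrong type
toy <- data.frame(
  cohort = c("TKA", "TKA", "TKA"),
  score_as_text = c("8", "6", "0"),
  score_as_factor = factor(c("8", "6", "0")),
  stringsAsFactors = FALSE
)

# Inspect the storage class of every column
col_classes <- sapply(toy, class)
col_classes

# Text that holds numbers converts directly
toy$score_from_text <- as.numeric(toy$score_as_text)

# as.numeric() on a factor returns the internal level codes,
# so convert to character first to recover the printed values
toy$score_from_factor <- as.numeric(as.character(toy$score_as_factor))
```

Going through as.character() matters for factors: calling as.numeric() on the factor directly would return the level codes rather than the scores.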
1.2 Exploratory data analysis
Anxiety and depression frequently coexist due to shared biological mechanisms and overlapping symptoms. Approximately 50% of patients with depression also meet the criteria for an anxiety disorder. The co-occurrence of anxiety and depression in an individual increases the risk of suicide, affects social interactions, and results in more hospitalizations (Kircanski et al., 2017). We are interested in the association between anxiety scores and depression scores. The glimpse() output confirms that both score variables are stored as numeric, and because the scores are ordinal, Spearman correlation is the measure of choice.
Spearman Correlation:
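Since GAD and PHQ scores are ordered but not continuous, Spearman correlation is a better choice than Pearson here: it measures the monotonic association between two variables by comparing their ranks. Note that cor() computes the Pearson coefficient unless method = "spearman" is supplied; the call below omits that argument, so the value it prints is the default (Pearson) coefficient. The argument use = "complete.obs" drops rows with missing values.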
cor(gad_phq$imputed_gad_score, gad_phq$imputed_phq_score, use = "complete.obs")
[1] 0.7626977
As expected, we see a strong correlation (0.76) between anxiety and depression within the A2CPS cohort. Let's visualize this relationship with a simple boxplot.
# Remove rows with NA in either column
gad_phq_clean <- gad_phq %>%
  filter(!is.na(imputed_gad_score), !is.na(imputed_phq_score))
# Create the plot
ggplot(
  gad_phq_clean,
  aes(
    x = factor(imputed_gad_score),
    y = imputed_phq_score,
    fill = factor(imputed_gad_score)
  )
) +
  geom_boxplot() +
  labs(
    title = "PHQ Score Distribution by GAD Score",
    x = "GAD Score (Imputed)",
    y = "PHQ Score (Imputed)",
    fill = "GAD Score"
  ) +
  theme_minimal()
We see that there is a positive association between Anxiety and Depression scores.
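The rank-based correlation discussed above can be requested explicitly. In base R, cor() takes a method argument whose default is "pearson", so Spearman has to be named. The sketch below uses made-up vectors x and y (not A2CPS data) in which y increases with x but not linearly, to show how the two coefficients can differ.

```r
# Made-up scores: y rises with x, but not along a straight line
x <- c(1, 2, 3, 4, 5, NA)
y <- c(1, 4, 9, 16, 100, 7)

# Default method is Pearson, which measures linear association
r_pearson <- cor(x, y, use = "complete.obs")

# Spearman compares ranks, so a strictly increasing relationship gives 1
r_spearman <- cor(x, y, method = "spearman", use = "complete.obs")

c(pearson = r_pearson, spearman = r_spearman)
```

On the real data the same idea would apply by adding method = "spearman" to a call on imputed_gad_score and imputed_phq_score; the resulting value is not shown here because it has not been computed for this document.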
1.3 Considerations While Working on the Project
1.3.1 Methods
Self-reported assessments are collected electronically through a variety of surveys using Research Electronic Data Capture (REDCap™) and/or MyDataHelps™ (RKStudio™, CareEvolution, LLC, Ann Arbor, MI) (see below for more details). A2CPS includes the core psychosocial assessments and associated HEAL CDEs for harmonization. Remote assessments have been implemented for both English- and Spanish-speaking participants. Of note, two psychosocial items were also collected during brain-imaging sessions: a pain assessment and a mood assessment.
1.3.2 Data Quality
The data were exported from REDCap and may contain errors or missing values. Therefore, to ensure high-quality data, the data were cleaned and reformatted. For details, see Appendix D. There are individual reformatted files as well as a combined file for psychosocial data (look for files ending in TKA and/or Thoracic).
1.3.3 Cross-Modality Links
All files include record_id and guid, which serve as unique identifiers and enable linkage across psychosocial data files as well as other data modalities.
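Before linking files, it can be worth confirming that the identifier pair really is unique within each file, since repeated keys multiply rows in a join. The following is a minimal sketch in base R on a made-up data frame; the duplicated row is deliberately planted for illustration and does not reflect the released data.

```r
# Hypothetical file in which one participant appears twice
toy <- data.frame(
  record_id = c(10001, 10003, 10003),
  guid = c("AAA", "BBB", "BBB"),
  redcap_event_name = rep("baseline_visit_arm_1", 3)
)

# Restrict to the identifier columns used for linkage
keys <- toy[, c("record_id", "guid")]

# Number of rows that repeat an earlier key combination
n_dup <- sum(duplicated(keys))

# All rows involved in a duplication, including the first occurrence
dup_rows <- toy[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]

n_dup
dup_rows
```

A result of zero for n_dup indicates that each key combination occurs once, which is what a one-to-one join assumes.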
1.3.4 Missing Data
Proceed with caution when merging data across psychosocial data or different modalities. Merging datasets often leads to a reduced sample size due to missing data or incomplete forms. This can lower statistical power and limit the ability to draw reliable conclusions.
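One way to keep track of this loss is to record the sample size at each step of a merge. The sketch below uses two made-up tables (gad_toy and phq_toy are hypothetical names, and the values are invented) and assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical tables with partially overlapping participants and some NAs
gad_toy <- data.frame(
  record_id = c(1, 2, 3, 4),
  imputed_gad_score = c(8, 6, 0, NA)
)
phq_toy <- data.frame(
  record_id = c(2, 3, 4, 5),
  imputed_phq_score = c(6, NA, 2, 1)
)

# Step 1: keep only participants present in both tables
merged <- inner_join(gad_toy, phq_toy, by = "record_id")

# Step 2: keep only participants with both scores observed
complete_cases <- filter(
  merged,
  !is.na(imputed_gad_score),
  !is.na(imputed_phq_score)
)

# Participants dropped because they appear in only one table
only_gad <- anti_join(gad_toy, phq_toy, by = "record_id")
only_phq <- anti_join(phq_toy, gad_toy, by = "record_id")

# Sample size at each stage
c(
  gad = nrow(gad_toy),
  phq = nrow(phq_toy),
  merged = nrow(merged),
  complete = nrow(complete_cases)
)
```

Reporting these counts alongside results makes clear how much of the original sample an analysis actually uses.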
1.3.5 Citations
In publications or presentations including data from A2CPS, please include the following statement as attribution:
Data were provided [in part] by the A2CPS Consortium funded by the National Institutes of Health (NIH) Common Fund, which is managed by the Office of the Director (OD)/ Office of Strategic Coordination (OSC). Consortium components and their associated funding sources include Clinical Coordinating Center (U24NS112873), Data Integration and Resource Center (U54DA049110), Omics Data Generation Centers (U54DA049116, U54DA049115, U54DA049113), Multi-site Clinical Center 1 (MCC1) (UM1NS112874), and Multi-site Clinical Center 2 (MCC2) (UM1NS118922).