Within-Subjects Design

In a within-subjects design, subjects give responses across multiple conditions or across time. In other words, measures are repeated across levels of some condition or across time points. For example, subjects can report how happy they feel when they see a sequence of positive pictures and another sequence of negative pictures. In this case, we’d observe each subjects’ happiness in both positive and negative conditions. As another example, we could measure subjects’ job satisifcation every month for 3 months. Here we’d observe job satisfaction for each subject across time. In both cases, subject scores across conditions or time very likely correlate with eachother – they are dependent.

What are contrasts?

Broadly, contrasts test focused research hypotheses. A contrast comprises a set of weights or numeric values that represent some comparison. For example, when comparing two experimental group means (i.e., control vs. treatment), you can apply weights to each group mean and then sum them up. This is the same thing as subtracting one group’s mean from the other’s.

Here’s a quick demonstration

# group means
control <- 5
treatment <- 3

# apply contrast weights and sum up the results
sum(c(control, treatment) * c(1, -1))
## [1] 2

Model and Conceptual Assumptions for Linear Regression

• Correct functional form. Your model variables share linear relationships.
• No omitted influences. This one is hard: Your model accounts for all relevant influences on the variables included. All models are wrong, but how wrong is yours?
• Accurate measurement. Your measurements are valid and reliable. Note that unreliable measures can’t be valid, and reliable measures don’t necessairly measure just one construct or even your construct.
• Well-behaved residuals. Residuals (i.e., prediction errors) aren’t correlated with predictor variables or eachother, and residuals have constant variance across values of your predictor variables.

Model and Conceptual Assumptions for Repeated Measures ANOVA

• All change scores variances are equal. Similar to the homogenous group variance assumption in between-subjects ANOVA designs, within-subjects designs require that all change score variances (e.g., subject changes between time points, within-subject differences between conditions) are equal. This means that if you compute the within-subject differences between all pairiwse levels (e.g., timepoints, treatment levels), the variances of those parwises differences must all be equal. For example, if you ask people how satisifed they are with their current job every month for 3 months, then the variance of month 2 - month 1 should equal the variance of month 3 - month 2 and the variance of month 3 - month 1. As you might be thinking, this assumption is very strict and probably not realistic in many cases.

A different take on the homogenous change score variance assumption

• Sphericity and a special case, compound symmetry. Sphericity is the matrix algebra equivalent to the homogenous change score variance assumption. Compound symmtry is a special case of sphericity. A variance-covariance matrix that satisifies compound symmetry has equal variances (the values in the diagonal) and equal covariances (the values above or below the diagonal).
Short explanation: Sphericity = homogenous change score variance = compound symmetry

Libraries

library(tidyverse)
library(knitr)
library(AMCP)
library(MASS)
library(afex)
library(lsmeans)

# select from dplyr
select <- dplyr::select
recode <- dplyr::recode

From Chapter 11, Table 5 in Maxwell, Delaney, & Kelley (2018)
From help("C11T5"): “The data show that 12 participants have been observed in each of 4 conditions. To make the example easier to discuss, let’s suppose that the 12 subjects are children who have been observed at 30, 36, 42, and 48 months of age. Essentially, for the present data set, 12 children were each observed four times over an 18 month period. The dependent variable is the age-normed general cognitive score on the McCarthy Scales of Children’s Abilities. Interest is to determine if the children were sampled from a population where growth in cognitive ability is more rapid or less rapid than average.”

data("C11T5")
C11T5$id <- factor(1:12) # print entire dataset C11T5 %>% kable() Months30 Months36 Months42 Months48 id 108 96 110 122 1 103 117 127 133 2 96 107 106 107 3 84 85 92 99 4 118 125 125 116 5 110 107 96 91 6 129 128 123 128 7 90 84 101 113 8 84 104 100 88 9 96 100 103 105 10 105 114 105 112 11 113 117 132 130 12 Restructure data lC11T5 <- C11T5 %>% gather(key = month, value = score, -id) %>% mutate(monthnum = month %>% recode("Months30" = 0, "Months36" = 6, "Months42" = 12, "Months48" = 18), linear = monthnum %>% recode(0 = -3, 6 = -1, 12 = 1, 18 = 3), quadratic = monthnum %>% recode(0 = 1, 6 = -1, 12 = -1, 18 = 1), cubic = monthnum %>% recode(0 = -1, 6 = 3, 12 = -3, 18 = 1), monthfac = month %>% factor(ordered = TRUE)) # print first and last 5 observations from dataset head(lC11T5) %>% kable() id month score monthnum linear quadratic cubic monthfac 1 Months30 108 0 -3 1 -1 Months30 2 Months30 103 0 -3 1 -1 Months30 3 Months30 96 0 -3 1 -1 Months30 4 Months30 84 0 -3 1 -1 Months30 5 Months30 118 0 -3 1 -1 Months30 6 Months30 110 0 -3 1 -1 Months30 tail(lC11T5) %>% kable() id month score monthnum linear quadratic cubic monthfac 43 7 Months48 128 18 3 1 1 Months48 44 8 Months48 113 18 3 1 1 Months48 45 9 Months48 88 18 3 1 1 Months48 46 10 Months48 105 18 3 1 1 Months48 47 11 Months48 112 18 3 1 1 Months48 48 12 Months48 130 18 3 1 1 Months48 Visualize relationships It’s always a good idea to look at your data. Check some assumptions. Children scores + means and 95% confidence intervals at each time point lC11T5 %>% ggplot(mapping = aes(x = month, y = score)) + geom_point(position = position_jitter(0.1)) + stat_summary(geom = "point", fun.data = mean_cl_normal, color = "red", size = 2) + stat_summary(geom = "errorbar", fun.data = mean_cl_normal, color = "red", width = 0) + theme_bw() QQ-plots Do observations look normal at each time point? lC11T5 %>% ggplot(mapping = aes(sample = score)) + geom_qq() + facet_wrap(facets = ~ month, scales = "free") + theme_bw() Variance-covariance matrix among 4 timepoints Does this look like compound symmetry? C11T5 %>% select(-id) %>% cov(use = "pairwise.complete.obs") %>% kable() Months30 Months36 Months42 Months48 Months30 188.0000 154.36364 127.3636 121.18182 Months36 154.3636 200.54545 143.6364 97.45455 Months42 127.3636 143.63636 178.0000 168.09091 Months48 121.1818 97.45455 168.0909 218.00000 Test polynomial contrasts with pooled error term Linear regression method Below I ID use a blocking factor. This could seem nuts, but this approach “controls” for (i.e., partials out variance due to) subject ID. The polynomial contrast tests are equivalent to those you’d see from a special function or command for repeated measures ANOVA. I’ll demonstrate that later in this post. lm01 <- lm(score ~ linear + quadratic + cubic + id, data = lC11T5) ANOVA results anova(lm01) ## Analysis of Variance Table ## ## Response: score ## Df Sum Sq Mean Sq F value Pr(>F) ## linear 1 540 540.00 8.8833 0.005369 ** ## quadratic 1 12 12.00 0.1974 0.659722 ## cubic 1 0 0.00 0.0000 1.000000 ## id 11 6624 602.18 9.9063 1.314e-07 *** ## Residuals 33 2006 60.79 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Interpretation We found evidence that children’s age-normed general cognitive score increased linearly over the course of 18 months, F(1, 33) = 8.88, p = .005. However, we found no sufficient evidence for a quadratic trend, F(1, 33) = 0.20, p = .660, nor a cubic trend, F(1, 33) < .001, p > .99. Get results via algebraic formulas Below, I test the same polynomial contrasts using algebraic equations. There’s no need to follow every last piece of code or understand each formula. I’m presenting this just to be thorough. If you’re curious, these formulas are explained in Chapter 11 of Maxwell, Dalaney, and Kelly (2018). # contrasts linear <- c(-3, -1, 1, 3) quadratic <- c(1, -1, -1, 1) cubic <- c(-1, 3, -3, 1) # estimate (mean difference) linest <- mean(as.matrix(C11T5[, -5]) %*% linear) quadest <- mean(as.matrix(C11T5[, -5]) %*% quadratic) cubest <- mean(as.matrix(C11T5[, -5]) %*% cubic) est <- c(linest, quadest, cubest) # number of participants n <- nrow(C11T5) # sum of squares for the contrast linssc <- n * linest^2 / sum(linear^2) quadssc <- n * quadest^2 / sum(quadratic^2) cubssc <- n * cubest^2 / sum(cubic^2) ssc <- c(linssc, quadssc, cubssc) # mean squared error mse <- anova(lm(score ~ linear + quadratic + cubic + id, data = lC11T5))$Mean Sq[5]

# F-ratio
fratio <- ssc / mse

# number of levels (i.e., timepoints)
nlvls <- nlevels(lC11T5$monthfac) # degrees of freedom df1 <- 1 df2 <- (nlvls - 1) * (n - 1) # p-value p <- 1 - pf(q = fratio, df1 = df1, df2 = df2) # alpha alpha <- 0.05 # F critical fcrit <- qf(p = 1 - alpha, df1 = df1, df2 = df2) # standard error linse <- sqrt(mse * sum(linear^2 / n)) quadse <- sqrt(mse * sum(quadratic^2 / n)) cubse <- sqrt(mse * sum(cubic^2 / n)) se <- c(linse, quadse, cubse) # lower and upper bound lwr <- c(linest, quadest, cubest) - sqrt(fcrit) * se upr <- c(linest, quadest, cubest) + sqrt(fcrit) * se # table of results tibble(est, se, fratio, p, lwr, upr) ## # A tibble: 3 x 6 ## est se fratio p lwr upr ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 30 10.1 8.88 0.00537 9.52 50.5 ## 2 -2 4.50 0.197 0.660 -11.2 7.16 ## 3 0 10.1 0 1 -20.5 20.5 Get the same results using afex::aov_car Fit model aovcar1 <- aov_car(score ~ monthfac + Error(id / monthfac), data = lC11T5) # equivalent # aovez1 <- aov_ez(id = "id", dv = "score", data = lC11T5, within = "monthfac") # aovlmer1 <- aov_4(score ~ monthfac + (1 + monthfac | id), data = lC11T5) Test polynomial contrasts with separate error term Long way 1. Subset each column of data that represents a given timepoint 2. Multiply that entire column of data by the contrast weight 3. Add up the columns (makes one column of data) 4. Test whether the mean of those values = 0 (i.e., one-sample t-test) t.test(C11T5[, "Months30"] * -3 + C11T5[, "Months36"] * -1 + C11T5[, "Months42"] * 1 + C11T5[, "Months48"] * 3) ## ## One Sample t-test ## ## data: C11T5[, "Months30"] * -3 + C11T5[, "Months36"] * -1 + C11T5[, "Months42"] * 1 + C11T5[, "Months48"] * 3 ## t = 2.2414, df = 11, p-value = 0.04659 ## alternative hypothesis: true mean is not equal to 0 ## 95 percent confidence interval: ## 0.5403653 59.4596347 ## sample estimates: ## mean of x ## 30 t.test(C11T5[, "Months30"] * 1 + C11T5[, "Months36"] * -1 + C11T5[, "Months42"] * -1 + C11T5[, "Months48"] * 1) ## ## One Sample t-test ## ## data: C11T5[, "Months30"] * 1 + C11T5[, "Months36"] * -1 + C11T5[, "Months42"] * -1 + C11T5[, "Months48"] * 1 ## t = -0.46749, df = 11, p-value = 0.6493 ## alternative hypothesis: true mean is not equal to 0 ## 95 percent confidence interval: ## -11.416264 7.416264 ## sample estimates: ## mean of x ## -2 t.test(C11T5[, "Months30"] * -1 + C11T5[, "Months36"] * 3 + C11T5[, "Months42"] * -3 + C11T5[, "Months48"] * 1) ## ## One Sample t-test ## ## data: C11T5[, "Months30"] * -1 + C11T5[, "Months36"] * 3 + C11T5[, "Months42"] * -3 + C11T5[, "Months48"] * 1 ## t = 0, df = 11, p-value = 1 ## alternative hypothesis: true mean is not equal to 0 ## 95 percent confidence interval: ## -12.69584 12.69584 ## sample estimates: ## mean of x ## 0 Short way %*% conducts a dot product of the two matrices (i.e., a time point and a contrast). This means it sums up the products of the corresponding entries of the two sequences of numbers. This is equivalent to the method above: Apply contrast weights to each column and add up the results. t.test(as.matrix(C11T5[, -5]) %*% linear) ## ## One Sample t-test ## ## data: as.matrix(C11T5[, -5]) %*% linear ## t = 2.2414, df = 11, p-value = 0.04659 ## alternative hypothesis: true mean is not equal to 0 ## 95 percent confidence interval: ## 0.5403653 59.4596347 ## sample estimates: ## mean of x ## 30 t.test(as.matrix(C11T5[, -5]) %*% quadratic) ## ## One Sample t-test ## ## data: as.matrix(C11T5[, -5]) %*% quadratic ## t = -0.46749, df = 11, p-value = 0.6493 ## alternative hypothesis: true mean is not equal to 0 ## 95 percent confidence interval: ## -11.416264 7.416264 ## sample estimates: ## mean of x ## -2 t.test(as.matrix(C11T5[, -5]) %*% cubic) ## ## One Sample t-test ## ## data: as.matrix(C11T5[, -5]) %*% cubic ## t = 0, df = 11, p-value = 1 ## alternative hypothesis: true mean is not equal to 0 ## 95 percent confidence interval: ## -12.69584 12.69584 ## sample estimates: ## mean of x ## 0 Custom function for testing via both methods This function takes “wide-form” data and a contrast as input and it outputs a matrix with two columns: p-values and which testing approach they came from, pooled or separate error term. wscont <- function(wdata, cont) { # long data ldata <- stack(wdata) colnames(ldata) <- c("y", "t") # add id factor; make time a factor ldata$id <- factor(rep(1:nrow(wdata), times = ncol(wdata)))
ldata$t <- factor(ldata$t, ordered = TRUE)

# separate variance contrast p-value
svp <- t.test(as.matrix(wdata) %*% cont)$p.value # fit model fit <- lm(y ~ t + id, data = ldata) # extract p-value from linear contrast pvp <- summary.aov(fit, split = list(t = list(lin = 1)))[[1]][, "Pr(>F)"]["lin"] # output cbind(sepvar = c(0, 1), p = c(pvp, svp)) } Simulate false positive error rates that result from both approaches Maxwell, Dalaney, and Kelly (2018) refer to these approaches as the univariate approach and the multivariate approach. To make the labels more descriptive, I refer to each approach by the error term it uses: pooled vs. seperate (MDK2018 do too) True Covariance Matrix Here’s an example of a covariance matrix that satisifes the homogenous change score variance assumption. All timepoint variances are equal and so are the covariances between timepoints = 0. This is also called an identitiy matrix. (truecov <- diag(1, 4, 4)) ## [,1] [,2] [,3] [,4] ## [1,] 1 0 0 0 ## [2,] 0 1 0 0 ## [3,] 0 0 1 0 ## [4,] 0 0 0 1 Sample data Below I generate data that represent 10 subjects’ scores across 4 times points. The mean score at each timepoint is 0, and the correlations among timepoints stem from the true matrix I saved above. # set random seed so results can be reproduced set.seed(8888) dat <- data.frame(mvrnorm(n = 10, mu = c(0, 0, 0, 0), Sigma = truecov)) Sample test Here I test out the function I made above. wscont(wdata = dat, cont = linear) ## sepvar p ## lin 0 0.12806078 ## 1 0.04189252 Generate data that satisfy compound symmetry Compound Symmetric Covariance Matrix (cscovmat <- diag(0.5, nrow = 4, ncol = 4) + 0.5) ## [,1] [,2] [,3] [,4] ## [1,] 1.0 0.5 0.5 0.5 ## [2,] 0.5 1.0 0.5 0.5 ## [3,] 0.5 0.5 1.0 0.5 ## [4,] 0.5 0.5 0.5 1.0 Save true means over time truemeans <- c(100, 100, 100, 100) Data 1. Create a function returns a data.frame with the requested means and variance-covariance structure (compound symmetry in this case) 2. Save a data.frame with 10,000 n = 50 replications # function gendatafun <- function(truecov, ms, n) { data.frame(mvrnorm(n = n, mu = ms, Sigma = truecov)) } # data generation simdata1 <- do.call(rbind, replicate(n = 10000, gendatafun(truecov = cscovmat, ms = truemeans, n = 50), simplify = FALSE)) Generate data with the same variance-covariance structure as example data Save covariance matrix from data used in Chapter 11, Table 5 of Maxwell, Dalaney, and Kelley (2018) (c11t5covmat <- cov(C11T5[, -5])) ## Months30 Months36 Months42 Months48 ## Months30 188.0000 154.36364 127.3636 121.18182 ## Months36 154.3636 200.54545 143.6364 97.45455 ## Months42 127.3636 143.63636 178.0000 168.09091 ## Months48 121.1818 97.45455 168.0909 218.00000 Data These data have a variance-covariance structure like the Chapter 11, Table 5 data. # data generation simdata2 <- do.call(rbind, replicate(n = 10000, gendatafun(truecov = c11t5covmat, ms = truemeans, n = 50), simplify = FALSE)) Add replicate IDs I’ll use this column later to subset each replication. simdata1$i <- rep(1:10000, each = 50)
simdata2\$i <- rep(1:10000, each = 50)

Get p-values via custom wscont() function

In this last line, I add a column, compsymm, that flags which population variance-covariance matrix the data came from (0 = C11T5 matrix, 1 = compound symmetry).

simp1 <- 1:10000 %>% map_df(function(rep) {
simdata1 %>%
filter(i == rep) %>%
select(-5) %>%
wscont(cont = linear) %>%
as_tibble() %>%
mutate(compsymm = 1)
})

simp2 <- 1:10000 %>% map_df(function(rep) {
simdata2 %>%
filter(i == rep) %>%
select(-5) %>%
wscont(cont = linear) %>%
as_tibble() %>%
mutate(compsymm = 0)
})

Plot false positive errors

p < .05 should only occur 5% of the time

bind_rows(simp1, simp2) %>%
mutate(sig = ifelse(p < .05, 1, 0)) %>%
group_by(compsymm, sepvar) %>%
summarise(falsepos = mean(sig)) %>%
ungroup() %>%
mutate(sepvar = sepvar %>% recode(0 = "Pooled Error Term", 1 = "Separate Error Term"),
compsymm = compsymm %>% recode(0 = "Unequal Variances & Covariances", 1 = "Compound Symmetry")) %>%
ggplot(mapping = aes(x = compsymm, y = falsepos, fill = sepvar)) +
geom_bar(stat = "identity", position = position_dodge(0.9)) +
geom_hline(yintercept = 0.05, color = "red") +
scale_y_continuous(breaks = seq(0, 0.5, 0.01)) +
theme_bw() +
theme(legend.position = "top")

Interpretation

Above, I generated 20,000 datasets with known population variance-covariance matrices and columns means. For each dataset, I tested the linear contrast across time using both error term approaches and saved the p-values. When the true population variance-covariance matrice satisfied compound symmetry, both approaches’ contrast test statistics give practically the same results, on average, but when the true population variance-covariance matrice violated compound symmetry – a very common scenario in real datasets – the pooled error term test statistics resulted in false positives over 14% of the time; the separate error term test statistics resulted in false positives only 5% of the time.

Resources

• Boik, R. J. (1981). A priori tests in repeated measures designs: Effects of nonsphericity. Psychometrika, 46(3), 241-255.
• Maxwell, S. E., Delaney, H. D., & Kelley, K. (2018). Designing experiments and analyzing data: A model comparison perspective (3rd ed.). Routledge.
• DESIGNING EXPERIMENTS AND ANALYZING DATA includes computer code for analyses from their book, Shiny apps, and more.

General word of caution

Above, I listed resources prepared by experts on these and related topics. Although I generally do my best to write accurate posts, don’t assume my posts are 100% accurate or that they apply to your data or research questions. Trust statistics and methodology experts, not blog posts.

Nicholas M. Michalak
Ph.D. Candidate, Social Psychology

I study how both modern and evolutionarily-relevant threats affect how people perceive themselves and others.