This lab will show you how to estimate a random-slopes model using brms
.
#####
# set up the environment
#####
# load packages
library(brms)
options(mc.cores = parallel::detectCores())
# adjust the plot region for viewing on the projector in class
options(repr.plot.width=12, repr.plot.height=8)
Loading required package: Rcpp Loading 'brms' package (version 2.14.4). Useful instructions can be found by typing help('brms'). A more detailed introduction to the package is available through vignette('brms_overview'). Attaching package: ‘brms’ The following object is masked from ‘package:stats’: ar
We'll use the Tennessee STAR data again, but this time limiting to a 50% random sample to speed up calculations:
d <- read.csv("https://soci620.netlify.com/data/t_star_student_grade1.csv")
dim(d)
# for the sake of speed, we will use a 20% sample
d <- d[runif(nrow(d)) <= 0.5,]
dim(d)
head(d)
student_id | teacher_id | class_type | class_size | born | female | race_ethnicity | re_white | re_black | re_asian | re_hispanic | re_native_american | re_other | reading_score | math_score | listening_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <dbl> | <lgl> | <chr> | <lgl> | <lgl> | <lgl> | <lgl> | <lgl> | <lgl> | <int> | <int> | <int> | |
3 | 3 | 3 | 2 | 21 | 1979.307 | TRUE | Black | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | 483 | 526 | 529 |
4 | 4 | 4 | 3 | 21 | 1980.322 | FALSE | Black | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | 516 | 505 | 556 |
5 | 5 | 4 | 3 | 21 | 1978.416 | TRUE | Black | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | 433 | 463 | 504 |
7 | 7 | 6 | 1 | 17 | 1980.820 | TRUE | Black | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | 471 | 444 | 496 |
8 | 8 | 7 | 1 | 15 | 1979.038 | TRUE | White | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | 487 | 484 | 531 |
9 | 9 | 8 | 2 | 22 | 1979.134 | FALSE | Black | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | 503 | 505 | 541 |
Since we will be using student age as a predictor, we will need to create that variable. In the previous lab, we standardized age to have a mean of zero and a standard deviation of one. This time, we will simply center the student age. This means it will have a mean of zero, but the units will still be "years." We'll use the same trick as last time where we treat the negative year of birth as an uncenetered measure of age.
# define a function that centers any vector around its mean
center <- function(x){
return(x - mean(x, na.rm = TRUE))
}
# apply that function to the negative year of birth
d$age_c <- center(-d$born)
plot(density(d$age_c,na.rm=TRUE))
Our model is:
$$\begin{aligned} S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\ \mu_{ik} &= \beta_{0k} + \beta_{1k}Age_i \\ \\ \beta_{0k} &= \gamma_{00} + \eta_{0k}\\ \beta_{1k} &= \gamma_{10} + \eta_{1k}\\ \\ \begin{bmatrix}\eta_{0k}\\\eta_{1k}\end{bmatrix} &\sim \mathrm{MVNorm}\left(\begin{bmatrix}0\\0\end{bmatrix}, \Phi\right)\\ \\ &\mathrm{(fixed\ priors\ omitted)} \end{aligned}$$Note, we haven't talked about priors on covariance matrices yet — we'll cover that in the next class. For now we will just let brms
pick the priors for us.
model_1 <- bf(
reading_score ~ age_c + (1+age_c | teacher_id)
)
# since we're using default priors, we only
# need to give it the model and the data!
fit_1 <- brm(model_1, data=d)
Warning message: “Rows containing NAs were excluded from the model.” Compiling Stan program... Start sampling
summary(fit_1)
Family: gaussian Links: mu = identity; sigma = identity Formula: reading_score ~ age_c + (1 + age_c | teacher_id) Data: d (Number of observations: 3139) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~teacher_id (Number of levels: 334) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS sd(Intercept) 29.74 1.51 26.84 32.81 1.00 1246 sd(age_c) 4.15 2.59 0.18 9.91 1.00 977 cor(Intercept,age_c) -0.49 0.40 -0.98 0.63 1.00 2722 Tail_ESS sd(Intercept) 2245 sd(age_c) 849 cor(Intercept,age_c) 1737 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 522.48 1.87 518.86 526.16 1.00 905 1728 age_c -11.13 1.88 -14.75 -7.40 1.00 5002 2648 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sigma 47.07 0.63 45.82 48.34 1.00 4470 2887 Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
The "Group-Level Effects" section of the summary gives us the estimates of our entire covariance matrix.