---
title: "SOCI 620: Lab 19"
author: "Peter McMahan"
date: "2023-03-21"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
# this will tell brms to use as many cores as your computer has
options(mc.cores = parallel::detectCores())
```
# Random slopes
This lab will show you how to estimate a random-slopes model using `brms`, and, as a bonus, in the frequentist package `lme4`.
### Get the data
We'll use the Tennessee STAR data again, but this time limiting to a 50% random sample to speed up calculations:
```{r data}
d <- read.csv("https://soci620.netlify.com/data/t_star_student_grade1.csv")
dim(d)
# for the sake of speed, we will use a 50% sample
d <- d[runif(nrow(d)) <= 0.5,]
dim(d)
head(d)
```
Since we will be using student age as a predictor, we will need to create that variable. In the previous lab, we standardized age to have a mean of zero and a standard deviation of one. This time, we will simply *center* the student age. This means it will have a mean of zero, but the units will still be "years." We'll use the same trick as last time where we treat the negative year of birth as an *uncentered* measure of age.
```{r center_age}
# define a function that centers any vector around its mean
center <- function(x){
return(x - mean(x, na.rm = TRUE))
}
# apply that function to the negative year of birth
d$age_c <- center(-d$born)
plot(density(d$age_c,na.rm=TRUE))
```
### Specify the model
Our model is:
$$
\begin{aligned}
S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\
\mu_{ik} &= \beta_{0k} + \beta_{1k}Age_i \\
\\
\beta_{0k} &= \gamma_{00} + \eta_{0k}\\
\beta_{1k} &= \gamma_{10} + \eta_{1k}\\
\\
\begin{bmatrix}\eta_{0k}\\\eta_{1k}\end{bmatrix} &\sim \mathrm{MVNorm}\left(\begin{bmatrix}0\\0\end{bmatrix}, \Phi\right)\\
\\
&\mathrm{(fixed\ priors\ omitted)}
\end{aligned}
$$
Note, we haven't talked about priors on covariance matrices yet — we'll cover that in the next class. For now we will just let `brms` pick the priors for us.
```{r model_1}
model_1 <- bf(
reading_score ~ age_c + (1 + age_c | teacher_id)
)
```
### Estimate
```{r fit_1, cache=TRUE}
# since we're using default priors, we only
# need to give it the model and the data!
fit_1 <- brm(model_1, data=d, cores = 4, chains = 4)
```
```{r fit_1_summary}
summary(fit_1)
```
The "Group-Level Effects" section of the summary gives us the estimates of our entire covariance matrix.
## Bonus: frequentist comparison
Though we're approaching multilevel models (and statistical models in general) from a Bayesian perspective in this class, it is possible to estimate this same model in a frequentist context. The most popular R package for this is called `lme4` (where the LME stands for "Linear Mixed Effects"). Instead of using MCMC for the estimation, this package uses _restricted maximum likelihood_, which is similar to the maximum a-posteriori methods we were using at the beginning of the term, but with carefully constructed priors (they don't call them priors, of course) to aid in model convergence.
First, we will need to load the package (and install it if necessary). Installing `lme4` is usually painless.
```{r load_lme4}
# check for LME4, and try to install it if it's not already there
if(!require(lme4)){
install.packages('lme4')
library(lme4)
}
```
Now we can proceed very similarly to the `brms` case. We define the model identically -- except that there is no `bf` function analog to use:
```{r model_lme4}
model_lme4 <- reading_score ~ age_c + (1 + age_c | teacher_id)
```
And then we estimate with the `lmer()` function. (This will probably give you a warning about 'singular' fits, which is just a result of omitted classrooms in our subsample).
```{r fit_lme4}
fit_lme4 <- lmer(model_lme4, data=d)
summary(fit_lme4)
```
Wow, that was fast! Though notice the lack of any kind of confidence/credible intervals -- for those you need to go through more steps with the `confint` function.
For now, let's compare the population level estimates:
```{r compare_fits}
# brms fit
fixef(fit_1)
# lme4 fit
fixef(fit_lme4)
```
The mean estimates are nearly identical. We can also get 95% 'confidence' intervals for the `lme4` fit:
```{r lme4_confint}
# specifying parm='beta_' tells it to just get confidence
# intervals for the 'fixed effects' parameters
confint(fit_lme4, parm='beta_',level=0.95)
```
These also look nearly identical to the credible intervals from `brms`. For simple models like this, the similarity should not be at all surprising.