---
title: "SOCI 620: Lab 18"
author: "Peter McMahan"
date: "3/22/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
library(pander)
# this will tell brms to use as many cores as your computer has
options(mc.cores = parallel::detectCores())
```
## Random intercept models
Random intercept models are the simplest form of random-effects models (aka multilevel models or partial-pooling models). They allow you to say that each group has a different average outcome, but that the relationship between the outcome and the covariates is invariant across groups.
Multilevel (hierarchical) models also provide a framework for interpreting the change in an outcome variable as it relates to covariates at the level of both the individual observation and of the group. We will look at both cases here, in relation to reading test scores among grade-one students.
We will use all of the grade-one data in the Tennessee STAR data set, which has over 6000 students in more than 300 classes.
```{r data}
d <- read.csv("https://soci620.netlify.com/data/t_star_student_grade1.csv")
head(d)
```
## Random intercept-only model
To begin, we will use the same random intercept model (with *no* covariates) as we used in the previous lab:
$$
\begin{aligned}
S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\
\mu_{ik} &= \alpha_k\\
\\
\alpha_k &\sim \mathrm{Norm}(\gamma,\eta)\\
\\
\sigma &\sim \mathrm{Cauchy}(0,50)\\
\gamma &\sim \mathrm{Norm}(500,100)\\
\eta &\sim \mathrm{Cauchy}(0,50)
\end{aligned}
$$
```{r model_interceptonly}
# model
model_interceptonly <- bf(
reading_score ~ 1 + (1 | teacher_id)
)
# prior
prior_interceptonly <- c(
prior(cauchy(0,50),class='sigma'), # σ
prior(normal(500,100),class="Intercept"), # 𝛾
prior(cauchy(0,50),class='sd') # η
)
```
```{r fit_interceptonly, cache=TRUE}
fit_interceptonly <- brm(model_interceptonly, data=d,prior=prior_interceptonly)
```
```{r fit_interceptonly_summary}
summary(fit_interceptonly)
```
Even without predictor variables, this model shows us that there is a significant amount of between-class variation (as estimated in `phi`).
### Including student-level covariates
We will start by seeing how students' age affects their test scores, taking into account inter-class variation.
```{r scale_age}
# construct a standardized age variable from the `born` variable
d$age_s <- scale( -d$born )
plot(density(d$age_s,na.rm=TRUE))
```
```{r model_age}
# model
model_age <- bf(
reading_score ~ 1 + age_s + (1 | teacher_id)
)
get_prior(model_age,data=d)
```
```{r prior_age}
prior_age <- c(
prior(cauchy(0,50),class='sigma'),
prior(normal(500,100),class="Intercept"),
prior(cauchy(0,50),class='sd'),
prior(normal(0,30),class='b') # for age
)
```
```{r fit_age, cache=TRUE}
fit_age <- brm(model_age, data=d,prior=prior_age)
```
```{r fit_age_summary}
summary(fit_age)
pander(fixef(fit_age))
```
Age has a negative relationship with test scores: or every increase of one standard deviation increase in age, the model predicts that student will score about five points lower.
### Including classroom-level covariates
Very often, it is important to be able to look at the relationship between an outcome variable and a covariate that only varies at the group level. This is exactly the case in the STAR data if we are interested in the effect of the experimental treatment. Do smaller classes lead to better test scores? A model that estimates this effect, allowing between-class variation inn average scores, be written mathematically (omitting priors for now) as
$$
\begin{aligned}
S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\
\mu_{ik} &= \beta_{0k} + \beta_{1}Age_i\\
\\
\beta_{0k} &= \gamma_{0} + \gamma_1Small_k+ \eta_{0k}\\
\eta_{0k} &\sim \mathrm{Norm}(0, \phi_0)
\end{aligned}
$$
To determine how to specify this model in `brms`, note that we can substitute $\beta_{0k}$ into the individual-level linear model and rearrange, giving us
$$
\begin{aligned}
S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\
\mu_{ik} &= \gamma_{0} + \gamma_1Small_k +\beta_{1}Age_i+ \eta_{0k}\\
\\
\eta_{0k} &\sim \mathrm{Norm}(\gamma_0, \phi_0)
\end{aligned}
$$
The classroom-level covariate can be specified as a student-level covariate in this case, so the changes to the formula are quite minimal:
```{r model_age_size}
# create an indicator variable for 'small' classes
d$small <- d$class_type == 1
# specify the model
model_age_size <- bf(
reading_score ~ 1 + age_s + small + (1 | teacher_id)
)
# look at what kind of prior we need
get_prior(model_age_size,data=d)
```
```{r prior_age_size}
prior_age_size <- c(
prior(cauchy(0,50),class='sigma'),
prior(normal(500,100),class="Intercept"),
prior(cauchy(0,50),class='sd'),
prior(normal(0,30),coef='age_s'), # for age
prior(normal(0,50),coef='smallTRUE') # for small classes
)
```
```{r fit_age_size, cache=TRUE}
fit_age_size <- brm(model_age_size, data=d,prior=prior_age_size)
```
```{r fit_age_size_summary}
summary(fit_age_size)
```