---
title: "SOCI 620: Lab 17"
author: "Peter McMahan"
date: "3/16/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
library(pander)
# this will tell brms to use as many cores as your computer has
options(mc.cores = parallel::detectCores())
```
## Indexing variables within models
The ability to use *indexes* to refer to specific components of complex parameters is one of the biggest benefits of the probabilistic model specification we have been using all sememster. As we saw in the slides, indices (represented there by subscripts) allowed for easy specification of fixed-effects models and, more imporantly, hierarchical (mulilevel) models.
To see how this translates into R code, we will load the Tennesee STAR data and get a subset of 30 classes of grade-one students.
```{r data}
d <- read.csv("https://soci620.netlify.com/data/t_star_student_grade1.csv")
# take a look at the data to see how it's structured
head(d,15)
```
To make sure we have sparse enough data to make our point, we will grab 30 of the classrooms at random from the data.
```{r subsample}
# get rid of cases with no reading score
d <- d[!is.na(d$reading_score),]
# list of all distinct teacher ids
all_teacher_ids <- unique(d$teacher_id)
length(all_teacher_ids)
# get 30 of them
subsamp_teacher_ids <- sample(all_teacher_ids,30)
subsamp_teacher_ids
# subset d
d <- d[d$teacher_id %in% subsamp_teacher_ids,]
head(d)
```
```{r student_counts}
# how many students
dim(d)
# how many students per classroom
plot(table(d$teacher_id))
```
### 'Fixed-effects' (no pooling) models
With this index constructed, we are ready to specify a fixed-effects (a.k.a. no pooling) model. We will use the simple fixed-intercepts model from the slides:
$$
\begin{aligned}
S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\
\mu_{ik} &= \alpha_k\\
\\
\alpha_k &\sim \mathrm{Norm}(500,100)\\
\sigma &\sim \mathrm{Cauchy}(0,50)
\end{aligned}
$$
```{r model_fe}
# we can use *either* the `teacher_id` (factor) variable
# or the `class_id` (integer) variable to index groups.
model_fe <- bf(
reading_score ~ -1 + as.factor(teacher_id)
)
get_prior(model_fe,data=d)
prior_fe <- c(
prior(normal(500,100),class="b"),
prior(cauchy(0,50),class="sigma")
)
```
This specifies a "fixed effects" model like we saw in the slides. The `-1` term in the formula tells brms to omit the global intercept, and `as.factor()` is a way to force R to think of the integer-valued variable `teacher_id` as category labels rather than numeric measures.
We can fit the model easily using `brm()`:
```{r fit_fe, cache=TRUE}
fit_fe <- brm(model_fe,data=d, prior=prior_fe)
```
```{r fit_fe_summary}
summary(fit_fe)
# `fixef` gets coefficient estimates
pander(fixef(fit_fe))
```
### Partial pooling (multilevel) model
Specifying a multilevel model is quite easy in brms.
$$
\begin{aligned}
S_{ik} &\sim \mathrm{Norm}(\mu_{ik},\sigma)\\
\mu_{ik} &= \alpha_k\\
\\
\alpha_k &\sim \mathrm{Norm}(\gamma,\eta)\\
\\
\sigma &\sim \mathrm{Cauchy}(0,50)\\
\gamma &\sim \mathrm{Norm}(500,100)\\
\eta &\sim \mathrm{Cauchy}(0,50)
\end{aligned}
$$
```{r model_pp}
model_pp <- bf(
reading_score ~ 1 + (1 | teacher_id)
)
get_prior(model_pp, data=d)
```
```{r prior_pp}
# now we have 'group' values for our priors
prior_pp <- c(
prior(cauchy(0,50),class='sigma'),
prior(normal(500,100),class="Intercept"),
prior(cauchy(0,50),class='sd')
)
```
```{r fit_pp, cache=TRUE}
fit_pp <- brm(model_pp, data=d,prior=prior_pp, iter = 6000)
```
```{r fit_pp_summary}
summary(fit_pp)
```
The `ranef()` function retrievs the (poorly named) "random effects". These are the second-level coefficients in our model.
```{r randef_pp}
ranef(fit_pp)
# to get just the coefficeints we're looking for, we need
# to 'drill down' into this object
pander(ranef(fit_pp)$teacher_id[,,'Intercept'])
```