---
title: "SOCI 620: Lab 21"
author: "Peter McMahan"
date: "4/5/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
library(pander)
# this will tell brms to use as many cores as your computer has
options(mc.cores = parallel::detectCores())
```
# Multilevel generalized linear models
This lab will provide a basic introduction to MLGLMs by implementing the model from the slides. We'll begin by loading the data and creating a binary outcome variable measuring whether a student scored higher on their math test than on their reading test.
```{r data}
d <- read.csv("https://soci620.netlify.com/data/t_star_student_grade1_2l.csv")
# did they score better on the reading test?
d$math_better <- d$student_reading_score < d$student_math_score
# make (and center) student age
d$student_age_s <- scale(-d$student_born, center = TRUE, scale = FALSE)
# standardize teacher experience
d$teacher_exper_s <- scale(d$teacher_exper)
```
## Multilevel model
We begin by specifying our model in the 'multlevel' format:
$$
\begin{aligned}
M_{ik} &\sim \mathrm{Binomial}(1,p_{ik})\\
\mathrm{logit}(p_{ik}) &= \beta_{0k} + \beta_{1k}Age_i + \beta_{2k}Female_i\\
\\
\beta_{0k} &= \gamma_{00} + \gamma_{01}TExp_k + \gamma_{02}TFemale_k + \eta_{0k}\\
\beta_{1k} &= \gamma_{10} + \eta_{1k}\\
\beta_{2k} &= \gamma_{20} + \gamma_{21}TFemale_k + \eta_{2k}\\
\end{aligned}
$$
## Expanded model
We expand the model into a "mixed effects" format to see how we will specify it in R:
$$
\begin{aligned}
\mathrm{logit}(p_{ik}) &= \gamma_{00} + \gamma_{01}TExp_k + \gamma_{02}TFemale_k + \gamma_{10}Age_i + \gamma_{20}Female_i + \gamma_{21}TFemale_kFemale_i + \\
&\phantom{{}={}} \eta_{0k} + \eta_{1k}Age_i + \eta_{2k}Female_i
\end{aligned}
$$
```{r model_math}
model_math <- bf(
math_better ~
teacher_exper_s + teacher_female*student_female + student_age_s +
(1 + student_age_s + student_female | teacher_id),
family=bernoulli
)
```
```{r default prior}
get_prior(model_math,data=d)
```
```{r prior}
pr <- c(
set_prior("normal(0,2)",class="Intercept"),
set_prior("normal(0,1.5)",class="b"),
set_prior("cauchy(0,2)",class="sd"),
set_prior("lkj(2)",class="cor")
)
```
This is a standard estimation, but it illustrates some of the ways you can alter your sampling. In this case we have 8 chains, each running on a separate thread (or 'core'). Each chain will run for 2500 iterations, but only throw out the first 1000 of those iterations (rather than the default of half the iterations, in this case 1250). So this will yield $8 \times (2500 - 1000) = 12000$ total samples from the posterior.
```{r fit_math, cache=TRUE}
fit_math <- brm(model_math,data=d, prior=pr,
cores = 8, chains = 8,
iter = 2500,warmup=1000)
```
Let's look at the summary. `prob=0.95` tells R to report 95% credible intervals (instead of the default 90%), and `priors=TRUE` asks R to report the priors that were used
```{r fit_math summary}
summary(fit_math, priors=TRUE, prob=0.95)
```
`fixef` will report the population-level estimates, which are sometimes called the "fixed effects".
```{r fit_math fixef}
fixef(fit_math)
pander(fixef(fit_math))
```
`ranef` will report the group-level estimates (the estimates for each classroom), which are sometimes called "random effects."
```{r fit_math ranef}
fit_ranef <- ranef(fit_math)
```
The output from `ranef` can be a bit tricky to navigate, since this format needs to handle all kinds of complex model specifications. `fit_ranef` is a list, containing the group-level estimates for all of the grouping elements we included. Since we are only doing a two-level model, this means that the list only has one item in it, named `teacher_id`. We can access this using the `$` notation:
```{r ranef1}
# the size of each dimension
dim(fit_ranef$teacher_id)
# the names of each element across each dimension
dimnames(fit_ranef$teacher_id)
```
This tells us that there are 334 classrooms, each of which has 4 pieces of data, and each of which is associated with three different coefficient We can 'slice' this object to look at the results for just one classrom:
```{r ranef oneteacher}
fit_ranef$teacher_id['15917110',,]
```
Or just one coefficient across all classrooms:
```{r ranef conecoef}
fit_ranef$teacher_id[,,"student_age_s"]
```
We can also get the full posterior sample for each of these classrooms by specifying `summary=FALSE`
```{r ranef posteriorsample}
s <- ranef(fit_math,summary = FALSE)
dim(s$teacher_id)
```
This will return the full posterior sample for the intercept for classroom "15917110":
```{r ranef singleposteriorsample}
s$teacher_id[,'15917110',"Intercept"]
```
With these posterior samples, we can test very specific hypothese such as "what is the posterior probability that the intercept for classroom 15917110 is smaller than the intercept for classroom 15917110
```{r compareclassrooms}
mean(s$teacher_id[,'15917110',"Intercept"] < s$teacher_id[,'15917110',"Intercept"])
```