---
title: "SOCI 620: Lab 14"
author: "Peter McMahan"
date: "3/08/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
library(pander)
# this will tell brms to use as many cores as your computer has
options(mc.cores = parallel::detectCores())
```
# Problems with sampling
We'll be looking at some of the things that can go wrong with Monte Carlo sampling (in particular, with the Hamiltonian Monte Carlo and No-U-Turn Sampling algorithms we'll be using).
To start, load the adolescent delinquency dataset we've looked at before:
```{r data}
d <- read.csv("https://soci620.netlify.app/data/delinquency.csv")
head(d)
```
## Trace plot
Create a quick logistic model predicting smoking with gender, and estimate it with only 150 iterations on 4 chains. We'll get some warnings about posterior sample size, but that's OK -- we are just interested in an example of a posterior traceplot. We'll force the chains to run on a single core (one after another) so we can watch the sampler's progress along the way.
```{r traceplot_fit, cache=TRUE}
m0 <- bf(ever_smoked ~ male, family=bernoulli)
f0 <- brm(m0,data=d,cores=1,chains=4,iter=150,warmup = 20)
```
Calling `plot()` on a `brms` fit object will give us traceplots for the parameters -- in this case just the intercept and the coefficient on `male`.
```{r traceplot1}
summary(f0)
plot(f0)
```
## Not enough observations
As a simple example of what can go wrong with sampling, let's pretend we only have a few observations with which to predict adolescent smoking. We'll use gender and grade as predictors, and restrict ourselves to a sample of just the first three respondents. We'll also leave our priors 'improper' and flat.
```{r tiny_sample, cache=TRUE}
tiny_d <- na.omit(d)[1:3,]
m1 <- bf(ever_smoked ~ male + grade, family=bernoulli)
f1 <- brm(m1,data=tiny_d,cores=4,chains=4,refresh=0)
```
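As an aside, we can inspect the priors `brms` fell back on here with `get_prior()`. Depending on your `brms` version, the coefficients' improper flat priors show up as blank entries or as `(flat)`:

```{r tiny_sample_default_priors}
# inspect the default priors brms uses for this model --
# the regression coefficients get improper flat priors
get_prior(m1, data = tiny_d)
```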
```{r tiny_sample_summary}
summary(f1)
```
Those 'Rhat' values (well above 1.0) show that something's wrong. So do the coefficient estimates. To see what's going on we can look at the traceplots.
```{r tiny_sample_traceplot}
plot(f1)
pairs(f1)
```
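If you'd rather have the convergence diagnostics as numbers than read them off the summary table, `brms` provides an `rhat()` method that returns the diagnostic for each parameter:

```{r tiny_sample_rhat}
# Rhat compares within-chain and between-chain variance;
# values much above 1.01 indicate the chains have not converged
rhat(f1)
```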
There are two things going on here. First, the coefficient on `male` is estimated wildly differently by each chain. This is because none of the respondents in our tiny sample are male, so the data contain no information about the coefficient and each value is just as likely as any other.
The other problem is more subtle. With only three data points to work with, our model finds some very extreme values of the `grade` coefficient to be perfectly plausible, as long as the intercept is adjusted to offset them. So grade could have a huge positive effect as long as the intercept is large and negative, or vice versa.
What if we add priors to our model?
```{r tiny_sample_prior, cache=TRUE}
m1 <- bf(ever_smoked ~ male + grade, family=bernoulli)
prior1 <- c(
prior(normal(0,1.5),class="Intercept"),
prior(normal(0,1.0),class="b")
)
f1prior <- brm(m1,prior=prior1,data=tiny_d,cores=4,chains=4,refresh=0)
```
```{r tiny_sample_withprior_summary}
summary(f1prior)
plot(f1prior)
```
Because our data lack any substantive information on gender or grade, those coefficients are effectively being sampled from the prior.
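We can confirm this by refitting the model while ignoring the likelihood entirely, using the `sample_prior = "only"` option in `brm()`. If the resulting 'posterior' looks like the one above, the data contributed essentially nothing:

```{r tiny_sample_prior_only, cache=TRUE}
# sample from the prior alone -- the data are ignored entirely
f1prioronly <- brm(m1, prior=prior1, data=tiny_d,
                   sample_prior="only",
                   cores=4, chains=4, refresh=0)
summary(f1prioronly)
```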
## Model misspecification
Estimation can also go wrong if your model is under-determined or otherwise misspecified. One obvious way this can happen is with collinear predictors. To see what this looks like, let's pretend we want to estimate the effect of being white or nonwhite on the probability of smoking.
```{r colinear, cache=TRUE}
d$race_nonwhite <- 1-d$race_white
m2 <- bf(ever_smoked ~ male + grade + race_white + race_nonwhite, family=bernoulli)
f2 <- brm(m2,data=d[1:400,],cores=4,chains=4,refresh=0)
```
```{r colinear_summary}
summary(f2)
```
```{r colinear_plot}
plot(f2)
plot(f2,variable=c('b_Intercept','b_race_white','b_race_nonwhite'))
plot(f2,variable=c('b_male','b_grade'))
```
Here, `race_white` plus `race_nonwhite` equals one for every respondent, so the intercept and the two race coefficients are acting like redundant intercepts: only their sums are identified, not their individual values. This means that (a) the two race coefficients will have nearly identical trace plots, and (b) the trace plot for the intercept will also be approximately the same, but mirrored around the 'true' intercept.
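We can check both claims directly: the two indicators always sum to one in the data, and the posterior draws for the two coefficients should be almost perfectly correlated (this uses `as_draws_df()`, available in recent versions of `brms`):

```{r colinear_check}
# the two race indicators always sum to one
all(d$race_white + d$race_nonwhite == 1, na.rm = TRUE)
# the posterior draws for the two race coefficients move together
draws <- as_draws_df(f2)
cor(draws$b_race_white, draws$b_race_nonwhite)
```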
## Proper convergence
For comparison, we can look at a model with a reasonable sample size and a well-specified set of predictors.
```{r full, cache=TRUE}
m3 <- bf(ever_smoked ~ male + grade + race_white, family=bernoulli)
f3 <- brm(m3,prior=prior1,data=d[1:400,],cores=4,chains=4,refresh=0)
```
```{r full_summary}
summary(f3)
```
```{r full_plot}
plot(f3)
plot(f3,variable=c('b_Intercept','b_race_white'))
plot(f3,variable=c('b_male','b_grade'))
```