---
title: "SOCI 620: Lab 13"
author: "Peter McMahan"
date: "2/24/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
library(pander)
# this will tell brms to use as many cores as your computer has
options(mc.cores = parallel::detectCores())
```
# Problems with sampling
We'll be looking at some of the things that can go wrong with Monte-Carlo sampling (in particular, with HMC and the No U-Turn Sampling we'll be using).
To start, load the adolescent delinquency dataset we've looked at before:
```{r data}
d <- read.csv("https://soci620.netlify.app/data/delinquency.csv")
head(d)
```
## Trace plot
```{r traceplot_fit, cache=TRUE}
m0 <- bf(ever_smoked ~ male, family=bernoulli)
f0 <- brm(m0,data=d,cores=4,chains=4,iter=150)
```
```{r traceplot1}
summary(f0)
plot(f0)
```
## Not enough observations
As a simple example of what can go wrong with sampling, let's pretend we only have a sample of a few people but we want to predict adolescent smoking. We'll use gender and grade as predictors, and restrict ourselves to an sample of just the first three respondents. We'll also leave our priors 'improper' and flat.
```{r tiny_sample, cache=TRUE}
tiny_d <- d[1:3,]
m1 <- bf(ever_smoked ~ male + grade, family=bernoulli)
f1 <- brm(m1,data=tiny_d,cores=4,chains=4,refresh=0)
```
```{r tiny_sample_summary}
summary(f1)
```
Those 'Rhat' values (well above 1.0) show that something's wrong. So do the coefficient estimates. To see what's going on we can look at the traceplots.
```{r tiny_sample_traceplot}
plot(f1)
```
```{r tiny_sample_traceplot2}
plot(f1,variable='b_male')
```
What if we add a prior?
```{r tiny_sample_prior, cache=TRUE}
m1 <- bf(ever_smoked ~ male + grade, family=bernoulli)
prior1 <- c(
prior(normal(0,1.5),class="Intercept"),
prior(normal(0,1.0),class="b")
)
f1prior <- brm(m1,prior=prior1,data=tiny_d,cores=4,chains=4,refresh=0)
```
```{r tiny_sample_withprior_summary}
summary(f1prior)
plot(f1prior)
```
Because our data is lacking any substanive information on gender or grade, those coefficients are being sampled from the prior.
## Model misspecification
Estimation can also go wrong if your model is under-determined or otherwise misspecified. One obvious way that this can happen is with colinear predictors. To see what this looks like, let's pretend that we want
```{r colinear, cache=TRUE}
d$race_nonwhite <- 1-d$race_white
m2 <- bf(ever_smoked ~ male + grade + race_white + race_nonwhite, family=bernoulli)
f2 <- brm(m2,data=d[1:400,],cores=4,chains=4,refresh=0)
```
```{r colinear_summary}
summary(f2)
```
```{r colinear_plot}
plot(f2)
plot(f2,variable=c('b_Intercept','b_race_white','b_race_nonwhite'))
plot(f2,variable=c('b_male','b_grade'))
```
## Proper convergence
For comparison, we can look at the model with a reasonable sample size and a well-specified model.
```{r full, cache=TRUE}
m3 <- bf(ever_smoked ~ male + grade + race_white, family=bernoulli)
f3 <- brm(m3,prior=prior1,data=d[1:400,],cores=4,chains=4,refresh=0)
```
```{r colinear_summary}
summary(f3)
```
```{r colinear_plot}
plot(f3)
plot(f3,variable=c('b_Intercept','b_race_white'))
plot(f3,variable=c('b_male','b_grade'))
```