---
title: "SOCI 620: Lab 7"
author: "Peter McMahan"
date: "2/1/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
```
# Calculating deviance and information criteria
Calculating information criteria in R is simple. AIC, BIC, DIC, WAIC, and others are included in the base installation of R.
To see how it is done, we will estimate some models using the US income data.
```{r get_income_data}
d <- read.csv("https://soci620.netlify.com/data/USincome_subset_2016.csv")
# standardize age and prestige
d$age.s <-scale(d$age)
d$occ_income.s <- scale(d$occ_income)
head(d)
```
We will predict log income using standardized age, and occupational prestige. But first, we need to do somethign about the cases with missing data. We'll talk about other strategies for dealing with missing data later in the course, but for now we'll just drop them. The `complete.cases()` function is useful for this
```{r complete_cases}
# drop incomplete cases
comp <- complete.cases(d[c('log_income','age.s','occ_income.s')])
summary(comp)
d <- d[comp,]
```
```{r fit1}
fit1 <- quap(alist(
log_income ~ dnorm(mu,sigma),
mu <- a + b_age*age.s,
a ~ dnorm(9,3),
b_age ~ dnorm(0,3),
sigma ~ dunif(0,3)
),data=d)
precis(fit1)
```
```{r fit2}
fit2 <- quap(alist(
log_income ~ dnorm(mu,sigma),
mu <- a + b_occ*occ_income.s,
a ~ dnorm(9,3),
b_occ ~ dnorm(0,3),
sigma ~ dunif(0,3)
),data=d)
precis(fit2)
```
```{r fit12}
fit12 <- quap(alist(
log_income ~ dnorm(mu,sigma),
mu <- a + b_age*age.s + b_occ*occ_income.s,
a ~ dnorm(9,3),
b_occ ~ dnorm(0,3),
b_age ~ dnorm(0,3),
sigma ~ dunif(0,3)
),data=d)
precis(fit12)
```
Look at the deviance for all three models.
```{r deviance_three_times}
deviance(fit1)
deviance(fit2)
deviance(fit12)
```
Even though the coefficients for age and occupational prestige are almost identical, the latter provides a much better fit with the data.
Next we will compute the AIC and WAIC for the three models.
```{r aic_three_times}
AIC(fit1)
AIC(fit2)
AIC(fit12)
```
```{r waic_three_times}
WAIC(fit1)
WAIC(fit2)
WAIC(fit12)
```
Notice that in this case, the two measures are very similar. This is because we have relatively uninformative priors, many observations, and our posterior is approximately normal.
We can also use the `compare` function to easily compare multiple WAIC values.
```{r waic_compared}
compare(fit1,fit2,fit12)
```
The model with both predictors is the unambiguous winner.
We can use this method to see which is a better predictor of income, college education or high-school education (controlling for age, occupational prestige, and gender).
```{r completer_cases}
# get complete cases across all of the variables in the data
d_c <- d[complete.cases(d),]
dim(d_c)
```
```{r fit_a_b}
fit_a <- quap(alist(
log_income ~ dnorm(mu,sigma),
mu <- a + b_age*age.s + b_occ*occ_income.s + b_f*female + b_col*ed_college,
a ~ dnorm(9,3),
b_occ ~ dnorm(0,3),
b_age ~ dnorm(0,3),
b_f ~ dnorm(0,3),
b_col ~ dnorm(0,3),
sigma ~ dunif(0,3)
),data=d_c)
fit_b <- quap(alist(
log_income ~ dnorm(mu,sigma),
mu <- a + b_age*age.s + b_occ*occ_income.s + b_f*female + b_hs*ed_highschool,
a ~ dnorm(9,3),
b_occ ~ dnorm(0,3),
b_age ~ dnorm(0,3),
b_f ~ dnorm(0,3),
b_hs ~ dnorm(0,3),
sigma ~ dunif(0,3)
),data=d_c)
compare(fit_a,fit_b)
```