---
title: "Problem Set 4"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
library(pander)
```
_Due Tue, Feb 15_
In this assignment you will investigate the fatality of police shootings for Black victims in the United States. You will use data collected by reporters at VICE News on police shootings in the 50 largest police departments in the US between 2010 and 2016 for the article [Shot by cops and forgotten](https://news.vice.com/en_us/article/xwvv3a/shot-by-cops). The data, along with some limited documentation, were made available on [VICE News' GitHub repository](https://github.com/vicenews/shot-by-cops).
# Q1: The data (*3 points*)
## 1A Download
> *First, download the data directly from GitHub (https://raw.githubusercontent.com/vicenews/shot-by-cops/master/incident_data.csv) and save it as a data frame.*
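
As a hint, `read.csv()` accepts a URL as well as a local path. The sketch below wraps it in a small helper so the same code can be pointed at a saved copy of the file. The helper name `load_incidents` is hypothetical, and the commented-out call needs an internet connection.

```r
# URL taken from the assignment text above.
incident_url <- "https://raw.githubusercontent.com/vicenews/shot-by-cops/master/incident_data.csv"

# Hypothetical helper: read the incident file from a URL or a local path.
# stringsAsFactors = FALSE keeps text columns as character vectors.
load_incidents <- function(src = incident_url) {
  read.csv(src, stringsAsFactors = FALSE)
}

# Usage (requires network access):
# d <- load_incidents()
# str(d)   # inspect column names and types before going further
```
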
## 1B Limit observations
> *To simplify the analysis, limit the observations to those with only one officer and one victim ('subject' in the variable names).*
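
One possible way to do the filtering is shown below on a toy data frame. The column names `NumberOfSubjects` and `NumberOfOfficers` are assumptions made for illustration, so check `names()` on the downloaded data and adjust as needed.

```r
# Toy data; the column names are assumptions, not confirmed from the real file.
toy <- data.frame(
  NumberOfSubjects = c(1, 1, 2, 1, NA),
  NumberOfOfficers = c("1", "2", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Count columns may be read as text if the file uses a code for unknown values,
# so convert before comparing. Entries that cannot be parsed become NA.
n_subj <- suppressWarnings(as.numeric(toy$NumberOfSubjects))
n_off  <- suppressWarnings(as.numeric(toy$NumberOfOfficers))

# which() drops rows where the comparison is NA instead of returning NA rows.
one_on_one <- toy[which(n_subj == 1 & n_off == 1), ]
nrow(one_on_one)
```
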
## 1C Create indicators
> *Create two indicator variables: one indicating whether the shooting was fatal for the victim, and one indicating whether the victim is Black. (Note: normally, you would include indicators for all but one of the racial categories, but we will create a simple dichotomous categorization here.) Be sure to look at all the values of the relevant variables in the downloaded data to make sure you properly code missing data (`table()` can be very useful for this).*
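
The sketch below illustrates the recoding pattern on toy data: tabulate first, then map unknown codes to `NA` rather than to 0. The column names (`Fatal`, `SubjectRace`) and the codes (`"F"`, `"N"`, `"U"`, `"B"`, and so on) are assumptions for illustration, and `make_indicator` is a hypothetical helper. Use `table()` on the real data to see which values actually occur.

```r
# Toy data; column names and value codes are assumptions for illustration.
toy <- data.frame(
  Fatal       = c("F", "N", "U", "F", "N", ""),
  SubjectRace = c("B", "W", "B", "U", "L", "B"),
  stringsAsFactors = FALSE
)

# Look at every value, including missing ones, before recoding.
table(toy$Fatal, useNA = "ifany")
table(toy$SubjectRace, useNA = "ifany")

# Hypothetical helper: 1 if x equals `yes`, 0 for any other known value,
# NA if x is NA or one of the codes listed as missing.
make_indicator <- function(x, yes, missing_codes) {
  out <- ifelse(x == yes, 1L, 0L)
  out[is.na(x) | x %in% missing_codes] <- NA
  out
}

toy$fatal <- make_indicator(toy$Fatal, yes = "F", missing_codes = c("U", ""))
toy$black <- make_indicator(toy$SubjectRace, yes = "B", missing_codes = c("U", ""))

# Cross-check the recode against the original values.
table(toy$Fatal, toy$fatal, useNA = "ifany")
```
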
# Q2: Simple logistic regression (*6 points*)
> *Create, specify, and estimate a logistic regression model predicting fatality in a police shooting by whether the victim is Black. According to this model, what is the estimated probability that a non-Black victim will die after being shot by the police? What is the 90% credible interval on that probability? What is the estimated probability and 90% credible interval for Black victims? Discuss the results.*
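
A note on moving from the logit scale to probabilities: transform each posterior draw with the inverse logit first, then summarise. The sketch below uses made-up draws in place of real posterior samples (which could come from something like `extract.samples()` in `rethinking`), and assumes a model of the form logit(p) = a + b * black. The names `a`, `b`, and `summarise_prob` are hypothetical.

```r
set.seed(1)
# Stand-in posterior draws; the values are made up, not estimates from the data.
post <- data.frame(a = rnorm(4000, -0.5, 0.1), b = rnorm(4000, 0.2, 0.15))

# Hypothetical helper: posterior mean and percentile interval of a probability,
# given draws of the linear predictor on the logit scale.
summarise_prob <- function(logit_draws, prob = 0.90) {
  p <- plogis(logit_draws)          # inverse logit, applied draw by draw
  tail <- (1 - prob) / 2
  c(mean  = mean(p),
    lower = unname(quantile(p, tail)),
    upper = unname(quantile(p, 1 - tail)))
}

summarise_prob(post$a)            # non-Black victims (black = 0)
summarise_prob(post$a + post$b)   # Black victims (black = 1)
```
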
# Q3: Armed victims (*7 points*)
> *Curious to investigate the dynamics of the disparity you observed in this simple model, you decide to include a moderator variable for whether the victim was armed. To do so, create, specify, and estimate a model that predicts fatality in police shootings based on whether the victim was Black, whether the victim was armed, and the interaction between these variables. You will need to create an indicator variable for whether the victim was armed. Report the expected probability of death (with credible intervals) for unarmed non-Black, unarmed Black, armed non-Black, and armed Black victims. What do you conclude about the moderating effect of being armed?*
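
With an interaction, each of the four groups has its own linear predictor. The sketch below assumes a parameterisation logit(p) = a + bB * black + bA * armed + bBA * black * armed, and uses made-up draws in place of real posterior samples. The parameter names are hypothetical, and an index-variable parameterisation would work equally well.

```r
set.seed(2)
# Made-up draws standing in for posterior samples.
post <- data.frame(
  a   = rnorm(4000, -0.8, 0.2),
  bB  = rnorm(4000,  0.1, 0.25),
  bA  = rnorm(4000,  0.5, 0.2),
  bBA = rnorm(4000,  0.0, 0.3)
)

# All four combinations of the two indicators.
cells <- expand.grid(black = 0:1, armed = 0:1)

# For each combination: build the linear predictor, transform, summarise.
cell_summary <- t(apply(cells, 1, function(x) {
  black <- x[["black"]]
  armed <- x[["armed"]]
  eta <- post$a + post$bB * black + post$bA * armed + post$bBA * black * armed
  p <- plogis(eta)
  c(mean = mean(p), quantile(p, c(0.05, 0.95)))
}))
cbind(cells, round(cell_summary, 3))

# Black vs non-Black gap within each armed status, computed draw by draw,
# and the difference between the two gaps.
gap_unarmed <- plogis(post$a + post$bB) - plogis(post$a)
gap_armed   <- plogis(post$a + post$bB + post$bA + post$bBA) - plogis(post$a + post$bA)
quantile(gap_armed - gap_unarmed, c(0.05, 0.5, 0.95))
```

Summarising the difference between the two gaps on the probability scale is one way to frame the moderation question, since a coefficient on the logit scale does not translate directly into a change in probability.
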
# Q4: Comparing models (*4 points*)
> *Finally, you want to look more closely at the role of the interaction term you included in the previous model. Specify and estimate a final model that includes the same predictor variables as before (whether the victim was Black and whether the victim was armed), but do **not** include the interaction term. Use the Widely Applicable Information Criterion (WAIC) to discuss the relative predictive power of the models with and without the interaction. (Note: you will need to include the extra argument `log_lik=TRUE` in your `ulam()` command for any models you want to calculate the WAIC for; you can add it to the code for your previous responses.) What do you conclude about the interaction term? Should you include it in the model? Why or why not?*
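
To make clear what WAIC measures, the sketch below computes it by hand from a matrix of pointwise log-likelihoods, using the standard definition WAIC = -2 * (lppd - pWAIC). The helper name `waic_from_loglik` is hypothetical, and the data and "posterior" draws are simulated for illustration only.

```r
# Hypothetical helper: WAIC from an S x N matrix of pointwise log-likelihoods
# (S posterior draws in rows, N observations in columns).
waic_from_loglik <- function(ll) {
  # Log pointwise predictive density: log of the mean likelihood for each
  # observation, using the log-sum-exp trick for numerical stability.
  lppd <- apply(ll, 2, function(x) {
    m <- max(x)
    m + log(mean(exp(x - m)))
  })
  # Penalty term: variance of the log-likelihood across draws, per observation.
  p_waic <- apply(ll, 2, var)
  c(WAIC = -2 * (sum(lppd) - sum(p_waic)), lppd = sum(lppd), pWAIC = sum(p_waic))
}

# Toy check with simulated outcomes and made-up draws of a single probability.
set.seed(3)
y <- rbinom(50, 1, 0.4)
p_draws <- plogis(rnorm(1000, qlogis(0.4), 0.2))
ll <- sapply(y, function(yi) dbinom(yi, 1, p_draws, log = TRUE))  # 1000 x 50
waic_from_loglik(ll)
```

In practice the `rethinking` functions `WAIC()` and `compare()` can do this for `ulam()` fits estimated with `log_lik=TRUE`. When comparing two models, look at the difference in WAIC relative to its standard error rather than at the raw values alone.
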