---
title: "SOCI 620: Worksheet 1"
author: "your name here"
due: 2023-02-07
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
```
# Introduction
In this assignment, you will be looking at income inequality in a cross-national perspective. The data for the assignment is available from . This CSV combines data from the World Bank's [World Development Indicators database (accessed January 22, 2021)](https://databank.worldbank.org/reports.aspx?source=World-Development-Indicators) and R. Hensel's (2018) ["ICOW Colonial History Data Set, version 1.1."](http://www.paulhensel.org/icowcol.html). Variable descriptions are available at the end of this document.
# Part 1: Explore the data
**Question 1.1:** Load the data directly from the URL provided above into a data frame named `dev`.
```{r q1.1}
(your code here)
```
**Question 1.2:** Take a look at this data. How many rows and columns does it have? What does each row represent (i.e. what is the unit of observation)?
```{r q1.2}
(your code here)
```
(your response here)
**Question 1.3:** The Gini index is a measure of resource inequality in a population. A value of 0 would mean that each individual in the population has exactly the same amount of that resource, while a population in which one person had all of the resources would have a Gini value close to 100 (a Gini index of exactly 100 is impossible in a finite population). The variable `latest_gini` is the World Bank's most recent estimate of country-level income inequality.
Create a density plot for `latest_gini` and describe what you observe. Note: you will need to handle missing values in some way.
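To make the two extremes described above concrete, here is a minimal sketch that computes a Gini index for two toy populations. The helper `gini_index()` is a hypothetical name written for this illustration (it is not part of `rethinking` or the worksheet data), and it uses one common sorted-rank formula for the index. It only serves to show why the maximum in a finite population of size n is 100 × (n − 1) / n rather than 100.

```r
# Hypothetical helper (not from any package): Gini index on a 0-100 scale,
# computed with the sorted-rank formula for a vector of non-negative values.
gini_index <- function(x) {
  x <- sort(x)
  n <- length(x)
  100 * (2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n)
}

equal_shares <- rep(5, 10)        # ten people with identical income
one_has_all  <- c(rep(0, 9), 50)  # one person holds the entire income

gini_equal   <- gini_index(equal_shares)  # perfect equality gives 0
gini_extreme <- gini_index(one_has_all)   # 100 * (10 - 1) / 10, about 90
```

With ten people the most unequal distribution reaches about 90; as the population grows, this ceiling approaches but never reaches 100.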
```{r q1.3}
(your code here)
```
(your description here)
# Part 2: Build a Gaussian model
**Question 2.1:** Build a Gaussian model of income inequality that will estimate the average Gini index and the standard deviation in the Gini index across countries. This model should be stored in a variable named `model1`, using the `alist` notation for the `rethinking` package. Be sure to specify the full model, including all priors and any stochastic or deterministic relationships.
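As a reminder of what the `alist` notation looks like, here is a small sketch of a Gaussian model definition for a generic outcome. The variable name `y` and the prior values are placeholders chosen for illustration only; they are assumptions, not recommended priors for the Gini index. `alist()` is a base R function that stores each line unevaluated, which is why this runs even before any data are loaded.

```r
# Generic sketch: `y` and the prior values below are illustrative assumptions.
toy_model <- alist(
  y ~ dnorm(mu, sigma),  # likelihood: a stochastic relationship
  mu ~ dnorm(0, 10),     # prior for the mean
  sigma ~ dexp(1)        # prior for the standard deviation (positive only)
)

# Each element is stored as an unevaluated expression
n_statements <- length(toy_model)
```

When choosing your own priors, think about the scale of the outcome: a prior centered at 0 makes little sense for a quantity that lies between 0 and 100.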
```{r q2.1}
(your code here)
```
**Question 2.2:** Estimate `model1` using the `quap()` function from the `rethinking` package, storing the fit in a variable `fit1`. Describe the posterior distributions of the mean and standard deviation, including the mean of the marginal posteriors and 90% credible intervals. What do these mean in plain language? Note: you will again need to deal with missing values.
```{r q2.2}
```
# Part 3: Adding covariates
**Question 3.1:** The variable `colonized` contains an indicator of whether the country "was a dependency ruled by a foreign power before achieving independence" (quote from original codebook of Hensel, 2018). We want to see if there is a systematic difference in income inequality between former colonies and countries that were never colonized.
Adapt your model from Part 2 to create a regression model that allows for different average inequality in colonized versus never-colonized nations. Again, write out the full model, including all priors and any stochastic or deterministic relationships. Store this model in a variable named `model2`.
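The difference between a stochastic and a deterministic relationship shows up directly in the `alist` notation: `~` marks a distribution, while `<-` marks a quantity computed from other parameters. The sketch below uses a generic outcome `y` and a generic 0/1 predictor `x`; these names and all prior values are assumptions for illustration, and this is only one of several possible parameterizations (an index-variable approach is another).

```r
# Generic sketch: `y`, `x`, and the prior values are illustrative assumptions.
toy_regression <- alist(
  y ~ dnorm(mu, sigma),  # stochastic: the outcome varies around mu
  mu <- a + b * x,       # deterministic: mu is computed from a, b, and x
  a ~ dnorm(0, 10),      # prior for the average when x = 0
  b ~ dnorm(0, 5),       # prior for the difference when x = 1
  sigma ~ dexp(1)        # prior for the residual standard deviation
)
```

Under this parameterization, `a` is the average for the group with `x = 0` and `a + b` is the average for the group with `x = 1`.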
```{r q3.1}
```
**Question 3.2:** Estimate this model using `quap()`, storing the fitted model in a variable named `fit2`. Describe the posterior distributions of the regression coefficients, including the mean of the marginal posteriors and 90% credible intervals (again, you'll have to deal with missing values). What do these mean in plain language?
```{r q3.2}
```
**Question 3.3:** Use the `extract.samples()` function to draw 8,000 samples from the joint posterior of the regression model you just estimated (`fit2`). Store the sample in a variable named `samp2`. Use the sample to recreate the posterior mean and 90% credible intervals from the previous question. Do they match up?
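The sketch below shows one way to summarize a data frame of posterior draws with base R. Because no model is fit here, the draws are simulated with `rnorm()` as a stand-in for the output of `extract.samples()`; the column names `a` and `b` and every number are assumptions with no connection to the worksheet data. The 90% interval is taken as the 5th and 95th percentiles of each column.

```r
set.seed(620)

# Stand-in draws (assumption): simulated values, NOT a fitted posterior.
toy_samp <- data.frame(
  a = rnorm(8000, mean = 36, sd = 1.5),
  b = rnorm(8000, mean = 4, sd = 1.8)
)

# Posterior mean of each parameter
post_mean <- sapply(toy_samp, mean)

# 90% percentile interval of each parameter (rows: 5% and 95%)
post_ci <- sapply(toy_samp, quantile, probs = c(0.05, 0.95))
```

Summaries computed from a finite sample will differ slightly from the analytic quadratic approximation, so small discrepancies are expected.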
```{r q3.3}
```
**Question 3.4:** Use the same sample (`samp2`) to create a single figure comparing the posterior density of average inequality for the two types of countries (former colonies and non-colonized countries). Do these posterior densities tell the same story as the results from question 3.2?
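A useful property of posterior samples is that any quantity derived from the parameters can be computed draw by draw. The sketch below assumes a model of the form `mu <- a + b * x` with a 0/1 predictor, and again uses simulated stand-in draws rather than a fitted model; the names `a`, `b`, and all values are hypothetical. It overlays the two group-average densities with base R graphics.

```r
set.seed(620)

# Stand-in draws (assumption): simulated values, NOT a fitted posterior.
toy_samp <- data.frame(
  a = rnorm(8000, mean = 36, sd = 1.5),
  b = rnorm(8000, mean = 4, sd = 1.8)
)

# Group averages implied by mu = a + b * x, computed for every draw
mu_x0 <- toy_samp$a               # average when x = 0
mu_x1 <- toy_samp$a + toy_samp$b  # average when x = 1

d0 <- density(mu_x0)
d1 <- density(mu_x1)

# Overlay both densities on shared axes
plot(d0, xlim = range(d0$x, d1$x), ylim = c(0, max(d0$y, d1$y)),
     main = "Posterior of group averages (toy draws)", xlab = "Average outcome")
lines(d1, lty = 2)
legend("topright", legend = c("x = 0", "x = 1"), lty = c(1, 2))
```

Summing the columns within each draw, rather than adding the two posterior means, preserves any correlation between `a` and `b` in the joint posterior.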
```{r q3.4}
```
# Part 4: Expanding the model
**Question 4.1:** Create another regression model using a variable of your choice from the data set (see variable descriptions below). The new model should include both `colonized` and your new choice of variable as predictors, and be stored in a variable named `model3`. Estimate the model using `quap()`, storing the result as `fit3`, and describe the marginal posterior distributions of your parameters. What does this expanded model tell you that the previous model did not?
_Note: Several of the variables in the data set have missing values. You may need to exclude rows from the dataset that have a missing value (`NA`) in the covariate you choose._
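One way to exclude incomplete rows is `complete.cases()`, restricted to the columns the model actually uses. The sketch below works on a small made-up data frame; the column names `y`, `x1`, and `x2` and their values are hypothetical and stand in for your outcome and predictors.

```r
# Made-up data for illustration only
toy <- data.frame(
  y  = c(30.1, NA, 41.7, 52.3, 38.0),
  x1 = c(1, 0, 1, NA, 0),
  x2 = c(2.5, 1.9, NA, 3.1, 4.4)
)

# Keep rows that are complete in the columns the model uses (y and x1).
# The NA in x2 on row 3 does not matter because x2 is not in the model.
keep <- complete.cases(toy[, c("y", "x1")])
toy_sub <- toy[keep, ]

n_kept <- nrow(toy_sub)  # 3 of the 5 rows remain
```

Checking only the modeled columns avoids discarding rows because of missing values in variables you never use.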
```{r q4.1}
```
**Question 4.2:** Now use WAIC to compare how well `model3` fits the data relative to `model2` (which used only `colonized` as a predictor). Note: WAIC values are only comparable between models fit to the same observations, so if you dropped rows with missing values to fit `model3`, you will need to re-fit `model2` on that same subset. Store this new fit in a variable named `fit2_sub`. Which model has a 'better' WAIC? Which model do you think you would prefer for an analysis?
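To show what WAIC measures, here is a hand computation on simulated data using only base R. It is a sketch under strong assumptions: the outcome is simulated, and the "posterior draws" are a crude stand-in (the mean drawn around the sample mean, the standard deviation held fixed) rather than output from `quap()`. In the worksheet itself, the `WAIC()` and `compare()` functions in `rethinking` do this work for you.

```r
set.seed(620)

# Simulated outcome (assumption): not the worksheet data
y <- rnorm(50, mean = 40, sd = 8)

# Crude stand-in for posterior draws (assumption): not a fitted model
n_draws <- 2000
mu_draws <- rnorm(n_draws, mean = mean(y), sd = sd(y) / sqrt(length(y)))
sigma_draws <- rep(sd(y), n_draws)

# Pointwise log-likelihood: one row per draw, one column per observation
loglik <- sapply(y, function(yi) dnorm(yi, mu_draws, sigma_draws, log = TRUE))

# Log pointwise predictive density: average the likelihood over draws
lppd <- sum(log(colMeans(exp(loglik))))

# Effective number of parameters: variance of the log-likelihood over draws
p_waic <- sum(apply(loglik, 2, var))

# WAIC on the deviance scale (lower is better)
waic <- -2 * (lppd - p_waic)
```

Because WAIC is a sum over observations, two models fit to different sets of rows produce values that cannot be compared, which is why `model2` must be re-fit on the subset used for `model3`.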
```{r q4.2}
```
# Appendix: Data description
-------------------------------------------------------------------------------
Variable                         Description                      Source
-------------------------------- -------------------------------- -------------
`country_name`                   Name of country                  World Bank

`country_code`                   Three-letter country code        World Bank

`latest_gini`                    Most recent World Bank estimate  World Bank
                                 of income inequality since 2010
                                 (Gini index)

`latest_gini_year`               Year of Gini index estimate      World Bank

`population_2015`                Total population (2015)          World Bank

`population_growth_2015`         Percent population growth (2015) World Bank

`population_density_2015`        Population per sq. km (2015)     World Bank

`population_pct_urban_2015`      Percent urban pop. (2015)        World Bank

`population_pct_immigrant_2015`  Percent immigrant pop. (2015)    World Bank

`gdp_usd2010_2015`               GDP in 2010 USD (2015)           World Bank

`gdp_percap_usd2010_2015`        GDP per capita in 2010 USD       World Bank
                                 (2015)

`fertility_rate_2015`            Births per woman (2015)          World Bank

`independence_year`              Year of national independence    ICOW

`independece_from`               Country independence won from    ICOW

`independence_violent`           Violent independence indicator   ICOW

`colonized`                      Was colony before independence   ICOW

`primary_colonial_ruler`         Primary colonial ruler           ICOW
-------------------------------------------------------------------------------