---
title: "Assignment 2, SOCI 620, Winter 2022"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
```
_Due Tues, Feb 1_
In this assignment, you will look at income inequality from a cross-national perspective. The data for this assignment can be read directly from the URL given in step 0 below.
0. Load the data into R using the following commands. These data contain a number of missing values, which we will ignore for the moment. The second line in the code below removes any rows with missing values in the `latest_gini` or `colonized` columns. (A description of the variables and the data sources is given at the bottom of this document.)
```{r data_load}
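# Read the country-level data set from the web
d <- read.csv('https://soci620.netlify.app/data/dev_and_col.csv')
# Keep only rows where both the Gini index and the colonial indicator are observed
d <- d[!is.na(d$latest_gini) & !is.na(d$colonized),]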
```
1. The Gini index is a measure of resource inequality in a population. A value of 0 would mean that every individual in the population has exactly the same amount of the resource, while a population in which one person had all of the resources would have a Gini value close to 100 (a Gini index of exactly 100 is impossible in a finite population). The variable `latest_gini` is the World Bank's most recent estimate of country-level income inequality.
Create a density plot for `latest_gini` and describe what you observe.
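
To make the endpoints of the scale concrete, here is a minimal sketch of one common formula for the Gini index: half the mean absolute difference between all pairs of individuals, divided by the mean, rescaled to run from 0 to 100. This is an illustration under that assumption, not the World Bank's procedure; its published estimates are built from household survey data and may be computed differently. The function name `gini` is hypothetical and is not part of `rethinking` or base R.

```r
# Illustrative only: a simple population Gini index on a 0-100 scale,
# computed as half the mean absolute pairwise difference divided by the mean.
gini <- function(x) {
  n <- length(x)
  # outer() builds the n x n matrix of pairwise differences x_i - x_j
  100 * sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

gini(c(10, 10, 10, 10))   # perfect equality: returns 0
gini(c(0, 0, 0, 100))     # one person has everything, n = 4: returns 75
gini(c(rep(0, 999), 1))   # one person has everything, n = 1000: about 99.9
```

When one person holds everything, this formula gives (n - 1)/n * 100, which approaches 100 as n grows but never reaches it; that is why an index of exactly 100 is impossible in a finite population.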
2. Build a Gaussian model of income inequality that will estimate the average Gini index and the standard deviation in the Gini index across countries. Write out the full model, including all priors and any stochastic or deterministic relationships.
Then, estimate the model using the `quap()` function from the `rethinking` package. Describe the posterior distributions of the mean and standard deviation, including the mean of the marginal posteriors and 90% credible intervals. What do these mean in plain language?
3. The variable `colonized` contains an indicator of whether the country "was a dependency ruled by a foreign power before achieving independence" (quoted from the original codebook; Hensel, 2018). We want to see whether there is a systematic difference in income inequality between former colonies and countries that were never colonized.
Adapt your model from the last question to create a regression model that allows average inequality to differ between colonized and never-colonized nations. Again, write out the full model, including all priors and any stochastic or deterministic relationships.
Estimate this model using `quap()`. Describe the posterior distributions of the regression coefficients, including the mean of the marginal posteriors and 90% credible intervals. What do these mean in plain language?
4. Use the `extract.samples()` function to draw a few thousand samples from the joint posterior of the regression model you just estimated. Use these samples to recreate the posterior means and 90% credible intervals from the previous question. Do they match up?
Use the same samples to create a single figure comparing the posterior density of average inequality for the two types of countries (former colonies and non-colonized countries). Do these posterior densities tell the same story as the results from the previous question?
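
Summaries of a marginal posterior can be computed directly from a column of posterior samples: the sample mean approximates the posterior mean, and the 5th and 95th percentiles bound a central 90% interval (the `rethinking` function `PI()` computes the same percentile interval). The sketch below uses simulated normal draws as hypothetical stand-ins for two columns of `extract.samples()` output; the means and standard deviations are made up, and "Group A" and "Group B" are placeholder labels. It then overlays both densities in one figure with base R's `density()`, `plot()`, and `lines()`.

```r
# Hypothetical stand-ins for two quantities computed from extract.samples().
# These are simulated normal draws, NOT real posterior samples.
set.seed(620)
draws_a <- rnorm(1e4, mean = 38, sd = 1)
draws_b <- rnorm(1e4, mean = 35, sd = 1.5)

# Posterior mean and central 90% interval for one quantity
post_mean <- mean(draws_a)
ci90 <- quantile(draws_a, probs = c(0.05, 0.95))
post_mean
ci90

# Overlay both posterior densities in a single figure
dens_a <- density(draws_a)
dens_b <- density(draws_b)
plot(dens_a,
     xlim = range(dens_a$x, dens_b$x),
     ylim = c(0, max(dens_a$y, dens_b$y)),
     main = "Posterior densities (simulated)", xlab = "Value")
lines(dens_b, lty = 2)
legend("topright", legend = c("Group A", "Group B"), lty = c(1, 2))
```

Setting `xlim` and `ylim` from both density estimates keeps either curve from being clipped when the second one is added with `lines()`.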
5. Create another regression model using a variable of your choice from the data set (see variable descriptions below). The new model should include both `colonized` and your new choice of variable as predictors. Estimate the model using `quap()`, and describe the marginal posterior distributions of your parameters. What does this expanded model tell you that the previous model did not?
_Note: Several of the variables in the data set have missing values. You may need to exclude rows from the data set that have a missing value (`NA`) in the covariate you choose. See step (0) above for one way to do this._
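
Step 0 drops rows with missing values by combining one `!is.na()` test per column. When several columns are involved, base R's `complete.cases()` does the same job in a single call. The sketch below uses a small made-up data frame (`toy`) in place of `d`, with `fertility_rate_2015` standing in for whichever covariate you choose.

```r
# Toy data frame standing in for `d` (all values are made up)
toy <- data.frame(
  latest_gini         = c(35, NA, 42, 30),
  colonized           = c(1, 0, NA, 0),
  fertility_rate_2015 = c(2.1, 1.8, 3.0, NA)
)

# Columns used in the model: outcome, colonial indicator, chosen covariate
vars <- c("latest_gini", "colonized", "fertility_rate_2015")

# Keep only rows with no NA in any of those columns
toy_cc <- toy[complete.cases(toy[, vars]), ]
nrow(toy_cc)  # returns 1: only the first row is fully observed
```

With the real data, the same pattern would be `d[complete.cases(d[, vars]), ]`, with `vars` naming the columns in your model.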
### Data description ###
-------------------------------------------------------------------------------
Variable                         Description                      Source
-------------------------------- -------------------------------- -------------
`country_name`                   Name of country                  World Bank

`country_code`                   Three-letter country code        World Bank

`latest_gini`                    Most recent World Bank estimate  World Bank
                                 of income inequality since 2010
                                 (Gini index)

`latest_gini_year`               Year of Gini index estimate      World Bank

`population_2015`                Total population (2015)          World Bank

`population_growth_2015`         Percent population growth (2015) World Bank

`population_density_2015`        Population per sq. km (2015)     World Bank

`population_pct_urban_2015`      Percent urban pop. (2015)        World Bank

`population_pct_immigrant_2015`  Percent immigrant pop. (2015)    World Bank

`gdp_usd2010_2015`               GDP in 2010 USD (2015)           World Bank

`gdp_percap_usd2010_2015`        GDP per capita in 2010 USD       World Bank
                                 (2015)

`fertility_rate_2015`            Births per woman (2015)          World Bank

`independence_year`              Year of national independence    ICOW

`independece_from`               Country independence won from    ICOW

`independence_violent`           Violent independence indicator   ICOW

`colonized`                      Was colony before independence   ICOW

`primary_colonial_ruler`         Primary colonial ruler           ICOW
-------------------------------------------------------------------------------
[World Bank data from World Development Indicators database (accessed January 22, 2021).](https://databank.worldbank.org/reports.aspx?source=World-Development-Indicators)
[ICOW data from R. Hensel (2018). "ICOW Colonial History Data Set, version 1.1."](http://www.paulhensel.org/icowcol.html)