Assignment 2, SOCI 620, Winter 2021

Due Thu, Feb 4

In this assignment, you will look at income inequality from a cross-national perspective. The data for the assignment are available at https://soci620.netlify.com/data/dev_and_col.csv.

  0. Load the data into R using the following commands. The data contain a number of missing values, which we will ignore for the moment. The second line of the code below removes any rows with a missing value in either the latest_gini or the colonized column. (Descriptions of the variables and the data sources are listed at the bottom of this document.)
d <- read.csv('https://soci620.netlify.com/data/dev_and_col.csv')
d <- d[!is.na(d$latest_gini) & !is.na(d$colonized),]
  1. The Gini index is a measure of resource inequality in a population. A value of 0 means that every individual in the population has exactly the same amount of the resource, while a population in which one person holds all of the resource would have a Gini value close to 100 (a Gini index of exactly 100 is impossible in a finite population). The variable latest_gini is the World Bank’s most recent estimate of country-level income inequality.

    Create a density plot for latest_gini and describe what you observe.

    (1 point)
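The two extremes described in item 1 can be checked on a toy population. The sketch below is only an illustration: gini_index is a hypothetical helper written for this example (it is not part of the data set or of any package), and it uses one common sorted-values formula for the Gini index, rescaled to 0-100.

```r
# Gini index (0-100 scale) for a vector of non-negative resource amounts.
# gini_index is an illustrative helper, not a function from any package.
gini_index <- function(x) {
  x <- sort(x)                     # order individuals from poorest to richest
  n <- length(x)
  g <- 2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
  100 * g
}

equal_shares <- rep(5, 4)          # four people, same amount each
one_has_all  <- c(0, 0, 0, 10)     # one person holds everything

gini_index(equal_shares)           # 0
gini_index(one_has_all)            # 75, i.e. 100 * (1 - 1/4)
gini_index(c(rep(0, 999), 10))     # 99.9, closer to 100 as n grows
```

With n people and one person holding everything, this formula gives 100 * (1 - 1/n), which is why the index approaches but never reaches 100 in a finite population.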

  2. Build a Gaussian model of income inequality that will estimate the average Gini index and the standard deviation in the Gini index across countries. Write out the full model, including all priors and any stochastic or deterministic relationships.

    Then, estimate the model using the quap() function from the rethinking package. Describe the posterior distributions of the mean and standard deviation, including the mean of the marginal posteriors and 90% credible intervals. What do these mean in plain language?

    (2 points)
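As a notational template only, a Gaussian model with its priors might be laid out as follows. The prior families shown here and the symbols m_0, s_0, and u_0 are placeholders assumed for illustration; the assignment leaves the choice of priors and their values to you.

```latex
\begin{aligned}
y_i    &\sim \mathrm{Normal}(\mu, \sigma) && \text{(stochastic: likelihood for country } i\text{)} \\
\mu    &\sim \mathrm{Normal}(m_0, s_0)    && \text{(prior on the mean)} \\
\sigma &\sim \mathrm{Uniform}(0, u_0)     && \text{(prior on the standard deviation)}
\end{aligned}
```

Each line that uses the symbol for "is distributed as" is a stochastic relationship. This model has no deterministic relationship yet; one appears once the mean is written as a function of a predictor.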

  3. The variable colonized contains an indicator of whether the country “was a dependency ruled by a foreign power before achieving independence” (quote from original codebook of Hensel, 2018). We want to see if there is a systematic difference in income inequality between former colonies and countries that were never colonized.

    Adapt your model from the last question to create a regression model allowing for different average inequalities for colonized versus never-colonized nations. Again, write out the full model, including all priors and any stochastic or deterministic relationships.

    Estimate this model using quap(). Describe the posterior distributions of the regression coefficients, including the mean of the marginal posteriors and 90% credible intervals. What do these mean in plain language?

    (2 points)
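To build intuition for what the coefficients of a regression on a 0/1 indicator represent, the sketch below uses simulated data and ordinary least squares via lm(). This is a deliberately swapped-in technique, not the quap() model the question asks for, and the variables group and y are hypothetical stand-ins rather than columns of the assignment data.

```r
set.seed(620)

# Simulated stand-in data: 'group' is a hypothetical 0/1 indicator,
# 'y' a hypothetical outcome. Neither comes from dev_and_col.csv.
group <- rep(c(0, 1), times = c(30, 50))
y <- rnorm(80, mean = 35 + 5 * group, sd = 7)

fit <- lm(y ~ group)   # least squares, used only to show the algebra
b <- coef(fit)

mean_0 <- mean(y[group == 0])
mean_1 <- mean(y[group == 1])

# The intercept equals the mean of the group coded 0
c(intercept = unname(b[1]), mean_group_0 = mean_0)

# The slope equals the difference between the two group means
c(slope = unname(b[2]), difference = mean_1 - mean_0)
```

In a Bayesian version of the same model the priors pull the estimates slightly away from these sample quantities, but the interpretation of the intercept and the indicator coefficient carries over.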

  4. Use the extract.samples() function to draw a few thousand samples from the joint posterior of the regression model you just estimated. Use the sample to recreate the posterior mean and 90% credible intervals from the previous question. Do they match up?

    Use the same sample to create a single figure comparing the posterior density of average inequality for the two types of countries (former colonies and non-colonized countries). Do these posterior densities tell the same story as the results from the previous question?

    (2 points)
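The mechanics of summarizing posterior samples can be sketched in base R. In the example below, post is a simulated stand-in for the data frame that extract.samples() returns for a quap() fit (one column per parameter, one row per draw). The column names a and b and all numeric values are hypothetical, and the columns are drawn independently here, whereas real posterior draws are usually correlated.

```r
set.seed(620)

# Stand-in for extract.samples() output; names and values are hypothetical.
post <- data.frame(
  a = rnorm(4000, mean = 33, sd = 1.2),   # intercept draws
  b = rnorm(4000, mean = 6,  sd = 1.5)    # indicator coefficient draws
)

# Posterior mean and 90% interval (5th and 95th percentiles) per parameter
summ <- t(sapply(post, function(s) c(mean = mean(s), quantile(s, c(0.05, 0.95)))))
summ

# Implied group averages, computed draw by draw
mu_0 <- post$a            # indicator = 0
mu_1 <- post$a + post$b   # indicator = 1

# Overlay the two posterior densities in a single figure
d0 <- density(mu_0)
d1 <- density(mu_1)
plot(d0, xlim = range(d0$x, d1$x), ylim = range(0, d0$y, d1$y),
     main = "", xlab = "average outcome")
lines(d1, lty = 2)
legend("topright", legend = c("indicator = 0", "indicator = 1"), lty = c(1, 2))
```

Computing mu_1 row by row, rather than adding two separate summaries, is what preserves any correlation between the parameters when the draws come from a real joint posterior.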

  5. Create another regression model using a variable of your choice from the data set (see variable descriptions below). The new model should include both colonized and your new choice of variable as predictors. Estimate the model using quap(), and describe the marginal posterior distributions of your parameters. What does this expanded model tell you that the previous model did not?

    (3 points)

    Note: Several of the variables in the data set have missing values. You may need to exclude rows from the dataset that have a missing value (NA) in the covariate you choose. See step (0) above for one way to do this.
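The filtering pattern from the loading step extends directly to an additional covariate. The sketch below uses a toy data frame whose column names are hypothetical; substitute the columns you actually use.

```r
# Toy data frame; column names are hypothetical, not those of dev_and_col.csv.
toy <- data.frame(
  outcome   = c(30, 42, NA, 51, 38),
  indicator = c(1, 0, 1, NA, 1),
  covariate = c(2.1, NA, 3.3, 1.8, 4.0)
)

# Same pattern as the loading step, extended to a third column
keep <- !is.na(toy$outcome) & !is.na(toy$indicator) & !is.na(toy$covariate)
toy_a <- toy[keep, ]

# Equivalent base R shortcut: keep rows that are complete in the listed columns
toy_b <- toy[complete.cases(toy[, c("outcome", "indicator", "covariate")]), ]

nrow(toy_a)               # 2
identical(toy_a, toy_b)   # TRUE
```

Restricting complete.cases() to the columns in the model avoids dropping rows that are missing only in variables you do not use.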

Data description

| Variable | Description | Source |
|---|---|---|
| country_name | Name of country | World Bank |
| country_code | Three-letter country code | World Bank |
| latest_gini | Most recent World Bank estimate of income inequality since 2010 (Gini index) | World Bank |
| latest_gini_year | Year of Gini index estimate | World Bank |
| population_2015 | Total population (2015) | World Bank |
| population_growth_2015 | Percent population growth (2015) | World Bank |
| population_density_2015 | Population per sq. km (2015) | World Bank |
| population_pct_urban_2015 | Percent urban population (2015) | World Bank |
| population_pct_immigrant_2015 | Percent immigrant population (2015) | World Bank |
| gdp_usd2010_2015 | GDP in 2010 USD (2015) | World Bank |
| gdp_percap_usd2010_2015 | GDP per capita in 2010 USD (2015) | World Bank |
| fertility_rate_2015 | Births per woman (2015) | World Bank |
| independence_year | Year of national independence | ICOW |
| independece_from | Country from which independence was won | ICOW |
| independence_violent | Violent independence indicator | ICOW |
| colonized | Was a colony before independence | ICOW |
| primary_colonial_ruler | Primary colonial ruler | ICOW |

World Bank data from World Development Indicators database (accessed January 22, 2021).

ICOW data from Paul R. Hensel (2018). “ICOW Colonial History Data Set, version 1.1.”