Examining fit

This lab will introduce two ways of inspecting the "fit" of a model: posterior mean plots and posterior predictive plots.

Start by loading the data on 2016 annual income and taking a sample of 150 individuals. Using fewer observations makes our posterior distributions less precise, so the uncertainty is easier to see in the plots that follow.
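The subsampling step can be sketched as follows. The lab itself uses R, and the real income file is not reproduced here. The `population` rows below are simulated, hypothetical stand-ins for the 2016 data. They are not the lab's data.

```python
import math
import random

rng = random.Random(2016)

# Hypothetical stand-in for the 2016 income data: simulated (age, income) pairs.
population = [(rng.randint(18, 80), rng.lognormvariate(10.5, 0.8)) for _ in range(5000)]

# Simple random sample of 150 individuals, drawn without replacement.
sample = rng.sample(population, 150)

# The model works on log-income, so transform the outcome once here.
ages = [age for age, _ in sample]
log_income = [math.log(income) for _, income in sample]
```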

We can start by building a simple model that predicts a person's log-income using just their age.
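Written out, a model of this kind might look like the following. The Gaussian likelihood and the standard deviation $\sigma$ are assumptions of this sketch. The priors used in the lab are not shown.

$$
\begin{aligned}
\log(\text{income}_i) &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= a + b_1 \, \text{age}_i
\end{aligned}
$$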

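If the estimate for age (b1) seems small, that is because age is measured in years. b1 is the expected change in log-income for a single additional year of age, which is a small step compared with the wide range of ages in the data.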

Let's re-estimate the model using standardized age, that is, age minus its mean, divided by its standard deviation.
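Standardizing changes the units of the slope without changing the relationship it describes. The sketch below shows this in Python (the lab itself uses R) on simulated, hypothetical data. It uses an ordinary least-squares slope as a stand-in for the posterior point estimate, which is an assumption; with weak priors the two are usually close. The slope per standard deviation of age equals the slope per year multiplied by sd(age).

```python
import random
import statistics

def standardize(xs):
    """Rescale values to mean 0 and sample standard deviation 1."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

def ols_slope(x, y):
    """Least-squares slope of y on x (stand-in for a posterior point estimate)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

rng = random.Random(3)
ages = [rng.randint(18, 80) for _ in range(150)]                  # hypothetical ages in years
log_income = [9.5 + 0.01 * a + rng.gauss(0, 0.8) for a in ages]   # hypothetical outcome

age_z = standardize(ages)
slope_per_year = ols_slope(ages, log_income)
slope_per_sd = ols_slope(age_z, log_income)

# Same relationship in different units: per-SD slope = per-year slope * sd(age).
print(slope_per_year, slope_per_sd, slope_per_year * statistics.stdev(ages))
```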

We can start to look at the predictions of our model by plotting our "point estimate" of $\mu$ as a function of standardized age.
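Here is a minimal Python sketch of that point-estimate line (the lab itself uses R). The values `a_map` and `b1_map` are hypothetical placeholders for the fitted estimates. The grid covers standardized ages from -2 to 2.

```python
# Hypothetical MAP estimates; in the lab these come from the fitted model.
a_map, b1_map = 10.4, 0.25

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

grid = linspace(-2.0, 2.0, 41)                 # standardized age values
mu_line = [a_map + b1_map * x for x in grid]   # point estimate of mu at each grid value
```

Plotting `mu_line` against `grid` with any plotting library gives the line. Because age is standardized, the line passes through `a_map` at standardized age 0, the mean age in the sample.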

This shows us the overall trend predicted by the model, but it ignores the uncertainty in our estimates of $\mu$ that comes from uncertainty in a and b1.
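To see how uncertainty in a and b1 carries into $\mu$, treat $\mu$ at a fixed standardized age $x$ as a linear combination of the two parameters. The standard variance-of-a-sum identity then gives its posterior variance. At $x = 0$ (the mean age) only the uncertainty in $a$ remains, and the other terms grow as $x$ moves away from 0.

$$
\mu(x) = a + b_1 x
\quad\Longrightarrow\quad
\operatorname{Var}[\mu(x)] = \operatorname{Var}[a] + x^2 \operatorname{Var}[b_1] + 2x \operatorname{Cov}[a, b_1]
$$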

One simple way to express this uncertainty is to draw the same line as above multiple times, using draws from our posterior rather than the maximum a posteriori (MAP) estimates.
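The "many lines" idea can be sketched in Python (the lab itself uses R). For simplicity, the posterior is assumed here to be independent normal distributions for a and b1 with hypothetical means and standard deviations. A real quadratic approximation would also carry the correlation between them.

```python
import random

rng = random.Random(1)

# Hypothetical posterior: independent normals for a and b1 (an assumption of this sketch).
post = [(rng.gauss(10.4, 0.08), rng.gauss(0.25, 0.08)) for _ in range(20)]

grid = [-2 + 0.1 * i for i in range(41)]   # standardized age values

# One line of mu values per posterior draw; each would be plotted with low opacity.
lines = [[a + b1 * x for x in grid] for a, b1 in post]
```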

This gives us an idea of the spread of uncertainty in our model, but it can be hard to read or interpret. A common approach is to replace the pile of spaghetti in this plot with a shaded region showing, say, the 90% credible interval for $\mu$ at each value of standardized age.

The procedure for this is straightforward:

  1. Draw a large number of posterior samples (say 1000) of a and b1.
  2. For every point along a grid of standardized ages, calculate the value of $\mu$ implied by each of those samples.
  3. Calculate a credible interval on $\mu$ at each point along the grid.
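The steps above can be sketched in plain Python (the lab itself uses R). The posterior draws come from hypothetical independent normals for a and b1, which is an assumption of this sketch. The same 1000 draws are reused at every grid point. This gives the same kind of interval as drawing fresh samples at each point and is cheaper. The 90% interval is the 5th to 95th percentile of the draws of $\mu$.

```python
import random
import statistics

rng = random.Random(42)

# Step 1: 1000 draws of (a, b1) from a hypothetical posterior (independent normals).
n_samples = 1000
post = [(rng.gauss(10.4, 0.08), rng.gauss(0.25, 0.08)) for _ in range(n_samples)]

grid = [-2 + 0.1 * i for i in range(41)]   # standardized age values

lower, upper, mean_mu = [], [], []
for x in grid:
    # Step 2: the value of mu implied by each posterior draw at this grid point.
    mus = [a + b1 * x for a, b1 in post]
    # Step 3: central 90% interval; quantiles(n=20) returns the 5%, 10%, ..., 95% cut points.
    cuts = statistics.quantiles(mus, n=20)
    lower.append(cuts[0])
    upper.append(cuts[-1])
    mean_mu.append(statistics.fmean(mus))
```

Shading between `lower` and `upper` and drawing `mean_mu` on top gives the plot. Under these assumptions the band is narrowest at standardized age 0 and widens toward the ends of the grid.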

There are lots of ways to do this in R, and the rethinking package has some tools to make it easier. But we will first do it "manually" to make it clear what is going on.

Looks good! But a bit of a pain. Let's try the same thing using some functions from the rethinking package:
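The manual loop can also be packaged into reusable helpers. The Python sketch below does this (the lab itself uses R). Its split between the helpers loosely mirrors rethinking's `link` (posterior draws of $\mu$ over new data) and `PI` (percentile intervals). The helper names `link_mu`, `quantile`, and `percentile_interval` are hypothetical, and the posterior draws are simulated.

```python
import random

def link_mu(post, xs):
    """Hypothetical helper: mu for every (draw, x) pair; rows are draws, columns are xs."""
    return [[a + b1 * x for x in xs] for a, b1 in post]

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, for 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def percentile_interval(samples, prob=0.9):
    """Hypothetical helper: central interval containing `prob` of the samples."""
    s = sorted(samples)
    tail = (1 - prob) / 2
    return quantile(s, tail), quantile(s, 1 - tail)

rng = random.Random(7)
post = [(rng.gauss(10.4, 0.08), rng.gauss(0.25, 0.08)) for _ in range(1000)]  # hypothetical draws
grid = [-2 + 0.1 * i for i in range(41)]

mu = link_mu(post, grid)
# Column j holds the 1000 draws of mu at grid[j]; summarize each column.
bands = [percentile_interval([row[j] for row in mu]) for j in range(len(grid))]
```

Keeping "compute $\mu$ for every draw" separate from "summarize the draws" means the same interval function can be reused for other quantities, such as posterior predictive simulations.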