## Bayesian regressions¶

In this notebook, we will learn how to estimate linear regression models in a Bayesian framework. This all may seem like a lot of extra work since lm() is fast and easy to use, but learning these techniques on a model you are familiar with will help later on when we start working with more complex models (I promise).

Let's build our first model. We'll make an alist describing a model predicting log income as normally distributed, but with a mean that depends on a person's gender.

(Note: the US census, which provided this data, does not collect information on non-binary gender—everyone is either a man or a woman. This is an unfortunate limitation of a great deal of existing survey data and is only just beginning to change. Although I recoded and formatted this data, I left the label "female" as it is in the raw data.)

The big difference between this model and models we've been using so far is that mu is defined using <- instead of ~. In the rethinking package and in Stan, the "tilde" ~ is used for stochastic or random relations, while the "assignment arrow" <- is used for deterministic relations. This tells the model that log_income is considered random, even if you know mu and sigma, but that you can calculate mu exactly as long as you have speific values for a, b, and female.

We can estimate this model using quap():

The precis() function will describe the marginal posterior distributions for each of our parameters, producing output similar to a regular regresssion:

This means that the mean of the marginal posterior for a is about 10.496.

We can extract the coefficients from this object to get the expected income for men and women, conditional on the data we have seen.

What about the joint posterior? We can tell something about it by examining the variance-covariance matrix of the coefficients. A variance-covariance matrix tells you the variance (standard deviation squared) of each variable in a joint distribution, along with the covariance of each pair of variables. (Covariance is similar to correlation, but it has not been standardized to fit into [-1,1]).

In the above variance-covariance matrix, the elements along the diagonal tell us that $Var(a)=0.01527501$, $Var(b)=0.02882057$, and $Var(\sigma)=0.00358973$ (these are just the standard deviations reported above, squared). But more interestingly, it tells us that a and b have a pretty significant covariance $Cov(a,b)=-0.01527478$. This means that higher values for a tend to go along with lower values for b, and vice versa.

We can look a little more closely at this distribution by taking a sample from it.