Multinomial logistic regression

We'll be using the US microdata sample that we have seen before to predict the association between income and marital status.

What categories exist for marital status?

That's more fine-grained than we want. We can make four categories from these six by putting "Divorced" and "Separated" in one category and "Married, spouse absent" and "Married, spouse present" in another category.

At the same time, let's code the new variable so that married respondents are the reference category. (Choosing a reference category is arbitrary as far as the model is concerned, so it should be based on your specific research question.)

When recoding a categorical variable it is always a good idea to look at a cross-tabulation to make sure you made no mistakes.

Estimating with brms

Define the model using R's formula syntax and the bf() function from brms:

Priors in brms are specified separately with the prior() function. You can see the 'default' priors that would be used with get_prior():

This table can be a little confusing to read! Each row represents a different parameter or class of paramters in the model.

The fifth row, for instance, tells us the prior for mu2, which is the intercept for category 2: student_t(3,1,2.5).

The fourth row shows us the prior for log_income for category 2. But it's blank! By default, brms uses 'improper' flat priors for coefficients. These should amost always be changed!

The rows that do not specify a particular coefficient or intercept are 'classes' of priors. This allows you to set priors for "all of the coefficients" or "all of the intercepts."

Here we'll define a prior of $\mathrm{Norm}(0,2)$ for all of the intercepts and $\mathrm{Norm}(0,3)$ for all of the coefficients on log_income:

We 'fit' the model using the brm() function (this can be slow!):

Aside from the sign and confidence of the estimates, it is very hard to interpret these coefficients on their own. We can say that income is negatively associated with being single, divorced, or widowed in comparison with being married. That is, as income increases, the model predicts a higher probability of being married rather than single, of being married rather than divorced, and of being married rather than widowed.

We can get a better idea of the specific relationships by predicting outcomes for specific sets of covariates. We'll start by comparing people that make \$10,000 per year to people that make \\$100,000 per year.

The predict() function makes it very easy to talk about specific cases (like $10k vs $100k incomes):