Ordered logistic regressions

Ordered logistic regressions allow us to model an outcome that is categorical and whose categories have a meaningful order. Examples include educational attainment (which we'll look at here), employment status (unemployed, part-time employed, full-time employed), and, very commonly, Likert-scale survey responses.

We will use whether respondents identify as Hispanic or Latino to predict educational attainment among adults in the United States. First, load the data and create a categorical variable for educational attainment with the ordered levels "None", "High School", and "College".
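As a minimal sketch of the recoding step, the block below builds an ordered factor from a small toy data frame. The column names (hispanic, educ_years), the values, and the cutoffs are all assumptions for illustration; the real survey data may code education differently.

```r
# Toy data standing in for the survey extract (hypothetical names and values)
d <- data.frame(
  hispanic   = c(0, 1, 0, 1, 0, 0),
  educ_years = c(10, 12, 16, 9, 12, 18)
)

# Collapse years of schooling into three ordered levels.
# The cutoffs (up to 11, 12 to 15, 16 or more years) are illustrative assumptions.
d$ed_level <- cut(
  d$educ_years,
  breaks = c(-Inf, 11, 15, Inf),
  labels = c("None", "High School", "College"),
  ordered_result = TRUE
)

levels(d$ed_level)      # order of the categories, lowest first
is.ordered(d$ed_level)  # should be TRUE for an ordinal outcome
```

Storing the outcome as an ordered factor matters because the order of the levels is what defines which category counts as "lower" in the cumulative model.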

To formulate our model, we will use the cumulative family in brms. Because it relies on R formula notation, brms hides a lot of the model. The deceptively simple specification ed_level ~ hispanic, when paired with family = cumulative(), corresponds to the model:

$$ \begin{aligned} \mathrm{ed\_level}_i &\sim \mathrm{Categorical}(p_1,\dots,p_K)\\ p_k &= q_k - q_{k-1}\\ \mathrm{logit}(q_k) &= \alpha_k - \phi_i\\ \phi_i &= \beta\,\mathrm{hispanic}_i \end{aligned} $$

That is, for an outcome with $K$ categories, it specifies $K-1$ cutpoints $\alpha_1,\dots,\alpha_{K-1}$, the logit link function, and the translation from the cumulative probabilities $q_k = \Pr(\mathrm{ed\_level}_i \le k)$ (with $q_0 = 0$ and $q_K = 1$) into appropriate probabilities for the categorical distribution.
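To make the translation from cutpoints to category probabilities concrete, here is a small base R sketch. The cutpoint and slope values are made up for illustration and are not estimates from the data; category_probs() is a hypothetical helper, not a brms function.

```r
# Hypothetical parameter values, chosen only to illustrate the arithmetic
alpha <- c(-1, 1.5)  # two cutpoints for three categories
beta  <- -0.5        # slope for hispanic

# Turn cutpoints and a linear predictor phi into category probabilities
category_probs <- function(alpha, phi) {
  q <- plogis(alpha - phi)  # inverse logit: one cumulative probability per cutpoint
  diff(c(0, q, 1))          # differences of adjacent cumulative probabilities
}

p_non_hispanic <- category_probs(alpha, beta * 0)
p_hispanic     <- category_probs(alpha, beta * 1)

round(rbind(p_non_hispanic, p_hispanic), 3)
```

Each row sums to one. With a negative slope, the linear predictor is lower for Hispanic respondents, which raises every cumulative probability and shifts probability mass toward the lower categories.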

Since we are using brms, we need to specify the priors separately. It's usually a good idea to first look at the options with get_prior().
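A sketch of that workflow is below, using a tiny toy data frame so the block runs on its own. The data values and the prior scales are assumptions chosen for illustration, not recommendations from the original analysis.

```r
library(brms)

# Toy stand-in for the survey data (hypothetical values, for illustration only)
d <- data.frame(
  hispanic = c(0, 1, 0, 1, 0, 0),
  ed_level = factor(
    c("None", "High School", "College", "None", "High School", "College"),
    levels = c("None", "High School", "College"),
    ordered = TRUE
  )
)

# List every parameter class that can take a prior, along with the brms defaults
get_prior(ed_level ~ hispanic, data = d, family = cumulative())

# One possible choice: normal priors on the cutpoints (class "Intercept")
# and on the slope (class "b"). The scales 1.5 and 1 are assumptions.
priors <- c(
  prior(normal(0, 1.5), class = "Intercept"),
  prior(normal(0, 1), class = "b")
)
priors
```

In a cumulative model the cutpoints appear under the class "Intercept", so a single prior statement for that class applies to all of them.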

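The model can be estimated using brm(), but we need to be a little bit careful. Note that brm() automatically drops rows that have missing values in any variable used in the model. This is convenient, but can be dangerous: the estimation sample can shrink without our paying attention, and models with different covariates may end up being fit to different sets of respondents.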
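One way to keep the missing-data handling visible is to filter to complete cases ourselves and count what was lost before fitting. The sketch below does this on simulated stand-in data; the data, the prior scales, and the reduced chains and iter settings are assumptions made so the example runs quickly.

```r
library(brms)

# Simulated stand-in data (hypothetical; not the survey used in the text)
set.seed(1)
n <- 200
hispanic <- rbinom(n, 1, 0.3)
latent <- -0.5 * hispanic + rlogis(n)
d <- data.frame(
  hispanic = hispanic,
  ed_level = cut(latent, breaks = c(-Inf, -1, 1.5, Inf),
                 labels = c("None", "High School", "College"),
                 ordered_result = TRUE)
)
d$hispanic[c(3, 17, 42)] <- NA  # inject a few missing values

# Keep only rows that are complete on the model variables, and report the loss
model_vars <- c("ed_level", "hispanic")
d_complete <- d[complete.cases(d[, model_vars]), ]
n_dropped <- nrow(d) - nrow(d_complete)
message(n_dropped, " of ", nrow(d), " rows dropped for missing data")

fit <- brm(
  ed_level ~ hispanic,
  data = d_complete,
  family = cumulative(),
  prior = c(prior(normal(0, 1.5), class = "Intercept"),
            prior(normal(0, 1), class = "b")),
  chains = 2, iter = 1000, seed = 1, refresh = 0  # small settings for a quick sketch
)

nobs(fit)  # number of rows actually used in estimation
```

Comparing nobs(fit) with the number of rows in the original data is a quick check that the model was fit to the sample we think it was.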

The estimates for Intercept[1] ($\alpha_1$) and Intercept[2] ($\alpha_2$) are a little hard to interpret on their own, but we can see that they are pretty far apart from one another. This suggests that the middle category ("High School") has a large overall probability in the model.

We can also see that the estimate for hispanic is negative, which means that Hispanic respondents are predicted to have less education overall.

Using the predict() function, we can calculate the predicted probability of each education level based on these estimates.
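A stand-alone sketch of this step is below. It refits a small model on simulated stand-in data with default priors for brevity, so the numbers it prints are not the estimates discussed above; the data and the sampler settings are assumptions.

```r
library(brms)

# Simulated stand-in data (hypothetical; not the survey used in the text)
set.seed(1)
n <- 300
hispanic <- rbinom(n, 1, 0.3)
latent <- -0.5 * hispanic + rlogis(n)
d <- data.frame(
  hispanic = hispanic,
  ed_level = cut(latent, breaks = c(-Inf, -1, 1.5, Inf),
                 labels = c("None", "High School", "College"),
                 ordered_result = TRUE)
)

fit <- brm(ed_level ~ hispanic, data = d, family = cumulative(),
           chains = 2, iter = 1000, seed = 1, refresh = 0)

# One row per group we want predictions for
new_d <- data.frame(hispanic = c(0, 1))

# For ordinal models, predict() returns one column per outcome category,
# holding the posterior predictive probability of that category
probs <- predict(fit, newdata = new_d)
round(probs, 2)
```

Each row of the result corresponds to a row of new_d, so the first row gives the predicted distribution of education for non-Hispanic respondents and the second for Hispanic respondents.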

More covariates

We can add a couple more covariates to see how that affects these calculations.

Visualizations

The posterior_epred() function lets us make predictions about education across different income levels.
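The sketch below shows one way this could look. It simulates stand-in data with a standardized income variable, fits a model with hispanic and income as predictors, and plots the posterior mean probability of each education level over a grid of incomes. The data, the variable name income, the grid, and the sampler settings are all assumptions for illustration.

```r
library(brms)

# Simulated stand-in data (hypothetical; income is standardized here)
set.seed(2)
n <- 300
hispanic <- rbinom(n, 1, 0.3)
income <- rnorm(n)
latent <- -0.5 * hispanic + 0.8 * income + rlogis(n)
d <- data.frame(
  hispanic = hispanic,
  income = income,
  ed_level = cut(latent, breaks = c(-Inf, -1, 1.5, Inf),
                 labels = c("None", "High School", "College"),
                 ordered_result = TRUE)
)

fit <- brm(ed_level ~ hispanic + income, data = d, family = cumulative(),
           chains = 2, iter = 1000, seed = 2, refresh = 0)

# Grid of income values for non-Hispanic respondents
grid <- data.frame(hispanic = 0, income = seq(-2, 2, length.out = 25))

# Array with dimensions: posterior draws x rows of grid x outcome categories
ep <- posterior_epred(fit, newdata = grid)

# Posterior mean probability of each category at each income value
p_mean <- apply(ep, c(2, 3), mean)

matplot(grid$income, p_mean, type = "l", lty = 1, col = 1:3,
        xlab = "Income (standardized, hypothetical)",
        ylab = "Probability of education level")
legend("topright", legend = levels(d$ed_level), col = 1:3, lty = 1)
```

Because posterior_epred() keeps every posterior draw, the same array can also be summarized with quantiles to add uncertainty bands around each line.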