Using sample weights and setting priors in brms

In this lab, we will look a little more at model specification in brms and then use brms to demonstrate the importance and use of sampling weights.

We will use a slightly different subset of the Tennessee STAR data that includes information on the teachers.

Building a model

To demonstrate how sampling weights work, we will specify a simple regression predicting a student's math test score from their class size and their race and ethnicity. The formula part of the model specification is straightforward.
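Here is a minimal sketch of that formula. It assumes hypothetical column names: math for the math score, small_class for a small-class indicator, and race for race/ethnicity. The lab's actual variable names may differ. bf() is brms's formula constructor. A plain R formula passed to brm() works as well.

```r
library(brms)

# Hypothetical column names: math score, small-class indicator, race/ethnicity
f <- bf(math ~ small_class + race)
f
```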

Before we try to estimate this, we want to figure out how to set appropriate priors. The function get_prior() is very useful here. It lists every parameter in the model that can take a prior, together with its class and coefficient name and the default prior brms would use.
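The sketch below shows get_prior() in use. It runs on a small simulated stand-in for the STAR data, so the columns (math, small_class, race) and all of their values are hypothetical. get_prior() only inspects the model structure, so it runs quickly and does not compile a Stan model.

```r
library(brms)

# Simulated stand-in for the STAR data (hypothetical columns and values)
set.seed(42)
n <- 500
toy <- data.frame(
  race = sample(c("white", "black", "other"), n, replace = TRUE,
                prob = c(0.66, 0.33, 0.01)),
  small_class = rbinom(n, 1, 0.3)
)
toy$math <- rnorm(n, mean = 480 + 10 * toy$small_class, sd = 40)

# One row per parameter (or parameter class) that can take a prior,
# with its class, coef, and the default prior brms would assign
default_priors <- get_prior(math ~ small_class + race, data = toy,
                            family = gaussian())
default_priors
```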

By default, brms gives each intercept term a Student-t prior (recent versions also give sigma a half-Student-t prior), but we will want to override that. It leaves the priors on the regression coefficients blank, indicating an 'improper' flat prior over the entire real number line.

You can use the class, coef, and dpar values of the different parameters to micro-manage your priors. We will set one prior for the intercept, a single shared prior for all of the coefficients, and a half-Cauchy prior for sigma.
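The sketch below sets those three priors with set_prior(). The specific prior values are hypothetical; choose them on the scale of your own outcome. brms declares sigma with a lower bound of 0, so an ordinary Cauchy prior on class "sigma" becomes a half-Cauchy. Also note that, by default, brms puts the "Intercept" prior on the intercept of the mean-centered predictors.

```r
library(brms)

# Hypothetical prior values; pick them on the scale of the outcome
priors <- c(
  set_prior("normal(500, 100)", class = "Intercept"),  # one prior for the intercept
  set_prior("normal(0, 50)", class = "b"),             # shared by every coefficient
  set_prior("cauchy(0, 20)", class = "sigma")          # sigma has lb = 0, so half-Cauchy
)
priors
```

Passing class = "b" without a coef applies the prior to every coefficient at once. Adding coef = "small_class" (a hypothetical name here) would target a single coefficient instead.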

Sampling weights

The Tennessee STAR data did not use any kind of stratified sampling, but we can simulate stratified sampling by taking a subsample of the students.

We start by calculating a sampling probability for each student. The students in the full sample are overwhelmingly white, and there are almost no students who are neither white nor Black. To make sure our subsample captures the under-represented races and ethnicities, we will sample white students with a probability of 3%, Black students with a probability of 6%, and everyone else with a probability of 100%.
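A minimal base-R sketch of this step follows. It uses a simulated stand-in for the student data, so the race column, its labels ("white", "black", "other"), and the group proportions are all hypothetical.

```r
# Simulated stand-in for the STAR student data (hypothetical column and proportions)
set.seed(42)
n <- 2000
star <- data.frame(
  race = sample(c("white", "black", "other"), n, replace = TRUE,
                prob = c(0.66, 0.33, 0.01))
)

# Sampling probability by race/ethnicity: 3% white, 6% Black, 100% everyone else
star$p <- ifelse(star$race == "white", 0.03,
          ifelse(star$race == "black", 0.06, 1))

table(star$race, star$p)
```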

We can use these sampling probabilities to take a stratified sample of the full population. We also want to create a variable indicating the 'weight' of each student. This can be thought of as the number of students in the full population that each sampled student is supposed to represent.

This is done in R by drawing a random number between 0 and 1 for each student using runif(). Then we only keep rows for which that random number is less than the sampling probability.

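Remember that the sampling weight is the inverse of the sampling probability ($w = 1/p$). A white student sampled with probability 0.03, for example, has weight $1/0.03 \approx 33.3$ and stands in for about 33 white students in the full data, while every student who is neither white nor Black has weight 1.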
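The sketch below combines the runif() filter with the $w = 1/p$ weight. As before, the data are a simulated, hypothetical stand-in. Because runif() never returns exactly 1, every student with $p = 1$ is kept.

```r
set.seed(42)
n <- 2000
star <- data.frame(
  race = sample(c("white", "black", "other"), n, replace = TRUE,
                prob = c(0.66, 0.33, 0.01))
)
star$p <- ifelse(star$race == "white", 0.03,
          ifelse(star$race == "black", 0.06, 1))

# Draw u ~ Uniform(0, 1) for each student; keep the student if u < p
star$u <- runif(nrow(star))
strat <- star[star$u < star$p, ]

# Sampling weight: how many population students each sampled student represents
strat$w <- 1 / strat$p

table(strat$race)
sum(strat$w)  # an estimate of the population size, nrow(star)
```

Each weight counts how many students a sampled student represents, so the weights summed over the sample give an unbiased estimate of the full population size.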

For comparison, we will also take a subsample with uniform probability, which therefore needs no weights.
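The original does not say what uniform probability to use. In the sketch below we assume one, chosen so that the uniform sample has the same expected size as the stratified one (the sum of the stratified probabilities). The data are again a simulated, hypothetical stand-in.

```r
set.seed(42)
n <- 2000
star <- data.frame(
  race = sample(c("white", "black", "other"), n, replace = TRUE,
                prob = c(0.66, 0.33, 0.01))
)
star$p <- ifelse(star$race == "white", 0.03,
          ifelse(star$race == "black", 0.06, 1))

# Assumed design choice: one common probability whose expected sample size
# matches the stratified sample's expected size, sum(p)
p_unif <- sum(star$p) / nrow(star)
unif <- star[runif(nrow(star)) < p_unif, ]

table(unif$race)
```

Every student here has the same inclusion probability, so every weight would equal the constant 1/p_unif. A constant weight does not change the relative weighting of observations, which is why the uniform sample can be analyzed unweighted.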

Estimates on a uniform sample

We will start by fitting the model that predicts student math scores from class size and race/ethnicity to the unweighted, uniform subsample.
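A self-contained sketch of the unweighted fit follows. The data are simulated, so the column names and effect sizes are hypothetical, and the prior values are illustrative. The chains and iter settings are kept small only so the example runs quickly. brm() compiles a Stan model, so expect a short delay on the first run.

```r
library(brms)

# Simulated stand-in for the full STAR data (hypothetical columns and effects)
set.seed(42)
n <- 2000
star <- data.frame(
  race = sample(c("white", "black", "other"), n, replace = TRUE,
                prob = c(0.66, 0.33, 0.01)),
  small_class = rbinom(n, 1, 0.3)
)
star$math <- rnorm(n, mean = 490 + 10 * star$small_class - 20 * (star$race == "black"),
                   sd = 40)

# Uniform subsample of roughly 5% of students
unif <- star[runif(n) < 0.05, ]

# Illustrative priors: intercept, shared coefficient prior, half-Cauchy sigma
priors <- c(
  set_prior("normal(500, 100)", class = "Intercept"),
  set_prior("normal(0, 50)", class = "b"),
  set_prior("cauchy(0, 20)", class = "sigma")
)

fit_unif <- brm(math ~ small_class + race, data = unif, family = gaussian(),
                prior = priors, chains = 2, iter = 1000, seed = 1, refresh = 0)
summary(fit_unif)
```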

Estimates on a stratified sample, using sampling weights

We estimate the same model on the stratified sample, this time incorporating the sampling weights.
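In brms, weights enter through the addition term weights() on the left-hand side of the formula. Each observation's log-likelihood contribution is multiplied by its weight, $\sum_i w_i \log p(y_i \mid \theta)$. The sketch below uses simulated, hypothetical data and illustrative priors, as in the earlier examples.

```r
library(brms)

# Simulated stand-in for the full STAR data (hypothetical columns and effects)
set.seed(42)
n <- 2000
star <- data.frame(
  race = sample(c("white", "black", "other"), n, replace = TRUE,
                prob = c(0.66, 0.33, 0.01)),
  small_class = rbinom(n, 1, 0.3)
)
star$math <- rnorm(n, mean = 490 + 10 * star$small_class - 20 * (star$race == "black"),
                   sd = 40)

# Stratified sample and inverse-probability weights
star$p <- ifelse(star$race == "white", 0.03,
          ifelse(star$race == "black", 0.06, 1))
strat <- star[runif(n) < star$p, ]
strat$w <- 1 / strat$p

priors <- c(
  set_prior("normal(500, 100)", class = "Intercept"),
  set_prior("normal(0, 50)", class = "b"),
  set_prior("cauchy(0, 20)", class = "sigma")
)

# weights(w) multiplies each observation's log-likelihood contribution by w
fit_strat <- brm(math | weights(w) ~ small_class + race, data = strat,
                 family = gaussian(), prior = priors,
                 chains = 2, iter = 1000, seed = 1, refresh = 0)
summary(fit_strat)
```

One design choice to be aware of: these raw inverse-probability weights sum to roughly the full population size. The posterior therefore behaves as if far more students had been observed, and its intervals can be too narrow. A common adjustment is to rescale the weights to mean 1 (strat$w / mean(strat$w)). This keeps the relative weighting while preserving the actual sample size.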