Predicting video game use

We will use another set of variables from the first wave of Add Health (collected in the United States from 1994 to 1995) to look at the time students spent playing video games.

Start by loading the data and looking at the first few rows to get a sense of it.

In addition to demographic data on gender, grade, race, ethnicity, and family income, this data lists the number of hours per week the students report spending consuming various media. We will be looking at hours_games, which codes students' responses to the question "How many hours a week do you play video or computer games?"

Start by looking at an overall summary of the variable and seeing how many values are missing.
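A minimal sketch of that check, using a small made-up vector in place of the real column (the values below are invented for illustration and are not Add Health data):

```r
# Toy stand-in for the hours_games column; these values are invented
hours_games <- c(0, 0, 2, NA, 5, 0, 14, NA, 1, 0)

# Min, quartiles, mean, and max; summary() also reports the NA count
summary(hours_games)

# Number of missing responses
n_missing <- sum(is.na(hours_games))

# Share of non-missing responses that are exactly zero
prop_zero <- mean(hours_games == 0, na.rm = TRUE)
```

With the real data, the same calls would be applied to the hours_games column of whatever data frame you loaded.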

So to summarize, about half of the students say they played no video games. A significant number also reported spending a suspiciously large amount of time playing, at least for the 1990s (8 hours per weekday plus 16 hours per day on the weekend comes to 72 hours).

Intercept-only model

Start by building a model with just an intercept to get a feel for the Poisson regression model:

$$ H_i \sim Pois(\lambda)\\ \log(\lambda) = \alpha\\ \alpha \sim Norm(\mbox{??},\mbox{??}) $$

Before we estimate this model, we need to figure out a good prior for α by looking at different priors' implications for λ. To get the implied prior for λ, you simply exponentiate the values of the prior for α. Performing this transformation on a normal distribution gives you a new distribution that is already built into R: the log-normal distribution. This makes it easy to experiment with.
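One way to run that experiment, sketched with base R's log-normal quantile function. The three candidate priors below are arbitrary choices for illustration, not necessarily the ones plotted in this lesson:

```r
# Candidate Normal(mean, sd) priors for alpha; chosen only as examples
priors <- list(c(0, 1), c(3, 1), c(3, 0.5))

# If alpha ~ Normal(m, s), then lambda = exp(alpha) ~ LogNormal(m, s)
implied <- t(sapply(priors, function(p) {
  qlnorm(c(0.05, 0.5, 0.95), meanlog = p[1], sdlog = p[2])
}))
rownames(implied) <- sapply(priors, function(p) sprintf("Norm(%g, %g)", p[1], p[2]))
colnames(implied) <- c("5%", "median", "95%")

# Each row shows the range of average weekly hours that prior treats as plausible
round(implied, 1)
```

Reading across a row gives a quick sense of whether a prior for $\alpha$ permits sensible values of $\lambda$ before any data are involved.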

Remember, our prior over $\alpha$ should allow for a range of possible values of $\lambda$, and $\lambda$ is the average number of hours per week across all of the students.

If we pick the second line from the above plot ($\mathrm{Norm}(3,1)$), that puts most of the prior probability on an average of 0 to 50 hours per week. But the 'fat' tail on the right side means that our prior isn't too opinionated, and it will not insist on a low value if the data disagrees with it.
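To check what this prior implies numerically, here is a short sketch using `plnorm()` and `qlnorm()`. The cutoffs of 50 and 100 hours are arbitrary reference points chosen for this example:

```r
# alpha ~ Normal(3, 1) implies lambda ~ LogNormal(3, 1)
p_under_50   <- plnorm(50, meanlog = 3, sdlog = 1)       # prior mass below 50 hours/week
p_over_100   <- 1 - plnorm(100, meanlog = 3, sdlog = 1)  # prior mass in the far right tail
prior_median <- qlnorm(0.5, meanlog = 3, sdlog = 1)      # equals exp(3)

c(under_50 = p_under_50, over_100 = p_over_100, median = prior_median)
```

Roughly four fifths of the prior mass sits below 50 hours, while about 5% remains above 100, which is the fat right tail at work.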

Using that prior, we can estimate our model:

$\exp(\hat\alpha)$ is the median of the posterior distribution of the average rate of video game consumption by our students. This means that we might predict that an "average" student would play about 2.84 hours of video games per week.

We can see more by extracting samples.

Note that when using ulam(), samples are drawn during estimation. If you ask for more samples than it originally gathered (by default, 1000 iterations per chain with half discarded as warmup, so 500 per chain), it will return NA for any beyond that limit. To get more, you have to give it a value of iter that is at least twice the number of samples you want per chain. (We will talk about why this is when we discuss MCMC and other Monte Carlo methods.)

We fit four chains with 4000 iterations each; half of each chain is discarded as warmup, so we have 8000 samples to work with.

To get this in terms of actual rates, just exponentiate the posterior sample.
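A sketch of that step. Because the fitted model object is not reproduced here, the vector `a` below is a simulated stand-in for the posterior sample of $\alpha$; its mean and spread are assumptions picked to resemble the estimate above, not output from the actual fit:

```r
set.seed(1)

# Stand-in for posterior draws of alpha (hypothetical; with a real fit these
# would come from something like extract.samples(model)$a)
a <- rnorm(8000, mean = log(2.84), sd = 0.0085)

# Exponentiate each draw to move from the log scale to hours per week
lambda <- exp(a)

# Median and a central 90% interval for the average weekly rate
quantile(lambda, probs = c(0.05, 0.5, 0.95))
```

Exponentiating draw by draw, rather than exponentiating a summary, keeps the interval correct on the rate scale.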

This tells us a little bit more: we are 90% confident that the average rate of game use among these students is between about 2.80 and 2.88 hours per week.

Adding a covariate: do boys play more?

We can use a Poisson regression to estimate the difference in weekly rates of video game playing by boys and girls (keeping in mind the problematic way that Add Health collects data on gender).

$$ H_i \sim Pois(\lambda_i)\\ \log(\lambda_i) = \alpha + \beta M_i\\ \alpha \sim Norm(3,1)\\ \beta \sim Norm(0,0.5) $$

It is usually a good idea for your non-intercept coefficients to have a prior mean of zero. This is because $\exp(0)=1$, so a value of zero for $\beta$ would mean being male has no association with the rate of play.

For the standard deviation on a coefficient prior, you almost always want a pretty narrow distribution when using a Poisson regression. We'll use 0.5 here, but you can experiment with others. Prior predictive plots can be useful here.
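One quick way to see what a coefficient prior implies, short of a full prior predictive plot, is to look at the rate ratio $\exp(\beta)$ it allows. The alternative standard deviations below are arbitrary comparison points:

```r
# Central 90% prior interval for exp(beta), the boys-to-girls rate ratio,
# under beta ~ Normal(0, sd) for a few candidate sd values
sds <- c(0.2, 0.5, 1, 2)
ratio_interval <- t(sapply(sds, function(s) {
  qlnorm(c(0.05, 0.95), meanlog = 0, sdlog = s)
}))
dimnames(ratio_interval) <- list(paste0("sd=", sds), c("5%", "95%"))

round(ratio_interval, 2)
```

With a standard deviation of 0.5, the prior mostly allows ratios between roughly 0.44 and 2.3. With a standard deviation of 2, it would entertain boys playing more than twenty times as much as girls, which illustrates why narrow priors tend to be the safer choice on the log scale.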

This tells a very different story. Girls in the sample play an average of about 1.46 hours per week, while boys play about 2.99 times as much as girls:

$$ \begin{aligned} \exp(\alpha + \beta) &= \exp(\alpha)\times\exp(\beta)\\ \exp(0.38 + 1.10) &= \exp(0.38)\times\exp(1.10) \\ 1.46\times2.99&=4.37 \end{aligned} $$
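The same arithmetic can be checked directly in R. The coefficients here are the rounded values shown above, so the results can differ from the reported figures in the last digit:

```r
alpha <- 0.38  # rounded intercept: log rate for girls
beta  <- 1.10  # rounded coefficient: log rate ratio, boys vs. girls

rate_girls <- exp(alpha)         # hours per week for girls
rate_ratio <- exp(beta)          # how many times the girls' rate boys play
rate_boys  <- exp(alpha + beta)  # hours per week for boys

# Adding on the log scale is the same as multiplying on the rate scale
all.equal(rate_boys, rate_girls * rate_ratio)
```

This is why a Poisson coefficient is read as a multiplicative change in the rate rather than an additive one.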

Let's compare the posterior distributions of $\lambda$ for boys and girls.