Working with random samples

Samples from random variables are central to Bayesian data analysis. As we discussed in class, the exact posterior distribution for a given model is most often difficult or impossible to calculate. But it is relatively straightforward to calculate a non-normalized function that is proportional to the exact posterior. For decades, researchers have been coming up with clever ways to use that non-normalizied posterior to generate random samples from the exact posterior. The upshot of this is that as statisticians, we can say just about anything we want to about a distribution using just a large sample from the posterior.

Start by taking a sample of 2,000 values from a Beta distribution with parameters $\alpha=2,\beta=50$: $\mathrm{Beta}(2,50)$

(As a side note, the Beta distribution is the exact posterior for a binomial model like our unemployment model if $p$ has a Beta prior. Because $\mathrm{Beta}(1,1)$ is just the uniform distribution, $\mathrm{Beta}(2,50)$ is the posterior with a uniform prior if we have seen exactly 49 employed people and 1 unemployed person.)

Use a histogram to get a sense of the shape of the distribution.

Another way to see the overall shape of a distribution is using R's density() function, which takes a random sample from a continuous distribution and tries to construct a 'smooth' probability density function from it. By default density() assumes the sample is from an unbounded continuous random variable. If you have strict upper and lower bounds like we do (Beta is bounded in [0,1]), it is important to let the function know this information. (What happens if you leave off the from and to arguments in the code below?)

Caclulating the mean and median of the sample is simple, but calculating the mode is a little more complicated

We can use R's quantile function to easily calculate percentile intervals. We should expect the percentile interval to be approximately centered around the median.

The highest posterior density interval is more complex, but we can use the HPDI() function from the rethinking package. (Note: this function is just an easier way to call the HPDinterval() function in the coda package). We should expect the HPDI estimate to be approximately centered around the mode, and so a little to the left of the percentile interval.