This notebook will walk you through calculating an approximate posterior distribution for the unemployment rate in Newfoundland and Labrador.

Start by simulating the entire population...

In [1]:

```
# Let's pretend there are 354,949 employed and 65,108 unemployed adults
pop_unemp <- sample(rep(c('E','U'),times=c(354949, 65108)))
# (with a single argument, sample() just shuffles the vector)
```
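Before sampling, it's worth computing the quantity we are trying to estimate. A quick sanity check using the same counts as above:

```r
# true unemployment rate in the simulated population
true_p <- 65108 / (354949 + 65108)
print(round(true_p, 4))   # about 0.155
```

Keep this number in mind: a good posterior should concentrate somewhere near it.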

Take a sample of 50 individuals.

In [2]:

```
ssize <- 50
samp <- sample(pop_unemp,ssize,replace=FALSE)
# count employed and unemployed
table(samp)
```

```
samp
 E  U 
42  8 
```

Recall that our goal is to estimate the relative probability that the unemployment rate is equal to $p$ for all possible values of $p$. For now, we will just consider 11 possible values $p\in\{0.0, 0.1, \dots, 1.0\}$.

In [3]:

```
n_grid <- 11
grid <- seq(from=0,to=1,length.out=n_grid)
print(grid)
```

[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Remember: to calculate the (proportional) posterior $Pr(p|\mathtt{ssize},\mathtt{samp})$ for a particular point on the grid (e.g. $p=0.3$), we simply multiply the *likelihood* of the data at $p=0.3$ by our *prior* probability that $p=0.3$.
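For a single grid point, that is just one multiplication. As a sketch, using the counts observed above (8 unemployed out of 50) and a uniform prior weight of 1:

```r
# unnormalized posterior at p = 0.3: likelihood times prior
p <- 0.3
lik_p <- dbinom(8, size = 50, prob = p)  # probability of 8 'successes' in 50 trials if the rate were 0.3
prior_p <- 1                             # uniform prior weight at this grid point
post_p <- lik_p * prior_p
print(post_p)
```

Repeating this for every grid point, then normalizing, is exactly what the cells below do in vectorized form.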

Start with the prior, which for now we will just set as uniform. This vector just says that we think each of the 11 possible values of $p$ is equally likely—i.e. that we don't have any a priori belief about unemployment.

In [4]:

```
prior <- rep(1,times=n_grid)
print(prior)
```

[1] 1 1 1 1 1 1 1 1 1 1 1

Now calculate the likelihood. As we saw in the slides, the likelihood in this situation is the binomial probability mass function (PMF). R has a function `dbinom()` that calculates the binomial PMF.

In [5]:

```
# we are counting unemployment as a binomial 'success'
#print(samp)
#print(samp=='U')
n_unemp <- sum(samp=='U')
print(n_unemp)
lik <- dbinom(n_unemp,size=ssize,prob=grid)
print(cbind(grid,round(lik,5)))
```

Note that we did something a little tricky with R there. Normally, `dbinom` is thought of as taking a fixed value for `size` and `prob`, and calculating different values of the PMF for different success counts. The *support* of the binomial distribution is all counts between zero and the sample size.

But R is good with vectors. When we asked it for `dbinom(n_unemp,size=ssize,prob=grid)`, the first two arguments (`n_unemp` and `ssize`) were both single numbers, while the third argument (`grid`) was a vector as big as our grid. What R did in this case was to construct `n_grid` *different* binomial distributions, one for each point on our grid, and calculate the probability of seeing `n_unemp` successes for that particular distribution.
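To see this recycling behaviour concretely, here is a quick sketch comparing the vectorized call with an explicit loop over the grid (using the same counts as above); the two agree exactly:

```r
grid <- seq(from = 0, to = 1, length.out = 11)
# one vectorized call: dbinom recycles the scalar arguments across the grid
vec <- dbinom(8, size = 50, prob = grid)
# the equivalent explicit loop: one binomial distribution per grid point
loop <- sapply(grid, function(p) dbinom(8, size = 50, prob = p))
print(all.equal(vec, loop))
```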

We can visualize this by plotting the binomial distribution for each value of $p$ along our grid. The large dot is the sample we observe.

In [6]:

```
# this command tells jupyter what dimensions to use for its plots.
options(repr.plot.width=12, repr.plot.height=8)
layout(matrix(1:12,nrow=4,byrow=TRUE))
k <- 0:ssize                        # the full support: all counts from 0 to the sample size
pch <- ifelse(k == n_unemp, 16, 1)  # filled dot at the observed count
cex <- ifelse(k == n_unemp, 1.2, .3)
for(p in grid){
plot(k,dbinom(k,ssize,p),type='b',pch=pch,cex=cex,ylab='',main=p)
}
layout(1)
```

Now that we have a prior and likelihood, we can just multiply them together and normalize them to get our posterior grid approximation:

In [7]:

```
posterior <- prior*lik
posterior <- posterior/sum(posterior)
plot(grid,posterior,type='b',pch=16,xlab='p',ylab='',main="Posterior approximation")
```

That looks accurate enough, but pretty sparse. Let's repeat all that but with `n_grid=100`.

In [8]:

```
n_grid <- 100
grid <- seq(from=0,to=1,length.out=n_grid)
prior <- rep(1,times=n_grid)
lik <- dbinom(n_unemp,size=ssize,prob=grid)
posterior <- prior*lik
posterior <- posterior/sum(posterior)
plot(grid,posterior,type='b',pch=16,xlab='p',ylab='',main="Posterior approximation")
```
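With the finer grid, it is easy to pull summaries out of the approximation. As a sketch, here are the posterior mean and the maximum a posteriori (MAP) grid point, rebuilding the same approximation as above (8 unemployed out of 50, uniform prior):

```r
n_grid <- 100
grid <- seq(from = 0, to = 1, length.out = n_grid)
posterior <- rep(1, n_grid) * dbinom(8, size = 50, prob = grid)
posterior <- posterior / sum(posterior)
post_mean <- sum(grid * posterior)    # posterior mean: probability-weighted average of grid points
map_p <- grid[which.max(posterior)]   # grid point with the highest posterior mass
print(c(mean = post_mean, MAP = map_p))
```

Both should land close to the observed sample proportion of 8/50 = 0.16.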

How does this change if we alter the prior probability for $p$?

First, let's pretend that we are absolutely certain that the unemployment rate is no higher than 10%, and we want to enforce that by building it into our model.

In [9]:

```
# dunif() is like dbinom(), but for the uniform distribution rather than the binomial distribution
prior <- dunif(grid,min=0,max=0.1)
print(prior)
# plot that prior
plot(grid,prior,type='b',pch=16,xlab='p',ylab='',main="Prior approximation")
```

Apply that extreme prior to our data:

In [10]:

```
lik <- dbinom(n_unemp,size=ssize,prob=grid)
posterior <- prior*lik
posterior <- posterior/sum(posterior)
plot(grid,posterior,type='b',pch=16,xlab='p',ylab='',main="Posterior approximation")
```
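A quick way to see what that truncated prior does: any grid point the prior assigns zero weight gets zero posterior mass, no matter how well it fits the data. A sketch of that check:

```r
grid <- seq(from = 0, to = 1, length.out = 100)
prior <- dunif(grid, min = 0, max = 0.1)          # exactly zero outside [0, 0.1]
posterior <- prior * dbinom(8, size = 50, prob = grid)
posterior <- posterior / sum(posterior)
# all posterior mass is squeezed into the prior's support
print(sum(posterior[grid > 0.1]))   # 0
```

This is why such hard cutoffs are dangerous: no amount of data can ever move the posterior outside the prior's support.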

As one last example of an extreme prior, pretend we believe very deeply in our hearts that the true unemployment rate must be somewhere close to 100%. As very bad researchers, we could encode that belief into our model using a prior from the 'beta' distribution. (With less extreme parameters, the beta distribution is a very common prior for models like this.)

In [11]:

```
prior <- dbeta(grid,50,2)
# plot the beta prior
plot(grid,prior,type='b',pch=16,xlab='p',ylab='',main="Prior approximation")
```
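One nice property of the beta prior (not needed for the grid approach, but worth knowing) is *conjugacy*: a Beta(a, b) prior combined with a binomial likelihood gives an exact Beta(a + successes, b + failures) posterior. A sketch comparing the grid approximation against that closed form, using the counts from above:

```r
grid <- seq(from = 0, to = 1, length.out = 100)
prior <- dbeta(grid, 50, 2)
lik <- dbinom(8, size = 50, prob = grid)
posterior <- prior * lik
posterior <- posterior / sum(posterior)
# exact conjugate posterior: Beta(50 + 8, 2 + 42), normalized on the same grid
exact <- dbeta(grid, 50 + 8, 2 + 42)
exact <- exact / sum(exact)
print(max(abs(posterior - exact)))  # tiny
```

The grid approximation and the analytic posterior agree up to floating-point error, which is a useful check that the multiply-and-normalize recipe is doing what we think it is.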