SOCI 620: Quantitative methods 2

Agenda

Transformations & assessing fit with predictive plots

  1. Administrative
  2. Interpretation with transformed variables
  3. Prior predictive plots
  4. Visualizing model predictions
  5. Hands on: Visualizing model predictions in R

Slides are licensed under CC BY-NC-SA 4.0

Transformations

Black and white photo of Alfred Hitchcock standing in a suit in front of a funhouse mirror, his image in the mirror distorted to make him look tall and twisted.

Montreal bike traffic

Photo of a blue monolith emerging from the snowy ground next to a bike path. It has a green digital display that alleges to count the number of bikes that have passed by this day and this year, along with the date and time. It also says 'Montreal' and 'REV MTL'.

Daily 2024 ridership at the 27 highest-traffic bike counters

Interpreting coefficients

Predicting ridership using temperature

Parameter                 Post. mean
Intercept                 878.28
Temperature coefficient   90.96

If the temperature changes from one value to another, how much do we expect ridership to change?

Units of temperature: degrees Celsius

Units of ridership: number of riders

Interpretation of the temperature coefficient:

For every increase of one degree Celsius, the model predicts an average of 90.96 more riders per day at each bike counter.
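A small sketch of this interpretation in code. It assumes the two posterior means are the intercept (878.28) and the temperature coefficient (90.96) of a linear model; the two temperatures compared are hypothetical values chosen only for illustration.

```python
# Posterior means reported on the slide (raw-units model).
# ASSUMPTION: the first value is the intercept, the second the slope.
INTERCEPT = 878.28  # riders per counter per day at 0 degrees Celsius
SLOPE = 90.96       # additional riders per one-degree increase


def predicted_ridership(temp_c: float) -> float:
    """Model-expected daily riders at one counter for a given temperature."""
    return INTERCEPT + SLOPE * temp_c


# Hypothetical temperatures, for illustration only.
temp_low, temp_high = 10.0, 15.0

# The intercept cancels, so the change depends only on the slope
# and the size of the temperature difference.
change = predicted_ridership(temp_high) - predicted_ridership(temp_low)
print(f"Change for a {temp_high - temp_low:g}-degree increase: {change:.1f} riders")
```

Because the model is linear in raw units, the same five-degree difference gives the same predicted change anywhere on the temperature scale.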

Standardized variables

Predicting St(ridership) using St(temperature)

Parameter                     Post. mean
Intercept                     0.0
St(temperature) coefficient   0.644

Units of temperature: standard deviations of temperature

Units of ridership: standard deviations of ridership

Interpretation of the St(temperature) coefficient:

For every increase of one standard deviation of temperature, the model predicts an average of 0.644 more standard deviations of ridership per day at each bike counter.
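A minimal sketch of what standardizing does, and of how a standardized coefficient relates to raw units. The temperature values and both standard deviations below are hypothetical (the slides do not report them); only the 0.644 coefficient comes from the slide.

```python
import statistics


def standardize(values):
    """Rescale values to have mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical daily temperatures (degrees Celsius), for illustration only.
temps = [-5.0, 0.0, 8.0, 15.0, 22.0]
st_temps = standardize(temps)

STD_SLOPE = 0.644  # posterior mean from the slide

# HYPOTHETICAL standard deviations; not taken from the data.
sd_temp = 10.0      # degrees Celsius
sd_riders = 1400.0  # riders per counter per day

# One SD of temperature corresponds to STD_SLOPE SDs of ridership,
# so the implied slope in raw units is STD_SLOPE * sd_riders / sd_temp.
raw_slope = STD_SLOPE * sd_riders / sd_temp
print(f"Implied riders per degree Celsius: {raw_slope:.2f}")
```

The conversion shows why standardized coefficients are unit-free: the original units enter only through the two standard deviations.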

Log of outcome variable

Predicting Log(ridership) using temperature

Parameter                 Post. mean   Exp(mean)
Intercept                 6.12         454.47
Temperature coefficient   0.085        1.089

If the temperature changes from one value to another, how much do we expect ridership to change?

Units of temperature: degrees Celsius

Units of ridership: log ridership

Interpretation of the temperature coefficient:

For every increase of one degree Celsius, the model predicts an average increase of 8.9% in ridership per day at each bike counter.
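A short sketch of where the percentage comes from, assuming 0.085 is the temperature coefficient on the log scale. Exponentiating the coefficient gives a multiplier on ridership, so changes of more than one degree compound rather than add. The five-degree change is a hypothetical example.

```python
import math

LOG_SLOPE = 0.085  # posterior mean of the temperature coefficient (log scale)


def ridership_multiplier(delta_temp_c: float) -> float:
    """Factor applied to predicted ridership when temperature rises by delta_temp_c."""
    return math.exp(LOG_SLOPE * delta_temp_c)


# Percent change for a one-degree increase (about 8.9%, as on the slide).
one_degree_pct = (ridership_multiplier(1.0) - 1.0) * 100

# Hypothetical five-degree increase: the multiplier is exp(5 * 0.085),
# which is larger than five times the one-degree percentage.
five_degree_pct = (ridership_multiplier(5.0) - 1.0) * 100

print(f"One degree:   {one_degree_pct:.1f}%")
print(f"Five degrees: {five_degree_pct:.1f}%")
```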

Prior and posterior predictive plots

Vintage sepia photograph of a knife thrower and assistant. The assistant is standing against a board with knives piercing the board all round her.

Prior predictive plot

Prior predictive plots allow you to visualize the implications of a set of priors

12.45  2.58
8.01  0.51
12.55  1.42
8.01  -3.34
18.19  3.85
13.11  3.24
11.01  0.66
15.54  1.36
8.97  -4.10
8.11  -0.42
8.48  3.76
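One way to build a plot like this is to draw parameter values from the priors and trace the line each draw implies. The sketch below assumes a linear model with an intercept and a slope; the normal priors and their parameters are hypothetical, since the slides do not state which priors were used.

```python
import random

random.seed(620)  # fixed seed so the illustration is reproducible


def draw_prior():
    """Draw one (intercept, slope) pair from HYPOTHETICAL normal priors."""
    intercept = random.gauss(10.0, 3.0)
    slope = random.gauss(0.0, 2.0)
    return intercept, slope


# Predictor values at which each implied line is evaluated.
x_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

prior_lines = []
for _ in range(11):
    intercept, slope = draw_prior()
    line = [intercept + slope * x for x in x_grid]
    prior_lines.append((intercept, slope, line))

# Each row: the drawn parameters, then the outcome values they imply.
for intercept, slope, line in prior_lines:
    values = " ".join(f"{v:7.2f}" for v in line)
    print(f"{intercept:6.2f} {slope:6.2f}  ->  {values}")
```

If many of the implied lines land on outcome values that are implausible before seeing any data, that is a signal to revisit the priors.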

Visualizing predictions

To better illustrate uncertainty in posterior estimates, we will use a random subsample of 400 counter–days.

Visualizing predictions

Parameter                     Post. median   80% post. interval
Intercept                     6.877          (6.823, 6.931)
St(temperature) coefficient   0.940          (0.887, 0.996)
Model standard deviation      0.841          (0.804, 0.981)

Posterior distribution of mean:

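  1. Take a sample of draws from the posterior distribution of the model parameters.
  2. For each value of standardized temperature, calculate the model mean (expected log ridership) under each posterior draw.
  3. Calculate quantiles (say, 10% and 90%) of the resulting values at each value of standardized temperature.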
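The steps above can be sketched as follows. In practice the draws would come from the fitted model; here they are stand-ins, generated under the loud assumption that each marginal posterior is normal, independent of the others, and matched to the reported median and 80% interval. The grid of standardized temperatures is also hypothetical.

```python
import random
import statistics

random.seed(620)

Z90 = statistics.NormalDist().inv_cdf(0.9)  # z-score for the 90th percentile


def approx_draws(median, lower, upper, n):
    """ASSUMPTION: normal stand-in for a posterior, matched to its 80% interval."""
    sd = (upper - lower) / (2 * Z90)
    return [random.gauss(median, sd) for _ in range(n)]


# Step 1: a sample of (stand-in) posterior draws for the two coefficients.
n_draws = 2000
intercepts = approx_draws(6.877, 6.823, 6.931, n_draws)
slopes = approx_draws(0.940, 0.887, 0.996, n_draws)

st_temp_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical grid
mean_intervals = {}
for x in st_temp_grid:
    # Step 2: the model mean under every draw at this temperature.
    mu = [a + b * x for a, b in zip(intercepts, slopes)]
    # Step 3: 10% and 90% quantiles of those values.
    deciles = statistics.quantiles(mu, n=10)
    mean_intervals[x] = (deciles[0], deciles[-1])

for x, (lo, hi) in mean_intervals.items():
    print(f"St(temperature) = {x:+.0f}: 80% interval for the mean ({lo:.3f}, {hi:.3f})")
```

The model standard deviation never enters this calculation, which is why the resulting band is narrow compared with the spread of the data.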

Posterior predictive distribution:

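  1. Take a sample of draws from the posterior distribution of the model parameters.
  2. For each value of standardized temperature, calculate the model mean (expected log ridership) under each posterior draw.
  3. For each posterior draw at each temperature value, draw a predicted log ridership from the model's outcome distribution, using that draw's mean and model standard deviation.
  4. Calculate quantiles (say, 10% and 90%) of the predicted values at each value of standardized temperature.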
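A sketch of the posterior predictive steps, under the same loud assumptions as any stand-in for real posterior draws: independent normal approximations matched to the reported medians and 80% intervals, a hypothetical temperature grid, and a normal outcome distribution for log ridership.

```python
import random
import statistics

random.seed(620)

Z90 = statistics.NormalDist().inv_cdf(0.9)  # z-score for the 90th percentile


def approx_draws(median, lower, upper, n):
    """ASSUMPTION: normal stand-in for a posterior, matched to its 80% interval."""
    sd = (upper - lower) / (2 * Z90)
    return [random.gauss(median, sd) for _ in range(n)]


# Step 1: stand-in posterior draws for all three parameters.
n_draws = 2000
intercepts = approx_draws(6.877, 6.823, 6.931, n_draws)
slopes = approx_draws(0.940, 0.887, 0.996, n_draws)
sigmas = approx_draws(0.841, 0.804, 0.981, n_draws)

st_temp_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical grid
predictive_intervals = {}
for x in st_temp_grid:
    # Steps 2 and 3: compute each draw's mean, then simulate one outcome
    # around it (ASSUMPTION: normal outcome distribution on the log scale).
    y_rep = [random.gauss(a + b * x, s)
             for a, b, s in zip(intercepts, slopes, sigmas)]
    # Step 4: 10% and 90% quantiles of the simulated outcomes.
    deciles = statistics.quantiles(y_rep, n=10)
    predictive_intervals[x] = (deciles[0], deciles[-1])

for x, (lo, hi) in predictive_intervals.items():
    print(f"St(temperature) = {x:+.0f}: 80% predictive interval ({lo:.2f}, {hi:.2f})")
```

The only difference from the interval for the mean is the extra simulation step, which adds the model standard deviation to the spread.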

Posterior mean vs. predicted

For any given mean temperature, the model mean is the value of log ridership that is “expected” by the model on a day with that temperature.

The posterior distribution of the model mean describes our uncertainty about that expected value after seeing the data.

This distribution takes into account the model coefficients (the intercept and the temperature coefficient), but not the model standard deviation.

The 80% posterior interval of the model mean should get narrower as more data are added.

For any given mean temperature, the posterior predictive distribution describes the range of riderships we would expect for any day with that temperature.

The posterior predictive distribution describes our uncertainty about the value of log ridership itself after seeing the data.

This distribution takes into account all the model parameters: the intercept, the temperature coefficient, and the model standard deviation.

The 80% posterior predictive interval should contain about 80% of the data, and should not get appreciably narrower as more data is added.
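The contrast in how the two intervals respond to more data can be illustrated with a deliberately simplified case: an intercept-only normal model with a known standard deviation and a flat prior. This is not the regression model from the slides, and the value of SIGMA is hypothetical. In this simplified case the posterior of the mean has standard deviation SIGMA / sqrt(n), while the posterior predictive has standard deviation sqrt(SIGMA^2 + SIGMA^2 / n).

```python
import math
import statistics

Z90 = statistics.NormalDist().inv_cdf(0.9)  # z-score for the 90th percentile
SIGMA = 0.85  # HYPOTHETICAL known model standard deviation


def interval_widths(n):
    """80% interval widths (mean, predictive) after n observations.

    Simplified setting: intercept-only normal model, known SIGMA, flat prior.
    """
    sd_mean = SIGMA / math.sqrt(n)               # uncertainty about the mean
    sd_pred = math.sqrt(SIGMA**2 + sd_mean**2)   # adds outcome variability
    return 2 * Z90 * sd_mean, 2 * Z90 * sd_pred


for n in (10, 100, 1000, 10000):
    width_mean, width_pred = interval_widths(n)
    print(f"n = {n:>5}: mean interval {width_mean:.3f}, predictive interval {width_pred:.3f}")
```

The width for the mean shrinks toward zero as n grows, while the predictive width levels off near 2 * Z90 * SIGMA, the spread of the outcome itself.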

Image credit

Figures by Peter McMahan (source code)

Black and white photo of Alfred Hitchcock standing in a suit in front of a funhouse mirror, his image in the mirror distorted to make him look tall and twisted.

Alfred Hitchcock in Funhouse Mirror by Globe Photos

Photo of a blue monolith emerging from the snowy ground next to a bike path. It has a green digital display that alleges to count the number of bikes that have passed by this day and this year, along with the date and time. It also says 'Montreal' and 'REV MTL'.

Photo by Peter McMahan

Vintage sepia photograph of a knife thrower and assistant. The assistant is standing against a board with knives piercing the board all round her.

Photo via Flickr user Midnight Believer