SOCI 620: Quantitative methods 2

Agenda

Probability distributions & random samples

  1. Administrative
  2. Probability distributions
  3. From distributions to models
  4. Summarizing random variables
  5. Hands on: random samples in R

Slides are licensed under CC BY-NC-SA 4.0

Administrative

Worksheet 1 posted

  • Available in MyCourses
    Content > Worksheets > Worksheet 1
  • Download the linked .Rmd file to your computer and open in your preferred editor (e.g. RStudio)
  • When you’re done, upload to the same spot in MyCourses

Lab

  • First lab today, directly after class in Leacock 808 (one floor down)

Probability distributions

Illustration of a massive shark swimming upwards underwater toward a person swimming on the surface of the water at the top of the frame (detail from the poster for the movie Jaws)

A discrete distribution

Probability mass function

E.g. the sum of two fair dice:
Categorical distribution

two dice

Support: Integers from 2 to 12 (discrete)
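
As a minimal sketch (in Python, standard library only; the variable names are illustrative), the probability mass function for the sum of two fair dice can be built by enumerating all 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die 1, die 2) outcomes and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability mass function: P(sum = s) for every s in the support 2..12.
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(f"P(sum = {s:2d}) = {str(p):>5} (about {float(p):.3f})")
```

Because the distribution is discrete, the masses over the support sum to exactly one.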

A continuous distribution

Probability density function

E.g. the time between shark attacks in Australia:
Exponential distribution

Support: Non-negative real numbers
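
A rough Python sketch of an exponential density and random waiting times drawn from it. The rate below is a hypothetical value chosen only for illustration; it is not an estimate from shark attack data.

```python
import math
import random

RATE = 0.5  # hypothetical rate (events per unit time); not a value from the slides

def exponential_pdf(x, rate):
    """Density of the exponential distribution; zero outside the support x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

rng = random.Random(620)  # fixed seed so the draws are reproducible
waits = [rng.expovariate(RATE) for _ in range(10_000)]

print("density at x = 1:", round(exponential_pdf(1.0, RATE), 4))
print("sample mean wait:", round(sum(waits) / len(waits), 3))
print("theoretical mean:", 1 / RATE)
```

Unlike a probability mass function, the density at a point is not itself a probability; probabilities come from areas under the curve.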

A discrete bivariate distribution

Contingency table

E.g. questions measuring authoritarian attitudes:
Bivariate categorical distribution

X1: Gays and lesbians are just as healthy and moral as anybody else.
X2: Women should have to promise to obey their husbands when they get married.

                X2: Agree   X2: Disagree
X1: Agree         0.05         0.53
X1: Disagree      0.33         0.09

Joint probability distributions measure probability across multiple variables and the association between those variables.

A discrete bivariate distribution

X1: Gays and lesbians are just as healthy and moral as anybody else.
X2: Women should have to promise to obey their husbands when they get married.

                X2: Agree   X2: Disagree
X1: Agree         0.05         0.53
X1: Disagree      0.33         0.09

Conditional probability measures the probability of one variable in a joint distribution, holding the other constant at a specific value. The probabilities in that row or column must be renormalized so that they sum to one.
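
A small Python sketch of the normalization step, using the joint probabilities from the table. The dictionary layout and function name are illustrative choices.

```python
# Joint probabilities from the table: keys are (X1 answer, X2 answer).
joint = {
    ("agree", "agree"): 0.05,
    ("agree", "disagree"): 0.53,
    ("disagree", "agree"): 0.33,
    ("disagree", "disagree"): 0.09,
}

def conditional_x1_given_x2(joint, x2_value):
    """Return P(X1 | X2 = x2_value) by slicing the joint table and renormalizing."""
    column = {x1: p for (x1, x2), p in joint.items() if x2 == x2_value}
    total = sum(column.values())  # the probability that X2 takes this value
    return {x1: p / total for x1, p in column.items()}

cond = conditional_x1_given_x2(joint, "agree")
for x1, p in cond.items():
    print(f"P(X1 = {x1} | X2 = agree) = {p:.3f}")
```

Among respondents who agree with X2, the probability of agreeing with X1 is 0.05 / 0.38, or about 0.13.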

A discrete bivariate distribution

X1: Gays and lesbians are just as healthy and moral as anybody else.
X2: Women should have to promise to obey their husbands when they get married.

                X2: Agree   X2: Disagree   Total
X1: Agree         0.05         0.53         0.58
X1: Disagree      0.33         0.09         0.42
Total             0.38         0.62

Marginal probability measures the probability of one variable in a joint distribution, summed across all possible values of the other.
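
The row and column totals can be reproduced with a short Python sketch that sums the joint probabilities over the other variable. The function name and `axis` argument are illustrative.

```python
# Joint probabilities from the table: keys are (X1 answer, X2 answer).
joint = {
    ("agree", "agree"): 0.05,
    ("agree", "disagree"): 0.53,
    ("disagree", "agree"): 0.33,
    ("disagree", "disagree"): 0.09,
}

def marginal(joint, axis):
    """Sum the joint table over the other variable (axis 0 = X1, axis 1 = X2)."""
    totals = {}
    for key, p in joint.items():
        totals[key[axis]] = totals.get(key[axis], 0.0) + p
    return totals

p_x1 = marginal(joint, 0)
p_x2 = marginal(joint, 1)
print("P(X1):", {k: round(v, 2) for k, v in p_x1.items()})
print("P(X2):", {k: round(v, 2) for k, v in p_x2.items()})
```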

A continuous bivariate distribution

Some common distributions

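Distribution        Type         Parameters                            Support
Binomial            Discrete     n (trials), p (success probability)   Integers from 0 to n
Poisson             Discrete     λ (rate)                              Non-negative integers
Normal (Gaussian)   Continuous   μ (mean), σ (standard deviation)      All real numbers
Cauchy              Continuous   x0 (location), γ (scale)              All real numbers
Beta                Continuous   α, β (shape)                          Real numbers from 0 to 1
Exponential         Continuous   λ (rate)                              Non-negative real numbers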

(Statisticians have devised and named innumerable distributions over time.
See https://en.wikipedia.org/wiki/List_of_probability_distributions for an incomplete list)
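
As an illustration of drawing random samples from several of these distributions, here is a minimal Python sketch using only the standard library. All parameter values are arbitrary, the two helper functions are hypothetical, and Poisson is left out to keep the sketch short.

```python
import math
import random
import statistics

rng = random.Random(620)  # fixed seed for reproducibility
N = 5_000                 # number of draws per distribution (arbitrary)

def draw_binomial(n, p):
    """Binomial draw as the number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def draw_cauchy(location, scale):
    """Cauchy draw via the inverse CDF applied to a uniform draw."""
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))

samples = {
    "Binomial(n=10, p=0.3)": [draw_binomial(10, 0.3) for _ in range(N)],
    "Normal(mean=0, sd=1)": [rng.gauss(0.0, 1.0) for _ in range(N)],
    "Cauchy(location=0, scale=1)": [draw_cauchy(0.0, 1.0) for _ in range(N)],
    "Beta(a=2, b=5)": [rng.betavariate(2.0, 5.0) for _ in range(N)],
    "Exponential(rate=2)": [rng.expovariate(2.0) for _ in range(N)],
}

for name, draws in samples.items():
    print(f"{name:28s} median = {statistics.median(draws):7.3f}  "
          f"min = {min(draws):12.3f}  max = {max(draws):12.3f}")
```

The minimum and maximum of each sample give a quick check on the support. The Cauchy draws typically show far more extreme values than the Normal draws.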

Probability models

A stack of dice, but instead of numbers they all have smiley faces with different expressions

Describing models

A language for describing probabilistic models

Using probability distributions to link (known) data with (unknown) parameters succinctly and clearly communicates a model.

Example from last week:
Estimating the unemployment rate p from the count of unemployed (Y) in our sample of n individuals
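
One natural reading of this description is a binomial model, Y ~ Binomial(n, p). That is an assumption here: the slide's formula is not reproduced in this text, and any prior on p is not shown. A minimal Python sketch, with hypothetical values for n and p, simulates a count and evaluates the binomial likelihood at a few candidate values of p:

```python
import math
import random

n = 200        # hypothetical sample size
true_p = 0.07  # hypothetical unemployment rate, used only to simulate data

rng = random.Random(620)
# Simulate Y: the number of unemployed among n independent individuals.
y = sum(rng.random() < true_p for _ in range(n))

def binomial_pmf(y, n, p):
    """P(Y = y) when Y ~ Binomial(n, p)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

print("simulated count Y =", y)
for p in (0.03, 0.05, 0.07, 0.10):
    print(f"likelihood of Y = {y} given p = {p:.2f}: {binomial_pmf(y, n, p):.4f}")
```

The data (Y and n) are known, while p is the unknown parameter the model is meant to estimate.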

Changes to the model are clear:

Summarizing random variables

Close up of the wheel from Wheel of Fortune with the spinner landing between two 'Bankrupt' slots to point to the 'One Million' slot

Summarizing distributions

Describing the shape of a distribution

Posterior distributions contain a lot of information

Summarizing distributions

Point summaries

Describe the “center” of the distribution
Mean, median, and mode all have different meanings

Summarizing distributions

Mean

  • Colloquially “average”
  • Accounts for the magnitude of all data
  • Sensitive to extreme values

Median

  • 50th percentile
  • Not sensitive to extreme values

Mode

  • Value of X with highest probability density
  • Maximum likelihood methods find the mode of the likelihood (the posterior mode under a flat prior)
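
A short Python sketch comparing the three point summaries on a right-skewed sample. The exponential draws and the histogram-based mode are illustrative assumptions, not part of the slides.

```python
import random
import statistics

rng = random.Random(620)
# Right-skewed sample: draws from an exponential distribution (hypothetical rate 1).
draws = [rng.expovariate(1.0) for _ in range(10_000)]

mean = statistics.fmean(draws)
median = statistics.median(draws)

# Rough mode for continuous draws: midpoint of the fullest histogram bin.
BIN_WIDTH = 0.1
bins = {}
for x in draws:
    b = int(x // BIN_WIDTH)
    bins[b] = bins.get(b, 0) + 1
fullest = max(bins, key=bins.get)
mode = (fullest + 0.5) * BIN_WIDTH

print(f"mean = {mean:.3f}, median = {median:.3f}, mode ~ {mode:.3f}")

# A single extreme value moves the mean far more than the median.
contaminated = draws + [10_000.0]
print(f"with outlier: mean = {statistics.fmean(contaminated):.3f}, "
      f"median = {statistics.median(contaminated):.3f}")
```

For a skewed distribution the three summaries separate, which is why they answer different questions about the "center".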

Summarizing distributions

Credible intervals

Describe the “spread” of the distribution

Percentile (aka quantile) intervals leave the same amount of probability mass in each tail of the distribution.

Highest posterior density intervals find the narrowest possible interval containing the target density.
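
Both interval types can be approximated from random draws. The Python sketch below is one simple sample-based approach under stated assumptions: the function names are hypothetical, and the Beta(2, 10) draws merely stand in for posterior samples.

```python
import random

def percentile_interval(draws, mass):
    """Interval leaving (1 - mass) / 2 of the draws in each tail."""
    s = sorted(draws)
    tail = (1.0 - mass) / 2.0
    lo = s[int(round(tail * (len(s) - 1)))]
    hi = s[int(round((1.0 - tail) * (len(s) - 1)))]
    return lo, hi

def hpdi(draws, mass):
    """Narrowest interval containing `mass` of the draws (sample-based HPDI)."""
    s = sorted(draws)
    k = int(round(mass * len(s)))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(len(s) - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

rng = random.Random(620)
# Skewed stand-in for posterior draws: Beta(2, 10), chosen only for illustration.
draws = [rng.betavariate(2.0, 10.0) for _ in range(10_000)]

pi_lo, pi_hi = percentile_interval(draws, 0.90)
hd_lo, hd_hi = hpdi(draws, 0.90)
print(f"90% percentile interval: ({pi_lo:.3f}, {pi_hi:.3f}), width {pi_hi - pi_lo:.3f}")
print(f"90% HPDI:                ({hd_lo:.3f}, {hd_hi:.3f}), width {hd_hi - hd_lo:.3f}")
```

For a symmetric distribution the two intervals nearly coincide. For a skewed one the HPDI shifts toward the region of highest density and is no wider than the percentile interval.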

Image credit

Figures by Peter McMahan (source code)

Illustration of a massive shark swimming upwards underwater toward a person swimming on the surface of the water at the top of the frame (detail from the poster for the movie Jaws)

Poster detail for Jaws (1975)

Close up of the wheel from Wheel of Fortune with the spinner landing between two 'Bankrupt' slots to point to the 'One Million' slot

Still from Wheel of Fortune