SOCI 620: Quantitative methods 2

Agenda

Survival analysis

  1. Administrative
  2. Modeling “survival”
  3. Time-to-event models & censored data
  4. Hazard/survival models

Slides are licensed under CC BY-NC-SA 4.0

Administrative

Presentation order:

  • Mahjoube
  • Jordan
  • Zacharie
  • Ella
  • Kaitlin
  • Zekai
  • Terhas
  • Trent

Time-to-event models

Vintage 16mm film leader counting down from 8 to 2 (from https://www.youtube.com/watch?v=TKezMJV2_HU)

Time to event

For many real-world problems, a model needs to account for uncertainty in event timing

  • Recidivism
  • Career length
  • Life expectancy
  • Time to residency
  • Regime length
  • Protest event length

Time to (re)arrest

Rossi (1980) recidivism data

Follow-up data covering one year after release for 432 convicted people in Maryland in the late 1970s.

Treatment: receiving “financial aid”

Rossi, Peter Henry, Richard A. Berk, and Kenneth J. Lenihan. 1980. Money, Work, and Crime: Experimental Evidence. Academic Press.

ID   week_arrested   arrested   financial_aid   black
 1              20          1               0       1
 2              52          0               1       1
 3              52          0               0       0
 4              17          1               0       1

Duration vs. Hazard

Two common approaches to modeling sources of differences in timing:

Duration:

Model the duration of the event as a random variable with expectation determined by individual characteristics

Hazard:

Model the event as a (conditional) Bernoulli random variable with the probability determined by individual characteristics
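The hazard framing can be made concrete by expanding each person into one row per week at risk, with a 0/1 outcome for whether the arrest happened in that week. The sketch below uses the four example rows from the Rossi data shown earlier; the helper name `to_person_weeks` and the row layout are illustrative assumptions, not part of the original slides.

```python
# Four example rows from the Rossi data: (ID, week_arrested, arrested)
people = [
    (1, 20, 1),
    (2, 52, 0),
    (3, 52, 0),
    (4, 17, 1),
]

def to_person_weeks(person_id, last_week, arrested):
    """Expand one person into one row per week at risk (hypothetical helper).

    The outcome is 1 only in the week of arrest; every other week is 0.
    People who were never arrested contribute a 0 for every observed week.
    """
    rows = []
    for week in range(1, last_week + 1):
        event = 1 if (arrested == 1 and week == last_week) else 0
        rows.append({"id": person_id, "week": week, "event": event})
    return rows

person_weeks = [row for p in people for row in to_person_weeks(*p)]

print(len(person_weeks))                      # total person-weeks at risk
print(sum(r["event"] for r in person_weeks))  # total arrests
```

Each row of `person_weeks` can then be treated as one conditional Bernoulli trial, which is what makes the censored cases usable: they simply stop contributing rows after week 52.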

Modeling duration

Modeling duration

Choose an outcome distribution (e.g. Gamma, Weibull, exponential, log-normal, …) and follow the same strategy we have used for other GLMs.
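As one illustration of that strategy, an exponential duration model with a log link might be written as follows. The choice of the exponential distribution, the symbols $T_i$ (weeks to arrest), $\mu_i$ (expected duration), $\alpha$ and $\beta$, and the single predictor are all assumptions made for this sketch, not a specification taken from the slides.

```latex
\begin{aligned}
T_i &\sim \mathrm{Exponential}(\lambda_i), \qquad \lambda_i = 1/\mu_i \\
\log \mu_i &= \alpha + \beta \,\mathrm{financial\_aid}_i
\end{aligned}
```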

Problem:
What do we do with people who were not arrested during the year? (censored data)

Censored data

Option 1 (bad)

We could drop the observations for which we do not know whether or when the person was arrested. This is a bad idea and will almost certainly lead to biased results (see the lecture on missing data).

N = 114 (out of 432)
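A small simulation can show why dropping censored cases biases the results. This is a sketch under assumed values: durations are drawn from an exponential distribution with a true mean of 150 weeks (an arbitrary choice, not an estimate from the Rossi data), and observation stops at 52 weeks.

```python
import random

random.seed(620)

TRUE_MEAN_WEEKS = 150.0   # assumed value, for illustration only
CUTOFF_WEEKS = 52.0       # end of the observation window
N = 10_000

# Simulate the true (partly unobservable) time to arrest for each person.
durations = [random.expovariate(1.0 / TRUE_MEAN_WEEKS) for _ in range(N)]

# Keep only people arrested inside the window, i.e. drop the censored cases.
observed_only = [t for t in durations if t <= CUTOFF_WEEKS]

naive_mean = sum(observed_only) / len(observed_only)

print(f"true mean:  {TRUE_MEAN_WEEKS:.1f} weeks")
print(f"naive mean: {naive_mean:.1f} weeks from {len(observed_only)} of {N} people")
```

Every retained duration is below 52 weeks by construction, so the naive mean can never reach the true mean, no matter how large the sample.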

Censored data

Option 2 (better)

A better approach is to treat the arrest timing for those who were not arrested during the 52-week sample period as missing data for which we have a definite lower limit.

There are many robust ways to deal with this sort of censored data. In a Bayesian context, the most common approach is to treat the missing values as parameters with strong priors.
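One way to write this down, using notation assumed here rather than taken from the slides: let $T_i$ be the week of arrest, $c = 52$ the end of observation, and $f$ and $S$ the density and survival function of the chosen duration distribution with parameters $\theta_i$. Treating each unobserved time as a parameter bounded below by $c$ and then integrating it out gives the same likelihood contribution as the survival function evaluated at $c$.

```latex
\begin{aligned}
\text{arrested:} \quad & T_i \sim f(t \mid \theta_i) \\
\text{not arrested:} \quad & T_i^{\text{mis}} \sim f(t \mid \theta_i), \qquad T_i^{\text{mis}} > c \\
\text{marginal likelihood:} \quad & L(\theta) = \prod_{i:\,\text{arrested}} f(t_i \mid \theta_i) \prod_{i:\,\text{not arrested}} S(c \mid \theta_i)
\end{aligned}
```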

Types of censoring:

  • Right-censored: we only know that the event happens after a specific time (as with people not arrested by week 52)
  • Left-censored: we only know that the event happened before a specific time
  • Interval-censored: we only know that the event happened within a specific time interval

Censored data

Dropping censored observations (bad)

Modeling censoring process (better)

Modeling hazard

Hazard & survival functions

Survival function

  • Probability that the event will happen after a given time t.

Hazard function

  • “Instantaneous” probability of the event happening, conditional on it not having happened already

Neither of these is observed
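In discrete time the two functions are tied together: the probability of surviving past week t is the product of (1 - hazard) over weeks 1 through t. A minimal sketch, assuming a constant weekly hazard of 0.005 chosen purely for illustration (not estimated from the data); `survival_from_hazard` is a hypothetical helper name.

```python
def survival_from_hazard(hazards):
    """Return S(t) for t = 1..len(hazards), given per-period hazards.

    S(t) is the product over s <= t of (1 - h_s): to survive past period t,
    the event must fail to happen in every period up to and including t.
    """
    survival = []
    still_at_risk = 1.0
    for h in hazards:
        still_at_risk *= 1.0 - h
        survival.append(still_at_risk)
    return survival

# Assumed constant weekly hazard, for illustration only.
weekly_hazard = [0.005] * 52
surv = survival_from_hazard(weekly_hazard)

print(f"S(1)  = {surv[0]:.3f}")
print(f"S(52) = {surv[-1]:.3f}")
```

Because one function determines the other, a model only needs to estimate the hazard; the survival curve follows from it.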

Proportional hazard model

Discrete-time version of the Cox proportional hazards model:
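One common way to write a discrete-time proportional hazards model is sketched below; it may differ in notation or link function from the specification intended on the slide. Here $y_{it}$ indicates whether person $i$ is arrested in week $t$ given no earlier arrest, $h_{it}$ is the hazard, $\alpha_t$ is a week-specific baseline, and $x_i$ is a vector of covariates such as financial aid. All of these symbols are assumptions introduced for illustration.

```latex
\begin{aligned}
y_{it} \mid (T_i \ge t) &\sim \mathrm{Bernoulli}(h_{it}) \\
\log\!\bigl(-\log(1 - h_{it})\bigr) &= \alpha_t + x_i^{\top}\beta
\end{aligned}
```

The complementary log-log link is the one that corresponds to grouping a continuous-time proportional hazards model into intervals; a logit link is a frequently used alternative that gives proportional odds instead.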