Count and Duration Models

Similar documents
Count and Duration Models

Continuing with Binary and Count Outcomes

Logit Regression and Quantities of Interest

From Model to Log Likelihood

Logit Regression and Quantities of Interest

Survival Models. Patrick Lam. February 1, 2008

GOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Part 3: Parametric Models

Generalized Linear Models

High-Throughput Sequencing Course

GOV 2001/ 1002/ Stat E-200 Section 1 Probability Review

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Varieties of Count Data

Proportional hazards regression

The Derivative of a Function

A Re-Introduction to General Linear Models

Lecture #11: Classification & Logistic Regression

GOV 2001/ 1002/ E-200 Section 7 Zero-Inflated models and Intro to Multilevel Modeling 1

MAS3301 / MAS8311 Biostatistics Part II: Survival

Gov 2001: Section 4. February 20, Gov 2001: Section 4 February 20, / 39

Overfitting, Bias / Variance Analysis

Generalized Linear Models for Non-Normal Data

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

CPSC 340: Machine Learning and Data Mining

Probability Distributions Columns (a) through (d)

MA 1125 Lecture 15 - The Standard Normal Distribution. Friday, October 6, Objectives: Introduce the standard normal distribution and table.

Linear Regression Models P8111

Linear Regression With Special Variables

Lecture 12: Application of Maximum Likelihood Estimation:Truncation, Censoring, and Corner Solutions

The Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1

of 8 28/11/ :25

Sequences and infinite series

Week 2: Review of probability and statistics

Comparing IRT with Other Models

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

STAT 526 Spring Final Exam. Thursday May 5, 2011

Joint Probability Distributions and Random Samples (Devore Chapter Five)

GOV 2001/ 1002/ E-2001 Section 8 Ordered Probit and Zero-Inflated Logit

POLI 8501 Introduction to Maximum Likelihood Estimation

Chapter 3. Estimation of p. 3.1 Point and Interval Estimates of p

CSE 103 Homework 8: Solutions November 30, var(x) = np(1 p) = P r( X ) 0.95 P r( X ) 0.

Midterm Examination. STA 215: Statistical Inference. Due Wednesday, 2006 Mar 8, 1:15 pm

Part 3: Parametric Models

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Gov 2000: 9. Regression with Two Independent Variables

Announcements. J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 8, / 45

Semiparametric Regression

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

MODELING COUNT DATA Joseph M. Hilbe

Semiparametric Generalized Linear Models

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

One-sample categorical data: approximate inference

GOV 2001/ 1002/ Stat E-200 Section 8 Ordered Probit and Zero-Inflated Logit

Generating Function Notes , Fall 2005, Prof. Peter Shor

CS 361: Probability & Statistics

Generalized Linear Models Introduction

Big Bang, Black Holes, No Math

Systematic Uncertainty Max Bean John Jay College of Criminal Justice, Physics Program

Regression Model Building

Matching. Stephen Pettigrew. April 15, Stephen Pettigrew Matching April 15, / 67

CS 160: Lecture 16. Quantitative Studies. Outline. Random variables and trials. Random variables. Qualitative vs. Quantitative Studies

Learning Objectives for Stat 225

Stat Lecture 20. Last class we introduced the covariance and correlation between two jointly distributed random variables.

Dependence structures with applications to actuarial science

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?

Department of Statistical Science FIRST YEAR EXAM - SPRING 2017

12 Statistical Justifications; the Bias-Variance Decomposition

CSCI2244-Randomness and Computation First Exam with Solutions

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Lecture-19: Modeling Count Data II

Precept Three: Numerical Optimization and Simulation to Derive QOI from Model Estimates

The Central Limit Theorem

ST495: Survival Analysis: Maximum likelihood

Algebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Contents 1. Contents

Conditional probabilities and graphical models

Expected Value II. 1 The Expected Number of Events that Happen

Last few slides from last time

Solution to Proof Questions from September 1st

Week 1 Quantitative Analysis of Financial Markets Distributions A

Course Review. Kin 304W Week 14: April 9, 2013

CSC2515 Assignment #2

Bag RED ORANGE GREEN YELLOW PURPLE Candies per Bag

Gov 2000: 9. Regression with Two Independent Variables

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

Physics 6720 Introduction to Statistics April 4, 2017

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

MAS3301 / MAS8311 Biostatistics Part II: Survival

ISQS 5349 Spring 2013 Final Exam

Hierarchical Hurdle Models for Zero-In(De)flated Count Data of Complex Designs

Advanced Quantitative Methods: limited dependent variables

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Reparametrization of COM-Poisson Regression Models with Applications in the Analysis of Experimental Count Data

Duration Analysis. Joan Llull

Lecture 14: Introduction to Poisson Regression

Transcription:

Count and Duration Models Stephen Pettigrew April 2, 2014 Stephen Pettigrew Count and Duration Models April 2, 2014 1 / 61

Outline 1 Logistics 2 Last week s assessment question 3 Counts: Poisson Model 4 Adding extra parameters 5 Counts: Negative Binomial Model 6 Duration Model Basics Stephen Pettigrew Count and Duration Models April 2, 2014 2 / 61

Logistics Outline 1 Logistics 2 Last week s assessment question 3 Counts: Poisson Model 4 Adding extra parameters 5 Counts: Negative Binomial Model 6 Duration Model Basics Stephen Pettigrew Count and Duration Models April 2, 2014 3 / 61

Logistics Final papers Due Wednesday April 30 (last day of class before reading period) If you need an extension on the paper, you don t have to ask permission. We ll accept papers until Monday, May 5 at 5pm By tonight you should receive memos from the group re-replicating your paper. If you haven t sent your memo to the other group, do this as soon as possible Stephen Pettigrew Count and Duration Models April 2, 2014 4 / 61

Logistics Other logistics Problem set 6 will be posted tonight. It s due a week from today. Gov2001 party at Gary s house! Saturday, April 19 at 12:30 Stephen Pettigrew Count and Duration Models April 2, 2014 5 / 61

Last week s assessment question Outline 1 Logistics 2 Last week s assessment question 3 Counts: Poisson Model 4 Adding extra parameters 5 Counts: Negative Binomial Model 6 Duration Model Basics Stephen Pettigrew Count and Duration Models April 2, 2014 6 / 61

Last week s assessment question Data for assessment question The adjacency matrix: The party of the five senators: 0 1 1 1 0 1 0 0 0 0 Y = 1 0 0 0 1 1 0 0 0 1 0 0 1 1 0 x = (0, 0, 1, 1, 1) The upper triangle of the adjacency matrix, in vector form: Y ij i < j = (1, 1, 1, 0, 0, 0, 0, 0, 1, 1) Indicator variables for whether each of ( 5 2) pairs of senators are in the same party X i X j i < j = (0, 1, 1, 1, 1, 1, 1, 0, 0, 0) Stephen Pettigrew Count and Duration Models April 2, 2014 7 / 61

Last week s assessment question Data for assessment question Colors indicate party of each of the 5 senators. Lines connecting a pair of senators indicates that they cosponsored Stephen Pettigrew Count and Duration Models April 2, 2014 8 / 61

Last week s assessment question The model We re told that the model for our data is: Y ij = Bern(π ij ) { p s if X i = X j π ij = if X i X j p d Stephen Pettigrew Count and Duration Models April 2, 2014 9 / 61

Last week s assessment question Question 1A Rewrite the systematic component as a one-line equation, where π ij is on the left-hand side and the right-hand side is expressed in terms of p s, p d, and X i X j. (Hint: X i X j is 0 when X i = X j and 1 when X i X j ). Note also that you should not reparametrize the parameters. How can we rewrite the systematic component { p s if X i = X j π ij = if X i X j so that it s a single equation? p d Since X i X j can only be either 0 or 1, we can use it as an indicator variable to switch between p s and p d : π ij = p s + (p d p s) X i X j or π ij = p X i X j d p 1 X i X j s Stephen Pettigrew Count and Duration Models April 2, 2014 10 / 61

Last week s assessment question Question 1B Write the log likelihood. L(π ij y, x) = i<j (π ij ) y ij (1 π ij ) 1 y ij Now substitute in the systematic component you solved for in 1A: [y ij ln(p s + (p d p s ) X i X j ) + (1 y ij ) ln(1 p s (p d p s ) X i X j )] i<j i<j or [ ] y ij ln(p X i X j d p 1 X i X j s ) + (1 y ij ) ln(1 p X i X j d p 1 X i X j s ) Stephen Pettigrew Count and Duration Models April 2, 2014 11 / 61

Last week s assessment question Question 1D Optimize and find the MLE. p s = 0.75 p d = 0.33 These results should be intuitive from the graph: Stephen Pettigrew Count and Duration Models April 2, 2014 12 / 61

Last week s assessment question Question 1G Simulate 1000 networks where all 5 senators are from the same party. Use these simulations to calculate the probability that there are more than 5 cosponsorship links among the ( 5 2) possible pairs of senators simp <- rmvnorm(1000, mean=optim.out$par, sigma=(solve(-optim.out$hessian))) simp[simp>1] <- 1 simp[simp<0] <- 0 Since our network only has senators of the same party we only care about the first column simp <- simp[,1] simy <- rbinom(length(simp), size = 10, prob = simp) mean(simy>5) Stephen Pettigrew Count and Duration Models April 2, 2014 13 / 61

Counts: Poisson Model Outline 1 Logistics 2 Last week s assessment question 3 Counts: Poisson Model 4 Adding extra parameters 5 Counts: Negative Binomial Model 6 Duration Model Basics Stephen Pettigrew Count and Duration Models April 2, 2014 14 / 61

Counts: Poisson Model Counts Any time your outcome variable is a count of the number of times as event happened, you ll want to use a count model. If you know the number of trials and the probability of success is bigger than a really tiny decimal, you ll want to use a Bernoulli or binomial model Arithmomaniacs love count models If the number of trials is too large (or impossible) to easily count and the rate of success is very low, you ll want to use another type of count model Stephen Pettigrew Count and Duration Models April 2, 2014 15 / 61

Counts: Poisson Model Examples: 1. number of terrorist attacks in a given year 2. number of publications by a professor in a career 3. number of times word hope is used in a Barack Obama speech 4. number of raindrops that hit a 1 x 1 square on the sidewalk during a rainstorm 5. number of correctly predicted games in your March Madness bracket Stephen Pettigrew Count and Duration Models April 2, 2014 16 / 61

Counts: Poisson Model The Poisson Distribution The Poisson distribution is a discrete probability distribution which gives the probability that some number of events will occur in a fixed period of time. Here s the probability density function (PDF) for a random variable Y that is distributed Pois(λ): Pr(Y = y) = λy y! e λ Stephen Pettigrew Count and Duration Models April 2, 2014 17 / 61

Counts: Poisson Model Pr(Y = y) = λy y! e λ Suppose Y Pois(3). What s Pr(Y = 4)? Pr(Y = 4) = 34 4! e 3 = 0.168. Poisson Distribution Pr(Y=y) 0 0.05 0.1 0.15 0 2 4 6 8 10 y Stephen Pettigrew Count and Duration Models April 2, 2014 18 / 61

Counts: Poisson Model The Poisson Distribution One more time, the probability mass function (PMF) for a random variable Y that is distributed Pois(λ): Pr(Y = y) = λy y! e λ Using a little bit of geometric series magic, it isn t too hard to show that E[Y ] = y=0 y λy y! e λ = λ It also turns out that Var(Y ) = λ, a feature of the model we will discuss later on. Stephen Pettigrew Count and Duration Models April 2, 2014 19 / 61

Counts: Poisson Model The Poisson Distribution Poisson data arises when there is some discrete event which occurs (possibly multiple times) at a constant rate for some fixed time period. Another way to state this constant rate assumption is that the probability of an event occurring at any moment is independent of whether an event has occurred at any other moment. So whether or not raindrop i hits our 1 x 1 square on the sidewalk doesn t affect the probably that raindrop i + 1 hits the within the square Whether there s a civil war in country i at time t doesn t affect whether there s a civil war in country j at time t + 1 Stephen Pettigrew Count and Duration Models April 2, 2014 20 / 61

Counts: Poisson Model The Poisson Model for Event Counts So what s the model? 1 The stochastic component: 2 The systematic component: Y i Pois(λ i ) λ i = exp(x i β) In the Generalized Linear Models framework, exp(x i β) is our link function. Why do we use this? Because λ i is the rate parameter which must be greater than zero, and exp(x i β) > 0 Stephen Pettigrew Count and Duration Models April 2, 2014 21 / 61

Counts: Poisson Model Deriving the log likelihood of the Poisson Model If the PMF of the Poisson is: then the likelihood is: Pr(Y = y) = λy y! e λ L(λ y) = n i=1 i=1 λ y i i y i! e λ i And the log-likelihood is: l(λ y) = [ n λ y ] i i log y i! e λ i = n [ λ y i ] i log y i! e λ i i=1 Stephen Pettigrew Count and Duration Models April 2, 2014 22 / 61

Counts: Poisson Model Deriving the log likelihood of the Poisson Model l(β y, X ) = = n i=1 [ λ y i ] i log y i! e λ i n y i log λ i log(y i!) λ i i=1 n y i log λ i λ i i=1 n y i log(exp(x i β) exp(x i β) i=1 n y i X i β exp(x i β) i=1 Stephen Pettigrew Count and Duration Models April 2, 2014 23 / 61

Counts: Poisson Model Example: Terrorist attacks Now let s look at an application of the Poisson model. The data: Outcome variable: the number of deaths from terrorist attacks in a particular country over a 30 year period. Explanatory variable: the unemployment rate in each month during the 30 year period. Stephen Pettigrew Count and Duration Models April 2, 2014 24 / 61

Counts: Poisson Model Example: Terrorist attacks The model: Let Y i = # of terrorist attack deaths in a month. Our sole predictor for the moment will be: U = the unemployment rate during that month. Our model is: and or more generally: Y i Pois(λ i ) λ i = E(Y i U i ) = exp(β 0 + β 1 U i ) λ i = E(Y i X i ) = exp(x i β) Stephen Pettigrew Count and Duration Models April 2, 2014 25 / 61

Counts: Poisson Model Let s estimate this thing! We could estimate it by coding up the log likelihood function: pois.ll <- function(par, y, x){ x <- as.matrix(cbind(1, x)) xb <- x %*% par sum(y * xb - exp(xb)) } And then we could use optim(): opt.pois <- optim(par = rep(0,2), fn = pois.ll, x = attacks$unemploy, y = attacks$deaths, hessian = T, method = "BFGS", control = list(fnscale = -1)) Stephen Pettigrew Count and Duration Models April 2, 2014 26 / 61

Counts: Poisson Model Let s estimate this thing! Or we could use Zelig, since the Poisson model is already incorporated into that package: require(zelig) zel.pois <- zelig(deaths ~ unemploy, data = attacks, model = "poisson") summary(zel.pois)$coefficients Estimate Std. Error z value Pr(> z ) (Intercept) 1.230495 0.04203808 29.27095 2.430411e-188 unemploy 15.668797 0.17671220 88.66845 0.000000e+00 Stephen Pettigrew Count and Duration Models April 2, 2014 27 / 61

Counts: Poisson Model Checking our model We now have estimates of our coefficients. Are we ready to move on to generating quantities of interest and pretty graphs? Not yet First we need to check our assumption that the data generation process was Poisson. How can we do this? What might we look for? Stephen Pettigrew Count and Duration Models April 2, 2014 28 / 61

Counts: Poisson Model Dispersion Recall from earlier that with the Poisson distribution, we assume that E(Y X ) = Var(X X ) = λ If E(Y X ) < Var(X X ) then our data are overdispersed. If E(Y X ) > Var(X X ) then our data are underdispersed. How could we check for over or underdispersion? Stephen Pettigrew Count and Duration Models April 2, 2014 29 / 61

Counts: Poisson Model Overdispersion A lot more than 95% of our data is falling outside the 95% confidence intervals, so it looks like we have overdispersion attacks$deaths 5 10 15 0.16 0.18 0.20 0.22 0.24 0.26 0.28 attacks$unemploy Stephen Pettigrew Count and Duration Models April 2, 2014 30 / 61

Adding extra parameters Overdispersion is a problem with the Poisson model in part because the model only has one parameter, λ, which restricts the variance to be a specific value Recall from the Learning Catalytics question from Monday s lecture that this is a similar problem with exponential duration models, stylized normal models, and other models with only one parameter. But it s not a problem with Bernoulli models. Why the heck not? Before we figure out why the heck the Bernoulli model is different from Poissons, exponentials, and others, let s take a step back. Stephen Pettigrew Count and Duration Models April 2, 2014 31 / 61

Adding extra parameters Big picture Remember that when we re modeling, our big goal is to take data that s messy and complicated and summarize it using a distribution Then we take that distribution and its parameters and make assessments about our original messy data. When we do this, we re essentially collapsing our big messy data down to a few numbers (the parameters) Stephen Pettigrew Count and Duration Models April 2, 2014 32 / 61

Adding extra parameters Let s look at an example of some data. Here s a histogram of an outcome variable you might be interested in modeling: Frequency 0 500 1000 1500 2000 It appears that this data was generated from a stylized normal distribution, N(µ, 1). If we used that as our model, we d be summarizing our dataset using just one number, µ 46 48 50 52 54 outcome variable Stephen Pettigrew Count and Duration Models April 2, 2014 33 / 61

Adding extra parameters What if our data looked like this? Would we want to use a stylized normal distribution? Frequency 0 500 1000 1500 30 40 50 60 70 outcome variable No we wouldn t, because if we did we d be making the incorrect assumption that the variance was 1. To properly describe these data we d want to use two parameters, one for the mean and one for the variance. Are we now fully characterizing the distribution of these data? Is there any information we could add to our model to better summarize the data? Stephen Pettigrew Count and Duration Models April 2, 2014 34 / 61

Adding extra parameters What if our outcome looked like this? What assumption would we be violating if we used a N(µ, σ 2 ) model? The data are skewed and not symmetric, as the normal distribution assumes. Frequency 0 500 1000 1500 2000 2500 But we could model this by using the skew-normal distribution, which is a normal distribution with a third parameter to model the skewness. 45 50 55 60 65 70 outcome variable Stephen Pettigrew Count and Duration Models April 2, 2014 35 / 61

Adding extra parameters Why does this matter? So then what s the point of all of this? When you choose to model your data using just about any distribution, you re making some assumptions about the data. Generally you could relax those assumptions by adding another parameter which helps you to better characterize the attributes of the data, like the mean, variance, skewness, kurtosis, etc. Stephen Pettigrew Count and Duration Models April 2, 2014 36 / 61

Adding extra parameters Why does this matter? Then why don t we do this? Or put another way, what s one way to perfectly capture all the information in our dataset? Print off a spreadsheet with all your observations and variables and give it to your reader to interpret. Obviously that s not helpful The more and more parameters we add to our model, the closer we get to this extreme and the more our estimates become tightly fit to our specific dataset. There s a tradeoff between summarizing too little and summarizing too much. Another way to say this is that you have to strike a balance between underfitting your model and making incorrect assumptions about the data, and overfitting your model and not being able to make any general conclusions about the world outside of your dataset Stephen Pettigrew Count and Duration Models April 2, 2014 37 / 61

Adding extra parameters Why are binary outcome variables different? So adding parameters to models can help to relax assumptions and better fit the data. Why don t we have to do this to get the variance correct in binary outcome models (i.e. models with Bernoulli stochastic components)? Why is it that binary outcome models will have the correct variance with only one parameter, π, but exponential or Poisson models often won t? Stephen Pettigrew Count and Duration Models April 2, 2014 38 / 61

Adding extra parameters Why are binary outcome variables different? If the figure below shows binary outcome data for 1000 observations, how many numbers do you need to fully describe the data generation process for each of those 1000 observations? 0 100 200 300 400 500 600 700 Failures Successes You can fully characterize the underlying data generation process for this outcome by just knowing that π = 0.7. With binary outcome models there s no other information to be squeezed out of your data except this proportion, so adding another parameter to model variance (or anything else) can t help you to better describe the data generation process. Stephen Pettigrew Count and Duration Models April 2, 2014 39 / 61

Counts: Negative Binomial Model Outline 1 Logistics 2 Last week s assessment question 3 Counts: Poisson Model 4 Adding extra parameters 5 Counts: Negative Binomial Model 6 Duration Model Basics Stephen Pettigrew Count and Duration Models April 2, 2014 40 / 61

Counts: Negative Binomial Model The Negative Binomial Model Recall that with our Poisson model we found evidence of overdispersion, meaning that the assumption that Var(Y ) = E(Y ) = λ was violated in the data. Just like we saw with the jump from stylized normal to normal (or normal to skewed normal) we can add a parameter to relax this assumption. In particular, we can derive a different distribution which relaxes the constant rate assumption and independence of events assumption of the Poisson The trick is to assume that λ varies according to a new parameter we will introduce call ζ. Stephen Pettigrew Count and Duration Models April 2, 2014 41 / 61

Counts: Negative Binomial Model Alternative Parameterization Let s derive the negative binomial model by adding a parameter to the Poisson. Here s the new stochastic component: Y i λ i, ζ i Poisson(ζ i λ i ) ( ) 1 ζ i Gamma σ 2 1, 1 σ 2 1 Note that this Gamma distribution has a mean of 1. Therefore, Poisson(ζ i λ i ) has mean λ i. Note that the variance of this new Poisson distribution is σ 2 1. This means that as σ 2 goes to 1, the distribution of ζ i collapses to a spike over 1. Stephen Pettigrew Count and Duration Models April 2, 2014 42 / 61

Counts: Negative Binomial Model Alternative Parameterization Using a similar approach to that described in UPM pgs. 51-52 we can derive the marginal distribution of Y as where the PDF is: Notes: Y i Negbin(λ i, σ 2 ) f nb (y i λ i, σ 2 ) = Γ( λ i σ 2 1 + y i) (σ 2 1) yi y!γ( λ i σ 2 1 ) σ 2 (σ 2 ) 1. λ i > 0 and σ > 1 λ i σ 2 1 2. E[Y i ] = λ i and Var[Y i ] = λ i σ 2. What value of σ 2 would be evidence against overdispersion? 3. We still have the same old systematic component: λ i = exp(x i β). Stephen Pettigrew Count and Duration Models April 2, 2014 43 / 61

Counts: Negative Binomial Model Negative binomial In the problem set, you ll be asked to take this count model and use it to find the MLE for some data. One thing that will help code this likelihood in R is knowing that Γ( ) is the gamma function, which is a generalization of factorials for non-integers. Note: The gamma function is a completely separate from the gamma distribution. In R, the function is gamma(). Stephen Pettigrew Count and Duration Models April 2, 2014 44 / 61

Counts: Negative Binomial Model Other Models Note that there are many other count models: Generalized Event Count (GEC) Model Zero-Inflated Poisson Zero-Inflated Negative Binomial Zero-Truncated Models Hurdle Models Stephen Pettigrew Count and Duration Models April 2, 2014 45 / 61

Counts: Negative Binomial Model Questions so far? Stephen Pettigrew Count and Duration Models April 2, 2014 46 / 61

Duration Model Basics Outline 1 Logistics 2 Last week s assessment question 3 Counts: Poisson Model 4 Adding extra parameters 5 Counts: Negative Binomial Model 6 Duration Model Basics Stephen Pettigrew Count and Duration Models April 2, 2014 47 / 61

Duration Model Basics What are duration models used for? Survival models = duration models = event history models Dependent variable Y is the duration of time that observations spend in some state before experiencing an event (aka failure, death) Used in biostatistics and engineering: i.e. how long until a patient dies Models the relationship between duration and covariates (how does an increase in X affect the duration Y ) In social science, used in questions such as how long a coalition government lasts, how long a war lasts, how long a regime stays in power, or how long until a legislator leaves office Observations should be measured in the same (temporal) units, i.e. don t have some units duration measured in days and others in months Stephen Pettigrew Count and Duration Models April 2, 2014 48 / 61

Duration Model Basics Why not just use OLS? Three reasons: 1. OLS assumes Y is Normal but duration dependent variables are always positive (number of years, number of days. etc.) 2. Duration models can handle censoring observations 0 1 2 3 4 Observation 3 is Censored Observation 3 is censored in that it has not experienced the event at the time we stop collecting data, so we don t know its true duration 0 1 2 3 4 time Stephen Pettigrew Count and Duration Models April 2, 2014 49 / 61

Duration Model Basics Why not use OLS? 3. Duration models can handle time-varying covariates If Y is duration of a regime, GDP may change during the duration of the regime OLS cannot handle multiple values of GDP per observation You can set up data in a special way with duration models such that you can accomodate time-varying covariates We won t cover this today but its the same principle as censoring Stephen Pettigrew Count and Duration Models April 2, 2014 50 / 61

Duration Model Basics Duration/Survival Model Jargon Let T denote a continuous positive random variable representing the duration/survival times (T = Y ) T has a probability density function f (t) Stephen Pettigrew Count and Duration Models April 2, 2014 51 / 61

Duration Model Basics Duration/Survival Model Jargon F(t): the CDF of f (t), t 0 f (u)du = P(T t), which is the probability of an event occurring before (or at exactly) time t Survivor function: The probability of surviving (i.e. no event occuring) until at least time t: S(t) = 1 F (t) = P(T > t) Eye of the Tiger: 1982 album by the band Survivor, which reached number 2 on the US Billboard 200 chart. CDF F(t) Survivor Function S(t) F(t) 0.0 0.2 0.4 0.6 0.8 1.0 S(t) 0.0 0.2 0.4 0.6 0.8 1.0 0 2 4 6 t 0 2 4 6 t Stephen Pettigrew Count and Duration Models April 2, 2014 52 / 61

Duration Model Basics Duration/Survival Model Jargon Hazard rate (or hazard function): h(t) is roughly the probability of an event at time t given survival up to time t h(t) = P(t T < t + τ T t) = P(event at t survival up to t) P(survival up to t event at t)p(event at t) = P(survival up to t) P(event at t) = P(survival up to t) = f (t) S(t) Stephen Pettigrew Count and Duration Models April 2, 2014 53 / 61

Duration Model Basics Relating the Density, Survival, and Hazard Functions f (t) }{{} density function = h(t) }{{} hazard function S(t) }{{} survival function Stephen Pettigrew Count and Duration Models April 2, 2014 54 / 61

Duration Model Basics Modeling with Covariates We can model the mean of the duration times as a function of covariates via a link function g( ) g(e[t i ]) = X i β and estimate β via maximum likelihood. Stephen Pettigrew Count and Duration Models April 2, 2014 55 / 61

Duration Model Basics How to estimate parametric survival models They might seem fancy and complicated, but we estimate these models the same as every other model! 1 Make an assumption that T i follows a specific distribution f (t) (i.e. choose the stochastic component). 2 Model the hazard rate with covariates (i.e. specify the systematic component). 3 Estimate via maximum likelihood. 4 Interpret quantities of interest (hazard ratios, expected survival times). Stephen Pettigrew Count and Duration Models April 2, 2014 56 / 61

Duration Model Basics What s Special About Survival Models? Censoring: Observation 3 is Censored observations 0 1 2 3 4 0 1 2 3 4 time... it makes modeling a little tricky. But not too tricky Observation 3 is censored because it had not experienced the event when we collected the data, so we don t know its true duration. Stephen Pettigrew Count and Duration Models April 2, 2014 57 / 61

Duration Model Basics Censoring Observations that are censored give us information about how long they survive. For censored observations, we know that they survived at least until some observed time, t c, and that the true duration, t is greater than or equal to t c. For each observation, let s create a censoring indicator variable, c i, such that { 1 if not censored c i = 0 if censored Stephen Pettigrew Count and Duration Models April 2, 2014 58 / 61

Duration Model Basics Censoring We can incorporate the information from the censored observations into the likelihood function. L = = = n i=1 n i=1 n i=1 [f (t i )] c i [P(T i t c i )] 1 c i [f (t i )] c i [1 F (t i )] 1 c i [f (t i )] c i [S(t i )] 1 c i So uncensored observations contribute to the density function and censored observations contribute to the survivor function in the likelihood. Stephen Pettigrew Count and Duration Models April 2, 2014 59 / 61

Duration Model Basics Next week Next week we ll go through a couple specific examples of duration models, as well as some stuff about assessing model fit Stephen Pettigrew Count and Duration Models April 2, 2014 60 / 61

Duration Model Basics Questions? Stephen Pettigrew Count and Duration Models April 2, 2014 61 / 61