Chapter 20 Duration Analysis


Duration: the time elapsed until a certain event occurs (weeks unemployed, months spent on welfare).

Survival analysis: the duration of interest is the survival time of a subject. Subjects begin in an initial state and are observed either to exit from the state or to be censored (still in the state when observation ends).

Examples:
- Unemployment: time until leaving unemployment
- Medicine: time to death, time until a specific treatment
- Traffic: time until an accident
- Firm: time until the firm closes

The goal is to model the dependence of the duration $T$ on covariates.

Approaches:

1) We could model the duration as $Y$, where $Y$ might be censored and the censoring point might vary between individuals: a Tobit-type approach. Why should we use any methods other than Tobit?
2) Instead of modelling the duration, one often models the hazard rate. This allows for time-varying covariates.
3) The hazard approach also extends more easily to competing risk models (multiple exit states), where not only the duration is of interest.

Other issues:
- left and right censoring
- endogenous variables
- multiple spells

8.2 Hazard Function: Hazard Functions without Covariates

Notation: $T \ge 0$ is the time at which a person leaves the initial state. It has some distribution in the population, and $t$ denotes a particular value of $T$.

The cdf of $T$ is $F(t) = P(T \le t)$, $t \ge 0$.

Survival function: $S(t) = 1 - F(t) = P(T > t)$, the probability of surviving past time $t$.

The pdf of $T$ is $f(t) = dF(t)/dt$.

The hazard function gives the instantaneous rate of leaving the initial state per unit of time in the interval $[t, t+h)$, given survival up until time $t$:

$$\lambda(t) = \lim_{h \downarrow 0} \frac{P(t \le T < t+h \mid T \ge t)}{h}$$

For small $h$, it follows that

$$P(t \le T < t+h \mid T \ge t) \approx \lambda(t)\, h$$

Examples:

1. Unemployment duration: if $T$ is the length of time unemployed, in weeks, then $\lambda(20)$ is approximately the probability of becoming employed between weeks 20 and 21, conditional on having been unemployed up to week 20.
2. Recidivism duration: if $T$ is the number of months before a former prisoner is arrested for a crime, then $\lambda(12)$ is approximately the probability of being arrested during the 13th month, conditional on not having been arrested during the first year.

The hazard function can be expressed in terms of the pdf and cdf of $T$:

1. $$P(t \le T < t+h \mid T \ge t) = \frac{P(t \le T < t+h)}{P(T \ge t)} = \frac{F(t+h) - F(t)}{1 - F(t)}$$
2. $$\lambda(t) = \lim_{h \downarrow 0} \frac{F(t+h) - F(t)}{h} \cdot \frac{1}{1 - F(t)} = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)} = -\frac{dS(t)/dt}{S(t)} = -\frac{d \log S(t)}{dt}$$
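The identities above can be checked numerically. The sketch below assumes a hypothetical survivor function $S(t) = (1+t)^{-2}$, chosen only because its hazard $2/(1+t)$ is easy to verify by hand. It compares $f(t)/S(t)$ with a finite-difference version of $-d\log S(t)/dt$ and with the small-$h$ approximation of the conditional exit probability.

```python
import math

def survivor(t):
    """Hypothetical survivor function S(t) = (1 + t)^(-2); an assumption for illustration."""
    return (1.0 + t) ** -2

def density(t):
    """f(t) = -dS(t)/dt for the survivor function above."""
    return 2.0 * (1.0 + t) ** -3

def hazard_from_density(t):
    """lambda(t) = f(t) / S(t)."""
    return density(t) / survivor(t)

def hazard_from_log_survivor(t, eps=1e-6):
    """lambda(t) = -d log S(t)/dt, approximated by a central finite difference."""
    return -(math.log(survivor(t + eps)) - math.log(survivor(t - eps))) / (2 * eps)

def conditional_exit_prob(t, h):
    """P(t <= T < t + h | T >= t) = [S(t) - S(t + h)] / S(t)."""
    return (survivor(t) - survivor(t + h)) / survivor(t)

t0, h = 2.0, 0.01
print(hazard_from_density(t0))           # f/S
print(hazard_from_log_survivor(t0))      # -d log S / dt
print(conditional_exit_prob(t0, h) / h)  # close to lambda(t0) for small h
```

The third number differs slightly from the first two because $h$ is small but not zero; shrinking `h` narrows the gap.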

Using $F(0) = 0$, we can integrate to get

$$F(t) = 1 - \exp\left[-\int_0^t \lambda(s)\, ds\right], \quad t \ge 0$$

and

$$f(t) = \lambda(t) \exp\left[-\int_0^t \lambda(s)\, ds\right]$$

All probabilities can be computed using the hazard function. For $0 \le a_1 < a_2$:

$$P(a_1 \le T < a_2 \mid T \ge a_1) = \frac{P(a_1 \le T < a_2)}{P(T \ge a_1)} = 1 - \exp\left[-\int_{a_1}^{a_2} \lambda(s)\, ds\right]$$

Shape of the hazard function: duration dependence

1) If the hazard function is constant, $\lambda(t) = \lambda$, the process driving $T$ is without memory: the probability of exit in the next interval does not depend on how much time has been spent in the initial state. A constant hazard implies $F(t) = 1 - \exp[-\lambda t]$, which is the cdf of the exponential distribution.

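2) Weibull distribution:

$$F(t) = 1 - \exp(-\gamma t^{\alpha}), \quad \gamma > 0,\ \alpha > 0$$

$$f(t) = \gamma \alpha t^{\alpha - 1} \exp(-\gamma t^{\alpha}) \quad \text{and} \quad \lambda(t) = f(t)/S(t) = \gamma \alpha t^{\alpha - 1}$$

If $\alpha = 1$, the Weibull distribution reduces to the exponential.
If $\alpha > 1$, the hazard is monotonically increasing: positive duration dependence.
If $\alpha < 1$, the hazard is monotonically decreasing: negative duration dependence.

3) Log-logistic hazard function:

$$\lambda(t) = \frac{\gamma \alpha t^{\alpha - 1}}{1 + \gamma t^{\alpha}}, \quad F(t) = 1 - (1 + \gamma t^{\alpha})^{-1}, \quad f(t) = \gamma \alpha t^{\alpha - 1} (1 + \gamma t^{\alpha})^{-2}$$

The duration dependence depends on the value of $\alpha$: if $\alpha \le 1$ the hazard is monotonically decreasing (negative duration dependence); if $\alpha > 1$ the hazard first increases, reaching its maximum at $t = [(\alpha - 1)/\gamma]^{1/\alpha}$, and then decreases.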
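A minimal sketch of the Weibull and log-logistic hazards, together with a numerical check that integrating the hazard recovers the survivor function, $S(t) = \exp[-\int_0^t \lambda(s)\,ds]$. The parameter values ($\gamma = 1$ and the three values of $\alpha$) and all function names are assumptions for illustration.

```python
import math

def weibull_hazard(t, gamma, alpha):
    return gamma * alpha * t ** (alpha - 1)

def weibull_survivor(t, gamma, alpha):
    return math.exp(-gamma * t ** alpha)

def loglogistic_hazard(t, gamma, alpha):
    return gamma * alpha * t ** (alpha - 1) / (1 + gamma * t ** alpha)

def loglogistic_survivor(t, gamma, alpha):
    return 1.0 / (1 + gamma * t ** alpha)

def survivor_from_hazard(hazard, t, steps=20000):
    """exp(-integral_0^t hazard(s) ds), with the integral taken by the midpoint rule."""
    h = t / steps
    cumulative = sum(hazard((k + 0.5) * h) for k in range(steps)) * h
    return math.exp(-cumulative)

grid = [0.5, 1.0, 2.0, 4.0]
for alpha in (0.5, 1.0, 1.5):
    print("alpha =", alpha)
    print("  Weibull hazard:     ", [round(weibull_hazard(t, 1.0, alpha), 3) for t in grid])
    print("  log-logistic hazard:", [round(loglogistic_hazard(t, 1.0, alpha), 3) for t in grid])

# Recover S(2) from the hazard alone and compare with the closed form (alpha = 1.5).
s_numeric = survivor_from_hazard(lambda s: weibull_hazard(s, 1.0, 1.5), 2.0)
s_closed = weibull_survivor(2.0, 1.0, 1.5)
print(s_numeric, s_closed)
```

The midpoint rule is used because it never evaluates the hazard at $t = 0$, where the hazard is unbounded when $\alpha < 1$.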

[Figure: Illustration of some survivor and hazard functions. Four panels: Weibull hazard functions, log-logistic hazard functions, Weibull survivor functions, log-logistic survivor functions. Each panel plots the function against t from 0 to 5 for alpha = 0.5, alpha = 1 and alpha = 1.5.]

8.2.2 Hazard Functions Conditional on Time-Invariant Covariates

The conditional hazard is

$$\lambda(t; x) = \lim_{h \downarrow 0} \frac{P(t \le T < t+h \mid T \ge t, x)}{h}$$

where $x$ is a vector of explanatory variables. Equivalently,

$$\lambda(t; x) = \frac{f(t \mid x)}{1 - F(t \mid x)}$$

An important class with time-invariant regressors is the class of proportional hazard models:

$$\lambda(t; x) = \kappa(x)\, \lambda_0(t)$$

where $\kappa(\cdot) > 0$ is a function of $x$ and $\lambda_0(t) > 0$ is the baseline hazard, which captures the duration dependence.

Often $\kappa(\cdot)$ is parameterized as $\kappa(x) = \exp(x\beta)$. Then

$$\log \lambda(t; x) = x\beta + \log \lambda_0(t)$$

and $\beta_j$ is the elasticity of the hazard with respect to $z_j$ when $x_j = \log(z_j)$.
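To illustrate the proportional hazard form, the sketch below evaluates $\lambda(t; x) = \exp(x\beta)\lambda_0(t)$ for two covariate vectors that differ only in $z_1$, with $x_1 = \log(z_1)$. The baseline hazard, the coefficients and the covariate values are all hypothetical. The hazard ratio is the same at every $t$, and a 10 percent increase in $z_1$ changes the hazard by roughly $\beta_1 \times 10$ percent.

```python
import math

def ph_hazard(t, x, beta, baseline):
    """Proportional hazard: lambda(t; x) = exp(x . beta) * lambda_0(t)."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return math.exp(index) * baseline(t)

def baseline(t):
    """Hypothetical Weibull-type baseline hazard (an assumption for illustration)."""
    return 0.2 * 1.5 * t ** 0.5

beta = [0.4, -0.7]             # hypothetical coefficients
x_a = [math.log(10.0), 1.0]    # x_1 = log(z_1) with z_1 = 10
x_b = [math.log(11.0), 1.0]    # same person with z_1 raised by 10 percent

ratios = [ph_hazard(t, x_b, beta, baseline) / ph_hazard(t, x_a, beta, baseline)
          for t in (0.5, 1.0, 3.0)]
print(ratios)                  # identical across t: the baseline cancels
print((ratios[0] - 1) / 0.10)  # approximate elasticity, close to beta[0]
```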

8.2.3 Hazard Functions Conditional on Time-Varying Covariates

Let $x(t)$ be the vector of regressors at time $t$. For $t \ge 0$, $X(t)$ denotes the covariate path up through time $t$:

$$X(t) \equiv \{x(s) : 0 \le s \le t\}$$

The conditional hazard function at time $t$ is defined by

$$\lambda(t; X(t)) = \lim_{h \downarrow 0} \frac{P(t \le T < t+h \mid T \ge t, X(t+h))}{h}$$

Strict exogeneity of the covariates, where $X(t, t+h)$ denotes the covariate path from $t$ to $t+h$:

$$P(X(t, t+h) \mid T \ge t+h, X(t)) = P(X(t, t+h) \mid X(t))$$

Proportional hazard with time-varying covariates:

$$\lambda(t; x(t)) = \kappa(x(t))\, \lambda_0(t), \quad \text{with } \kappa(x(t)) = \exp(x(t)\beta)$$
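As a sketch of how a time-varying covariate enters, assume a strictly exogenous scalar covariate that is constant between known break points and a constant baseline hazard. Under these assumptions, which are illustrative and not part of the notes, the integrated hazard is a sum over the pieces of the covariate path, and the survivor function follows from $S(t) = \exp[-\int_0^t \exp(x(s)\beta)\lambda_0\, ds]$.

```python
import math

def survival_tv(t, breaks, x_path, beta, lam0):
    """Survivor function when x(s) = x_path[k] on [breaks[k], breaks[k+1]).

    The last piece extends to infinity. beta is a scalar coefficient and
    lam0 a constant baseline hazard; both are hypothetical.
    """
    cumulative = 0.0
    for k, xk in enumerate(x_path):
        start = breaks[k]
        end = breaks[k + 1] if k + 1 < len(breaks) else math.inf
        overlap = max(0.0, min(t, end) - start)   # time spent in piece k before t
        cumulative += math.exp(xk * beta) * lam0 * overlap
    return math.exp(-cumulative)

# Hypothetical path: an indicator equal to 1 for the first 4 periods, then 0.
breaks, x_path = [0.0, 4.0], [1.0, 0.0]
beta, lam0 = -0.5, 0.1
print(survival_tv(6.0, breaks, x_path, beta, lam0))
```

With $\beta < 0$ the hazard is lower while the indicator is on, so the survivor function falls more slowly over the first four periods than afterwards.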

8.3 Analysis of Single-Spell Data with Time-Invariant Covariates

The population of interest consists of individuals entering the initial state during a given interval of time $[0, b]$, where $b > 0$ is a known constant. We use at most one completed spell per individual: single-spell data.

8.3.1 Flow Sampling

We sample individuals who enter the state at some point during the interval $[0, b]$. The length of time each individual is in the initial state is recorded. Data on covariates known at the time the individual entered the state are collected.

Right censoring: some spells are not completed, because we stop tracking individuals at a fixed time.

8.3.2 ML under Flow Sampling and Right Censoring

For a random draw $i$ from the population, let $a_i \in [0, b]$ denote the time at which individual $i$ enters the initial state, let $t_i^*$ denote the length of time in the initial state (the duration), and let $x_i$ be the vector of observed covariates.

$f(t \mid x_i; \theta)$, $t \ge 0$, is the conditional density of $t_i^*$ given $x_i$.

Right censoring: $t_i = \min(t_i^*, c_i)$, where $c_i$ is the censoring time for individual $i$ and $t_i$ is the observed duration.

Conditional on the covariates, the true duration is independent of the starting point $a_i$ and the censoring time $c_i$:

$$D(t_i^* \mid x_i, a_i, c_i) = D(t_i^* \mid x_i)$$

Under this assumption, the distribution of $t_i^*$ given $(x_i, a_i, c_i)$ does not depend on $(a_i, c_i)$.

- If the duration is not censored, the contribution to the likelihood is the density: $f(t_i \mid x_i; \theta)$
- If the duration is censored, the contribution to the likelihood is the survivor function: $1 - F(c_i \mid x_i; \theta)$

Let $d_i$ be a censoring indicator ($d_i = 1$ if uncensored and $d_i = 0$ if censored). The conditional likelihood for observation $i$ is

$$f(t_i \mid x_i; \theta)^{d_i}\, [1 - F(t_i \mid x_i; \theta)]^{1 - d_i}$$

The MLE of $\theta$ is then obtained by maximising

$$\sum_{i=1}^{N} \left\{ d_i \log f(t_i \mid x_i; \theta) + (1 - d_i) \log[1 - F(t_i \mid x_i; \theta)] \right\}$$

The MLE is $\sqrt{N}$-consistent and asymptotically normal.
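The censored log-likelihood can be illustrated in the simplest special case: an exponential duration with no covariates and a common censoring time. This is a sketch under those assumptions, with simulated data and hypothetical parameter values. Here each term reduces to $d_i \log\lambda - \lambda t_i$, so the maximiser has the closed form $\hat\lambda = \sum_i d_i / \sum_i t_i$. Treating censored spells as if they were completed gives a different, biased estimate.

```python
import math
import random

def censored_exp_loglik(lam, t, d):
    """sum_i { d_i*log f(t_i) + (1 - d_i)*log[1 - F(t_i)] } for the exponential model.

    With f(t) = lam*exp(-lam*t) and 1 - F(t) = exp(-lam*t), each term equals
    d_i*log(lam) - lam*t_i.
    """
    return sum(di * math.log(lam) - lam * ti for ti, di in zip(t, d))

random.seed(0)
true_lam, c = 0.5, 2.0   # hypothetical hazard and common censoring time
t_star = [random.expovariate(true_lam) for _ in range(20000)]   # true durations
t = [min(ts, c) for ts in t_star]                               # observed durations
d = [1 if ts <= c else 0 for ts in t_star]                      # 1 = uncensored

lam_mle = sum(d) / sum(t)      # maximiser of the censored log-likelihood
lam_naive = len(t) / sum(t)    # ignores censoring: treats every spell as completed

print(lam_mle, lam_naive)
```

With covariates or other distributions there is generally no closed form and the sum would be maximised numerically, but the structure of the objective is the same.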

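Sometimes the parameters of interest are the effects of the covariates on the expected duration rather than on the hazard. In that case we can apply a censored Tobit analysis to the log of the duration.

Suppose

$$\log(t_i^*) \mid x_i \sim N(x_i\delta, \sigma^2)$$

The hazard function is

$$\lambda(t; x) = \frac{\dfrac{1}{\sigma t}\, \phi\!\left(\dfrac{\log t - x\delta}{\sigma}\right)}{1 - \Phi\!\left(\dfrac{\log t - x\delta}{\sigma}\right)}$$

This hazard is not monotonic and does not have the proportional hazard form.

The estimates of $\delta$ are easy to interpret, because the model is equivalent to

$$\log(t_i^*) = x_i\delta + e_i, \quad \text{where } e_i \mid x_i \sim N(0, \sigma^2)$$

The $\delta_j$ are semi-elasticities (or elasticities if the regressors are in log form) of the expected duration.

The Weibull model can also be represented in regression form, with $\delta_j = -\beta_j/\alpha$. The Weibull density is

$$f(t \mid x; \theta) = \exp(x\beta)\, \alpha t^{\alpha - 1} \exp[-\exp(x\beta)\, t^{\alpha}]$$

The residual in the regression equation has a type I extreme value distribution.

The log-logistic model can also be represented in regression form, where $e$ has a zero-mean logistic distribution, is independent of $x$, and $\delta = -\alpha^{-1}\beta$. The log-logistic hazard is

$$\lambda(t; x) = \frac{\exp(x\beta)\, \alpha t^{\alpha - 1}}{1 + \exp(x\beta)\, t^{\alpha}}$$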
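The non-monotonic shape of the lognormal hazard can be seen by evaluating it on a grid. The sketch below assumes $x\delta = 0$ and $\sigma = 1$, which are hypothetical values, and builds the standard normal pdf and cdf from the standard library.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def lognormal_hazard(t, mu, sigma):
    """Hazard of t* when log(t*) ~ N(mu, sigma^2): density over survivor function."""
    z = (math.log(t) - mu) / sigma
    return norm_pdf(z) / (sigma * t * (1 - norm_cdf(z)))

grid = [0.1, 0.5, 1.0, 3.0, 10.0]
values = [lognormal_hazard(t, 0.0, 1.0) for t in grid]
for t, lam in zip(grid, values):
    print(t, round(lam, 4))
```

On this grid the hazard rises at short durations and then falls, which is the pattern a monotone Weibull hazard cannot produce.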

8.3.3 Stock Sampling

Individuals who are in the initial state are sampled at a given point in time. Now both right and left censoring are possible. Without correction, there is stock sampling bias.

Left censoring occurs when some starting times $a_i$ are not observed.

The sample selection problem caused by stock sampling is called length-biased sampling: long spells are more likely to be in progress at the sampling date, so they are over-represented.

Assumptions:
a) the starting times $a_i$ are observed for all individuals sampled at time $b$
b) the sampled individuals can be observed for a certain length of time

Let $(a_i, c_i, x_i, t_i)$ be a random draw from the population of all spells starting in $[0, b]$. This vector is observed only if the person is still in the initial state at time $b$, that is, if $t_i^* \ge b - a_i$.

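Under the conditional independence assumption $D(t_i^* \mid x_i, a_i, c_i) = D(t_i^* \mid x_i)$:

$$P(t_i^* \ge b - a_i \mid x_i, a_i, c_i) = 1 - F(b - a_i \mid x_i; \theta)$$

The log-likelihood function, with the truncated density and probability, can be written as

$$\sum_{i=1}^{N} \left\{ d_i \log f(t_i \mid x_i; \theta) + (1 - d_i) \log[1 - F(t_i \mid x_i; \theta)] - \log[1 - F(b - a_i \mid x_i; \theta)] \right\}$$

where $t_i = c_i$ when $d_i = 0$.

If all units are right censored at the interview date, then $t_i = c_i = b - a_i$ for every $i$, the last two terms cancel, and the previous log-likelihood does not identify $\theta$.

Even when all observed durations are censored at the interview date, $\theta$ can still be estimated provided a model for the conditional distribution of the starting times, $D(a_i \mid x_i)$, is specified.

$D(a_i \mid x_i)$ is assumed to be continuous on $[0, b)$ with density $k(\cdot \mid x_i; \eta)$. Let $s_i$ be a sample selection indicator equal to 1 if a random draw is observed, i.e. if $t_i^* \ge b - a_i$.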

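Estimation of $\theta$ and $\eta$ can proceed by applying CMLE to the density of $a_i$ conditional on $x_i$ and $s_i = 1$. This conditional density is

$$p(a \mid x_i, s_i = 1) = \frac{k(a \mid x_i; \eta)\, [1 - F(b - a \mid x_i; \theta)]}{P(s_i = 1 \mid x_i; \theta, \eta)}$$

where $0 < a < b$ and

$$P(s_i = 1 \mid x_i; \theta, \eta) = \int_0^b [1 - F(b - u \mid x_i; \theta)]\, k(u \mid x_i; \eta)\, du$$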
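Stock sampling bias and the truncation correction from this section can be illustrated with a small simulation. The sketch assumes an exponential duration, no covariates, starting times uniform on $[0, b]$, and no right censoring (every sampled spell is followed to completion, so $d_i = 1$); all of these are simplifying assumptions, not part of the notes. In that case the truncated log-likelihood is $\sum_i [\log\lambda - \lambda t_i + \lambda(b - a_i)]$, which is maximised at $\hat\lambda = N / \sum_i [t_i - (b - a_i)]$.

```python
import random

random.seed(1)
true_lam, b = 0.5, 4.0   # hypothetical hazard and sampling date

# Draw spells starting in [0, b]; keep only those still in progress at time b.
sample = []
for _ in range(50000):
    a = random.uniform(0.0, b)
    t_star = random.expovariate(true_lam)
    if t_star >= b - a:              # selection rule under stock sampling
        sample.append((a, t_star))

n = len(sample)
# Naive MLE ignores the selection and uses the untruncated exponential density.
lam_naive = n / sum(t for _, t in sample)
# Truncation-corrected MLE subtracts the elapsed time b - a_i from each duration.
lam_trunc = n / sum(t - (b - a) for a, t in sample)

print(n, lam_naive, lam_trunc)
```

The naive estimate of the hazard is too low because long spells are over-represented among those still in progress at time $b$; the corrected estimate is close to the value used in the simulation.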

8.3.4 Unobserved Heterogeneity

The key assumptions used in most models that incorporate unobserved heterogeneity are:
(1) heterogeneity is independent of the observed covariates, as well as of starting times and censoring times;
(2) heterogeneity has a distribution known up to a finite number of parameters;
(3) heterogeneity enters the hazard function multiplicatively.

Example: Weibull hazard function conditional on $x_i$ and $v_i$:

$$\lambda(t; x_i, v_i) = v_i \exp(x_i\beta)\, \alpha t^{\alpha - 1}, \quad \text{where } x_{i1} \equiv 1 \text{ and } v_i > 0$$

Identification of $\alpha$ and $\beta$ requires a normalisation: $E(v_i) = 1$.

Integrate out the unobserved effect:

$$G(t \mid x_i; \theta, \rho) = \int_0^{\infty} F(t \mid x_i, v; \theta)\, h(v; \rho)\, dv$$

The density $g(t \mid x_i; \theta, \rho)$ can also be obtained. So the same methods as in Sections 8.3.2 and 8.3.3 can be used, replacing $F(t \mid x_i; \theta)$ by $G(t \mid x_i; \theta, \rho)$ and $f(t \mid x_i; \theta)$ by $g(t \mid x_i; \theta, \rho)$.

Suppose the heterogeneity is gamma distributed, $v_i \sim \text{Gamma}(\delta, \delta)$, so that $E(v_i) = 1$. The cdf of $t_i^*$ given $(x_i, v_i)$ is

$$F(t \mid x_i, v_i) = 1 - \exp\left[-v_i \int_0^t k(s; x_i)\, ds\right] = 1 - \exp[-v_i\, \xi(t; x_i)]$$

where

$$\lambda(t; x_i, v_i) = v_i\, k(t; x_i), \quad \xi(t; x_i) \equiv \int_0^t k(s; x_i)\, ds$$

and the density of $v_i$ is

$$h(v) = \delta^{\delta} v^{\delta - 1} \exp(-\delta v) / \Gamma(\delta)$$

Then

$$G(t \mid x_i) = 1 - [1 + \xi(t; x_i)/\delta]^{-\delta}$$

and

$$g(t \mid x_i) = k(t; x_i)\, [1 + \xi(t; x_i)/\delta]^{-(\delta + 1)}$$

With the Weibull hazard, the resulting duration distribution is the Burr distribution.

Unobserved heterogeneity matters for duration dependence: conditional on $x$ only there can be duration dependence, while conditional on $x$ and $v$ there is no duration dependence.

Example: time-constant hazards with discrete heterogeneity.
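The gamma mixture result can be checked numerically. The sketch below integrates $\exp(-v\xi)\,h(v)$ over $v$ with a midpoint rule and compares the result with the closed form $[1 + \xi/\delta]^{-\delta}$ for the survivor function $1 - G$. It then evaluates the implied hazard $g/(1 - G) = k/(1 + \xi/\delta)$ when the hazard conditional on $v$ is constant. The values of $\delta$, $\xi$ and the constant hazard are hypothetical.

```python
import math

def gamma_density(v, delta):
    """Gamma(delta, delta) density: mean 1, variance 1/delta."""
    return delta ** delta * v ** (delta - 1) * math.exp(-delta * v) / math.gamma(delta)

def mixed_survivor_numeric(xi, delta, upper=40.0, steps=40000):
    """Midpoint-rule approximation of integral_0^inf exp(-v*xi) h(v) dv."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += math.exp(-v * xi) * gamma_density(v, delta)
    return total * h

def mixed_survivor_closed(xi, delta):
    """1 - G(t | x) = [1 + xi/delta]^(-delta)."""
    return (1 + xi / delta) ** (-delta)

def mixed_hazard(t, lam, delta):
    """g/(1 - G) when k(t; x) = lam is constant, so that xi(t) = lam * t."""
    return lam / (1 + lam * t / delta)

print(mixed_survivor_numeric(1.5, 2.0), mixed_survivor_closed(1.5, 2.0))
print([round(mixed_hazard(t, 0.5, 2.0), 4) for t in (0.0, 1.0, 5.0)])
```

Although each individual has a constant hazard in this example, the hazard conditional on $x$ alone declines with $t$: individuals with large $v$ exit early, so the survivors are increasingly those with small $v$.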

8.4 Analysis of Grouped Data

Grouped data arise when each duration is only known to fall into a certain time interval. Panel data allow grouped durations to be treated.

The timeline is divided into $M + 1$ intervals, $[0, a_1), [a_1, a_2), \ldots, [a_{M-1}, a_M), [a_M, \infty)$, where the $a_m$ are known constants.

Let $c_m$ be a binary censoring indicator equal to 1 if the duration is censored in interval $m$. Similarly, $y_m$ is a binary indicator equal to 1 if the duration ends in the $m$th interval.

For each person $i$, we observe $\{(y_{i1}, c_{i1}), \ldots, (y_{iM}, c_{iM}), x_i\}$, which is a balanced panel.

A parametric hazard function is specified as $\lambda(t; x, \theta)$. Let $T$ denote the time until exit from the initial state.

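$T$ is not fully observed: we know which interval it falls into and whether it was censored in a particular interval. We can thus obtain

$$P(y_m = 0 \mid y_{m-1} = 0, x, c_m = 0) \quad \text{and} \quad P(y_m = 1 \mid y_{m-1} = 0, x, c_m = 0), \quad m = 1, \ldots, M$$

Under the assumption that $T$ is independent of $c_1, \ldots, c_M$ given $x$ (random censoring), we have

$$P(y_m = 1 \mid y_{m-1} = 0, x, c_m = 0) = P(a_{m-1} \le T < a_m \mid T \ge a_{m-1}, x) = 1 - \exp\left[-\int_{a_{m-1}}^{a_m} \lambda(s; x, \theta)\, ds\right] \equiv 1 - \alpha_m(x, \theta)$$

Therefore,

$$P(y_m = 0 \mid y_{m-1} = 0, x, c_m = 0) = \alpha_m(x, \theta)$$

We can use these probabilities to construct the likelihood function for observation $i$. Let $m_i$ denote the interval in which the duration of person $i$ either ends or is censored, and let $d_i = 1$ if the duration is uncensored. The log-likelihood for observation $i$ is

$$\sum_{h=1}^{m_i - 1} \log \alpha_h(x_i, \theta) + d_i \log[1 - \alpha_{m_i}(x_i, \theta)] \qquad (1)$$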
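A minimal sketch of the contribution of one observation to the grouped-data log-likelihood: the sum of $\log\alpha_h$ over the intervals survived, plus $\log(1 - \alpha_{m_i})$ if the spell is observed to end. The $\alpha$ values below are hypothetical numbers, not estimates.

```python
import math

def grouped_loglik_i(alphas, m_i, d_i):
    """Log-likelihood contribution of one observation.

    alphas[h - 1] holds alpha_h(x_i, theta), the probability of surviving
    interval h given survival to its start. m_i (1-based) is the interval in
    which the spell ends or is censored; d_i = 1 if the spell is uncensored.
    """
    ll = sum(math.log(alphas[h]) for h in range(m_i - 1))   # intervals 1..m_i - 1 survived
    if d_i == 1:
        ll += math.log(1.0 - alphas[m_i - 1])               # exit observed in interval m_i
    return ll

alphas = [0.9, 0.8, 0.7]   # hypothetical alpha_1, alpha_2, alpha_3
print(math.exp(grouped_loglik_i(alphas, 3, 1)))   # exit in interval 3
print(math.exp(grouped_loglik_i(alphas, 3, 0)))   # censored in interval 3
```

Summing this function over all observations gives the objective that the CMLE maximises over $\theta$ once the $\alpha_h$ are written as functions of $x_i$ and $\theta$.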

To implement the CMLE, a hazard function must be specified. A popular choice is a piecewise-constant proportional hazard:

$$\lambda(t; x, \theta) = \kappa(x, \beta)\, \lambda_m, \quad a_{m-1} \le t < a_m$$

where $\kappa(x, \beta) > 0$ (typically $\kappa(x, \beta) = \exp(x\beta)$). With this function, we have

$$\alpha_m(x, \theta) = \exp[-\exp(x\beta)\, \lambda_m (a_m - a_{m-1})]$$

and $\beta$ and $\lambda$ can be estimated.

Without covariates, the MLE of $\lambda_m$ leads to a well-known estimator of the survivor function: the Kaplan-Meier estimator. The survivor function at time $a_m$ is

$$S(a_m) = P(T > a_m) = \prod_{r=1}^{m} P(T > a_r \mid T > a_{r-1})$$

$N_r$ denotes the number of persons in the risk set for interval $r$ (those who have neither left the state nor been censored at time $a_{r-1}$, which is the beginning of interval $r$), and $E_r$ is the number of persons observed to leave the state in the $r$th interval.

Therefore, a consistent estimator of the survivor function at time $a_m$ is

$$\hat{S}(a_m) = \prod_{r=1}^{m} \frac{N_r - E_r}{N_r}, \quad m = 1, 2, \ldots, M$$
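The two formulas in this part can be sketched directly: the interval survival probability $\alpha_m$ under the piecewise-constant proportional hazard, and the Kaplan-Meier product for grouped data. The risk-set counts, exits and parameter values are hypothetical.

```python
import math

def alpha_m(x_beta, lam_m, a_prev, a_m):
    """alpha_m = exp[-exp(x*beta) * lambda_m * (a_m - a_{m-1})]."""
    return math.exp(-math.exp(x_beta) * lam_m * (a_m - a_prev))

def kaplan_meier_grouped(at_risk, exits):
    """S_hat(a_m) = prod_{r=1}^{m} (N_r - E_r) / N_r, returned for m = 1..M."""
    survivor, running = [], 1.0
    for n_r, e_r in zip(at_risk, exits):
        running *= (n_r - e_r) / n_r
        survivor.append(running)
    return survivor

at_risk = [100, 80, 50]   # hypothetical N_r: risk set at the start of each interval
exits = [15, 20, 10]      # hypothetical E_r: observed exits in each interval
s_hat = kaplan_meier_grouped(at_risk, exits)
print(s_hat)

# Survival probability over an interval of length 1 with x*beta = 0 and lambda_m = 0.2.
print(alpha_m(0.0, 0.2, 0.0, 1.0))
```

In the hypothetical counts the risk set shrinks by more than the number of exits between intervals; the difference represents spells censored along the way, which leave the risk set without counting as exits.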

8.4.2 Time-Varying Covariates

Deriving the log-likelihood function in this case is more complicated, especially when strict exogeneity is not assumed. Nevertheless, if the regressors are constant within each time interval $[a_{m-1}, a_m)$, the form of the log-likelihood is the same as in Section 8.4.1, with $x_i$ replaced by $x_{im}$ in interval $m$.

The conditional independence assumption on the censoring indicator is

$$D(T \mid T \ge a_{m-1}, x_m, c_m) = D(T \mid T \ge a_{m-1}, x_m), \quad m = 1, \ldots, M$$

Under this assumption, the probability of exit (without censoring) is

$$P(y_m = 1 \mid y_{m-1} = 0, x_m, c_m = 0) = P(a_{m-1} \le T < a_m \mid T \ge a_{m-1}, x_m) = 1 - \exp\left[-\int_{a_{m-1}}^{a_m} \lambda(s; x_m, \theta)\, ds\right] \equiv 1 - \alpha_m(x_m, \theta)$$

Therefore, the partial log-likelihood is given by equation (1) with $\alpha_h(x_i, \theta)$ replaced by $\alpha_h(x_{ih}, \theta)$ and $\alpha_{m_i}(x_i, \theta)$ replaced by $\alpha_{m_i}(x_{i,m_i}, \theta)$.

The covariates and the censoring are strictly exogenous if, with $x \equiv (x_1, \ldots, x_M)$ and $c \equiv (c_1, \ldots, c_M)$,

$$D(T \mid T \ge a_{m-1}, x, c) = D(T \mid T \ge a_{m-1}, x_m), \quad m = 1, \ldots, M$$

With time-varying covariates, the hazard specification is

$$\lambda(t; x_m, \theta) = \kappa(x_m, \beta)\, \lambda_m, \quad a_{m-1} \le t < a_m$$

8.4.3 Unobserved Heterogeneity

With time-varying covariates and unobserved heterogeneity, it is difficult to relax the strict exogeneity assumption. It is assumed that the regressors are strictly exogenous conditional on the unobserved heterogeneity and that the unobserved heterogeneity is independent of the regressors.

In the leading case of the piecewise-constant baseline hazard, the hazard becomes

$$\lambda(t; v, x_m, \theta) = v\, \kappa(x_m, \beta)\, \lambda_m, \quad a_{m-1} \le t < a_m$$

The density of $(y_{i1}, \ldots, y_{iM})$ given $(v_i, x_i, c_i)$ is

$$\left[\prod_{h=1}^{m_i - 1} \alpha_h(v_i, x_{ih}, \theta)\right] [1 - \alpha_{m_i}(v_i, x_{i,m_i}, \theta)]^{d_i} \qquad (2)$$

Because equation (2) depends on the unobserved heterogeneity $v_i$, we cannot use it directly to consistently estimate $\theta$. We can integrate out the unobserved effect in equation (2) to obtain the density of the $y$'s given the regressors and the censoring indicators. Based on this density, the CMLE can be used.