Chapter 20 Duration Analysis

Duration: time elapsed until a certain event occurs (weeks unemployed, months spent on welfare).

Survival analysis: the duration of interest is the survival time of a subject, who begins in an initial state and is observed either to exit from that state or to be censored (still in the state).

Examples:
- Unemployment: time to leave unemployment (UE)
- Medicine: time to death or until a specific treatment
- Traffic: time until accident
- Firm: time until the firm closes

Goal: model the dependence of T on covariates.
Approaches:
1) We could model the duration as Y, where Y might be censored and the censoring point might vary between individuals: a Tobit-type approach. Why should we use any methods other than Tobit?
2) Instead of modelling the duration, one often models the hazard rate. This permits time-varying covariates.
3) The hazard approach is also more helpful for extending to competing risk models (multiple exit states), where not only the duration is of interest.

Other issues:
- left and right censoring
- endogenous variables
- multiple spells
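8.2 Hazard Function: Hazard Functions without Covariates

Notation: $T \ge 0$ is the time at which a person leaves the initial state; it has some distribution in the population. $t$ denotes a particular value of $T$.

The cdf of $T$ is $F(t) = P(T \le t)$, $t \ge 0$.

Survival function: $S(t) = 1 - F(t) = P(T > t)$, the probability of surviving past time $t$.

The pdf of $T$ is $f(t) = dF(t)/dt$.

The hazard function gives the instantaneous rate of leaving the initial state: the probability of leaving in the interval $[t, t + h)$, given survival up until time $t$, per unit of time as $h$ shrinks to zero:

$$\lambda(t) = \lim_{h \downarrow 0} \frac{P(t \le T < t + h \mid T \ge t)}{h}$$

For small $h$, it follows that

$$P(t \le T < t + h \mid T \ge t) \approx \lambda(t)\, h$$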
Examples:
1. Unemployment duration: $T$ is the length of time unemployed in weeks. Then $\lambda(20)$ is (approximately) the probability of becoming employed between weeks 20 and 21, conditional on having been unemployed up to week 20.
2. Recidivism duration: $T$ is the number of months before a former prisoner is arrested for a crime. Then $\lambda(12)$ is (approximately) the probability of being arrested during the 13th month, conditional on not having been arrested during the first year.

The hazard function can be expressed in terms of the pdf and cdf of $T$:

1. $$P(t \le T < t + h \mid T \ge t) = \frac{P(t \le T < t + h)}{P(T \ge t)} = \frac{F(t + h) - F(t)}{1 - F(t)}$$

2. $$\lambda(t) = \lim_{h \downarrow 0} \frac{F(t + h) - F(t)}{h} \cdot \frac{1}{1 - F(t)} = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)} = -\frac{dS(t)/dt}{S(t)} = -\frac{d \log S(t)}{dt}$$
Using $F(0) = 0$, we can integrate to get

$$F(t) = 1 - \exp\left[-\int_0^t \lambda(s)\, ds\right], \quad t \ge 0, \qquad \text{and} \qquad f(t) = \lambda(t) \exp\left[-\int_0^t \lambda(s)\, ds\right]$$

All probabilities can be computed using the hazard function:

$$P(a_1 \le T < a_2 \mid T \ge a_1) = \frac{P(a_1 \le T < a_2)}{P(T \ge a_1)} = 1 - \exp\left[-\int_{a_1}^{a_2} \lambda(s)\, ds\right]$$

Shape of the hazard function: duration dependence

1) If the hazard function is constant, $\lambda(t) = \lambda$, the process driving $T$ is without memory: the probability of exit in the next interval does not depend on how much time has been spent in the initial state. A constant hazard implies

$$F(t) = 1 - \exp[-\lambda t],$$

which is the cdf of the exponential distribution.
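The relation between the hazard and the cdf can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names, the midpoint-rule integration, and the rate 0.3 are illustrative assumptions.

```python
import math

def cdf_from_hazard(hazard, t, steps=10_000):
    """F(t) = 1 - exp(-integral_0^t hazard(s) ds), integral by the midpoint rule."""
    width = t / steps
    integral = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return 1.0 - math.exp(-integral)

def interval_exit_prob(hazard, a1, a2, steps=10_000):
    """P(a1 <= T < a2 | T >= a1) = 1 - exp(-integral_a1^a2 hazard(s) ds)."""
    width = (a2 - a1) / steps
    integral = sum(hazard(a1 + (k + 0.5) * width) for k in range(steps)) * width
    return 1.0 - math.exp(-integral)

# Constant hazard with a hypothetical rate of 0.3 per period.
rate = 0.3
constant_hazard = lambda s: rate

# The integrated hazard should reproduce the exponential cdf.
f_numeric = cdf_from_hazard(constant_hazard, 2.0)
f_exact = 1.0 - math.exp(-rate * 2.0)

# Memorylessness: the exit probability over an interval of length 1
# is the same whether the interval starts at 0 or at 5.
p_early = interval_exit_prob(constant_hazard, 0.0, 1.0)
p_late = interval_exit_prob(constant_hazard, 5.0, 6.0)
```

Passing a non-constant hazard to `interval_exit_prob` makes the two exit probabilities differ, which is what duration dependence means.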
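2) Weibull distribution:

$$F(t) = 1 - \exp(-\gamma t^{\alpha}), \quad \gamma > 0,\ \alpha > 0$$

$$f(t) = \gamma \alpha t^{\alpha - 1} \exp(-\gamma t^{\alpha}) \qquad \text{and} \qquad \lambda(t) = f(t)/S(t) = \gamma \alpha t^{\alpha - 1}$$

- If $\alpha = 1$, the Weibull distribution reduces to the exponential.
- If $\alpha > 1$, the hazard is monotonically increasing: positive duration dependence.
- If $\alpha < 1$, the hazard is monotonically decreasing: negative duration dependence.

3) Log-logistic hazard function:

$$\lambda(t) = \frac{\gamma \alpha t^{\alpha - 1}}{1 + \gamma t^{\alpha}}, \qquad F(t) = 1 - (1 + \gamma t^{\alpha})^{-1}, \qquad f(t) = \gamma \alpha t^{\alpha - 1} (1 + \gamma t^{\alpha})^{-2}$$

The duration dependence depends on the value of $\alpha$: for $\alpha \le 1$ the hazard is monotonically decreasing (negative duration dependence); for $\alpha > 1$ it first increases and then decreases.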
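The shapes described above can be checked by evaluating the two hazards on a grid. This is a small sketch; the grid, the value $\gamma = 1$, and the helper names are assumptions chosen for illustration.

```python
def weibull_hazard(t, gamma, alpha):
    """Weibull hazard: gamma * alpha * t^(alpha - 1)."""
    return gamma * alpha * t ** (alpha - 1)

def loglogistic_hazard(t, gamma, alpha):
    """Log-logistic hazard: gamma * alpha * t^(alpha - 1) / (1 + gamma * t^alpha)."""
    return gamma * alpha * t ** (alpha - 1) / (1 + gamma * t ** alpha)

def is_increasing(values):
    return all(later > earlier for earlier, later in zip(values, values[1:]))

def is_decreasing(values):
    return all(later < earlier for earlier, later in zip(values, values[1:]))

grid = [0.25 * k for k in range(1, 21)]  # t = 0.25, 0.5, ..., 5.0
gamma = 1.0                              # illustrative scale parameter

weibull_up = [weibull_hazard(t, gamma, 1.5) for t in grid]     # alpha > 1
weibull_flat = [weibull_hazard(t, gamma, 1.0) for t in grid]   # alpha = 1
weibull_down = [weibull_hazard(t, gamma, 0.5) for t in grid]   # alpha < 1

loglog_down = [loglogistic_hazard(t, gamma, 0.5) for t in grid]  # alpha <= 1
loglog_hump = [loglogistic_hazard(t, gamma, 1.5) for t in grid]  # alpha > 1

# Position of the largest log-logistic hazard on the grid when alpha = 1.5.
peak_t = grid[loglog_hump.index(max(loglog_hump))]
```

With $\alpha = 1.5$ the log-logistic hazard is neither increasing nor decreasing over the whole grid; its peak sits at an interior point, unlike the Weibull case.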
[Figure: Illustration of some survivor and hazard functions. Four panels: Weibull hazard functions, log-logistic hazard functions, Weibull survivor functions, log-logistic survivor functions. Horizontal axis: t from 0 to 5; curves for alpha = 0.5, alpha = 1, alpha = 1.5.]
8.2.2 Hazard Functions Conditional on Time-Invariant Covariates

The conditional hazard is

$$\lambda(t; x) = \lim_{h \downarrow 0} \frac{P(t \le T < t + h \mid T \ge t, x)}{h}$$

where $x$ is a vector of explanatory variables. In terms of the conditional pdf and cdf:

$$\lambda(t; x) = \frac{f(t \mid x)}{1 - F(t \mid x)}$$

Important class with time-invariant regressors: proportional hazard models

$$\lambda(t; x) = \kappa(x)\, \lambda_0(t)$$

where $\kappa(\cdot) > 0$ is a function of $x$ and $\lambda_0(t) > 0$ is the baseline hazard (it captures the duration dependence).

Often $\kappa(\cdot)$ is parameterized as $\kappa(x) = \exp(x\beta)$. Then

$$\log \lambda(t; x) = x\beta + \log \lambda_0(t)$$

and $\beta_j$ is the elasticity of the hazard with respect to $z_j$ when $x_j = \log(z_j)$.
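Two properties of the proportional hazard form can be illustrated numerically: the hazard ratio between two individuals does not depend on $t$, and a coefficient on a logged covariate acts as an elasticity. The sketch below uses a hypothetical baseline hazard and made-up coefficients; none of the numbers come from the notes.

```python
import math

def ph_hazard(t, x, beta, baseline_hazard):
    """Proportional hazard kappa(x) * lambda_0(t) with kappa(x) = exp(x beta)."""
    index = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    return math.exp(index) * baseline_hazard(t)

# Hypothetical baseline hazard (Weibull-type with alpha = 1.5).
baseline = lambda t: 1.5 * t ** 0.5
beta = [0.5, -0.2]  # illustrative coefficients

# The first covariate enters in logs: x_1 = log(z_1).
x_a = [math.log(2.0), 1.0]
x_b = [math.log(3.0), 0.0]

# The hazard ratio between two individuals is the same at every t.
ratio_early = ph_hazard(1.0, x_a, beta, baseline) / ph_hazard(1.0, x_b, beta, baseline)
ratio_late = ph_hazard(4.0, x_a, beta, baseline) / ph_hazard(4.0, x_b, beta, baseline)

# Elasticity: raising z_1 by 1% changes the hazard by roughly beta_1 percent.
x_up = [math.log(2.0 * 1.01), 1.0]
pct_change = 100 * (ph_hazard(1.0, x_up, beta, baseline)
                    / ph_hazard(1.0, x_a, beta, baseline) - 1)
```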
8.2.3 Hazard Functions Conditional on Time-Varying Covariates

Let $x(t)$ be the vector of regressors at time $t$. For $t \ge 0$, $X(t)$ denotes the covariate path up through time $t$:

$$X(t) \equiv \{x(s) : 0 \le s \le t\}$$

The conditional hazard function at time $t$ is defined by

$$\lambda[t; X(t)] = \lim_{h \downarrow 0} \frac{P[t \le T < t + h \mid T \ge t, X(t + h)]}{h}$$

Strict exogeneity of the covariates:

$$P[X(t, t + h) \mid T \ge t + h, X(t)] = P[X(t, t + h) \mid X(t)]$$

where $X(t, t + h)$ denotes the covariate path from time $t$ to time $t + h$.

Proportional hazard with time-varying covariates:

$$\lambda[t; x(t)] = \kappa[x(t)]\, \lambda_0(t), \quad \text{with} \quad \kappa[x(t)] = \exp[x(t)\beta]$$
8.3 Analysis of Single-Spell Data with Time-Invariant Covariates

The population of interest consists of individuals entering the initial state during a given interval of time $[0, b]$, where $b > 0$ is a known constant. We use at most one completed spell per individual: single-spell data.

8.3.1 Flow Sampling

- Individuals entering the state at some point during the interval $[0, b]$ are sampled.
- The length of time each individual is in the initial state is recorded.
- Data on covariates known at the time the individual entered the state are collected.
- Right censoring: some spells are not completed, because we stop tracking individuals at a fixed time.

8.3.2 ML under Flow Sampling and Right Censoring

For a random draw $i$ from the population, let $a_i \in [0, b]$ denote the time at which individual $i$ enters the initial state, let $t_i^*$ denote the length of time in the initial state (the duration), and let $x_i$ be the vector of observed covariates.

$f(t \mid x_i; \theta)$, $t \ge 0$: conditional density of $t_i^*$ given $x_i$.

Right censoring: $t_i = \min(t_i^*, c_i)$, where $c_i$ is the censoring time for individual $i$ and $t_i$ is the observed duration.
Conditional on the covariates, the true duration is independent of the starting point $a_i$ and the censoring time $c_i$:

$$D(t_i^* \mid x_i, a_i, c_i) = D(t_i^* \mid x_i)$$

Under this assumption, the distribution of $t_i^*$ given $(x_i, a_i, c_i)$ does not depend on $(a_i, c_i)$.

- If the duration is not censored, the contribution to the likelihood is the density: $f(t_i \mid x_i; \theta)$.
- If the duration is censored, the contribution to the likelihood is the survivor function: $1 - F(c_i \mid x_i; \theta)$.

Let $d_i$ be a censoring indicator ($d_i = 1$ if uncensored and $d_i = 0$ if censored). The conditional likelihood for observation $i$ is

$$f(t_i \mid x_i; \theta)^{d_i}\, [1 - F(t_i \mid x_i; \theta)]^{1 - d_i}$$

The MLE of $\theta$ is then obtained by maximising

$$\sum_{i=1}^{N} \left\{ d_i \log f(t_i \mid x_i; \theta) + (1 - d_i) \log[1 - F(t_i \mid x_i; \theta)] \right\}$$

The MLE is $\sqrt{N}$-consistent and asymptotically normal.
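As a concrete instance of the censored log-likelihood, take an exponential duration with no covariates, so that $\log f(t) = \log\lambda - \lambda t$ and $\log[1 - F(t)] = -\lambda t$. The maximiser is then $\hat\lambda = \sum_i d_i / \sum_i t_i$. The sketch below simulates flow-sampled data; the sample size, the true rate, and the tracking cutoff are assumptions made only for illustration.

```python
import math
import random

def censored_loglik(rate, durations, uncensored):
    """Sum of d*log f(t) + (1 - d)*log[1 - F(t)] for an exponential duration."""
    total = 0.0
    for t, d in zip(durations, uncensored):
        log_density = math.log(rate) - rate * t   # log f(t)
        log_survivor = -rate * t                  # log[1 - F(t)]
        total += d * log_density + (1 - d) * log_survivor
    return total

random.seed(0)
true_rate = 0.5   # hypothetical hazard
b = 2.0           # entry window [0, b]
stop_time = 3.0   # calendar time at which tracking stops (assumed)

durations, uncensored = [], []
for _ in range(5000):
    a_i = random.uniform(0.0, b)            # entry time
    t_star = random.expovariate(true_rate)  # true duration
    c_i = stop_time - a_i                   # censoring time
    durations.append(min(t_star, c_i))      # observed duration t_i
    uncensored.append(1 if t_star <= c_i else 0)

# Closed-form maximiser of the censored exponential log-likelihood.
rate_hat = sum(uncensored) / sum(durations)
```

Dropping the censored spells, or treating them as completed, would change the denominator or numerator and bias the estimate; the survivor term is what keeps the censored spells informative.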
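When the parameters of interest are the effects of covariates on the expected duration rather than on the hazard, we can apply a censored Tobit analysis to the log of the duration. Suppose

$$\log(t_i^*) \mid x_i \sim N(x_i \delta, \sigma^2)$$

The hazard function is

$$\lambda(t; x) = \frac{\phi\left[(\log t - x\delta)/\sigma\right]}{\sigma t \left\{1 - \Phi\left[(\log t - x\delta)/\sigma\right]\right\}}$$

This hazard is not monotonic and does not have the PH form.

The estimates of $\delta$ are easy to interpret, because the model is equivalent to

$$\log(t_i^*) = x_i \delta + e_i, \quad \text{where} \quad e_i \mid x_i \sim N(0, \sigma^2)$$

These are semi-elasticities (or elasticities if the regressors are in log form) of the expected duration.

The Weibull model can also be represented in regression form, with $\delta_j = -\beta_j/\alpha$. The Weibull density is

$$f(t \mid x; \theta) = \exp(x\beta)\, \alpha t^{\alpha - 1} \exp\left[-\exp(x\beta)\, t^{\alpha}\right]$$

and the residual in the regression equation has a (scaled) type I extreme value distribution.

The log-logistic model can also be represented in regression form, where $e_i$ has a zero-mean logistic distribution and is independent of $x_i$, with $\delta = -\alpha^{-1}\beta$. The log-logistic hazard is

$$\lambda(t; x) = \frac{\exp(x\beta)\, \alpha t^{\alpha - 1}}{1 + \exp(x\beta)\, t^{\alpha}}$$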
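The regression form of the Weibull model follows in a few steps from the Weibull survivor function. The derivation below is a sketch added for clarity; the auxiliary variables $w$ and $u$ are notation introduced here and do not appear in the notes.

```latex
% Survivor function implied by the Weibull hazard \exp(x\beta)\alpha t^{\alpha-1}:
S(t \mid x) = \exp\left[-\exp(x\beta)\, t^{\alpha}\right]

% Define w \equiv \exp(x\beta)\,(t^*)^{\alpha}. For any z \ge 0,
P(w > z \mid x)
  = P\left(t^* > \left[z \exp(-x\beta)\right]^{1/\alpha} \,\middle|\, x\right)
  = \exp(-z),
% so w is unit exponential and independent of x.

% Taking logs of w = \exp(x\beta)\,(t^*)^{\alpha} and solving for \log t^*:
\log t^* = -\frac{x\beta}{\alpha} + \frac{1}{\alpha}\log w
         \equiv x\delta + u,
\qquad \delta = -\beta/\alpha, \quad u = \alpha^{-1}\log w

% \log w has the type I extreme value (minimum) distribution:
P(\log w \le r) = 1 - \exp\left(-e^{r}\right)
```

The same argument with $P(w > z \mid x) = (1 + z)^{-1}$ gives the log-logistic case, where $\log w$ is standard logistic.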
8.3.3 Stock Sampling

Individuals who are in the initial state are sampled at a given point in time.

- Now both right and left censoring are possible.
- Without correction: stock sampling bias.
- Left censoring occurs when some starting times $a_i$ are not observed.
- The sample selection problem caused by stock sampling is called length-biased sampling.

Assumptions:
a) the starting times $a_i$ are observed for all individuals sampled at time $b$;
b) the sampled individuals can be observed for a certain length of time.

Let $(a_i, c_i, x_i, t_i)$ be a random draw from the population of all spells starting in $[0, b]$. This vector is observed only if the person is still in the initial state at time $b$, that is, if $t_i^* \ge b - a_i$.
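Under the conditional independence assumption

$$D(t_i^* \mid x_i, a_i, c_i) = D(t_i^* \mid x_i)$$

we have

$$P(t_i^* \ge b - a_i \mid x_i, a_i, c_i) = 1 - F(b - a_i \mid x_i; \theta)$$

and the log-likelihood function, built from the truncated density and the truncated survivor probability, can be written as

$$\sum_{i=1}^{N} \left\{ d_i \log f(t_i \mid x_i; \theta) + (1 - d_i) \log[1 - F(t_i \mid x_i; \theta)] - \log[1 - F(b - a_i \mid x_i; \theta)] \right\}$$

where $t_i = c_i$ when $d_i = 0$.

If all units are right censored at the interview date, this log-likelihood does not identify $\theta$: every observation then has $d_i = 0$ and $t_i = c_i = b - a_i$, so the second and third terms cancel.

Even when all observed durations are censored at the interview date, $\theta$ can still be estimated provided that a model for the conditional distribution of the starting times, $D(a_i \mid x_i)$, is specified. $D(a_i \mid x_i)$ is assumed to be continuous on $[0, b)$ with density $k(\cdot \mid x_i; \eta)$.

Let $s_i$ be a sample selection indicator equal to 1 if a random draw is observed, i.e. if $t_i^* \ge b - a_i$.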
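Estimation of $\theta$ and $\eta$ can proceed by applying CMLE to the density of $a_i$ conditional on $x_i$ and $s_i = 1$. This conditional density is

$$p(a \mid x_i, s_i = 1) = \frac{k(a \mid x_i; \eta)\, [1 - F(b - a \mid x_i; \theta)]}{P(s_i = 1 \mid x_i; \theta, \eta)}, \quad 0 < a < b$$

where

$$P(s_i = 1 \mid x_i; \theta, \eta) = \int_0^b [1 - F(b - u \mid x_i; \theta)]\, k(u \mid x_i; \eta)\, du$$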
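The truncation correction in the stock-sampling log-likelihood of Section 8.3.3 can be illustrated with an exponential duration and no covariates. In that case each term is $d_i \log\lambda - \lambda t_i + \lambda (b - a_i)$, so the maximiser is $\hat\lambda = \sum_i d_i / \sum_i [t_i - (b - a_i)]$, while ignoring the truncation term gives $\sum_i d_i / \sum_i t_i$. The simulation below is a sketch under assumed values (true rate, window length, follow-up period); it is not taken from the notes.

```python
import random

random.seed(1)
true_rate = 0.5   # hypothetical hazard
b = 4.0           # spells start in [0, b]; the stock is sampled at time b
follow_up = 2.0   # assumed length of time sampled individuals are followed

elapsed, durations, uncensored = [], [], []
for _ in range(20000):
    a_i = random.uniform(0.0, b)
    t_star = random.expovariate(true_rate)
    if t_star < b - a_i:
        continue  # spell ended before time b: never enters the stock sample
    c_i = (b - a_i) + follow_up           # censoring time measured from entry
    elapsed.append(b - a_i)               # time already spent in the state at b
    durations.append(min(t_star, c_i))    # observed duration t_i
    uncensored.append(1 if t_star <= c_i else 0)

# Ignoring the selection rule: maximiser of the flow-sampling log-likelihood.
naive_rate = sum(uncensored) / sum(durations)

# Including the term -log[1 - F(b - a_i)]: maximiser of the truncated log-likelihood.
corrected_rate = sum(uncensored) / sum(t - e for t, e in zip(durations, elapsed))
```

The naive estimate is too low because long spells are over-represented in the stock; subtracting the elapsed time $b - a_i$ removes exactly the part of each spell that was guaranteed by the selection rule.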
8.3.4 Unobserved Heterogeneity

The key assumptions used in most models that incorporate unobserved heterogeneity are:
(1) heterogeneity is independent of the observed covariates, as well as of starting times and censoring times;
(2) heterogeneity has a distribution known up to a finite number of parameters;
(3) heterogeneity enters the hazard function multiplicatively.

Example: Weibull hazard function conditional on $x_i$ and $v_i$

$$\lambda(t; x_i, v_i) = v_i \exp(x_i \beta)\, \alpha t^{\alpha - 1}, \quad \text{where } x_{i1} \equiv 1 \text{ and } v_i > 0$$

Identification of $\alpha$ and $\beta$ requires a normalisation: $E(v_i) = 1$.

Integrate out the unobserved effect, where $h(\cdot; \rho)$ is the density of $v_i$:

$$G(t \mid x_i; \theta, \rho) = \int_0^{\infty} F(t \mid x_i, v; \theta)\, h(v; \rho)\, dv$$

The density $g(t \mid x_i; \theta, \rho)$ can be obtained in the same way. So the methods of Sections 8.3.2 and 8.3.3 can be used, replacing $F(t \mid x_i; \theta)$ by $G(t \mid x_i; \theta, \rho)$ and $f(t \mid x_i; \theta)$ by $g(t \mid x_i; \theta, \rho)$.
Gamma-distributed heterogeneity: $v_i \sim \text{Gamma}(\delta, \delta)$.

The cdf of $t_i^*$ given $(x_i, v_i)$ is

$$F(t \mid x_i, v_i) = 1 - \exp\left[-v_i \int_0^t k(s; x_i)\, ds\right] = 1 - \exp[-v_i\, \xi(t; x_i)]$$

where $\lambda(t; x_i, v_i) = v_i\, k(t; x_i)$ and

$$\xi(t; x_i) \equiv \int_0^t k(s; x_i)\, ds$$

The density of $v_i$ is

$$h(v) = \delta^{\delta} v^{\delta - 1} \exp(-\delta v) / \Gamma(\delta)$$

Then

$$G(t \mid x_i) = 1 - [1 + \xi(t; x_i)/\delta]^{-\delta} \qquad \text{and} \qquad g(t \mid x_i) = k(t; x_i)\, [1 + \xi(t; x_i)/\delta]^{-(\delta + 1)}$$

With the Weibull hazard, the resulting duration distribution is the Burr distribution.

Unobserved heterogeneity matters for duration dependence: conditional on $x$ only there can be duration dependence, while conditional on $x$ and $v$ there is none.

Example: time-constant hazards with discrete heterogeneity.
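Both points can be checked numerically: the closed form for $G$ under gamma heterogeneity, and the spurious negative duration dependence that arises when individual hazards are constant but differ across people. The sketch below uses assumed values throughout ($\delta = 2$, $\xi = 1.5$, a two-type population with rates 0.2 and 1.0); the function names are illustrative.

```python
import math

def gamma_density(v, delta):
    """Density of Gamma(delta, delta), which has mean one."""
    return delta ** delta * v ** (delta - 1) * math.exp(-delta * v) / math.gamma(delta)

def mixed_cdf_numeric(xi, delta, upper=40.0, steps=40_000):
    """G = 1 - integral of exp(-v * xi) h(v) dv, by the midpoint rule on [0, upper]."""
    width = upper / steps
    survivor = 0.0
    for k in range(steps):
        v = (k + 0.5) * width
        survivor += math.exp(-v * xi) * gamma_density(v, delta) * width
    return 1.0 - survivor

def mixed_cdf_closed(xi, delta):
    """Closed form G = 1 - (1 + xi / delta)^(-delta)."""
    return 1.0 - (1.0 + xi / delta) ** (-delta)

delta, xi = 2.0, 1.5  # illustrative values
g_numeric = mixed_cdf_numeric(xi, delta)
g_closed = mixed_cdf_closed(xi, delta)

def two_type_hazard(t, share_low=0.5, rate_low=0.2, rate_high=1.0):
    """Population hazard when each person has a constant hazard, low or high."""
    surv_low = share_low * math.exp(-rate_low * t)
    surv_high = (1 - share_low) * math.exp(-rate_high * t)
    return (rate_low * surv_low + rate_high * surv_high) / (surv_low + surv_high)

# Population hazard at increasing durations: high-rate types exit first,
# so the survivors are increasingly low-rate types.
hazard_path = [two_type_hazard(t) for t in (0.0, 1.0, 2.0, 5.0)]
```

Although no individual exhibits duration dependence in the two-type example, the population hazard falls with $t$, starting from the average of the two rates and drifting toward the lower one.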
8.4 Analysis of Grouped Data

Grouped data arise when each duration is only known to fall into a certain time interval. Panel data methods can be used to handle grouped durations.

The timeline is divided into $M + 1$ intervals, $[0, a_1), [a_1, a_2), \ldots, [a_{M-1}, a_M), [a_M, \infty)$, where the $a_m$ are known constants.

Let $c_m$ be a binary censoring indicator equal to 1 if the duration is censored in interval $m$. Similarly, $y_m$ is a binary indicator equal to 1 if the duration ends in the $m$th interval.

For each person $i$, we observe $\{(y_{i1}, c_{i1}), \ldots, (y_{iM}, c_{iM}), x_i\}$, which is a balanced panel.

A parametric hazard function is specified as $\lambda(t; x, \theta)$. Let $T$ denote the time until exit from the initial state.
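$T$ is not fully observed: we know only which interval it falls into and whether it was censored in a particular interval. We can thus obtain

$$P(y_m = 0 \mid y_{m-1} = 0, x, c_m = 0) \quad \text{and} \quad P(y_m = 1 \mid y_{m-1} = 0, x, c_m = 0), \quad m = 1, \ldots, M$$

Under the assumption that $T$ is independent of $c_1, \ldots, c_M$ given $x$ (random censoring), we have

$$P(y_m = 1 \mid y_{m-1} = 0, x, c_m = 0) = P(a_{m-1} \le T < a_m \mid T \ge a_{m-1}, x) = 1 - \exp\left[-\int_{a_{m-1}}^{a_m} \lambda(s; x, \theta)\, ds\right] \equiv 1 - \alpha_m(x, \theta)$$

Therefore,

$$P(y_m = 0 \mid y_{m-1} = 0, x, c_m = 0) = \alpha_m(x, \theta)$$

We can use these probabilities to construct the likelihood function for observation $i$. Let $m_i$ denote the interval in which the duration of person $i$ ends or is censored, and let $d_i = 1$ if the duration is uncensored. The log-likelihood for observation $i$ is

$$\sum_{h=1}^{m_i - 1} \log \alpha_h(x_i, \theta) + d_i \log[1 - \alpha_{m_i}(x_i, \theta)] \qquad (1)$$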
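The grouped-data log-likelihood contribution is a sum of interval survival terms plus one exit term. The sketch below evaluates it for a constant hazard without covariates, so that $\alpha_m = \exp[-\lambda (a_m - a_{m-1})]$; the interval boundaries, the rate, and the function names are assumptions for illustration. A spell that survives all $M$ intervals is coded here as censored in interval $M + 1$.

```python
import math

def alpha_constant(hazard_rate, a_prev, a_next):
    """alpha_m = P(T >= a_m | T >= a_{m-1}) for a constant hazard."""
    return math.exp(-hazard_rate * (a_next - a_prev))

def grouped_loglik_i(alphas, m_i, d_i):
    """sum_{h=1}^{m_i - 1} log alpha_h + d_i * log(1 - alpha_{m_i}).

    alphas[0] holds alpha_1. m_i is the interval of exit (d_i = 1)
    or of censoring (d_i = 0).
    """
    value = sum(math.log(alphas[h]) for h in range(m_i - 1))
    if d_i == 1:
        value += math.log(1.0 - alphas[m_i - 1])
    return value

boundaries = [0.0, 1.0, 2.0, 4.0]  # a_0 = 0, a_1, a_2, a_3 (so M = 3)
rate = 0.4                          # hypothetical constant hazard
alphas = [alpha_constant(rate, lo, hi)
          for lo, hi in zip(boundaries, boundaries[1:])]
M = len(alphas)

# Probabilities of the M possible exit intervals plus "still in the state at a_M".
exit_probs = [math.exp(grouped_loglik_i(alphas, m, 1)) for m in range(1, M + 1)]
survive_all = math.exp(grouped_loglik_i(alphas, M + 1, 0))
total_prob = sum(exit_probs) + survive_all
```

The outcome probabilities telescope to one, which is a quick check that the survival and exit terms are indexed consistently.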
To implement the CMLE, a hazard function must be specified. A popular choice is the piecewise-constant proportional hazard

$$\lambda(t; x, \theta) = \kappa(x, \beta)\, \lambda_m, \quad a_{m-1} \le t < a_m$$

where $\kappa(x, \beta) > 0$ (typically $\kappa(x, \beta) = \exp(x\beta)$). With this function, we have

$$\alpha_m(x, \theta) \equiv \exp\left[-\exp(x\beta)\, \lambda_m (a_m - a_{m-1})\right]$$

and $\beta$ and $\lambda$ can be estimated.

Without covariates, the MLE of $\lambda_m$ leads to a well-known estimator of the survivor function: the Kaplan-Meier estimator. The survivor function at time $a_m$ is

$$S(a_m) = P(T > a_m) = \prod_{r=1}^{m} P(T > a_r \mid T > a_{r-1})$$

$N_r$ denotes the number of persons in the risk set for interval $r$ (those who have neither left the state nor been censored at time $a_{r-1}$, the beginning of interval $r$), and $E_r$ denotes the number of persons observed to leave the state in the $r$th interval. Therefore, a consistent estimator of the survivor function at time $a_m$ is

$$\hat{S}(a_m) = \prod_{r=1}^{m} \left[(N_r - E_r)/N_r\right], \quad m = 1, 2, \ldots, M$$
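The Kaplan-Meier product can be computed directly from the pairs $(m_i, d_i)$. The sketch below is one way to do it, under the assumption that a spell censored in interval $r$ is treated as censored at $a_{r-1}$ and therefore is not in the risk set for interval $r$; the data are made up for illustration.

```python
def kaplan_meier_grouped(spells, num_intervals):
    """Return [S_hat(a_1), ..., S_hat(a_M)] from (m_i, d_i) pairs.

    m_i is the interval in which the spell ends (d_i = 1) or is censored (d_i = 0).
    """
    survivor = []
    current = 1.0
    for r in range(1, num_intervals + 1):
        exits = sum(1 for m, d in spells if m == r and d == 1)   # E_r
        at_risk = sum(1 for m, d in spells if m > r) + exits     # N_r
        if at_risk == 0:
            break  # nobody left to observe; later values are not estimable
        current *= (at_risk - exits) / at_risk
        survivor.append(current)
    return survivor

# Hypothetical sample with M = 3 intervals; (4, 0) marks spells still ongoing at a_3.
spells = [(1, 1), (1, 1), (2, 1), (2, 0), (3, 1), (3, 1), (4, 0), (4, 0)]
km = kaplan_meier_grouped(spells, num_intervals=3)
```

In this sample the risk sets are 8, 5, and 4 persons, with 2, 1, and 2 exits respectively; the spell censored in interval 2 drops out of the risk set before that interval's exits are counted.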
8.4.2 Time-Varying Covariates

Deriving the log-likelihood function in this case is more complicated, especially when strict exogeneity is not assumed. Nevertheless, if the regressors are constant within each time interval $[a_{m-1}, a_m)$, the form of the log-likelihood is the same as in Section 8.4.1, with $x_i$ replaced by $x_{im}$ in interval $m$.

The conditional independence assumption on the censoring indicator is

$$D(T \mid T \ge a_{m-1}, x_m, c_m) = D(T \mid T \ge a_{m-1}, x_m), \quad m = 1, \ldots, M$$

Under this assumption, the probability of exit (without censoring) is

$$P(y_m = 1 \mid y_{m-1} = 0, x_m, c_m = 0) = P(a_{m-1} \le T < a_m \mid T \ge a_{m-1}, x_m) = 1 - \exp\left[-\int_{a_{m-1}}^{a_m} \lambda(s; x_m, \theta)\, ds\right] \equiv 1 - \alpha_m(x_m, \theta)$$

Therefore, the partial log-likelihood is given by equation (1) with $\alpha_h(x_i, \theta)$ replaced by $\alpha_h(x_{ih}, \theta)$ and $\alpha_{m_i}(x_i, \theta)$ replaced by $\alpha_{m_i}(x_{i,m_i}, \theta)$.
Strict exogeneity of the covariates and of the censoring means that

$$D(T \mid T \ge a_{m-1}, x, c) = D(T \mid T \ge a_{m-1}, x_m), \quad m = 1, \ldots, M$$

With time-varying covariates, the hazard specification is

$$\lambda(t; x_m, \theta) = \kappa(x_m, \beta)\, \lambda_m, \quad a_{m-1} \le t < a_m$$

8.4.3 Unobserved Heterogeneity

With time-varying covariates and unobserved heterogeneity, it is difficult to relax the strict exogeneity assumption. It is assumed that the regressors are strictly exogenous conditional on the unobserved heterogeneity and that the unobserved heterogeneity is independent of the regressors.

In the leading case of the piecewise-constant baseline hazard, the hazard becomes

$$\lambda(t; v, x_m, \theta) = v\, \kappa(x_m, \beta)\, \lambda_m, \quad a_{m-1} \le t < a_m$$

The density of $(y_{i1}, \ldots, y_{iM})$ given $(v_i, x_i, c_i)$ is

$$\left[\prod_{h=1}^{m_i - 1} \alpha_h(v_i, x_{ih}, \theta)\right] \left[1 - \alpha_{m_i}(v_i, x_{i,m_i}, \theta)\right]^{d_i} \qquad (2)$$
Because equation (2) depends on the unobserved heterogeneity, we cannot use it directly to estimate $\theta$ consistently. We can integrate out the unobserved effect in equation (2) to obtain the density of the $y$'s given the regressors and censoring indicators. Based on this density, the CMLE can be used.
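For one concrete case the integration has a closed form. Assume, purely for illustration, that $v$ follows the Gamma$(\delta, \delta)$ distribution of Section 8.3.4 and that the hazard is piecewise constant, so $\alpha_h(v) = \exp[-v\, \kappa_h \lambda_h (a_h - a_{h-1})]$. Writing $A_m = \sum_{h \le m} \kappa_h \lambda_h (a_h - a_{h-1})$, the density in equation (2) equals $\exp(-v A_{m-1}) - \exp(-v A_m)$ for an uncensored spell, and $E[\exp(-vA)] = (1 + A/\delta)^{-\delta}$ gives the integrated probability. All numbers and names below are hypothetical.

```python
import math

def mixed_grouped_prob(cum_hazard, m_i, d_i, delta):
    """Outcome probability with gamma heterogeneity integrated out.

    cum_hazard[m] holds A_m, with A_0 = 0. m_i is the interval of exit
    (d_i = 1) or of censoring (d_i = 0).
    """
    survive_prev = (1.0 + cum_hazard[m_i - 1] / delta) ** (-delta)
    if d_i == 0:
        return survive_prev
    survive_now = (1.0 + cum_hazard[m_i] / delta) ** (-delta)
    return survive_prev - survive_now

boundaries = [0.0, 1.0, 2.0, 4.0]  # a_0, a_1, a_2, a_3 (M = 3)
baseline = [0.3, 0.5, 0.4]         # lambda_1, lambda_2, lambda_3 (illustrative)
x_path = [0.0, 1.0, 1.0]           # scalar covariate, constant within intervals
beta = 0.2                         # illustrative coefficient

# Build A_0, A_1, ..., A_M with kappa(x_m, beta) = exp(x_m * beta).
cum_hazard = [0.0]
for m in range(len(baseline)):
    width = boundaries[m + 1] - boundaries[m]
    cum_hazard.append(cum_hazard[-1] + math.exp(x_path[m] * beta) * baseline[m] * width)

delta = 2.0
M = len(baseline)
probs = [mixed_grouped_prob(cum_hazard, m, 1, delta) for m in range(1, M + 1)]
prob_survive_all = mixed_grouped_prob(cum_hazard, M + 1, 0, delta)

# With very large delta the heterogeneity vanishes and survival is exp(-A_1).
no_het = mixed_grouped_prob(cum_hazard, 2, 0, 1e6)
```

The log of `mixed_grouped_prob` would be the observation's contribution to the log-likelihood, maximised jointly over $\beta$, the $\lambda_m$, and $\delta$; other heterogeneity distributions generally require numerical integration instead of this closed form.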