The Occurrence and Timing of Events: The Application of Event History Models in Accounting and Finance Research


Marc J. LeClere
Assistant Professor
Department of Accounting
School of Business Administration
Loyola University Chicago
(O) (F)
mlecle@luc.edu

September 1999

I wish to thank Felicia LeClere for comments on an earlier version of this paper.

1.0 INTRODUCTION

Survival analysis is a statistical method concerned with the study of events and the substantive processes that govern the occurrence and timing of events [Allison, 1984; Blossfeld and Rohwer, 1995].¹ An event is a qualitative change that occurs at a given point in time to an individual, organization, entity, or society [hereafter referred to collectively as individual(s)]. The substantive process is characterized by units changing from one discrete state to another discrete state at any point in time as a result of time-constant or time-dependent covariates [Coleman, 1981]. Researchers in biomedicine, sociology, demography, criminology, engineering, economics, and management use the method to study a wide range of problems such as disease, patient survival, birth, death, marriage, divorce, migration, revolutions, crime, arrests, recidivism, manufacturing durability, equipment failure, strikes, unemployment, labor force participation, job changes, promotions, retirements, and mergers.

Most accounting research studies interested in events use qualitative response models such as probit or logistic regression. Probit and logistic regression use a dichotomous dependent variable to distinguish between the occurrence and non-occurrence of an event of interest ['0-1' or 'yes - no']. These statistical methods typically examine the relation between the probability of an event and a set of covariates for a single year. As an example, consider accounting research in the area of bankruptcy [e.g., see Ohlson, 1980]. Bankruptcy research typically uses cross-sectional observations and the standard research design involves selecting a sample of bankrupt and non-bankrupt firms for a given year and distinguishing between these firms on the basis of independent variables that are thought to influence firm bankruptcy. Although bankruptcy studies may focus on the year of bankruptcy, one year prior to bankruptcy, or n-years prior to bankruptcy, the analysis is cross-sectional and involves examining the relation between the probability of bankruptcy and a set of covariates for a specific point in time.

Survival analysis, on the other hand, uses longitudinal rather than cross-sectional data. The analysis requires data for individuals for a time period prior to the occurrence of a given event. Because survival analysis focuses on the duration of time preceding an event, the researcher must stipulate a time origin, a time scale, and an exact definition of the event that terminates the duration [Kiefer, 1988]. Assume that a researcher was interested in the duration of time preceding the bankruptcy of startup firms. The time origin would be the date at which the firm began operations. In a large sample of firms, each firm would have a different time origin as they most likely began business on different dates. The time scale would be calendar months. The duration would end when the firm experienced the event termed 'bankruptcy'. Whereas a cross-sectional study analyzes the relation between the probability of bankruptcy and a set of covariates for a specific year, survival analysis uses longitudinal data from the time period preceding bankruptcy because the focus is on the process that governs bankruptcy rather than the observation of the bankrupt/non-bankrupt state. Survival analysis examines the effect of changes in the covariates on the duration of time preceding the event as well as the probability that the event will

occur. If an underlying process generates the event, the choice of cross-sectional rather than longitudinal data relies on the assumption that the process is in statistical equilibrium [Coleman, 1981].

1.1 Benefits of Survival Analysis

The primary advantage of survival analysis, relative to regression and qualitative response models such as logistic regression and probit, lies in the manner in which it handles censored observations and time-varying covariates. Censoring occurs when incomplete information is available on the occurrence of a specific event. In event history studies, individuals are in origin states and are observed for the occurrence of a specific event such as marriage, death, divorce, or job termination. For example, in studies modeling the timing of first birth, women who have not given birth to a child are in the origin state termed nulliparous. When a woman gives birth to the first child, the event (first birth) has occurred, and the woman is no longer in the origin state. Time spent in the origin state is defined as the duration of time between the time origin and the birth of the child. Time origin might be defined as 0, age of menarche, age of marriage, or date of last contraceptive use.

1.2 Censoring

Censoring occurs when knowledge of the time that the individual spends in the origin state is incomplete and the exact duration of time ('lifetime') is known for only a portion of a sample. Survival analysis models use methods of

estimation (generally either maximum or partial likelihood) that incorporate information from censored and uncensored observations to provide consistent parameter estimates [Allison, 1995]. In contrast, statistical methods such as regression analysis cannot incorporate information from censored observations.

The illustration in Figure 1 provides an overview of the common forms of censoring. Assume that the numbered lines represent individuals [individuals 1 through 7]. Observations are made on historical time and the period of observation begins at $t_1$ and ends at $t_2$. A solid line represents a period of time in which the individual is observed and a dotted line represents a time period where the individual is not observed. The end of each solid line, if not followed by a dotted line, represents the occurrence of an event. The end of each dotted line does not necessarily signify the occurrence of an event.

In Figure 1, individual 1 is not censored. Individual 1 entered the origin state after the start of the observation period and experienced the event during the period.

Individual 2 is censored on the right or right censored and represents the most common form of censoring. In this situation, the event did not occur during the observation period and may or may not occur after the observation period ends. Observations are right censored at L when all that is known about an individual's lifetime is that it is equal to or greater than the value L but the exact value of the observation is unknown [Lawless, 1982]. T, the time of event occurrence, is known to be greater than some value L, but the exact occurrence of the event is unknown. As an example, assume that a study examines a sample of firms to model the process of bankruptcy. Some firms will experience

bankruptcy during the observation period. Firms that have not experienced bankruptcy by the end of the observation period may or may not experience bankruptcy after the observation period.

Right censoring may take one of two common forms. Type I right censoring (time censoring) refers to censoring that occurs because observation is terminated at a preassigned time point or date and the individual did not experience the event up to that point in time. Type II right censoring occurs when the observation period is terminated after the occurrence of a specific number of events. Type II censoring is sometimes referred to as order censoring since the study is terminated after observing certain order statistics [Kalbfleisch and Prentice, 1980]. This form of censoring is quite rare in social science research but common in biomedical research [Allison, 1995]. In terms of estimation, right censoring poses no special problems in survival analysis since the termination date is determined independently of the process under study [Blossfeld and Rohwer, 1995; Kalbfleisch and Prentice, 1980]. The ability of event history analysis to handle right censored observations is its major advantage over traditional statistical methods [Yamaguchi, 1991].

Individual 3 is right censored and this may be the result of random or nonrandom censoring. Individual 3 left the study because some event other than the event of interest occurred and individual 3 did not reach the maximum observation time. Individual 3 may or may not experience the event after censoring. Subjects may die, move, or leave a research study for any number of unspecified reasons.² The salient issue is whether the event that caused the

individual to leave the study is independent of the event of interest or the event is a case of non-independent censoring. If this censoring is random it poses no estimation problems for survival analysis methods. However, if the censoring is not random, problems arise due to selectivity bias since the characteristics of the individuals lost to non-independent censoring are often related to the process under study [Blossfeld and Rohwer, 1995; Kalbfleisch and Prentice, 1980]. In a study of bankruptcy, firms that merge during the observation period represent this type of censoring. If mergers are random, no problem exists. But if financially distressed firms demonstrate a propensity for mergers (as one of a number of alternatives to deal with financial distress), selectivity bias arises since the observation contains information related to the process of bankruptcy.

Individual 4 is fully censored on the left. The individual entered and left the origin state prior to the period of observation and the only knowledge is that the event occurred before a given point in time. T, the time of event occurrence, is known to be less than some value L, but the exact occurrence of the event is unknown. Fully censored observations are quite difficult to incorporate into survival analysis because the duration of time in the origin state is unknown [Blossfeld and Rohwer, 1995]. Sample selection bias is difficult to correct because the bias is a function of the duration of time in the origin state and the event that occurred to generate the missing data is unknown [Yamaguchi, 1991]. Fully left censored observations result in biased parameter estimates. Consider firms that experience bankruptcy before the observation period and therefore are not part of the sample. If these firms demonstrate a

predisposition for bankruptcy (e.g., startup firms that fail early and frequently), then the information these observations provide about the process is lost and the parameter estimates are biased.

Individual 5 is partially censored on the left. The individual has already spent time in the origin state at the start of the observation period and the event occurs during the observation period. If there is no information as to the start of the origin state, the value of duration at $t_1$ is unknown. Survival analysis cannot properly use information from this observation since the beginning of the observation period cannot be substituted for the beginning of the origin state [Yamaguchi, 1991]. Sometimes it is possible to observe the duration of time spent in the origin state at $t_1$ by either examining retrospective information or reconstructing data. But this introduces a sample selection bias because information prior to $t_1$ is known for some observations (individual 5) but not other observations (individual 4) and early entry into the origin state and a short duration until the occurrence of the event increase the likelihood that the observation will not appear in the study period [Blossfeld and Rohwer, 1995]. Survival analysis can address this type of censoring by evaluating only that part of the duration in the observation period [Blossfeld and Rohwer, 1995; Yamaguchi, 1991]. The analysis examines the survival of the individual from $t_1$ to the occurrence of the event conditioned on the fact that the individual survived until $t_1$.

Individual 6 is completely right censored. The entrance and exit from the origin state occur after the observation period. The duration of time until the

occurrence of the event is unknown for this individual. Unless the duration of the origin state is independent of the start of the origin state, survival analysis must take into account the effect of the start of the origin state on duration in order to eliminate sample selection bias [Yamaguchi, 1991].

Finally, individual 7 represents an individual that is left and right censored. The individual is in the origin state at the start and end of the observation period but there is no information about the start or end of time spent in the origin state.

1.3 Time-varying Covariates

The second major issue which survival analysis addresses concerns the value of covariates over the observation period. Covariates can be time-invariant or time-varying (time-dependent). Time-invariant covariates do not change during the period that precedes the occurrence of an event. For instance, in the case of individuals, some covariates such as sex and blood-type never change over time. Other covariates might change over time, but the change is so insignificant that the covariate may be regarded as time-invariant. As an example, if a survival analysis model used the inflation rate as a covariate, although the rate might increase or decrease over time, the change might be so slight that the inflation rate could be regarded as constant. On the other hand, time-varying covariates change during the course of the observation period. Covariates such as income, job status, education, family status, and wealth generally do change over time.
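One common way to present time-varying covariates to estimation software is to split each duration into sub-episodes within which every covariate is constant. The following is a minimal sketch of that idea, not a method taken from this paper: the `Episode` record, the `split_episodes` helper, and the `leverage` covariate are all hypothetical names chosen for illustration, and time is assumed to be measured in whole months since the time origin.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    firm_id: str
    start: int       # months since the time origin at which the sub-episode begins
    stop: int        # months since the time origin at which the sub-episode ends
    event: int       # 1 only if the event occurs at `stop`, otherwise 0
    leverage: float  # hypothetical time-varying covariate, constant within the episode


def split_episodes(firm_id, duration, event, covariate_path):
    """Split one duration into sub-episodes with constant covariate values.

    covariate_path is a list of (month_from, value) pairs sorted by month,
    whose first entry starts at month 0 (an assumption of this sketch).
    """
    episodes = []
    for i, (start, value) in enumerate(covariate_path):
        if start >= duration:
            break  # covariate changes after the duration ended are irrelevant
        if i + 1 < len(covariate_path):
            next_start = covariate_path[i + 1][0]
        else:
            next_start = duration
        stop = min(next_start, duration)
        # Only the final sub-episode can carry the event; earlier ones are
        # treated as right censored at the moment the covariate changes.
        is_last = stop == duration
        episodes.append(Episode(firm_id, start, stop, event if is_last else 0, value))
    return episodes


if __name__ == "__main__":
    # A firm that fails after 30 months, with leverage changing at months 12 and 24.
    for ep in split_episodes("A", 30, 1, [(0, 0.4), (12, 0.6), (24, 0.9)]):
        print(ep)
```

Each sub-episode other than the last is recorded as right censored, so the firm contributes its covariate value to the risk set only for the months in which that value actually applied.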

When modeling the duration of time that precedes the occurrence of an event, the value of a covariate along the time path affects the probability of event occurrence. The major contribution of survival analysis methods in this area is that the estimation procedures consider changes in the value of covariates over time. Cross-sectional studies only examine the level of a variable at a given point in time. Cross-sectional analysis uses a snap-shot methodology because it only views the individual at a snap-shot in time. Survival analysis, relying on longitudinal data rather than cross-sectional data, incorporates changes in the covariates over time in the estimation process.

Time-dependent covariates can be classified as internal and external [Blossfeld and Rohwer, 1995; Kalbfleisch and Prentice, 1980]. External covariates are not directly involved with the mechanism that causes failure. They may be fixed or vary with time but the value of an external covariate is not a function of the time prior to the event although it may affect the duration. Internal time-dependent covariates are the result of a stochastic process generated by an individual and provide information about the survival time of the individual [Blossfeld, et al., 1989; Kalbfleisch and Prentice, 1980].

2.0 SURVIVAL DISTRIBUTIONS

Survival analysis models the probability of a change in a dependent variable $Y_t$ from an origin state j to a destination state k as a result of causal factors [Blossfeld and Rohwer, 1995]. The duration of time between states is referred to as event (failure) time. Event time is represented by a non-negative

random variable T that represents the duration of time until the dependent variable at time $t_0$ ($Y_{t_0}$) changes from state j to state k.³ Different survival analysis models assume different probability distributions for T. Regardless of the probability distribution of T, the probability distribution can be specified as a cumulative distribution function, a survivor function, a probability density function, or a hazard function. The cumulative distribution function is

$$F(t) = P(T \le t) = \int_0^t f(x)\,dx. \qquad [1]$$

It represents the probability that T is less than or equal to a value t and denotes the probability that the event occurs before some time t. $F(t)$ is also referred to as the lifetime distribution or failure distribution [Elandt-Johnson and Johnson, 1980]. If T represents the first occurrence of an event (e.g., age at onset of disease or age at first marriage) then $F(t)$ represents the distribution of event or failure time.

The survival function (sometimes referred to as the reliability function, cumulative survival rate, or survivorship function) is the complementary function of $F(t)$ and is represented as

$$S(t) = P(T > t). \qquad [2]$$

It represents the probability that the event time is greater than a value t. The survival function indicates that survival time is longer than t (the event has not occurred at time t) or that the individual survives until time t. The survival function is a monotone non-increasing left continuous function of time t with $S(0) = 1$ (since event time cannot be negative) and $S(\infty) = \lim_{t \to \infty} S(t) = 0$. As time t

elapses, the function approaches 0, since the event (e.g., death) will occur for all individuals. In event histories it is more common to use the survival function rather than the cumulative distribution function because it is more intuitive to think of individuals surviving an event to a certain point in time rather than not surviving the event [Blossfeld and Rohwer, 1995].

The probability density function is defined as

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt} \qquad [3]$$

and it represents the unconditional instantaneous probability that failure occurs in the period of time from t to $t + \Delta t$ per unit width $\Delta t$. Before taking the limit, $P(t \le T < t + \Delta t)$ represents the probability that the event occurs in the time period between t and $t + \Delta t$ and $f(t)$ is proportional to this probability as the interval becomes very small. The density function is also known as the unconditional failure rate or the curve of deaths.

The hazard function is represented as

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{1 - F(t)} \qquad [4]$$

and it defines the instantaneous rate of failure at $T = t$ conditional upon surviving to time t.⁴ The hazard function quantifies the probability of failure for individuals that have survived until time t and effectively removes individuals who have experienced the event prior to t from consideration. The hazard function is sometimes referred to as a hazard rate because it is a dimensional quantity that has the form number of events per interval of time [Allison, 1995]. It provides a

local, time-related description of the behavior of the process over time by providing a measure of the risk of failure per unit of time and represents the propensity of the risk set at time t to change from the origin state to the event state ($Y_t = j \rightarrow Y_t = k$) [Blossfeld and Rohwer, 1995]. Because time is continuous, the probability that an event will occur exactly at time t is 0 so the hazard function is expressed as the probability that an event will occur in the small interval between t and $t + \Delta t$ [Allison, 1995]. The hazard function is not a conditional probability because it can be greater than 1. The best approximation of the conditional probability $P(t \le T < t + \Delta t \mid T \ge t)$ is $\lambda(t)\,\Delta t$ for very small values of $\Delta t$ [Blossfeld, et al., 1989]. The hazard function provides information concerning future events if the individual survives to time t in that the reciprocal of the hazard function, $1/\lambda(t)$, denotes the expected length of time until the event occurs, provided the hazard remains constant at that value [Allison, 1995]. The hazard function may increase, decrease, or remain constant over time depending upon the underlying process.

Because the cumulative distribution function, survivor function, probability density function, and hazard function all describe a continuous probability distribution, each can be defined in terms of the other [Allison, 1995; Kalbfleisch and Prentice, 1980; Lee, 1980]. If you know $F(t)$, then equations [1] and [2] provide

$$S(t) = 1 - F(t) \qquad [5]$$

because survival and non-survival probabilities add to 1 and equation [4] provides

$$\lambda(t) = \frac{f(t)}{S(t)}. \qquad [6]$$

If $S(t)$ is known, $f(t)$ can be determined since

$$f(t) = \frac{d}{dt} F(t) \qquad [7]$$

and

$$f(t) = \frac{d}{dt}\bigl(1 - S(t)\bigr) = -S'(t) \qquad [8]$$

and equation [6] provides $\lambda(t)$. If $\lambda(t)$ is known, then substituting equation [8] into equation [6] provides

$$\lambda(t) = -\frac{S'(t)}{S(t)} = -\frac{d}{dt} \log_e S(t) \qquad [9]$$

and integration provides

$$S(t) = \exp\left(-\int_0^t \lambda(x)\,dx\right) \qquad [10]$$

and equations [6] and [10] provide that

$$f(t) = \lambda(t) \exp\left(-\int_0^t \lambda(x)\,dx\right). \qquad [11]$$

3.0 NONPARAMETRIC METHODS OF ESTIMATION

The most basic approach to describe the distribution of survival times consists of nonparametric descriptive methods. Nonparametric methods make no assumption about the distribution of event times (T) but instead focus on providing descriptive information of the survivor function of event times [Lee,

1980]. Nonparametric or distribution-free methods for analyzing survival data have been favored by biostatisticians [Allison, 1995]. Although nonparametric methods of estimation are used less frequently than parametric and semiparametric methods, the methods are appropriate when a theoretical distribution is not known. Prior to fitting a theoretical distribution, nonparametric methods are useful for preliminary examination of data, suggesting functional form, and assessing homogeneity [Allison, 1995; Kiefer, 1988]. The two common techniques for nonparametric estimation of the survivorship function are the life table method and the Kaplan-Meier estimator.

3.1 Life Tables

The life table method is among the most common methods to analyze survival data. It is used in demographic and insurance studies and is sometimes referred to as the actuarial method [Allison, 1995]. A life table is basically a frequency table designed to deal with censored data [Lawless, 1982]. Life tables provide estimates of the survival experience of a cohort and when the cohort is a random sample from the population, the life table provides estimates of survival properties of the population [Lawless, 1982]. The primary function of a life table is to summarize survival data grouped into intervals and to provide estimates of the survivor function, the density function, and the hazard rate. It is designed for situations where only the interval in which failure or censoring occurred is known, as the actual failure or censoring time is unknown [Kalbfleisch and Prentice, 1980].

The construction of a life table begins with the observation of lifetime t or censoring time L for each member of a random sample. Because the exact occurrence of t or L is not known, the observations are grouped into intervals.⁵ The time axis of a life table is divided into $q + 1$ intervals such that

$$I_k = [a_{k-1}, a_k), \quad k = 1, \ldots, q + 1, \quad a_0 = 0, \quad a_{q+1} = \infty \qquad [12]$$

and $a_q$ is the last observable time point [Blossfeld, et al., 1989]. Assuming that the distribution of lifetimes has a survivor function, $S(t)$, then we can define

$$P_k = S(a_k) = P(T \ge a_k) \qquad [13]$$

as the probability that an individual survives beyond a specific interval $I_k$, and

$$p_k = P(T \ge a_k \mid T \ge a_{k-1}) = \frac{P_k}{P_{k-1}} \qquad [14]$$

as the probability that an individual survives beyond a specific interval conditioned on surviving until that interval. The hazard function of the kth interval is

$$\lambda_k = P\bigl(T \in [a_{k-1}, a_k) \mid T \ge a_{k-1}\bigr) \qquad [15]$$

and it is the probability of event occurrence in the kth time interval conditioned upon reaching that interval [Blossfeld, et al., 1989]. Defining equation [14] in terms of equation [15],

$$p_k = 1 - \lambda_k \qquad [16]$$

allows $P_k$ to be rewritten as

$$P_k = P(T \ge a_k \mid T \ge a_{k-1}) \cdots P(T \ge a_2 \mid T \ge a_1)\, P(T \ge a_1) = p_k \cdots p_1. \qquad [17]$$

Restated, the probability of surviving past a given interval $I_k$ is the product of the conditional probabilities of surviving past each interval given survival to the start of that interval [Blossfeld, et al., 1989; Lawless, 1982].

Estimation of a life table involves gathering a random sample or grouped data from some population and either observing event T during lifetime t or censoring time L. The observations are grouped and placed in the kth interval corresponding to the occurrence of the event. As a result of this, the data consist of n individuals that begin the study, $d_k$ events in the kth interval, and $w_k$ censored observations in the kth interval. The risk set, $R_k$, consists of all individuals who are at risk for the occurrence of the event at each point in time (the beginning of the kth interval) such that $R_1 = n$ and $R_k = R_{k-1} - d_{k-1} - w_{k-1}$ for $k = 2, \ldots, q + 1$ [Blossfeld, et al., 1989]. If there is no censoring in the kth interval, the estimate of the hazard rate $\hat{\lambda}_k$ is simply $d_k / R_k$. It is the number of events occurring in an interval divided by the risk set for that particular interval and it represents the conditional probability of the event occurring in a specific interval. However, if censoring is present in a specific interval ($w_k > 0$), this estimate underestimates the actual hazard because some of the censored observations might have experienced the event during the interval [Blossfeld, et al., 1989]. An arbitrary adjustment is made by assuming the censored observations are at risk for half of the interval and the estimator is

$$\hat{\lambda}_k = \frac{d_k}{R_k - w_k / 2}. \qquad [18]$$

Once an estimate of the hazard function is obtained, estimates of $p_k$ and $P_k$ are obtained from $\hat{p}_k = 1 - \hat{\lambda}_k$ and $\hat{P}_k = \hat{p}_k \cdots \hat{p}_1$. The estimated probability of an event occurring in the kth interval is $\hat{P}\bigl(T \in [a_{k-1}, a_k)\bigr) = \hat{P}_{k-1} - \hat{P}_k$ and the density for the kth interval is

$$\hat{f}_k = \frac{\hat{P}_{k-1} - \hat{P}_k}{h_k} = \frac{\hat{P}_{k-1}\,\hat{\lambda}_k}{h_k} \qquad [19]$$

where $h_k = a_k - a_{k-1}$ characterizes the length of the kth interval [Blossfeld, et al., 1989]. Standard errors can be approximated for $\hat{\lambda}$, $\hat{P}$, and $\hat{p}$ and with large samples, the values of $\hat{\lambda}$, $\hat{P}$, and $\hat{p}$ divided by the standard errors are assumed to be normally distributed [Lee, 1980; Yamaguchi, 1991]. By dividing the sample into subsets, $\hat{\lambda}$, $\hat{P}$, and $\hat{p}$ can be estimated for each subset and tests can be made to determine if the estimates for the subsets are similar or different.

Although the popularity of estimating survival curves with life table based methods has fallen in the last twenty years, the curves are useful for preliminary evaluation of data and evaluating the fit of regression models [Allison, 1995; Teachman, 1983]. Life tables for population subgroups provide survivorship functions that allow an assessment of exogenous variables to include in more complex analyses [Teachman, 1983]. However, the primary criticism of the life table method is that the choice of intervals is arbitrary. The arbitrariness of the intervals leads to arbitrary results and uncertainty over interval choice [Allison, 1995]. Another criticism is that the reliability of the method depends upon a large number of episodes [Blossfeld and Rohwer, 1995].
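The life table calculations in equations [16] through [19] can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `life_table`, its arguments, and the example counts are hypothetical, the open-ended final interval is left out, and no standard errors are computed.

```python
def life_table(boundaries, events, censored, n):
    """Actuarial life table for grouped data.

    boundaries: interval endpoints a_0 < a_1 < ... < a_q
    events:     d_k, number of events in each of the q intervals
    censored:   w_k, number of censored observations in each interval
    n:          number of individuals entering the first interval (R_1)
    """
    if not (len(events) == len(censored) == len(boundaries) - 1):
        raise ValueError("need one event count and one censoring count per interval")

    rows = []
    at_risk = n        # R_k, the risk set at the start of interval k
    surv_prev = 1.0    # P_{k-1}, with P_0 = 1
    for k, (d, w) in enumerate(zip(events, censored), start=1):
        width = boundaries[k] - boundaries[k - 1]          # h_k
        effective = at_risk - w / 2.0                      # censored cases at risk for half the interval
        hazard = d / effective if effective > 0 else 0.0   # equation [18]
        p = 1.0 - hazard                                   # equation [16]
        surv = surv_prev * p                               # equation [17]
        density = surv_prev * hazard / width               # equation [19]
        rows.append({"k": k, "at_risk": at_risk, "hazard": hazard,
                     "p": p, "survival": surv, "density": density})
        at_risk -= d + w                                   # R_{k+1} = R_k - d_k - w_k
        surv_prev = surv
    return rows


if __name__ == "__main__":
    # Hypothetical sample: 100 firms followed over three 12-month intervals.
    for row in life_table([0, 12, 24, 36], [10, 8, 5], [0, 4, 10], n=100):
        print(row)
```

Changing the `boundaries` argument while holding the underlying data fixed is a quick way to see the sensitivity to interval choice that the text identifies as the main criticism of the method.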

3.2 Kaplan-Meier Estimator

In a life table, the exact occurrence of t or L is not known, but this is not always the case. Sometimes t or L is known with certainty and observed for each of all N observations and the product-limit or Kaplan-Meier method can be used for nonparametric estimation of the survivorship function. For the period under observation, assume time begins at $t_0$, ends at $t_e$, and the period $(0, t_e]$ is divided into M intervals $(0, t_1], (t_1, t_2], \ldots, (t_{M-1}, t_e]$ where the intervals are so small that the probability of more than one event in an interval is almost nonexistent [Elandt-Johnson and Johnson, 1980]. The event times are ordered such that $t_1 < t_2 < \cdots < t_e$ where $e \le n$ [Blossfeld, et al., 1989]. The risk set at any given point is $R_i$ and it represents the number of individuals or firms that survived until time $t_i$ (or in actuality, survived until the moment of time just before $t_i$) [Elandt-Johnson and Johnson, 1980]. Defining

$$\varphi_i = \begin{cases} 1 & \text{if death occurs in } (t_{i-1}, t_i] \\ 0 & \text{otherwise} \end{cases} \qquad [20]$$

and $q_i$ as the conditional probability of death in $(t_{i-1}, t_i]$ given that the individual is alive at $t_{i-1}$, then $L \propto q_i^{\varphi_i}\, p_i^{R_i - \varphi_i}$ provides

$$\hat{\lambda}_i = \begin{cases} \dfrac{1}{R_i} & \text{if event occurs in } (t_{i-1}, t_i] \\ 0 & \text{otherwise} \end{cases} \qquad [21]$$

as the unbiased maximum likelihood estimator of the hazard function [Elandt-Johnson and Johnson, 1980]. The probability of surviving beyond the current period is

$$\hat{p}_i = 1 - \hat{\lambda}_i = \begin{cases} \dfrac{R_i - 1}{R_i} & \text{if event occurs at } t_i \\ 1 & \text{otherwise} \end{cases} \qquad [22]$$

and the survival function is

$$\hat{S}(t_i) = \hat{P}_i = \hat{p}_1\, \hat{p}_2 \cdots \hat{p}_i, \qquad [23]$$

where $\hat{S}(t_0) = \hat{P}_0 = 1$ [Elandt-Johnson and Johnson, 1980]. Because an event may not occur in some time interval $t_i$, $\hat{p}_i = 1$ and time divisions without an event do not enter into the estimate of the survivor function. The interest is therefore only on the ordered time periods where an event occurs, $t_1 < t_2 < \cdots < t_j < \cdots < t_K$, where K signifies the number of distinct time points at which an event occurs [Elandt-Johnson and Johnson, 1980]. Defining $R_j$ as the risk set at $t_j$, then the product-limit estimator or Kaplan-Meier estimator of $S(t)$ is

$$\hat{S}(t) = \begin{cases} 1 & \text{for } t < t_1 \\ \displaystyle\prod_{j=1}^{i} \frac{R_j - 1}{R_j} & \text{for } t_i \le t < t_{i+1},\ i = 1, 2, \ldots, K - 1 \\ \displaystyle\prod_{j=1}^{K} \frac{R_j - 1}{R_j} & \text{for } t \ge t_K. \end{cases} \qquad [24]$$

Equation [24] states that for any time t before the first event occurs, the probability of surviving is equal to one. For any other time period except where $t \ge t_K$, the survivor function is equal to the product of the conditional probabilities $(R_j - 1)/R_j$ of surviving past each event time $t_j$ given survival up to $t_j$. If the time period is greater than or equal to $t_K$, the survivor function depends on the nature of censored observations. If there is no censoring (no events occur in or after $t_K$) then $S(t)$

equals 0. But if there are right censored observations, $S(t)$ is undefined. As with the life table method, it is possible to compare survivor functions for sub-groups. Like the life table method, estimates of the standard error of the survivor function allow the calculation of confidence intervals. Additionally, several test statistics exist for comparing survivor functions generated by the product-limit estimator [e.g., see Lee, 1980 and Blossfeld and Rohwer, 1995].

4.0 PARAMETRIC METHODS OF ESTIMATION

Although nonparametric methods of estimation provide information on the survival function, the primary disadvantage of nonparametric estimation is that information from these models relies in large part on comparisons among subgroups of sample data. As an increasing number of comparisons results in an increasing number of subgroups, problems arise with subgroups containing a small number of cases, complex interpretations among subgroups, and loss of information due to the grouping of subgroups according to qualitative characteristics [Blossfeld and Rohwer, 1995]. Consequently, survival analysis in the last twenty years has shifted its focus from non-parametric to parametric methods of analysis.

4.1 Failure Time Distributions

The primary purpose of survival analysis is to describe the occurrence and timing of events. Consequently, hazard functions are used to describe the probability distribution of the time of event occurrence. However, because

different processes generate unique patterns of survival data, many distributions are used to model the distribution of failure time. As a result, if the decision has been made to use a parametric method of estimation, one of the key decisions that needs to be made is to choose a probability distribution that best describes the underlying process that generates the observed duration data. This section provides a review of the exponential, Weibull, and log-normal probability distributions. These are the more common probability distributions that form the basis for regression models.

4.2 Exponential Distribution

The exponential distribution is the simplest and most widely used distribution in survival studies [Lee, 1980]. The exponential distribution is characterized by a single parameter and it has a constant hazard function over T. The failure rate is independent of t and the chance of failure over any given time interval is the same. This is referred to as the memoryless or lack of memory property in that time does not affect survival [Kalbfleisch and Prentice, 1980; Lee, 1980]. Large (small) values of λ indicate high (low) risk and short (long) survival [Lee, 1980]. Because time does not affect failure, the occurrence of the event is a random event. The relevant functions for the exponential distribution are

$$\lambda(t) = \lambda, \quad \lambda > 0,\ t \ge 0, \qquad [25]$$

$$S(t) = \exp(-\lambda t), \quad t \ge 0, \qquad [26]$$

$$f(t) = \lambda \exp(-\lambda t), \quad t \ge 0,\ \lambda > 0, \text{ and} \qquad [27]$$

$$F(t) = \begin{cases} 0 & t < 0 \\ 1 - \exp(-\lambda t) & t \ge 0. \end{cases} \qquad [28]$$

Due to simplicity, the exponential distribution was the first widely used distribution model but the restrictiveness of a constant hazard function has led to cautious use in recent years [Lawless, 1982]. Figures 2, 3, and 4 show the hazard function, the survivorship function, and the probability density function for an exponential distribution with a constant hazard rate of 1. The figures show a constant hazard rate ($\lambda = 1$), a survivorship function that approaches 0 as time elapses, and a probability density function that indicates an unconditional probability of failure that also approaches 0 as time elapses.

4.3 Weibull Distribution

The Weibull distribution is a generalization of the exponential distribution that allows for a power dependence of the hazard on time. It has broader applications than the exponential distribution because it does not assume a constant hazard rate [Lee, 1980]. It is probably the most widely used distribution and is often used to test the life of manufactured items [Blossfeld, et al., 1989; Lawless, 1982]. The relevant functions for the Weibull distribution are

$$\lambda(t) = \lambda\gamma(\lambda t)^{\gamma - 1}, \quad t \ge 0,\ \lambda, \gamma > 0, \qquad [29]$$

$$S(t) = \exp\bigl(-(\lambda t)^{\gamma}\bigr), \qquad [30]$$

$$f(t) = \lambda\gamma(\lambda t)^{\gamma - 1} \exp\bigl(-(\lambda t)^{\gamma}\bigr), \text{ and} \qquad [31]$$

$$F(t) = 1 - \exp\bigl(-(\lambda t)^{\gamma}\bigr). \qquad [32]$$

The hazard function is monotone decreasing (increasing) for values of $\gamma < 1$ ($\gamma > 1$) and reduces to the exponential distribution for $\gamma = 1$ [Kalbfleisch and Prentice, 1980]. Consequently, the distribution can be used to model survival distributions of populations with increasing, decreasing, and constant risk [Lee, 1980]. The Weibull distribution is characterized by the two parameters, λ and γ. The value of γ is called the shape parameter because it determines the shape of the distribution [Lawless, 1982; Lee, 1980]. Typical values of γ range from 1 to 3 [Lawless, 1982]. The value of λ determines the scale of the distribution and it is called the scale parameter [Lee, 1980]. Figures 2, 3, and 4 show the hazard function, the survivorship function, and the probability density function for a distribution with $\lambda = 2$ and alternative values of $\gamma = 0.5$, $\gamma = 1$, and $\gamma = 3$. When $\gamma = 1$ the function reduces to the exponential function. When $\gamma = 0.5$ the survival distribution has decreasing risk and when $\gamma = 3$ it has increasing risk.

4.4 Log-normal Distribution

The log-normal distribution is a distribution of a variable whose logarithm follows the normal distribution [Lee, 1980]. The log-normal distribution is often used to model the distribution of survival times of diseases such as Hodgkin's, leukemia, and lung cancer [Lawless, 1982; Lee, 1980]. The distribution assumes that if $Y = \log T$ is normally distributed with mean µ and variance $\sigma^2$, then

25 lfetme T s log-normally dstrbuted wth scale parameter µ and shape parameter σ 2 2 (wrtten as T = Λ ( µ, σ ) 1 ( ) 2π exp σ ). As the probablty densty functon of Y s y µ σ the probablty densty functon of ( ) ( ) f t T = exp Y s 1 = 1 2 exp ( 2π ) σ t The other relevant densty functons are and < y <, [33] log t µ σ 2, t > 0. [34] log t µ F( t) = Φ, [35] σ log t µ S( t) = 1 Φ, [36] σ h( t) = f ( t) S( t). [37] Fgures 2, 3, and 4 show the dstrbutons for a log-normal dstrbuton where T = Λ( 0,. 3). 7 Note that the probablty densty functon s postvely skewed and that the hazard functon has a value of 0 at t = 0, ncreases to a maxmum value, and then approaches 0 as t becomes large. Greater values of σ 2 would ncrease the skewness and dfferent values of µ would change the scale on the x axs. 25
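As a rough numerical companion to equations [27] through [37], the following Python sketch evaluates the hazard, survivorship, and density functions of the three distributions. The function names are illustrative assumptions and are not part of the original paper. The log-normal survivorship function is computed with the complementary error function, which is one reasonable way to evaluate $1 - \Phi(z)$ without losing precision in the upper tail.

```python
import math

def exponential(t, lam):
    """Return (hazard, survivor, density) of the exponential distribution for t >= 0."""
    s = math.exp(-lam * t)
    return lam, s, lam * s

def weibull(t, lam, gamma):
    """Return (hazard, survivor, density) from equations [29]-[31]; requires t > 0."""
    h = lam * gamma * (lam * t) ** (gamma - 1)
    s = math.exp(-((lam * t) ** gamma))
    return h, s, h * s  # density = hazard * survivor

def lognormal(t, mu, sigma):
    """Return (hazard, survivor, density) from equations [34], [36], [37]; requires t > 0."""
    z = (math.log(t) - mu) / sigma
    f = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
    s = 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)
    return f / s, s, f

if __name__ == "__main__":
    # Parameter values follow the figures described in the text:
    # lambda = 2 with gamma in {.5, 1, 3}, and a log-normal with variance .3.
    for t in (0.25, 0.5, 1.0, 2.0):
        print(t, weibull(t, 2.0, 0.5), weibull(t, 2.0, 1.0), weibull(t, 2.0, 3.0))
        print(t, lognormal(t, 0.0, math.sqrt(0.3)))
```

With $\gamma = 1$ the Weibull values coincide with those of `exponential(t, 2.0)`, which is the reduction to the exponential distribution noted in section 4.3.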

5.0 REGRESSION MODELS

The survival distributions described earlier, such as the exponential and Weibull, are useful for modeling the survival experience of a homogeneous population, but they do not consider the effect of covariates on the survival distribution. It is possible to use these distributions in a bivariate model to perform survival analysis when the only information used is the time of the event. Although such distributions may adequately model the process of interest, a large amount of research focuses on heterogeneous populations. Because the population is heterogeneous, covariates exist for each individual or subject in the population. As a result, it becomes an interesting question to estimate causal models that determine the effect of such covariates, as well as time, on the hazard rate [Blossfeld, et al., 1989; Kalbfleisch and Prentice, 1980; Lawless, 1982]. This is accomplished through the use of regression models in which the hazard rate or time to failure is the fundamental dependent variable.

When nonparametric methods of estimation are used, data gathering involves obtaining failure or duration data for each subject. Expanding the method of estimation to a parametric method requires that covariates be gathered for each subject in the sample. Upon completion of data gathering, there is a vector of covariates $x = (x_1, \ldots, x_s)$ for a process with failure time $T > 0$. The basic issue is to specify a model for the distribution of $t$ given $x$, and this can be accomplished with parametric or semi-parametric models. Parametric models use distributions such as the exponential and Weibull while semi-parametric models make no assumptions about the underlying distribution.

5.1 Maximum Likelihood Estimation

Parametric survival models are estimated using the maximum likelihood method. Maximum likelihood estimation is used because it produces estimators that are consistent, asymptotically efficient, and asymptotically normal. If data are gathered for a sample of $n$ individuals ($i = 1, \ldots, n$), the data will consist of $t_i$, the time of the event (or, if the observation is censored, the time of censoring), an indicator variable, $\delta_i$, representing the presence ($\delta_i = 0$) or absence ($\delta_i = 1$) of censoring, and a vector of covariates, $x_i = (x_{i1}, \ldots, x_{ik})$. In the absence of censored observations, the probability of observing the entire data is the product of the probabilities of observing the data for each specific individual. Representing the probability of each observation by its probability density function provides the likelihood function

$$L = \prod_{i=1}^{n} f_i(t_i),$$

where $L$ represents the probability of the entire data. If censoring is present, then the likelihood function becomes

$$L = \prod_{i=1}^{n} \left[f_i(t_i)\right]^{\delta_i} \left[S_i(t_i)\right]^{1 - \delta_i}.$$

The likelihood function effectively combines uncensored and censored observations: if an individual is not censored, the contribution to the likelihood is $f_i(t_i)$, and if the individual is censored at $t_i$, the contribution is $S_i(t_i)$, the survivorship function evaluated at $t_i$. Taking the natural log of $L$, the objective is to maximize the expression

$$\log(L) = \sum_{i=1}^{n} \delta_i \ln f_i(t_i) + \sum_{i=1}^{n} (1 - \delta_i) \ln S_i(t_i).$$

Once the appropriate distribution has been specified, the process reduces to using a numerical method such as the Newton-Raphson algorithm to solve for the parameters.
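The censored log-likelihood above can be sketched in a few lines of Python for the simplest case, a constant hazard $\lambda$ with no covariates. The function names and the starting value are illustrative assumptions, not part of the original paper. The Newton-Raphson iteration is run on $\theta = \log\lambda$ so that the hazard stays positive.

```python
import math

def exp_loglik(lam, times, delta):
    """Censored log-likelihood for a constant hazard lam.

    delta[i] = 1 marks an observed event and 0 a censored time,
    matching the coding used in the text.
    """
    total = 0.0
    for t, d in zip(times, delta):
        log_f = math.log(lam) - lam * t  # ln f(t) for the exponential density
        log_s = -lam * t                 # ln S(t)
        total += d * log_f + (1 - d) * log_s
    return total

def fit_exponential(times, delta, tol=1e-12, max_iter=100):
    """Maximize the log-likelihood by Newton-Raphson on theta = log(lam)."""
    n_events = sum(delta)
    exposure = sum(times)
    theta = 0.0  # arbitrary starting value (assumption)
    for _ in range(max_iter):
        score = n_events - math.exp(theta) * exposure  # first derivative in theta
        hessian = -math.exp(theta) * exposure          # second derivative in theta
        step = score / hessian
        theta -= step
        if abs(step) < tol:
            break
    return math.exp(theta)
```

For this particular model the maximizer is also available in closed form as the number of events divided by the total observed time, so the iteration is shown only to illustrate the numerical step. With richer distributions or covariates no closed form exists in general and the iterative step becomes necessary.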

5.2 Parametric Models

Parametric models of survivor functions are referred to as accelerated failure time (AFT) models. Given an underlying process that generates event times $T > 0$, accelerated failure time models specify the relation between event time, $T$, or the hazard rate, $\lambda(t)$, and a vector of covariates, $x = (x_1, \ldots, x_k)$. The dependence of failure time or the hazard rate on explanatory variables is accomplished by specifying parameters of probability distributions as a parametric function of the covariates [Namboodiri and Suchindran, 1987]. The general form of the parametric models is

$$\log T_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \sigma\omega_i \tag*{[38]}$$

or

$$\log \lambda_i(t) = \beta_0^* + \beta_1^* x_{i1} + \cdots + \beta_k^* x_{ik}. \tag*{[39]}$$

Equation [38] uses survival time as the dependent variable and equation [39] uses the hazard rate. Both models make use of a log transformation so that event time and the hazard rate have a positive value. For each individual $i$, $T_i$ is event time, $\lambda_i$ is the hazard rate, $\omega_i$ is a random disturbance term, and $x_{ik}$ is the $k$th covariate. Estimated parameters are $\sigma$ and $\beta_k$ [Allison, 1995]. Alternative parametric models change the value of $\sigma$, and different values of $\sigma$ imply alternative distributions of $\omega$ as well as $T$. Equation [39] does not have a disturbance term, but random variation still exists in the relation between $\lambda(t)$ and $T$. If two individuals had the same values for each of the covariates and, therefore, the same hazard rate, they would not necessarily have the same event time [Allison, 1995].
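To make the log-linear form of equation [38] concrete, the short Python sketch below computes the event time implied by a given disturbance and shows that a one-unit change in a covariate rescales event time by the factor $e^{\beta_1}$, whatever the disturbance. The coefficient values and the function name are hypothetical and chosen only for illustration.

```python
import math

def aft_event_time(beta, x, sigma, omega):
    """Event time implied by equation [38].

    beta[0] is the intercept and beta[1:] pairs with the covariates in x.
    """
    linear = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return math.exp(linear + sigma * omega)

if __name__ == "__main__":
    beta = [1.0, 0.5]  # hypothetical intercept and slope
    for omega in (-1.0, 0.0, 1.0):
        t0 = aft_event_time(beta, [0.0], sigma=0.8, omega=omega)
        t1 = aft_event_time(beta, [1.0], sigma=0.8, omega=omega)
        # The ratio is the same for every disturbance value.
        print(omega, t1 / t0, math.exp(beta[1]))
```

This multiplicative rescaling of the time axis is the sense in which the covariates "accelerate" or "decelerate" failure time in these models.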

5.3 Exponential Regression Model

The exponential regression model is a generalization of the exponential distribution [see equations [25], [26], [27], and [28]] where the failure rate is a function of covariates rather than a constant parameter $\lambda$. As such, it is appropriate when subjects have constant hazard functions that can be expressed as a function of covariates [Blossfeld, et al., 1989; Lee, 1980]. Recall that the exponential distribution specifies that $\lambda(t) = \lambda$. Generalizing to a regression model, the hazard function is written as $\lambda(t; x) = \lambda(x)$. The hazard rate is still constant over time, as in the exponential distribution, but it depends on the covariates $x$. Specifying the hazard rate as a function of the covariates and a set of parameters, $\lambda(t; x) = \lambda c(x\beta)$, the functional form $c(x\beta) = e^{x\beta}$ is generally chosen, and the hazard rate is represented as $\lambda(t; x) = \lambda e^{x\beta}$ [Kalbfleisch and Prentice, 1980]. Taking the log of each side of the equation, the exponential regression model takes the form of equation [39] whereby

$$\log \lambda_i(t) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \tag*{[40]}$$

$\log \lambda = \beta_0$, and the log of the hazard rate is seen as a linear function of the covariates.

Alternatively, the exponential regression can model event time as a function of the covariates. Given that failure time follows an exponential distribution with the probability density function in equation [27], the probability distribution of $Y = \log(T)$ follows an extreme value distribution with the probability density function given by $f(y) = \exp(\omega - e^{\omega})$, where $\omega = y - \theta$ and $\lambda = \exp(-\theta)$ [Namboodiri and Suchindran, 1987]. Rewriting $\omega = y - \theta$ as $y = \theta + \omega$ and reparameterizing $\theta$ as $x\beta$ provides the model⁸

$$\log T_i = \beta_0^* + \beta_1^* x_{i1} + \cdots + \beta_k^* x_{ik} + \omega_i. \tag*{[41]}$$

The regression model in this form states that the log of event time is a linear function of the covariates. The relation between the coefficients in equations [40] and [41] is $\beta = -\beta^*$ [Allison, 1995].

5.4 Weibull Regression Model

The primary shortfall of the exponential regression is the restrictive assumption of a constant hazard rate. Hazard rates generally rise or fall over time depending on the event of interest, but they are rarely constant for long durations. Consequently, accelerated failure time models often assume an underlying distribution other than the exponential distribution. One popular choice is the Weibull regression model. The Weibull distribution is selected because its flexibility allows it to describe many patterns of survival data [Namboodiri and Suchindran, 1987]. Given the hazard function in equation [29], the function can be rewritten with either the shape or scale parameter specified as a function of covariates, although the general convention transforms only the scale parameter [Elandt-Johnson and Johnson, 1980; Namboodiri and Suchindran, 1987]. The regression model again can take two forms, although the more common form of the model is stated in equation [38]. The model specifies a standard extreme-value distribution for $\omega$ but allows any $\sigma > 0$. This implies a Weibull distribution for $T$ and allows the hazard to increase or decrease over time.

5.5 Semi-Parametric Models

Although parametric models are an improvement over life tables and the Kaplan-Meier estimator, they still have limitations. Foremost among these problems are the necessity to specify the behavior of the hazard function over time, the difficulty of finding a model with an appropriate shape if the hazard function is nonmonotonic, and a cumbersome estimation process when the covariates change over time [Allison, 1984].

5.6 Cox's Proportional Hazards Model

The difficulties encountered with the parametric models are resolved with the proportional hazards model.⁹ The proportional hazards model is represented as

$$\lambda_i(t) = \lambda_0(t) e^{x_i\beta}. \tag*{[42]}$$

The model states that the hazard rate for any individual is the product of an arbitrary unspecified baseline hazard rate, $\lambda_0(t)$, and an exponentiated set of covariates. It is this lack of specificity of a baseline hazard function that makes the model semi-parametric or distribution-free. If a specific form were specified for $\lambda_0(t)$, a parametric model would result. $\lambda_0(t)$ may be thought of as the hazard function for an individual that has a value of 0 for each of the covariates and for whom $e^{x\beta} = 1$. The regression model is written as

$$\log \lambda_i(t) = \alpha(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \tag*{[43]}$$

where $\alpha(t) = \log \lambda_0(t)$ [Allison, 1984]. The model is called the proportional hazards model because it has the property that different units have hazard functions that are proportional [Lawless, 1982]. This means that the ratio of the hazard functions for two units with independent covariates does not vary with $t$. For two individuals, $i$ and $j$, equation [42] can be expressed as the ratio of two hazard functions such that

$$\frac{\lambda_i(t)}{\lambda_j(t)} = \exp\left\{\beta_1(x_{i1} - x_{j1}) + \cdots + \beta_k(x_{ik} - x_{jk})\right\}. \tag*{[44]}$$

The hazard for any individual is a fixed proportion of the hazard of any other individual at any point in time.

The uniqueness of the proportional hazards model is the manner in which the $\beta$ parameters are estimated in the absence of knowledge of $\lambda_0(t)$. Cox [1972] referred to this estimation procedure as the method of partial likelihood. The method of partial likelihood begins by assuming that there is a group of individuals, $R(t_{(i)})$, that are at risk of failure just before the occurrence of $t_{(i)}$. If only one failure occurs at $t_{(i)}$, the conditional probability that the failure occurs to individual $i$, given that individual $i$ has a vector of covariates $x_{(i)}$, is represented by

$$\frac{\lambda(t_{(i)}; x_{(i)})}{\sum_{l \in R(t_{(i)})} \lambda(t_{(i)}; x_l)} = \frac{\lambda_0 e^{x_{(i)}\beta}}{\sum_{l \in R(t_{(i)})} \lambda_0 e^{x_l\beta}} = \frac{e^{x_{(i)}\beta}}{\sum_{l \in R(t_{(i)})} e^{x_l\beta}}. \tag*{[45]}$$

Equation [45] is the hazard function for individual $i$ at a specific point in time, $t_{(i)}$, divided by the sum of the hazard functions for all individuals in the risk set just before the occurrence of time $t_{(i)}$. Because $\lambda_0$ is common to every term in the equation, it is eliminated. The partial likelihood function is obtained by taking the product of equation [45] over all $k$ points in time such that

$$L(\beta) = \prod_{i=1}^{k} \left[\frac{e^{x_{(i)}\beta}}{\sum_{l \in R(t_{(i)})} e^{x_l\beta}}\right]^{\delta_i}. \tag*{[46]}$$

Equation [46] does not depend on $\lambda_0(t)$ and can be maximized to provide an estimate $\hat{\beta}$ that is consistent and asymptotically normally distributed [Kalbfleisch and Prentice, 1980; Lawless, 1982; Namboodiri and Suchindran, 1987].

Although the proportional hazards model does not require the specification of a hazard function, it does not provide for tests about the shape of the hazard function [Allison, 1995]. This limitation is overcome with the use of a piecewise exponential model. The idea behind the piecewise exponential model is that the time scale is divided into intervals. Within each interval the hazard is constant, but the hazard is allowed to vary across time intervals. The time scale has $J$ intervals and the cutpoints are defined as $a_0, a_1, \ldots, a_J$ with $a_0 = 0$ and $a_J = \infty$. Each individual has a hazard function of the form

$$\lambda_i(t) = \lambda_j e^{x_i\beta} \quad \text{for } a_{j-1} \le t < a_j \tag*{[47]}$$

or

$$\log \lambda_i(t) = \alpha_j + x_i\beta, \tag*{[48]}$$

where $\alpha_j = \log \lambda_j$ [Allison, 1995]. This allows the intercept to vary across intervals.
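The log of the partial likelihood in equation [46] can be sketched in Python for a single covariate. The function name is an illustrative assumption, and the sketch assumes no tied event times, so it is not a substitute for a full estimation routine. It is meant only to show that the calculation uses the ordering of the event times and never the baseline hazard.

```python
import math

def cox_partial_loglik(beta, times, delta, x):
    """Log of equation [46] for one covariate, assuming no tied event times.

    delta[i] = 1 marks an observed event and 0 a censored time.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, delta)):
        if d_i == 0:
            continue  # censored cases enter only through the risk sets
        # Risk set: everyone whose observed time is at least t_i.
        risk_sum = sum(
            math.exp(beta * x[l]) for l in range(len(times)) if times[l] >= t_i
        )
        total += beta * x[i] - math.log(risk_sum)
    return total

if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    delta = [1, 0, 1]
    x = [1.0, 0.0, 1.0]
    squared = [t ** 2 for t in times]  # order-preserving change of time scale
    print(cox_partial_loglik(0.7, times, delta, x))
    print(cox_partial_loglik(0.7, squared, delta, x))
```

Both calls return the same value because only the risk sets matter. Any order-preserving rescaling of time leaves the partial likelihood unchanged, which is consistent with the baseline hazard dropping out of equation [45].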

5.7 Other Issues

The models discussed up to this point assumed that the process of interest contained only a single, nonrepeatable event and two states (origin and destination). But it is not necessary that only one destination state exist. Individuals might transition to multiple end states. In this situation, the model is called an event history model with competing risks. In a situation with competing risks, a state-specific hazard rate is defined for each state and is represented as

$$\lambda_j(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ J = j \mid T \ge t)}{\Delta t}, \quad j = 1, 2, \ldots, k. \tag*{[49]}$$

A comparison of equation [49] to equation [4] shows that the only difference is the inclusion of $J = j$. This represents the conditional probability that the event occurs between $t$ and $t + \Delta t$ and the event is type $j$, given that the event had not occurred before time $t$ [Allison, 1984; Blossfeld, et al., 1989]. The overall hazard rate is

$$\lambda(t) = \sum_{j} \lambda_j(t), \tag*{[50]}$$

and it is the sum of all the state-specific hazards [Allison, 1984; Blossfeld, et al., 1989]. In a competing risks model, the individual is at risk of experiencing one of several events, and the interest is in the relative rate associated with each event as well as the effect of the covariates on these rates.

The earlier models also assumed that the destination state was absorbing. This is certainly the case when the event of interest is death. Individuals die and enter the state known as death. The death state is absorbing because there is no returning from the state. But not all states are absorbing. For this situation, multi-episode or repeatable event models are appropriate. The most common case would be where one specific event occurred repeatedly. Assume that an individual experiences an event repeatedly and the times at which the event is experienced are represented by nonnegative random variables $0 < T_1 < T_2 < \cdots < T_k$. The duration between events, or the length of an episode, is represented by $V_k = T_k - T_{k-1}$ for $k = 1, 2, \ldots$. For each episode there is a vector of covariates for the individual. Following Blossfeld, Hamerle, and Mayer's exposition [Blossfeld, et al., 1989], the hazard rate for the first episode is

$$\lambda^1(t \mid x_1) = \lim_{\Delta t \to 0} \frac{P(t \le T_1 < t + \Delta t \mid T_1 \ge t,\ x_1)}{\Delta t}. \tag*{[51]}$$

The hazard rate of the second episode is

$$\lambda^2(t \mid x_2, H_1) = \lim_{\Delta t \to 0} \frac{P(t \le T_2 < t + \Delta t \mid T_2 \ge t,\ x_2,\ H_1)}{\Delta t} \tag*{[52]}$$

and may depend on $H_1$, which contains the prehistory of the process. The hazard rate of the $k$th episode is

$$\lambda^k(t \mid x_k, H_{k-1}) = \lim_{\Delta t \to 0} \frac{P(t \le T_k < t + \Delta t \mid T_k \ge t,\ x_k,\ H_{k-1})}{\Delta t}. \tag*{[53]}$$

At the occurrence of the $k$th episode, the prehistory is $H_{k-1} = \{t_1, x_1, \ldots, t_{k-1}, x_{k-1}\}$. The model states that if an individual experiences multiple episodes of a single event, then the duration between events may depend on the durations of previous episodes. All parametric regressions are easily adaptable to multi-episode cases [Blossfeld, et al., 1989].

A hazard model with competing risks assumes that there are multiple possible event states, and the multi-episode model assumes that an individual could experience a single event more than once. The combination of both of these concepts is the multi-episode and multi-state event history model. Using the same transition times and duration of episodes from the multi-episode model, a state variable is defined as $\{Y_k : k = 1, 2, \ldots\}$. For the $k$th episode, there exists an episode at $k - 1$ that terminated in state $y_{k-1}$. From state $y_{k-1}$, the set of attainable states is represented as $M(y_{k-1})$. For $Y_k = j$, the hazard rate for the $k$th episode is defined as

$$\lambda_j^k(t \mid x_k, H_{k-1}) = \lim_{\Delta t \to 0} \frac{P(t \le T_k < t + \Delta t,\ Y_k = j \mid T_k \ge t,\ x_k,\ H_{k-1})}{\Delta t}, \tag*{[54]}$$

where $H_{k-1} = \{t_0, y_0, t_1, y_1, x_1, \ldots, t_{k-1}, y_{k-1}, x_{k-1}\}$ contains the prehistory of the process until time $t_{k-1}$ [Blossfeld, et al., 1989]. The total hazard rate is

$$\lambda^k(t \mid x_k, H_{k-1}) = \sum_{j=1}^{m} \lambda_j^k(t \mid x_k, H_{k-1}). \tag*{[55]}$$

It is the sum of the transition-specific hazard rates and represents the hazard of a transition from state $y_{k-1}$ in the $k$th episode to any other state. As with the previous model, all parametric regressions are easily adaptable to this case.

6.0 SURVIVAL ANALYSIS IN ACCOUNTING AND FINANCE RESEARCH

Relative to other business disciplines such as economics and management, the use of survival analysis in accounting and finance research is limited. There are probably two explanations for the limited use of this research method. The first reason is the manner in which econometrics is presented to graduate students in accounting and finance doctoral programs. Graduate education typically provides a two- or three-course econometrics sequence centered on regression analysis. The standard texts used in these courses are Judge and Hill [1985], Kmenta [1986], and Maddala [1983]. Such textbooks devote very little coverage to survival analysis and, to the extent that the texts provide coverage, it is cursory. Because graduate students are not exposed to survival analysis in their education, their knowledge of the technique is limited and, consequently, they do not use the models. The second reason that accounts for the limited use is the nature of the research problems. A glance at any of the top research journals in accounting or finance makes it clear that the dominant research area is market-based research. Market-based research questions do not lend themselves to models of survival analysis.

To the extent that survival analysis has been used in the accounting or finance literature, it is used to address traditional research questions. Using survival analysis introduces event time as a concept of interest to research areas that have interested researchers for a number of years. Instead of assuming that the underlying process is static, researchers use survival analysis to examine the process that generates the duration of time until an event of interest occurs. The common applications of survival analysis since the mid-1980s are in the areas of financial distress, occupational success, and firm survival.

6.1 Financial Distress

The ability of accounting information to predict financial distress generates considerable research in accounting and finance. Early works by Altman [1968] and Beaver [1966; 1968a; 1968b] represent the emergence of a large body of
