Estimating a Semi-Parametric Duration Model without Specifying Heterogeneity


Jerry A. Hausman and Tiemen M. Woutersen
MIT and Johns Hopkins University
Draft, August 2005

Abstract. This paper presents a new estimator for the mixed proportional hazard model that allows for a nonparametric baseline hazard and time-varying regressors. In particular, this paper allows for discrete measurement of the durations as happens often in practice. The integrated baseline hazard and all parameters are estimated at the regular rate, $\sqrt{N}$, where $N$ is the number of individuals. A hazard model is a natural framework for time-varying regressors if a flow or a transition probability depends on a regressor that changes with time, since a hazard model avoids the curse of dimensionality that would arise from interacting the regressors at each point in time with one another.

Keywords: Mixed Proportional Hazard Model, Time-varying regressors, Heterogeneity.

Comments are welcome, jhausman@mit.edu and woutersen@jhu.edu. We thank Su-Hsin Chang, Matthew Harding, and Marcel Voia for research assistance. We have received helpful comments from Bo Honoré, Moshe Buchinsky and seminar participants at Harvard-MIT, UCLA, Texas A&M, Rice University, Yale University, UC Santa Barbara, the University of Maryland, and the University of Virginia.

1. Introduction

The estimation of duration models has been the subject of significant research in econometrics since the late 1970s. Since Lancaster (1979), it has been recognized that it is important to account for unobserved heterogeneity in models for duration data. Failure to account for unobserved heterogeneity causes the estimated hazard rate to decrease more with the duration than the hazard rate of a randomly selected member of the population. Moreover, the estimated proportional effect of explanatory variables on the population hazard rate is smaller in absolute value than that on the hazard rate of the

average population member and decreases with the duration. To account for unobserved heterogeneity Lancaster proposed a parametric Mixed Proportional Hazard (MPH) model, a generalization of Cox's (1972) Proportional Hazard model, that specifies the hazard rate as the product of a regression function that captures the effect of observed explanatory variables, a baseline hazard that captures variation in the hazard over the spell, and a random variable that accounts for the omitted heterogeneity. Lancaster's MPH model was fully parametric, as opposed to Cox's semi-parametric approach, and from the outset questions were raised on the role of functional form and parametric assumptions in the distinction between unobserved heterogeneity and duration dependence.[1] This question was resolved by Elbers and Ridder (1982), who showed that the MPH model is semi-parametrically identified if there is minimal variation in the regression function. A single indicator variable in the regression function suffices to recover the regression function, the baseline hazard, and the distribution of the unobserved component, provided that this distribution does not depend on the explanatory variables.

Semi-parametric identification means that semi-parametric estimation is feasible, and a number of semi-parametric estimators for the MPH model have been proposed that progressively relaxed the parametric restrictions. Nielsen et al. (1992) showed that the Partial Likelihood estimator of Cox (1972) can be generalized to the MPH model with Gamma distributed unobserved heterogeneity. Their estimator is semi-parametric because it uses parametric specifications of the regression function and the distribution of the unobserved heterogeneity. The estimator requires numerical integration of the order of the sample size, which further limits its usefulness and makes it impractical for most situations in econometrics. Heckman and Singer (1984) considered the non-parametric maximum likelihood estimator of the MPH model with a parametric baseline hazard and regression function. Using results of Kiefer and Wolfowitz (1956), they approximate the unobserved heterogeneity with a discrete mixture. The rate of convergence and the asymptotic distribution of this estimator are not known.

[1] Heckman (1991) gives an overview of attempts to make this distinction in duration and dynamic panel data models.

Another estimator that does not require the specification of the unobserved heterogeneity

distribution was suggested by Honoré (1990). This estimator assumes a Weibull baseline hazard and only uses very short durations to estimate the Weibull parameter. Han and Hausman (1990) and Meyer (1990) propose an estimator that assumes that the baseline hazard is piecewise-constant, to permit flexibility, and that the heterogeneity has a gamma distribution. We present simulations and a theoretical result that show that using a nonparametric estimator of the baseline hazard with gamma heterogeneity yields inconsistent estimates for all parameters and functions if the true mixing distribution is not a gamma, which limits the usefulness of the Han-Hausman-Meyer approach. Thus, we find it important to specify a model that does not require a parametric specification of the unobserved heterogeneity.

Horowitz (1999) was the first to propose an estimator that estimates both the baseline hazard and the distribution of the unobserved heterogeneity nonparametrically. His estimator is an adaptation of the semi-parametric estimator for a transformation model that he introduced in Horowitz (1996). In particular, if the regressors are constant over the duration then the MPH model has a transformation model representation with the logarithm of the integrated baseline hazard as the dependent variable and a random error that is equal to the logarithm of a standard exponential random variable minus the logarithm of a positive random variable. In the transformation model the regression coefficients are identified only up to scale. As shown by Ridder (1990), the scale parameter is identified in the MPH model if the unobserved heterogeneity has a finite mean. Horowitz (1999) suggests an estimator of the scale parameter that is similar to Honoré's (1990) estimator of the Weibull parameter and is consistent if the finite mean assumption holds, so that his approach allows estimation of the regression coefficients (not just up to scale).

However, the Horowitz approach only permits estimation of the regression coefficients at a slow rate of convergence and it is not $N^{1/2}$-consistent, where $N$ is the sample size. In practice, there may be three obstacles to applying the Horowitz (1999) MPH estimator. First, the durations need to be measured on a continuous scale in order to estimate the transformation model. This condition often does not hold in economic data, e.g. unemployment duration data as discussed in Han and Hausman (1990). Second, like the transformation model, the

MPH estimator does not allow for time-varying regressors. Finally, the estimator relies on arbitrarily short durations to estimate the scale parameter and, therefore, converges slowly. Thus, the regression coefficient estimates, which are often of primary interest, are often not estimated very precisely.[2]

In this paper, we derive a new estimator for the mixed proportional hazard model (with heterogeneity) that allows for a nonparametric baseline hazard and time-varying regressors. Neither a parametric specification of the heterogeneity distribution nor nonparametric estimation of the heterogeneity distribution is necessary. Intuitively, we condition out the heterogeneity distribution, which makes it unnecessary to estimate it. Thus, we eliminate the problems that arise with the Lancaster (1979) approach to MPH models. In our new model the baseline hazard rate is nonparametric and the estimator of the integrated baseline hazard rate converges at the regular rate, $N^{1/2}$, where $N$ is the sample size. This convergence rate is the same rate as for a duration model without heterogeneity. The regressor parameters also converge at the regular rate. A nice feature of the new estimator is that it allows the durations to be measured on a finite set of points. Such discrete measurement of durations is important in economics; for example, unemployment is often measured in weeks. In the case of discrete duration measurements, the estimator of the integrated baseline hazard only converges at this set of points, as would be expected.

It may be argued that the bias in the estimates of the regression coefficients is small if the estimates of the MPH model indicate that there is no significant unobserved heterogeneity. The problem with this argument is that estimates of the heterogeneity distribution are usually not very accurate. Given the results in Horowitz (1999) this finding should not come as a surprise. The simulation results in Baker and Melino (2000) show that it is empirically difficult to find evidence of unobserved heterogeneity, in particular if one chooses a flexible parametric representation of the baseline hazard. However, Han and Hausman (1990) and applications of their approach have found significant heterogeneity using a flexible approach to the baseline hazard.

[2] It should be noted that the slower than $N^{1/2}$ convergence of the Horowitz (1999) estimator is a property of the model. Hahn (1994) and Ishwaran (1996a) show that no estimator can converge at rate $N^{1/2}$ under the assumptions of Horowitz (1999). Horowitz (1999) assumes that the first three moments of the heterogeneity distribution exist; Ishwaran (1996b) shows that the fastest possible rate of convergence is $N^{2/5}$ for that case, and the Horowitz (1999) estimator converges arbitrarily close to that rate.

Bijwaard and Ridder (2002) find that the

bias in the regression parameters is largely independent of the specification of the baseline hazard. Hence, failure to find significant unobserved heterogeneity should not lead to the conclusion that the bias due to correlation of the regressors and the unobservables that affect the hazard is small.

Because it is empirically difficult to recover the distribution of the unobserved heterogeneity, estimators that rely on estimation of this distribution may be unreliable. Therefore, we avoid estimating the unobserved heterogeneity distribution. Nevertheless, we can identify and estimate the regression parameters and the integrated baseline hazard. We find the removal of the requirement to estimate the heterogeneity distribution a major advantage of our approach.[3]

Our estimator is related to the estimator by Han (1987). Han derives an estimator, up to scale, of the regression coefficients. However, Han's estimator cannot handle time-varying regressors, whereas we estimate the regression coefficients when time-varying regressors are present, as well as the scale of the regression coefficients. In particular, because the scale of the regression coefficients is also estimated, each regression coefficient can be interpreted as the elasticity of the hazard with respect to its regressor. Similarly, Chen's (2002) estimator of the transformation model cannot handle time-varying regressors and only gives the transformation function up to scale. While Horowitz's (1999) estimator is not subject to the limitation of estimating the regression coefficients up to scale only, it converges slowly and is not $N^{1/2}$-consistent, which makes standard inference techniques inapplicable unless $N$ is very large.

A hazard model is a natural framework for time-varying regressors if a flow or a transition probability depends on a regressor that changes with time, since a hazard model avoids the curse of dimensionality that would arise from interacting the regressors at each point in time with one another. A nonconstructive identification proof for the duration model with time-varying regressors can be produced using techniques similar to Honoré (1993b), and Honoré (1993a) gives such a proof. In particular, Honoré (1993a) does not assume that the mean of the heterogeneity distribution is finite.[4]

[3] An unconditional approach is also used, in another context, by Heckman (1978), who develops unconditional tests to distinguish true and spurious state dependence.

[4] Nor does Honoré (1993a) assume a tail condition as in Heckman and Singer (1985).

Ridder and Woutersen (2003) argue that it is precisely the finite mean assumption that makes the identification

of Elbers and Ridder (1982) weak in the sense that the model of Elbers and Ridder (1982) cannot be estimated at rate $N^{1/2}$. As in Honoré (1993a), we do not need the finite mean assumption, which gives an intuitive explanation of why we can estimate the model at rate $N^{1/2}$.

This paper is organized as follows. Section 2 discusses the mixed proportional hazard model (with heterogeneity) and presents our estimator. Section 3 shows that our estimator converges at the regular rate and is asymptotically normally distributed. Section 4 adjusts the objective function for the case of an endogenous regressor. Section 5 shows that misspecifying the heterogeneity yields inconsistent estimates, even if the baseline hazard is nonparametric. Section 6 presents an empirical example and Section 7 concludes.

2. Mixed Proportional Hazard Model

Lancaster (1979) introduced the mixed proportional hazard model in which the hazard is a function of a regressor $X$, unobserved heterogeneity $v$, and a function of time $\lambda(t)$,

$$\theta(t \mid X, v) = v\, e^{X\beta_0}\, \lambda(t). \tag{1}$$

The function $\lambda(t)$ is often referred to as the baseline hazard. The popularity of the mixed proportional hazard model is partly due to the fact that it nests two alternative explanations for the hazard $\theta(t \mid X)$ to be decreasing with time. In particular, estimating the mixed proportional hazard model gives the relative importance of the heterogeneity, $v$, and genuine duration dependence, $\lambda(t)$; see Lancaster (1990) and Van den Berg (2001) for overviews. Lancaster (1979) uses functional form assumptions on $\lambda(t)$ and distributional assumptions on $v$ to identify the model. Examples by Lancaster and Nickell (1980) and Heckman and Singer (1984), however, show the sensitivity to these functional form and distributional assumptions.
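To illustrate how unobserved heterogeneity alone can make the observed hazard fall with time, the following minimal sketch evaluates the population hazard implied by equation (1) when the baseline hazard is constant. The two-point distribution for $v$, the constant baseline `lam`, and the function name `population_hazard` are assumptions made for illustration only; they are not part of the paper.

```python
import numpy as np

def population_hazard(t, x_beta, v_values, v_probs, lam=1.0):
    """Hazard of the observed population at time t under equation (1),
    assuming a constant baseline hazard lam and a discrete distribution
    for v (both are illustrative assumptions)."""
    v = np.asarray(v_values, dtype=float)
    p = np.asarray(v_probs, dtype=float)
    scale = np.exp(x_beta) * lam       # an individual's hazard is v * scale
    surv = np.exp(-v * scale * t)      # survival to t for each type v
    # population hazard = -d/dt log E_v[survival]
    return scale * np.sum(p * v * surv) / np.sum(p * surv)

times = np.array([0.0, 1.0, 2.0, 4.0])
hazards = np.array(
    [population_hazard(t, 0.0, [0.5, 2.0], [0.5, 0.5]) for t in times]
)
print(hazards)
```

Every individual in this example has a constant hazard, yet the population hazard starts at $E(v)$ and declines toward the hazard of the low-$v$ type, because high-$v$ individuals exit first.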
We avoid these functional form and distributional assumptions and consider the mixed proportional hazard model with time-varying regressors,

$$\theta(t \mid x(t), v) = v\, e^{x(t)\beta_0}\, \lambda(t) \tag{2}$$

where $x(t)$ is a set of regressors that can vary with time, $v$ denotes the heterogeneity and is independent of $x(t)$, and $\lambda(t)$ denotes the baseline hazard. We also use $x(t)$ to denote the sequence of the regressors $x(s)$ for $s = 0$ to $s = t$. The mixed proportional hazard

model of equation (2) implies the following survival probabilities,

$$P(T \ge t \mid x(t), v) = \bar{F}(t \mid x(t), v) = \exp\Big(-v \int_0^t e^{x(s)\beta_0} \lambda(s)\, ds\Big) \quad \text{and}$$

$$P(T \ge t \mid x(t)) = E_v\{\bar{F}(t \mid x(t), v)\} = E_v\Big\{\exp\Big(-v \int_0^t e^{x(s)\beta_0} \lambda(s)\, ds\Big)\Big\}. \tag{3}$$

In applied work, durations are measured discretely, and to fix ideas we assume that the durations are measured on a weekly scale. We also assume that the regressors can only change at the beginning of the week. Let the regressor $x_{i1}$ denote the vector of regressors of individual $i$ during week 1, $x_{i2}$ the regressors of individual $i$ during week two, etc. We now can write equation (3) as follows,

$$P(T \ge t \mid x(t)) = E_v\{\bar{F}(t \mid x(t), v)\} = E_v\Big\{\exp\Big(-v \sum_{s=1}^{t} e^{x_s\beta_0 + \delta_{0,s}}\Big)\Big\},$$

where $t$ is a natural number, $\delta_{0,s} = \ln\{\int_{s-1}^{s} \lambda(u)\, du\}$, and we normalize $\delta_{0,1} = 0$. This specification is similar to Han and Hausman (1990), who specify $\delta_{0,s}$ in a similar manner but who specify and estimate $v$ parametrically, a requirement we remove in this paper.

Kendall (1938) proposes a statistic for rank correlation. If we are interested in the rank correlation between $T$ and the index $X\beta$, then Kendall's (1938) rank correlation has the following form,

$$Q(\beta) = \frac{1}{N(N-1)} \sum_i \sum_j 1\{T_i > T_j\}\, 1\{X_i\beta > X_j\beta\}.$$

Han (1987) proposes an estimator that maximizes $Q(\beta)$, the rank correlation between $T$ and the index $X\beta$. Under certain assumptions, including that $T$ depends on $X$ only through the index $X\beta$, maximizing $Q(\beta)$ yields an estimate for $\beta$ up to scale, excluding the intercept, which cannot be estimated.[5] However, Kendall's (1938) rank correlation cannot be used for the case of time-varying regressors since it is unclear which regressor one should use. We therefore propose the following modification of the rank correlation. In particular, in our model, the expectation does depend on an index, although it has a more complicated form.

[5] For this reason, Han (1987) estimates $\beta/\|\beta\|$; alternatively, a nonzero coefficient of a regressor could be normalized to be one in absolute value, e.g. $|\beta_1| = 1$.

Define $Z_i(l; \beta, \delta) =$

$\sum_{s=1}^{l} e^{x_{is}\beta + \delta_s}$. We propose minimizing the following objective function,

$$Q(\beta,\delta) = \frac{1}{N(N-1)} \sum_i \sum_j \sum_{l=1}^{L} \sum_{k=1}^{K} [1\{T_i \ge l\} - 1\{T_j \ge k\}]\, 1\{Z_i(l;\beta,\delta) < Z_j(k;\beta,\delta)\}. \tag{4}$$

Thus, $Z_i(l;\beta,\delta)$ is the index during the $l$-th period. The intuition for this objective function is the following. We are comparing two different individuals, as does Han's objective function. However, we are now also taking account of the outcome in each period through the parameters for the integrated hazard function, $\delta$. The probability that individual $i$ survives period $l$ is larger than the probability that individual $j$ survives period $k$ if and only if $Z_i(l;\beta_0,\delta_0) < Z_j(k;\beta_0,\delta_0)$. The reverse holds if $Z_i(l;\beta_0,\delta_0) > Z_j(k;\beta_0,\delta_0)$. Thus, we use the outcomes for individuals $i$ and $j$ together with these probabilities to yield an objective function that permits identification of the parameters $\beta_0$ and $\delta_0$, without the restriction of only up to scale as in the Han approach.

Consider the expectation of the objective function,

$$E\{Q(\beta,\delta)\} = \frac{1}{N(N-1)} \sum_i \sum_j \sum_{l=1}^{L} \sum_{k=1}^{K} E\big[\{e^{-vZ_i(l;\beta_0,\delta_0)} - e^{-vZ_j(k;\beta_0,\delta_0)}\}\, 1\{Z_i(l;\beta,\delta) < Z_j(k;\beta,\delta)\}\big].$$

This expectation of the objective function is minimized at the true value of the parameters. To see this, suppose that $Z_i(l;\beta_0,\delta_0) > Z_j(k;\beta_0,\delta_0)$ so that $e^{-vZ_i(l;\beta_0,\delta_0)} < e^{-vZ_j(k;\beta_0,\delta_0)}$. Thus, $\{\beta,\delta\} = \{\beta_0,\delta_0\}$ minimizes

$$E_v\{e^{-vZ_i(l;\beta_0,\delta_0)} - e^{-vZ_j(k;\beta_0,\delta_0)}\}\, 1\{Z_i(l;\beta,\delta) < Z_j(k;\beta,\delta)\}$$

for each set $\{i, j, k, l\}$ and therefore the expectation of the sum.[6]

Note that our approach focuses on the probability that an individual survives period $l$ (measured from time 0), which permits a convenient treatment of the heterogeneity in comparison with the traditional approach that focuses on the hazard function. By only using comparisons measured from time $t = 0$ we are able to condition out the heterogeneity distribution. The more traditional hazard approach considers the probability of survival conditional on the individual surviving up to period $l$, which requires an explicit treatment of the heterogeneity distribution.
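The ordering property that drives this objective function can be checked numerically. The sketch below computes the index $Z(l) = \sum_{s \le l} e^{x_s\beta + \delta_s}$ for two hypothetical individuals and verifies that survival probabilities $E_v\, e^{-vZ}$ are ordered inversely to $Z$ under two different heterogeneity distributions. All numerical values, the discrete distributions for $v$, and the helper names `index_path` and `survival` are illustrative assumptions.

```python
import numpy as np

def index_path(x, beta, delta):
    """Z(l) for l = 1..K: cumulative sum of exp(x_s * beta + delta_s).
    x holds one scalar regressor per period."""
    return np.cumsum(np.exp(np.asarray(x) * beta + np.asarray(delta)))

def survival(z, v_values, v_probs):
    """Survival probability E_v[exp(-v z)] for a discrete distribution of v."""
    v = np.asarray(v_values, dtype=float)[:, None]
    p = np.asarray(v_probs, dtype=float)[:, None]
    return np.sum(p * np.exp(-v * np.atleast_1d(z)[None, :]), axis=0)

beta = 0.5
delta = np.array([0.0, -0.2, 0.1])          # delta_1 normalized to zero
z_i = index_path([1.0, 1.0, 2.0], beta, delta)
z_j = index_path([0.0, 1.5, 1.5], beta, delta)
z_all = np.concatenate([z_i, z_j])

# Two different (hypothetical) heterogeneity distributions.
s_a = survival(z_all, [0.5, 2.0], [0.5, 0.5])
s_b = survival(z_all, [0.1, 1.0, 5.0], [0.2, 0.5, 0.3])

# Ascending order of Z coincides with descending order of survival.
print(np.argsort(z_all), np.argsort(-s_a), np.argsort(-s_b))
```

The ranking of the survival probabilities is the same under both distributions of $v$, which is why comparisons of indices do not require the heterogeneity distribution to be specified.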
The definition of $Q(\beta,\delta)$ that is given above contains a double sum, so that the number of computational operations for calculating $Q(\beta,\delta)$ is of order $N^2$ (note that $L$ and $K$ are fixed).

[6] In appendix 1 we show that the true value uniquely minimizes the expectation of the objective function.
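As a rough illustration of the computational point, the sketch below evaluates the pairwise sum directly and then through a sort of the stacked indices. It is a sketch under stated assumptions rather than the authors' implementation: it takes $L = K$, a scalar regressor per period, and no ties in $Z$; it sums over all ordered pairs of stacked entries, including pairs from the same individual, which only adds a term that does not depend on $(\beta,\delta)$ because $Z_i(l)$ is increasing in $l$. The sort-based expression is derived here from the pairwise definition, and its sign and centering convention may differ from the formula printed in the paper. The function names and simulated data are hypothetical.

```python
import numpy as np

def stack_index_and_survival(T, X, beta, delta):
    """Return stacked vectors d and Z of length N*K, where
    Z_i(l) = sum_{s<=l} exp(x_is*beta + delta_s) and d_i(l) = 1{T_i >= l}."""
    Z = np.cumsum(np.exp(X * beta + delta), axis=1)
    periods = np.arange(1, X.shape[1] + 1)
    d = (T[:, None] >= periods).astype(float)
    return d.ravel(), Z.ravel()

def q_pairwise(d, Z, N):
    """Direct evaluation over all ordered pairs of stacked entries."""
    diff = d[:, None] - d[None, :]
    less = Z[:, None] < Z[None, :]
    return np.sum(diff * less) / (N * (N - 1))

def q_sorted(d, Z, N):
    """Same value from one sort: an entry with rank r has (M - r) larger
    and (r - 1) smaller entries, so it contributes d * (M + 1 - 2r)."""
    M = Z.size
    rank = np.empty(M)
    rank[np.argsort(Z)] = np.arange(1, M + 1)
    return np.sum(d * (M + 1 - 2 * rank)) / (N * (N - 1))

rng = np.random.default_rng(0)
N, K = 30, 4
X = rng.normal(size=(N, K))                 # simulated regressors
T = rng.integers(0, K + 1, size=N)          # simulated integer durations
delta = np.array([0.0, 0.1, -0.3, 0.2])     # delta_1 normalized to zero
d, Z = stack_index_and_survival(T, X, 0.7, delta)
print(q_pairwise(d, Z, N), q_sorted(d, Z, N))
```

The pairwise version builds an $NK \times NK$ comparison matrix, while the sorted version needs only one `argsort`, which is what makes the criterion practical inside a numerical optimizer.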

In order to reduce the number of computational operations to be of the order $N \ln N$, we use the rank operator. In particular, let $d_r = 1\{T \ge r\}$ for the vector $T$ of length $N$. Let $d$ be constructed by stacking the vectors $d_r$ vertically for all $r = 1, \ldots, K$. Now both $d$ and $Z$ are of dimension $NK \times 1$. If a regressor is continuously distributed conditional on the other regressors, then we can re-write $Q(\beta,\delta)$ using these vectors and the rank function,

$$Q(\beta,\delta) = \frac{1}{N(N-1)} \sum_{j=1}^{NK} d(j)\,[2\,\mathrm{Rank}(Z(j)) - NK].$$

The computational burden to calculate $Q(\beta,\delta)$ is proportional to $N \ln(N)$.[7] Note that we have identification of $\beta$ rather than identification only up to an unknown scale coefficient, which is the usual outcome of most previous approaches to the problem. Also, note that by focusing on survival from the beginning of the sample, we have eliminated the requirement to specify the heterogeneity distribution since no survival bias (dynamic sample selection) occurs in our sample comparisons.

Our identification is somewhat similar to the nonconstructive identification result of Elbers and Ridder (1982) in the sense that we also assume a continuously distributed regressor. However, our identification result differs in two important ways. First, our identification proof is constructive in the sense that it suggests an estimator. Second, our identification result does not rely on an iterative procedure. An iterative procedure typically precludes $N^{1/2}$ consistency.[8]

3. Large Sample Properties

In this section, we derive the large sample properties of our estimator. Suppose that $\bar{x}_k = \{x_1, \ldots, x_k\}$, where $x_1, \ldots, x_k$ are scalars. We say that the density of the regressors is positive around $\bar{x}_k$ if at least one element of $\bar{x}_k$ is continuously distributed and the density is positive around all continuously distributed elements of $\bar{x}_k$. We assume that we observe $\{T_i, x_i\}$ where $T$ is a natural number and $T \in [0, K]$, $K > 1$. For example, we observe unemployment duration, which is measured in weeks, and want to estimate the integrated baseline hazard at the end of each week.

[7] Suppose we have an ordered vector of length $N-1$; calculating the rank of a new, $N$-th observation is $\ln(N)$. We can see this by observing that having $2(N-1)$ elements to begin with would require us to compare the new observation to the median of the $2(N-1)$ elements; we are then back to comparing the new element to $N-1$ observations. Thus, the extra cost is $\ln(N)$. The summation then yields the rate $N \ln(N)$.

[8] Indeed, Hahn (1994) shows that the identification result of Elbers and Ridder (1982) holds for singular information matrices, so that no $\sqrt{N}$-consistent estimator exists.

We assume the following.

Assumption 1 (Time-varying Regressors): Let
(i) $\{T, v, x\}$ be a random sample, $x = \{x_1, \ldots, x_K\}$, $x_1, \ldots, x_K$ are scalars, $\{T, x\}$ be observed and $K \ge 2$;
(ii) $v$ and $x$ are independent;
(iii) $\Pr(T \ge l \mid x) = E_v \exp\{-v \sum_{s=1}^{l} e^{x_s\beta + \delta_s}\}$ for $l = 1, \ldots, K$;
(iv) $\delta_1$ is normalized to be zero, let $\{\beta, \delta\} \in \Theta$, which is compact;
(v) let $G$ be a $K$ by $K$ matrix and let the element $G_{lk}$ be equal to one if there exists a pair $\{\bar{x}_l, \bar{x}_k\} \in R^{2K}$ such that $\Pr(T \ge l \mid \bar{x}_l) = \Pr(T \ge k \mid \bar{x}_k)$ where the density of the regressor is positive in an arbitrarily small neighborhood around $\bar{x}_l$ or $\bar{x}_k$, and let $G_{lk}$ be zero otherwise; let the matrix $G$ represent a connected graph;
(vi) either (a) $\bar{x}_r = x_c$ if $x_1 = x_2 = \ldots = x_r = x_c$ (thus, $\bar{x}_1 = x_c$), $\Pr(\bar{x}_K = x_c) > 0$ for some $x_c \in R$, let $G$ be a $K$ by $K$ matrix and let the element $G_{lk}$ be equal to one if there exists a pair $\{x_c, \bar{x}_k\} \in R^{K+1}$ such that $\Pr(T \ge l \mid x_c) = \Pr(T \ge k \mid \bar{x}_k)$ where the density of the regressor is positive in an arbitrarily small neighborhood around $x_c$ or $\bar{x}_k$, and let $G_{lk}$ be zero otherwise; let the matrix $G$ represent a connected graph; or (b) $x_{l,1} \mid x_{l,2}, x_{l,3}, \ldots, x_{l,k}, x_{l-1}, x_{l-2}, \ldots$ is continuously distributed for all $l$, and $\Pr(T \ge l \mid \bar{x}_l) = \Pr(T \ge k \mid \bar{x}_k)$ where $x_{l,1}$ is in the interior of the support of $x_{l,1} \mid x_{l,2}, x_{l,3}, \ldots, x_{l,k}, x_{l-1}, x_{l-2}, \ldots$ for all $k$;
(vii) there exists a pair $\{x_1, x'_1, x'_2\} \in R^3$, $x'_1 \ne x'_2$, such that $0 < \Pr(T \ge 1 \mid x_1, x_2) = \Pr(T \ge 2 \mid x'_1, x'_2) < 1$ where the density of the regressor is positive in an arbitrarily small neighborhood around $x_1$ or $\{x'_1, x'_2\}$.

Conditions (i)-(vi(a)) ensure identification up to scale and conditions (i)-(vii) ensure complete identification.[9] Condition (iii) is satisfied if the data generating process is the mixed proportional hazard model of equation (2) with exogenous regressors that can change at the beginning of each period. Condition (vi) assumes that either (a) the regressors are constant with positive probability or (b) a regressor is continuously distributed conditional on the other regressors and earlier realizations of the regressors. The substantial restriction of condition (vii) is that one of the regressors varies with time; the theorem below still holds if condition (vii) holds after relabelling the periods. For example, one can label weeks 1 through 8 as period 1 so that condition (vii) holds while the other conditions hold before or after relabelling.

[9] Matrices with only zeros and ones can be represented by graphs; a connected graph means that, informally speaking, you can travel from one point to any other point, but not necessarily directly. Condition (v) is considerably weaker than a condition that a regressor has a positive conditional density on the whole real line.

Theorem 1: Let assumption 1 hold. Let $\delta$ be contained in a compact subset $D$ of $R^K$ and normalize $\delta_1 = 0$. Let

$$\{\hat\beta, \hat\delta\} = \arg\min_{\beta,\delta} Q(\beta,\delta) \quad \text{where}$$

$$Q(\beta,\delta) = \frac{1}{N(N-1)} \sum_i \sum_j \sum_{l=1}^{K} \sum_{k=1}^{K} [1\{T_i \ge l\} - 1\{T_j \ge k\}]\, 1\{Z_i(l) < Z_j(k)\}.$$

Then

$$\{\hat\beta, \hat\delta\} \to_p \{\beta, \delta\}.$$

Suppose that the regressor is a vector instead of a scalar. The easiest way to prove identification for that case is by noting that one can identify the regression coefficients up to scale using only observations of the first period. In particular, the parameter vector could be estimated up to scale using the maximum rank correlation estimator (MRC). Rank correlation was introduced by Kendall (1938) and Han (1987) proposed the MRC estimator. Han (1987) assumes that one regressor has infinite support conditional on all other regressors. We weaken this assumption. In particular, if all regressors are distributed continuously, then we only require that one regressor is continuously distributed conditional on the others, without any support restrictions. In order to estimate $\beta$ up to scale, we assume the following.

Assumption 2: Let
(i) $\beta$ be contained in a compact subset $\tilde{B}$ of $R^q$;
(ii) $\Pr(T \ge 1 \mid x_1) = G(x_1\beta, v)$ where $G(\cdot,\cdot)$ is a strictly monotonic decreasing function in its first argument;
(iii) $\{T, v, x\}$ be a random sample;
(iv) let $x_1 = \{x_{1,1}, \tilde{x}_1\}$, let $\tilde{S}$ denote the support or a subset of the support of $\tilde{x}_1$, let $S_1$ denote the interior of the support of the continuously distributed $x_{1,1}$ conditional on $\tilde{x}_1$ and let, for all $\tilde{x}_1 \in \tilde{S}$, there be an $x_{1,1} \in S_1$ such that $0 < \Pr(T \ge 1 \mid x_{1,1}, \tilde{x}_1) = p < 1$ for some $p$;
(v) let $S$ denote the support of $x$ conditional on $x_{1,1} \in S_1$ and assume that this support of $x$, $S$, is not contained in any proper linear subspace of $R^q$;
(vi) $\beta_1 \ne 0$; and
(vii) $v$ and $x$ are independent.

Proposition 1: Let assumption 2 hold. Let

$$\hat\kappa = \arg\min_{\kappa:\, |\kappa_1| = 1} Q(\kappa) \quad \text{where}$$

$$Q(\kappa) = \frac{1}{N(N-1)} \sum_i \sum_j [1\{T_i \ge 1\} - 1\{T_j \ge 1\}]\, 1\{Z_i(1) < Z_j(1)\}.$$

Then

$$\hat\kappa \to_p \beta/|\beta_1|.$$

This proposition states that $\beta$ can be estimated up to scale under weaker support conditions than presented by Han (1987). In particular, if all regressors are distributed continuously, then we only require that $x_{1,1}$ is continuously distributed on a small interval, without assuming that it has support over the whole real line.[10] If a regressor has a discrete distribution and the support of the continuously distributed variables is small, then we can first condition on the regressor with the discrete distribution and identify the whole model using theorem 1 and proposition 1. We then can try to identify the coefficient on the regressor using the objective function of equation (4). Alternatively, we can check whether this objective function empirically identifies the parameters.

Suppose that none of the regressors varies with time (most likely, this would be due to the quality of the data) and that we want to estimate $\beta$ up to scale. We can then use the objective function of equation (4). Besides the mild support condition, this objective function can also handle known censoring points that depend on the regressors, while Han's (1987) objective function cannot handle such censoring.

Choosing $G(x_1\beta) = E_v \exp(-v e^{x_1\beta})$ in proposition 1 and combining theorem 1 and proposition 1 yields a consistency result for $\{\hat\beta, \hat\delta_2, \ldots, \hat\delta_K\}$. Thus, instead of estimation of $\beta$ up to scale, the objective function $Q(\beta,\delta)$ permits estimation of $\beta$, including the scale.

[10] To see this, consider choosing $S$ such that $\underline{x}_{1,1}\beta_{1,0} < \tilde{x}\tilde\beta_0 < \bar{x}_{1,1}\beta_{1,0}$ for some $\underline{x}_{1,1}$ and $\bar{x}_{1,1} \in S_1$ and note that there must exist such an $\underline{x}_{1,1}$ and $\bar{x}_{1,1}$ since $S_1$ contains an interval.

Theorem 2 (Consistency): Let assumptions 1-2 hold. Let $\Pr(T \ge l \mid x) = E_v \exp(-\sum_{s=1}^{l} v\, e^{x_s\beta + \delta_s})$. Let $\delta$ be contained

in a compact subset $D$ of $R^K$ and normalize $\delta_1 = 0$. Then

$$\{\hat\beta, \hat\delta\} \to_p \{\beta, \delta\}.$$

3.1 Asymptotic Distribution

In this subsection, we derive the asymptotic distribution of our estimator. As before, we use the following objective function, where $\theta = \{\beta, \delta\}$,

$$Q_N(\theta) = \frac{1}{N(N-1)} \sum_i \sum_j \sum_{l=1}^{L} \sum_{k=1}^{K} [1\{T_i \ge l\} - 1\{T_j \ge k\}]\, 1\{Z_i(l) < Z_j(k)\}.$$

In the appendix, we show that

$$Q_N(\theta) = \frac{1}{N} \sum_i \sum_{l=1}^{L} 1\{T_i \ge l\}\, K\, [1 - 2\hat{F}_Z\{Z_i(l)\}], \tag{5}$$

where $\hat{F}_Z\{Z_i(l)\} = \frac{1}{(N-1)K} \sum_j \sum_{k=1}^{K} 1\{Z_j(k) < Z_i(l)\}$. Note that $E[\hat{F}_Z\{Z_i(l)\} \mid Z_i(l)] = F_Z\{Z_i(l)\}$, where $F_Z$ is the cumulative distribution function of $Z_i(l)$ for $l = 1, \ldots, K$ and $i = 1, \ldots, N$. Let $Q_0(\theta)$ be twice continuously differentiable at $\theta_0$ with respect to $\theta$ and let $H$ denote the second derivative divided by the constant $K$ and evaluated at $\theta_0$, i.e. $H = \frac{1}{K} \nabla_{\theta\theta} Q_0(\theta_0)$. We assume the following.

Assumption 3 (Interior): Let $\theta_0 = (\beta, \delta) \in \mathrm{Interior}(\Theta)$, where $\Theta$ is compact.

Let $f_Z\{Z_i(l)\}$ denote the density of $Z_i(l)$.

Assumption 4: Let (i) the second derivative $H$ be nonsingular; (ii) let $f_Z(z)$ be differentiable and let $|f_Z(z)\, \partial Z/\partial\theta| < M$ for all $\theta$, $|df_Z\{z\}/dz| < M$ for all $z$ and for some $M < \infty$.

Assumption 4 is a standard regularity condition and supports an argument based on a Taylor expansion.[11]

[11] We cannot immediately apply Sherman (1993) since he requires that $Q_N(\theta_0) - Q_0(\theta_0) = O_p(N^{-1})$, an assumption that is violated for our objective function. Therefore, we apply Newey (1991) and Newey and McFadden (1994, lemma 2.8 and section 7).

Theorem 3 (Asymptotic Normality): Let assumptions 1-4 hold. Then

$$\sqrt{N}\{\hat\theta - \theta\} \to_d N(0, H^{-1}\Omega H^{-1})$$

where $\Omega = E[D_N(\theta_0) D_N(\theta_0)']$ and

$$D_N(\theta) = \frac{2}{\sqrt{N}} \sum_i \Big[ \sum_{l=1}^{L} 1\{T_i \ge l\}\, f_Z\{Z_i(l)\}\, \frac{\partial Z_i(l)}{\partial\theta} - E\Big[ \sum_{l=1}^{L} 1\{T_i \ge l\}\, f_Z\{Z_i(l)\}\, \frac{\partial Z_i(l)}{\partial\theta} \Big] \Big].$$

The function $D_N(\theta)$ is an approximate derivative and an influence function in the terminology of Newey and McFadden (1994). It allows one to view the asymptotic behavior of an estimator as an average, multiplied by $\sqrt{N}$. Moreover, bootstrapping an asymptotically normally distributed estimator that can be represented by an influence function yields a consistent variance-covariance matrix and consistent confidence intervals; see Horowitz (2001, theorem 2.2).[12] In the application, we bootstrap the estimator.

The matrix $\Omega = E[D_N(\theta_0) D_N(\theta_0)']$ can be estimated using a sample analogue, where $f_Z\{Z_i(l)\}$ can be estimated using a second order kernel that omits observation $i$. In order to estimate $H$, let $e_i$ denote the $i$-th unit vector, $\varepsilon_N$ a small positive constant that depends on the sample size, and $\hat{H}$ the matrix with $i,j$-th element

$$\hat{H}_{ij} = \frac{1}{4\varepsilon_N^2} \big[ \hat{Q}(\hat\theta + e_i\varepsilon_N + e_j\varepsilon_N) - \hat{Q}(\hat\theta - e_i\varepsilon_N + e_j\varepsilon_N) - \hat{Q}(\hat\theta + e_i\varepsilon_N - e_j\varepsilon_N) + \hat{Q}(\hat\theta - e_i\varepsilon_N - e_j\varepsilon_N) \big].$$

Lemma 1 (Newey and McFadden, Estimating $H$): Let the conditions of theorem 3 be satisfied. Let $\varepsilon_N \to 0$ and $\varepsilon_N \sqrt{N} \to \infty$. Then $\hat{H} \to_p H$.

[12] Horowitz (2001, theorem 2.2) averages $g_n(X_i)$.

Theorem 3 requires the regressors to be exogenous. Sometimes a regressor can qualify as an exogenous regressor even if its value depends on survival up to a certain point. For example, a treatment that is randomly assigned with probability $p_h$ to individuals who survived $h$ periods may appear to be endogenous since it depends on survival. However, in this duration framework, we can relabel the treatment as if it is given at the beginning of the spell with probability $p_h$ and consider the randomly assigned treatment

15 Estmatng a Sem-Parametrc Duraton Model wthout Specfyng Heterogenety15 exogenous 13. In the next secton, we consder endogenous regressors, such as randomly assgned treatment wth partal complance. Our estmates of {δ 1,..., δ K } mply an estmate for the the ntegrated hazard. In partcular, suppose that we measure survval at {0, 1,..., K}, e.g. weekly unemployment data, then Xs=t dλ(t) = exp(ˆδ s ) where t {0, 1,..., K}. s=1 We defne the average hazard on the nterval [a, b) to be the value λ for whch R b a λ(s)ds = Λ(b) Λ(a). Ths gves an expresson for the average hazard, dλ(s) =exp(ˆδ t ) for t 1 <s<t. If the duraton are measured on a fne grd, then one could also approxmate the hazard by numercally dfferentatng the ntegrated hazard Λ(t). d Thus, we can estmate the ntegrated hazard rate at each pont and also approxmate the hazard rate at each pont. Ths dffers consderably from Chen (2002), who only estmates the logarthm of the ntegrated hazard up to a unknown scalar, so that we do not know whether the hazard s ncreasng or decreasng. 4. An Endogenous Regressor The last secton dealt wth exogenous regressors. However, some regressors are endogenous n the sense that the regressor depends on the unobserved heterogenety. Ths stuaton occurs often n panel data and the geness of the problem and an approach to a soluton to the problem are dscussed n e.g. Mundlak (1961), Hausman and Wse (1979) and Hausman and Taylor (1981). For example, n the Natonal Supported Work Demonstraton 14 data, long term unemployed ndvduals are randomly offered tranng but some choose not to partcpate. Thus, there s a partal complance problem and the treatment ndcator can depend on unobserved heterogenety. See also Heckman, LaLonde, and Smth (1999). 
[13] In particular, individuals that do not survive up to period $h$ will be assigned treatment with probability $p_h$; an alternative is to use a weighting function that gives the weights $p_h$ and $(1 - p_h)$ to both possible outcomes.
[14] Ham and LaLonde (1996) discuss this data.

The duration model of this paper gives a natural framework to handle survival selection and time-varying regressors. As discussed in the last section, the model can handle survivor selection without using instrumental variables. However, some treatments are not just endogenous in the sense of dynamic or survival selection but are also endogenous in the sense that the treatment still depends on the unobserved heterogeneity $v$, even after conditioning on survival. Let $R \in \{0, 1\}$ denote the treatment assignment and let $X \in \{0, 1\}$ denote actual treatment. Let $R$ be randomly assigned among the individuals that are unemployed at time $h$. Suppose that an individual can refuse treatment, that is, we can observe $R = 1$ and $X = 0$ for a particular individual. The refusal of treatment, or equivalently, the choice of participating, can potentially depend on the unobserved heterogeneity $v$ or on the observed regressors. If the probability of $X$ depends on $v$, the distribution $p(v \mid X = 1)$ is different from $p(v \mid X = 0)$. In particular, let $X$ be a function of $R$, $v$, other exogenous regressors and random noise. Since the distribution of $v$ depends on $X$, we have, in general,

$$E_v\{e^{-vZ_i(l)} \mid Z_i(l), X = 0\} \ne E_v\{e^{-vZ_j(l)} \mid Z_j(l) = Z_i(l), X = 1\}.$$

Therefore, $E_v\{e^{-vZ_i(l)} \mid Z_i(l), X\}$ may not be decreasing in $Z_i(l)$. Therefore, we need to adjust the objective function $Q(\beta, \delta)$ that was introduced above,

$$Q(\beta, \delta) = \frac{1}{N(N-1)} \sum_i \sum_j \sum_{l=1}^{L} \sum_{k=1}^{K} [1\{T_i \ge l\} - 1\{T_j \ge k\}]\, 1\{Z_i(l; \beta, \delta) < Z_j(k; \beta, \delta)\}. \qquad (6)$$

One can view the indicators $1\{T_i \ge l\}$ and $1\{T_j \ge k\}$ as estimators of survival functions. In order to deal with the self selection into treatment, we replace these indicators by other estimates of survivor functions. In particular, we replace $1\{T_i \ge l\}$ and $1\{T_j \ge k\}$ by survivor functions that have the same unobserved heterogeneity distribution so that we do not have to explicitly model the distribution of the heterogeneity. Suppose that individuals are treated at the beginning of period $h$. In order to avoid survival bias, we condition on survival up to $h$ and also on the index at $h - 1$, $Z(h - 1)$.
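To make the baseline objective (6) concrete before its indicators are replaced, the following minimal Python sketch evaluates it by brute force. It rests on assumptions that are not stated in this section: a linear index $Z_i(l) = x_i(l)'\beta + \delta_l$ (the index is defined earlier in the paper), a double sum over pairs $i \ne j$ (suggested by the $N(N-1)$ normalization), and $L = K$ periods. The names `linear_index` and `rank_objective` are illustrative only.

```python
import numpy as np


def linear_index(X, beta, delta):
    """Assumed index Z_i(l) = x_i(l)'beta + delta_l.

    X has shape (N, L, p), beta has shape (p,), delta has shape (L,).
    """
    return np.asarray(X) @ np.asarray(beta) + np.asarray(delta)[None, :]


def rank_objective(T, Z):
    """Brute-force evaluation of the pairwise objective in equation (6).

    T : (N,) integer durations
    Z : (N, L) index values, column l-1 holding Z_i(l)
    """
    T = np.asarray(T)
    Z = np.asarray(Z, dtype=float)
    N, L = Z.shape
    periods = np.arange(1, L + 1)
    # S[i, l-1] = 1{T_i >= l}
    S = (T[:, None] >= periods[None, :]).astype(float)
    total = 0.0
    for i in range(N):
        for j in range(N):
            if i == j:  # assumption: only distinct pairs enter the sum
                continue
            less = Z[i][:, None] < Z[j][None, :]   # 1{Z_i(l) < Z_j(k)}
            diff = S[i][:, None] - S[j][None, :]   # 1{T_i >= l} - 1{T_j >= k}
            total += float((diff * less).sum())
    return total / (N * (N - 1))


# Two individuals, two periods: the one with the lower index survives longer.
q = rank_objective([2, 1], [[0.0, 1.0], [2.0, 3.0]])  # 1.0
```

The double loop costs on the order of $N^2 L^2$ operations, so it is meant for checking small cases rather than for estimation on a sample of realistic size.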
Let $R$ denote the treatment intention[15], $X$ the actual treatment and $R, X \in \{0, 1\}$. For now, we assume that $R = 0$ implies $X = 0$ and that $P(X = 1 \mid R = 1) > P(X = 1 \mid R = 0)$. Suppose that there are individuals that could have different values of $R$ or $X$ but that have identical values of exogenous regressors. For example, they became unemployed at the same time and also have the same exogenous regressors but can have different values of $R$ or $X$. Let these groups be denoted by $G_1, ..., G_M$ and let $\kappa$ be an estimate of the odds ratio,

$$\kappa = \frac{1 - \hat p}{\hat p} \quad \text{where} \quad \hat p = \frac{\sum_i 1\{T_i \ge h\} 1\{R_i = 1\}}{\sum_i 1\{T_i \ge h\}}.$$

For each group, we construct the following distribution functions,

$$\hat F_{g,11} = \frac{\sum_{i \in g} \sum_{l=h}^{K} 1\{T_i \ge l\} 1\{X_i = 1\}}{\sum_{i \in g} \sum_{l=h}^{K} 1\{T_i \ge h\} 1\{X_i = 1\}}$$

and

$$\hat F_{g,00} = \frac{\sum_{i \in g} \sum_{l=h}^{K} [1\{T_i \ge l\} 1\{R_i = 0\} 1\{X_i = 0\} - \kappa\, 1\{T_i \ge l\} 1\{R_i = 1\} 1\{X_i = 0\}]}{\sum_{i \in g} \sum_{l=h}^{K} [1\{T_i \ge h\} 1\{R_i = 0\} 1\{X_i = 0\} - \kappa\, 1\{T_i \ge h\} 1\{R_i = 1\} 1\{X_i = 0\}]} = \frac{\sum_{i \in g} \sum_{l=h}^{K} 1\{T_i \ge l\} 1\{X_i = 0\} [1\{R_i = 0\} - \kappa\, 1\{R_i = 1\}]}{\sum_{i \in g} \sum_{l=h}^{K} 1\{T_i \ge h\} 1\{X_i = 0\} [1\{R_i = 0\} - \kappa\, 1\{R_i = 1\}]}.$$

We use these functions instead of $1\{T_i \ge l\}$ and $1\{T_j \ge k\}$ in equation (6). Define $Z_{g,11}(l)$ as the index of the individuals of group $g$ with $R = X = 1$. Similarly, $Z_{g,00}(l)$ is the index of group $g$ for which $R = X = 0$. Define

$$Q_1(\beta, \delta) = \sum_{g=1}^{M} \sum_{l=h}^{K} \sum_{k=h}^{K} [\hat F_{g,11}(l) - \hat F_{g,00}(k)]\, 1\{Z_{g,11}(l) < Z_{g,00}(k)\}.$$

For the periods $1, 2, ..., (h - 1)$ we can use the same statistic[16] as in earlier sections,

$$Q_2(\beta, \delta) = \frac{1}{N(N-1)} \sum_i \sum_j \sum_{l=1}^{h-1} \sum_{k=1}^{h-1} [1\{T_i \ge l\} - 1\{T_j \ge k\}]\, 1\{Z_i(l; \beta, \delta) < Z_j(k; \beta, \delta)\}.$$

Moreover, since $R = 0$ implies $X = 0$ we can also define $\tilde R$ as in the last section. In particular, let $\tilde R = R$ for all individuals who have been assigned treatment and for all others we assign $\tilde R = 0$ with probability $1 - \hat p$, i.e. $P(\tilde R = 0 \mid R = 0) = 1$ and $P(\tilde R = 0 \mid T < h) = 1 - \hat p$. We then define

$$Q_3(\beta, \delta) = \frac{1}{N(N-1)} \sum_i \sum_j 1(\tilde R_i = 0)\, 1(\tilde R_j = 0) \sum_{l=1}^{K} \sum_{k=1}^{K} [1\{T_i \ge l\} - 1\{T_j \ge k\}]\, 1\{Z_i(l; \beta, \delta) < Z_j(k; \beta, \delta)\}.$$

We then minimize the following objective function,

$$Q^{*}(\beta, \delta) = w_1 Q_1(\beta, \delta) + w_2 Q_2(\beta, \delta) + (1 - w_1 - w_2) Q_3(\beta, \delta)$$

where $0 < w_1 < 1$ and $0 < w_2 < 1$. Let the random assignment or instrumental variable assumptions mentioned above hold and let the specification assumptions of theorem 2 hold so that $Q_1(\beta, \delta)$ has a maximum at the true value of the parameters. If $M$, the number of groups, is finite, then $Q_1(\beta, \delta)$ does not point identify any parameters. This complicates the choice of the weights. Suppose that the limits of $Q_1(\beta, \delta)$, $Q_2(\beta, \delta)$ and $Q_3(\beta, \delta)$, together, identify the parameters but that $Q_1(\beta, \delta)$ does not play a role in local asymptotics. Let the assumptions of theorems 2 and 3 hold for the objective functions $Q_2(\beta, \delta)$ and $Q_3(\beta, \delta)$. In that case, consistency and asymptotic normality follow from theorems 2 and 3 for any $0 < w_1 < 1$ and $0 < w_2 < 1$. In particular, after choosing $w_1 = w_2 = \frac{1}{3}$ to get an initial estimate, one can calculate the asymptotic variance-covariance matrix using theorem 2 and lemma 1 using $Q_2(\beta, \delta)$ and $Q_3(\beta, \delta)$. The ratio $w_2 / (1 - w_1 - w_2)$ can then be chosen to minimize a function of the asymptotic variance. In a second step, one can minimize $Q^{*}(\beta, \delta)$ using the estimated ratio for the weights.[17] For finite $M$, one can use $Q_1(\beta, \delta)$ without using $Q_2(\beta, \delta)$ or $Q_3(\beta, \delta)$ to derive bounds but this is beyond the scope of the paper.

The objective function $Q_1(\beta, \delta)$ can be interpreted as conditioning on both survival up to the end of period $h$ as well as $Z(h)$, which removes possible dependence between treatment assignment and the unobserved heterogeneity term. This data generating process resembles the data of Ham and LaLonde (1996); see also Heckman, LaLonde, and Smith (1999). We can extend the analysis in a straightforward manner to the situation of noncompliance in both treatment and control individuals, so that $R = 1$ and $X = 0$ for a particular individual and $R = 0$ and $X = 1$ for another individual. However, since the latter situation is relatively unlikely to occur in practice, we leave the details as an exercise.

[15] The support of any instrument can be reduced to two points.
[16] If we calculate the rank of the indices $Z_{ik}$ for $k = 1, ..., (h - 1)$ and $i = 1, ..., N$, then we can use $Q(\beta, \delta) = \frac{1}{N(N-1)} \sum_{j=1}^{N(h-1)} d(j) [2\, \mathrm{Rank}(Z(j)) - N(h - 1)]$.

5. Gamma Mixing Distribution

Han and Hausman (1990) and Meyer (1990) use a flexible baseline hazard and model the unobserved heterogeneity as a gamma distribution.
[17] An easier but computationally more intensive way is to determine the asymptotic variance using bootstrap and to try several values for $w_1 \in [0, 1]$, $w_2 \in [0, 1]$.

In this section we discuss the sensitivity of the estimators of the MPH model to misspecification of the mixing distribution. In particular, misspecifying the heterogeneity yields inconsistent estimators, and having a flexible integrated baseline hazard $\Lambda(t)$ does not compensate for a failure to control for heterogeneity. We illustrate this using two examples.

Example 1: Suppose we estimate the following hazard model, $\theta(t \mid v, x) = \phi^x \lambda(t)$. The function $\lambda(t)$ is nonparametric and one could (incorrectly) think that the flexibility of this function compensates for the lack of unobserved heterogeneity. This model implies the following survivor function,

$$P(T \ge t \mid x) = \bar F(t \mid x) = \exp(-\phi^x \Lambda(t)).$$

Suppose we observe $\bar F(t \mid x)$ for $x = 0, 1$ and all $t \ge 0$. We define $\bar F_0(t) = \bar F(t \mid x = 0)$ and estimate $\Lambda(t)$,

$$\hat\Lambda(t) = -\ln \bar F(t \mid x = 0).$$

For a given $\hat\Lambda(t) = -\ln \bar F(t \mid x = 0)$, the MLE of $\phi$ can be derived (see appendix) and it can be shown that

$$\operatorname{plim}_{N \to \infty} \hat\phi = \frac{-1}{E[\ln\{\bar F_0(T)\} \mid x = 1]}$$

where $\bar F_0$ is the survival function for $x = 0$. Suppose that $v \sim \mathrm{Gamma}(\alpha, \alpha)$, so that

$$\bar F_0(t) = \Big( \frac{1}{1 + \Lambda(t)/\alpha} \Big)^{\alpha} \quad \text{and} \quad \ln \bar F_0(t) = \alpha \ln \Big( \frac{1}{1 + \Lambda(t)/\alpha} \Big).$$

Note that $\phi^x \Lambda(T) = Z / v$ where $Z$ has an exponential distribution with mean one. This yields

$$\operatorname{plim}_{N \to \infty} \hat\phi = \frac{1}{E[\alpha \ln\{1 + Z/(\phi v \alpha)\}]}$$

where $v \sim \mathrm{Gamma}(\alpha, \alpha)$. Note that $\phi$ only appears in the denominator of an argument of a logarithmic function. This does not bode well for consistency. Using $N = 10,000$ we find the following,

True φ    True α    plim φ̂
2         1         1.46
2         2         1.09
10        1         4.04
10        2         3.20

Example 2: Suppose we estimate the following hazard model, $\theta(t \mid v, x) = v e^{x\beta} \lambda(t)$ where $v$ has a gamma distribution. The function $\lambda(t)$ is nonparametric and this time one could (incorrectly) think that the flexibility of this function compensates for the restrictive assumption that $v$ has a gamma distribution.
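Returning briefly to Example 1, the $\alpha = 1$ entries of its table can be checked in closed form. The derivation below is a sketch that assumes the rate parameterization of $\mathrm{Gamma}(\alpha, \alpha)$ (mean one), which is what the expression for $\bar F_0(t)$ implies; the ratio $R = Z/v$ is notation introduced only for this aside.

```latex
% alpha = 1: v ~ Exp(1) and Z ~ Exp(1) are independent; define R = Z / v.
\begin{align*}
f_R(r) &= \int_0^\infty v\, e^{-rv}\, e^{-v}\, dv = \frac{1}{(1+r)^2}, \qquad r \ge 0, \\
E\Big[\ln\Big(1 + \frac{R}{\phi}\Big)\Big]
  &= \int_0^\infty \frac{\ln(1 + r/\phi)}{(1+r)^2}\, dr
   = \int_0^\infty \frac{dr}{(\phi + r)(1 + r)}
   = \frac{\ln \phi}{\phi - 1}, \\
\operatorname{plim} \hat\phi &= \frac{\phi - 1}{\ln \phi}.
\end{align*}
```

The second equality in the middle line is integration by parts. For $\phi = 2$ this gives $1/\ln 2 \approx 1.44$ and for $\phi = 10$ it gives $9/\ln 10 \approx 3.91$, close to the simulated values 1.46 and 4.04 and below $\phi$ for every $\phi > 1$.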

Suppose the data is generated by a hazard model, $\theta(t \mid v, x) = v e^{x} \lambda(t)$ where $p(v) = e^{c - v}$, $v \ge c$ and $c \ge 0$. Thus, $v$ is an exponential random variable to which the nonnegative number $c$ is added and the true value of $\beta$ equals one. Consider estimating this model under the assumption of gamma heterogeneity. Without loss of generality, we can write the integrated baseline hazard as follows, $\Lambda(t) = H(t)^d$ where $H(t)$ is unrestricted and $d > 0$. Horowitz (1996) and Chen (2002) show how to estimate $H(t)$ at rate $\sqrt{N}$. Suppose that the conditions of Horowitz (1996) or Chen (2002) are fulfilled and that one first estimates $H(t)$ using one of these methods. Estimating $d$ is then like estimating a Weibull model. In the appendix, we show that the inconsistency of $\beta$ does not depend on the distribution of the regressors. Using $N = 10,000$, we found the following,

c    β    γ_v    δ_v    β (γ_v = 2, δ_v = 1)

For $c = 0$, correct specification, all parameters can be consistently estimated; the last column gives estimation results for $\beta$ for $\gamma_v = 2$ and $\delta_v = 1$. The simulation results show that the inconsistency increases with $c$. Note that the asymptotic bias in the examples above does not depend on the shape of the hazard. The following lemma gives a reason for the asymptotic bias.

Lemma 2: Let $\theta(t \mid v, x) = v e^{x\beta} \lambda(t)$ where $v \perp x$. Let $v - c \mid T \ge 0 \sim \mathrm{Gamma}(\gamma_v, \delta_v)$. If $c = 0$, then $\bar F(t \mid x)$ decreases at a polynomial rate. If $c > 0$, then $\bar F(t \mid x)$ decreases at an exponential rate.

The lemma states that the survivor probability as a function of time decreases at a polynomial rate if the unobserved heterogeneity distribution is a gamma distribution but that the survivor probability decreases at an exponential rate if the unobserved heterogeneity distribution is a shifted gamma distribution. As the examples show, misspecification of the heterogeneity distribution cannot, in general, be corrected by a flexible baseline hazard. The estimator presented in this paper does not rely on specifying or estimating the heterogeneity distribution, which explains its better performance in terms of asymptotic bias and consistency.

6. Empirical Results

We estimate our new duration model on a sample of 15,491 males who received unemployment benefits beginning in 1998 in a data set called the Study of Unemployment Insurance Exhaustees public use data. The study was designed to examine the characteristics, labor market experiences, unemployment insurance (UI) program experiences, and reemployment service receipt of UI recipients.[18] The study sample consists of UI recipients in 25 states who began their benefit year in 1998 and received at least one UI payment, and is designed to be nationally representative of UI exhaustees and non-exhaustees. The data description is:

"The data come from the UI administrative records of the 25 sample states and telephone interviews conducted with a subsample of these UI recipients. Telephone interviews were conducted in English and Spanish between July 2000 and February 2001 using a two-stage process. For the first 16 weeks, all 25 participating states used mail, phone, and database methods to locate sample members, who were then asked to complete the survey. The second stage, conducted in 10 of the sample states, added field staff to help locate nonresponding sample members. The administrative data include the individual's age, race, sex, weekly benefit amount, first and last payment date, the state where benefits were collected, and whether benefits were exhausted." (op. cit.)

The survey data contain individual level information about labor market and other activities from the time the person entered the UI system through the time of the interview.
[18] The following description follows from which has further details of the sample design and results.

However, we limit our econometric study to the first 25 weeks of unemployment due to the recognized change in behavior in week 26 when UI benefits cease for a significant part of the sample, see e.g. Han-Hausman (1990). The data include information about the individual's pre-UI job, other income or assistance received, and demographic information. We use two indicator variables, race and age over 50, in our index specification. We also use the replacement rate, which is the weekly benefit amount divided by the UI recipient's base period earnings. Lastly, we use the state unemployment rate of the state from which the individual received UI benefits during the period in which the individual filed for benefits. This variable changes over time. Table 1 gives the means and standard deviations for the variables we use in our empirical specification:

Table 1 here

We first estimate the unknown parameters of the model using the gamma heterogeneity specification of Han-Hausman (1990) and Meyer (1990) (HHM). This specification allows for a piecewise constant baseline hazard, which does not restrict the specification since unemployment duration is recorded on a weekly basis. However, it does impose a gamma heterogeneity distribution on the specification, which can lead to inconsistent estimates as we discussed above. We estimate the model using a gradient method and report the HHM estimates and bootstrap standard errors in Table 2.

Table 2 here

The estimates of the parameters, as reported in Table 2, should not depend on how many weeks of data we use (6, 13 or 24 weeks). However, the coefficients differ significantly. We find significant evidence of heterogeneity in the two larger samples, while in the 6 period sample we do not estimate significant heterogeneity. We also find the expected negative estimates for all of the coefficients, with the state unemployment rate a significant factor in affecting the probability of exiting unemployment. When comparing the estimates of $\beta$ across the 3 samples, the scaling changes depending on the variance of the estimated gamma distribution. Thus, the ratios of the coefficients should be compared. The ratios of the coefficients across samples remain similar, with the results for the 13 period and 24 period samples very close to each other.

We now turn to estimates of the new duration specification, which does not require estimation of a heterogeneity distribution, using the same samples as above. Optimization of the objective function can now create a problem because of its lack of smoothness. Usual Newton-type gradient methods or conjugate gradient (simplex) methods do not work in this situation. To date we have found that generalized pattern search algorithms perform best.[19] We use the pattern search routine from Matlab to estimate the parameters. See the Appendix for further details of our computational approach. The basic idea is to begin with the gamma heterogeneity estimates and to construct a bounding box of 3 standard deviations around each parameter estimate. We then find new estimates and increase the bounding box until we do not find an increase in the objective function. The routine converges relatively rapidly. We estimate standard errors using a bootstrap approach. In Table 3 we give the estimates of the new duration model. We also check our pattern search results using a genetic optimization approach that is also discussed in the appendix. The genetic optimization approach has the advantage of not depending on initial values. However, it has the disadvantage of taking much longer to solve, so it cannot feasibly be used to bootstrap the results to estimate the standard errors. Nevertheless, the results of the pattern search algorithm and the genetic optimization algorithm are very similar, as we describe in the appendix.

Table 3 here

Again we find that all of the estimated coefficients have the expected negative signs.
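The polling idea behind pattern search can be illustrated with a simple compass search, which uses only objective values and therefore tolerates a piecewise-constant objective. This is a minimal Python sketch, not Matlab's pattern search routine and not the authors' procedure; the function `compass_search`, the toy objective `step_objective`, and all tuning values are hypothetical choices made for illustration.

```python
import numpy as np


def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Maximize f by polling +/- step along each coordinate axis.

    A trial point is accepted whenever it strictly improves f; if a full
    poll brings no improvement, the step is halved. Stops once step < tol.
    """
    x = np.asarray(x0, dtype=float).copy()
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for d in range(x.size):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[d] += sign * step
                f_trial = f(trial)
                if f_trial > fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5
    return x, fx


# Toy non-smooth objective: piecewise constant, maximal (zero) on a box
# of half-width 0.25 around `target`; its gradient is zero almost everywhere.
target = np.array([1.3, -0.7])


def step_objective(x):
    return -np.sum(np.floor(4.0 * np.abs(x - target)))


x_hat, f_hat = compass_search(step_objective, x0=[0.0, 0.0])
```

Because the toy objective is flat almost everywhere, a gradient evaluated at the starting point carries no information, whereas polling still finds improving points. Starting values can still matter for such a search in general, which fits with the use of the gamma heterogeneity estimates as a starting point.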
[19] Further research would be helpful here. We have also used gradient algorithms on a smoothed objective function to obtain initial estimates and then employed Nelder-Mead routines to find the optima. However, the pattern search algorithms appear to work best. See e.g. Audet and Dennis (2003) for a recent survey of pattern search algorithms.

The coefficients are also estimated with a high degree of statistical precision, although this finding may be a result of our large sample size of 15,491 individuals. We again find that the ratio of coefficients remains relatively stable across the three different samples, with the exception of the replacement rate, which becomes increasingly larger with respect to the state unemployment rate as the sample length increases. The change in the estimated coefficient for the replacement rate for the 24 week sample appears to arise because most recipients' unemployment insurance terminates after 26 weeks. Han-Hausman (1990) found a significant change in behavior at week 26. As individuals start to approach week 26, the size of the replacement rate has a diminished effect on their behavior as they foresee the end of their unemployment benefits beginning to draw near. In Figures 1 and 2 we plot the survival curves for the 13 week and 24 week gamma heterogeneity estimates and for the estimates from the new model. We fit the survival curves using a second order local polynomial estimator which takes account of the standard deviations of the estimated period coefficients in Tables 2 and 3.[20] The estimated local polynomial survival curves fit the data well for all specifications.

[20] We explain our approach in more detail in the appendix.

Figure 1 here

Figure 2 here

We find that the new model gives extremely similar results for the 6 period data and the 13 period data. Indeed, a Hausman (1978) specification test on the slope coefficients is 0.42 with 4 degrees of freedom. Thus, we find that the new model is not sensitive to the number of periods used to estimate the model. For the 24 period model we find the coefficients again very close to the other results except for the coefficient of the replacement rate. A Hausman test now rejects the equality of the slope coefficients with a value of 234.3, based essentially on the change in the replacement rate coefficient. However, since most individuals' unemployment benefits run out in the 26th week, the


STAT 511 FINAL EXAM NAME Spring 2001

STAT 511 FINAL EXAM NAME Spring 2001 STAT 5 FINAL EXAM NAME Sprng Instructons: Ths s a closed book exam. No notes or books are allowed. ou may use a calculator but you are not allowed to store notes or formulas n the calculator. Please wrte

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrcs of Panel Data Jakub Mućk Meetng # 8 Jakub Mućk Econometrcs of Panel Data Meetng # 8 1 / 17 Outlne 1 Heterogenety n the slope coeffcents 2 Seemngly Unrelated Regresson (SUR) 3 Swamy s random

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Credit Card Pricing and Impact of Adverse Selection

Credit Card Pricing and Impact of Adverse Selection Credt Card Prcng and Impact of Adverse Selecton Bo Huang and Lyn C. Thomas Unversty of Southampton Contents Background Aucton model of credt card solctaton - Errors n probablty of beng Good - Errors n

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

1 Binary Response Models

1 Binary Response Models Bnary and Ordered Multnomal Response Models Dscrete qualtatve response models deal wth dscrete dependent varables. bnary: yes/no, partcpaton/non-partcpaton lnear probablty model LPM, probt or logt models

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

A Comparative Study for Estimation Parameters in Panel Data Model

A Comparative Study for Estimation Parameters in Panel Data Model A Comparatve Study for Estmaton Parameters n Panel Data Model Ahmed H. Youssef and Mohamed R. Abonazel hs paper examnes the panel data models when the regresson coeffcents are fxed random and mxed and

More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

An (almost) unbiased estimator for the S-Gini index

An (almost) unbiased estimator for the S-Gini index An (almost unbased estmator for the S-Gn ndex Thomas Demuynck February 25, 2009 Abstract Ths note provdes an unbased estmator for the absolute S-Gn and an almost unbased estmator for the relatve S-Gn for

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger

JAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger JAB Chan Long-tal clams development ASTIN - September 2005 B.Verder A. Klnger Outlne Chan Ladder : comments A frst soluton: Munch Chan Ladder JAB Chan Chan Ladder: Comments Black lne: average pad to ncurred

More information

Computing MLE Bias Empirically

Computing MLE Bias Empirically Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

Feb 14: Spatial analysis of data fields

Feb 14: Spatial analysis of data fields Feb 4: Spatal analyss of data felds Mappng rregularly sampled data onto a regular grd Many analyss technques for geophyscal data requre the data be located at regular ntervals n space and/or tme. hs s

More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

Dummy variables in multiple variable regression model

Dummy variables in multiple variable regression model WESS Econometrcs (Handout ) Dummy varables n multple varable regresson model. Addtve dummy varables In the prevous handout we consdered the followng regresson model: y x 2x2 k xk,, 2,, n and we nterpreted

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

Statistical pattern recognition

Statistical pattern recognition Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n

More information

STAT 405 BIOSTATISTICS (Fall 2016) Handout 15 Introduction to Logistic Regression

STAT 405 BIOSTATISTICS (Fall 2016) Handout 15 Introduction to Logistic Regression STAT 45 BIOSTATISTICS (Fall 26) Handout 5 Introducton to Logstc Regresson Ths handout covers materal found n Secton 3.7 of your text. You may also want to revew regresson technques n Chapter. In ths handout,

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Lecture 19. Endogenous Regressors and Instrumental Variables

Lecture 19. Endogenous Regressors and Instrumental Variables Lecture 19. Endogenous Regressors and Instrumental Varables In the prevous lecture we consder a regresson model (I omt the subscrpts (1) Y β + D + u = 1 β The problem s that the dummy varable D s endogenous,.e.

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting Onlne Appendx to: Axomatzaton and measurement of Quas-hyperbolc Dscountng José Lus Montel Olea Tomasz Strzaleck 1 Sample Selecton As dscussed before our ntal sample conssts of two groups of subjects. Group

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES SVANTE JANSON Abstract. We gve explct bounds for the tal probabltes for sums of ndependent geometrc or exponental varables, possbly wth dfferent

More information

Supplementary Notes for Chapter 9 Mixture Thermodynamics

Supplementary Notes for Chapter 9 Mixture Thermodynamics Supplementary Notes for Chapter 9 Mxture Thermodynamcs Key ponts Nne major topcs of Chapter 9 are revewed below: 1. Notaton and operatonal equatons for mxtures 2. PVTN EOSs for mxtures 3. General effects

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors

Stat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference

More information

STK4080/9080 Survival and event history analysis

STK4080/9080 Survival and event history analysis SK48/98 Survval and event hstory analyss Lecture 7: Regresson modellng Relatve rsk regresson Regresson models Assume that we have a sample of n ndvduals, and let N (t) count the observed occurrences of

More information

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Complete subgraphs in multipartite graphs

Complete subgraphs in multipartite graphs Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

U-Pb Geochronology Practical: Background

U-Pb Geochronology Practical: Background U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result

More information