Semiparametric Estimation of a Panel Data Proportional Hazards Model with Fixed Effects


Joel L. Horowitz
Department of Economics
Northwestern University
Evanston, IL

and

Sokbae Lee
Department of Economics
University of Iowa
Iowa City, IA

January 2002

Abstract

This paper considers a panel duration model that has a proportional hazards specification with fixed effects. The paper shows how to estimate the baseline and integrated baseline hazard functions without assuming that they belong to known, finite-dimensional families of functions. Existing estimators assume that the baseline hazard function belongs to a known parametric family. Therefore, the estimators presented here are more general than existing ones. This paper also presents a method for estimating the parametric part of the proportional hazards model under dependent right censoring, under which the partial likelihood estimator is inconsistent. The paper presents some Monte Carlo evidence on the small-sample performance of the new estimators. Finally, the estimation methods are illustrated by applying them to National Longitudinal Survey of Youth work history data. The estimated, inverted U-shaped baseline hazard function of job ending suggests that the data are consistent with the job matching theory of Jovanovic (1979).

Keywords: Duration analysis, panel data, semiparametric estimation.

JEL Codes: C14, C23, C41

We thank John Geweke, George Neumann, Forrest Nelson, Gene Savin, and seminar participants at University of Iowa and Northwestern University for helpful comments and suggestions. This research was supported in part by NSF Grant SES

1 Introduction

Much empirical research in economics is concerned with the analysis of duration data. In many applications, multiple durations of a given individual are observed, possibly together with covariates. For example, the National Longitudinal Survey of Youth (NLSY) provides a detailed work history for each respondent, thereby allowing a researcher to construct data on the durations of each of the jobs that the respondent has held during some period. Other examples of multiple-spell duration data can be found in a recent survey by Van den Berg (2001).

This paper considers a panel duration model that has a proportional hazards specification with unobserved heterogeneity. Specifically, the model is formulated in terms of the hazard functions of successive positive random variables $T_j$ (the durations of interest) conditional on $d \times 1$ vectors of observed covariates $X_j$ and an unobserved random variable $U$ (the unobserved heterogeneity) for $j = 1, \dots, J$. The model is

$$\lambda_{T_j}(t_j \mid x_j, u) = \lambda_0(t_j) \exp(x_j'\beta + u), \tag{1.1}$$

where $\lambda_{T_j}$ is the hazard that $T_j = t_j$ conditional on $X_j = x_j$ and $U = u$, $\lambda_0$ is the baseline hazard function, and $\beta$ is a $d \times 1$ vector of constant parameters. The random variable $U$ represents unobserved, permanent attributes of individuals. In the example of the NLSY work history data, $U$ may represent heterogeneity in mobility rates across workers.

In this model, the observed covariates $X_j$ are assumed to be constant within each spell but to vary over spells, whereas the unobserved heterogeneity $U$ is assumed to be constant over spells.[1] Covariates that are constant over spells are not included explicitly because they can be regarded as being included in $U$, and the corresponding part of $\beta$ is not estimable. In addition, $U$ is allowed to be arbitrarily correlated with $X_j$ and is therefore a fixed effect. Unlike the random-effects approach, the fixed-effects approach does not require $X_j$ and $U$ to be statistically independent of one another or to have any other known statistical relationship.[2] It is assumed throughout most of the paper that $J = 2$. The extension to larger $J$ is discussed briefly in Section 5.2.

[1] There could be another source of heterogeneity that varies over spells. In the example of the NLSY work history data, there could be job-specific heterogeneity across workers, which varies over spells. In this paper, it is assumed implicitly that this kind of heterogeneity is observed and thus part of $X_j$.

[2] If the data are cross-sectional or single-spell, then the fixed-effects approach in this paper cannot be applied. See Horowitz (1999) for estimating the baseline and integrated baseline hazard functions nonparametrically in a cross-sectional proportional hazards model with random effects. Also, see Van den Berg (2001) for a comparison between single-spell and multiple-spell models.

This paper presents methods for estimating $\lambda_0(\cdot)$ and the integrated baseline hazard function $\Lambda_0(\cdot) \equiv \int_0^{\cdot} \lambda_0(s)\,ds$ nonparametrically.[3] That is, this paper shows how to estimate $\lambda_0$ and $\Lambda_0$ without assuming that they belong to known, finite-dimensional families of functions. Several existing estimators assume that $\lambda_0$ belongs to a parametric family. For example, Chamberlain (1985) considers a marginal likelihood approach for models with Weibull, gamma, and lognormal specifications. Ridder and Tunalı (1999) assume that $\lambda_0$ is piecewise constant. This paper shows how to estimate $\lambda_0$ and $\Lambda_0$ nonparametrically both when observations of $T_j$ are uncensored and when they are right-censored.

[3] A recent working paper by Woutersen (2000) proposes a nonparametric estimator of $\lambda_0$ for the case of independent censoring. Woutersen (2000) does not provide the asymptotic distribution of his estimator and does not consider estimation of $\Lambda_0$.

This paper also considers estimation of $\beta$ when observations of $T_j$ are subject to right censoring. An estimator of $\beta$ based on a partial likelihood approach already exists for the uncensored and independently censored cases. See Chamberlain (1985), Kalbfleisch and Prentice (1980, §8.1.2), Lancaster (2000), and Ridder and Tunalı (1999), among others. The partial likelihood method cannot be applied to censored panel durations because the standard independent censoring assumption is likely to be violated. In many applications, durations are observed over a fixed period. For example, in the NLSY work history data, the duration of the most recent job of a respondent may be right-censored at the last interview date. Because of the fixed effect, the censoring threshold of $T_j$ is not independent of $T_j$ unless $j = 1$. Therefore, $\beta$ cannot be estimated consistently by using the partial likelihood approach. This paper presents a consistent estimator of $\beta$ under dependent censoring.

The estimation approach developed in this paper consists of two steps. The first step is to express $\lambda_0$, $\Lambda_0$, and $\beta$ as functionals of the population distribution of $(T_j, X_j)$ by utilizing the identification result of Honoré (1993). The second step is to construct suitable empirical analogs for the unknown population quantities that appear as arguments of these

functionals, depending on whether or not observations of $T_j$ are censored. Let $\lambda_{n0}$ and $\Lambda_{n0}$, respectively, denote nonparametric estimators of $\lambda_0$ and $\Lambda_0$, where $n$ is the sample size. It will be shown that $\lambda_{n0}$ and $\Lambda_{n0}$ are uniformly consistent, and that $n^{q/(2q+1)}(\lambda_{n0} - \lambda_0)$ and $n^{1/2}(\Lambda_{n0} - \Lambda_0)$ are asymptotically normal, where $q$ denotes the order of smoothness of $\lambda_0$. It will also be shown that the new estimator $\beta_n$ of $\beta$ under dependent censoring is consistent, and that $n^{1/2}(\beta_n - \beta)$ is asymptotically normal.

The remainder of the paper is organized as follows. Section 2 provides an informal description of the estimators of $\lambda_0$, $\Lambda_0$, and $\beta$. Section 3 presents the formal, asymptotic results for the uncensored case. Section 4 provides rule-of-thumb, data-driven methods for choosing the bandwidths needed to estimate $\lambda_0$ and $\Lambda_0$ for the uncensored case. Extensions of the estimators of $\lambda_0$ and $\Lambda_0$ are discussed in Section 5. Section 6 presents the results of some Monte Carlo experiments that illustrate the finite-sample properties of the estimators. In Section 7, the estimation methods are illustrated by applying them to the NLSY work history data. Concluding comments are given in Section 8. The proofs of the theorems in Section 3 are in Appendix A. Appendix B presents the asymptotic results for the estimators in the censored case.

2 Informal Description of the Estimators

2.1 The Uncensored Case

This section provides an informal description of our estimators of $\lambda_0$ and $\Lambda_0$ under the assumption that observations of $T_j$ are uncensored with $J = 2$. In this case, an estimator of $\beta$ is already available (see the references cited in Section 1). Let $b_n$ denote the resulting estimator of $\beta$.

The estimation approach developed here is based on the identification result of Honoré (1993). When the model (1.1) is identified, $\lambda_0$ and $\Lambda_0$ can be expressed as functionals of the population distribution of $(T_1, T_2, X_1, X_2)$. Then estimators of $\lambda_0$ and $\Lambda_0$ can be obtained by replacing unknown population quantities with their empirical analogs.

To identify $\lambda_0$ and $\Lambda_0$, observe first that $T_j$ depends on $X_j$ only through the index $Z_j \equiv X_j'\beta$ for $j = 1, 2$. Assume that, conditional on $(Z_1, Z_2, U)$, $T_1$ and $T_2$ are independent. Then the joint conditional survivor function of $T_1$ and $T_2$ is

$$S(t_1, t_2 \mid z_1, z_2) \equiv \Pr(T_1 > t_1, T_2 > t_2 \mid Z_1 = z_1, Z_2 = z_2) = \int \exp\left[-\Lambda_0(t_1)e^{z_1+u} - \Lambda_0(t_2)e^{z_2+u}\right] dP_{u \mid z_1, z_2},$$

where $P_{u \mid z_1, z_2}$ denotes the distribution of $U$ conditional on $(Z_1, Z_2) = (z_1, z_2)$. By differentiation of $S(t_1, t_2 \mid z_1, z_2)$,

$$\frac{\partial S(t_1, t_2 \mid z_1, z_2)/\partial t_1}{\partial S(t_1, t_2 \mid z_1, z_2)/\partial t_2} = \frac{\lambda_0(t_1)}{\lambda_0(t_2)} \exp(z_1 - z_2). \tag{2.1}$$

A scale normalization is needed to make identification possible. This is accomplished here by assuming that

$$\int_{S_T} \frac{w_t(t)}{\lambda_0(t)}\,dt = 1,$$

where $w_t$ is a scalar-valued function with compact support $S_T$ that satisfies $\int_{S_T} w_t(t)\,dt = 1$ and other conditions in Section 3. This scale normalization is useful for the estimators developed here, as will be seen below.

Let $R(t_1, t_2 \mid z_1, z_2)$ denote the left-hand side of (2.1). Under the scale normalization, (2.1) implies that $\lambda_0$ has the form

$$\lambda_0(t) = \int_{S_T} w_t(t_2) \exp(z_2 - z_1) R(t, t_2 \mid z_1, z_2)\,dt_2$$

for every $(z_1, z_2)$. Let $w_z(\cdot)$ be a scalar-valued function with compact support $S_Z$ that satisfies $\int_{S_Z} w_z(z)\,dz = 1$ and other conditions in Section 3. Also, let $w(t_2, z_1, z_2) = w_t(t_2) w_z(z_1) w_z(z_2)$. Then

$$\lambda_0(t) = \int_{S_T} dt_2 \int_{S_Z} dz_1 \int_{S_Z} dz_2\; w(t_2, z_1, z_2) \exp(z_2 - z_1) R(t, t_2 \mid z_1, z_2). \tag{2.2}$$

Equation (2.2) is the basis for the estimators of $\lambda_0$ and $\Lambda_0$ proposed here.[4] This completes the first step of our estimation strategy.

[4] Observe that $\lambda_0$ can also be written as

$$\lambda_0(t) = \int_{S_T} dt_1 \int_{S_Z} dz_1 \int_{S_Z} dz_2\; w(t_1, z_1, z_2) \exp(z_1 - z_2) R(t_1, t \mid z_1, z_2)^{-1}. \tag{2.3}$$

This equation can be the basis for another estimator of $\lambda_0$. One can use the same arguments as in Appendix A to establish asymptotic results for an estimator based on (2.3). Hence, we focus on the estimator of $\lambda_0$ based on (2.2). Also, one can use a linear combination of these possible estimators. This will be discussed in detail in Section 5.
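As a numerical illustration of the identifying relation (2.1), the sketch below compares a finite-difference version of the left-hand side with the right-hand side. It rests on assumptions that are not part of Section 2: a Weibull baseline hazard, a gamma-distributed $e^U$ with mean 1 and variance `THETA` that is independent of the covariates (which gives a closed-form survivor function), and arbitrary evaluation points. All function names and parameter values are hypothetical.

```python
import numpy as np

# Illustrative assumptions (not taken from Section 2): Weibull baseline hazard
# and a gamma frailty exp(U) with mean 1 and variance THETA, independent of Z.
ALPHA, THETA = 1.5, 0.8


def baseline_hazard(t):
    return ALPHA * t ** (ALPHA - 1.0)


def integrated_hazard(t):
    return t ** ALPHA


def survivor(t1, t2, z1, z2):
    """Joint survivor function S(t1, t2 | z1, z2) after integrating out U."""
    s = integrated_hazard(t1) * np.exp(z1) + integrated_hazard(t2) * np.exp(z2)
    return (1.0 + THETA * s) ** (-1.0 / THETA)


def derivative_ratio(t1, t2, z1, z2, eps=1e-6):
    """Left-hand side of (2.1), using central finite differences."""
    d1 = (survivor(t1 + eps, t2, z1, z2) - survivor(t1 - eps, t2, z1, z2)) / (2 * eps)
    d2 = (survivor(t1, t2 + eps, z1, z2) - survivor(t1, t2 - eps, z1, z2)) / (2 * eps)
    return d1 / d2


def hazard_ratio(t1, t2, z1, z2):
    """Right-hand side of (2.1)."""
    return baseline_hazard(t1) / baseline_hazard(t2) * np.exp(z1 - z2)


lhs = derivative_ratio(0.7, 1.3, 0.2, -0.4)
rhs = hazard_ratio(0.7, 1.3, 0.2, -0.4)
print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error. The frailty distribution drops out of the ratio, which is why (2.1) can identify $\lambda_0$ without restricting the distribution of $U$.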

In the second step, the estimators of $\lambda_0$ and $\Lambda_0$ are obtained by replacing the unknown function $R(t_1, t_2 \mid z_1, z_2)$ in (2.2) with a uniformly consistent estimator $R_n(t_1, t_2 \mid z_1, z_2)$. The resulting estimators of $\lambda_0$ and $\Lambda_0$ are

$$\lambda_{n0}(t) = \int_{S_T} dt_2 \int_{S_Z} dz_1 \int_{S_Z} dz_2\; w(t_2, z_1, z_2) \exp(z_2 - z_1) R_n(t, t_2 \mid z_1, z_2) \tag{2.4}$$

and

$$\Lambda_{n0}(t) = \int_0^t \lambda_{n0}(t_1)\,dt_1. \tag{2.5}$$

Section 3 gives conditions under which $\lambda_{n0}$ and $\Lambda_{n0}$ are uniformly consistent, and $n^{q/(2q+1)}(\lambda_{n0} - \lambda_0)$ and $n^{1/2}(\Lambda_{n0} - \Lambda_0)$ are asymptotically normal, where $q$ denotes the order of smoothness of $\lambda_0$. Intuitively, this is possible because integration over $(t_2, z_1, z_2)$ or $(t_1, t_2, z_1, z_2)$ in (2.4)-(2.5) creates averaging effects that mitigate the curse of dimensionality. Similar averaging effects occur in the estimation of single index models (e.g., Horowitz and Härdle (1996), Powell, Stock, and Stoker (1989)), partially linear models (e.g., Robinson (1988)), additive models (e.g., Horowitz (2001), Linton and Härdle (1996)), and transformation models (e.g., Horowitz (1996), Horowitz and Gørgens (1999)).

In this paper, $R$ is estimated with kernels. To describe an estimator of $R(t_1, t_2 \mid z_1, z_2)$, let $p_{t|z}(t_1, t_2 \mid z_1, z_2)$ denote the probability density function of $T_1$ and $T_2$ conditional on $Z_1 = z_1$ and $Z_2 = z_2$. Write

$$R(t_1, t_2 \mid z_1, z_2) = \frac{\int_{t_2}^{\infty} p_{t|z}(t_1, s_2 \mid z_1, z_2)\,ds_2}{\int_{t_1}^{\infty} p_{t|z}(s_1, t_2 \mid z_1, z_2)\,ds_1} \equiv \frac{A(t_1, t_2 \mid z_1, z_2)}{B(t_1, t_2 \mid z_1, z_2)}. \tag{2.6}$$

Let $\{T_{i1}, T_{i2}, X_{i1}, X_{i2} : i = 1, \dots, n\}$ denote a random sample of $(T_1, T_2, X_1, X_2)$ in (1.1). Define $Z_{ni1} = X_{i1}'b_n$ and $Z_{ni2} = X_{i2}'b_n$. Let $K_T$ and $K_Z$ be kernel functions of scalar arguments, and let $\{h_{n1}\}$, $\{h_{n2}\}$, and $\{h_{nz}\}$ $(n = 1, 2, \dots)$ be sequences of bandwidths that converge to zero as $n \to \infty$. Conditions that $K_T$, $K_Z$, $h_{n1}$, $h_{n2}$, and $h_{nz}$ need to satisfy are given in Section 3. Let $p_z(z_1, z_2)$ denote the probability density function of $Z_1$ and $Z_2$. Estimate $p_z(z_1, z_2)$ by

$$p_{nz}(z_1, z_2) = (n h_{nz}^2)^{-1} \sum_{i=1}^{n} K_Z\!\left(\frac{z_1 - Z_{ni1}}{h_{nz}}\right) K_Z\!\left(\frac{z_2 - Z_{ni2}}{h_{nz}}\right).$$
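The density estimator $p_{nz}$ can be sketched in a few lines. The kernel below, $K(u) = (105/64)(1 - 5u^2 + 7u^4 - 3u^6)$ on $[-1, 1]$, is an illustrative fourth-order choice and is not necessarily the one used in the paper. The simulated standard normal indices, the bandwidth, and the names `kernel_z` and `index_density` are likewise assumptions made for this example.

```python
import numpy as np


def kernel_z(u):
    """Illustrative fourth-order kernel on [-1, 1] (an assumption for this sketch)."""
    u = np.asarray(u, dtype=float)
    k = (105.0 / 64.0) * (1 - 5 * u**2 + 7 * u**4 - 3 * u**6)
    return np.where(np.abs(u) <= 1.0, k, 0.0)


def index_density(z1, z2, z_obs1, z_obs2, h_z):
    """p_nz(z1, z2): product-kernel estimate of the joint density of (Z1, Z2)."""
    n = len(z_obs1)
    weights = kernel_z((z1 - z_obs1) / h_z) * kernel_z((z2 - z_obs2) / h_z)
    return weights.sum() / (n * h_z**2)


# Hypothetical data: independent standard normal indices.
rng = np.random.default_rng(0)
z_obs1 = rng.normal(size=20000)
z_obs2 = rng.normal(size=20000)

estimate = index_density(0.0, 0.0, z_obs1, z_obs2, h_z=0.5)
truth = 1.0 / (2.0 * np.pi)  # bivariate standard normal density at the origin
print(estimate, truth)
```

Because a higher-order kernel takes negative values, the estimate is not guaranteed to be nonnegative in small samples. That matters in practice because $p_{nz}$ appears in the denominators of the estimators that follow.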

Let $1(\cdot)$ be the indicator function. Define

$$A_n(t_1, t_2 \mid z_1, z_2) = \left[n h_{n1} h_{nz}^2\, p_{nz}(z_1, z_2)\right]^{-1} \sum_{i=1}^{n} 1(T_{i2} > t_2)\, K_T\!\left(\frac{t_1 - T_{i1}}{h_{n1}}\right) K_Z\!\left(\frac{z_1 - Z_{ni1}}{h_{nz}}\right) K_Z\!\left(\frac{z_2 - Z_{ni2}}{h_{nz}}\right)$$

and

$$B_n(t_1, t_2 \mid z_1, z_2) = \left[n h_{n2} h_{nz}^2\, p_{nz}(z_1, z_2)\right]^{-1} \sum_{i=1}^{n} 1(T_{i1} > t_1)\, K_T\!\left(\frac{t_2 - T_{i2}}{h_{n2}}\right) K_Z\!\left(\frac{z_1 - Z_{ni1}}{h_{nz}}\right) K_Z\!\left(\frac{z_2 - Z_{ni2}}{h_{nz}}\right).$$

The estimator of $R(t_1, t_2 \mid z_1, z_2)$ is

$$R_n(t_1, t_2 \mid z_1, z_2) = A_n(t_1, t_2 \mid z_1, z_2) / B_n(t_1, t_2 \mid z_1, z_2). \tag{2.7}$$

A higher-order kernel is needed for $K_Z$ to ensure that certain bias and remainder terms in the asymptotic expansions of $n^{q/(2q+1)}(\lambda_{n0} - \lambda_0)$ and $n^{1/2}(\Lambda_{n0} - \Lambda_0)$ vanish as $n \to \infty$. For estimation of $\lambda_0(t)$, it is advisable to let $h_{n2}$ converge to zero faster than $h_{n1}$ to reduce bias. For estimation of $\Lambda_0(t)$, it is necessary to have both $h_{n1}$ and $h_{n2}$ converge to zero faster than $n^{-1/(2q+1)}$, which is the asymptotically optimal rate for $\lambda_{n0}(t)$, to prevent the asymptotic distribution of $n^{1/2}(\Lambda_{n0} - \Lambda_0)$ from having a non-zero mean.

2.2 The Censored Case

This section provides informal descriptions of estimators of $\beta$, $\lambda_0$, and $\Lambda_0$ when $T_1$ and $T_2$ are subject to dependent right censoring. We assume that the successive durations, $T_1$ and $T_2$, are observed over a time period of length $C$, where $C$ is random with an unknown probability distribution. It is assumed that $C$ is observed for every individual and that $C$ is independent of $T_1$ and $T_2$ given $X_1$ and $X_2$. The censoring mechanism here governs the sum of $T_1$ and $T_2$, rather than each separately. In this case, one observes not $T_j$ but $Y_j \equiv \min(T_j, C_j)$, where $C_1 = C$ and $C_2 = (C - T_1)\,1(T_1 \le C)$. Observe that $C_2$ depends on $T_1$, and, therefore, on $T_2$ because of the fixed effect. Hence, the censoring mechanism here violates the standard independence

assumption, under which $C_j$ is independent of $T_j$ given $X_j$ for $j = 1, 2$.[5] Define indicator variables by $\Delta_j = 1(T_j \le C_j)$ for $j = 1, 2$. An observed random sample now consists of $\{(Y_{i1}, Y_{i2}, X_{i1}, X_{i2}, \Delta_{i1}, \Delta_{i2}, C_i) : i = 1, \dots, n\}$.

[5] Lin, Sun, and Ying (1999), Visser (1996), and Wang and Wells (1998) have considered estimation of the joint survivor (or distribution) function of $T_1$ and $T_2$ (without covariates) under the same type of dependent censoring.

2.2.1 Estimating β

This subsection shows how to estimate $\beta$ under dependent right censoring. As was discussed in Section 1, $\beta$ cannot be estimated consistently by using the partial likelihood approach. This is because $\Pr(Y_1 < Y_2 \mid X_1, X_2, U, \min(T_1, T_2) < \min(C_1, C_2))$ is now dependent on the fixed effect. An approach based on (2.1), however, can be used to obtain a consistent estimator of $\beta$.

Abusing notation a bit, let $S(t_1, t_2 \mid x_1, x_2) = \Pr(T_1 > t_1, T_2 > t_2 \mid X_1 = x_1, X_2 = x_2)$. As in (2.1), setting $t_1 = t_2 = t$ gives

$$\frac{\partial S(t, t \mid x_1, x_2)/\partial t_1}{\partial S(t, t \mid x_1, x_2)/\partial t_2} = \exp[(x_1 - x_2)'\beta]. \tag{2.8}$$

Let $R_\beta(t \mid x_1, x_2)$ denote the left-hand side of (2.8). Since (2.8) holds for any $t$, write

$$\int_{S_\beta} w_\beta(t) R_\beta(t \mid x_1, x_2)\,dt = \exp[(x_1 - x_2)'\beta], \tag{2.9}$$

where $w_\beta(\cdot)$ is a scalar-valued function with compact support $S_\beta$ that satisfies $\int_{S_\beta} w_\beta(t)\,dt = 1$ and other conditions in Appendix B.1. This yields

$$\beta = \left[E (X_1 - X_2)(X_1 - X_2)'\right]^{-1} E\left[(X_1 - X_2) \log \int_{S_\beta} w_\beta(t) R_\beta(t \mid X_1, X_2)\,dt\right], \tag{2.10}$$

provided that $E (X_1 - X_2)(X_1 - X_2)'$ is nonsingular. Define $V = \int_{S_\beta} w_\beta(t) R_\beta(t \mid X_1, X_2)\,dt$ and $\Delta X = X_1 - X_2$. Equation (2.10) suggests that $\beta$ can be estimated by a no-intercept OLS regression of a sample analog of $\log V$ on $\Delta X$.

Carrying out this OLS regression requires an estimator of $R_\beta(t \mid x_1, x_2)$. There may be several methods for estimating $R_\beta(t \mid x_1, x_2)$ under dependent right censoring, but we present here a simple estimator based on Burke (1988) and Wang and Wells (1998). An alternative estimator of $R_\beta(t \mid x_1, x_2)$ will be described briefly in Appendix B.3.
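The regression suggested by (2.10) can be sketched as follows. To keep the example self-contained, $\log V$ is generated directly from the population relation (2.9) rather than from an estimate of $R_\beta$, so this shows only the final OLS step. The sample size, the value of `beta_true`, and the function name `no_intercept_ols` are hypothetical.

```python
import numpy as np


def no_intercept_ols(delta_x, log_v):
    """Least-squares fit of log_v on delta_x with no intercept term."""
    coef, *_ = np.linalg.lstsq(delta_x, log_v, rcond=None)
    return coef


rng = np.random.default_rng(1)
beta_true = np.array([0.5, -1.0])    # hypothetical parameter vector (d = 2)
x1 = rng.normal(size=(200, 2))       # covariates in spell 1
x2 = rng.normal(size=(200, 2))       # covariates in spell 2
delta_x = x1 - x2

# Population relation (2.9): V = exp[(X1 - X2)' beta].
# In an application, V would be replaced by its sample analog.
log_v = delta_x @ beta_true

beta_hat = no_intercept_ols(delta_x, log_v)
print(beta_hat)
```

With exact values of $\log V$ the regression recovers `beta_true` up to rounding error. With an estimated $V$, the sampling error of the first-stage kernel estimator carries over to the estimate of $\beta$.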

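Define the joint conditional sub-distribution function $F(t_1, t_2 \mid x_1, x_2) = \Pr(Y_1 \le t_1, Y_2 \le t_2, \Delta_1 = \Delta_2 = 1 \mid X_1 = x_1, X_2 = x_2)$ and its density $f(t_1, t_2 \mid x_1, x_2) = \partial^2 F(t_1, t_2 \mid x_1, x_2)/\partial t_1 \partial t_2$. Also, let $G(c \mid x_1, x_2) = \Pr(C > c \mid X_1 = x_1, X_2 = x_2)$ denote the survivor function of $C$ conditional on $X_1 = x_1$ and $X_2 = x_2$. As in equation (7) of Wang and Wells (1998), observe that

$$S(t_1, t_2 \mid x_1, x_2) = \int_{t_1}^{\infty} \int_{t_2}^{\infty} \frac{f(s_1, s_2 \mid x_1, x_2)}{G(s_1 + s_2 \mid x_1, x_2)}\,ds_1\,ds_2. \tag{2.11}$$

Therefore, $R_\beta(t \mid x_1, x_2)$ can be written as

$$R_\beta(t \mid x_1, x_2) = \frac{\int_t^{\infty} f(t, s_2 \mid x_1, x_2)/G(t + s_2 \mid x_1, x_2)\,ds_2}{\int_t^{\infty} f(s_1, t \mid x_1, x_2)/G(s_1 + t \mid x_1, x_2)\,ds_1} \equiv \frac{\tilde{A}_\beta(t \mid x_1, x_2)}{\tilde{B}_\beta(t \mid x_1, x_2)}. \tag{2.12}$$

The right-hand side of (2.12) can be estimated with kernels. For simplicity, assume that the distribution of $X_1$ and $X_2$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^{2d}$. It is straightforward to include discrete covariates. Let $K_X$ be a kernel function of $d$-dimensional arguments, let $\{h_{nx}\}$ $(n = 1, 2, \dots)$ be a sequence of bandwidths that converge to zero as $n \to \infty$, and let $p_x(x_1, x_2)$ denote the probability density function of $X_1$ and $X_2$. Let $p_{nx}(x_1, x_2)$ and $G_n(c \mid x_1, x_2)$ denote the kernel estimators of $p_x(x_1, x_2)$ and $G(c \mid x_1, x_2)$, that is,

$$p_{nx}(x_1, x_2) = (n h_{nx}^{2d})^{-1} \sum_{i=1}^{n} K_X\!\left(\frac{x_1 - X_{i1}}{h_{nx}}\right) K_X\!\left(\frac{x_2 - X_{i2}}{h_{nx}}\right)$$

and

$$G_n(c \mid x_1, x_2) = \left[n h_{nx}^{2d}\, p_{nx}(x_1, x_2)\right]^{-1} \sum_{i=1}^{n} 1(C_i > c)\, K_X\!\left(\frac{x_1 - X_{i1}}{h_{nx}}\right) K_X\!\left(\frac{x_2 - X_{i2}}{h_{nx}}\right).$$

Define

$$\tilde{A}_{n\beta}(t \mid x_1, x_2) = \left[n h_{n1} h_{nx}^{2d}\, p_{nx}(x_1, x_2)\right]^{-1} \sum_{i=1}^{n} \frac{\Delta_{i1} \Delta_{i2}\, 1(Y_{i2} > t)}{G_n(Y_{i1} + Y_{i2} \mid X_{i1}, X_{i2})}\, K_T\!\left(\frac{t - Y_{i1}}{h_{n1}}\right) K_X\!\left(\frac{x_1 - X_{i1}}{h_{nx}}\right) K_X\!\left(\frac{x_2 - X_{i2}}{h_{nx}}\right)$$

and

$$\tilde{B}_{n\beta}(t \mid x_1, x_2) = \left[n h_{n2} h_{nx}^{2d}\, p_{nx}(x_1, x_2)\right]^{-1} \sum_{i=1}^{n} \frac{\Delta_{i1} \Delta_{i2}\, 1(Y_{i1} > t)}{G_n(Y_{i1} + Y_{i2} \mid X_{i1}, X_{i2})}\, K_T\!\left(\frac{t - Y_{i2}}{h_{n2}}\right) K_X\!\left(\frac{x_1 - X_{i1}}{h_{nx}}\right) K_X\!\left(\frac{x_2 - X_{i2}}{h_{nx}}\right).$$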
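Substituting $p_{nx}$ into the definition of $G_n$ shows that $G_n(c \mid x_1, x_2)$ is a kernel-weighted share of observations with $C_i > c$. The sketch below implements that ratio form for scalar covariates ($d = 1$). The Epanechnikov kernel, the simulated data (exponential censoring times that are independent of the covariates), and the names `kernel_x` and `censoring_survivor` are assumptions for illustration only.

```python
import numpy as np


def kernel_x(u):
    """Illustrative Epanechnikov kernel on [-1, 1] (an assumption for this sketch)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def censoring_survivor(c, x1, x2, c_obs, x_obs1, x_obs2, h_x):
    """G_n(c | x1, x2): kernel-weighted share of observations with C_i > c."""
    weights = kernel_x((x1 - x_obs1) / h_x) * kernel_x((x2 - x_obs2) / h_x)
    return np.sum(weights * (c_obs > c)) / np.sum(weights)


# Hypothetical data: scalar covariates and unit-exponential censoring times.
rng = np.random.default_rng(2)
n = 20000
x_obs1 = rng.normal(size=n)
x_obs2 = rng.normal(size=n)
c_obs = rng.exponential(scale=1.0, size=n)

estimate = censoring_survivor(1.0, 0.0, 0.0, c_obs, x_obs1, x_obs2, h_x=0.5)
print(estimate, np.exp(-1.0))
```

In $\tilde{A}_{n\beta}$ and $\tilde{B}_{n\beta}$ this function would be evaluated at $c = Y_{i1} + Y_{i2}$ and at each observation's own covariates, so that each doubly uncensored observation is weighted by the inverse of its estimated probability of being uncensored.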

The estimator of $R_\beta(t \mid x_1, x_2)$ is

$$\tilde{R}_{n\beta}(t \mid x_1, x_2) = \tilde{A}_{n\beta}(t \mid x_1, x_2) / \tilde{B}_{n\beta}(t \mid x_1, x_2). \tag{2.13}$$

Observe that $\tilde{R}_{n\beta}(t \mid x_1, x_2)$ uses only uncensored data $(\Delta_{i1} = \Delta_{i2} = 1)$ and is weighted by the inverse of $G_n$ to take into account the effect of censoring. Let $w_x(\cdot)$ be a scalar-valued function with compact support $S_X$ that satisfies conditions in Appendix B.1. Then the OLS estimator $\beta_n$ of $\beta$ is

$$\beta_n = \left(n^{-1} \sum_{i=1}^{n} w_{xi}\, \Delta X_i \Delta X_i'\right)^{-1} \left(n^{-1} \sum_{i=1}^{n} w_{xi}\, \Delta X_i \log V_{ni}\right), \tag{2.14}$$

where $w_{xi} = w_x(X_{i1}) w_x(X_{i2})$, $\Delta X_i = X_{i1} - X_{i2}$, and $V_{ni} = \int_{S_\beta} w_\beta(t) \tilde{R}_{n\beta}(t \mid X_{i1}, X_{i2})\,dt$. The weight function $w_x$ is introduced here to estimate $\beta$ without being overly influenced by the tail behavior of the distributions of $X_1$ and $X_2$.

2.2.2 Estimating λ_0 and Λ_0

In this subsection, we present modified versions of the estimators of $\lambda_0$ and $\Lambda_0$ described in Section 2.1. Observe that (2.2) holds for the latent variables $T_1$ and $T_2$. Therefore, $\lambda_0$ and $\Lambda_0$ can be estimated by using (2.4) and (2.5) if a consistent estimator of $R(t_1, t_2 \mid z_1, z_2)$ is available. For simplicity, it is assumed in this subsection that the distribution of $C$ depends on $X_1$ and $X_2$ only through $Z_1$ and $Z_2$.

Abusing notation a bit, define $F(t_1, t_2 \mid z_1, z_2) = \Pr(Y_1 \le t_1, Y_2 \le t_2, \Delta_1 = \Delta_2 = 1 \mid Z_1 = z_1, Z_2 = z_2)$, $f(t_1, t_2 \mid z_1, z_2) = \partial^2 F(t_1, t_2 \mid z_1, z_2)/\partial t_1 \partial t_2$, and $G(c \mid z_1, z_2) = \Pr(C > c \mid Z_1 = z_1, Z_2 = z_2)$. As in Section 2.2.1, $R(t_1, t_2 \mid z_1, z_2)$ can be written as

$$R(t_1, t_2 \mid z_1, z_2) = \frac{\int_{t_2}^{\infty} f(t_1, s_2 \mid z_1, z_2)/G(t_1 + s_2 \mid z_1, z_2)\,ds_2}{\int_{t_1}^{\infty} f(s_1, t_2 \mid z_1, z_2)/G(s_1 + t_2 \mid z_1, z_2)\,ds_1} \equiv \frac{\tilde{A}(t_1, t_2 \mid z_1, z_2)}{\tilde{B}(t_1, t_2 \mid z_1, z_2)}. \tag{2.15}$$

Again, the right-hand side of (2.15) can be estimated with kernels. Estimate $G(\cdot \mid z_1, z_2)$ by the kernel estimator

$$G_n(c \mid z_1, z_2) = \left[n h_{nz}^2\, p_{nz}(z_1, z_2)\right]^{-1} \sum_{i=1}^{n} 1(C_i > c)\, K_Z\!\left(\frac{z_1 - Z_{ni1}}{h_{nz}}\right) K_Z\!\left(\frac{z_2 - Z_{ni2}}{h_{nz}}\right).$$

Define

$$\tilde{A}_n(t_1, t_2 \mid z_1, z_2) = \left[n h_{n1} h_{nz}^2\, p_{nz}(z_1, z_2)\right]^{-1} \sum_{i=1}^{n} \frac{\Delta_{i1} \Delta_{i2}\, 1(Y_{i2} > t_2)}{G_n(Y_{i1} + Y_{i2} \mid Z_{ni1}, Z_{ni2})}\, K_T\!\left(\frac{t_1 - Y_{i1}}{h_{n1}}\right) K_Z\!\left(\frac{z_1 - Z_{ni1}}{h_{nz}}\right) K_Z\!\left(\frac{z_2 - Z_{ni2}}{h_{nz}}\right)$$

and

$$\tilde{B}_n(t_1, t_2 \mid z_1, z_2) = \left[n h_{n2} h_{nz}^2\, p_{nz}(z_1, z_2)\right]^{-1} \sum_{i=1}^{n} \frac{\Delta_{i1} \Delta_{i2}\, 1(Y_{i1} > t_1)}{G_n(Y_{i1} + Y_{i2} \mid Z_{ni1}, Z_{ni2})}\, K_T\!\left(\frac{t_2 - Y_{i2}}{h_{n2}}\right) K_Z\!\left(\frac{z_1 - Z_{ni1}}{h_{nz}}\right) K_Z\!\left(\frac{z_2 - Z_{ni2}}{h_{nz}}\right).$$

The estimator of $R(t_1, t_2 \mid z_1, z_2)$ is

$$\tilde{R}_n(t_1, t_2 \mid z_1, z_2) = \tilde{A}_n(t_1, t_2 \mid z_1, z_2) / \tilde{B}_n(t_1, t_2 \mid z_1, z_2). \tag{2.16}$$

3 Asymptotic Properties of the Estimators

This section establishes the asymptotic properties of the estimators $\lambda_{n0}$ and $\Lambda_{n0}$ proposed in Section 2.1 under the assumption that complete spells of $T_1$ and $T_2$ are available. Appendix B.1 gives conditions under which $n^{1/2}(\beta_n - \beta)$ is asymptotically normal, and Appendix B.2 presents the asymptotic properties of $\lambda_{n0}$ and $\Lambda_{n0}$ for the censored case.

We make the following assumptions:

Assumption 3.1 (Random Sampling). $\{T_{i1}, T_{i2}, X_{i1}, X_{i2} : i = 1, \dots, n\}$ is a random sample of $(T_1, T_2, X_1, X_2)$ in (1.1).

Assumption 3.2 (Conditional Independence). $T_1$ and $T_2$ are conditionally independent given $X_1$, $X_2$, and $U$.

Assumption 3.2 is used to identify $\lambda_0$ and $\Lambda_0$. It precludes the possibility of lagged duration dependence, which is not treated in this paper.[6]

[6] Honoré (1993) achieves identification of the lagged duration model through an analytic continuation. The resulting identifying relation is very different from (2.2), and the estimation approach developed here is not applicable to it.

Assumption 3.3 (Normalization). $\int_0^{\infty} [w_t(t)/\lambda_0(t)]\,dt = 1$.

As was explained in Section 2.1, Assumption 3.3 is useful to create averaging effects. The same type of scale normalization is used for a similar reason in Horowitz (2001).

Assumption 3.4 (Covariates). $X_1$ and $X_2$ have bounded support.[7]

[7] Assumption 3.4 can be relaxed at the expense of more complicated proofs.

Let $p(t_1, t_2, z_1, z_2)$ denote the probability density function of $(T_1, T_2, Z_1, Z_2)$. In what follows, $q \ge 2$ and $r$ are integers such that $r \ge 4$ for $\lambda_{n0}$ and $r \ge 6$ for $\Lambda_{n0}$.

Assumption 3.5 (Smoothness). The distribution of $(T_1, T_2, Z_1, Z_2)$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^4$. Furthermore, there are intervals of the real line, $I_T$ and $I_Z$, such that
(a) $I_T = [0, \tau_T)$, where $\tau_T \le \infty$, and $I_Z$ is open,
(b) $p(t_1, t_2, z_1, z_2)$ is bounded on $I_T \times I_T \times I_Z \times I_Z$,
(c) $p(t_1, t_2, z_1, z_2)$ is positive for all $(t_1, t_2, z_1, z_2) \in \mathrm{int}(I_T \times I_T \times I_Z \times I_Z)$, and
(d) $p(t_1, t_2, z_1, z_2)$ has bounded partial derivatives up to order $q$ with respect to $t_j$ and up to order $r$ with respect to $z_j$ for $j = 1, 2$.

In view of (2.1) and (2.6), condition (c) ensures that $\lambda_0(t) > 0$ for all $t \in \mathrm{int}(I_T)$, and condition (d) implies that $\lambda_0$ is $q$ times differentiable. Assumption 3.5 also implies that the distribution of $(Z_1, Z_2)$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^2$ and that $p_z(z_1, z_2)$ is positive in the interior of the support of the distribution.[8]

[8] Assumption 3.5 is not satisfied if all of the covariates are discrete. However, in that case, the estimators of $\lambda_0$ and $\Lambda_0$ can be easily modified and, in fact, are simpler than the estimators presented in Section 2.

Assumption 3.6 (Weight Functions).
(a) The weight function $w_t(\cdot)$ is a bounded, non-negative function with compact support $S_T \subset I_T$ such that $\int_{S_T} w_t(t)\,dt = 1$ and $w_t$ is $q$ times continuously differentiable on $S_T$.
(b) The weight function $w_z(\cdot)$ is a bounded, non-negative function with compact support $S_Z \subset I_Z$ such that $\int_{S_Z} w_z(z)\,dz = 1$ and $w_z$ is $r$ times continuously differentiable on $S_Z$.

Assumption 3.7 (Estimator of β). There is a $d \times 1$ vector-valued function $\Omega(t_1, t_2, x_1, x_2)$ such that

(a) $E\,\Omega(T_1, T_2, X_1, X_2) = 0$,
(b) the components of $E[\Omega(T_1, T_2, X_1, X_2)\,\Omega(T_1, T_2, X_1, X_2)']$ are finite, and
(c) as $n \to \infty$,

$$b_n - \beta = n^{-1} \sum_{i=1}^{n} \Omega(T_{i1}, T_{i2}, X_{i1}, X_{i2}) + o_p(n^{-1/2}).$$

Assumption 3.7 is satisfied by the partial likelihood estimator of $\beta$ mentioned in Section 1.

Assumption 3.8 (Kernels).
(a) $K_T$ has support $[-1, 1]$, is bounded and symmetrical about 0, has bounded variation, and satisfies

$$\int_{-1}^{1} u^j K_T(u)\,du = \begin{cases} 1 & \text{if } j = 0, \\ 0 & \text{if } 1 \le j \le q - 1, \\ \text{nonzero} & \text{if } j = q. \end{cases}$$

(b) $K_Z$ has support $[-1, 1]$, is bounded and symmetrical about 0, has bounded variation, and satisfies

$$\int_{-1}^{1} u^j K_Z(u)\,du = \begin{cases} 1 & \text{if } j = 0, \\ 0 & \text{if } 1 \le j \le r - 1, \\ \text{nonzero} & \text{if } j = r. \end{cases}$$

(c) $K_Z$ is everywhere differentiable. $K_Z'(v) \equiv dK_Z(v)/dv$ is bounded and Lipschitz continuous and has bounded variation.

Assumption 3.8 requires $K_Z$ to be a higher-order kernel. A higher-order kernel is used to ensure that certain bias and remainder terms in the asymptotic expansions of $n^{q/(2q+1)}(\lambda_{n0} - \lambda_0)$ and $n^{1/2}(\Lambda_{n0} - \Lambda_0)$ are negligibly small.

Assumption 3.9 (Bandwidths).
(a) For the estimator $\lambda_{n0}$: $n h_{n1}^{-1} h_{nz}^{6} \to \infty$, $n h_{n1}^{1+4q} \to 0$, $n h_{n1} h_{n2}^{2q} \to 0$, $n h_{n1} h_{nz}^{2r} \to 0$, $\log n / (n h_{n1} h_{nz}^{4})^{1/4} \to 0$, and $\log n / (n h_{n1}^{-1} h_{n2}^{2} h_{nz}^{4})^{1/4} \to 0$.
(b) For the estimator $\Lambda_{n0}$: $n h_{nz}^{6} \to \infty$, $n h_{n1}^{2q} \to 0$, $n h_{n2}^{2q} \to 0$, $n h_{nz}^{2r} \to 0$, $\log n / (n h_{n1}^{2} h_{nz}^{4})^{1/4} \to 0$, and $\log n / (n h_{n2}^{2} h_{nz}^{4})^{1/4} \to 0$.

Assumptions 3.8 and 3.9 (a) are satisfied, for example, if $K_T$ is a second-order kernel, $K_Z$ is a fourth-order kernel, $h_{n1} \propto n^{-1/5}$, $h_{n2} \propto n^{-\kappa_2}$, and $h_{nz} \propto n^{-\kappa_z}$, where $1/5 < \kappa_2 < 2/5$, $1/10 < \kappa_z < 1/5$, and $\kappa_2 + 2\kappa_z < 3/5$. Also, Assumptions 3.8 and 3.9 (b) are satisfied, for example, if $K_T$ is a second-order kernel, $K_Z$ is a sixth-order kernel, $h_{n1} \propto n^{-\kappa}$, $h_{n2} \propto n^{-\kappa}$, and $h_{nz} \propto n^{-\kappa_z}$, where $1/4 < \kappa < 1/3$, $1/12 < \kappa_z < 1/8$, and $\kappa + 2\kappa_z < 1/2$.
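The moment conditions in Assumption 3.8 can be checked exactly for a polynomial kernel. The sketch below does this for $K(u) = (105/64)(1 - 5u^2 + 7u^4 - 3u^6) = (105/64)(1 - u^2)^2(1 - 3u^2)$ on $[-1, 1]$. This kernel is an illustrative candidate for a fourth-order $K_Z$ ($r = 4$), chosen because the factor $(1 - u^2)^2$ makes it differentiable at the boundary of its support. It is not claimed to be the kernel used in the paper, and `kernel_moment` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Coefficients of K(u) = (105/64)(1 - 5u^2 + 7u^4 - 3u^6), in increasing powers of u.
KERNEL_COEFS = (105.0 / 64.0) * np.array([1.0, 0.0, -5.0, 0.0, 7.0, 0.0, -3.0])


def kernel_moment(j):
    """Integral of u^j K(u) over [-1, 1], computed from the polynomial antiderivative."""
    shifted = np.concatenate([np.zeros(j), KERNEL_COEFS])  # coefficients of u^j K(u)
    antiderivative = P.polyint(shifted)
    return P.polyval(1.0, antiderivative) - P.polyval(-1.0, antiderivative)


moments = [kernel_moment(j) for j in range(5)]
print(moments)
```

The zeroth moment equals 1, moments 1 through 3 vanish, and the fourth moment is nonzero, which is the pattern required in Assumption 3.8 (b) with $r = 4$.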

14 Define ϕt 2, z 1, z 2 ) = p z z 1, z 2 ) 1 wt 2, z 1, z 2 ) expz 2 z 1 ), Ct 1, t 2, z 1, z 2 ) = Bt 1, t 2 z 1, z 2 ) 1 ϕt 2, z 1, z 2 ), Dt 1, t 2, z 1, z 2 ) = Bt 1, t 2 z 1, z 2 ) 2 At 1, t 2 z 1, z 2 )ϕt 2, z 1, z 2 ), ] ) 1 t Ti1 γ t T i1, T i2, X i1, X i2 ) = Ct, t 2, Z i1, Z i2 )1T i2 > t 2 ) dt 2 K T λ 0 t), S T h n1 h n1 and ] Γ t T i1, T i2, X i1, X i2 ) = CT i1, t 2, Z i1, Z i2 )1T i2 > t 2 ) dt 2 10 T i1 t) S T t Dt 1, T i2, Z i1, Z i2 )1T i1 > t 1 ) dt 1 0 ] w z z 1 )w z z 2 ) Λ 0 t) dz 1 dz 2 EX 1 X 2 ] ΩT i1, T i2, X i1, X i2 ). S Z S Z p z z 1, z 2 ) In addition, define ] q B λ t) = dt 2 dz 1 dz 2 Ct, s 2, z 1, z 2 )1t 2 > s 2 ) ds 2 S T S Z S Z S T t q pt, t 2, z 1, z 2 ) 1 1 q! 1 1 u q K T u)du and ] 2 V λ t) = dt 2 dz 1 dz 2 Ct, s 2, z 1, z 2 )1t 2 > s 2 ) ds 2 pt, t 2, z 1, z 2 ) S T S Z S Z S T 1 1 K 2 T u)du. The following theorem gives the main result of this section. Theorem 3.1. Let Assumptions hold. Let 0, τ] I T be a compact interval. Then as n, a) λ n0 t) λ 0 t) = n 1 γ t T i1, T i2, X i1, X i2 ) Eγ t T 1, T 2, X 1, X 2 )] b) Λ n0 t) Λ 0 t) = n 1 uniformly over t 0, τ]. + h q n1 B λt) + o p nh n1 ) 1/2] + o p h q n1 ) and Γ t T i1, T i2, X i1, X i2 ) + o p n 1/2) 14

Theorem 3.1 implies that the rate of convergence in probability of $\lambda_{n0}$ to $\lambda_0$ is maximized at the rate $n^{-q/(2q+1)}$ by setting $h_{n1} \propto n^{-1/(2q+1)}$, and that $\Lambda_{n0}$ converges to $\Lambda_0$ in probability uniformly at the rate $n^{-1/2}$. Let $\Rightarrow$ denote weak convergence in the space of bounded, real-valued functions on $[0, \tau]$ equipped with the uniform metric. The following corollary of Theorem 3.1 is easily proved.

Corollary 3.2. Let the assumptions of Theorem 3.1 hold.
(a) Assume $h_{n1} \propto n^{-1/(2q+1)}$. For $t \in [0, \tau]$,

$$n^{q/(2q+1)} [\lambda_{n0}(t) - \lambda_0(t)] \to_d N(B_\lambda(t), V_\lambda(t)).$$

(b) For $t \in [0, \tau]$,

$$n^{1/2} [\Lambda_{n0}(t) - \Lambda_0(t)] \Rightarrow \chi_\Lambda(t),$$

where $\chi_\Lambda(t)$ is a tight Gaussian process with mean 0 and covariance function $E[\chi_\Lambda(t) \chi_\Lambda(t')] = E[\Gamma_t(T_1, T_2, X_1, X_2)\, \Gamma_{t'}(T_1, T_2, X_1, X_2)]$.

Under the assumptions of Corollary 3.2, the asymptotic distribution of $n^{q/(2q+1)}(\lambda_{n0} - \lambda_0)$ is not centered at zero. The asymptotic bias $B_\lambda$ can be removed by undersmoothing $\lambda_{n0}$ (equivalently, by letting $h_{n1}$ converge faster than $n^{-1/(2q+1)}$) at the expense of a reduced rate of convergence. The asymptotic variance $V_\lambda$ of $\lambda_{n0}$ and the covariance function of $\chi_\Lambda$ can be estimated consistently by replacing unknown quantities with sample analogs. See Appendix A.2 for details.

4 Bandwidth Selection

This section describes rule-of-thumb, data-driven methods for choosing the values of the bandwidths $h_{n1}$, $h_{n2}$, and $h_{nz}$ for the uncensored case.

We first consider the choice of $h_{n1}$. An asymptotically optimal bandwidth $h_{n1}$ in estimation of $\lambda_0$ can be defined as a minimizer of the weighted asymptotic integrated mean-square error of $\lambda_{n0}$. It follows from Section 3 that $h_{n1} = c\, n^{-1/(2q+1)}$, where

$$c = \left[\frac{\int w(t) V_\lambda(t)\,dt}{2q \int w(t) B_\lambda^2(t)\,dt}\right]^{1/(2q+1)}$$

and $w(\cdot)$ is a weight function. A feasible bandwidth requires an estimate of the constant factor $c$. To develop a rule of thumb for choosing $h_{n1}$, assume that $\varepsilon \equiv e^U$ has a gamma distribution with mean 1 and unknown variance $\theta$ and is independent of $X_j$. Also, assume that $\lambda_0$ belongs to a known parametric family. In the Monte Carlo experiments reported in Section 6, we use the following form:

$$\lambda_0(t, \alpha) = \alpha_1 t^{\alpha_1 - 1} + \alpha_3 \alpha_2 t^{\alpha_2 - 1},$$

where $\alpha \equiv (\alpha_1, \alpha_2, \alpha_3)$ is a vector of unknown positive constants. This form can be viewed as a mixture of Weibull hazards and is flexible enough to exhibit non-monotone hazards. Under the parametric specification of $\lambda_0$, it is straightforward to show that the probability density function of $T_1$ and $T_2$ conditional on $Z_1 = z_1$ and $Z_2 = z_2$ has the form

$$p_{t|z}(t_1, t_2 \mid z_1, z_2) = \frac{(1 + \theta)\, \lambda_0(t_1, \alpha) \lambda_0(t_2, \alpha)\, e^{z_1 + z_2}}{\left[\theta \Lambda_0(t_1, \alpha) e^{z_1} + \theta \Lambda_0(t_2, \alpha) e^{z_2} + 1\right]^{2 + 1/\theta}}, \tag{4.1}$$

where $\Lambda_0(t, \alpha) = \int_0^t \lambda_0(s, \alpha)\,ds$. This suggests that $\theta$ and $\alpha$ can be estimated by maximizing the log-likelihood function obtained from $p_{t|z}$. Once $\theta$ and $\alpha$ are estimated, $c$ can be evaluated numerically with an additional assumption about the distribution of $Z_1$ and $Z_2$. In the Monte Carlo experiments, we use

$$p_z(z_1, z_2) = \frac{1}{s_1 s_2}\, \phi\!\left(\frac{z_1 - m_1}{s_1}\right) \phi\!\left(\frac{z_2 - m_2}{s_2}\right),$$

where $\phi$ is the probability density function of the standard normal distribution, and $m_j$ and $s_j$ are the sample mean and standard deviation of $Z_{nj}$ for each $j = 1, 2$. Let $\hat{c}$ denote the resulting constant factor.

Now consider $h_{n2}$ and $h_{nz}$ in estimation of $\lambda_0$. Unlike $h_{n1}$, $h_{n2}$ and $h_{nz}$ do not affect the asymptotic distribution of $\lambda_{n0}$ if Assumption 3.9 is satisfied. Therefore, the values of $h_{n2}$ and $h_{nz}$ are less critical than the value of $h_{n1}$. If $K_T$ is a second-order kernel and $K_Z$ is a fourth-order kernel, then the following rule of thumb can be used: $h_{n2} = \hat{c}\, n^{-2/9}$ and $h_{nz} = s\, \hat{c}\, n^{-1/9}$, where $s = (s_1 + s_2)/2$. This rule satisfies Assumption 3.9, and the Monte Carlo experiments in Section 6 indicate that it performs well.

Similarly, one can choose the values of the bandwidths in estimation of $\Lambda_0$. If $K_T$ is a second-order kernel and $K_Z$ is a sixth-order kernel, then one can use the following rule: $h_{n1} = h_{n2} = \hat{c}\, n^{-2/7}$ and $h_{nz} = s\, \hat{c}\, n^{-1/11}$.
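The rule-of-thumb bandwidths are simple functions of $\hat{c}$, $s_1$, $s_2$, and $n$, and their exponents can be checked against the admissible ranges given as examples after Assumption 3.9. In the sketch below the function names and the input values are hypothetical, and $\hat{c}$ is taken as given rather than computed from the parametric likelihood. The first bandwidth for the hazard uses $q = 2$, so that $n^{-1/(2q+1)} = n^{-1/5}$.

```python
def rule_of_thumb_bandwidths(c_hat, s1, s2, n, target="hazard"):
    """Return (h_n1, h_n2, h_nz) following the rule of thumb in Section 4."""
    s_bar = 0.5 * (s1 + s2)
    if target == "hazard":          # second-order K_T, fourth-order K_Z
        k1, k2, kz = 1 / 5, 2 / 9, 1 / 9
    elif target == "integrated":    # second-order K_T, sixth-order K_Z
        k1, k2, kz = 2 / 7, 2 / 7, 1 / 11
    else:
        raise ValueError("target must be 'hazard' or 'integrated'")
    return c_hat * n ** (-k1), c_hat * n ** (-k2), s_bar * c_hat * n ** (-kz)


def hazard_exponents_ok(k2, kz):
    """Example ranges stated after Assumption 3.9 for estimating the hazard."""
    return 1 / 5 < k2 < 2 / 5 and 1 / 10 < kz < 1 / 5 and k2 + 2 * kz < 3 / 5


def integrated_exponents_ok(k, kz):
    """Example ranges stated after Assumption 3.9 for the integrated hazard."""
    return 1 / 4 < k < 1 / 3 and 1 / 12 < kz < 1 / 8 and k + 2 * kz < 1 / 2


# Hypothetical inputs: c_hat from a pilot fit, s1 and s2 from the estimated indices.
h1, h2, hz = rule_of_thumb_bandwidths(c_hat=1.2, s1=0.9, s2=1.1, n=1000)
print(h1, h2, hz)
print(hazard_exponents_ok(2 / 9, 1 / 9), integrated_exponents_ok(2 / 7, 1 / 11))
```

Both checks return `True`, which agrees with the statement that the rules satisfy Assumption 3.9. For the hazard, $h_{n2}$ shrinks faster than $h_{n1}$, in line with the advice in Section 2.1.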

A similar data-based method could be developed to choose the bandwidths for the censored case, although the details would differ considerably from those for the uncensored case. The rule-of-thumb bandwidths presented here can be used as pilot bandwidths for more sophisticated plug-in methods.

5 Extensions

5.1 Combination of Possible Estimators

This section presents a method for combining possible estimators of $\lambda_0$ and $\Lambda_0$. As was noted in Section 2, $\lambda_0$ can be expressed as (2.2) or (2.3). Combining these expressions yields

(5.1) $\lambda_0(t) = \alpha(t) \int_{S_T} dt_2 \int_{S_Z} dz_1 \int_{S_Z} dz_2\, w(t_2, z_1, z_2) \exp(z_2 - z_1)\, R(t, t_2 \mid z_1, z_2) + [1 - \alpha(t)] \int_{S_T} dt_1 \int_{S_Z} dz_1 \int_{S_Z} dz_2\, w(t_1, z_1, z_2) \exp(z_1 - z_2)\, R(t_1, t \mid z_1, z_2)^{-1}$

for any $\alpha(t)$ such that $0 \le \alpha(t) \le 1$ for all $t$. This suggests that $\lambda_0$ can be estimated by (5.1) with $R_n$ in place of $R$. Let $\hat{\lambda}_{n0}$ denote the resulting estimator of $\lambda_0$. For simplicity, we consider only the uncensored case and assume that $h_{n1} = h_{n2} \equiv h_n$ and $n h_n^{2q} \to 0$. Under this additional assumption, along with the assumptions of Theorem 3.1, it can be shown that as $n \to \infty$,

$\hat{\lambda}_{n0}(t) - \lambda_0(t) = \dfrac{\alpha(t)}{n h_n} \sum_{i=1}^{n} \left[ \int_{S_T} C(t, t_2, Z_{i1}, Z_{i2})\, 1(T_{i2} > t_2)\, dt_2 \right] K_T\!\left( \dfrac{t - T_{i1}}{h_n} \right) + \dfrac{1 - \alpha(t)}{n h_n} \sum_{i=1}^{n} \left[ \int_{S_T} C(t_1, t, Z_{i1}, Z_{i2})\, 1(T_{i1} > t_1)\, dt_1 \right] K_T\!\left( \dfrac{t - T_{i2}}{h_n} \right) + o_p\!\left[ (n h_n)^{-1/2} \right]$

uniformly over $t \in [0, \tau]$, where $C(t_1, t_2, z_1, z_2) = [A(t_1, t_2 \mid z_1, z_2)\, p_z(z_1, z_2)]^{-1} w(t_1, z_1, z_2) \exp(z_1 - z_2)$.

The weight function $\alpha(t)$ can be chosen to minimize the mean squared error of $\hat{\lambda}_{n0}(t)$ for each $t \in [0, \tau]$. It is easy to see that the optimal weight function has the form $\alpha^*(t) = \tilde{V}_\lambda(t) / [V_\lambda(t) + \tilde{V}_\lambda(t)]$, where $V_\lambda(t)$ is defined in Section 3 and

$\tilde{V}_\lambda(t) = \int_{S_T} dt_1 \int_{S_Z} dz_1 \int_{S_Z} dz_2 \left[ \int_{S_T} C(s_1, t, z_1, z_2)\, 1(t_1 > s_1)\, ds_1 \right]^2 p(t_1, t, z_1, z_2) \int_{-1}^{1} K_T^2(u)\, du$.

In applications, $V_\lambda(t)$ and $\tilde{V}_\lambda(t)$ can be estimated easily, thus yielding an estimate of $\alpha^*(t)$ (see Appendix A.2). Similarly, $\Lambda_0$ can be expressed as

$\Lambda_0(t) = \alpha(t) \int_0^t dt_1 \int_{S_T} dt_2 \int_{S_Z} dz_1 \int_{S_Z} dz_2\, w(t_2, z_1, z_2) \exp(z_2 - z_1)\, R(t_1, t_2 \mid z_1, z_2) + [1 - \alpha(t)] \int_{S_T} dt_1 \int_0^t dt_2 \int_{S_Z} dz_1 \int_{S_Z} dz_2\, w(t_1, z_1, z_2) \exp(z_1 - z_2)\, R(t_1, t_2 \mid z_1, z_2)^{-1}$.

A new estimator of $\Lambda_0$ can be obtained by replacing $R$ in the equation above with $R_n$.

5.2 Estimation with Longer Panels

The estimation approach described in this paper extends easily to the case of longer panels. First consider the case in which observations of $T_j$ are uncensored. Observations of any pair from the set $\{1, \ldots, J\}$ can be used to construct nonparametric estimators of $\lambda_0$ and $\Lambda_0$ as in Section 2.1 (or as in Section 5.1). This gives $J(J-1)/2$ different estimators, and these can be linearly combined to construct a more efficient estimator. Which linear combination yields the smallest integrated mean square error among all possible linear combinations is an interesting question, but it is beyond the scope of this paper. Chamberlain (1985) discusses estimation of $\beta$ when $J$ completed spells are available for each individual.

For the censored case, we assume that $C_1 = C$ and $C_j = \left( C - \sum_{k=1}^{j-1} T_k \right) 1(T_{j-1} \le C_{j-1})$ for $j = 2, \ldots, J$. Here, $C$ is conditionally independent of $T_j$ given $X_j$. As in Section 2.2, observe that $C$ censors the sum of the $T_j$, not each separately, and that the censoring indicators satisfy $\Delta_j = 1$ for $j < J$ if $\Delta_J = 1$. To describe an estimator of $\beta$, let $(t_l, t_k)$ be a pair such that $l \ne k$. Define the joint survivor function $S(t_l, t_k \mid x_l, x_k) = \Pr(T_l > t_l, T_k > t_k \mid X_l = x_l, X_k = x_k)$, the joint conditional sub-distribution function $F(t_1, \ldots, t_J \mid x_l, x_k) = \Pr(Y_1 \le t_1, \ldots, Y_J \le t_J, \Delta_J = 1 \mid X_l = x_l, X_k = x_k)$, its corresponding density $f(t_1, \ldots, t_J \mid x_l, x_k) = \partial^J F(t_1, \ldots, t_J \mid x_l, x_k) / \partial t_1 \cdots \partial t_J$, and the conditional survivor function of the censoring threshold $G(c \mid x_l, x_k) = \Pr(C > c \mid X_l = x_l, X_k = x_k)$. As in equation (2.11),

(5.2) $S(t_l, t_k \mid x_l, x_k) = \int_{t_l}^{\infty} \int_{t_k}^{\infty} \int \cdots \int \dfrac{f(s_1, \ldots, s_J \mid x_l, x_k)}{G(s_1 + \cdots + s_J \mid x_l, x_k)}\, ds_{lk}\, ds_k\, ds_l$,

where $t_{lk}$ denotes a vector containing all components of $(t_1, \ldots, t_J)$ except $t_l$ and $t_k$. By differentiating $S$ with respect to $t_l$ and $t_k$ and then setting $t_l = t_k = t$,

(5.3) $\left. \dfrac{\partial S(t_l, t_k \mid x_l, x_k) / \partial t_l}{\partial S(t_l, t_k \mid x_l, x_k) / \partial t_k} \right|_{t_l = t_k = t} = \exp[(x_l - x_k)'\beta]$.

This suggests that $\beta$ can be estimated by a procedure similar to that described in Section […]. As in (2.14), the estimator of $\beta$ is given by

$\beta_{nlk} = \left( n^{-1} \sum_{i=1}^{n} w_{xilk} X_{ilk} X_{ilk}' \right)^{-1} \left( n^{-1} \sum_{i=1}^{n} w_{xilk} X_{ilk} \log V_{nilk} \right)$,

where $w_{xilk} = w_x(X_{il}) w_x(X_{ik})$, $X_{ilk} = X_{il} - X_{ik}$, $V_{nilk} = \int_{S_\beta} w_\beta(t)\, R_{n\beta lk}(t \mid X_{il}, X_{ik})\, dt$, and $R_{n\beta lk}(t \mid x_l, x_k) = \tilde{A}_{n\beta lk}(t \mid x_l, x_k) / \tilde{B}_{n\beta lk}(t \mid x_l, x_k)$ with

$\tilde{A}_{n\beta lk}(t \mid x_l, x_k) = \left[ n h_{n1} h_{nx}^{2d}\, p_{nx}(x_l, x_k) \right]^{-1} \sum_{i=1}^{n} \dfrac{\Delta_{iJ}\, 1(Y_{ik} > t)}{G_n\!\left( \sum_{j=1}^{J} Y_{ij} \mid X_{il}, X_{ik} \right)} K_T\!\left( \dfrac{t - Y_{il}}{h_{n1}} \right) K_X\!\left( \dfrac{x_l - X_{il}}{h_{nx}} \right) K_X\!\left( \dfrac{x_k - X_{ik}}{h_{nx}} \right)$

and

$\tilde{B}_{n\beta lk}(t \mid x_l, x_k) = \left[ n h_{n2} h_{nx}^{2d}\, p_{nx}(x_l, x_k) \right]^{-1} \sum_{i=1}^{n} \dfrac{\Delta_{iJ}\, 1(Y_{il} > t)}{G_n\!\left( \sum_{j=1}^{J} Y_{ij} \mid X_{il}, X_{ik} \right)} K_T\!\left( \dfrac{t - Y_{ik}}{h_{n2}} \right) K_X\!\left( \dfrac{x_l - X_{il}}{h_{nx}} \right) K_X\!\left( \dfrac{x_k - X_{ik}}{h_{nx}} \right)$.

Asymptotic properties of the estimator $\beta_{nlk}$ can be obtained by repeating the arguments in Appendix B.2. Note that since any pair $(t_l, t_k)$ can be used to construct $\beta_{nlk}$, there are $J(J-1)/2$ possible estimators of $\beta$. Furthermore, we can construct $j(j-1)/2$ estimators using only observations on the first $j \ge 2$ spells. Therefore, there are $\sum_{j=2}^{J} j(j-1)/2$ possible estimators altogether. Again, how to combine these estimators is an interesting question. A detailed investigation of such an extension, however, is beyond the scope of this paper and is a topic for future research. Estimators of $\lambda_0$ and $\Lambda_0$ can also be developed analogously.

6 Monte Carlo Experiments

This section presents the results of a small set of Monte Carlo experiments that illustrate the numerical performance of the estimators of $\lambda_0$, $\Lambda_0$, and $\beta$. Samples were generated by simulation from model (1.1) with $J = 2$. In the experiments, $\beta = 1$, $X_1 \sim N(0, 1)$, $X_2 \sim N(0, 1)$, and $X_1$ and $X_2$ are independent. The fixed effect was generated by $U = (X_1 + X_2)/2$. Experiments were carried out with two baseline hazard functions, which are taken from Horowitz (1999). One is $\lambda_0(t) = 0.087t$, which makes (1.1) a Weibull proportional hazard model with unobserved heterogeneity. The other baseline hazard function is λ 0 t) = 0.05t/5) 2/ t/5) 5, which is U-shaped. Experiments were carried out for both the uncensored and censored cases. The censoring threshold $C$ was generated from the exponential distribution with mean 20. Recall that $C_1 = C$ and $C_2 = (C - T_1)\, 1(T_1 \le C)$. Under this censoring mechanism, the means of the censoring indicators $\Delta_1$ and $\Delta_2$ are about 0.78 and 0.64, respectively, for the Weibull hazard model and about 0.87 and 0.76, respectively, for the U-shaped hazard model. The experiments used sample sizes of $n = 100$ and 500. There were 100 Monte Carlo replications per experiment, and the experiments were carried out in GAUSS using GAUSS pseudo-random number generators.

We first focus on the finite-sample performance of the estimators of $\lambda_0$ and $\Lambda_0$ for the uncensored case. The partial likelihood estimator was used to estimate $\beta$. The kernel functions used in estimation of $\lambda_0$ are

(6.1) $K_T(u) = (15/16)(1 - u^2)^2\, 1(|u| \le 1)$

and

(6.2) $K_Z(u) = (105/64)(1 - 5u^2 + 7u^4 - 3u^6)\, 1(|u| \le 1)$.

These are second-order and fourth-order kernels, respectively. The following sixth-order kernel, along with (6.1), is used in estimation of $\Lambda_0$:

(6.3) $K_Z(u) = (315/2048)(15 - 140u^2 + 378u^4 - 396u^6 + 143u^8)\, 1(|u| \le 1)$.

All the kernel functions are taken from Müller (1984). The bandwidths were chosen by the data-based methods described in Section 4. The weight functions and the means of the bandwidths used in the experiments are shown in Table 1.⁹

It is not difficult to compute $\lambda_{n0}$ and $\Lambda_{n0}$. The triple integral in (2.4) was evaluated numerically using the Gauss-Legendre quadrature method. The quadruple integral in (2.5) was first evaluated analytically with respect to $t_1$, and the remaining triple integral was evaluated numerically. See Horowitz and Gørgens (1999, 2.4) for details on how the integral in (2.5) can be evaluated analytically with respect to $t_1$.

The results of the experiments are summarized graphically in Figure 1 for the Weibull model and Figure 2 for the U-shaped hazard model. The left-hand panels of the figures show the means of 100 estimates of $\lambda_0$ and $\Lambda_0$ (solid lines) and the true $\lambda_0$ and $\Lambda_0$ (dashed lines). The right-hand panels show five individual estimates of $\lambda_0$ and $\Lambda_0$ (solid lines) and the true $\lambda_0$ and $\Lambda_0$ (dashed lines). The baseline hazard functions used in the experiments do not satisfy the scale normalization; hence, the estimates were normalized by dividing them by $\int_0^\infty w_t(t)/\lambda_0(t)\, dt$. The true functions and the means of the estimates are quite close to one another, especially when $n = 500$. It is not surprising that the estimates of $\lambda_0$ are more variable than those of $\Lambda_0$, given the rates of convergence of the estimators obtained in Section 3. Most of the individual estimates are reasonable approximations to the functions they estimate.

To investigate whether there is an advantage to using the combined estimator of $\lambda_0$ described in Section 5, we computed $\hat{\lambda}_{n0}$ using equal weights for each $t$ ($\alpha(t) = 0.5$) with the same bandwidths used for $\lambda_{n0}$. Figure 3 shows the means of 100 estimates of $\lambda_0$ and five individual estimates. The biases of $\hat{\lambda}_{n0}$ remain virtually the same as those of $\lambda_{n0}$, but the variances of $\hat{\lambda}_{n0}$ are somewhat smaller than those of $\lambda_{n0}$. This is not surprising, given that $\hat{\lambda}_{n0}$ is just a weighted average of consistent estimators.

⁹ The weight function $w_t(\cdot)$ does not satisfy the differentiability requirement of Assumption 3.6. This does not matter in a finite sample because there are no observations of $T_2$ at the points of discontinuity.
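The orders of the three kernels (6.1)-(6.3) can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and Gauss-Legendre quadrature is used simply because it integrates these polynomials exactly on [-1, 1]. A kernel of order q integrates to one and has vanishing moments of orders 1 through q - 1.

```python
import numpy as np


def k2(u):
    # Second-order (biweight) kernel, equation (6.1).
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)


def k4(u):
    # Fourth-order kernel, equation (6.2).
    u = np.asarray(u, dtype=float)
    poly = 1.0 - 5.0 * u**2 + 7.0 * u**4 - 3.0 * u**6
    return np.where(np.abs(u) <= 1.0, (105.0 / 64.0) * poly, 0.0)


def k6(u):
    # Sixth-order kernel, equation (6.3).
    u = np.asarray(u, dtype=float)
    poly = 15.0 - 140.0 * u**2 + 378.0 * u**4 - 396.0 * u**6 + 143.0 * u**8
    return np.where(np.abs(u) <= 1.0, (315.0 / 2048.0) * poly, 0.0)


def kernel_moment(kernel, power, n_nodes=20):
    # Integral of u**power * kernel(u) over [-1, 1] by Gauss-Legendre
    # quadrature; 20 nodes are exact for polynomials up to degree 39.
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    return float(np.sum(weights * nodes**power * kernel(nodes)))
```

For instance, `kernel_moment(k6, 2)` and `kernel_moment(k6, 4)` should both be zero up to rounding error, whereas `kernel_moment(k6, 6)` should not.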

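The Monte Carlo design of Section 6 for the Weibull case can be sketched as follows. This is a hedged illustration, not the GAUSS code used in the paper: it assumes the baseline hazard $\lambda_0(t) = 0.087t$ as printed, draws durations by inverting the integrated hazard, and uses hypothetical function and key names.

```python
import numpy as np


def simulate_panel(n, rng, beta=1.0, hazard_slope=0.087, censor_mean=20.0):
    """Draw n individuals with J = 2 spells from model (1.1), then censor.

    With lambda_0(t) = a * t, the integrated hazard is a * t**2 / 2, and
    Lambda_0(T_j) * exp(beta * X_j + U) is unit exponential.
    """
    x = rng.standard_normal((n, 2))            # X_1, X_2 independent N(0, 1)
    u = x.mean(axis=1)                         # fixed effect U = (X_1 + X_2) / 2
    e = rng.exponential(1.0, size=(n, 2))      # unit exponential draws
    t = np.sqrt(2.0 * e / (hazard_slope * np.exp(beta * x + u[:, None])))
    c = rng.exponential(censor_mean, size=n)   # censoring threshold C
    c1 = c
    c2 = (c - t[:, 0]) * (t[:, 0] <= c)        # C_2 = (C - T_1) 1(T_1 <= C)
    d1 = t[:, 0] <= c1                         # spell 1 uncensored
    d2 = t[:, 1] <= c2                         # spell 2 uncensored
    y1 = np.minimum(t[:, 0], c1)
    y2 = np.minimum(t[:, 1], c2)
    return {"x": x, "t": t, "y": np.column_stack([y1, y2]),
            "delta": np.column_stack([d1, d2])}
```

Averaging the columns of `delta` over a large simulated sample gives the shares of uncensored first and second spells, which can be compared with the values of about 0.78 and 0.64 reported for the Weibull model.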
We now turn to the small-sample performance of the estimators for the censored case. The parameter $\beta$ was estimated by the method described in Section […]. The regularity conditions established in Appendix B.1 require $K_T$ to be a higher-order kernel in order to prevent $\beta_n$ from having asymptotic bias. As is well known, however, kernel estimates with second-order kernels often outperform those with higher-order kernels for small sample sizes.¹⁰ For this reason, the experiments were carried out using both the second-order and fourth-order kernels (6.1)-(6.2) for $K_T$.¹¹ The second-order kernel (6.1) was used for $K_X$. The single integral in $V_{ni}$ in Section […] was evaluated numerically using the quadrature method. As in the uncensored case, the kernels (6.1) and (6.2) were used in estimation of $\lambda_0$; the kernels (6.1) and (6.3) were used for $\Lambda_{n0}$. Estimates of $\beta$ with the fourth-order kernel were used as $\beta_n$ in estimation of $\lambda_0$ and $\Lambda_0$. The weight functions and the values of the bandwidths used for the censored case are shown in Table 2. The bandwidths were chosen to roughly minimize the (integrated) mean square errors of the estimators.

The results for the censored case are summarized in Table 3 and Figures 4-5. Table 3 reports the results of the experiments for $\beta_n$. It is not surprising that the estimates of $\beta$ exhibit some bias when the second-order kernel is used, given that a higher-order kernel is needed to remove the bias. On the other hand, the higher-order kernel reduces the bias at the expense of increased variance. To compare the censored estimator of $\beta$ with the uncensored estimator, we computed the root mean square error (RMSE) of the partial likelihood estimator without censoring. The resulting RMSEs were […] and 0.098, respectively, for sample sizes of $n = 100$ and 500.¹² Thus, the RMSE of the censored estimator is larger than that of the uncensored estimator, roughly by a factor of 2.

Figures 4 and 5 show the means of 100 estimates of $\lambda_0$ and $\Lambda_0$ and five individual estimates, in the same format as Figures 1 and 2. As in the uncensored case, the true functions and the means of the estimates are quite close to one another, and the individual estimates are reasonable approximations to the functions they estimate.

¹⁰ For example, see Efromovich (2001) for theoretical arguments why higher-order kernels perform poorly in small samples.

¹¹ When the fourth-order kernel is used, $V_{ni}$ in (2.14) can be negative in finite samples. To deal with this problem, we set $w_{xi} = 0$ when $V_{ni}$ is not strictly positive.

¹² The functional form of the baseline hazard function is irrelevant for the partial likelihood estimator.
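The treatment of non-positive values of $V_{ni}$ described in footnote 11 can be sketched as a weighted least-squares computation. This is an illustration under assumptions, not the authors' code: it takes the differences $X_{i1} - X_{i2}$, the values $V_{ni}$, and the weights $w_{xi}$ as given inputs, and the function name is hypothetical.

```python
import numpy as np


def beta_from_ratios(x_diff, v, w_x):
    """Weighted least squares of log V on covariate differences.

    Observations with V not strictly positive receive zero weight, so
    their logarithm never enters the estimate.
    """
    v = np.asarray(v, dtype=float)
    n = v.shape[0]
    x_diff = np.asarray(x_diff, dtype=float).reshape(n, -1)
    w = np.asarray(w_x, dtype=float).copy()
    positive = v > 0.0
    w[~positive] = 0.0                      # w_xi = 0 when V_ni <= 0
    log_v = np.zeros(n)
    log_v[positive] = np.log(v[positive])
    weighted_x = x_diff * w[:, None]
    sxx = weighted_x.T @ x_diff / n         # n^{-1} sum of w * X X'
    sxy = weighted_x.T @ log_v / n          # n^{-1} sum of w * X log V
    return np.linalg.solve(sxx, sxy)
```

Setting the weight to zero, rather than dropping rows, keeps the normalization by n unchanged; the normalization cancels in the final solve in any case.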

7 An Empirical Example

This section presents an empirical example that illustrates the usefulness of model (1.1). The example uses the NLSY work history data to estimate a proportional hazards model of job duration conditional on two covariates. Simple search models in labor economics suggest that job durations are independently and identically distributed over the life cycle under the assumption that workers are homogeneous (see, for example, Jovanovic (1979)). We incorporate permanent differences in workers' characteristics as the unobserved heterogeneity. Two covariates are used in an attempt to control for job-specific heterogeneity: initial real hourly wages in 1996 constant dollars and an indicator variable for second jobs.¹³

The sample is selected from the NLSY random sub-sample containing 3,003 men from 1979 to […]. To focus on the transition from school to work for relatively homogeneous workers, we limit the sample to 1,093 respondents whose highest degree ever received is a high school diploma (or equivalent) and who received the degree after January […].¹⁴ We confine the analysis to jobs that (a) averaged at least 30 hours a week, (b) did not start before the degree was received, and (c) involved private companies.¹⁵ In some cases it is not possible to calculate job durations because the starting or ending date of a job is missing or erroneous. Individuals with such cases were dropped from the sample, as were individuals with missing data on hourly wages. After these individuals were dropped, there remained 901 men who had at least one job. In the sample, initial hourly wages for a few jobs are implausibly high. We regard these wages as erroneous and dropped the individuals concerned.¹⁶ The final sample size was 890. Descriptive statistics for the sample are shown in Table 3. Because we use the NLSY through 1996, the censoring rates of jobs are relatively low; about 80% of the 890 men (exactly, 707) had at least two uncensored jobs. The mean duration of the second jobs is about 4.5 weeks greater than that of the first jobs, and the mean of initial real hourly wages rises from the first jobs to the second jobs by 65 cents in 1996 constant dollars. We use only data on the first and second jobs, although information on later jobs is available for some individuals.

To illustrate the usefulness of the methods described in this paper, we estimate (1.1) with the method for uncensored observations in Section 2.1 and the method for censored observations in Section 2.2. We applied the method in Section 2.1 to the sub-sample of observations with complete spells. The size of this sub-sample was 707. For comparison, we also estimate an ordinary proportional hazards model (without unobserved heterogeneity) using observations with complete spells.

First we consider the parameter estimates. Table 4 reports parameter estimates obtained using model (1.1) and the ordinary proportional hazards model. Estimates without the heterogeneity were obtained by the standard partial likelihood estimator (Cox (1972)). When only observations with complete spells were used, estimates with the heterogeneity were obtained by the partial likelihood estimator mentioned in Section 1. When all observations were used, estimates were obtained by the method described in Section […].¹⁷ The fourth-order kernel (6.2) was used for $K_T$. The following bandwidths and weight functions were used: $h_{n1} = h_{n2} = 300$, $h_{nx} = 4$, $w_\beta(u) = 1(8 \le u \le 156)/148$, and $w_x(u) = 1(1 \le u \le 20)$.¹⁸ The coefficient for initial wages is significantly negative for the Cox estimator and the uncensored estimator at the 90% level. This implies that higher initial wages decrease the probability that a job ends and thus lead to longer job durations. This is consistent with previous research (e.g., Topel and Ward (1992)).¹⁹ The coefficient for the second-job indicator is also significantly negative for the Cox estimator and the uncensored estimator. The estimated coefficients for the censored estimator are also negative, but they are insignificant at any reasonable level. This may be because the censored estimator has a larger variance than the uncensored estimator. The differences between estimation methods appear to be small relative to the standard errors.

We now turn to nonparametric estimates of the baseline and integrated baseline hazard functions. For the uncensored estimator, we apply the estimation methods described in Section 2.1 and Section 5.1. Specifically, the baseline hazard function was estimated by a linearly combined estimator $\hat{\lambda}_{n0}$ with an estimate of the optimal weight function $\alpha^*(t)$, and the estimate of the integrated baseline hazard function was obtained by $\Lambda_{n0}$ in (2.5). The kernel functions (6.1)-(6.3) were used as in Section 6. For both $\hat{\lambda}_{n0}$ and $\Lambda_{n0}$, the same bandwidths $h_{n1}$ and $h_{n2}$ were used: $h_{n1} = h_{n2} = c_0\, c\, n^{-2/7}$, where $c_0$ is a constant and $c$ is a constant factor that was computed by the method described in Section 4. Here, we vary $c_0$ over 0.75, 1, 1.25, and 1.5 in order to check the sensitivity to the bandwidths. When $c_0 = 1$, we obtained $h_{n1} = h_{n2} =$ […]. This corresponds to the rule-of-thumb bandwidths with undersmoothing. The bandwidth of $K_Z$ was set to 2. The following uniform weight functions were used: $w_t(u) = 1(8 \le u \le 156)/148$ and $w_{zj}(u) = 1(\underline{z}_j \le u \le \bar{z}_j)/(\bar{z}_j - \underline{z}_j)$ with $\underline{z}_j = \min_i Z_{nij}$ and $\bar{z}_j = \max_i Z_{nij}$ for $j = 1, 2$.

When all observations were used, nonparametric estimates were obtained by the method described in Section […] with the same bandwidths and weight functions. As with the uncensored estimator, the baseline hazard function was estimated by a linearly combined estimator $\hat{\lambda}_{n0}$ with an estimate of the optimal weight function. Nonparametric estimates without the heterogeneity were obtained by the methods described in Horowitz (1998, […] and 5.2.4). The bandwidth $h_n$ in estimation of $\lambda_0$ (equation (5.44) of Horowitz (1998, p. 162)) was set to $h_n = c_0\, c\, (2n)^{-2/7}$. This bandwidth gives roughly the same amount of smoothing as $h_{n1}$ and $h_{n2}$, because the effective sample size for the model without the heterogeneity is $2n$.

Figures 6 and 7 show nonparametric estimates of the baseline and integrated baseline hazard functions for the model with unobserved heterogeneity (solid lines) and the model without unobserved heterogeneity (dotted lines) using four different values of $c_0$. The figures also show 90% pointwise confidence intervals (dashed lines) for the estimates with the heterogeneity. The constant $c_0$ was set uniformly $c_0 = 1$

¹³ Typical demographic variables such as race, marital status, and residential area are not included because they seldom vary over spells. The worker's age at the start of the job is not included either because of an endogeneity problem: the change in age between two consecutive jobs is the duration of the first job. See Chamberlain (1985) for details.

¹⁴ Employment histories are recorded from January 1, 1978 in the NLSY work history data.

¹⁵ Government jobs, self-employed jobs, and jobs without pay are excluded.

¹⁶ We regard initial wages as erroneous if they are greater than 25 dollars in 1996 constant dollars.

¹⁷ There are two potential problems in applying the censored estimator in Section […]. One problem is that some workers were unemployed and/or out of the labor force between two jobs; the other is that when the first job is censored, the wage for the second job is missing. To deal with the first problem, the censoring variable $C$ was computed as $C$ = (the last interview date) − (the start date of the first job) − (time spent out of the labor force or unemployed). This is a valid modification if time spent out of the labor force or unemployed is independent of the durations of jobs given covariates. The second problem does not arise because the censored estimator uses only uncensored data.

¹⁸ The qualitative results reported here remained the same when we changed the values of the bandwidths and the ranges of the weight functions.

¹⁹ To deal with the heterogeneity, Topel and Ward (1992) treated the fixed effects as estimable parameters and estimated them by maximum likelihood.
