Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates

Jon A. Wellner¹ and Ying Zhang²

December 1, 2006

Abstract

We consider estimation in a particular semiparametric regression model for the mean of a counting process with panel count data. The basic model assumption is that the conditional mean function of the counting process is of the form $E\{N(t) \mid Z\} = \exp(\beta_0^T Z)\Lambda_0(t)$, where $Z$ is a vector of covariates and $\Lambda_0$ is the baseline mean function. The panel count observation scheme involves observation of the counting process $N$ for an individual at a random number $K$ of random time points; both the number and the locations of these time points may differ across individuals. We study semiparametric maximum pseudo-likelihood and maximum likelihood estimators of the unknown parameters $(\beta_0, \Lambda_0)$ derived on the basis of a nonhomogeneous Poisson process assumption. The pseudo-likelihood estimator is fairly easy to compute, while the maximum likelihood estimator poses more challenges from the computational perspective. We study asymptotic properties of both estimators assuming that the proportional mean model holds, but dropping the Poisson process assumption used to derive the estimators. In particular we establish asymptotic normality for the estimators of the regression parameter $\beta_0$ under appropriate hypotheses. The results show that our estimation procedures are robust in the sense that the estimators converge to the truth regardless of the underlying counting process.

¹ Research supported in part by National Science Foundation grant DMS, NIAID grant 2R01 AI, and an NWO Grant to the Vrije Universiteit Amsterdam.
² Corresponding author.

AMS 1980 subject classifications. Primary: 60F05, 60F17; secondary 60J65, 60J70.

Key words and phrases. Asymptotic distributions, asymptotic efficiency, asymptotic normality, consistency, counting process, empirical processes, information matrix, maximum likelihood, Poisson process, pseudo-likelihood estimators, monotone function.

1. Introduction

Suppose that $N = \{N(t) : t \ge 0\}$ is a univariate counting process. In many applications, it is important to estimate the expected number of events $E\{N(t) \mid Z\}$ which will occur by the time $t$, conditionally on a covariate vector $Z$. In this paper we consider the proportional mean regression model given by

$$\Lambda(t \mid Z) \equiv E\{N(t) \mid Z\} = e^{\beta_0^T Z}\Lambda_0(t), \qquad (1.1)$$

where $\Lambda_0$ is a monotone increasing baseline mean function. The parameters of primary interest are $\beta_0$ and $\Lambda_0$.

Suppose that we observe the counting process $N$ at a random number $K$ of random times $0 \equiv T_{K,0} < T_{K,1} < \cdots < T_{K,K}$. We write $\underline{T}_K \equiv (T_{K,1}, \ldots, T_{K,K})$, and we assume that $(K, \underline{T}_K \mid Z) \sim G(\cdot \mid Z)$ is conditionally independent of the counting process $N$ given the covariate vector $Z$. We further assume that $Z \sim H$ on $\mathbb{R}^d$, with some mild conditions on $H$ for the identifiability of our semiparametric regression model given in Section 3. The observation for each individual consists of

$$X = (Z, K, \underline{T}_K, N(T_{K,1}), \ldots, N(T_{K,K})) \equiv (Z, K, \underline{T}_K, \mathbb{N}_K). \qquad (1.2)$$

This type of data is referred to as panel count data. Throughout this manuscript, we will assume that we observe $n$ i.i.d. copies, $X_1, \ldots, X_n$, of $X$.

Panel count data arise in many fields including demographic studies, industrial reliability, and clinical trials; see for example Kalbfleisch and Lawless (1985), Gaver and O'Muircheartaigh (1987), Thall and Lachin (1988), Thall (1988), Sun and Kalbfleisch (1995), and Wellner and Zhang (2000), where the estimation of either the intensity of event recurrence or the mean function of a counting process with panel count data was studied. Many applications involve covariates whose effects on the underlying counting process are of interest. While there is considerable work on regression modeling for recurrent events based on continuous observations (see, for example, Lawless and Nadeau (1995), Cook, Lawless, and Nadeau (1996), and Lin, Wei, Yang, and Ying (2000)), regression analysis with panel count data for counting processes has started only recently. Sun and Wei (2000) and Hu, Sun and Wei (2003) proposed estimating equation methods, while Zhang (1998, 2002) proposed a pseudo-likelihood method for studying the proportional mean model (1.1) with panel count data.

To derive useful estimators for this model we will often assume, in addition to (1.1), that the counting process $N$, conditionally on $Z$, is a non-homogeneous Poisson process. But our general perspective will be to study the estimators and other procedures when the Poisson assumption may be violated and we assume only that the proportional mean assumption (1.1) holds. Such a program was carried out by Wellner and Zhang (2000) for estimation of $\Lambda_0$ without any covariates for this panel count observation model.

The outline of the rest of the paper is as follows. In Section 2, we describe two methods of estimation, namely maximum pseudo-likelihood and maximum likelihood estimators of $(\beta_0, \Lambda_0)$. The basic picture is that the pseudo-likelihood estimator is computationally relatively straightforward and easy to implement, while the maximum likelihood estimators are considerably more difficult, requiring an iterative algorithm in the computation of the profile likelihood. In Section 3, we state the main asymptotic results: strong consistency and rate of convergence for the maximum pseudo-likelihood and maximum likelihood estimators $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ and $(\hat\beta_n, \hat\Lambda_n)$ of $(\beta_0, \Lambda_0)$, and asymptotic normality of $\hat\beta_n^{ps}$ and $\hat\beta_n$, assuming only the proportional mean structure (1.1), but not assuming that $N$ is a Poisson process. These results are proved in Section 5 by use of tools from empirical process theory. Although pseudo-likelihood methods have been studied in the context of parametric models by Lindsay (1988) and Cox and Reid (2004), not much seems to be known about their behavior in non- and semi-parametric settings such as the one studied here, even assuming that the base model holds. In Section 4, we present the results of simulation studies to demonstrate the robustness of the methods and compare the relative efficiency of the two methods. An application of our methods to a bladder tumor study is presented in this section as well.
A general theorem concerning asymptotic normality of semiparametric M-estimators, and a technical lemma upon which the proofs of our main theorems rely, are stated and proved in Sections 6 and 7, respectively.

2. Two Likelihood-Based Semiparametric Estimation Methods

Maximum Pseudo-likelihood Estimation: To derive our estimators we assume that conditionally on $Z$, $N$ is a non-homogeneous Poisson process with mean function given by (1.1). The pseudo-likelihood method for this model uses the marginal distributions of $N$, conditional on $Z$,

$$P(N(t) = k \mid Z) = \frac{\Lambda(t \mid Z)^k}{k!} \exp(-\Lambda(t \mid Z)),$$

and ignores dependence between $N(t_1)$ and $N(t_2)$ to obtain the log-pseudo-likelihood:

$$l_n^{ps}(\beta, \Lambda) = \sum_{i=1}^n \sum_{j=1}^{K_i} \left\{ N^{(i)}(T^{(i)}_{K_i,j}) \log \Lambda(T^{(i)}_{K_i,j}) + N^{(i)}(T^{(i)}_{K_i,j})\, \beta^T Z_i - e^{\beta^T Z_i} \Lambda(T^{(i)}_{K_i,j}) \right\}.$$

Let $\mathcal{R} \subset \mathbb{R}^d$ be a bounded and convex set, and let $\mathcal{F}$ be the class of functions

$$\mathcal{F} \equiv \{\Lambda : [0, \infty) \to [0, \infty) \mid \Lambda \text{ is monotone nondecreasing}, \ \Lambda(0) = 0\}. \qquad (2.1)$$

Then the maximum pseudo-likelihood estimator $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ of $(\beta_0, \Lambda_0)$ is given by $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}) \equiv \arg\max_{(\beta,\Lambda) \in \mathcal{R} \times \mathcal{F}} l_n^{ps}(\beta, \Lambda)$. This can be implemented in two steps via the usual (pseudo-)profile likelihood. For each fixed value of $\beta$ we set

$$\hat\Lambda_n^{ps}(\cdot, \beta) \equiv \arg\max_{\Lambda \in \mathcal{F}} l_n^{ps}(\beta, \Lambda), \qquad (2.2)$$

and define $l_n^{ps,profile}(\beta) \equiv l_n^{ps}(\beta, \hat\Lambda_n^{ps}(\cdot, \beta))$. Then $\hat\beta_n^{ps} = \arg\max_{\beta \in \mathcal{R}} l_n^{ps,profile}(\beta)$ and $\hat\Lambda_n^{ps} = \hat\Lambda_n^{ps}(\cdot, \hat\beta_n^{ps})$. Note that $l_n^{ps}(\beta, \Lambda)$ depends on $\Lambda$ only at the observation time points. By convention, we define our estimator $\hat\Lambda_n^{ps}$ to be the one that has jumps only at the observation time points to ensure uniqueness. The optimization problem in (2.2) is easily solved and the details of the solution can be found in Zhang (2002).

Maximum Likelihood Estimation: Under the assumption that conditionally on $Z$, $N$ is a non-homogeneous Poisson process, the likelihood can be calculated using the (conditional) independence of the increments of $N$, $N(s, t] \equiv N(t) - N(s)$, and the Poisson distribution of these increments:

$$P(N(s, t] = k \mid Z) = \frac{[\Lambda((s, t] \mid Z)]^k}{k!} \exp(-\Lambda((s, t] \mid Z)),$$

to obtain the log-likelihood:

$$l_n(\beta, \Lambda) = \sum_{i=1}^n \sum_{j=1}^{K_i} \left\{ \Delta N^{(i)}_{K_i j} \log \Delta\Lambda_{K_i j} + \Delta N^{(i)}_{K_i j}\, \beta^T Z_i - e^{\beta^T Z_i} \Delta\Lambda_{K_i j} \right\},$$

where

$$\Delta N_{Kj} \equiv N(T_{K,j}) - N(T_{K,j-1}), \quad j = 1, \ldots, K,$$
$$\Delta\Lambda_{Kj} \equiv \Lambda(T_{K,j}) - \Lambda(T_{K,j-1}), \quad j = 1, \ldots, K.$$
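The profile step (2.2) can be illustrated with a short sketch. It assumes that the explicit solution cited from Zhang (2002) is the weighted isotonic regression of the ratio (total count at a time point) / (total of $e^{\beta^T Z_i}$ at that time point), which is the standard maximizer of a Poisson-type criterion under a monotonicity constraint. The function names `profile_pseudo_lambda` and `log_pseudo_likelihood` and the list-based data layout are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from math import exp, log


def profile_pseudo_lambda(times, counts, lin_pred):
    """Maximise the log pseudo-likelihood over monotone Lambda for fixed beta.

    times[i][j]  : j-th observation time of subject i
    counts[i][j] : cumulative count N_i(T_ij)
    lin_pred[i]  : beta^T Z_i for the fixed beta
    Returns the sorted distinct times and the fitted values of Lambda there.
    """
    total_count = defaultdict(float)   # sum of counts observed at each time
    total_weight = defaultdict(float)  # sum of exp(beta^T Z_i) at each time
    for t_i, n_i, eta_i in zip(times, counts, lin_pred):
        for t, n in zip(t_i, n_i):
            total_count[t] += n
            total_weight[t] += exp(eta_i)
    grid = sorted(total_count)
    # Pool adjacent violators: each block stores [count sum, weight sum, size].
    blocks = []
    for t in grid:
        blocks.append([total_count[t], total_weight[t], 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            c, w, m = blocks.pop()
            blocks[-1][0] += c
            blocks[-1][1] += w
            blocks[-1][2] += m
    fitted = []
    for c, w, m in blocks:
        fitted.extend([c / w] * m)
    return grid, fitted


def log_pseudo_likelihood(times, counts, lin_pred, grid, fitted):
    """Evaluate l_n^ps(beta, Lambda), using the convention 0 * log(0) = 0."""
    lam = dict(zip(grid, fitted))
    value = 0.0
    for t_i, n_i, eta_i in zip(times, counts, lin_pred):
        for t, n in zip(t_i, n_i):
            if n > 0:
                value += n * log(lam[t]) + n * eta_i
            value -= exp(eta_i) * lam[t]
    return value
```

Because the fitted function is available in closed form for each $\beta$, the outer maximization over $\beta$ only needs a generic optimizer applied to the profile criterion.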

Then $(\hat\beta_n, \hat\Lambda_n) \equiv \arg\max_{(\beta,\Lambda) \in \mathcal{R} \times \mathcal{F}} l_n(\beta, \Lambda)$. This maximization can also be carried out in two steps via profile likelihood. For each fixed value of $\beta$ we set $\hat\Lambda_n(\cdot, \beta) \equiv \arg\max_{\Lambda \in \mathcal{F}} l_n(\beta, \Lambda)$, and define $l_n^{profile}(\beta) \equiv l_n(\beta, \hat\Lambda_n(\cdot, \beta))$. Then $\hat\beta_n = \arg\max_{\beta \in \mathcal{R}} l_n^{profile}(\beta)$ and $\hat\Lambda_n = \hat\Lambda_n(\cdot, \hat\beta_n)$. Similarly, the estimator $\hat\Lambda_n$ is defined to have jumps only at the observation time points.

To compute the estimate $(\hat\beta_n, \hat\Lambda_n)$, we adopt a doubly iterative algorithm to update the estimates alternately. The algorithm consists of the following steps:

S1. Choose the initial $\beta^{(0)} = \hat\beta_n^{ps}$, the maximum pseudo-likelihood estimator.

S2. For given $\beta^{(p)}$ ($p = 0, 1, 2, \ldots$), the updated estimate $\Lambda^{(p)}$ of $\Lambda_0$ is computed by the modified iterative convex minorant algorithm proposed by Jongbloed (1998) on the likelihood $l_n(\beta^{(p)}, \Lambda)$. Initialize this algorithm using $\Lambda^{(p-1)}$ and stop the iteration when
$$\left| \frac{l_n(\beta^{(p)}, \Lambda^{new}) - l_n(\beta^{(p)}, \Lambda^{current})}{l_n(\beta^{(p)}, \Lambda^{current})} \right| \le \eta.$$
In the very first step, we choose the starting value of $\Lambda$ by interpolating $\hat\Lambda_n^{ps}$ linearly between two adjacent jump points to make it strictly increasing, so that the likelihood $l_n(\beta, \Lambda)$ is well defined.

S3. For given $\Lambda^{(p)}$, the updated estimate $\beta^{(p+1)}$ of $\beta$ is obtained by optimizing $l_n(\beta, \Lambda^{(p)})$ using the Newton-Raphson method. Initialize the algorithm using $\beta^{(p)}$ and stop the iteration when $\|\beta^{new} - \beta^{current}\| \le \eta$.

S4. Repeat Steps 2 and 3 until the following convergence criterion is satisfied:
$$\left| \frac{l_n(\beta^{(p+1)}, \Lambda^{(p+1)}) - l_n(\beta^{(p)}, \Lambda^{(p)})}{l_n(\beta^{(p)}, \Lambda^{(p)})} \right| \le \eta.$$

As in the case of pseudo-likelihood studied in Zhang (2002), it is easy to verify that for any given monotone nondecreasing function $\Lambda$, the likelihood $l_n(\beta, \Lambda)$ is a concave function of the regression parameter $\beta$ with a negative definite Hessian matrix. Using this fact, we can easily show that the iteration process increases the value of the likelihood, i.e. $l_n(\beta^{(p+1)}, \Lambda^{(p+1)}) - l_n(\beta^{(p)}, \Lambda^{(p)}) \ge 0$ for $p = 0, 1, \ldots$.
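Step S3 can be sketched as follows. For fixed increments of $\Lambda$, the score of the log-likelihood in $\beta$ is $\sum_i Z_i(\sum_j \Delta N_{ij} - e^{\beta^T Z_i}\sum_j \Delta\Lambda_{ij})$ and minus the Hessian is $\sum_i e^{\beta^T Z_i}(\sum_j \Delta\Lambda_{ij}) Z_i Z_i^T$, so a Newton-Raphson update only needs the per-subject totals. The names `solve_linear` and `newton_beta`, the data layout, and the default tolerance are assumptions made for this illustration, not the authors' R implementation.

```python
from math import exp, log


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def newton_beta(z, d_counts, d_lambda, beta, eta=1e-8, max_iter=50):
    """Step S3: maximise l_n(beta, Lambda) over beta for fixed increments.

    z[i]        : covariate vector of subject i
    d_counts[i] : increments Delta N of subject i between visits
    d_lambda[i] : matching increments Delta Lambda of the fixed Lambda
    eta         : tolerance on the norm of the update (assumed value)
    """
    d = len(beta)
    beta = list(beta)
    for _ in range(max_iter):
        score = [0.0] * d
        info = [[0.0] * d for _ in range(d)]  # minus the Hessian
        for z_i, dn_i, dl_i in zip(z, d_counts, d_lambda):
            # Only the per-subject totals enter the score and the Hessian.
            mean_i = exp(sum(b * x for b, x in zip(beta, z_i))) * sum(dl_i)
            resid_i = sum(dn_i) - mean_i
            for r in range(d):
                score[r] += resid_i * z_i[r]
                for c in range(d):
                    info[r][c] += mean_i * z_i[r] * z_i[c]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if sum(s * s for s in step) ** 0.5 <= eta:
            break
    return beta
```

With a single constant covariate the maximizer has the closed form $\log(\sum \Delta N / \sum \Delta\Lambda)$, which gives a convenient check of the iteration.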
The iterative algorithms proposed via the profile pseudo-likelihood or the profile likelihood approach converge very well, and the convergence does not seem to be affected by the starting point in our simulation experiments described in Section 4. However, this algorithm is not efficient, especially for the maximum likelihood estimation method. It generally needs a considerable number of iterations to achieve the convergence criterion stated in S4. Meanwhile, computing the profile estimator $\hat\Lambda_n$ given in S2 involves the modified iterative convex minorant algorithm, which also needs a large number of iterations to converge under the criterion stated in S2. Our simulation experiment with sample size $n = 100$ shows that computing the maximum likelihood estimator with the convergence tolerance $\eta$ needs about 1800 minutes to converge on a PC (Intel Xeon CPU, 2.80 GHz) with the algorithm coded in R. Compared to the profile likelihood algorithm, the profile pseudo-likelihood algorithm is computationally less demanding, since the profile pseudo-likelihood estimator $\hat\Lambda_n^{ps}$ has an explicit solution, as shown in Zhang (2002), and hence does not involve any iteration. As a result, computing the maximum pseudo-likelihood estimator is much faster than computing the maximum likelihood estimator.

3. Asymptotic theory: Results

In this section, we study the properties of the estimators $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ and $(\hat\beta_n, \hat\Lambda_n)$. We establish strong consistency and derive the rate of convergence of both estimators in some $L_2$-metrics related to the observation scheme. We also establish the asymptotic normality of both $\hat\beta_n^{ps}$ and $\hat\beta_n$ under some mild conditions.

First we give some notation. Let $\mathcal{B}_d$ and $\mathcal{B}$ denote the collection of Borel sets in $\mathbb{R}^d$ and $\mathbb{R}$, respectively, and let $\mathcal{B}_{[0,\tau]} = \{B \cap [0, \tau] : B \in \mathcal{B}\}$ and $\mathcal{B}^2[0, \tau] = \mathcal{B}_{[0,\tau]} \times \mathcal{B}_{[0,\tau]}$. On $([0, \tau], \mathcal{B}_{[0,\tau]})$ we define measures $\mu_1, \mu_2, \nu_1, \nu_2$, and $\gamma$ as follows: for $B, B_1, B_2 \in \mathcal{B}_{[0,\tau]}$ and $C \in \mathcal{B}_d$, set

$$\nu_1(B \times C) = \int_C \sum_{k=1}^\infty P(K = k \mid Z = z) \sum_{j=1}^k P(T_{k,j} \in B \mid K = k, Z = z)\, dH(z),$$
$$\nu_2(B_1 \times B_2 \times C) = \int_C \sum_{k=1}^\infty P(K = k \mid Z = z) \sum_{j=1}^k P(T_{k,j-1} \in B_1, T_{k,j} \in B_2 \mid K = k, Z = z)\, dH(z),$$
$$\gamma(B) = \int_{\mathbb{R}^d} \sum_{k=1}^\infty P(K = k \mid Z = z)\, P(T_{k,k} \in B \mid K = k, Z = z)\, dH(z).$$

We also define the $L_2$-metrics $d_1(\theta_1, \theta_2)$ and $d_2(\theta_1, \theta_2)$ in the parameter space $\Theta = \mathcal{R} \times \mathcal{F}$ as

$$d_1(\theta_1, \theta_2) = \left\{ |\beta_1 - \beta_2|^2 + \|\Lambda_1 - \Lambda_2\|^2_{L_2(\mu_1)} \right\}^{1/2},$$
$$d_2(\theta_1, \theta_2) = \left\{ |\beta_1 - \beta_2|^2 + \|\Lambda_1 - \Lambda_2\|^2_{L_2(\mu_2)} \right\}^{1/2},$$

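where $\mu_1(B) = \nu_1(B \times \mathbb{R}^d)$ and $\mu_2(B_1 \times B_2) = \nu_2(B_1 \times B_2 \times \mathbb{R}^d)$.

To establish consistency, we assume that:

C1. The true parameter $\theta_0 = (\beta_0, \Lambda_0) \in \mathcal{R}^\circ \times \mathcal{F}$, where $\mathcal{R}^\circ$ is the interior of $\mathcal{R}$.

C2. For all $j = 1, \ldots, K$, $K = 1, 2, \ldots$, the observation times $T_{K,j}$ are random variables taking values in the bounded interval $[0, \tau]$ for some $\tau \in (0, \infty)$. The measure $\mu_l \times H$ on $([0, \tau]^l \times \mathbb{R}^d, \mathcal{B}^l[0, \tau] \times \mathcal{B}_d)$ is absolutely continuous with respect to $\nu_l$ for $l = 1, 2$, and $E(K) < \infty$.

C3. The true baseline mean function $\Lambda_0$ satisfies $\Lambda_0(\tau) \le M$ for some $M \in (0, \infty)$.

C4. The function $M_0^{ps}$ defined by $M_0^{ps}(X) \equiv \sum_{j=1}^K N_{Kj} \log(N_{Kj})$ satisfies $P M_0^{ps}(X) < \infty$.

C5. The function $M_0$ defined by $M_0(X) \equiv \sum_{j=1}^K \Delta N_{Kj} \log(\Delta N_{Kj})$ satisfies $P M_0(X) < \infty$.

C6. $\mathcal{Z} \equiv \mathrm{supp}(H)$, the support of $H$, is a bounded set in $\mathbb{R}^d$. (Thus there exists $z_0 > 0$ such that $P(|Z| \le z_0) = 1$.)

C7. For all $a \in \mathbb{R}^d$, $a \ne 0$, and $c \in \mathbb{R}$, $P(a^T Z \ne c) > 0$.

Condition C7 is needed together with $\mu_l \times H \ll \nu_l$ from C2 to establish identifiability of the semiparametric model.

Theorem 3.1. Suppose that Conditions C1-C7 hold and the conditional mean structure of the counting process $N$ is given by (1.1). Then for every $b < \tau$ for which $\mu_1([b, \tau]) > 0$,
$$d_1\left( (\hat\beta_n^{ps}, \hat\Lambda_n^{ps} 1_{[0,b]}), (\beta_0, \Lambda_0 1_{[0,b]}) \right) \to 0 \quad \text{a.s. as } n \to \infty.$$
In particular, if $\mu_1(\{\tau\}) > 0$, then
$$d_1\left( (\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0) \right) \to 0 \quad \text{a.s. as } n \to \infty.$$
Moreover, for every $b < \tau$ for which $\gamma([b, \tau]) > 0$,
$$d_2\left( (\hat\beta_n, \hat\Lambda_n 1_{[0,b]}), (\beta_0, \Lambda_0 1_{[0,b]}) \right) \to 0 \quad \text{a.s. as } n \to \infty.$$
In particular, if $\gamma(\{\tau\}) > 0$, then
$$d_2\left( (\hat\beta_n, \hat\Lambda_n), (\beta_0, \Lambda_0) \right) \to 0 \quad \text{a.s. as } n \to \infty.$$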

Remark 3.1. Some condition along the lines of the absolute continuity part of C2 is needed. For example, suppose that $\Lambda_0(t) = t^2$, $\beta_0 = 0$, $\Lambda(t) = t$, and $\beta = 1$. Then if we observe at just one time point $T$ (so $K = 1$ with probability 1), and $T = e^Z$ with probability 1, then $\Lambda_0(T) e^{\beta_0 Z} = \Lambda(T) e^{\beta Z}$ almost surely and the model is not identifiable. C2 holds, in particular, if $(K, \underline{T}_K)$ is independent of $Z$. The condition on the measure $\mu_2 \times H$ in C2 and condition C5 are not needed for proving consistency of $\hat\theta_n^{ps} = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$, while the condition on the measure $\mu_1 \times H$ in C2 and condition C4 are not needed for proving consistency of $\hat\theta_n = (\hat\beta_n, \hat\Lambda_n)$.

To derive the rate of convergence, we also assume that:

C8. For some interval $O[T] = [\sigma, \tau]$ with $\sigma > 0$ and $\Lambda_0(\sigma) > 0$, $P(\cap_{j=1}^K \{T_{K,j} \in [\sigma, \tau]\}) = 1$.

C9. $P(K \le k_0) = 1$ for some $k_0 < \infty$.

C10. For some $v_0 \in (0, \infty)$ the function $Z \mapsto E(e^{v_0 N(\tau)} \mid Z)$ is uniformly bounded for $Z \in \mathcal{Z}$.

C11. The observation time points are $s_0$-separated: i.e. there exists a constant $s_0 > 0$ such that $P(T_{K,j} - T_{K,j-1} \ge s_0 \text{ for all } j = 1, \ldots, K) = 1$. Furthermore, $\mu_1$ is absolutely continuous with respect to Lebesgue measure $\lambda$ with a derivative $\dot\mu_1$ satisfying $\dot\mu_1(t) \ge c_0 > 0$ for some positive constant $c_0$.

C12. The true baseline mean function $\Lambda_0$ is differentiable and its derivative has positive and finite lower and upper bounds in the observation interval, i.e. there exists a constant $0 < f_0 < \infty$ such that $1/f_0 \le \Lambda_0'(t) \le f_0 < \infty$ for $t \in O[T]$.

C13. For some $\eta \in (0, 1)$, $a^T \mathrm{Var}(Z \mid U) a \ge \eta\, a^T E(Z Z^T \mid U) a$ a.s. for all $a \in \mathbb{R}^d$, where $(U, Z)$ has distribution $\nu_1 / \nu_1(\mathbb{R}^+ \times \mathcal{Z})$.

C14. For some $\eta \in (0, 1)$, $a^T \mathrm{Var}(Z \mid U, V) a \ge \eta\, a^T E(Z Z^T \mid U, V) a$ a.s. for all $a \in \mathbb{R}^d$, where $(U, V, Z)$ has distribution $\nu_2 / \nu_2(\mathbb{R}^{+2} \times \mathcal{Z})$.

Theorem 3.2. In addition to the conditions required for consistency, suppose C8, C9, C10, and C13 hold with the constant $v_0$ in C10 satisfying $v_0 \ge 4 k_0 (1 + \delta_0)^2$ with $\delta_0 = c_0 \Lambda_0^3(\sigma) / (24 \cdot 8 f_0)$, and $\mu_1(\{\tau\}) > 0$. Then
$$n^{1/3}\, d_1\left( (\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0) \right) = O_p(1).$$

Moreover, if conditions C11, C12, and C14 hold along with the conditions listed above, but with the constant $v_0$ in C10 satisfying $v_0 \ge 4 k_0 (1 + \delta_0)^2$ with $\delta_0 = c_0 s_0^3 / (48 \cdot 8^2 f_0^4)$, and $\gamma(\{\tau\}) > 0$, it follows that
$$n^{1/3}\, d_2\left( (\hat\beta_n, \hat\Lambda_n), (\beta_0, \Lambda_0) \right) = O_p(1).$$

Remark 3.2. Conditions C8, C9, C10, C11 and C12 are sufficient for validity of Theorem 3.2, but they are probably not necessary. Conditions C9 and C10 are mainly used in deriving the rate of convergence when the counting process $N$ is allowed to be general (but satisfying the mean model (1.1)). C8 says that all the observations should fall in a fixed interval in which the mean function is bounded away from zero, and C9 indicates that the number of observations is bounded. These conditions are generally true in clinical applications. Condition C10 holds for all $v_0 > 0$ if the counting process is uniformly bounded (which can be justified in many applications) or forms a Poisson process, conditionally on covariates. The first part of C11 requires that two adjacent observation times should be at least $s_0$ apart, an assumption which is very reasonable in practice. The second part of C11 implies that the total observation measure $\mu_1$ has a strictly positive intensity (or density). C12 requires that the true baseline mean function should be absolutely continuous with bounded intensity function. While C12 is a reasonable assumption in practice, it may be stronger than necessary. We assume C12 mainly for technical convenience in our proofs.

Remark 3.3. The metrics $d_1$ and $d_2$ are closely related. Since
$$\sum_{j=1}^k (a_j - b_j)^2 \le k^2 \sum_{j=1}^k \{(a_j - a_{j-1}) - (b_j - b_{j-1})\}^2$$
(see Wellner and Zhang (1998) for a proof), the two metrics are equivalent under C9, and therefore the consistency and rate of convergence results for the maximum likelihood estimator $(\hat\beta_n, \hat\Lambda_n)$ hold under the metric $d_1$ as well.

Remark 3.4. Condition C13 can be justified in many applications. By the Markov inequality, it is easy to see that condition C7 implies that $E(Z Z^T)$ is a positive-definite matrix. Let $E_1$ and $\mathrm{Var}_1$ denote expectations and variances under the probability measure $\nu_1 / \nu_1(\mathbb{R}^+ \times \mathcal{Z})$. If we assume that $\mathrm{Var}_1(Z \mid U)$ is a positive-definite matrix, and we set $\lambda_1 = \max\{\mathrm{eigenvalue}(E_1(Z Z^T \mid U))\}$ and $\lambda_d = \min\{\mathrm{eigenvalue}(\mathrm{Var}_1(Z \mid U))\}$, then $0 < \lambda_d \le \lambda_1$. Therefore, for any $a \in \mathbb{R}^d$,
$$a^T \mathrm{Var}_1(Z \mid U) a \ge a^T \lambda_d a = \frac{\lambda_d}{\lambda_1} a^T \lambda_1 a \ge \frac{\lambda_d}{\lambda_1} a^T E_1(Z Z^T \mid U) a.$$

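Thus, condition C13 holds by taking $\eta \le \lambda_d / \lambda_1$. Note that both $\lambda_1$ and $\lambda_d$ depend on $U$ in general, and the argument here works assuming that this ratio has a positive lower bound uniformly in $U$. We can justify C14 similarly.

Although the overall convergence rate for both the maximum pseudo-likelihood and maximum likelihood estimators is of the order $n^{1/3}$, the rate of convergence for the estimators of the regression parameter, as usual, may still be $n^{1/2}$. Similar to the results of Huang (1996) for the Cox model with current status data, we can establish asymptotic normality of both $\hat\beta_n^{ps}$ and $\hat\beta_n$.

Theorem 3.3. Under the same conditions assumed in Theorem 3.2, the estimators $\hat\beta_n$ and $\hat\beta_n^{ps}$ are asymptotically normal:
$$\sqrt{n}(\hat\beta_n - \beta_0) \to_d Z \sim N_d\left(0, A^{-1} B (A^{-1})^T\right), \qquad (3.1)$$
$$\sqrt{n}(\hat\beta_n^{ps} - \beta_0) \to_d Z^{ps} \sim N_d\left(0, (A^{ps})^{-1} B^{ps} ((A^{ps})^{-1})^T\right), \qquad (3.2)$$
where
$$B = E\left\{ \sum_{j,j'=1}^K C_{j,j'}(Z) \left[Z - R(K, T_{K,j-1}, T_{K,j})\right] \left[Z - R(K, T_{K,j'-1}, T_{K,j'})\right]^T \right\},$$
$$A = E\left\{ \sum_{j=1}^K \Delta\Lambda_{0Kj}\, e^{\beta_0^T Z} \left[Z - R(K, T_{K,j-1}, T_{K,j})\right]^{\otimes 2} \right\},$$
$$B^{ps} = E\left\{ \sum_{j,j'=1}^K C^{ps}_{j,j'}(Z) \left[Z - R^{ps}(K, T_{K,j})\right] \left[Z - R^{ps}(K, T_{K,j'})\right]^T \right\},$$
$$A^{ps} = E\left\{ \sum_{j=1}^K \Lambda_{0Kj}\, e^{\beta_0^T Z} \left[Z - R^{ps}(K, T_{K,j})\right]^{\otimes 2} \right\},$$
in which
$$R(K, T_{K,j-1}, T_{K,j}) \equiv E\left(Z e^{\beta_0^T Z} \mid K, T_{K,j-1}, T_{K,j}\right) / E\left(e^{\beta_0^T Z} \mid K, T_{K,j-1}, T_{K,j}\right),$$
$$R^{ps}(K, T_{K,j}) \equiv E\left(Z e^{\beta_0^T Z} \mid K, T_{K,j}\right) / E\left(e^{\beta_0^T Z} \mid K, T_{K,j}\right),$$
$$C_{j,j'}(Z) = \mathrm{Cov}\left[\Delta N_{Kj}, \Delta N_{Kj'} \mid Z, K, \underline{T}_K\right],$$
$$C^{ps}_{j,j'}(Z) = \mathrm{Cov}\left[N(T_{K,j}), N(T_{K,j'}) \mid Z, K, T_{K,j}, T_{K,j'}\right],$$
$\Lambda_{0Kj} = \Lambda_0(T_{K,j})$, and $\Delta\Lambda_{0Kj} = \Lambda_0(T_{K,j}) - \Lambda_0(T_{K,j-1})$.

If the counting process is, conditionally given $Z$, a non-homogeneous Poisson process with conditional mean function given as specified, then $C_{j,j'}(Z) = \Delta\Lambda_{0Kj}\, e^{\beta_0^T Z}\, 1\{j = j'\}$. It follows that $B = A = I(\beta_0)$, the information matrix computed in Wellner, Zhang, and Liu (2004), and hence $A^{-1} B (A^{-1})^T = I^{-1}(\beta_0)$.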
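The step from the Poisson covariance to $B = A$ can be written out. The display below is a sketch that substitutes the Poisson covariance into the cross-product form of $B$; the shorthand $R_j \equiv R(K, T_{K,j-1}, T_{K,j})$ is introduced here only for brevity.

$$B = E\left\{ \sum_{j,j'=1}^K \Delta\Lambda_{0Kj}\, e^{\beta_0^T Z}\, 1\{j = j'\}\, [Z - R_j][Z - R_{j'}]^T \right\} = E\left\{ \sum_{j=1}^K \Delta\Lambda_{0Kj}\, e^{\beta_0^T Z}\, [Z - R_j]^{\otimes 2} \right\} = A.$$

Since $A$ is symmetric, the sandwich then collapses: $A^{-1} B (A^{-1})^T = A^{-1} A A^{-1} = A^{-1}$.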

This implies that the estimator $\hat\beta_n$ under the conditional Poisson process is asymptotically efficient. However, since $C^{ps}_{j,j'}(Z) = e^{\beta_0^T Z} \Lambda_{0K(j \wedge j')}$, we have $B^{ps} \ne A^{ps}$. This shows that the semiparametric maximum pseudo-likelihood estimator $\hat\beta_n^{ps}$ will not be asymptotically efficient under the Poisson assumption.

There is, however, a natural Poisson regression model for which the maximum pseudo-likelihood estimator is asymptotically efficient: if we simply assume that the conditional distribution of $(N(T_{K,1}), \ldots, N(T_{K,K}))$ given $(K, T_{K,1}, \ldots, T_{K,K}, Z)$ is that of a vector of independent Poisson random variables with means given by $\Lambda(T_{K,j} \mid Z) = \exp(\beta_0^T Z) \Lambda_0(T_{K,j})$ for $j = 1, \ldots, K$, then
$$C^{ps}_{j,j'}(Z) = \mathrm{Cov}\left[N(T_{K,j}), N(T_{K,j'}) \mid Z, K, T_{K,j}, T_{K,j'}\right] = \Lambda(T_{K,j} \mid Z)\, 1\{j = j'\}.$$
Hence $B^{ps} = A^{ps} = I^{PoissRegr}(\beta_0)$ and $\hat\beta_n^{ps}$ is asymptotically efficient for this alternative model. In practice, this occurs when $(N(T_{K,1}), N(T_{K,2}), \ldots, N(T_{K,K}))$ consist of cluster Poisson count data in which the counts within a cluster are independent.

4. Numerical Results

4.1. Simulation Studies

We generated data using the same schemes as those given in Zhang (2002). Monte-Carlo bias, standard deviation, and mean squared error of the maximum pseudo-likelihood and maximum likelihood estimates are then compared.

Scenario 1. In this scenario, the data is $\{(Z_i, K_i, \underline{T}^{(i)}_{K_i}, \mathbb{N}^{(i)}_{K_i}) : i = 1, 2, \ldots, n\}$ with $Z_i = (Z_{i,1}, Z_{i,2}, Z_{i,3})$ where, conditionally on $(Z_i, K_i, \underline{T}^{(i)}_{K_i})$, the counts $\mathbb{N}^{(i)}_{K_i}$ were generated from a Poisson process. For each subject, we generate data by the following scheme: $Z_{i,1} \sim \mathrm{Unif}(0, 1)$, $Z_{i,2} \sim N(0, 1)$, $Z_{i,3} \sim \mathrm{Bernoulli}(0.5)$; $K_i$ is sampled randomly from the discrete set $\{1, 2, 3, 4, 5, 6\}$; given $K_i$, $\underline{T}^{(i)}_{K_i} = (T^{(i)}_{K_i,1}, T^{(i)}_{K_i,2}, \ldots, T^{(i)}_{K_i,K_i})$ are the order statistics of $K_i$ random observations generated from $\mathrm{Unif}(1, 10)$ and rounded to the second decimal place, so that observation times may be tied. The panel counts $\mathbb{N}^{(i)}_{K_i} = (N^{(i)}(T^{(i)}_{K_i,1}), N^{(i)}(T^{(i)}_{K_i,2}), \ldots, N^{(i)}(T^{(i)}_{K_i,K_i}))$ are generated from the Poisson process with the conditional mean function given by $\Lambda(t \mid Z_i) = 2t \exp(\beta_0^T Z_i)$, i.e.
$$N^{(i)}(T^{(i)}_{K_i,j}) - N^{(i)}(T^{(i)}_{K_i,j-1}) \sim \mathrm{Poisson}\left\{2\left(T^{(i)}_{K_i,j} - T^{(i)}_{K_i,j-1}\right) \exp(\beta_0^T Z_i)\right\}.$$
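The data-generating scheme of Scenario 1 can be sketched for one subject as follows. The sketch assumes that $K_i$ is uniform on $\{1, \ldots, 6\}$ (the text says only "sampled randomly") and takes the regression vector as an argument; `poisson_draw` and `simulate_subject` are hypothetical helper names, not the authors' R code.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals."""
    count, clock = 0, rng.expovariate(1.0)
    while clock < mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def simulate_subject(beta, rng):
    """Return (covariates, observation times, cumulative counts)."""
    z = (rng.uniform(0.0, 1.0), rng.gauss(0.0, 1.0), float(rng.random() < 0.5))
    k = rng.randint(1, 6)  # assumed uniform on {1, ..., 6}
    # Order statistics of K_i uniforms, rounded so that ties are possible.
    times = sorted(round(rng.uniform(1.0, 10.0), 2) for _ in range(k))
    rate = 2.0 * math.exp(sum(b * x for b, x in zip(beta, z)))
    counts, previous, total = [], 0.0, 0
    for t in times:
        # Independent Poisson increment with mean 2 (t - previous) exp(beta^T z).
        total += poisson_draw(rate * (t - previous), rng)
        counts.append(total)
        previous = t
    return z, times, counts
```

A sample of size $n$ is then obtained by calling `simulate_subject` $n$ times with a common `random.Random` instance, which also makes the Monte-Carlo runs reproducible.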

Here $\beta_0 = (\beta_1, \beta_2, \beta_3)^T = (1.0, 0.5, 1.5)^T$. For this scenario, we can directly calculate the asymptotic covariance matrices given in Theorem 3.3, $\Sigma^{ps} = (A^{ps})^{-1} B^{ps} ((A^{ps})^{-1})^T = (1582/17787) W^{-1}$ and $\Sigma = A^{-1} B (A^{-1})^T = A^{-1} = (1260/19179) W^{-1}$, respectively, where $W = E\{e^{\beta_0^T Z} [Z - E(Z e^{\beta_0^T Z}) / E(e^{\beta_0^T Z})]^{\otimes 2}\}$. Since it is difficult to evaluate the matrix $W$ analytically, we calculated it numerically using Mathematica (Wolfram (1996)) to obtain approximate numerical values of the asymptotic covariance matrices $\Sigma^{ps}$ (4.1) and $\Sigma$ (4.2).

We conducted simulation studies with sample sizes of $n = 50$ and $n = 100$, respectively. For each case, the Monte-Carlo sample bias, standard deviation and mean squared error for the semiparametric estimators of the regression parameters are reported in Table 1. We also include the asymptotic standard errors obtained from (4.1) and (4.2) in Table 1 to compare with the Monte-Carlo sample standard deviations. The results show that the sample bias for both estimators is small; the standard deviation and mean squared error are smaller for the maximum likelihood method than for the pseudo-likelihood method, and they decrease as $n^{-1/2}$ and $n^{-1}$, respectively, as the sample size increases. Moreover, the standard errors of estimates based on asymptotic theory are all close to the corresponding standard deviations based on the Monte-Carlo simulations. All of these provide numerical support for our asymptotic results in Theorem 3.3. Based on the results of 1000 Monte-Carlo samples, we plot the pointwise means, 2.5- and 97.5-percentiles of both estimators of the baseline mean function $\Lambda_0(t) = 2t$ in Figure 1. It shows that both estimators have negligible bias and that the maximum likelihood estimator has smaller variability than the maximum pseudo-likelihood estimator. When the sample size increases, the variability of both estimators decreases accordingly.

Table 1: Results of the Monte-Carlo simulation studies for the regression parameter estimates based on 1000 repeated samples for data generated from the conditional Poisson process. Columns: pseudo-likelihood and likelihood estimates for $n = 50$ and $n = 100$. Rows: BIAS, SD, ASE and MSE for the estimates of $\beta_1$, $\beta_2$ and $\beta_3$.

Scenario 2. In this scenario, the data is $\{(Z_i, K_i, \underline{T}^{(i)}_{K_i}, \mathbb{N}^{(i)}_{K_i}) : i = 1, 2, \ldots, n\}$ with $Z_i = (Z_{i,1}, Z_{i,2}, Z_{i,3})$ and, conditionally on $(Z_i, K_i, \underline{T}^{(i)}_{K_i})$, the counts $\mathbb{N}^{(i)}_{K_i}$ were generated from a mixed Poisson process. For each subject, $(Z_i, K_i, \underline{T}^{(i)}_{K_i})$ are generated in exactly the same way as in Scenario 1. The panel counts are, however, generated from a homogeneous Poisson process with a random effect on the intensity: given subject $i$ with covariates $Z_i$ and frailty variable $\alpha_i$ (independent of $Z_i$), the counts are generated from the Poisson process with intensity $(\lambda + \alpha_i) \exp(\beta_0^T Z_i)$, where $\lambda = 2.0$ and $\alpha_i \in \{-0.4, 0, 0.4\}$ with probabilities 0.25, 0.5, and 0.25, respectively. In this scenario, the counting process given only the covariates is not a Poisson process. However, since $E(\alpha_i) = 0$, the conditional mean function of the counting process given the covariates still satisfies (1.1) with $\Lambda_0(t) = 2t$, and thus our proposed methods are expected to be valid for this case as well.

The asymptotic variances given in Theorem 3.3 for this scenario, $\Sigma^{ps} = (A^{ps})^{-1} B^{ps} ((A^{ps})^{-1})^T$ (4.3) and $\Sigma = A^{-1} B (A^{-1})^T$ (4.4), are each the sum of a constant multiple of $W^{-1}$ and a constant multiple of $W^{-1} \widetilde{W} (W^{-1})^T$, where $\widetilde{W} = E\{e^{2\beta_0^T Z} [Z - E(Z e^{\beta_0^T Z}) / E(e^{\beta_0^T Z})]^{\otimes 2}\}$. The asymptotic covariance matrices were again evaluated numerically using Mathematica (Wolfram (1996)).

As in Scenario 1, we conducted simulation studies with sample sizes of $n = 50$ and $n = 100$, respectively. For each case, the Monte-Carlo sample bias, standard deviation and mean squared error for the semiparametric estimators of the regression parameters are computed with 1000 repeated samples. The results are shown in Table 2. In Figure 1, we also plot the pointwise means, 2.5-percentiles, and 97.5-percentiles of both estimators of the unconditional baseline mean function $\Lambda_0(t) = 2t$ based on the results obtained from 1000 Monte-Carlo samples. We observe the same phenomenon as in Scenario 1: for the regression parameters, both the standard deviation and the mean squared error using the maximum likelihood method are smaller than those using the pseudo-likelihood method, while the bias is relatively small; for the baseline mean function, both estimators have negligible bias but the maximum likelihood estimator has less variability than the maximum pseudo-likelihood estimator. We also note that the variability of the semiparametric estimators is larger than that of their counterparts in Scenario 1. This may be caused by violation of the assumption of a conditional Poisson process given only the covariates. We also include the asymptotic standard errors of the regression parameter estimates based on (4.3) and (4.4) in Table 2. Again the standard errors derived from the asymptotic theory are all close to the standard deviations based on the Monte-Carlo simulations.

These simulation studies provide numerical support for the statement that the proposed semiparametric estimation methods are robust against violation of the underlying conditional Poisson process assumption. These methods are valid as long as the proportional mean function model (1.1) holds. We have also conducted several analytical analyses to compare the semiparametric efficiency of the maximum pseudo-likelihood and maximum likelihood estimation methods. There is considerable evidence that the maximum likelihood method (based on the Poisson process assumption) is more efficient than the pseudo-likelihood method both on and off the Poisson model, with large differences occurring when $K$ is heavy-tailed. The detailed analytical results are presented in Wellner, Zhang, and Liu (2004).

4.2. A Real Data Example

Using the semiparametric methods proposed in the preceding sections, we analyze the bladder tumor data, extracted from Andrews and Herzberg (1985). This data set comes from a bladder tumor study conducted by the Veterans Administration Cooperative Urological Research Group (Byar et al., 1977).
Table 2: Results of the Monte-Carlo simulation studies for the regression parameter estimates based on 1000 repeated samples for data generated from the mixed Poisson process. Columns: pseudo-likelihood and likelihood estimates for $n = 50$ and $n = 100$. Rows: BIAS, SD, ASE and MSE for the estimates of $\beta_1$, $\beta_2$ and $\beta_3$.

In the study, a randomized clinical trial of three treatments (placebo, pyridoxine pills and thiotepa instillation into the bladder) was conducted for patients with superficial bladder tumors when entering the trial. At each follow-up visit, tumors were counted, measured and then removed if observed, and the treatment was continued. The treatment effects, especially of the thiotepa instillation, on suppressing the recurrence of bladder tumors have been explored by many authors, for example, Wei, Lin, and Weissfeld (1989), Sun and Wei (2000), Wellner and Zhang (2000) and Zhang (2002).

In this paper, we study the proportional mean model that has been proposed by Sun and Wei (2000) and Zhang (2002),
$$E\{N(t) \mid Z\} = \Lambda_0(t) \exp(\beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3 + \beta_4 Z_4), \qquad (4.5)$$
where $Z_1$ and $Z_2$ represent the number and size of bladder tumors at the beginning of the trial, and $Z_3$ and $Z_4$ are the indicators for the pyridoxine pill and thiotepa instillation treatments, respectively. We choose $\beta^{(0)} = (0, 0, 0, 0)$ to start our iterative algorithm and fix the tolerance $\eta$ for the convergence criteria that stop the algorithm. Since the asymptotic variances are difficult to estimate, we adopt the bootstrap procedure to estimate the asymptotic standard error of the semiparametric estimates of the regression parameters. We generated 200 bootstrap samples and calculated the proposed estimators for each sampled data set. The sample standard deviation of the estimates based on these 200 bootstrap samples is used to estimate the asymptotic standard error. The inference based on the bootstrap estimator of the asymptotic standard error is given in Table 3. The semiparametric maximum pseudo-likelihood and maximum likelihood estimators of the baseline mean function are plotted in Figure 2.

Both methods yield the same conclusion: the baseline number of tumors (the number of tumors observed when entering the trial) significantly affects the recurrence of the tumor at level 0.05, and the thiotepa instillation treatment appears to reduce the recurrence of tumors significantly (the p-values for the maximum pseudo-likelihood and maximum likelihood methods are listed in Table 3). In Figure 2, we can see that the maximum likelihood estimator of the baseline mean function is substantially smaller than the maximum pseudo-likelihood estimator, which is consistent with what was observed for the nonparametric estimation methods applied to this data set in Wellner and Zhang (2000).
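The bootstrap procedure described above can be sketched generically. The sketch assumes that subjects are resampled as whole units with replacement and that the p-values in Table 3 come from a two-sided normal reference for $\hat\beta / \widehat{se}(\hat\beta)$; `bootstrap_se`, `wald_p_value` and the `estimator` callable are hypothetical names standing in for whichever fitting routine is used.

```python
import math
import random
import statistics


def bootstrap_se(subjects, estimator, n_boot=200, seed=0):
    """Standard deviation of the estimator over resampled data sets.

    subjects  : list of per-subject records (resampled as whole units)
    estimator : hypothetical callable mapping such a list to a list of floats
    """
    rng = random.Random(seed)
    n = len(subjects)
    replicates = []
    for _ in range(n_boot):
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    # One sample standard deviation per coordinate of the estimate.
    return [statistics.stdev(column) for column in zip(*replicates)]


def wald_p_value(estimate, se):
    """Two-sided p-value from a standard normal reference distribution."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Resampling whole subjects keeps each subject's observation times and counts together, so the within-subject dependence is preserved in every bootstrap data set.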
We also notice that the maximum likelihood method, in contrast to what we have observed through both the simulation and analytical studies, yields larger standard errors than the pseudo-likelihood method. Violation of the proportional mean model (4.5) for this data set could be the explanation for this result, since Zhang (2006) plotted the nonparametric pseudo-likelihood estimators of the mean function for each of the three treatments and found that the estimators cross over. While plotting the nonparametric estimators of the mean function for the groups defined by covariates is a reasonable first step in an exploration of the validity of the proposed model, it would be preferable to proceed via more quantitative measures, such as appropriate goodness-of-fit statistics. The construction of goodness-of-fit test statistics for regression modelling of panel count data remains an open problem for future research.

All the numerical experiments in this paper were implemented in R. The computing programs are available from the second author.

Table 3: Semiparametric inference for the bladder tumor study based on 200 bootstrap samples from the original data set.

Variable   Method               $\hat\beta$   $\widehat{\mathrm{se}}(\hat\beta)$   $\hat\beta/\widehat{\mathrm{se}}(\hat\beta)$   p-value
$Z_1$      Pseudo-likelihood
$Z_1$      Likelihood
$Z_2$      Pseudo-likelihood
$Z_2$      Likelihood
$Z_3$      Pseudo-likelihood
$Z_3$      Likelihood
$Z_4$      Pseudo-likelihood
$Z_4$      Likelihood

5. Asymptotic theory: Proofs

We use empirical process theory to study the asymptotic properties of the semiparametric maximum pseudo-likelihood and maximum likelihood estimators. The proof of Theorem 3.1 is closely related to the proof of Theorem 4.1 of Wellner and Zhang (2000). The rate of convergence is derived from the general rate-of-convergence theorem of van der Vaart and Wellner (1996). The asymptotic normality proofs for both $\hat\beta_n^{ps}$ and $\hat\beta_n$ are based on the general theorem for M-estimation of regression parameters in the presence of a nonparametric nuisance parameter, which is stated and proved in Section 6.

Proof of Theorem 3.1: Zhang (2002) has given a proof for the first part of the theorem concerning the semiparametric maximum pseudo-likelihood estimator. Unfortunately, his proof of Theorem 1 on pages 47 and 48 is not correct (in particular, the conditions imposed do not suffice for identifiability as claimed). Here we give proofs for both the maximum pseudo-likelihood and maximum likelihood estimators.

We first prove the claims concerning the pseudo-likelihood estimators $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$. Let $M_n^{ps}(\theta) = n^{-1} l_n^{ps}(\beta, \Lambda) = P_n m_\theta^{ps}(X)$ and $M^{ps}(\theta) = P m_\theta^{ps}(X)$, where

\[ m_\theta^{ps}(X) = \sum_{j=1}^{K} \bigl\{ N_{Kj} \log \Lambda_{Kj} + N_{Kj} \beta^T Z - \Lambda_{Kj} \exp(\beta^T Z) \bigr\}. \]

First, we show that $M^{ps}$ has $\theta_0 = (\beta_0, \Lambda_0)$ as its unique maximizing point. Computing the expectation conditionally on $(Z, K, T_K)$ yields

\[ M^{ps}(\theta_0) - M^{ps}(\theta) = \int \Lambda(u) \exp(\beta^T z)\, h\!\left( \frac{\Lambda_0(u) \exp(\beta_0^T z)}{\Lambda(u) \exp(\beta^T z)} \right) d\nu_1(u, z), \]

where $h(x) = x \log(x) - x + 1$. The function $h(x)$ satisfies $h(x) \ge 0$ for $x > 0$, with equality holding only at $x = 1$.
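The role of $h$ in the display above can be made explicit with a short worked step. The shorthand $a$ and $b$ is introduced here only for illustration, and the computation assumes the proportional mean model, so that the conditional mean of $N$ at an observation time $u$ given $Z = z$ equals $a$.

```latex
% Shorthand (illustrative): a = \Lambda_0(u) e^{\beta_0^T z}, \quad b = \Lambda(u) e^{\beta^T z}.
% Conditional mean of one summand of m^{ps}_{\theta_0} - m^{ps}_{\theta}, using E\{N \mid z, u\} = a:
a\bigl(\log \Lambda_0(u) + \beta_0^T z\bigr) - a
  - a\bigl(\log \Lambda(u) + \beta^T z\bigr) + b
  = a \log(a/b) - a + b
  = b\, h(a/b), \qquad h(x) = x \log x - x + 1.
% Nonnegativity of h on (0, \infty):
h(1) = 0, \qquad h'(x) = \log x, \qquad h''(x) = 1/x > 0,
% so h is strictly convex with its unique minimum h(1) = 0 at x = 1.
```

Because $b > 0$, each summand is nonnegative and vanishes only where $a = b$, which is the condition that appears next.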
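Hence $M^{ps}(\theta_0) \ge M^{ps}(\theta)$, and $M^{ps}(\theta_0) = M^{ps}(\theta)$ if and only if

\[ \frac{\Lambda_0(u) \exp(\beta_0^T z)}{\Lambda(u) \exp(\beta^T z)} = 1 \quad \text{a.e. with respect to } \nu_1. \tag{5.1} \]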

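This implies that

\[ \beta = \beta_0 \quad \text{and} \quad \Lambda(u) = \Lambda_0(u) \quad \text{a.e. with respect to } \mu_1 \tag{5.2} \]

by C2 and C7. Here is a proof of this claim. Let $f_1(u) = \Lambda(u) - \Lambda_0(u)$, $f_2(u) = \Lambda_0(u)$, $h_1(z) = \exp(\beta^T z)$, and $h_2(z) = \exp(\beta^T z) - \exp(\beta_0^T z)$. Then (5.1) implies that $\Lambda_0(u) \exp(\beta_0^T z) = \Lambda(u) \exp(\beta^T z)$ a.e. $\nu_1$, or, equivalently,

\[ 0 = \{\Lambda(u) - \Lambda_0(u)\} e^{\beta^T z} + \Lambda_0(u) \bigl( e^{\beta^T z} - e^{\beta_0^T z} \bigr) = f_1(u) h_1(z) + f_2(u) h_2(z) \quad \text{a.e. } \nu_1. \]

Since $\mu_1 \times H$ is absolutely continuous with respect to $\nu_1$ by assumption C2, equality holds in the last display a.e. with respect to $\mu_1 \times H$. Multiplying across the identity in the last display by $ab$, integrating with respect to the measure $\mu_1 \times H$, and then applying Fubini's theorem, it follows that

\[ 0 = \int f_1 a \, d\mu_1 \int h_1 b \, dH + \int f_2 a \, d\mu_1 \int h_2 b \, dH \]

for all measurable functions $a = a(u)$ and $b = b(z)$. The choice of $a = f_1 1_A$ for $A \in \mathcal{B}_1$ and $b = h_1 1_B$ for $B \in \mathcal{B}_d$ yields

\[ 0 = \int f_1^2 1_A \, d\mu_1 \int h_1^2 1_B \, dH + \int f_1 f_2 1_A \, d\mu_1 \int h_1 h_2 1_B \, dH; \]

the choice of $a = f_2 1_A$ (for the same $A \in \mathcal{B}_1$) and $b = h_2 1_B$ (for the same set $B \in \mathcal{B}_d$) yields

\[ 0 = \int f_1 f_2 1_A \, d\mu_1 \int h_1 h_2 1_B \, dH + \int f_2^2 1_A \, d\mu_1 \int h_2^2 1_B \, dH. \]

Thus we have

\[ \int f_1^2 1_A \, d\mu_1 \int h_1^2 1_B \, dH = - \int f_1 f_2 1_A \, d\mu_1 \int h_1 h_2 1_B \, dH = \int f_2^2 1_A \, d\mu_1 \int h_2^2 1_B \, dH \]

for all $A \in \mathcal{B}_1$ and $B \in \mathcal{B}_d$. By Fubini's theorem, this yields

\[ \int_{A \times B} f_1^2 h_1^2 \, d(\mu_1 \times H) = \int_{A \times B} f_2^2 h_2^2 \, d(\mu_1 \times H) \]

for all such sets $A$, $B$. But this implies that the measures $\gamma_1$ and $\gamma_2$ defined by $\gamma_j(A \times B) = \int_{A \times B} f_j^2 h_j^2 \, d(\mu_1 \times H)$, $j = 1, 2$, are equal for all the product sets $A \times B$, and hence, by a standard monotone class argument, we conclude that $\gamma_1 = \gamma_2$ as measures on $([0, \tau] \times R^d, \mathcal{B}_1[0, \tau] \times \mathcal{B}_d)$. It follows that $f_1^2(u) h_1^2(z) = f_2^2(u) h_2^2(z)$ a.e. with respect to $\mu_1 \times H$. Thus we conclude that

\[ \frac{f_1^2(u)}{f_2^2(u)} = \frac{h_2^2(z)}{h_1^2(z)} \quad \text{a.e. on } \{(u, z) : f_1^2(u) > 0, \ h_1^2(z) > 0\}, \]

or, in other words,

\[ \left( \frac{\Lambda(u)}{\Lambda_0(u)} - 1 \right)^2 = \bigl( 1 - \exp((\beta_0 - \beta)^T z) \bigr)^2 \quad \text{a.e. with respect to } \mu_1 \times H. \]

This implies that (5.2) holds in view of C7. To see this, integrate across this identity with respect to $\mu_1$ to obtain

\[ \int \left( \frac{\Lambda(u)}{\Lambda_0(u)} - 1 \right)^2 d\mu_1(u) = \bigl( 1 - \exp((\beta_0 - \beta)^T z) \bigr)^2 \mu_1([0, \tau]) \quad \text{a.e. } H, \]

and hence the right side is a constant a.e. $H$. But this implies that $\beta = \beta_0$ in view of C7. Combining this with the last display shows that (5.2) holds.

For any given $\epsilon > 0$, let $\theta_{n,\epsilon} = (\hat\beta_n^{ps}, (1 - \epsilon) \hat\Lambda_n^{ps} + \epsilon \Lambda_0) = \hat\theta_n^{ps} + \epsilon (0, \Lambda_0 - \hat\Lambda_n^{ps})$. Since $M_n^{ps}(\hat\theta_n^{ps}) \ge M_n^{ps}(\theta_{n,\epsilon})$, it follows that

\[ 0 \ge \lim_{\epsilon \downarrow 0} \frac{M_n^{ps}\bigl(\hat\theta_n^{ps} + \epsilon (0, \Lambda_0 - \hat\Lambda_n^{ps})\bigr) - M_n^{ps}(\hat\theta_n^{ps})}{\epsilon} = P_n \sum_{j=1}^{K} \left\{ \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{ps\,T} Z) \right\} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr), \]

where $\hat\Lambda_{nKj}^{ps} = \hat\Lambda_n^{ps}(T_{K,j})$. This yields

\[ P_n \sum_{j=1}^{K} \left\{ \frac{\Lambda_{0Kj}}{\hat\Lambda_{nKj}^{ps}} N_{Kj} + \hat\Lambda_{nKj}^{ps} \exp(\hat\beta_n^{ps\,T} Z) \right\} \le P_n \sum_{j=1}^{K} \bigl\{ N_{Kj} + \Lambda_{0Kj} \exp(\hat\beta_n^{ps\,T} Z) \bigr\} \le C P_n \sum_{j=1}^{K} \bigl( N_{K,j} + \Lambda_0(T_{K,j}) \bigr) \to_{a.s.} C P \sum_{j=1}^{K} \bigl( N_{K,j} + \Lambda_0(T_{K,j}) \bigr) \]

by C1-C3 and the strong law of large numbers. (Here $C$ represents a constant. In the sequel, $C$ appearing in different lines may represent different constants.) The limit on the right side is finite. On the other hand,

\[ \limsup_n P_n \sum_{j=1}^{K} \left\{ \frac{\Lambda_{0Kj}}{\hat\Lambda_{nKj}^{ps}} N_{Kj} + \hat\Lambda_{nKj}^{ps} \exp(\hat\beta_n^{ps\,T} Z) \right\} \ge \limsup_n P_n \sum_{j=1}^{K} 1_{[b, \tau]}(T_{K,j}) \hat\Lambda_{nKj}^{ps} \exp(\hat\beta_n^{ps\,T} Z) \ge C \limsup_n \hat\Lambda_n^{ps}(b) P_n \sum_{j=1}^{K} 1_{[b, \tau]}(T_{K,j}) = C \limsup_n \hat\Lambda_n^{ps}(b) \, \mu_1([b, \tau]). \]

Hence $\hat\Lambda_n^{ps}(t)$ is uniformly bounded almost surely for $t \in [0, b]$ if $\mu_1([b, \tau]) > 0$ for some $0 < b < \tau$, or for $t \in [0, \tau]$ if $\mu_1(\{\tau\}) > 0$. By the Helly selection theorem and the compactness of $\mathcal{R} \times \mathcal{F}$, it follows that $\hat\theta_n^{ps} = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ has a subsequence $\hat\theta_{n'}^{ps} = (\hat\beta_{n'}^{ps}, \hat\Lambda_{n'}^{ps})$ converging to $\theta^+ = (\beta^+, \Lambda^+)$, where $\Lambda^+$ is an increasing bounded function defined on $[0, b]$ for a $b < \tau$, and it can be defined on $[0, \tau]$ if $\mu_1(\{\tau\}) > 0$. Following the same argument as in the proof of Theorem 4.1 of Wellner and Zhang (2000), we can show that $M^{ps}(\theta^+) \ge M^{ps}(\theta_0)$. Since $M^{ps}(\theta_0) \ge M^{ps}(\theta^+)$ by the argument above (5.1), we conclude that $M^{ps}(\theta^+) = M^{ps}(\theta_0)$. Then (5.2) implies that $\beta^+ = \beta_0$ and $\Lambda^+ = \Lambda_0$ a.e. in $\mu_1$. Finally, the dominated convergence theorem yields the strong consistency of $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ in the metric $d_1$.

Now we turn to the maximum likelihood estimator. Let $M_n(\theta) = n^{-1} l_n(\beta, \Lambda) = P_n m_\theta(X)$ and $M(\theta) = P m_\theta(X)$, where

\[ m_\theta(X) = \sum_{j=1}^{K} \bigl\{ \Delta N_{Kj} \log \Delta\Lambda_{Kj} + \Delta N_{Kj} \beta^T Z - \Delta\Lambda_{Kj} \exp(\beta^T Z) \bigr\}. \]

Much as in the pseudo-likelihood case, $M$ has $\theta_0 = (\beta_0, \Lambda_0)$ as its unique maximum point, with equality $M(\theta) = M(\theta_0)$ holding only if

\[ \beta = \beta_0 \quad \text{and} \quad \Lambda(v) - \Lambda(u) = \Lambda_0(v) - \Lambda_0(u) \quad \text{a.e. with respect to } \mu_2. \tag{5.3} \]

The proof of consistency then proceeds along the same lines as for the pseudo-likelihood estimator; see Wellner and Zhang (2005) for the detailed argument. The upshot is that $(\hat\beta_n, \hat\Lambda_n)$ is almost surely consistent in the metric $d_2$.

Proof of Theorem 3.2: We derive the rate of convergence by checking the conditions of the general rate-of-convergence theorem of van der Vaart and Wellner (1996). Here we give a detailed proof for the first part of the theorem and, for the second, we point out the differences from the first. Let

\[ m_\theta^{ps}(X) = \sum_{j=1}^{K} \bigl\{ N_{Kj} \log \Lambda(T_{K,j}) + N_{Kj} \beta^T Z - \Lambda(T_{K,j}) \exp(\beta^T Z) \bigr\} \]

with $N_{Kj} = N(T_{K,j})$ and $M^{ps}(\theta) = P m_\theta^{ps}(X)$. We have

\[ M^{ps}(\theta_0) - M^{ps}(\theta) = E_{(Z, K, T_K)} \sum_{j=1}^{K} \Lambda(T_{K,j}) \exp(\beta^T Z)\, h\!\left\{ \frac{\Lambda_0(T_{K,j}) \exp(\beta_0^T Z)}{\Lambda(T_{K,j}) \exp(\beta^T Z)} \right\}. \]

Since $h(x) \ge (1/4)(x - 1)^2$ for $0 \le x \le 5$, for any $\theta$ in a sufficiently small neighborhood of $\theta_0$,

\[ M^{ps}(\theta_0) - M^{ps}(\theta) \ge \frac{1}{4} E_{(Z, K, T_K)} \sum_{j=1}^{K} \Lambda(T_{K,j}) \exp(\beta^T Z) \left\{ \frac{\Lambda_0(T_{K,j}) \exp(\beta_0^T Z)}{\Lambda(T_{K,j}) \exp(\beta^T Z)} - 1 \right\}^2 \ge C \int \bigl\{ \Lambda(u) e^{\beta^T z} - \Lambda_0(u) e^{\beta_0^T z} \bigr\}^2 d\nu_1(u, z) \tag{5.4} \]

by C1, C2 and C6. Let $g(t) = \Lambda_t(U) \exp(\beta_t^T Z)$ with $\Lambda_t = t\Lambda + (1 - t)\Lambda_0$ and $\beta_t = t\beta + (1 - t)\beta_0$ for $0 \le t \le 1$, with $(U, Z) \sim \nu_1$. Then $\Lambda(U) \exp(\beta^T Z) - \Lambda_0(U) \exp(\beta_0^T Z) = g(1) - g(0)$, and hence, by the mean value theorem, there exists a $0 \le \xi \le 1$ such that $g(1) - g(0) = g'(\xi)$. Since

\[ g'(\xi) = \exp(\beta_\xi^T Z) \bigl[ (\Lambda - \Lambda_0)(U) + \{\Lambda_0 + \xi(\Lambda - \Lambda_0)\}(U) (\beta - \beta_0)^T Z \bigr] = \exp(\beta_\xi^T Z) \bigl[ (\Lambda - \Lambda_0)(U) \{1 + \xi(\beta - \beta_0)^T Z\} + (\beta - \beta_0)^T Z \Lambda_0(U) \bigr], \]

from (5.4) we have

\[ P\bigl\{ m_{\theta_0}^{ps}(X) - m_\theta^{ps}(X) \bigr\} \ge C \int \bigl[ (\Lambda - \Lambda_0)(u) \{1 + \xi(\beta - \beta_0)^T z\} + (\beta - \beta_0)^T z \Lambda_0(u) \bigr]^2 d\nu_1(u, z) = C \nu_1 \{g_1 h + g_2\}^2, \]

where $g_1(U, Z) \equiv (\beta - \beta_0)^T Z \Lambda_0(U)$, $g_2(U) = (\Lambda - \Lambda_0)(U)$, and $h(U, Z) = 1 + \xi(\Lambda - \Lambda_0)(U)/\Lambda_0(U)$, in the notation of Lemma 8.8, page 432, of van der Vaart (2002). To apply van der Vaart's lemma we need to bound $[\nu_1(g_1 g_2)]^2$ by a constant less than one times $\nu_1(g_1^2) \nu_1(g_2^2)$. For the moment we write expectations under $\nu_1$ as $E_1$. By the Cauchy-Schwarz inequality and then computing conditionally on $U$, we have

\[ [E_1(g_1 g_2)]^2 = \{E_1[E_1(g_1 g_2 \mid U)]\}^2 \le E_1\{g_2^2\} \, E_1\{[E_1(g_1 \mid U)]^2\} \]
\[ = E_1\{g_2^2\} \, E_1\bigl\{ \Lambda_0^2(U) [E_1\{(\beta - \beta_0)^T Z \mid U\}]^2 \bigr\} \]
\[ = E_1\{g_2^2\} \, E_1\bigl\{ \Lambda_0^2(U) E_1\bigl[ (\beta - \beta_0)^T \bigl( Z^{\otimes 2} - (Z - E_1(Z \mid U))^{\otimes 2} \bigr) (\beta - \beta_0) \mid U \bigr] \bigr\} \]
\[ \le (1 - \eta) E_1\{g_2^2\} \, E_1\bigl\{ \Lambda_0^2(U) (\beta - \beta_0)^T E_1(Z Z^T \mid U) (\beta - \beta_0) \bigr\} = (1 - \eta) E_1\{g_2^2\} E_1\{g_1^2\}, \]

where the last inequality follows from C13. By van der Vaart's lemma,

\[ \nu_1 \{g_1 h + g_2\}^2 \ge C \{\nu_1(g_1^2) + \nu_1(g_2^2)\} \ge C \bigl\{ |\beta - \beta_0|^2 + \|\Lambda - \Lambda_0\|^2_{L_2(\mu_1)} \bigr\} = C d_1^2(\theta, \theta_0). \]

To derive the rate of convergence, we next need to find a $\phi_n(\delta)$ such that

\[ E \sup_{d_1(\theta, \theta_0) < \delta} \bigl| \mathbb{G}_n \bigl( m_\theta^{ps}(X) - m_{\theta_0}^{ps}(X) \bigr) \bigr| \le C \phi_n(\delta). \]

We let $\mathcal{M}_\delta^1(\theta_0) = \{ m_\theta^{ps}(X) - m_{\theta_0}^{ps}(X) : d_1(\theta, \theta_0) < \delta \}$ be the class of differences. We shall find an upper bound for the bracketing entropy numbers of this class. We also let $\mathcal{F}_\delta = \{\Lambda \in \mathcal{F} : \|\Lambda - \Lambda_0\|_{L_2(\mu_1)} \le \delta\}$. Since $\mathcal{F}_\delta$ is a class of monotone nondecreasing functions, by the bracketing entropy theorem for monotone functions of van der Vaart and Wellner (1996), for any $\epsilon > 0$ there exists a set of brackets $[\Lambda_1^l, \Lambda_1^r], [\Lambda_2^l, \Lambda_2^r], \ldots, [\Lambda_q^l, \Lambda_q^r]$ with $q \le \exp(M/\epsilon)$, such that for any $\Lambda \in \mathcal{F}_\delta$, $\Lambda_i^l(t) \le \Lambda(t) \le \Lambda_i^r(t)$ for all $t \in O[T]$ and some $1 \le i \le q$, and $\int \{\Lambda_i^r(u) - \Lambda_i^l(u)\}^2 d\mu_1(u) \le \epsilon^2$. (Here we use the fact that $\mu_1$ is a finite measure under our hypotheses, and hence can be normalized to be a probability measure.)

For sufficiently small $\epsilon > 0$ and $\delta > 0$, we can construct the bracketing functions so that $\Lambda_i^r(t) - \Lambda_i^l(t) \le \gamma_1$ and $\Lambda_i^l(t) \ge \gamma_2$ with $\gamma_1, \gamma_2 > 0$ for all $t \in O[T]$ and $1 \le i \le q$. Here is the proof of this claim. For any $\Lambda \in \mathcal{F}_\delta$, the result of Lemma 7.1 implies that $\Lambda_0(t) - \epsilon_1 \le \Lambda(t) \le \Lambda_0(t) + \epsilon_1$ for a sufficiently small $\epsilon_1 > 0$ ($\epsilon_1$ can be chosen as $(\delta/C)^{2/3}$ in view of Lemma 7.1) and for all $t \in O[T]$.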
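For any $1 \le i \le q$, there is a $\Lambda \in \mathcal{F}_\delta$ such that $\|\Lambda_i^r - \Lambda\|_{L_2(\mu_1)} \le \epsilon$ and $\|\Lambda - \Lambda_i^l\|_{L_2(\mu_1)} \le \epsilon$, which implies that $\|\Lambda_i^r - \Lambda_0\|_{L_2(\mu_1)} \le \epsilon'$ (with $\epsilon' = (\epsilon^2 + \delta^2)^{1/2}$) and $\|\Lambda_i^l - \Lambda_0\|_{L_2(\mu_1)} \le \epsilon'$. By Lemma 7.1, this yields $\Lambda_i^r(t) \le \Lambda_0(t) + \epsilon_2$ and $\Lambda_i^l(t) \ge \Lambda_0(t) - \epsilon_2$ for a sufficiently small $\epsilon_2 > 0$ ($\epsilon_2$ can be chosen as $(\epsilon'/C)^{2/3}$). Therefore our claim is justified by letting $\gamma_1 = 2\epsilon_2$ and $\gamma_2 = \Lambda_0(\sigma) - \epsilon_2$, in view of C8.

Since $\beta \in \mathcal{R}$, a compact set in $R^d$, we can construct an $\epsilon$-net for $\mathcal{R}$, $\beta_1, \beta_2, \ldots, \beta_p$ with $p = [M'/\epsilon^d]$, such that for any $\beta \in \mathcal{R}$ there is an $s$ such that $|\beta^T Z - \beta_s^T Z| \le \epsilon$ and $|\exp(\beta^T Z) - \exp(\beta_s^T Z)| \le C\epsilon$. Therefore we can construct a set of brackets for $\mathcal{M}_\delta^1(\theta_0)$ as follows:

\[ [m_{i,s}^l(X), m_{i,s}^r(X)], \quad \text{for } i = 1, 2, \ldots, q; \ s = 1, 2, \ldots, p, \]

where

\[ m_{i,s}^l(X) = \sum_{j=1}^{K} \bigl[ N_{Kj} \log \Lambda_i^l(T_{K,j}) + N_{Kj} (\beta_s^T Z - \epsilon) - \Lambda_i^r(T_{K,j}) \{\exp(\beta_s^T Z) + C\epsilon\} \bigr] - m_{\theta_0}^{ps}(X) \]

and

\[ m_{i,s}^r(X) = \sum_{j=1}^{K} \bigl[ N_{Kj} \log \Lambda_i^r(T_{K,j}) + N_{Kj} (\beta_s^T Z + \epsilon) - \Lambda_i^l(T_{K,j}) \{\exp(\beta_s^T Z) - C\epsilon\} \bigr] - m_{\theta_0}^{ps}(X). \]

In what follows, we show that $\|f_{i,s}(X)\|^2_{P,B} = \|m_{i,s}^r(X) - m_{i,s}^l(X)\|^2_{P,B} \le C\epsilon^2$, where $\|\cdot\|_{P,B}$ is the Bernstein norm defined by $\|f\|_{P,B} = \{2P(e^{|f|} - 1 - |f|)\}^{1/2}$ (see van der Vaart and Wellner, 1996, page 324). Since $2(e^x - 1 - x) \le x^2 e^x$ for $x \ge 0$, it follows that $\|f\|^2_{P,B} \le P(e^{|f|} |f|^2)$. Therefore $\|f_{i,s}(X)\|^2_{P,B} \le P(e^{f_{i,s}(X)} f_{i,s}(X)^2)$. By writing out $f_{i,s}(X) = m_{i,s}^r(X) - m_{i,s}^l(X)$, we find that

\[ f_{i,s}(X) \le N_{KK} \sum_{j=1}^{K} \bigl( \log \Lambda_i^r(T_{K,j}) - \log \Lambda_i^l(T_{K,j}) + 2\epsilon \bigr) + \exp(\beta_s^T Z) \sum_{j=1}^{K} \bigl( \Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}) \bigr) + C\epsilon \sum_{j=1}^{K} \bigl( \Lambda_i^r(T_{K,j}) + \Lambda_i^l(T_{K,j}) \bigr). \]

Since

\[ \log y = \log x + (x + \xi(y - x))^{-1} (y - x) \quad \text{for } 0 < x \le y, \text{ some } \xi \in [0, 1], \tag{5.5} \]

we find that

\[ \log \Lambda_i^r(T_{K,j}) - \log \Lambda_i^l(T_{K,j}) \le \gamma_2^{-1} \bigl( \Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}) \bigr) \]

by construction of $\Lambda_i^l$. Hence, by C9 and our claim above, we conclude further that $\sum_{j=1}^{K} (\log \Lambda_i^r(T_{K,j}) - \log \Lambda_i^l(T_{K,j}) + 2\epsilon)$, $\sum_{j=1}^{K} (\Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}))$, and $\sum_{j=1}^{K} (\Lambda_i^r(T_{K,j}) + \Lambda_i^l(T_{K,j}))$ are all uniformly bounded in $O[T]$. More explicitly, taking $\epsilon$ and $\delta$ sufficiently small relative to $\Lambda_0(\sigma)$ (with the constant $C = (c_0/(24 f_0))^{1/2}$ from Lemma 7.1), and using the relations $\epsilon_2 = (\epsilon'/C)^{2/3}$ and $\epsilon' = (\epsilon^2 + \delta^2)^{1/2} \ge \delta$, we find that $\sum_{j=1}^{K} (\log \Lambda_i^r(T_{K,j}) - \log \Lambda_i^l(T_{K,j}) + 2\epsilon)$ is bounded above by an explicit finite constant. Therefore, by arguing conditionally on $(Z, K, T_K)$ and using C10,

\[ \|f_{i,s}(X)\|^2_{P,B} \le P\bigl( e^{f_{i,s}(X)} f_{i,s}(X)^2 \bigr) \le C P \Bigl[ e^{v N_{KK}} \Bigl\{ N_{KK}^2 \sum_{j=1}^{K} \bigl( \log \Lambda_i^r(T_{K,j}) - \log \Lambda_i^l(T_{K,j}) + 2\epsilon \bigr)^2 + \exp(2\beta_s^T Z) \sum_{j=1}^{K} \bigl( \Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}) \bigr)^2 + C\epsilon^2 \Bigr\} \Bigr]. \]

By C6, C10, and the Taylor expansion of $\log \Lambda_i^r(T_{K,j})$ at $\Lambda_i^l(T_{K,j})$ shown above, we have

\[ \|f_{i,s}(X)\|^2_{P,B} \le C \Bigl\{ E_{(K, T_K)} \sum_{j=1}^{K} \bigl( \Lambda_i^r(T_{K,j}) - \Lambda_i^l(T_{K,j}) \bigr)^2 + \epsilon^2 \Bigr\} \le C\epsilon^2. \]

This shows that the total number of $\epsilon$-brackets for $\mathcal{M}_\delta^1(\theta_0)$ is of the order $(M'/\epsilon^d) e^{M/\epsilon}$, and hence $\log N_{[\,]}(\epsilon, \mathcal{M}_\delta^1(\theta_0), \|\cdot\|_{P,B}) \le C(1/\epsilon)$. We can similarly verify that $P(f_\theta(X))^2 \le C\delta^2$ for any $f_\theta(X) = m_\theta^{ps}(X) - m_{\theta_0}^{ps}(X) \in \mathcal{M}_\delta^1(\theta_0)$. Hence, by the maximal inequality for the Bernstein norm in van der Vaart and Wellner (1996),

\[ E_P \|\mathbb{G}_n\|_{\mathcal{M}_\delta^1(\theta_0)} \le C \tilde{J}_{[\,]}\bigl(\delta, \mathcal{M}_\delta^1(\theta_0), \|\cdot\|_{P,B}\bigr) \left[ 1 + \frac{\tilde{J}_{[\,]}\bigl(\delta, \mathcal{M}_\delta^1(\theta_0), \|\cdot\|_{P,B}\bigr)}{\delta^2 \sqrt{n}} \right], \]

where

\[ \tilde{J}_{[\,]}\bigl(\delta, \mathcal{M}_\delta^1(\theta_0), \|\cdot\|_{P,B}\bigr) = \int_0^\delta \sqrt{1 + \log N_{[\,]}\bigl(\epsilon, \mathcal{M}_\delta^1(\theta_0), \|\cdot\|_{P,B}\bigr)} \, d\epsilon \le C \int_0^\delta \sqrt{1 + \frac{1}{\epsilon}} \, d\epsilon \le C \int_0^\delta \epsilon^{-1/2} \, d\epsilon \le C \delta^{1/2}. \]

Hence $\phi_n(\delta) = \delta^{1/2} (1 + \delta^{1/2}/(\delta^2 \sqrt{n})) = \delta^{1/2} + \delta^{-1}/\sqrt{n}$. Then it is easy to see that $\phi_n(\delta)/\delta$ is a decreasing function of $\delta$, and $n^{2/3} \phi_n(n^{-1/3}) = n^{2/3} (n^{-1/6} + n^{1/3} n^{-1/2}) = 2\sqrt{n}$. So it follows from the rate-of-convergence theorem of van der Vaart and Wellner (1996) that

\[ n^{1/3} d_1\bigl( (\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0) \bigr) = O_P(1). \]

For the maximum likelihood estimator $(\hat\beta_n, \hat\Lambda_n)$, the proof of the rate of convergence result stated in Theorem 3.2 proceeds along the same lines as the rate result for the maximum pseudo-likelihood estimator given above, but with $g_1(U, V, Z) \equiv (\beta - \beta_0)^T Z \, \Delta\Lambda_0(U, V)$, $g_2(U, V) = (\Delta\Lambda - \Delta\Lambda_0)(U, V)$, and $h(U, V, Z) = 1 + \xi (\Delta\Lambda - \Delta\Lambda_0)(U, V)/\Delta\Lambda_0(U, V)$ in the application of van der Vaart's Lemma 8.8. For details see Wellner and Zhang (2005).

Proof of Theorem 3.3: We give a detailed proof for the first part of the theorem, and only outline the differences in the proof for the second. We prove the theorem by checking the conditions A1-A6 of Theorem 6.1. Note that A1 holds with $\gamma = 1/3$ because of the rate of convergence given in Theorem 3.2. The criterion function with only one observation is given by

\[ m^{ps}(\beta, \Lambda; X) = \sum_{j=1}^{K} \bigl\{ N_{Kj} \log \Lambda_{Kj} + N_{Kj} \beta^T Z - e^{\beta^T Z} \Lambda_{Kj} \bigr\}, \]

and thus we have

\[ m_1^{ps}(\beta, \Lambda; X) = \sum_{j=1}^{K} Z \bigl( N_{Kj} - \Lambda(T_{K,j}) \exp(\beta^T Z) \bigr), \]
\[ m_2^{ps}(\beta, \Lambda; X)[h] = \sum_{j=1}^{K} \left( \frac{N_{Kj}}{\Lambda_{Kj}} - \exp(\beta^T Z) \right) h_{Kj}, \]
\[ m_{11}^{ps}(\beta, \Lambda; X) = - \sum_{j=1}^{K} \Lambda_{Kj} Z Z^T \exp(\beta^T Z), \]
\[ m_{12}^{ps}(\beta, \Lambda; X)[h] = m_{21}^{ps\,T}(\beta, \Lambda; X)[h] = - \sum_{j=1}^{K} Z \exp(\beta^T Z) h_{Kj}, \]

and

\[ m_{22}^{ps}(\beta, \Lambda; X)[h, \bar{h}] = - \sum_{j=1}^{K} \frac{N_{Kj}}{\Lambda_{Kj}^2} h_{Kj} \bar{h}_{Kj}, \]

where $\Lambda_{Kj} = \Lambda(T_{K,j})$ and $h_{Kj} = \int_0^{T_{K,j}} h(t) \, d\Lambda(t)$ for $h \in L_2(\Lambda)$. A2 automatically holds by the model assumption (1.1). For A3, we need to find an $h^*$ such that

\[ \dot{S}_{12}^{ps}(\beta_0, \Lambda_0)[h] - \dot{S}_{22}^{ps}(\beta_0, \Lambda_0)[h^*, h] = P\bigl\{ m_{12}^{ps}(\beta_0, \Lambda_0; X)[h] - m_{22}^{ps}(\beta_0, \Lambda_0; X)[h^*, h] \bigr\} = 0 \]

for all $h \in L_2(\Lambda_0)$. Note that

\[ P\bigl\{ m_{12}^{ps}(\beta_0, \Lambda_0; X)[h] - m_{22}^{ps}(\beta_0, \Lambda_0; X)[h^*, h] \bigr\} = -E \Bigl[ \sum_{j=1}^{K} Z e^{\beta_0^T Z} h_{Kj} - \sum_{j=1}^{K} \frac{N_{Kj}}{(\Lambda_{0Kj})^2} h^*_{Kj} h_{Kj} \Bigr] = -E_{(K, T_K, Z)} \sum_{j=1}^{K} \Bigl[ Z e^{\beta_0^T Z} - \frac{e^{\beta_0^T Z} h^*_{Kj}}{\Lambda_{0Kj}} \Bigr] h_{Kj}. \]

Therefore, an obvious choice of $h^*$ is

\[ h^*_{Kj} = \Lambda_{0Kj} \frac{E(Z e^{\beta_0^T Z} \mid K, T_{K,j})}{E(e^{\beta_0^T Z} \mid K, T_{K,j})} \equiv \Lambda_{0Kj} R(K, T_{K,j}). \]

Hence

\[ m^*(\beta_0, \Lambda_0; X) = m_1^{ps}(\beta_0, \Lambda_0; X) - m_2^{ps}(\beta_0, \Lambda_0; X)[h^*] = \sum_{j=1}^{K} \Bigl\{ Z \bigl( N_{Kj} - e^{\beta_0^T Z} \Lambda_{0Kj} \bigr) - \Bigl( \frac{N_{Kj}}{\Lambda_{0Kj}} - e^{\beta_0^T Z} \Bigr) \Lambda_{0Kj} R(K, T_{K,j}) \Bigr\} = \sum_{j=1}^{K} \bigl( N_{Kj} - e^{\beta_0^T Z} \Lambda_{0Kj} \bigr) [Z - R(K, T_{K,j})], \]

\[ A = -\dot{S}_{11}^{ps}(\beta_0, \Lambda_0) + \dot{S}_{21}^{ps}(\beta_0, \Lambda_0)[h^*] = E \Bigl[ \sum_{j=1}^{K} \bigl( \Lambda_{0Kj} e^{\beta_0^T Z} Z Z^T - e^{\beta_0^T Z} \Lambda_{0Kj} R(K, T_{K,j}) Z^T \bigr) \Bigr] = E_{(K, T_K, Z)} \sum_{j=1}^{K} \Lambda_{0Kj} e^{\beta_0^T Z} [Z - R(K, T_{K,j})] Z^T = E_{(K, T_K, Z)} \sum_{j=1}^{K} \Lambda_{0Kj} e^{\beta_0^T Z} [Z - R(K, T_{K,j})]^{\otimes 2}, \]

and

\[ B = E\, m^*(\beta_0, \Lambda_0; X)^{\otimes 2} = E_{(K, T_K, Z)} \sum_{j, j' = 1}^{K} C_{j,j'}(Z) [Z - R(K, T_{K,j})] [Z - R(K, T_{K,j'})]^T, \]

with

\[ C_{j,j'}(Z) = E\bigl[ \bigl( N_{Kj} - e^{\beta_0^T Z} \Lambda_{0Kj} \bigr) \bigl( N_{Kj'} - e^{\beta_0^T Z} \Lambda_{0Kj'} \bigr) \mid Z, K, T_{K,j}, T_{K,j'} \bigr]. \]

To verify A4, we note that the first part automatically holds, because $S_{1n}^{ps}(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}) = P_n m_1^{ps}(\hat\beta_n^{ps}, \hat\Lambda_n^{ps}; X) = 0$, since $\hat\beta_n^{ps}$ satisfies the pseudo-score equation. Next we shall show that

\[ S_{2n}^{ps}(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})[h^*] = P_n \sum_{j=1}^{K} \Bigl\{ \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{ps\,T} Z) \Bigr\} \Lambda_{0Kj} R(K, T_{K,j}) = o_P(n^{-1/2}) \tag{5.6} \]

with $\hat\Lambda_{nKj}^{ps} = \hat\Lambda_n^{ps}(T_{K,j})$. Since $(\hat\beta_n^{ps}, \hat\Lambda_n^{ps})$ maximizes $P_n m_\theta^{ps}(X)$ over the feasible region, consider a path $\theta_\epsilon = (\hat\beta_n^{ps}, \hat\Lambda_n^{ps} + \epsilon h)$ for $h \in \mathcal{F}$. Then

\[ \lim_{\epsilon \downarrow 0} \frac{d}{d\epsilon} P_n m_{\theta_\epsilon}^{ps}(X) = P_n \sum_{j=1}^{K} \Bigl\{ \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{ps\,T} Z) \Bigr\} h_{Kj} = 0. \]

Now choose $h_{Kj} = \hat\Lambda_{nKj}^{ps} E(Z \exp(\beta_0^T Z) \mid K, T_{K,j}) / E(\exp(\beta_0^T Z) \mid K, T_{K,j})$. Then to demonstrate (5.6), it suffices to show that

\[ I = P_n \sum_{j=1}^{K} \Bigl\{ \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{ps\,T} Z) \Bigr\} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj} = o_P(n^{-1/2}), \]

where $\alpha_{Kj} = E(Z \exp(\beta_0^T Z) \mid K, T_{K,j}) / E(\exp(\beta_0^T Z) \mid K, T_{K,j})$. But $I$ can be decomposed as $I = I_1 - I_2 + I_3$, where

\[ I_1 = (P_n - P) \sum_{j=1}^{K} \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj}, \]
\[ I_2 = (P_n - P) \sum_{j=1}^{K} \exp(\hat\beta_n^{ps\,T} Z) \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj}, \]

and

\[ I_3 = P \sum_{j=1}^{K} \Bigl\{ \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{ps\,T} Z) \Bigr\} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj}. \]

We show that $I_1$, $I_2$ and $I_3$ are all $o_P(n^{-1/2})$. Let

\[ \phi_1(X; \Lambda) = \sum_{j=1}^{K} \frac{N_{Kj}}{\Lambda_{Kj}} (\Lambda_{0Kj} - \Lambda_{Kj}) \alpha_{Kj}, \qquad \phi_2(X; \beta, \Lambda) = \sum_{j=1}^{K} \exp(\beta^T Z) (\Lambda_{0Kj} - \Lambda_{Kj}) \alpha_{Kj}, \]

and define two classes $\Phi_1(\eta)$ and $\Phi_2(\eta)$ as follows:

\[ \Phi_1(\eta) = \bigl\{ \phi_1 : \Lambda \in \mathcal{F} \text{ and } \|\Lambda - \Lambda_0\|_{L_2(\mu_1)} \le \eta \bigr\} \]

and

\[ \Phi_2(\eta) = \bigl\{ \phi_2 : (\beta, \Lambda) \in \mathcal{R} \times \mathcal{F} \text{ and } d_1((\beta, \Lambda), (\beta_0, \Lambda_0)) \le \eta \bigr\}. \]

Using the same bracketing entropy arguments as used in deriving the rate of convergence, it follows that both $\Phi_1(\eta)$ and $\Phi_2(\eta)$ are $P$-Donsker classes under conditions C1, C6 and C8. Moreover, for the seminorm $\rho_P(f) = \{P(f - Pf)^2\}^{1/2}$, under conditions C1, C6, C8 and C9, we have $\sup_{\phi_1 \in \Phi_1(\eta)} \rho_P(\phi_1) \to 0$ and $\sup_{\phi_2 \in \Phi_2(\eta)} \rho_P(\phi_2) \to 0$ as $\eta \to 0$. Due to the relationship between the $P$-Donsker property and asymptotic equicontinuity (see van der Vaart and Wellner, 1996), this yields $I_1 = o_P(n^{-1/2})$ and $I_2 = o_P(n^{-1/2})$. For $I_3$, we have

\[ I_3 = P \sum_{j=1}^{K} \Bigl\{ \frac{N_{Kj}}{\hat\Lambda_{nKj}^{ps}} - \exp(\hat\beta_n^{ps\,T} Z) \Bigr\} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj} \]
\[ = E \sum_{j=1}^{K} \Bigl\{ \frac{\Lambda_{0Kj} \exp(\beta_0^T Z) - \hat\Lambda_{nKj}^{ps} \exp(\hat\beta_n^{ps\,T} Z)}{\hat\Lambda_{nKj}^{ps}} \Bigr\} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj} \]
\[ = E \sum_{j=1}^{K} \Bigl\{ \frac{(\Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps}) e^{\beta_0^T Z} + \hat\Lambda_{nKj}^{ps} \bigl( e^{\beta_0^T Z} - e^{\hat\beta_n^{ps\,T} Z} \bigr)}{\hat\Lambda_{nKj}^{ps}} \Bigr\} \bigl( \Lambda_{0Kj} - \hat\Lambda_{nKj}^{ps} \bigr) \alpha_{Kj} \]
\[ \le C d_1^2\bigl( (\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0) \bigr). \]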
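A short worked step shows why a bound of this form is enough for $I_3 = o_P(n^{-1/2})$. It assumes that the $n^{1/3}$ rate of Theorem 3.2 for the pseudo-likelihood estimator is the ingredient invoked at this point.

```latex
% Assumption: d_1((\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0)) = O_P(n^{-1/3}) by Theorem 3.2.
|I_3| \le C\, d_1^2\bigl( (\hat\beta_n^{ps}, \hat\Lambda_n^{ps}), (\beta_0, \Lambda_0) \bigr) = O_P(n^{-2/3}),
\qquad
n^{1/2} \cdot n^{-2/3} = n^{-1/6} \to 0,
\qquad \text{so} \qquad
I_3 = o_P(n^{-1/2}).
```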

The Annals of Statistics 2007, Vol. 35, No. 5, 2106-2142. DOI: 10.1214/009053607000000181. © Institute of Mathematical Statistics, 2007. arXiv:math/0509132v2 [math.ST], 5 Dec 2007.


More information

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Statistica Sinica 213): Supplement NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Hokeun Sun 1, Wei Lin 2, Rui Feng 2 and Hongzhe Li 2 1 Columbia University and 2 University

More information

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016 Statistics 255 - Survival Analysis Presented March 8, 2016 Dan Gillen Department of Statistics University of California, Irvine 12.1 Examples Clustered or correlated survival times Disease onset in family

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner

More information

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models doi: 10.1111/j.1467-9469.2005.00487.x Published by Blacwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA Vol 33: 1 23, 2006 Ran Regression Analysis

More information

Verifying Regularity Conditions for Logit-Normal GLMM

Verifying Regularity Conditions for Logit-Normal GLMM Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Multistate models and recurrent event models

Multistate models and recurrent event models Multistate models Multistate models and recurrent event models Patrick Breheny December 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Multistate models In this final lecture,

More information

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following

More information

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Supporting Information for Estimating restricted mean. treatment effects with stacked survival models Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Large sample theory for merged data from multiple sources

Large sample theory for merged data from multiple sources Large sample theory for merged data from multiple sources Takumi Saegusa University of Maryland Division of Statistics August 22 2018 Section 1 Introduction Problem: Data Integration Massive data are collected

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Full likelihood inferences in the Cox model: an empirical likelihood approach

Full likelihood inferences in the Cox model: an empirical likelihood approach Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics

More information

An elementary proof of the weak convergence of empirical processes

An elementary proof of the weak convergence of empirical processes An elementary proof of the weak convergence of empirical processes Dragan Radulović Department of Mathematics, Florida Atlantic University Marten Wegkamp Department of Mathematics & Department of Statistical

More information

Multistate models and recurrent event models

Multistate models and recurrent event models and recurrent event models Patrick Breheny December 6 Patrick Breheny University of Iowa Survival Data Analysis (BIOS:7210) 1 / 22 Introduction In this final lecture, we will briefly look at two other

More information

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ ) Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Appendix. Proof of Theorem 1. Define. [ ˆΛ 0(D) ˆΛ 0(t) ˆΛ (t) ˆΛ. (0) t. X 0 n(t) = D t. and. 0(t) ˆΛ 0(0) g(t(d t)), 0 < t < D, t.

Appendix. Proof of Theorem 1. Define. [ ˆΛ 0(D) ˆΛ 0(t) ˆΛ (t) ˆΛ. (0) t. X 0 n(t) = D t. and. 0(t) ˆΛ 0(0) g(t(d t)), 0 < t < D, t. Appendix Proof of Theorem. Define [ ˆΛ (D) X n (t) = ˆΛ (t) D t ˆΛ (t) ˆΛ () g(t(d t)), t < t < D X n(t) = [ ˆΛ (D) ˆΛ (t) D t ˆΛ (t) ˆΛ () g(t(d t)), < t < D, t where ˆΛ (t) = log[exp( ˆΛ(t)) + ˆp/ˆp,

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Modelling and Analysis of Recurrent Event Data

Modelling and Analysis of Recurrent Event Data Modelling and Analysis of Recurrent Event Data Edsel A. Peña Department of Statistics University of South Carolina Research support from NIH, NSF, and USC/MUSC Collaborative Grants Joint work with Prof.

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints and Its Application to Empirical Likelihood Mai Zhou Yifan Yang Received:

More information

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS By Piet Groeneboom and Geurt Jongbloed Delft University of Technology We study nonparametric isotonic confidence intervals for monotone functions.

More information

Semiparametric Regression Analysis of Panel Count Data and Interval-Censored Failure Time Data

Semiparametric Regression Analysis of Panel Count Data and Interval-Censored Failure Time Data University of South Carolina Scholar Commons Theses and Dissertations 2016 Semiparametric Regression Analysis of Panel Count Data and Interval-Censored Failure Time Data Bin Yao University of South Carolina

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia GARCH Models Estimation and Inference Eduardo Rossi University of Pavia Likelihood function The procedure most often used in estimating θ 0 in ARCH models involves the maximization of a likelihood function

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information