A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models

Jian Huang
Department of Statistics, University of Iowa

Ying Zhang and Lei Hua
Department of Biostatistics, University of Iowa

May 5, 2008

Abstract

A method is proposed for consistent information estimation in a class of semiparametric models. The method is based on the geometric interpretation of the efficient score function, namely that it is the residual of the orthogonal projection of the score function for the finite-dimensional parameter onto the tangent space for the infinite-dimensional parameter. The empirical version of this projection is a least-squares nonparametric regression problem. Under appropriate conditions, the sum of squared residuals of this regression is shown to be a consistent estimator of the efficient Fisher information and is actually the observed information for a class of sieve maximum likelihood estimators. Simulation studies are conducted to evaluate the finite-sample performance of the estimator in two illustrative examples: the Poisson proportional mean model for panel count data and the Cox model for interval-censored data. Finally, the method is applied to two real-life examples: a bladder tumor study and a breast cosmesis study.

Key words and phrases. counting process, Cox model, empirical processes, information, interval censoring, maximum (profile) likelihood, monotone polynomial splines, panel count data, proportional hazards model, semiparametric model

1. Introduction

In a regular parametric model, the maximum likelihood estimator (MLE) is asymptotically normal with variance equal to the inverse of the Fisher information, and the Fisher information can be estimated by the observed information. This result provides large-sample justification for the normal approximation to the distribution of the MLE. An important factor making this approximation useful in statistical inference is that the observed information can be readily computed and is consistent. In many situations, consistency of the observed information follows directly from the law of large numbers and consistency of the MLE.

Asymptotic normality of the MLE of the regular parameters continues to hold in many semiparametric and nonparametric models. See, for example, Chen (1988, 1995), Chang (1990), Geskus and Groeneboom (1996), Gill (1989), Groeneboom (1996), Groeneboom and Wellner (1992), Gu and Zhang (1993), Murphy (1995), Murphy, Rossini and van der Vaart (1997), Qin and Lawless (1993), Severini and Wong (1991), van der Laan (1993), van der Vaart (1994, 1996), Wong and Severini (1991), Huang (1996), Huang and Rossini (1997) and Wellner and Zhang (2007). In all these examples, the MLE or a smooth functional of the MLE is asymptotically normal with variance equal to the inverse of the efficient Fisher information.

The asymptotic normality and efficiency results provide insight into the theoretical properties of maximum likelihood estimators. Unfortunately, in many semiparametric models studied in the aforementioned articles, the efficient Fisher information either is very complicated or has no explicit expression, and it often cannot be estimated directly as in the case of the parametric MLE. Thus effort is required to estimate the asymptotic variance in order to apply these results to statistical inference.

In this paper, we consider consistent information estimation in a class of semiparametric models that are parametrized in terms of a finite-dimensional parameter $\theta$ and a parameter $\phi$ whose dimension increases with sample size. Hence $\phi$ is often called an infinite-dimensional parameter. Two important examples are Cox's (1972) proportional hazards model for interval-censored data studied by Huang and Wellner (1995) and the proportional mean model for panel count data proposed by Sun and Wei (2000) and Wellner and Zhang (2007). In these two examples, $\theta$ is the finite-dimensional regression coefficient and $\phi$ is the logarithm of the baseline hazard function or of the baseline mean function.

At least two methods used in parametric models for estimating the variance have been suggested in semiparametric models. The first method is to use the inverse of the observed information based on the likelihood, correcting for the presence of the infinite-dimensional parameter $\phi$. However, when $\phi$ cannot be estimated at the usual root-$n$ rate, consistency of this estimator has not been proved in general. The second method is to use the second derivative of the profile likelihood of $\theta$ at the maximum likelihood estimate. For a fixed value of $\theta$, the profile likelihood is the maximum of the likelihood with respect to $\phi$. Because the profile likelihood often can only be computed numerically, discretized versions of its second derivative were proposed by Nielsen, Gill, Andersen and Sorensen (1992) and used by Huang and Wellner (1995) and Murphy and van der Vaart (1996). Using an ingenious sandwich inequality, Murphy and van der Vaart (2000) showed that the discretized versions of the second derivative of the profile likelihood provide consistent variance estimators in a large class of semiparametric models. Murphy and van der Vaart (1999) also proved that the profile likelihood resembles the ordinary likelihood in many essential aspects. When the dimension of $\theta$ is one or two, it is relatively easy to compute and visualize the profile likelihood.
For moderate to high dimensional $\theta$, implementation of the profile likelihood approach may be computationally demanding and sometimes difficult. In a general semiparametric maximum likelihood estimation problem, the joint estimation of $(\theta, \phi)$ is often quite challenging numerically. Sieve semiparametric M-estimation using polynomial splines, proposed by Lu, Zhang and Huang (2008) for panel count data, was shown to be numerically efficient. However, estimating the standard error of the M-estimator of the regression parameter remains a difficult task; as an alternative, they used the bootstrap standard error, taking advantage of the computational efficiency of spline-based sieve M-estimation.

In this article, we propose a least-squares approach to consistent estimation of the information matrix of the semiparametric maximum likelihood estimator of $\theta$. The proposed method is based on the geometric interpretation of the efficient score function, namely that it is the residual of the projection of the score function for $\theta$ onto the tangent space for $\phi$ (van der Vaart, 1991; Bickel, Klaassen, Ritov and Wellner, 1993, Chapter 3). Thus the theoretical information calculation is a least-squares problem in a Hilbert space. When a sample from the model is available, this theoretical information can be estimated by its empirical version. It turns out that this empirical version is essentially a least-squares nonparametric regression problem, because the score function for the infinite-dimensional parameter is a linear operator. In this nonparametric regression problem, the response is the score function for the finite-dimensional parameter $\theta$, the covariate is the linear score operator for the infinite-dimensional parameter $\phi$, and the regression parameter is the least favorable direction, which is used to define the efficient score. Computationally, the proposed method can be implemented with a least-squares nonparametric regression fitting program.
In a class of sieve MLEs using a linear approximation space, the proposed estimator of the information matrix is shown to be the same as the observed information matrix for the sieve MLE. This equivalence is useful from both the theoretical and the computational point of view. First, it enables a simple and indirect proof of consistency of the observed information matrix in the semiparametric setting. Second, it provides two ways of computing the observed information matrix: one can either directly compute the observed information matrix or fit a least-squares nonparametric regression. Because of its numerical convenience and good theoretical properties, the class of sieve MLEs using polynomial splines is utilized in our numerical demonstration and is recommended for applications of general semiparametric estimation.

The paper is organized as follows. Section 2 describes the motivation and the least-squares approach. Section 3 specializes the general approach to a class of sieve MLEs. Section 4 applies the proposed method, along with the spline-based sieve MLE, to two models: the semiparametric Poisson mean model for panel count data and the Cox proportional hazards model for interval-censored data, studied in Wellner and Zhang (2007) and Huang and Wellner (1995), respectively. Section 5 presents numerical results from simulations and from applications to real-life examples for the models discussed in Section 4. Section 6 concludes with some discussion. Some technical details are included in the appendices.

2. The Least-Squares Approach

Let $X_1, \dots, X_n$ be independent random variables with a common probability measure $P_{\theta,\phi}$, where $(\theta, \phi) \in \Theta \times \Phi$. Here $\Theta$ is a subset of $R^d$ and $\Phi$ is a general space. Assume that $P_{\theta,\phi}$ has a density $p(\cdot\,; \theta, \phi)$ with respect to a $\sigma$-finite measure. Denote $\tau = (\theta, \phi)$ and let $\tau_0 = (\theta_0, \phi_0) \in \Theta \times \Phi$ be the true parameter value under which the data are generated. The maximum likelihood estimator of $\tau_0$ is the value $\hat\tau_n \equiv (\hat\theta_n, \hat\phi_n)$ that maximizes the log-likelihood

$$l_n(\tau) = \sum_{i=1}^n \log p(X_i; \theta, \phi)$$

over the parameter space $\mathcal{T} \equiv \Theta \times \Phi$. Let $\|\cdot\|$ be the Euclidean norm on $R^d$, and let $\|\cdot\|_\Phi$ be an appropriate norm defined on $\Phi$. We assume it has been shown that

$$\|\hat\tau_n - \tau_0\|_{\mathcal{T}} \equiv \left\{ \|\hat\theta_n - \theta_0\|^2 + \|\hat\phi_n - \phi_0\|_\Phi^2 \right\}^{1/2} = O_p(r_n^{-1}), \tag{2.1}$$

where $r_n$ is a sequence of numbers converging to infinity. Consistency and rate of convergence in nonparametric and semiparametric models have been addressed by many authors; see, for example, Birgé and Massart (1993), van der Geer (1993), Shen and Wong (1995), and van der Vaart and Wellner (1996). The results and methods developed by these authors can often be used to verify (2.1).

The motivation to study consistent information estimation is the following. In many semiparametric models, in addition to (2.1), it can be shown that

$$n^{1/2}(\hat\theta_n - \theta_0) \to_d N\left(0, I^{-1}(\theta_0)\right), \tag{2.2}$$

where $I(\theta_0)$ is the efficient Fisher information for $\theta_0$, adjusting for the presence of the nuisance parameter $\phi$. The definition of $I(\theta_0)$ is given below. This holds for the models cited in the previous section and for the examples in Section 4. Thus estimation of the asymptotic variance of $\hat\theta_n$ is equivalent to estimation of $I(\theta_0)$, provided $I(\theta_0)$ is nonsingular. Of course, for the problem of estimating $I(\theta_0)$ to be meaningful, we need to first establish (2.2).

The calculation of $I(\theta_0)$ and its central role in asymptotic efficiency theory for semiparametric models have been systematically studied by Begun, Hall, Huang and Wellner (1983), van der Vaart (1991), and BKRW (1993) and the references therein. In the following, we first briefly describe how the information $I(\theta_0)$ is defined in a general semiparametric MLE setting in order to motivate the proposed information estimator.

Let $l(\theta, \phi; x) = \log p(x; \theta, \phi)$ be the log-likelihood for a sample of size one. Consider a parametric smooth submodel with parameter $(\theta, \phi_{(s)})$, where $\phi_{(0)} = \phi$ and

$$\left. \frac{\partial \phi_{(s)}}{\partial s} \right|_{s=0} = h.$$

Let $H$ be the class of functions $h$ defined by this equation. Usually, $H$ is a Hilbert space. The score operator for $\phi$ is

$$\dot l_2(\tau; x)(h) = \left. \frac{\partial}{\partial s}\, l(\theta, \phi_{(s)}; x) \right|_{s=0}. \tag{2.3}$$

Observe that $\dot l_2$ is a linear operator mapping $H$ to $L_2(P_{\theta,\phi})$. So for constants $c_1, c_2$ and $h_1, h_2 \in H$,

$$\dot l_2(\tau; x)(c_1 h_1 + c_2 h_2) = c_1 \dot l_2(\tau; x)(h_1) + c_2 \dot l_2(\tau; x)(h_2). \tag{2.4}$$

The linearity of $\dot l_2$ is crucial to the proposed method. For a $d$-dimensional $\theta$, $\dot l_1(\tau; x)$ is the vector of partial derivatives of $l(\tau; x)$ with respect to $\theta$. For each component of $\dot l_1$, a score operator for $\phi$ is defined as in (2.3). So the score operator for $\phi$ corresponding to $\dot l_1$ is defined as

$$\dot l_2(\tau; x)(\mathbf{h}) \equiv \left( \dot l_2(\tau; x)(h_1), \dots, \dot l_2(\tau; x)(h_d) \right)^T, \tag{2.5}$$

where $\mathbf{h} \equiv (h_1, \dots, h_d)^T$ with $h_k \in H$, $1 \le k \le d$.

Let $\dot{\mathbf{P}}_2$ be the closed linear span of $\{\dot l_2(h) : h \in H\}$. Then the efficient score function for the $k$th component of $\theta$ is $\dot l_{1,k} - \Pi(\dot l_{1,k} \mid \dot{\mathbf{P}}_2)$, where $\dot l_{1,k}$ is the $k$th component of $\dot l_1(\tau; x)$ and $\Pi(\dot l_{1,k} \mid \dot{\mathbf{P}}_2)$ is the projection of $\dot l_{1,k}$ onto $\dot{\mathbf{P}}_2$. Equivalently, $\Pi(\dot l_{1,k} \mid \dot{\mathbf{P}}_2)$ is the minimizer of the squared residual $P[\dot l_{1,k}(\tau_0; X) - \eta]^2$ over $\eta \in \dot{\mathbf{P}}_2$. See, for example, van der Vaart (1991), Section 6, and BKRW (1993), Theorem 1, page 70. In general, $\eta$ may not be a score function for $\phi$; that is, there may not exist an $h \in H$ such that $\eta = \dot l_2(h)$. However, in models with regularity conditions, $\eta$ either is a score function or can be approximated by a score function. So we assume that the efficient score vector for $\theta$ is $\dot l_1(\tau; x) - \dot l_2(\tau; x)(\xi_0)$, where $\xi_0$ is an element of $H^d$ that minimizes

$$\rho(\mathbf{h}) \equiv E\left[ \dot l_1(\tau; X) - \dot l_2(\tau; X)(\mathbf{h}) \right]^{\otimes 2} \tag{2.6}$$

over $H^d$. Here $a^{\otimes 2} = a a^T$, and the minimization is understood componentwise: the $k$th component of $\xi_0$ minimizes $E[\dot l_{1,k}(\tau; X) - \dot l_2(\tau; X)(h_k)]^2$ over $h_k \in H$. The minimizer $\xi_0 = (\xi_{0,1}, \xi_{0,2}, \dots, \xi_{0,d})^T$ is called the least favorable direction. Denote the efficient score by $l^*(\tau; x) \equiv \dot l_1(\tau; x) - \dot l_2(\tau; x)(\xi_0)$. Then the information for $\theta$ is

$$I(\theta) = E\left[ l^*(\tau; X) \right]^{\otimes 2} = E\left[ \dot l_1(\tau; X) - \dot l_2(\tau; X)(\xi_0) \right]^{\otimes 2}. \tag{2.7}$$

Therefore, to estimate $I(\theta)$, it is natural to consider minimizing an empirical version of (2.6). In particular, with the random sample $X_1, \dots, X_n$ and the consistent estimator $\hat\tau_n$, we can estimate $I(\theta)$ by the minimum value of

$$\rho_n(\mathbf{h}) \equiv n^{-1} \sum_{i=1}^n \left[ \dot l_1(\hat\tau_n; X_i) - \dot l_2(\hat\tau_n; X_i)(\mathbf{h}) \right]^{\otimes 2} \tag{2.8}$$

over $H^d$. That is, if $\hat\xi_n$ is a minimizer of $\rho_n$ over $H^d$, then a natural estimator of $I(\theta_0)$ is $\hat I_n \equiv \rho_n(\hat\xi_n)$. Because $\dot l_2$ is a linear operator, this minimization problem is essentially a least-squares nonparametric regression problem, and it can be solved for each component separately. In the next section, we show that, in a class of sieve MLEs, the minimum value $\rho_n(\hat\xi_n)$ is actually the observed information based on the outer product of the first derivatives of the log-likelihood.

In the following, we state a simple proposition which provides sufficient conditions ensuring consistency of the estimated least favorable direction and of $\hat I_n$ as an estimator of $I(\theta_0)$. This proposition appears to be useful in a large class of models and is easy to apply.

As in nonparametric regression, if the space $H$ is too large, minimization over this space may not yield consistent estimators of $\xi_0$ and $I(\theta_0)$. We can use an approximation space $H_n$ (a sieve) which is smaller than $H$ and converges to $H$ as $n$ tends to infinity. Under appropriate conditions, minimization of $\rho_n$ over $H_n$ will yield a consistent estimator of $\xi_0$. On the other hand, if minimizing over $H$ does yield consistent estimators, then $H_n$ can be taken to be $H$.

For simplicity and without loss of generality, it suffices to consider the case of one-dimensional $\theta$. In the following and throughout, we use linear functional notation for integrals. So for any probability measure $Q$, $Qf = \int f \, dQ$ as long as the integral is well defined. Below, $P = P_{\theta_0,\phi_0}$, and $P_n$ is the empirical measure of $X_i$, $1 \le i \le n$, so that $P_n f = n^{-1} \sum_{i=1}^n f(X_i)$.

Proposition 2.1. Denote $\hat\xi_n = \arg\min_{H_n} \rho_n(h)$ and $l(\tau, h; x) = [\dot l_1(\tau; x) - \dot l_2(\tau; x)(h)]^2$. If the class of functions $\mathcal{I} = \{ l(\tau, h; x) : \tau \in \mathcal{T}, h \in H \}$ is Glivenko-Cantelli and $\hat\tau_n \to_p \tau_0$, then

$$\rho_n(\hat\xi_n) \to_p I(\theta_0).$$

The proof is given in Appendix A.
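The regression view of (2.8) can be illustrated with a minimal Python sketch for scalar $\theta$. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy partially linear model $Y = \theta W + \phi(V) + \varepsilon$ with standard normal noise, the polynomial sieve standing in for $H_n$, and the function names. In this toy model the score for $\theta$ is $rW$ and the score operator for $\phi$ is $r\,h(V)$, where $r$ is the residual, so the efficient information is $E[W - E(W \mid V)]^2$, which equals 1 under the simulated design.

```python
import numpy as np


def least_squares_information(score_theta, score_phi_basis):
    """Minimum of rho_n over the span of a finite basis (scalar theta).

    score_theta:     shape (n,), score for theta at the fitted parameters.
    score_phi_basis: shape (n, q), score operator for phi applied to each
                     basis function, at the fitted parameters.
    Returns the mean squared residual and the fitted coefficients, which
    represent the estimated least favorable direction in the basis.
    """
    coef, *_ = np.linalg.lstsq(score_phi_basis, score_theta, rcond=None)
    resid = score_theta - score_phi_basis @ coef
    return float(np.mean(resid ** 2)), coef


def simulate_partially_linear(n, theta0=1.0, seed=0):
    """Toy data (an assumption): Y = theta0*W + cos(pi V) + N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(-1.0, 1.0, n)
    w = np.sin(np.pi * v) + rng.normal(size=n)  # E[W | V] = sin(pi V)
    y = theta0 * w + np.cos(np.pi * v) + rng.normal(size=n)
    return y, w, v


def demo(n=20000, degree=6):
    y, w, v = simulate_partially_linear(n)
    basis = np.vander(v, degree + 1, increasing=True)  # 1, v, ..., v^degree
    design = np.column_stack([w, basis])
    # With Gaussian noise of known variance, the sieve MLE of
    # (theta, basis coefficients) is ordinary least squares.
    fit, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ fit
    # Response: score for theta; covariates: scores for the basis functions.
    info_hat, _ = least_squares_information(r * w, r[:, None] * basis)
    return float(fit[0]), info_hat


if __name__ == "__main__":
    theta_hat, info_hat = demo()
    print("theta_hat:", theta_hat, "estimated information:", info_hat)
```

Under these assumptions the estimated information should be close to 1, so `1 / (n * info_hat)` would serve as a variance estimate for the fitted coefficient of `w`.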

3. Observed Information in Sieve MLE

We now apply the method described in Section 2 to a class of sieve MLEs. We show that, if the parameter space $\Phi$ and the space $H$ can be approximated by a common approximation space, and if this space has a basis, then the least-squares calculation in Section 2 yields the observed information matrix. In other words, computation of the observed information matrix is equivalent to solving the least-squares problem of Section 2. So there is no need to actually carry out the least-squares computation when the observed information can be computed as in the ordinary setting of parametric estimation. This is computationally convenient, because the observed information matrix is based on either the first derivatives or the second derivatives of the log-likelihood function, and these derivatives are often already computed in a numerical algorithm for computing the MLE of unknown regression parameters. On the other hand, for problems in which direct computation of the observed information matrix is difficult, one can instead solve the least-squares nonparametric regression problem to obtain the observed information matrix. These nonparametric regression problems can be solved by using standard least-squares fitting programs for linear regression.

As in finite-dimensional parametric models, some regularity conditions are required for the MLE $\hat\theta_n$ to be root-$n$ consistent and asymptotically normal. These regularity conditions usually include certain smoothness assumptions on the infinite-dimensional parameter $\phi$ and the underlying probability model. Consequently, the least favorable direction will be a smooth function, such as a bounded Lipschitz function. Then we can take $H$ to be the class of such smooth functions. By approximation theory, many spaces designed for efficient computation can be used to approximate any element of $H$ with arbitrary precision under an appropriately defined distance. For example, we may use the space of polynomial spline functions (Schumaker, 1981). This class not only has good approximation power, but is also computationally convenient. We will use this approximation space in the next section.

Let $\Phi_n$ be an approximation space for both $\Phi$ and $H$. Suppose it has a set of basis functions $B_n = (b_1, \dots, b_{q_n})^T$, such that every $\phi \in \Phi_n$ can be represented as $\phi = \sum_{j=1}^{q_n} \beta_j b_j \equiv B_n^T \beta$, where $\beta = (\beta_1, \dots, \beta_{q_n})^T \in \mathcal{B}_n \subseteq R^{q_n}$ is a vector of real numbers. So every $\phi \in \Phi_n$ can be identified with a vector $\beta \in \mathcal{B}_n$. Here the dimension $q_n$ is a positive integer depending on the sample size $n$. To ensure consistency of $\hat\tau_n$, we need $q_n \to \infty$ as $n \to \infty$. In general, for $\hat\theta_n$ to be asymptotically normal, we need to control the growth rate of $q_n$ appropriately.

The sieve MLE of $\tau_0 = (\theta_0, \phi_0)$ is defined to be the $(\hat\theta_n, \hat\phi_n)$ that maximizes the log-likelihood $l_n(\theta, \phi)$ over $\Theta \times \Phi_n$. Equivalently, one can find $(\hat\theta_n, \hat\beta_n)$ that maximizes $l_n(\theta, B_n^T \beta)$ over $\Theta \times \mathcal{B}_n$. Then $\hat\phi_n = B_n^T \hat\beta_n$.

Now consider estimation of $I(\theta_0)$. First we introduce some convenient notation. Denote

$$\dot l_2(\hat\tau_n; x)(B_n) = \left( \dot l_2(\hat\tau_n; x)(b_1), \dots, \dot l_2(\hat\tau_n; x)(b_{q_n}) \right)^T,$$

and

$$A_{11} = P_n \left( \dot l_1(\hat\tau_n; X) \right)^{\otimes 2}, \quad A_{12} = P_n\, \dot l_1(\hat\tau_n; X)\, \dot l_2^T(\hat\tau_n; X)(B_n), \quad A_{21} = A_{12}^T, \quad A_{22} = P_n \left( \dot l_2(\hat\tau_n; X)(B_n) \right)^{\otimes 2},$$

where $a^{\otimes 2} = a a^T$. The outer product version of the observed information for $\theta$ is given by

$$\hat O_n = A_{11} - A_{12} A_{22}^{-} A_{21}. \tag{3.1}$$

Here for any matrix $A$, $A^{-}$ denotes its generalized inverse. Although $A_{22}^{-}$ may not be unique, $A_{12} A_{22}^{-} A_{21}$ is unique by the results on the generalized inverse; see, for example, Rao (1973), Chapter 1, result 1b.5 (vii), page 26. The generalized inverse is used in (3.1) for generality. When the nonparametric component $\phi$ is a smooth function, for any fixed sample size the sieve MLE is obtained over a finite-dimensional approximation space whose (theoretical) dimension is $O(n^{\nu})$, where $\nu$ is typically less than 1/2; in this case $A_{22}$ is usually invertible.

We now show that $\hat O_n$ equals $\hat I_n$, which is the minimum value of

$$\rho_n(\mathbf{h}) = P_n \left[ \dot l_1(\hat\tau_n; X) - \dot l_2(\hat\tau_n; X)(\mathbf{h}) \right]^{\otimes 2}$$

over $\mathbf{h} = (h_1, h_2, \dots, h_d)^T$ with each $h_j \in \Phi_n$. Write $h_j = B_n^T c_j$ for $j = 1, 2, \dots, d$. Let $Y_{i,j} = \dot l_{1,j}(\hat\tau_n; X_i)$, the $j$th component of $Y_i = \dot l_1(\hat\tau_n; X_i)$, and $Z_i = \dot l_2(\hat\tau_n; X_i)(B_n)$. This minimization problem becomes a least-squares problem of finding $\hat c_n = (\hat c_{n,1}, \hat c_{n,2}, \dots, \hat c_{n,d})$ that minimizes

$$n^{-1} \sum_{i=1}^n \sum_{j=1}^d \left( Y_{i,j} - Z_i^T c_j \right)^2.$$

By standard least-squares calculation,

$$\hat c_{n,j} = \left( \sum_{i=1}^n Z_i Z_i^T \right)^{-} \left( \sum_{i=1}^n Z_i Y_{i,j} \right)$$

for $j = 1, 2, \dots, d$. Hence, we have

$$
\begin{aligned}
\hat I_n = \rho_n(\hat\xi_n)
&= P_n \left[ \dot l_1(\hat\tau_n; X) - \dot l_2(\hat\tau_n; X)(\hat\xi_n) \right]^{\otimes 2} \\
&= P_n \left[ \dot l_1(\hat\tau_n; X) - (\hat c_{n,1}, \hat c_{n,2}, \dots, \hat c_{n,d})^T\, \dot l_2(\hat\tau_n; X)(B_n) \right]^{\otimes 2} \\
&= P_n \left[ \dot l_1(\hat\tau_n; X) - \left( \sum_{i=1}^n Y_i Z_i^T \right) \left( \sum_{i=1}^n Z_i Z_i^T \right)^{-} \dot l_2(\hat\tau_n; X)(B_n) \right]^{\otimes 2} \\
&= P_n \left[ \dot l_1(\hat\tau_n; X) - A_{12} A_{22}^{-}\, \dot l_2(\hat\tau_n; X)(B_n) \right]^{\otimes 2} \\
&= A_{11} - A_{12} A_{22}^{-} A_{21} - A_{12} A_{22}^{-} A_{21} + A_{12} A_{22}^{-} A_{22} A_{22}^{-} A_{21} \\
&= A_{11} - A_{12} A_{22}^{-} A_{21} = \hat O_n.
\end{aligned}
$$

We summarize the above calculation in the following proposition.

Proposition 3.1. Assume that there exist $\beta_n \in \mathcal{B}_n$ and $c_{n,j} \in R^{q_n}$ for $j = 1, 2, \dots, d$ such that

$$\| B_n^T \beta_n - \phi_0 \|_\Phi = O(k_{1n}^{-1}) \quad \text{and} \quad \| B_n^T c_{n,j} - \xi_{0,j} \|_\Phi = O(k_{2n}^{-1}), \quad j = 1, 2, \dots, d, \tag{3.2}$$

where $k_{1n}$ and $k_{2n}$ are two sequences of numbers satisfying $k_{1n} \to \infty$ and $k_{2n} \to \infty$ as $n \to \infty$. Suppose that the conditions of Proposition 2.1 are satisfied. Then the observed information matrix $\hat O_n$ defined in (3.1) is a consistent estimator of $I(\theta_0)$.

4. Examples

In this section, we illustrate the proposed method in two semiparametric regression models: the Poisson proportional mean model for panel count data studied in Wellner and Zhang (2007) and Cox's (1972) model for interval-censored data studied in Huang and Wellner (1995). In these examples, the parameter space $\Phi$ will be the space of smooth functions defined below. The sieve $\Phi_n$ is the space of polynomial splines. Polynomial splines have been used in many fully nonparametric regression models; see, for example, Stone (1985, 1986).
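Returning to the identity $\hat O_n = \hat I_n$ established before Proposition 3.1: it is purely algebraic, so it can be checked numerically on arbitrary arrays. The sketch below is only an illustration; the function names are hypothetical, the rows of `Y` play the role of $\dot l_1(\hat\tau_n; X_i)$ and the rows of `Z` the role of $\dot l_2(\hat\tau_n; X_i)(B_n)$, and the Moore-Penrose pseudoinverse is used as one particular choice of generalized inverse.

```python
import numpy as np


def outer_product_information(Y, Z, rcond=1e-10):
    """A11 - A12 A22^- A21 as in (3.1), with the pseudoinverse as A22^-.

    Y: shape (n, d), rows are scores for theta.
    Z: shape (n, q), rows are scores for phi along the basis functions.
    """
    n = Y.shape[0]
    A11 = Y.T @ Y / n
    A12 = Y.T @ Z / n
    A22 = Z.T @ Z / n
    return A11 - A12 @ np.linalg.pinv(A22, rcond=rcond) @ A12.T


def regression_information(Y, Z, rcond=1e-10):
    """Average outer product of residuals from regressing each column of Y on Z."""
    C, *_ = np.linalg.lstsq(Z, Y, rcond=rcond)  # C has shape (q, d)
    R = Y - Z @ C
    return R.T @ R / Y.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = rng.normal(size=(200, 2))
    Z = rng.normal(size=(200, 4))
    gap = outer_product_information(Y, Z) - regression_information(Y, Z)
    print("largest absolute difference:", np.abs(gap).max())
```

Appending a duplicated column to `Z` makes $A_{22}$ singular but leaves both quantities unchanged, which is a numerical counterpart of the uniqueness of $A_{12} A_{22}^{-} A_{21}$.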

Let $a = d_0 < d_1 < \dots < d_{K_n} < d_{K_n+1} = b$ be a partition of $[a, b]$ into $K_n + 1$ subintervals $I_{Kt} = [d_t, d_{t+1})$, $t = 0, \dots, K - 1$, and $I_{KK} = [d_K, d_{K+1}]$, where $K \equiv K_n \approx n^{\nu}$ is a positive integer such that $\max_{1 \le k \le K+1} |d_k - d_{k-1}| = O(n^{-\nu})$. Denote the set of partition points by $D_n = \{d_1, \dots, d_{K_n}\}$. Let $S_n(D_n, K_n, m)$ be the space of polynomial splines of order $m \ge 1$ consisting of functions $s$ satisfying: (i) the restriction of $s$ to $I_{Kt}$ is a polynomial of order $m$ for $m \le K$; (ii) for $m \ge 2$ and $0 \le m' \le m - 2$, $s$ is $m'$ times continuously differentiable on $[a, b]$. This definition is phrased after Stone (1985), and is a descriptive version of Schumaker (1981), page 108, Definition 4.1. According to Schumaker (1981), page 117, Corollary 4.10, there exists a local basis $B_n \equiv \{b_t, 1 \le t \le q_n\}$, the so-called B-splines, for $S_n(D_n, K_n, m)$, where $q_n \equiv K_n + m$. These basis functions are nonnegative and sum to one at each point in $[a, b]$, and each $b_t$ is zero outside the interval $[d_t, d_{t+m}]$.

4.1. Poisson Proportional Mean Model for Panel Count Data

Let $\{N(t) : t \ge 0\}$ be a univariate counting process. $K$ is the total number of observations on the counting process and $T = (T_{K,1}, \dots, T_{K,K})$ is a sequence of random observation times with $0 < T_{K,1} < \dots < T_{K,K}$. The counting process is only observed at those times, with the cumulative events denoted by $\mathbb{N} = \{N(T_{K,1}), \dots, N(T_{K,K})\}$. This type of data is referred to as panel count data by Sun and Kalbfleisch (1995). In this manuscript, we assume that $(K, T)$ is conditionally independent of $\mathbb{N}$ given a vector of covariates $Z$, and we denote the observed data, consisting of independent and identically distributed observations, by $X_1, \dots, X_n$, where $X_i = (K_i, T^{(i)}, \mathbb{N}^{(i)}, Z_i)$ with $T^{(i)} = (T^{(i)}_{K_i,1}, \dots, T^{(i)}_{K_i,K_i})$ and $\mathbb{N}^{(i)} = (N^{(i)}(T^{(i)}_{K_i,1}), \dots, N^{(i)}(T^{(i)}_{K_i,K_i}))$, for $i = 1, \dots, n$.

Panel count data are often seen in clinical trials and in social demographic and industrial reliability studies. Sun and Wei (2000) and Zhang (2002) proposed the proportional mean model

$$\Lambda(t \mid Z) = \Lambda_0(t) \exp(\theta_0^T Z) \tag{4.1}$$

to analyze panel count data semiparametrically, where $\Lambda(t \mid Z) = E(N(t) \mid Z)$ is the expected cumulative number of events observed at time $t$, conditional on $Z$, with the true baseline mean function given by $\Lambda_0(t)$. Wellner and Zhang (2007) proposed a nonhomogeneous Poisson process with the conditional mean function given by (4.1) to study the MLE of $\tau_0 = (\theta_0, \Lambda_0(t))$. The log-likelihood for $(\theta, \Lambda(t))$ under the Poisson proportional mean model is given by

$$l_n(\theta, \Lambda) = \sum_{i=1}^n \sum_{j=1}^{K_i} \left[ \Delta N^{(i)}_{K_i,j} \log \Delta\Lambda_{K_i,j} + \Delta N^{(i)}_{K_i,j}\, \theta^T Z_i - \exp(\theta^T Z_i)\, \Delta\Lambda_{K_i,j} \right],$$

where $\Delta N^{(i)}_{K_i,j} = N^{(i)}(T^{(i)}_{K_i,j}) - N^{(i)}(T^{(i)}_{K_i,j-1})$ and $\Delta\Lambda_{K_i,j} = \Lambda(T^{(i)}_{K_i,j}) - \Lambda(T^{(i)}_{K_i,j-1})$, for $j = 1, 2, \dots, K_i$.

To study the asymptotic properties of the MLE, Wellner and Zhang (2007) defined the following $L_2$-norm:

$$d(\tau_1, \tau_2) = \left\{ \|\theta_1 - \theta_2\|^2 + \int |\Lambda_1(t) - \Lambda_2(t)|^2 \, d\mu_1(t) \right\}^{1/2}$$

with

$$\mu_1(t) = \int_{R^d} \sum_{k=1}^{\infty} P(K = k \mid Z = z) \sum_{j=1}^{k} P(T_{K,j} \le t \mid K = k, Z = z) \, dF(z).$$

They showed that, under some mild regularity conditions, the semiparametric MLE $\hat\tau_n = (\hat\theta_n, \hat\Lambda_n)$ converges to the true parameter $\tau_0 = (\theta_0, \Lambda_0)$ at a rate slower than $n^{1/2}$, namely $d(\hat\tau_n, \tau_0) = O_p(n^{-1/3})$. However, the MLE of $\theta_0$ is still semiparametric efficient, that is,

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \to_d N\left(0, I^{-1}(\theta_0)\right)$$

with the Fisher information matrix given by

$$I(\theta_0) = E\left\{ \sum_{j=1}^{K} \Delta\Lambda_0(T_{K,j})\, e^{\theta_0^T Z} \left[ Z - \frac{E(Z e^{\theta_0^T Z} \mid K, T_{K,j-1}, T_{K,j})}{E(e^{\theta_0^T Z} \mid K, T_{K,j-1}, T_{K,j})} \right]^{\otimes 2} \right\},$$

where $\Delta\Lambda_0(T_{K,j}) = \Lambda_0(T_{K,j}) - \Lambda_0(T_{K,j-1})$.

The computation of the semiparametric MLE is very time consuming, as it requires the joint estimation of $\theta_0$ and the infinite-dimensional parameter $\Lambda_0$. Although the Fisher information has a nice explicit form, there is no easy method available to calculate the observed information.

Lu (2007) studied the sieve MLE for the above semiparametric Poisson model using monotone polynomial splines. Let $B_n = (b_1, \dots, b_{q_n})^T$ be the basis of B-splines defined earlier. The monotone polynomial spline space is defined to be

$$M_n(D_n, K_n, m) = \left\{ \phi_n : \phi_n(t) = \sum_{j=1}^{q_n} \beta_j b_j(t) \in S_n(D_n, K_n, m),\ \beta \in \mathcal{B}_n,\ t \in [a, b] \right\},$$

where $\mathcal{B}_n = \{\beta : \beta_1 \le \beta_2 \le \dots \le \beta_{q_n}\}$. Each element of $M_n(D_n, K_n, m)$ is a nondecreasing function because of the monotonicity constraints on $\beta_1, \dots, \beta_{q_n}$. This fact is a consequence of the variation diminishing properties of B-splines; see, for instance, Schumaker (1981), Example 4.75 and Theorem 4.76. Replacing $\Lambda(t)$ by $\exp\left(\sum_{l=1}^{q_n} \beta_l b_l(t)\right)$ in the likelihood above, Lu (2007) solved the constrained optimization problem over the space $\Theta \times \mathcal{B}_n$. It turns out that, compared to Wellner and Zhang's method, the B-splines sieve MLE is much less computationally demanding (Lu, Zhang and Huang, 2008) and has a better overall convergence rate. In addition, the sieve MLE of $\theta$ is still semiparametric efficient with the same Fisher information matrix $I(\theta_0)$. Because of the computational advantage of the B-splines sieve MLE method, Lu, Zhang and Huang (2008) implemented the bootstrap procedure to estimate $I(\theta_0)$ consistently.

Straightforward algebra leads to

$$\dot l_1(\tau; x) - \dot l_2(\tau; x)(h) = \sum_{j=1}^{K} \left( \Delta N_{K,j} - \exp(\theta^T Z)\, \Delta\Lambda_{K,j} \right) \left( Z - \frac{\Delta h_{K,j}}{\Delta\Lambda_{K,j}} \right),$$

where $\Delta h_{K,j}$ is defined analogously to $\Delta\Lambda_{K,j}$. Under the regularity conditions given in Wellner and Zhang (2007), following the empirical process arguments used in Lu (2007) for monotone B-splines estimation, it can be shown that the class $\{ \dot l_1(\tau; x) - \dot l_2(\tau; x)(h) : \tau \in \mathcal{T}, h \in H \}$ is Glivenko-Cantelli, by showing that the bracketing number of this class with respect to the $L_1(P)$ norm is bounded. Based on the given regularity conditions, this implies that $\{ (\dot l_1(\tau; x) - \dot l_2(\tau; x)(h))^2 : \tau \in \mathcal{T}, h \in H \}$ is Glivenko-Cantelli as well. Moreover, using monotone cubic B-splines in the sieve estimation automatically gives $\| B_n^T \beta_n - \phi_0 \|_\Phi = O(n^{-p\nu})$ and $\| B_n^T c_n - \xi_0 \|_\Phi = O(n^{-p\nu})$, due to Corollary 6.20 of Schumaker (1981). Hence the Fisher information $I(\theta_0)$ in the semiparametric proportional mean model for panel count data can be consistently estimated using the proposed least-squares approach.
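As a concrete reading of the Poisson proportional mean log-likelihood of this subsection, the following sketch evaluates the contribution of a single subject. It is an illustration under stated assumptions rather than the authors' implementation: the function name and argument layout are hypothetical, and the conventions $N(0) = 0$ and $\Lambda(0) = 0$ (so that the first increment starts from zero) are assumed here because the text does not spell them out.

```python
import numpy as np


def panel_count_loglik(theta, z, counts, mean_at_times):
    """One subject's contribution to l_n(theta, Lambda).

    theta, z:       regression coefficient and covariate vector (same length).
    counts:         cumulative counts N(T_1), ..., N(T_K).
    mean_at_times:  Lambda(T_1), ..., Lambda(T_K), positive and strictly increasing.
    Assumes N(0) = 0 and Lambda(0) = 0 (an assumption of this sketch).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    z = np.atleast_1d(np.asarray(z, dtype=float))
    d_counts = np.diff(np.asarray(counts, dtype=float), prepend=0.0)
    d_mean = np.diff(np.asarray(mean_at_times, dtype=float), prepend=0.0)
    if np.any(d_mean <= 0.0):
        raise ValueError("mean function must be strictly increasing and positive")
    lin = float(theta @ z)
    # Sum over panels of: dN * log(dLambda) + dN * theta'z - exp(theta'z) * dLambda
    terms = d_counts * np.log(d_mean) + d_counts * lin - np.exp(lin) * d_mean
    return float(np.sum(terms))


if __name__ == "__main__":
    # Two observation times, cumulative counts 1 and 3, Lambda = 1 and 2.
    print(panel_count_loglik([0.0], [1.0], [1, 3], [1.0, 2.0]))
```

Summing this function over subjects gives $l_n(\theta, \Lambda)$; in a sieve fit, `mean_at_times` would be computed from the spline coefficients rather than supplied directly.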

18 4.2. Cox Model for Interval-Censored Data Interval-censored data occur very frequently in long-term follow-up study for an event time of interest. The exact event time T is not observable; it is only known with certainty that T is bracketed between two adjacent examination times, or occurs before the first or after the last follow-up examination. Let (L, R) be the pair of examination times bracketing the event time T. That is, L is the last examination time before and R is the first examination time after the event. If 0 < L < R <, then T is interval-censored. If the event occurs before the first examination, then T is left-censored. If the event has not occurred after last examination, then T is right-censored. Such data is called case 2 interval-censored data. Nonparametric estimation of a distribution function and its smooth functionals with interval-censored data have been studied by Groeneboom and Wellner (1992), Groeneboom (1996), and Geskus and Groeneboom (1996). In this example, we consider the B-splines sieve MLE of the Cox proportional hazards model for the interval-censored data. With the proportional hazards model, the conditional hazard of T given a covariate vector Z R d is proportional to the baseline hazard. In terms of cumulative hazard functions, this model is (4.2) Λ(t z) = Λ 0 (t)e θ 0 z, where θ 0 is a d-dimensional regression parameter and Λ 0 is the unspecified baseline cumulative hazard function. Denote the two censoring variables by U and V, where P (U V ) = 1. Let G be the joint distribution function of (U, V ). Let δ 1 = 1 [T U], δ 2 = 1 [U<T V ] and δ 3 = 1 δ 1 δ 2. Assume that conditional on Z, T is independent of (U, V ), and that the joint distribution of 18

19 (U, V ) and Z does not depend on the parameters of interest. Then the density function of X (δ 1, δ 2, U, V, Z) with respect to the product of the counting measure on {0, 1} {0, 1} and the probability measure induced by the distribution of (Z, U, V ), is p(x; θ, Λ) = (1 exp( Λ(u)e θ z )) δ 1 (exp( Λ(u)e θ z ) exp( Λ(v)e θ z )) δ 2 (exp( Λ(v)e θ z ) δ 3. Let φ = log Λ. we reparametrize this log-likelihood in terms of (θ, φ). The resulting loglikelihood for a sample of size one is, up to an additive term not dependent on (θ, φ) is l(θ, φ; x) = log p(x, θ, φ) = δ 1 log(1 exp( e z θ+φ(u) )) + δ 2 log(exp( e z θ+φ(u) ) exp( e z θ+φ(v) )) δ 3 e z θ+φ(v). Let X = (X 1, X 2,, X n ) with X i = (δ 1i, δ 2i, U i, V i, Z i ), for 1 i n being a random sample with the same distribution as X = (δ 1, δ 2, U, V, Z). The log-likelihood for this random sample is l n (θ, φ) = n l(θ, φ; X i ). i=1 Because φ is a nondecreasing function, it is desirable to restrict its estimator to be also nondecreasing. Therefore, we seek an estimate of φ in the space of M n. (The abbreviation of M n (D n, K n, m)) The B-splines sieve MLE of τ 0 = (θ 0, φ 0 ) is the τ n = ( θ n, φ n ) that maximizes l n (θ, φ) over Θ M n. This is equivalent to maximizing l n (θ, B nβ) over Θ B n. No restriction will be placed on Θ. Thus Θ can be taken to be R d. As the same as in the model for panel count data, the B-splines sieve MLE is much easy to compute than the semiparametric MLE studied in Huang and Wellner (1995). In 19

20 addition, we can show that under some mild regularity conditions, the sieve MLE of θ, θ n, achieves the semiparametric efficiency as well, with the information given by I(θ 0 ) = P (l (θ 0, φ 0 ; X) 2 = P ( l1 (θ 0, φ 0 ; X) l ) 2 2 (θ 0, φ 0 ; X)(ξ 0 ), where ξ 0 (t) is the solution of a Fredholm integral equation of the second kind, ξ 0 (t) K(t, x)ξ 0 (x)dx = d(t) with two complicate functions K(t, x) and d(t) described in Huang and Wellner (1995). It is obvious that a direct estimation of the information matrix is impossible for this model. However, with the B-splines sieve MLE approach, variance of θ n can be readily estimated using the observed information matrix defined in (3.1) due to Propositions 2.1 and 3.1. For the asymptotic normality of θ n and consistency of the inverse observed information matrices, the following conditions are assumed. (C1) (a) E(ZZ ) is nonsingular; (b) Z is bounded, that is, there exists z 0 > 0 such that P ( Z z 0 ) = 1. (C2) (a) There exists a positive number η such that P (V U η) = 1; (b) the union of the supports of U and V is contained in an interval [a, b], where 0 < a < b <, and 0 < Λ 0 (a) < Λ 0 (b) <. (C3) Λ 0 belongs to Φ, a class of functions with bounded pth derivative in [a, b] for p 1 and the first derivative of Λ 0 is strictly positive and continuous on [a, b]. (C4) The conditional density g(u, v z) of (U, V ) given Z has bounded partial derivatives with respect to (u, v). The bounds of these partial derivatives do not depend on (u, v, z). 20

(C5) For some $\kappa \in (0, 1)$, $a^T \mathrm{var}(Z \mid U)a \ge \kappa\, a^T E(ZZ^T \mid U)a$ and $a^T \mathrm{var}(Z \mid V)a \ge \kappa\, a^T E(ZZ^T \mid V)a$ a.s. for all $a \in R^d$.

It should be noted that, in applications, implementing the proposed estimation method does not require these conditions to be satisfied. They are sufficient, but may not be necessary, to prove the following asymptotic theorem. Some conditions may be weakened, but doing so would make the proof considerably more difficult. From a practical point of view, however, these conditions are usually satisfied.

To study the asymptotic properties, we introduce a metric defined as follows: for any $\phi_1, \phi_2 \in \Phi$, define

$$\|\phi_1 - \phi_2\|_\Phi^2 = E[\phi_1(U) - \phi_2(U)]^2 + E[\phi_1(V) - \phi_2(V)]^2,$$

and for any $\tau_1 = (\theta_1, \phi_1)$ and $\tau_2 = (\theta_2, \phi_2)$ in the space $\mathcal{T} = \Theta \times \Phi$, define

$$d(\tau_1, \tau_2) = \|\tau_1 - \tau_2\|_{\mathcal{T}} = \left\{\|\theta_1 - \theta_2\|^2 + \|\phi_1 - \phi_2\|_\Phi^2\right\}^{1/2}.$$

Theorem 4.1. Let $K_n = O(n^{\nu})$, where $\nu$ satisfies the restriction $\frac{1}{2(1+p)} < \nu < \frac{1}{2p}$. Suppose that $T$ and $(U, V)$ are conditionally independent given $Z$ and that the distribution of $(U, V, Z)$ does not involve $(\theta, \Lambda)$. Furthermore, suppose that conditions (C1)-(C5) hold. Then

(i) $d(\hat\tau_n, \tau_0) \to_p 0$.

(ii) $d(\hat\tau_n, \tau_0) = O_p\left(n^{-\min(p\nu, (1-\nu)/2)}\right)$. Thus if $\nu = 1/(1 + 2p)$, $d(\hat\tau_n, \tau_0) = O_p(n^{-p/(1+2p)})$. This is the optimal rate of convergence in nonparametric regression with comparable smoothness assumptions.

(iii) $n^{1/2}(\hat\theta_n - \theta_0) = n^{-1/2} I^{-1}(\theta_0) \sum_{i=1}^{n} l^*(\theta_0, \phi_0; X_i) + o_p(1) \to_d N(0, I^{-1}(\theta_0))$. Thus $\hat\theta_n$ is asymptotically normal and efficient.

(iv) The inverse observed information matrix is a consistent estimator of $I^{-1}(\theta_0)$, the asymptotic variance of $n^{1/2}(\hat\theta_n - \theta_0)$.

The proof of this theorem is long; we provide a sketch of it in Appendix A.

5. Numerical Results

5.1. Simulation Studies

In this section, we conduct simulation studies for the examples discussed in the preceding section to evaluate the finite sample performance of the proposed estimator. In each example, we estimate the unknown parameters using cubic B-splines sieve maximum likelihood estimation, and we estimate the standard errors of the regression parameter estimates using the proposed least-squares method, also based on cubic B-splines. For the B-splines sieve, the number of knots is chosen as $K_n = [N^{1/3}]$, the largest integer below $N^{1/3}$, where $N$ is the number of distinct observation time points in the data, and the knots are evenly placed in $(0, 1)$ in the first example and in $(0, 5)$ in the second example.

Simulation 1: Panel Count Data. We generate the data with the setting given in Wellner and Zhang (2007). For each subject, we independently generate $X_i = (Z_i, K_i, T^{(i)}_{K_i}, N^{(i)}_{K_i})$, for $i = 1, 2, \dots, n$, where $Z_i = (Z_{i,1}, Z_{i,2}, Z_{i,3})$ with $Z_{i,1} \sim \mathrm{Unif}(0, 1)$, $Z_{i,2} \sim N(0, 1)$, and $Z_{i,3} \sim \mathrm{Bernoulli}(0.5)$; $K_i$ is sampled randomly from the discrete set $\{1, 2, 3, 4, 5, 6\}$; given $K_i$, $T^{(i)}_{K_i} = (T^{(i)}_{K_i,1}, T^{(i)}_{K_i,2}, \dots, T^{(i)}_{K_i,K_i})$ are the order statistics of

$K_i$ random draws from $\mathrm{Unif}(0, 1)$; the panel counts $N^{(i)}_{K_i} = (N^{(i)}_{K_i,1}, N^{(i)}_{K_i,2}, \dots, N^{(i)}_{K_i,K_i})$ are generated from the Poisson process with the conditional mean function given by $\Lambda(t \mid Z_i) = 2t \exp(\theta_0^T Z_i)$ with $\theta_0 = (-1.0, 0.5, 1.5)^T$.

We conduct the simulation study for $n = 50$, 100 and 200. In each case, we perform a Monte-Carlo study with 1000 repetitions. Table 1 displays the estimation bias (Bias), the Monte-Carlo standard deviation (M-C sd), the average of the standard errors computed using the proposed method (ASE), and the coverage probability of the 95% Wald confidence interval using the proposed estimator of the standard error (95%-CP). The results show that the B-splines sieve MLE performs quite well. Its bias is very small, and the estimation variability appears to decrease as sample size increases. The proposed method tends to overestimate the standard error slightly, but the overestimation lessens as sample size increases. As a result of this overestimation, the coverage probability of the 95% confidence interval exceeds the nominal value when the sample size is 50 or 100, but it gets closer to 95% as the sample size increases to 200.

Table 1: Monte-Carlo simulation results of the B-splines sieve MLE of $\theta_0$ with 1000 repetitions for semiparametric analysis of panel count data

          n=50                       n=100                      n=200
          θ_{0,1}  θ_{0,2}  θ_{0,3}  θ_{0,1}  θ_{0,2}  θ_{0,3}  θ_{0,1}  θ_{0,2}  θ_{0,3}
Bias
M-C sd
ASE
95%-CP    98.4%    98.5%    97.5%    97.6%    97.9%    96.5%    96.2%    95.1%    96.6%

Simulation 2: Interval-Censored Data. We generate the data in a way similar to

what was used in Huang and Rossini (1997). For each subject, we independently generate $X_i = (U_i, V_i, \delta_{i,1}, \delta_{i,2}, Z_i)$, for $i = 1, 2, \dots, n$, where $Z_i \sim \mathrm{Bernoulli}(0.5)$. We simulate a series of examination times as the partial sums of interarrival times that are independently and identically distributed according to exp(1); then $U_i$ is the last examination time within 5 at which the event has not yet occurred, and $V_i$ is the first observation time within 5 at which the event has occurred. The event time is generated according to the Cox proportional hazards model $\Lambda(t \mid z) = t \exp(z)$, for which the true parameters are $\theta_0 = 1$ and $\Lambda_0(t) = t$. As before, a Monte-Carlo study with 1000 repetitions is performed, and the corresponding results are displayed in Table 2.

Table 2: Monte-Carlo simulation results for the B-splines sieve MLE of $\theta_0$ with 1000 repetitions for semiparametric analysis of interval-censored data

          n=50     n=100    n=200
Bias
M-C sd
ASE
95%-CP    97.4%    96.4%    95.6%

The results show that both the bias and the Monte-Carlo standard deviation decrease as sample size increases. As observed earlier, the proposed least-squares estimate of the standard error may overestimate the true value, but the overestimation lessens as the sample size increases to 200. With sample size 200, the Wald 95% confidence interval achieves the desired coverage probability.
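As a rough illustration of the data-generating scheme described for Simulation 2, and of how an empirical coverage probability such as those in Table 2 can be tallied, the following Python sketch may help. It is not the authors' code: the function names, the seed handling, and the treatment of boundary cases (left endpoint 0 when no examination precedes the event, right endpoint infinity when no examination within 5 follows it) are assumptions made here for illustration. The fitting step itself (the B-splines sieve MLE and the least-squares standard error) is not included.

```python
import math
import random


def simulate_subject(rng, theta0=1.0, horizon=5.0):
    """Draw one interval-censored observation (left, right, z)."""
    z = 1 if rng.random() < 0.5 else 0  # Z ~ Bernoulli(0.5)
    # Cumulative hazard t * exp(theta0 * z) means the event time is
    # exponential with rate exp(theta0 * z).
    event_time = rng.expovariate(math.exp(theta0 * z))

    # Examination times: partial sums of Exp(1) gaps, kept within the horizon.
    exams = []
    t = rng.expovariate(1.0)
    while t <= horizon:
        exams.append(t)
        t += rng.expovariate(1.0)

    # Assumed boundary convention (not stated in the text): left = 0 if no
    # examination precedes the event, right = inf if none follows it.
    left = max((s for s in exams if s < event_time), default=0.0)
    right = min((s for s in exams if s >= event_time), default=math.inf)
    return left, right, z


def simulate_sample(n, seed=0):
    """Draw n independent subjects with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_subject(rng) for _ in range(n)]


def wald_interval(estimate, std_error, z_crit=1.959964):
    """Two-sided 95% Wald interval: estimate +/- z_crit * std_error."""
    return estimate - z_crit * std_error, estimate + z_crit * std_error


def coverage(estimates, std_errors, truth):
    """Fraction of replications whose Wald interval contains the truth."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = wald_interval(est, se)
        if lower <= truth <= upper:
            hits += 1
    return hits / len(estimates)
```

In a full replication study one would call `simulate_sample(n, seed)` once per repetition, fit the model to obtain an estimate and its standard error, and pass the collected values to `coverage(..., truth=1.0)`.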

5.2. Applications

This section illustrates the method in two real-life examples: the bladder tumor randomized clinical trial conducted by Byar, Blackard and VACURG (1977) and the breast cosmesis trial described by Finkelstein and Wolfe (1985). We adopt the cubic B-splines sieve semiparametric MLE method, and we estimate the asymptotic standard errors of the regression parameter estimates using the least-squares approach, also with cubic B-splines. Inference is based on the asymptotic theorem developed in this paper.

Example 1: Bladder Tumor Study. The data set of the bladder tumor randomized clinical trial conducted by the Veterans Administration Cooperative Urological Research Group (Byar, Blackard and VACURG, 1977) is extracted from Andrews and Herzberg (1985, p ). In this study, a randomized clinical trial of three treatments (placebo, pyridoxine pill and thiotepa instillation into the bladder) was conducted for patients with superficial bladder tumor when entering the trial (a total of 116 subjects: 40 were randomized to placebo, 31 to pyridoxine pill and 38 to thiotepa instillation). At each follow-up visit, tumors were counted, measured and then removed. The treatments as originally assigned continued after each visit. The number of follow-up visits and the follow-up times varied greatly from patient to patient, and hence the observation of bladder tumor counts in this study falls within the framework of panel count data as described in Section 4.

For this trial, the treatment effects, particularly that of the thiotepa instillation method, on reducing bladder tumor recurrence have been the focal point of interest in many studies; see, for example, Wei, Lin and Weissfeld (1989), Sun and Wei (2000), Zhang (2002) and Wellner and Zhang (2007). In this paper, we study the proportional mean model as proposed by Wellner and Zhang (2007),

$$E\{N(t) \mid Z\} = \Lambda_0(t) \exp(\theta_{0,1} Z_1 + \theta_{0,2} Z_2 + \theta_{0,3} Z_3 + \theta_{0,4} Z_4), \qquad (5.1)$$

where $Z_1$ and $Z_2$ are the baseline tumor count and size, measured when subjects entered the study, and $Z_3$ and $Z_4$ are the indicators of the pyridoxine pill and thiotepa instillation treatments, respectively. Lu, Zhang and Huang (2008) used the cubic B-splines sieve semiparametric MLE method for this model and estimated the asymptotic standard error of the estimate of $\theta_0$ using the bootstrap approach. In this paper, we reanalyze the data using the same method but estimate the asymptotic standard error using the least-squares approach. The semiparametric sieve MLE of $\theta_0$ is $\hat\theta_n$ = (0.2076, , , ) with the asymptotic standard errors given by (0.0066, , , ) and the corresponding p-values (0.0000, 0.0004, 0.0553, 0.0000), based on the asymptotic theorem developed in Wellner and Zhang (2007). We note that the result of this analysis is very different from that of Lu, Zhang and Huang (2008). The conflict between the results indirectly indicates that the working assumption of a Poisson process model used to form the likelihood may not be valid, since the inference procedure of Lu, Zhang and Huang is shown to be robust against the underlying counting process.

Example 2: Breast Cosmesis Study. The breast cosmesis study is a clinical trial comparing radiotherapy alone with primary radiotherapy plus adjuvant chemotherapy in terms of subsequent cosmetic deterioration of the breast following tumorectomy. Subjects (46 assigned to radiotherapy alone and 48 to radiotherapy plus chemotherapy) were followed for up to 60 months, with pre-scheduled follow-up visits every 4-6 months. In this paper, we use the Cox proportional hazards model to analyze the difference in time until appearance

of breast retraction,

$$\Lambda(t \mid Z) = \Lambda_0(t) \exp(\theta_0 Z), \qquad (5.2)$$

where $\Lambda_0$ is the baseline hazard (the hazard of the time to appearance of breast retraction for radiotherapy alone) and $Z$ is the indicator for the treatment of radiotherapy plus chemotherapy. Using the method proposed in this paper, the cubic B-splines sieve semiparametric MLE of $\theta_0$ is $\hat\theta_n$ = , with asymptotic standard error given by . The Wald test statistic is Z = , with the associated p-value = . This indicates that the treatment of radiotherapy with adjuvant chemotherapy significantly increases the risk of breast retraction, and the result is comparable with the conclusion of Finkelstein and Wolfe (1985).

6. Conclusion and Discussion

When the infinite-dimensional nuisance parameter cannot be eliminated in estimating the finite-dimensional parameter, general semiparametric maximum likelihood estimation is often a challenging task. The sieve MLE method, proposed originally by Geman and Hwang (1982), provides a practical approach for alleviating the difficulty in semiparametric estimation problems. In particular, the spline-based sieve semiparametric method, as exemplified by Lu, Zhang and Huang (2008), has many attractions in practice. Not only does it reduce the numerical difficulty in computing the semiparametric MLE, it also achieves semiparametric asymptotic estimation efficiency for the finite-dimensional parameter. However, the estimation of the information matrix remains a difficult task in general. In this paper, we propose an easy-to-implement least-squares approach to estimate

the semiparametric information matrix. We show that the estimator is asymptotically consistent, often as a byproduct of establishing asymptotic normality for the sieve semiparametric MLE. Interestingly, this estimator is exactly the observed information matrix if we treat the semiparametric sieve MLE as a parametric MLE problem. Although the estimator of the asymptotic standard error overestimates the true value slightly in finite samples, as shown in our simulation studies, the overestimation is generally alleviated as sample size increases, say to 200.

In addition to the expression of the information matrix given in (2.7), we note that it can be expressed through the second derivatives. Denote

$$\ddot l_{11}(\tau; x) = \frac{\partial}{\partial \theta} \dot l_1(\tau; x), \qquad \ddot l_{12}(\tau; x)(h) = \frac{\partial}{\partial s} \dot l_1(\theta, \phi_{(s)}; x)\Big|_{s=0}, \qquad \ddot l_{21}(\tau; x)(h) = \frac{\partial}{\partial \theta} \dot l_2(\tau; x)(h),$$

and, letting $(\partial/\partial s)\phi_{1(s)}|_{s=0} = h_1$,

$$\ddot l_{22}(\tau; x)(h, h_1) = \frac{\partial}{\partial s} \dot l_2(\theta, \phi_{1(s)}; x)(h)\Big|_{s=0}.$$

Then the information matrix can be written as

$$I(\theta_0) = -E\left[\ddot l_{11}(\tau_0; X) - 2\ddot l_{12}(\tau_0; X)(\xi_0) + \ddot l_{22}(\tau_0; X)(\xi_0, \xi_0)\right].$$

This expression leads to an alternative estimator of $I(\theta_0)$, given by

$$\rho_n(\hat\xi_n) = -n^{-1} \sum_{i=1}^{n} \left[\ddot l_{11}(\hat\tau_n; X_i) - 2\ddot l_{12}(\hat\tau_n; X_i)(\hat\xi_n) + \ddot l_{22}(\hat\tau_n; X_i)(\hat\xi_n, \hat\xi_n)\right], \qquad (6.1)$$

where $\hat\xi_n$ is the minimizer of $\rho_n(h)$ over $H_n$. The consistency of $\rho_n(\hat\xi_n)$ can be shown similarly, and it would be interesting to investigate how this estimator behaves compared to the one proposed in this paper.

As implied in Example 1, this semiparametric inference procedure is not robust against the underlying probability model. If the true model is not the working model used to form the likelihood, the inference may be invalid. Wellner and Zhang (2007) developed a general theorem for making robust semiparametric inference and Lu, Zhang and Huang (2007) extend

the robust semiparametric inference to spline-based sieve semiparametric estimation. How to generalize the proposed least-squares method to estimate the asymptotic standard error of the regression parameter estimates, in order to make robust semiparametric inference, remains an open research problem.

7. Appendix A

This section contains a sketch of the proofs of Proposition 2.1 and Theorem 4.1. In the following, $C$ represents a constant that may vary from place to place.

The proof of Proposition 2.1: For the least favorable direction $\xi_0$, we define $H_n(\epsilon) = \{h : h \in H_n \text{ and } \|h - \xi_0\|_H \ge \epsilon\}$ for $\epsilon > 0$. Note that for any $h \in H$,

$$\rho_n(h) - \rho_n(\xi_0) = (P_n - P)l(\hat\tau_n, h; X) - (P_n - P)l(\hat\tau_n, \xi_0; X) + P\{l(\hat\tau_n, h; X) - l(\tau_0, h; X)\} - P\{l(\hat\tau_n, \xi_0; X) - l(\tau_0, \xi_0; X)\} + P\{l(\tau_0, h; X) - l(\tau_0, \xi_0; X)\}.$$

Because the class $\mathcal{I} = \{l(\tau, h; X) : \tau \in \mathcal{T} \text{ and } h \in H\}$ is Glivenko-Cantelli, $\hat\tau_n \in \mathcal{T}$ and $H_n \subset H$, we have that $(P_n - P)l(\hat\tau_n, h; X) = o_p(1)$ and $(P_n - P)l(\hat\tau_n, \xi_0; X) = o_p(1)$. In addition, by the weak consistency $\hat\tau_n \to_p \tau_0$, the Continuous Mapping Theorem and

the Dominated Convergence Theorem, we can conclude that $P\{l(\hat\tau_n, h; X) - l(\tau_0, h; X)\} \to_p 0$ and $P\{l(\hat\tau_n, \xi_0; X) - l(\tau_0, \xi_0; X)\} \to_p 0$. Hence for any $h \in H$, $\rho_n(h) - \rho_n(\xi_0) = \rho(h) - \rho(\xi_0) + o_p(1) > 0$ in probability as $n \to \infty$, by the characteristics of the least favorable direction. This implies that

$$P\left(\inf_{h \in H_n(\epsilon)} \rho_n(h) > \rho_n(\xi_0)\right) \to 1,$$

and hence, letting $\epsilon \downarrow 0$, we can conclude that $\|\hat\xi_n - \xi_0\|_H \to_p 0$. Subsequently, we can conclude that

$$\rho_n(\hat\xi_n) = P_n l(\hat\tau_n, \hat\xi_n; X) = (P_n - P)l(\hat\tau_n, \hat\xi_n; X) + P l(\hat\tau_n, \hat\xi_n; X) \to_p P l(\tau_0, \xi_0; X) = I(\theta_0),$$

by the consistency of $\hat\tau_n$ and $\hat\xi_n$ and the fact that $\mathcal{I}$ is Glivenko-Cantelli.

The proof of Theorem 4.1: To prove Part (i) of Theorem 4.1, we verify the conditions of Theorem 5.7 in van der Vaart (1998). Let $M(\tau) = P l(\tau; X) = P l(\theta, \phi; X)$ and $M_n(\tau) = P_n l(\tau; X) = P_n l(\theta, \phi; X)$. Hence for any $\tau \in \mathcal{T}_n = \Theta \times \mathcal{M}_n$, $M_n(\tau) - M(\tau) = (P_n - P)l(\tau; X)$. Let $\mathcal{L}_1 = \{l(\tau; X) : \tau \in \mathcal{T}_n\}$. By the calculation of Shen and Wong (1994), page 597, for any $\epsilon > 0$, the bracketing number of $\mathcal{M}_n$ computed with the $L_1(P)$-norm is bounded by $(1/\epsilon)^{Cq_n}$.

Since $\Theta \subset R^d$ is compact, $\Theta$ can be covered by $C(1/\epsilon)^d$ balls with radius $\epsilon$. Then, by Conditions (C1)-(C3), we can easily construct $\epsilon$-brackets for $\mathcal{L}_1$ with the bracketing number in the $L_1(P)$-norm bounded by $C(1/\epsilon)^{Cq_n + d}$. Hence $\mathcal{L}_1$ is Glivenko-Cantelli by Theorem of van der Vaart and Wellner (1996). Therefore, $\sup_{\tau \in \mathcal{T}_n} |M_n(\tau) - M(\tau)| \to_p 0$.

Let $g(z, t) = \exp(\theta' z + \phi(t))$ and $g_0(z, t) = \exp(\theta_0' z + \phi_0(t))$. Straightforward algebra yields

$$M(\tau_0) - M(\tau) = E\Big\{[1 - \exp(-g_0(Z, U))] \log\frac{1 - \exp(-g_0(Z, U))}{1 - \exp(-g(Z, U))} + [\exp(-g_0(Z, U)) - \exp(-g_0(Z, V))] \log\frac{\exp(-g_0(Z, U)) - \exp(-g_0(Z, V))}{\exp(-g(Z, U)) - \exp(-g(Z, V))} + \exp(-g_0(Z, V)) \log\frac{\exp(-g_0(Z, V))}{\exp(-g(Z, V))}\Big\}$$

$$= E\Big\{[1 - \exp(-g(Z, U))]\, m\Big(\frac{1 - \exp(-g_0(Z, U))}{1 - \exp(-g(Z, U))}\Big) + [\exp(-g(Z, U)) - \exp(-g(Z, V))]\, m\Big(\frac{\exp(-g_0(Z, U)) - \exp(-g_0(Z, V))}{\exp(-g(Z, U)) - \exp(-g(Z, V))}\Big) + \exp(-g(Z, V))\, m\Big(\frac{\exp(-g_0(Z, V))}{\exp(-g(Z, V))}\Big)\Big\},$$

where $m(x) = x \log x - x + 1 \ge (x - 1)^2/4$ for $0 \le x \le 5$. Further analysis using Taylor expansion and Conditions (C1)-(C3) leads to

$$M(\tau_0) - M(\tau) \ge C E\Big\{\frac{1}{1 - \exp(-g(Z, U))}[\exp(-g_0(Z, U)) - \exp(-g(Z, U))]^2 + \frac{1}{\exp(-g(Z, V))}[\exp(-g_0(Z, V)) - \exp(-g(Z, V))]^2\Big\}$$

$$\ge C E\Big\{[(\theta_0 - \theta)' Z + (\phi_0 - \phi)(U)]^2 + [(\theta_0 - \theta)' Z + (\phi_0 - \phi)(V)]^2\Big\}.$$

With Conditions (C1)-(C5), using the same arguments as those in Wellner and Zhang (2007),

page leads to

$$M(\tau_0) - M(\tau) \ge C\left(\|\theta - \theta_0\|^2 + \|\phi - \phi_0\|_\Phi^2\right) = C d^2(\tau_0, \tau).$$

This implies that $\sup_{\tau : d(\tau, \tau_0) \ge \epsilon} M(\tau) \le M(\tau_0) - C\epsilon^2 < M(\tau_0)$.

For $\phi_0 \in \Phi$, Lu (2007) has shown that there exists a $\phi_{0,n} \in \mathcal{M}_n$ of order $m \ge p + 2$ such that

$$\|\phi_{0,n} - \phi_0\|_\infty \le C q_n^{-p} = O(n^{-p\nu}).$$

This also implies that $\|\phi_{0,n} - \phi_0\|_\Phi \le C q_n^{-p} = O(n^{-p\nu})$. Now let $\tau_{0,n} = (\theta_0, \phi_{0,n})$. Because $\hat\tau_n$ maximizes $M_n$ over $\mathcal{T}_n$, we have

$$M_n(\hat\tau_n) - M_n(\tau_0) = M_n(\hat\tau_n) - M_n(\tau_{0,n}) + M_n(\tau_{0,n}) - M_n(\tau_0) \ge P_n l(\tau_{0,n}; X) - P_n l(\tau_0; X) = (P_n - P)\{l(\tau_{0,n}; X) - l(\tau_0; X)\} + M(\tau_{0,n}) - M(\tau_0).$$

We can easily show that the class $\mathcal{L}_2 = \{l(\theta_0, \phi; x) - l(\theta_0, \phi_0; x) : \phi \in \mathcal{M}_n \text{ and } \|\phi - \phi_0\|_\Phi \le C n^{-p\nu}\}$ is $P$-Donsker by calculating the bracketing number with the $L_2(P)$-norm. It is obvious that in this class $P\{l(\theta_0, \phi; X) - l(\theta_0, \phi_0; X)\}^2 \to 0$ as $n \to \infty$. Hence $(P_n - P)\{l(\theta_0, \phi_{0,n}; X) - l(\theta_0, \phi_0; X)\} = o_p(n^{-1/2})$ by the relationship between $P$-Donsker classes and asymptotic equicontinuity given by Corollary of van der Vaart and Wellner (1996). By the Dominated Convergence Theorem, it is easy to see that $M(\tau_{0,n}) - M(\tau_0) \ge -o(1)$. Therefore, $M_n(\hat\tau_n) - M_n(\tau_0) \ge -o_p(n^{-1/2}) - o(1) = -o_p(1)$.

This completes the proof of $d(\hat\tau_n, \tau_0) \to_p 0$.

To prove Part (ii), we verify the conditions of Theorem of van der Vaart and Wellner (1996). First, we have already shown in the proof of consistency that $M(\tau_0) - M(\tau) \ge C d^2(\tau_0, \tau)$. Next, we further explore $M_n(\hat\tau_n) - M_n(\tau_0)$. From the proof of Part (i), we know that $M_n(\hat\tau_n) - M_n(\tau_0) \ge I_{1,n} + I_{2,n}$, where $I_{1,n} = (P_n - P)\{l(\theta_0, \phi_{0,n}; X) - l(\theta_0, \phi_0; X)\}$ and $I_{2,n} = P\{l(\theta_0, \phi_{0,n}; X) - l(\theta_0, \phi_0; X)\}$. By Taylor expansion, we have

$$I_{1,n} = (P_n - P)\left\{\dot l_2(\theta_0, \tilde\phi; X)(\phi_{0,n} - \phi_0)\right\} = n^{-p\nu + \epsilon} (P_n - P)\left\{\dot l_2(\theta_0, \tilde\phi; X)\, \frac{\phi_{0,n} - \phi_0}{n^{-p\nu + \epsilon}}\right\}$$

for any $0 < \epsilon < 1/2 - p\nu$. Because $\|\phi_{0,n} - \phi_0\|_\infty = O(n^{-p\nu})$, using Corollary of van der Vaart and Wellner (1996), we can show that

$$(P_n - P)\left\{\dot l_2(\theta_0, \tilde\phi; X)\, \frac{\phi_{0,n} - \phi_0}{n^{-p\nu + \epsilon}}\right\} = o_p(n^{-1/2}).$$

Hence $I_{1,n} = o_p(n^{-p\nu + \epsilon} n^{-1/2}) = o_p(n^{-2p\nu})$. Using the fact that the function $m(x) = x \log x - x + 1 \le (x - 1)^2$ in a neighborhood of $x = 1$, it can be easily argued that $M(\tau_0) - M(\tau_{0,n}) \le C\|\phi_{0,n} - \phi_0\|_\Phi^2 = O(n^{-2p\nu})$, which implies that $I_{2,n} = M(\tau_{0,n}) - M(\tau_0) \ge -O(n^{-2p\nu})$. Thus we conclude that

$$M_n(\hat\tau_n) - M_n(\tau_0) \ge -O_p(n^{-2p\nu}) = -O_p\left(n^{-2\min(p\nu, (1-\nu)/2)}\right).$$

Let $\mathcal{L}_3(\eta) = \{l(\tau; x) - l(\tau_0; x) : \phi \in \mathcal{M}_n \text{ and } d(\tau, \tau_0) \le \eta\}$. Using the same argument as in the proof of consistency, we obtain that the logarithm of the $\epsilon$-bracketing number of $\mathcal{L}_3(\eta)$, $\log N_{[\,]}(\epsilon, \mathcal{L}_3(\eta), L_2(P))$, is bounded by $C q_n \log(\eta/\epsilon)$. This leads to

$$J_{[\,]}(\eta, \mathcal{L}_3(\eta), L_2(P)) = \int_0^{\eta} \sqrt{\log N_{[\,]}(\epsilon, \mathcal{L}_3(\eta), L_2(P))}\, d\epsilon \le C q_n^{1/2} \eta.$$

Because Conditions (C1) and (C2) guarantee the uniform boundedness of $l(\tau; x)$, using Theorem of van der Vaart and Wellner (1996), the key function $\phi_n(\eta)$ in Theorem of van der Vaart and

Estimation of the Mean Function with Panel Count Data Using Monotone Polynomial Splines

Estimation of the Mean Function with Panel Count Data Using Monotone Polynomial Splines Estimation of the Mean Function with Panel Count Data Using Monotone Polynomial Splines By MINGGEN LU, YING ZHANG Department of Biostatistics, The University of Iowa, 200 Hawkins Drive, C22 GH Iowa City,

More information

An Adaptive Spline-Based Sieve Semiparametric Maximum Likelihood Estimation for the Cox Model with Interval-Censored Data

An Adaptive Spline-Based Sieve Semiparametric Maximum Likelihood Estimation for the Cox Model with Interval-Censored Data An Adaptive Spline-Based Sieve Semiparametric Maximum Likelihood Estimation for the Cox Model with Interval-Censored Data By YING ZHANG Department of Biostatistics, The University of Iowa ying-j-zhang@uiowa.edu

More information

Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates

Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates Jon A. Wellner 1 and Ying Zhang 2 December 1, 2006 Abstract We consider estimation in a particular semiparametric

More information

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data Xingqiu Zhao and Ying Zhang The Hong Kong Polytechnic University and Indiana University Abstract:

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Spline-based sieve semiparametric generalized estimating equation for panel count data

Spline-based sieve semiparametric generalized estimating equation for panel count data University of Iowa Iowa Research Online Theses and Dissertations Spring 2010 Spline-based sieve semiparametric generalized estimating equation for panel count data Lei Hua University of Iowa Copyright

More information

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 Z-theorems: Notation and Context Suppose that Θ R k, and that Ψ n : Θ R k, random maps Ψ : Θ R k, deterministic

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

arxiv:math/ v2 [math.st] 5 Dec 2007

arxiv:math/ v2 [math.st] 5 Dec 2007 The Annals of Statistics 2007, Vol. 35, No. 5, 2106 2142 DOI: 10.1214/009053607000000181 c Institute of Mathematical Statistics, 2007 arxiv:math/0509132v2 [math.st] 5 Dec 2007 TWO LIKELIHOOD-BASED SEMIPARAMETRIC

More information

Models for Multivariate Panel Count Data

Models for Multivariate Panel Count Data Semiparametric Models for Multivariate Panel Count Data KyungMann Kim University of Wisconsin-Madison kmkim@biostat.wisc.edu 2 April 2015 Outline 1 Introduction 2 3 4 Panel Count Data Motivation Previous

More information

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Statistica Sinica 23 (2013), 383-408 doi:http://dx.doi.org/10.5705/ss.2011.151 A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Chi-Chung Wen and Yi-Hau Chen

More information

Estimation for two-phase designs: semiparametric models and Z theorems

Estimation for two-phase designs: semiparametric models and Z theorems Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

Panel Count Data Regression with Informative Observation Times

Panel Count Data Regression with Informative Observation Times UW Biostatistics Working Paper Series 3-16-2010 Panel Count Data Regression with Informative Observation Times Petra Buzkova University of Washington, buzkova@u.washington.edu Suggested Citation Buzkova,

More information

SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE. By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam

SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE. By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam The Annals of Statistics 1997, Vol. 25, No. 4, 1471 159 SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam Likelihood

More information

A Semiparametric Regression Model for Panel Count Data: When Do Pseudo-likelihood Estimators Become Badly Inefficient?

A Semiparametric Regression Model for Panel Count Data: When Do Pseudo-likelihood Estimators Become Badly Inefficient? A Semiparametric Regression Model for Panel Count Data: When Do Pseudo-likelihood stimators Become Badly Inefficient? Jon A. Wellner 1, Ying Zhang 2, and Hao Liu Department of Statistics, University of

More information

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

Efficient Estimation in Convex Single Index Models 1

Efficient Estimation in Convex Single Index Models 1 1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida http://arxiv.org/abs/1708.00145 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

University of California San Diego and Stanford University and

University of California San Diego and Stanford University and First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

1 Glivenko-Cantelli type theorems

1 Glivenko-Cantelli type theorems STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Semiparametric Regression Analysis of Panel Count Data and Interval-Censored Failure Time Data
