A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models
|
|
- Philomena Park
- 5 years ago
- Views:
Transcription
1 A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models Jian Huang Department of Statistics University of Iowa Ying Zhang and Lei Hua Department of Biostatistics University of Iowa May 5, 2008 Abstract A method is proposed for consistent information estimation in a class of semiparametric models. The method is based on the geometric interpretation of the efficient score function, that it is the residual of the orthogonal projection of the score function for the finite-dimensional parameter onto the tangent space for the infinitedimensional parameter. The empirical version of this projection is a least-squares nonparametric regression problem. Under appropriate conditions, the sum of squared residuals of this regression is shown to be a consistent estimator of the efficient Fisher information and is actually the observed information for a class of sieve maximum likelihood estimators. Simulations studies are conducted to evaluate finite sample performance of the estimator in two illustrating examples: Poisson proportional mean model for panel count data and Cox model for interval-censored data. Finally the method is applied to two real-life examples: bladder tumor study and breast cosmesis study. Key words and phrases. counting process, Cox model, empirical processes, information, interval censoring, maximum (profile) likelihood, monotone polynomial splines, panel count data, proportional hazards model, semiparametric model 1
2 1. Introduction In a regular parametric model, the maximum likelihood estimator (MLE) is asymptotically normal with variance equal to the inverse of the Fisher information, and the Fisher information can be estimated by the observed information. This result provides large sample justification for the use of normal approximation to the distribution of MLE. An important factor making this approximation useful in statistical inference is that the observed information can be readily computed and is consistent. In many situations, consistency of the observed information follows directly from the law of large numbers and consistency of MLE. Asymptotic normality of MLE of the regular parameters continues to hold in many semiparametric and nonparametric models. See for example, Chen (1988, 1995), Chang (1990), Geskus and Groeneboom (1996), Gill (1989), Groeneboom (1996), Groeneboom and Wellner (1992), Gu and Zhang (1993), Murphy (1995), Murphy, Rossini and van der Vaart (1997), Qin and Lawless (1993), Severini and Wong (1991), van der Laan (1993), van der Vaart (1994, 1996), Wong and Severini (1991), Huang (1996), Huang and Rossini (1997) and Wellner and Zhang (2007). In all these examples, the MLE or a smooth functional of the MLE is asymptotically normal with variance equal to the inverse of the efficient Fisher information. The asymptotic normality and efficiency results provide insight to the theoretical properties of maximum likelihood estimators. Unfortunately, in many semiparametric models studied in the aforementioned articles, the efficient Fisher information is either very complicated or may not have an explicit expression and they often can not be estimated directly as in the case of parametric MLE. Thus effort is required to estimate the asymptotic variance in order to apply these results to statistical inference. In this paper, we consider consistent information estimation in a class of semiparametric 2
3 models that are parametrized in terms of a finite-dimensional parameter θ and a parameter φ whose dimension increases with sample size. Hence φ is often called an infinite-dimensional parameter. Two important examples are Cox s (1972) proportional hazards model for interval-censored data studied by Huang and Wellner (1995) and the proportional mean model for panel count data proposed by Sun and Wei (2000) and Wellner and Zhang (2007). In these two examples, θ is the finite-dimensional regression coefficient, φ is the log of baseline hazard function or the log of baseline mean function. At least two methods used in parametric models for estimating the variance have been suggested in semiparametric models. The first method is to use the inverse observed information based on the likelihood, correcting for the presence of the infinite-dimensional parameter φ. However, when φ cannot be estimated at the usual root-n rate, consistency of this estimator has not been proved in general. The second method is to use the second derivative of the profile likelihood of θ at the maximum likelihood estimate. For a fixed value of θ, the profile likelihood is the maximum of the likelihood with respect to φ. Because the profile likelihood often can only be computed numerically, discretized versions of its second derivative are proposed by Nielson, Gill, Andersen and Sorensen (1992), and used by Huang and Wellner (1995) and Murphy and van der Vaart (1996). Using an ingenious sandwich inequality, Murphy and van der Vaart (2000) showed that the discretized versions of the second derivative of profile likelihood provides consistent variance estimator in a large class of semiparametric models. Murphy and van der Vaart (1999) also proved that the profile likelihood resembles the ordinary likelihood in many essential aspects. When the dimension of θ is one or two, it is relatively easy to compute and visualize the profile likelihood. For moderate to high dimensional θ, implementation of the profile likelihood approach may be computationally demanding and 3
4 sometimes difficult. In a general semiparametric maximum likelihood estimation problem, the joint estimation of (θ, φ) is often a quite challenging problem numerically. Sieve semiparametric M-estimation using polynomial splines proposed by Lu, Zhang and Huang (2008) for panel count data is shown numerically efficient. However, the estimation of standard error for the M-estimator of regression parameter still remains a big task and alternatively, they used the bootstrap standard error by taking the computation advantage in spline-based sieve M-estimation. In this article, we propose a least-squares approach to consistent estimation of the information matrix of the semiparametric maximum likelihood estimator of θ. The proposed method is based on the geometric interpretation of the efficient score function, that it is the residual of the projection of the score function for θ onto the tangent space for φ (van der Vaart, 1991, Bickel, Klaassen, Ritov and Wellner, 1993, Chapter 3). Thus the theoretical information calculation is a least-squares problem in a Hilbert space. When a sample from the model is available, this theoretical information can be estimated by its empirical version. It turns out that this empirical version is essentially a least-squares nonparametric regression problem, due to the fact that the score function for the infinite-dimensional parameter is a linear operator. In this nonparametric regression problem, the response is the score function for the finite-dimensional parameter θ, the covariate is the linear score operator for the infinite-dimensional parameter φ, and the regression parameter is the least favorable direction which is used to define the efficient score. Computationally, the proposed method can be implemented with a least-squares nonparametric regression fitting program. In a class of sieve MLE s using a linear approximation space, the proposed estimator of information matrix is shown to be the same as the inverse of the observed information matrix for the sieve MLE. This equivalence is useful from both the theoretical and the computational 4
5 point of view. First, this equivalence enables a simple and indirect consistency proof of the observed information matrix in the semiparametric setting. Second, it provides two ways of computing the observed information matrix: one can either directly compute the observed information matrix or fit a least-squares nonparametric regression. Because of its numerical convenience and good theoretical properties, the class of sieve MLE s using polynomial splines is utilized in our numerical demonstration and is recommended for applications of general semiparametric estimation. The paper is organized as follows. Section 2 describes the motivation and the leastsquares approach. Section 3 specializes the general approach to a class of sieve MLE s. Section 4 applies the proposed method along with the spline-based sieve MLE to two models, semiparametric Poisson mean model for panel count data and Cox proportional hazards model for interval censored data, studied in Wellner and Zhang (2007) and Huang and Wellner (1995), respectively. Section 5 renders numerical results via simulations and applications in real-life examples for the models discussed in Section 4. Section 6 concludes with some discussions. Some technical details are included in appendices. 2. The Least-Squares Approach Let X 1,..., X n be independent random variables with a common probability measure P θ,φ, where (θ, φ) Θ Φ. Here Θ is a subset of R d and Φ is a general space. Assume that P θ,φ has a density p( ; θ, φ) with respect to a σ-finite measure. Denote τ = (θ, φ) and let τ 0 = (θ 0, φ 0 ) Θ Φ be the true parameter value under which the data are generated. The maximum likelihood estimator of τ 0 is the value ˆτ n (ˆθ n, ˆφ n ) that maximizes the 5
6 log-likelihood l n (τ) = n log p(x i ; θ, φ) i=1 over the parameter space T Θ Φ. Let be the Euclidean distance of R d, and Φ be an appropriate norm defined on Φ. We assume it has been shown that (2.1) ˆτ n τ 0 T { ˆθ n θ ˆφ } 1/2 n φ 0 2 Φ = Op (rn 1 ), where r n is a sequence of numbers converging to infinity. Consistency and rate of convergence in nonparametric and semiparametric models have been addressed by many authors, see for example, Birgé and Massart (1993), van der Geer (1993), Shen and Wong (1995), and van der Vaart and Wellner (1996). The results and methods developed by these authors can often be used to verify (2.1). The motivation to study consistent information estimation is the following. In many semiparametric models, in addition to (2.1), it can be shown that (2.2) n 1/2 (ˆθn θ 0 ) d N ( 0, I 1 (θ 0 ) ), where I(θ 0 ) is the efficient Fisher information for θ 0, adjusting for the presence of nuisance parameter φ. The definition of I(θ 0 ) is given below. This holds for models cited in the previous section and for the examples in Section 4. Thus estimation of the asymptotic variance of ˆθ n is equivalent to estimation of I(θ 0 ) provided I(θ 0 ) is nonsingular. Of course, for the problem of estimating I(θ 0 ) to be meaningful, we need to first establish (2.2). The calculation of I(θ 0 ) and its central role in asymptotic efficiency theory for semiparametric models have been systematically studied by Begun, Hall, Huang and Wellner 6
7 (1983), van der Vaart (1991), and BKRW (1993) and the references therein. In the following, We first briefly describe how the information I(θ 0 ) is defined in a general semiparametric MLE setting in order to motivate the proposed information estimator. Let l(θ, φ; x) = log p(x; θ, φ) be the log-likelihood for a sample of size one. Consider a parametric smooth submodel with parameter (θ, φ (s) ), where φ (0) = φ and φ (s) s = h. s=0 Let H be the class of functions h defined by this equation. Usually, H is a Hilbert space. The score operator for φ is (2.3) l 2 (τ; x)(h) = s l(θ, φ (s); x). s=0 Observe that l 2 is a linear operator mapping H to L 2 (P θ,φ ). So for constants c 1, c 2 and h 1, h 2 H, (2.4) l 2 (τ; x)(c 1 h 1 + c 2 h 2 ) = c 1 l2 (τ; x)(h 1 ) + c 2 l2 (τ; x)(h 2 ). The linearity of l 2 is crucial to the proposed method. For a d-dimensional θ, l 1 (τ; x) is the vector of partial derivatives of l(τ; x) with respect to θ. For each component of l 1, a score operator for φ is defined as in (2.3). So the score operator for φ corresponding to l 1 is defined as (2.5) l 2 (τ; x)(h) ( l 2 (τ; x)(h 1 ),..., l 2 (τ; x)(h d )), 7
8 where h (h 1,..., h d ) with h k H, 1 k d. Let Ṗ2 be the closed linear span of { l 2 (h) : h H}. Then the efficient score function for the kth component of θ is l 1,k Π( l 1,k Ṗ2), where l 1,k is the kth component of l 1 (τ; x) and Π( l 1,k Ṗ2) is the projection of l 1,k onto Ṗ2. Equivalently, Π( l 1,k Ṗ2) is the minimizer of the squared residual P [ l 1,k (τ 0 ; X) η] 2 over η Ṗ2. See for example, van der Vaart (1991), Section 6, and BKRW (1993), Theorem 1, page 70. In general, η may not be a score function for φ, that is, there may not exist a h H such that η = l 2 (h). However, in models with regularity conditions, η either is a score function or can be approximated by a score function. So we assume that the efficient score vector for θ is l 1 (τ; x) l 2 (τ; x)(ξ 0 ), where ξ 0 is an element of H d that minimizes (2.6) ρ(h) E l 1 (τ; X) l 2 (τ; X)(h) 2 over H d. The minimizer ξ 0 = (ξ 01, ξ 02,..., ξ 0d ) is called the least favorable direction. Denote the efficient score by l (τ; x) l 1 (τ; x) l 2 (τ; x)(ξ 0 ). Then the information for θ is (2.7) I(θ) = E l (τ; X) 2 = E l 1 (τ; X) l 2 (τ; X)(ξ 0 ) 2. Therefore, to estimate I(θ), it is natural to consider minimizing an empirical version of (2.6). In particular, with the random sample X 1,..., X n and the consistent estimator τ n, we can estimate I(θ) by the minimum value of (2.8) ρ n (h) n 1 n l 1 ( τ n ; X i ) l 2 ( τ n ; X i )(h) 2 i=1 over H d. That is, if ξ n is a minimizer of ρ n over H d, then a natural estimator of I(θ 0 ) 8
9 is În ρ n ( ξ n ). Because l 2 is a linear operator, this minimization problem is essentially a least-squares nonparametric regression problem and it can be solved for each component separately. In the next section, we show that, in a class of sieve MLE s, the minimum value ρ n ( ξ n ) is actually the observed information based on the outer product of the first derivatives of the log-likelihood. In the following, we state a simple proposition which provides sufficient conditions ensuring consistency of the estimated least favorable direction and În as an estimator of I(θ 0 ). This proposition appears to be useful in a large class of models and is easy to apply. As in nonparametric regression, if the space H is too large, minimization over this space may not yield consistent estimators of ξ 0 and I(θ 0 ). We can use an approximation space H n (a sieve) which is smaller than H and converges to H as n tends to infinity. Under appropriate conditions, minimization of ρ n over H n will yield a consistent estimator of ξ 0. On the other hand, if minimizing over H does yield consistent estimators, then H n can be taken to be H. For simplicity and without loss of generality, it suffices to consider the case of onedimensional θ. In the following and throughout, we will use the linear functional notations for integrals. So for any probability measure Q, Qf = fdq as long as the integral is well defined. Below, P = P θ0,φ 0. P n is the empirical measure of X i, 1 i n. So P n f = n 1 n i=1 f(x i). Proposition 2.1. Denote ξ n = argmin Hn ρ n (h) and l(τ, h; x) = [ l 1 (τ; x) l 2 (τ; x)(h)] 2. If the class of functions I = {l(x, τ, h) : τ T, h H} is Glivenko-Cantelli and τ n p τ 0, then The proof is given is Appendix A. ρ n ( ξ n ) p I(θ 0 ). 9
10 3. Observed Information in Sieve MLE We now apply the method described in Section 2 to a class of sieve MLE s. We show that, if the parameter space Φ and the space H can be approximated by a common approximation space, and if this space has a basis, then the least-squares calculation in Section 2 yields the observed information matrix. In other words, computation of the observed information matrix is equivalent to solving the least-squares problem of Section 2. So there is no need to actually carry out the least-squares computation when the observed information can be computed as in the ordinary setting of parametric estimation. This is computationally convenient, because the observed information matrix is based on either the first derivatives or the second derivatives of the log-likelihood function and these derivatives are often already computed in a numerical algorithm for computing the MLE of unknown regression parameters. On the other hand, for problems in which direct computation of the observed information matrix is difficult, one can instead solve the least-squares nonparametric regression problem to obtain the observed information matrix. These nonparametric regression problems can be solved by using standard least-squares fitting programs for linear regression. As in finite-dimensional parametric models, some regularity conditions are required for the MLE θ n to be root-n consistent and asymptotically normal. These regularity conditions usually include certain smoothness assumptions on the infinite-dimensional parameter φ and the underlying probability model. Consequently, the least favorable direction will be a smooth function such as a bounded Lipschitz function. Then we can take H to be the class of such smooth functions. From the approximation theory, many spaces designed for efficient computation can be used to approximate any element in H with arbitrary precision under appropriately defined distance. For example, we may use the space of polynomial spline 10
11 functions (Schumaker, 1981). This class not only has good approximation power, but is also computationally convenient. We will use this approximation space in the next section. Let Φ n be an approximation space for both Φ and H. Suppose it has a set of basis functions B n = (b 1,..., b qn ), such that every φ Φ n can be represented as φ = q n j=1 β jb j B nβ, where β = (β 1,..., β qn ) B n R qn is a vector of real numbers. So every φ Φ n can be identified with a vector β B n. Here the dimension q n is a positive integer depending on sample size n. To ensure consistency of τ n, we need q n as n. In general, for θ n to be asymptotically normal, we need to control the growth rate of q n appropriately. The sieve MLE of τ 0 = (θ 0, φ 0 ) is defined to be the ( θ n, φ n ) that maximizes the loglikelihood l n (θ, φ) over Θ Φ n. Equivalently, one can find ( θ n, β n ) that maximizes l n (θ, B nβ) over Θ B n. Then φ n = B n β n. Now consider estimation of I(θ 0 ). First we introduce some convenient notations. Denote l 2 ( τ n ; x)(b n ) = ( l 2 ( τ n ; x)(b 1 ),..., l 2 ( τ n ; x)(b qn )) T, and ) 2 A 11 = P n ( l1 ( τ n ; X), A12 = P n l1 ( τ n ; X) l 2 T ( τ n ; X)(B n ), A 21 = A T 12, A 22 = P n ( l2 ( τ n ; X)(B n )) 2, where a 2 = aa T. The outer product version of the observed information for θ is given by (3.1) Ô n = A 11 A 12 A 22A 21. Here for any matrix A, A denotes its generalized inverse. Although A 22 may not be unique, A 12 A 22A 21 is unique by the results on the generalized inverse, see for example, 11
12 Rao (1973), Chapter 1, result 1b.5 (vii), page 26. The use of the generalized inverse in (3.1) is for generality of these formulas. When the nonparametric component φ is a smooth function, then for any fixed sample size, the sieve MLE is obtained over a finite-dimensional approximation space whose (theoretical) dimension is O(n ν ), where ν is typically less than 1/2, A 22 is usually invertible. We now show that Ôn equals În, which is the minimum value of ρ n (h) = P n l 1 ( τ n ; X) l 2 ( τ n ; X)(h) 2 over h Φ n for h = (h 1, h 2,, h d ). Write h j = B nc j for j = 1, 2,, d. Let Y i,j = l 1,j ( τ n ; X i ), the jth component of Y i = l 1 ( τ n ; X i ) and Z i = l 2 ( τ n ; X i )(B n ). This minimization problem becomes a least-squares problem of finding ĉ n = (ĉ n,1, ĉ n,2,, ĉ n,d ) that minimizes [ d ] P n (Y i,j Z ic j ) 2. j=1 By standard least-squares calculation, ( n ) ( n ) ĉ n,j = Z i Z i Z i Y i,j i=1 i=1 for j = 1, 2,, n. Hence, we have Î n = ρ n (ˆξ n ) = P n [ l1 ( τ n ; X) l ] 2 2 ( τ n ; X)( ξ n ) ] 2 = P n [ l1 ( τ n ; X) (ĉ n,1, ĉ n,2,, ĉ n,d ) l2 ( τ n ; X)(B n ) ( n ) ( n ) = P n [ l 1 ( τ n ; X) Y i Z i Z i Z i l 2 ( τ n ; X)(B n ) i=1 12 i=1 ] 2
13 = P n [ l1 ( τ n ; X) A 12 A l ] ( τ n ; X)(B n ) = A 11 A 12 A 22A 21 A 12 A 22A 21 + A 12 A 22A 22 A 22A 21 = A 11 A 12 A 22A 21 = Ôn We summarize the above calculation in the following proposition. Proposition 3.1. Assume that there exist β n B n and c n,j R q n for j = 1, 2,, d such that (3.2) B nβ n φ 0 Φ = O(k 1 1n ), and B nc n,j ξ 0,j Φ = O(k 1 2n ), j = 1, 2, d, where k 1n and k 2n are two sequences of numbers satisfying k 1n and k 2n as n. Suppose that the conditions of Proposition 2.1 are satisfied. Then the observed information matrix Ôn defined in (3.1) is a consistent estimator of I(θ 0 ). 4. Examples In this section, we illustrate the proposed method in two semiparametric regression models, including Poisson proportional mean model for panel count data studied in Wellner and Zhang (2007) and Cox model (1972) for interval-censored data studied in Huang and Wellner (1995). In these examples, the parameter space Φ will be the space of smooth functions defined below. The sieve Φ n is the space of polynomial splines. The polynomial splines have been used in many fully nonparametric regression models, see for example, Stone (1985, 1986). 13
14 Let a = d 0 < d 1 < < d Kn < d Kn +1 = b be a partition of [a, b] into K n subintervals I Kt = [d t, d t+1 ), t = 0,..., K 1 and I KK = [d K, d K+1 ], where K K n n v is a positive integer such that max 1 k K+1 d k d k 1 = O(n v ). Denote the set of partition points by D n = {d 1,..., d Kn }. Let S n (D n, K n, m) be the space of polynomial splines of order m 1 consisting of functions s satisfying: (i) the restriction of s to I Kt is a polynomial of order m for m K; (ii) for m 2 and 0 m m 2, s is m times continuously differentiable on [a, b]. This definition is phrased after Stone (1985), which is a descriptive version of Schumaker (1981), page 108, Definition 4.1. According to Schumaker (1981), page 117, Corollary 4.10, there exists a local basis B n {b t, 1 t q n }, so called B-splines, for S n (D n, K n, m), where q n K n + m. These basis functions are nonnegative and sum up to one at each point in [a, b], and each b t is zero outside the interval [d t, d t+m ] Poisson Proportional Mean Model for Panel Count Data Let {N(t) : t 0} be a univariate counting process. K is the total number of observations on the counting process and T = (T K,1,, T K,K ) is a sequence of random observation times with 0 < T K,1 < < T K,K. The counting process is only observed at those times with the cumulative events denoted by N = {N(T K,1 ),, N(T K,K )}. This type of data is referred to to panel count data by Sun and Kalbfleish (1995). In this manuscript, we assume that (K, T ) is conditionally independent of N given a vector of covariates Z and we denote the observed data consisting of independent and identically distributed X 1,, X n, where X i = (K i, T (i), N (i), Z i ) with T (i) = (T (i) K i,1,, T (i) K i,k i ) and N (i) = (N (i) (T (i) K i,1 ),, N(i) (T (i) K i,k i )), for i = 1,, n. Panel count data are often seen in clinical trials, social demographic and industrial reliability studies. Sun and Wei (2000) and Zhang (2002) proposed the proportional mean 14
15 model (4.1) Λ(t Z) = Λ 0 (t) exp(θ 0Z) to analyze panel count data semiparametrically, where Λ(t Z) = E(N(t) Z) is the expected cumulative events observed at time t, conditional on Z with the true baseline mean function given by Λ 0 (t). Wellner and Zhang (2007) proposed a nonhomogeneous Poisson process with the conditional mean function given by (4.1) to study the MLE of τ 0 = (θ 0, Λ 0 (t)). The log likelihood for (θ, Λ(t)) under the Poisson proportional mean model is given by l n (θ, Λ) = n K i i=1 j=1 [ ] N (i) K i,j log Λ K i,j + N (i) K i,j θ Z i exp(θ Z i ) Λ Ki,j, where N (i) K i,j = N(i) (T (i) K i,j ) N(i) (T (i) K i,j 1 ) and Λ Ki,j = Λ(T (i) (i) K i,j) Λ(T K i,j 1 ), for j = 1, 2,, K. To study the asymptotic properties of the MLE, Wellner and Zhang (2007) defined the following L 2 -norm, d(τ 1, τ 2 ) = { θ 1 θ } 1/2 Λ 1 (t) Λ 2 (t) 2 dµ 1 (t) with k µ 1 (t) = P (K = k Z = z) P (T K,j t K = k, Z = z)df (z). R d k=1 j=1 15
16 They showed that the semiparametric MLE, τ n = (ˆθ n, ˆΛ n ) converges to the true parameters τ 0 = (θ 0, Λ 0 ) (under some mild regularity conditions) in a rate lower than n 1/2, i.e. d( τ n, τ 0 ) = O p (n 1/3 ), however, the MLE of θ 0 is still semiparametric efficient, that is n(ˆθn θ 0 ) d N ( 0, I 1 (θ 0 ) ) with the Fisher information matrix given by I(θ 0 ) = E [ ] { Λ 0 (T K,j )e θ 0 Z Z E(Zeθ 0 Z } K, T K,j 1, T K,j ). E(e θ 0 Z K, T K,j 1, T K,j ) The computation of the semiparametric MLE is very time consuming as it requires the joint estimation of θ 0 and the infinite dimensional parameter Λ 0. Although the Fisher information has a nice explicit form, there is no easy method available to calculate the observed information. Lu (2007) studied the sieve MLE for the above semiparametric Poisson model using monotone polynomial splines. Let B n = (b 1,..., b qn ) be the basis of B-splines defined earlier. The monotone polynomial spline space is defined to be M n (D n, K n, m) = { φ n : φ n (t) = q n j=1 β j b j (t) S n (D n, K n, m), β B n, t [a, b] }. where B n = {β : β 1 β 2 β qn }. Each element of M n (D n, K n, m) is a nondecreasing function because of the monotonicity constraints on β 1,..., β qn. This fact is a consequence of the variation diminishing properties of B-splines. See for instance, Schumaker (1981), Example 4.75 and Theorem 4.76, pages Replaying Λ(t) by the exp( q n l=1 β lb j (t)) in the likelihood above, Lu (2007) solved the constraint optimization problem over the space 16
17 Θ B n. It turns out that, compared to Wellner and Zhang s method, the B-splines sieve MLE is much less computational demanding (Lu, Zhang and Huang, 2008) with a better overall convergence rate. In addition, the sieve MLE of θ is still semiparametric efficient with the same Fisher information matrix, I(θ 0 ). The bootstrap procedure was implemented to estimate I(θ 0 ) consistently by Lu, Huang and Zhang (2008) due to the computation advantage of the B-splines sieve MLE method. A straightforward algebra leads that l 1 (τ; x) l 2 (τ; x)(h) = K j=1 ( ) ( NK,j exp(β T Z) Λ Kj Z h ) Kj Λ Kj Under the regularity conditions given in Wellner and Zhang (2007), following the empirical process arguments used in Lu (2007) for monotone B-splines estimation, it can be easily shown that the class { l 1 (τ; x) l 2 (τ; x)(h) : τ T, h H} is Glivenko-Canteli, by showing the bracketing number with L 1 (P ) norm of this class is bounded. This will imply that {( l 1 (τ; x) l 2 (τ; x)(h)) 2 : τ T, h H} is Glivenko-Canteli as well, based on the given regularity conditions. Moreover, using monotone cubic B-splines in the sieve estimation automatically gives B nβ n φ 0 Φ = O(n pν ) and B nc n ξ 0 Φ = O(n pν ) due to Corollary 6.20 of Schumaker (1981). Hence the Fisher information I(θ 0 ) in semiparametric proportional mean model for panel count data can be consistently estimated using the proposed least-squares approach. 17
18 4.2. Cox Model for Interval-Censored Data Interval-censored data occur very frequently in long-term follow-up study for an event time of interest. The exact event time T is not observable; it is only known with certainty that T is bracketed between two adjacent examination times, or occurs before the first or after the last follow-up examination. Let (L, R) be the pair of examination times bracketing the event time T. That is, L is the last examination time before and R is the first examination time after the event. If 0 < L < R <, then T is interval-censored. If the event occurs before the first examination, then T is left-censored. If the event has not occurred after last examination, then T is right-censored. Such data is called case 2 interval-censored data. Nonparametric estimation of a distribution function and its smooth functionals with interval-censored data have been studied by Groeneboom and Wellner (1992), Groeneboom (1996), and Geskus and Groeneboom (1996). In this example, we consider the B-splines sieve MLE of the Cox proportional hazards model for the interval-censored data. With the proportional hazards model, the conditional hazard of T given a covariate vector Z R d is proportional to the baseline hazard. In terms of cumulative hazard functions, this model is (4.2) Λ(t z) = Λ 0 (t)e θ 0 z, where θ 0 is a d-dimensional regression parameter and Λ 0 is the unspecified baseline cumulative hazard function. Denote the two censoring variables by U and V, where P (U V ) = 1. Let G be the joint distribution function of (U, V ). Let δ 1 = 1 [T U], δ 2 = 1 [U<T V ] and δ 3 = 1 δ 1 δ 2. Assume that conditional on Z, T is independent of (U, V ), and that the joint distribution of 18
19 (U, V ) and Z does not depend on the parameters of interest. Then the density function of X (δ 1, δ 2, U, V, Z) with respect to the product of the counting measure on {0, 1} {0, 1} and the probability measure induced by the distribution of (Z, U, V ), is p(x; θ, Λ) = (1 exp( Λ(u)e θ z )) δ 1 (exp( Λ(u)e θ z ) exp( Λ(v)e θ z )) δ 2 (exp( Λ(v)e θ z ) δ 3. Let φ = log Λ. we reparametrize this log-likelihood in terms of (θ, φ). The resulting loglikelihood for a sample of size one is, up to an additive term not dependent on (θ, φ) is l(θ, φ; x) = log p(x, θ, φ) = δ 1 log(1 exp( e z θ+φ(u) )) + δ 2 log(exp( e z θ+φ(u) ) exp( e z θ+φ(v) )) δ 3 e z θ+φ(v). Let X = (X 1, X 2,, X n ) with X i = (δ 1i, δ 2i, U i, V i, Z i ), for 1 i n being a random sample with the same distribution as X = (δ 1, δ 2, U, V, Z). The log-likelihood for this random sample is l n (θ, φ) = n l(θ, φ; X i ). i=1 Because φ is a nondecreasing function, it is desirable to restrict its estimator to be also nondecreasing. Therefore, we seek an estimate of φ in the space of M n. (The abbreviation of M n (D n, K n, m)) The B-splines sieve MLE of τ 0 = (θ 0, φ 0 ) is the τ n = ( θ n, φ n ) that maximizes l n (θ, φ) over Θ M n. This is equivalent to maximizing l n (θ, B nβ) over Θ B n. No restriction will be placed on Θ. Thus Θ can be taken to be R d. As the same as in the model for panel count data, the B-splines sieve MLE is much easy to compute than the semiparametric MLE studied in Huang and Wellner (1995). In 19
20 addition, we can show that under some mild regularity conditions, the sieve MLE of θ, θ n, achieves the semiparametric efficiency as well, with the information given by I(θ 0 ) = P (l (θ 0, φ 0 ; X) 2 = P ( l1 (θ 0, φ 0 ; X) l ) 2 2 (θ 0, φ 0 ; X)(ξ 0 ), where ξ 0 (t) is the solution of a Fredholm integral equation of the second kind, ξ 0 (t) K(t, x)ξ 0 (x)dx = d(t) with two complicate functions K(t, x) and d(t) described in Huang and Wellner (1995). It is obvious that a direct estimation of the information matrix is impossible for this model. However, with the B-splines sieve MLE approach, variance of θ n can be readily estimated using the observed information matrix defined in (3.1) due to Propositions 2.1 and 3.1. For the asymptotic normality of θ n and consistency of the inverse observed information matrices, the following conditions are assumed. (C1) (a) E(ZZ ) is nonsingular; (b) Z is bounded, that is, there exists z 0 > 0 such that P ( Z z 0 ) = 1. (C2) (a) There exists a positive number η such that P (V U η) = 1; (b) the union of the supports of U and V is contained in an interval [a, b], where 0 < a < b <, and 0 < Λ 0 (a) < Λ 0 (b) <. (C3) Λ 0 belongs to Φ, a class of functions with bounded pth derivative in [a, b] for p 1 and the first derivative of Λ 0 is strictly positive and continuous on [a, b]. (C4) The conditional density g(u, v z) of (U, V ) given Z has bounded partial derivatives with respect to (u, v). The bounds of these partial derivatives do not depend on (u, v, z). 20
21 (C5) For some κ (0, 1), a T var(z U)a κa T E(ZZ T U)a and a T var(z V )a κa T E(ZZ T V )a a.s. for all a R d. It should be noted that in applications, implementation of the proposed estimation method does not require these conditions to be satisfied. These conditions are sufficient but may not be necessary to prove the following asymptotic theorem. Some conditions may be weakened but will make the proof considerably more difficulty. However, from a view point of practice, these conditions are usually satisfied. To study the asymptotic properties, we facilitate a metric defined as follows: for any φ 1, φ 2 Φ, define φ 1 φ 2 2 Φ = E[φ 1 (U) φ 2 (U)] 2 + E[φ 1 (V ) φ 2 (V )] 2. and for any τ 1 = (θ 1, φ 1 ) and τ 2 = (θ 2, φ 2 ) in the space of T = Θ Φ, define d(τ 1, τ 2 ) = τ 1 τ 2 T = { θ 1 θ φ 1 φ 2 2 Φ} 1/2. Theorem 4.1. Let K n = O(n ν ), where ν satisfies the restriction 1 2(1+p) < ν < 1 2p. Suppose that T and (U, V ) are conditionally independent given Z and that the distribution of (U, V, Z) does not involve (θ, Λ). Furthermore, suppose that conditions (C1) (C5) hold. Then (i) d ( τ n, τ 0 ) p 0. (ii) d ( τ n, τ 0 ) = O p ( n min(pν,(1 ν)/2) ). Thus if ν = 1/(1 + 2p), d( τ n, τ 0 ) = O p (n p/(1+2p) ). This is the optimal rate of convergence in nonparametric regression with comparable smoothness assumptions. 21
22 (iii) n 1/2 ( θ n θ 0 ) = n 1/2 I 1 (θ 0 ) n i=1 l (θ 0, φ 0 ; X i ) + o p (1) d N(0, I 1 (θ 0 )). Thus θ n is asymptotically normal and efficient. (iv) The inverse observed information matrix is a consistent estimator of I 1 (θ 0 ), the asymptotic variance of n 1/2 ( θ n θ 0 ). The proof of this theorem is considerably long, we will provide a sketch of the proof in Appendix A. 5. Numerical Results 5.1. Simulations Studies In this section, we conduct simulation studies for the examples discussed in the preceding section to evaluate the finite sample performance of the proposed estimator. In each example, we estimate the unknown parameters using the cubic B-splines sieve maximum likelihood estimation and estimate the standard error of the regression parameter estimates using the proposed least-squares method based on the cubic B-splines as well. For the B-splines sieve, the number of knots is chosen as K n = [N 1/3 ], the largest integer below N 1/3, where N is the number of distinct observation time points in the data, and the knots are evenly placed between (0, 1) in the first example and (0, 5) in the second example. Simulation 1: Panel Count Data. We generate the data with the setting given in Wellner and Zhang (2007). For each subject, we independently generate X i = (Z i, K i, T (i) K i, N (i) K i ), for i = 1, 2,, n, where Z i = (Z i,1, Z i,2, Z i,3 ) with Z i,1 Unif(0, 1), Z i,2 N(0, 1), and Z i,3 Bernoulli(0.5); K i is sampled randomly from the discrete set, {1, 2, 3, 4, 5, 6}; Given K i, T (i) K i = (T (i) K i,1, T (i) K i,2,, T (i) K i,k i ) are the order statistics of 22
23 Table 1: Monte-Carlo simulations results of the B-splines sieve MLE of θ 0 with 1000 repetitions for semiparametric analysis of panel count data n=50 n=100 n=200 θ 0,1 θ 0,2 θ 0,3 θ 0,1 θ 0,2 θ 0,3 θ 0,1 θ 0,2 θ 0,3 Bias M-C sd ASE %-CP 98.4% 98.5% 97.5% 97.6% 97.9% 96.5% 96.2% 95.1% 96.6% K i random draws from Unif(0, 1); The panel counts N (i) K i = (N (i) K i,1, N(i) K i,2,, N(i) K i,k i ) are generated from the Poisson process with the conditional mean function given by Λ(t Z i ) = 2t exp(θ T 0 Z i ) with θ 0 = ( 1.0, 0.5, 1.5) T. We conduct the simulation study for n =50, 100 and 200, respectively. In each case, we perform Monte-Carlo study with 1000 repetitions. Table 1 displays the estimation bias(bias), Monte-Carlo standard deviation(m-c s.d.), the average of standard errors using the proposed method(ase), and the coverage probability of 95% Wald-confidence interval using the proposed estimator of standard error(95%-pc). The results show that the B-splines sieve MLE performs quite well. It has very little bias with seemingly decrease of estimation variability as sample size increases. The proposed method tends to overestimate the standard error slightly but the overestimation lessens as sample size increases. As the result of overestimation, the coverage probability of 95% confidence interval exceeds the nominal value when sample size is 50 or 100 but gets closer to 95% with sample size increasing to 200. Simulation 2: Interval-Censored Data. We generate the data in a way similar to 23
24 Table 2: Monte-Carlo simulations results for B-splines sieve MLE of θ 0 with 1000 repetitions for semiparametric analysis of interval-censored data n=50 n=100 n=200 Bias M-C sd ASE %-CP 97.4% 96.4% 95.6% what was used in Huang and Rossini (1997). For each subject, we independently generate X i = (U i, V i, δ i,1 δ i,2, Z i ), for i = 1, 2,, n, where Z i Bernoulli(0.5); we simulate a series of examination times by the partial sum of interarrival times that are independently and identically distributed according to exp(1), then U i is the last examination time within 5 at which the event has not occurred yet and V i is the first observation time within 5 at which the event has occurred; the event time is generated according to Cox proportional hazards model Λ(t) = t exp(z) for which the true parameters are: θ 0 = 1 and Λ 0 (t) = t. Similarly, Monte- Carlo study with 1000 repetition is performed and the corresponding results are displayed in Table 2. The results show that both bias and Monte-Carlo standard deviation decrease as sample size increases. As observed earlier, the proposed least square estimate of the standard error may overestimate the true value but the overestimation lessens as sample size increasing to 200. With sample size 200, the Wald 95% confidence-interval achieves the desired coverage probability. 24
25 5.2. Applications This section illustrates the method in two real life examples: the bladder tumor randomized clinical trial conducted by Byar, Blackard and VACURG (1977) and the breast cosmesis trial described by Finkelstein and Wolfe (1985). We adopt the cubic B-splines sieve semiparametric MLE method and we estimate the asymptotic standard error of the estimates of regression parameter using the least-squares approach with cubic B-splines as well. The inference is made based on asymptotic theorem developed in this paper. Example 1: Bladder Tumor Study. The data set of the bladder tumor randomized clinical trial conducted by the Veterans Administration Cooperative Urological Research Group (Byar, Blackard and VACURG, 1977) is extracted from Andrews and Herzberg (1985, p ). In this study, a randomized clinical trial of three treatments : placebo, pyridoxine pill and thiotepa instillation into bladder, was conducted for patients with superficial bladder tumor (a total of 116 subjects: 40 were randomized to placebo, 31 to pyridoxine pill and 38 to thiotepa instillation) when entering the trial. At each follow-up visit, tumors were counted, measured and then removed after being found. The treatments as originally assigned will continue after each visit. The number of follow-up visits and follow-up times varied greatly from patient to patient and hence the observation of bladder tumor counts in this study falls in the framework of panel count data as described in Section 4. For this trial, the treatment effects, particularly the thiotepa instillation method, on reducing the bladder tumor recurrent have been the focal point of interest in many studies, see for example, Wei, Lin and Weissfeld (1989), Sun and Wei (2000), Zhang (2002) and Wellner and Zhang (2007). In this paper, we study the proportional mean model as proposed by Wellner and Zhang (2007), 25
26 (5.1) E{N(t) Z} = Λ 0 (t) exp(θ 0,1 Z 1 + θ 0,2 Z 2 + θ 0,3 Z 3 + θ 0,4 Z 4 ), where Z 1 and Z 2 are the baseline tumor count and size, measured when subjects entered the study, and Z 3 and Z 4 define the indicators of the pyridoxine pill and instillation treatments, respectively. Lu, Zhang and Huang (2008) has used the cubic B-splines sieve semiparametric MLE method for this model and estimated the asymptotic standard error of the estimate of θ 0 based on the bootstrap approach. In this paper, we reanalyze the data using the same method but estimate the asymptotic standard error using the least-squares approach. The semiparametric sieve MLE of θ 0 is θ n = (0.2076, , , ) with the asymptotic standard errors given by (0.0066, , , ) and the corresponding p- values=(0.0000,0.0004,0.0553,0.0000) based on the asymptotic theorem developed in Wellner and Zhang (2007). We note that the result of this analysis is very different from that of Lu, Zhang and Huang (2008). The conflict of the results indirectly indicates that the working assumption of Poisson process model to form the likelihood may not be valid as Lu-Zhang- Huang s inference procedure is shown to be robust against the underlying counting process. Example 2: Breast Cosmesis Study. The breast cosmesis study is the clinical trial for comparing radiotherapy alone with primary radiotherapy with adjuvant chemotherapy in terms of subsequent cosmetic deterioration of the breast following tumorectomy. Subjects (46 assigned to radiotherapy alone and 48 to radiotherapy plus chemotherapy) were followed for up to 60 months, with pre-scheduled follow-up visits for every 4-6 months. In this paper, we propose Cox proportional hazards model to analyze the difference in time until appearance 26
27 of breast retraction, (5.2) Λ(t Z) = Λ 0 (t) exp(θ 0 Z), where Λ 0 is the baseline hazard (the hazard of the time to appearance of breast retraction) for radiotherapy alone) and Z is the indicator for the treatment of radiotherapy plus chemotherapy. Using the method proposed in this paper, the cubic B-splines sieve semiparametric MLE of θ 0 is θ n = with asymptotic standard error given by The Wald test statistic is Z = with the associated p-value= This indicates that the treatment of radiotherapy with adjuvant chemotherapy significantly increases the risk of the breast retraction and the result is comparable with what has been concluded in Finkelstein and Wolfe (1985). 6. Conclusion and Discussion When the infinite dimensional parameter as nuisance parameter can not be eliminated in estimating the finite dimensional parameter, a general semiparametric maximum likelihood estimation is often a challenging task. Sieve MLE method, proposed originally by Geman and Hwang (1982), renders a practical approach for alleviating the difficulty in semiparametric estimation problem. Particularly, the spline-based sieve semiparametric method, as exemplified by Lu, Zhang and Huang (2008) has many attractions in practice. Not only it reduces the numerical difficulty in computing the semiparametric MLE, it also achieves the semiparametric asymptotic estimation efficiency for the finite dimensional parameter. However, the estimation of the information matrix remains a difficult task in general. In this paper, we propose an easy-to-implement least-squares approach to estimate 27
28 the semiparametric information matrix. We show that the estimator is asymptotic consistent, as often a byproduct of the establishment of asymptotic normality for sieve semiparametric MLE. Interestingly, this estimator is exactly the observed information matrix if we treat the semiparametric sieve MLE as a parametric MLE problem. Although the estimator of asymptotic error overestimates the true value slightly in finite sample situation as shown in our simulation studies, the overestimation issue is generally alleviated as sample size increases, say to 200. In addition to the expression of information matrix given in (2.7), we note that it can be expressed through the second derivatives. Denote l 11 (τ; x) = l θ 1 (τ; x), l12 (τ; x) = l s 1 (θ, φ (s) ; x) s=0, l 21 (τ; x)(h) = l θ 2 (τ; x)(h), and letting ( / s) φ 1(s) s=0 = h 1, l22 (τ; x)(h, h 1 ) = l s 2 (θ, φ 1(s) ; x)(h). Then the information matrix can be written as I(θ 0 ) = E [ l11 (τ 0 ; X) 2 l 12 (τ 0 ; X)(ξ 0 ) + l ] 22 (τ 0 ; X)(ξ 0, ξ 0 ). This expression leads to an alternative estimator of I(θ 0 ), given by (6.1) ρ n ( ξ n ) = n 1 n i=1 [ l11 ( τ n ; X i ) 2 l 12 ( τ n ; X i )( ξ n ) + l 22 ( τ n ; X i )( ξ n, ξ n )], where ξ n is the minimizer of ρ n (h) over H n. The consistency of ρ n ( ξ n ) can be similarly shown and it would be interesting to investigate how this estimator behaves compared to the one proposed in this paper. As implied in Example 1, this semiparametric inference procedure is not robust against the underlying probability model. If the true model is not the working model for forming the likelihood, the inference may be invalid. Wellner and Zhang (2007) developed a general theorem for making robust semiparametric inference and Lu, Zhang and Huang (2007) extend 28
29 the robust semiparametric inference to the spline-based sieve semiparametric estimation. It remains an open research problem on how to generalize the proposed least-squares method to estimate the asymptotic error of the regression parameter estimates in order to make robust semiparametric inference. 7. Appendix A This section contains the sketch of the proofs for Proposition 2.1 and Theorem 4.1. In the following, C represents a constant that may varies from place to place. The proof for Proposition 2.1: For the least favorable direction ξ 0, we define H n (ɛ) = {h : h H n and h ξ 0 H ɛ} for ɛ 0. Note that for any h H, ρ n (h) ρ n (ξ 0 ) = (P n P )l( τ n, h; X) (P n P )l( τ n, ξ 0 ; X) +P {l( τ n, h; X) l(τ 0, h; X)} P {l( τ n, ξ 0 ; X) l(τ 0, ξ 0 ; X)} +P {l(τ 0, h; X) l(τ 0, ξ 0 ; X)} Because the class I = {l(τ, h; X) : τ T and h H} is Glivenko-Cantelli and τ n H n H, we have that (P n P )l( τ n, h; X) = o P (1) and (P n P )l( τ n, ξ 0 ; X) = o p (1). In addition, by the weak consistency of τ n p τ 0, Continuous Mapping Theorem and 29
30 Dominate Convergence Theorem, we can conclude that P {l( τ n, h; X) l(τ 0, h; X)} p 0 and P {l( τ n, ξ 0 ; X) l(τ 0, ξ 0 ; X)} p 0. Hence for any h H, ρ n (h) ρ n (ξ 0 ) = ρ(h) ρ(ξ 0 ) + o p (1) > 0 in probability as n by the characteristics of the least favorable direction. This implies that ( ) P inf ρ n(h) > ρ n (ξ 0 ) p 1 h H n (ɛ) and hence let ɛ 0 we can conclude that ˆξ n ξ 0 H p 0. Subsequently, we can conclude ρ n (ˆξ n ) = P n l( τ n, ˆξ n ; X) = (P n P )l( τ n, ˆξ n ; X) + P l( τ n, ˆξ n ; X) p P l(τ 0, ξ 0 ; X) = I(θ 0 ), by the consistency of τ n and ˆξ n and I being a Glivenko-Cantelli. The proof for Theorem 4.1: To prove Part (i) of Theorem 4.1, we verify the conditions of Theorem 5.7 in van der Vaart (1998). Let M(τ) = P l(τ; X) = P l(θ, φ; X) and M n (τ) = P n l(τ; X) = P n l(θ, φ; X). Hence for any τ = T n = Θ M n, M n (τ) M(τ) = (P n P )l(τ; X). Let L 1 = {l(τ; X) : τ T n }. By the calculation of Shen and Wong (1994), page 597, ɛ > 0, the bracketing number of M n computed with L 1 (P )-norm is bounded by (1/ɛ) Cqn. 30
31 Since Θ R d is compact, Θ can be covered by C(1/ɛ) d balls with radius ɛ. Then by Conditions (C1)-(C3), we can easily construct ɛ-brackets for L 1 with the bracketing number with L 1 (P )-norm bounded by C(1/ɛ) Cqn+d. Hence L 1 is Glivenko-Cantelli by Theorem of van der Vaart and Wellner (1996). Therefore, sup τ Tn M n (τ) M(τ) p 0. Let g(z, t) = exp(θ z + φ(t)) and g 0 (z, t) = exp(θ 0z + φ 0 (t)). A straightforward algebra yields that M(τ 0 ) M(τ) = E { [1 exp( g 0 (Z, U))] log 1 exp( g 0(Z, U)) 1 exp( g(z, U)) +[exp( g 0 (Z, U)) exp( g 0 (Z, V ))] log exp( g 0(Z, U)) exp( g 0 (Z, V )) exp( g(z, U)) exp( g(z, V )) + exp( g 0 (Z, V )) log exp( g } 0(Z, V )) exp( g(z, V )) { ( ) 1 exp( g0 (Z, U)) = E [1 exp( g(z, U))]m 1 exp( g(z, U)) ( ) exp( g0 (Z, U)) exp( g 0 (Z, V )) +[exp( g(z, U)) exp( g(z, V ))]m exp( g(z, U)) exp( g(z, V )) ( )} exp( g0 (Z, V )) + exp( g(z, V ))m, exp( g(z, V )) where m(x) = x log x x + 1 (x 1) 2 /4 for 0 x 5. Further analysis using Taylor expansion and Conditions (C1)-(C3) leads to { 1 M(τ 0 ) M(τ) CE 1 exp( g(z, U)) [exp( g 0(Z, U)) exp( g(z, U))] 2 } 1 + exp( g(z, V )) [exp( g 0(Z, U)) exp( g(z, U))] 2 { CE [(θ 0 θ) Z + (φ 0 φ)(u)] 2 + [(θ 0 θ) Z + (φ 0 φ)(v )] 2}. With Conditions (C1)-(C5), using the same arguments as those in Wellner and Zhang (2007), 31
32 page leads to M(τ 0 ) M(τ) C ( θ θ φ φ 0 2 Φ) = Cd 2 (τ 0, τ). Then it implies that sup τ:d(τ,τ0 ) ɛ M(τ) M(τ 0 ) Cɛ 2 < M(τ 0 ). that For φ 0 Φ, Lu (2007) has shown that there exists a φ 0,n M n of order m p + 2 such This also implies that φ 0,n φ 0 Φ Cq p n φ 0,n φ 0 Cq p n = O ( n pν). = O (n pν ). Now let τ 0,n = (θ 0, φ 0,n ), we have M n ( τ n ) M n (τ 0 ) = M n ( τ n ) M n (τ 0,n ) + M n (τ 0,n ) M n (τ 0 ) P n l(τ 0,n ; X) P n l(τ 0 ; X) = (P n P ) {l(τ 0,n ; X) l(τ 0 ; X)} + M(τ 0,n ) M(τ 0 ). We can easily show that the class L 2 = {l(θ 0, φ; x) l(θ 0, φ 0 ; x) : φ M n and φ φ 0 Φ Cn pν } is a P -Donsker by calculating the bracketing number with L 2 (P )-norm. It is obvious that in this class P (l(θ 0, φ; X) l(θ 0, φ 0 ; X)) 2 0 as n. Hence (P n P ) {l(θ 0, φ 0,n ; X) l(θ 0, φ 0 ; X)} = o p (n 1/2 ) by the relationship between P-Donsker and asymptotic equicontinuity given by Corollary of van der Vaart and Wellner (1996). By the Dominated Convergence Theorem, it is easy to see that M(τ 0,n ) M(τ 0 ) > o(1). Therefore, M n ( τ n ) M n (τ 0 ) o p (n 1/2 ) o(1) = o p (1). 32
33 This completes the proof of d( τ n, τ 0 ) p 0. To prove Part (ii), we verify the conditions of Theorem of van der Vaart and Wellner (1996). First, we have already shown in the proof of consistency that M(τ 0 ) M(τ) Cd 2 (τ 0, τ). Next, we further explore M n ( τ n ) M n (τ 0 ). In the proof of Part (i), we know that M n ( τ n ) M n (τ 0 ) I 1,n + I 2,n, where I 1,n = (P n P ) {l(θ 0, φ 0,n ; X) l(θ 0, φ 0 ; X)} and I 2,n = P {l(θ 0, φ 0,n ; X) l(θ 0, φ 0 ; X)}. By Taylor expansion, we have I 1,n = (P n P ) { l2 (θ 0, φ; } { X)(φ 0,n φ 0 ) = n pν+ɛ (P n P ) l 2 (θ 0, φ; X) φ } 0,n φ 0 n pν+ɛ for any 0 < ɛ < 1/2 pν. Because φ 0,n φ 0 = O(n pν ), using Corollary of van der Vaart and Wellner (1996), we can show that (P n P ) { l2 (θ 0, φ; } X) φ 0,n φ 0 = o n pν+ɛ p (n 1/2 ). Hence I 1,n = o p (n pν+ɛ n 1/2 ) = o p (n 2pν ). Using the fact that the function m(x) = x log x x+1 (x 1) 2 in the neighborhood of x = 1, it can be easily argued that M(τ 0 ) M(τ 0,n ) C φ 0,n φ 0 2 Φ = O(n 2pν ), which implies that I 2,n = M(τ 0,n ) M(τ 0 ) O(n 2pν ). Thus we conclude that M n ( τ n ) M n (τ 0 ) O p (n 2pν ) = O p ( n 2 min(pν,(1 ν)/2) ). Let L 3 (η) = {l(τ; x) l(τ 0 ; x) : φ M n and d(τ, τ 0 ) η}. Using the same argument as that in the proof of consistency, we obtain that the logarithm of the ɛ- bracketing number of L 3 (η), log N [ ] (ɛ, L 3 (η), L 2 (P )) is bounded by Cq n log(η/ɛ). This leads to J [ ] (η, L 3 (η), L 2 (P )) = η log N[ ] (ɛ, L 3 (η), L 2 (P ))dɛ Cqn 1/2 η. Because Conditions (C1) and (C2) guarantee the uniform boundedness of l(τ; x), using Theorem of van der Vaart and Wellner (1996), the key function φ n (η) in Theorem of van der Vaart and 33
Estimation of the Mean Function with Panel Count Data Using Monotone Polynomial Splines
Estimation of the Mean Function with Panel Count Data Using Monotone Polynomial Splines By MINGGEN LU, YING ZHANG Department of Biostatistics, The University of Iowa, 200 Hawkins Drive, C22 GH Iowa City,
More informationAn Adaptive Spline-Based Sieve Semiparametric Maximum Likelihood Estimation for the Cox Model with Interval-Censored Data
An Adaptive Spline-Based Sieve Semiparametric Maximum Likelihood Estimation for the Cox Model with Interval-Censored Data By YING ZHANG Department of Biostatistics, The University of Iowa ying-j-zhang@uiowa.edu
More informationTwo Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates
Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates Jon A. Wellner 1 and Ying Zhang 2 December 1, 2006 Abstract We consider estimation in a particular semiparametric
More informationasymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data
asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data Xingqiu Zhao and Ying Zhang The Hong Kong Polytechnic University and Indiana University Abstract:
More informationEfficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models
Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationSpline-based sieve semiparametric generalized estimating equation for panel count data
University of Iowa Iowa Research Online Theses and Dissertations Spring 2010 Spline-based sieve semiparametric generalized estimating equation for panel count data Lei Hua University of Iowa Copyright
More informationM- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010
M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 Z-theorems: Notation and Context Suppose that Θ R k, and that Ψ n : Θ R k, random maps Ψ : Θ R k, deterministic
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL
October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach
More informationarxiv:math/ v2 [math.st] 5 Dec 2007
The Annals of Statistics 2007, Vol. 35, No. 5, 2106 2142 DOI: 10.1214/009053607000000181 c Institute of Mathematical Statistics, 2007 arxiv:math/0509132v2 [math.st] 5 Dec 2007 TWO LIKELIHOOD-BASED SEMIPARAMETRIC
More informationModels for Multivariate Panel Count Data
Semiparametric Models for Multivariate Panel Count Data KyungMann Kim University of Wisconsin-Madison kmkim@biostat.wisc.edu 2 April 2015 Outline 1 Introduction 2 3 4 Panel Count Data Motivation Previous
More informationEstimation and Inference of Quantile Regression. for Survival Data under Biased Sampling
Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased
More informationChapter 2 Inference on Mean Residual Life-Overview
Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate
More informationA FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA
Statistica Sinica 23 (2013), 383-408 doi:http://dx.doi.org/10.5705/ss.2011.151 A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Chi-Chung Wen and Yi-Hau Chen
More informationEstimation for two-phase designs: semiparametric models and Z theorems
Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for
More informationA note on L convergence of Neumann series approximation in missing data problems
A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West
More informationPanel Count Data Regression with Informative Observation Times
UW Biostatistics Working Paper Series 3-16-2010 Panel Count Data Regression with Informative Observation Times Petra Buzkova University of Washington, buzkova@u.washington.edu Suggested Citation Buzkova,
More informationSEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE. By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam
The Annals of Statistics 1997, Vol. 25, No. 4, 1471 159 SEMIPARAMETRIC LIKELIHOOD RATIO INFERENCE By S. A. Murphy 1 and A. W. van der Vaart Pennsylvania State University and Free University Amsterdam Likelihood
More informationA Semiparametric Regression Model for Panel Count Data: When Do Pseudo-likelihood Estimators Become Badly Inefficient?
A Semiparametric Regression Model for Panel Count Data: When Do Pseudo-likelihood stimators Become Badly Inefficient? Jon A. Wellner 1, Ying Zhang 2, and Hao Liu Department of Statistics, University of
More informationA Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators
Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationSemiparametric Regression
Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under
More informationBahadur representations for bootstrap quantiles 1
Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by
More informationEfficient Estimation in Convex Single Index Models 1
1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida http://arxiv.org/abs/1708.00145 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)
More informationApproximation of Survival Function by Taylor Series for General Partly Interval Censored Data
Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor
More informationUniversity of California San Diego and Stanford University and
First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More information1 Glivenko-Cantelli type theorems
STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then
More informationEstimation of the Bivariate and Marginal Distributions with Censored Data
Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new
More informationSemiparametric Regression Analysis of Panel Count Data and Interval-Censored Failure Time Data
University of South Carolina Scholar Commons Theses and Dissertations 2016 Semiparametric Regression Analysis of Panel Count Data and Interval-Censored Failure Time Data Bin Yao University of South Carolina
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH
FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter
More informationQuantile Regression for Residual Life and Empirical Likelihood
Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan
More informationLarge sample theory for merged data from multiple sources
Large sample theory for merged data from multiple sources Takumi Saegusa University of Maryland Division of Statistics August 22 2018 Section 1 Introduction Problem: Data Integration Massive data are collected
More informationOther Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model
Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationarxiv:submit/ [math.st] 6 May 2011
A Continuous Mapping Theorem for the Smallest Argmax Functional arxiv:submit/0243372 [math.st] 6 May 2011 Emilio Seijo and Bodhisattva Sen Columbia University Abstract This paper introduces a version of
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationEfficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models
NIH Talk, September 03 Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models Eric Slud, Math Dept, Univ of Maryland Ongoing joint project with Ilia
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results
Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations
Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2006 Paper 213 Targeted Maximum Likelihood Learning Mark J. van der Laan Daniel Rubin Division of Biostatistics,
More informationVerifying Regularity Conditions for Logit-Normal GLMM
Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal
More informationSubmitted to the Brazilian Journal of Probability and Statistics
Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a
More information1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models
Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following
More informationLikelihood Construction, Inference for Parametric Survival Distributions
Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make
More informationFull likelihood inferences in the Cox model: an empirical likelihood approach
Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:
More informationStatistical Properties of Numerical Derivatives
Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective
More informationPairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion
Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationNETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA
Statistica Sinica 213): Supplement NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Hokeun Sun 1, Wei Lin 2, Rui Feng 2 and Hongzhe Li 2 1 Columbia University and 2 University
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-2016-0534.R2 Title A NONPARAMETRIC REGRESSION MODEL FOR PANEL COUNT DATA ANALYSIS Manuscript ID SS-2016-0534.R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202016.0534
More informationand Comparison with NPMLE
NONPARAMETRIC BAYES ESTIMATOR OF SURVIVAL FUNCTIONS FOR DOUBLY/INTERVAL CENSORED DATA and Comparison with NPMLE Mai Zhou Department of Statistics, University of Kentucky, Lexington, KY 40506 USA http://ms.uky.edu/
More informationSTATISTICAL ANALYSIS OF MULTIVARIATE INTERVAL-CENSORED FAILURE TIME DATA. A Dissertation Presented. the Faculty of the Graduate School
STATISTICAL ANALYSIS OF MULTIVARIATE INTERVAL-CENSORED FAILURE TIME DATA A Dissertation Presented to the Faculty of the Graduate School University of Missouri-Columbia In Partial Fulfillment Of the Requirements
More informationlarge number of i.i.d. observations from P. For concreteness, suppose
1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationWeighted likelihood estimation under two-phase sampling
Weighted likelihood estimation under two-phase sampling Takumi Saegusa Department of Biostatistics University of Washington Seattle, WA 98195-7232 e-mail: tsaegusa@uw.edu and Jon A. Wellner Department
More informationDA Freedman Notes on the MLE Fall 2003
DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar
More informationAppendix E : Note on regular curves in Euclidean spaces
Appendix E : Note on regular curves in Euclidean spaces In Section III.5 of the course notes we posed the following question: Suppose that U is a connected open subset of R n and x, y U. Is there a continuous
More informationLocally Robust Semiparametric Estimation
Locally Robust Semiparametric Estimation Victor Chernozhukov Juan Carlos Escanciano Hidehiko Ichimura Whitney K. Newey The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper
More informationSTATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University
STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL A Thesis Presented to the Faculty of San Diego State University In Partial Fulfillment of the Requirements for the Degree
More informationLikelihood Based Inference for Monotone Response Models
Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 5, 25 Abstract The behavior of maximum likelihood estimates (MLE s) the likelihood ratio statistic
More informationThe partially monotone tensor spline estimation of joint distribution function with bivariate current status data
University of Iowa Iowa Research Online Theses and Dissertations Summer 00 The partially monotone tensor spline estimation of joint distribution function with bivariate current status data Yuan Wu University
More informationMultistate Modeling and Applications
Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)
More informationIntegrated likelihoods in survival models for highlystratified
Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highlystratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola
More information5.1 Approximation of convex functions
Chapter 5 Convexity Very rough draft 5.1 Approximation of convex functions Convexity::approximation bound Expand Convexity simplifies arguments from Chapter 3. Reasons: local minima are global minima and
More informationEnsemble estimation and variable selection with semiparametric regression models
Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationCURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University
CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner
More informationSMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES
Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and
More informationSemiparametric posterior limits
Statistics Department, Seoul National University, Korea, 2012 Semiparametric posterior limits for regular and some irregular problems Bas Kleijn, KdV Institute, University of Amsterdam Based on collaborations
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationEfficient Estimation of Population Quantiles in General Semiparametric Regression Models
Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Arnab Maity 1 Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A. amaity@stat.tamu.edu
More informationMinimax Estimation of a nonlinear functional on a structured high-dimensional model
Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics
More informationSupplement to Quantile-Based Nonparametric Inference for First-Price Auctions
Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract
More informationSome New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.
Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationBayesian Adaptation. Aad van der Vaart. Vrije Universiteit Amsterdam. aad. Bayesian Adaptation p. 1/4
Bayesian Adaptation Aad van der Vaart http://www.math.vu.nl/ aad Vrije Universiteit Amsterdam Bayesian Adaptation p. 1/4 Joint work with Jyri Lember Bayesian Adaptation p. 2/4 Adaptation Given a collection
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationAn Extended BIC for Model Selection
An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,
More informationSTAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song
STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationThe properties of L p -GMM estimators
The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion
More informationUNIVERSITÄT POTSDAM Institut für Mathematik
UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam
More informationLikelihood Based Inference for Monotone Response Models
Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 11, 2006 Abstract The behavior of maximum likelihood estimates (MLEs) and the likelihood ratio
More informationCURE MODEL WITH CURRENT STATUS DATA
Statistica Sinica 19 (2009), 233-249 CURE MODEL WITH CURRENT STATUS DATA Shuangge Ma Yale University Abstract: Current status data arise when only random censoring time and event status at censoring are
More informationSimple techniques for comparing survival functions with interval-censored data
Simple techniques for comparing survival functions with interval-censored data Jinheum Kim, joint with Chung Mo Nam jinhkim@suwon.ac.kr Department of Applied Statistics University of Suwon Comparing survival
More informationUNIVERSITY OF CALIFORNIA, SAN DIEGO
UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA. Asymptotic Inference in Exponential Families Let X j be a sequence of independent,
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationA COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky
A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More information