Semiparametric Additive Transformation Model under Current Status Data

Guang Cheng and Xiao Wang

Purdue University

Abstract

We consider the efficient estimation of the semiparametric additive transformation model with current status data. A wide range of survival models and econometric models can be incorporated into this general transformation framework. We apply the B-spline approach to simultaneously estimate the linear regression vector, the nondecreasing transformation function, and a set of nonparametric regression functions. We show that the parametric estimate is semiparametric efficient in the presence of multiple nonparametric nuisance functions. An explicit consistent B-spline estimate of the asymptotic variance is also provided. All nonparametric estimates are smooth, and are shown to be uniformly consistent with faster than cube-root rates of convergence. Interestingly, we observe a convergence-rate interference phenomenon: the convergence rates of the B-spline estimators are all slowed down to equal the slowest one. Constrained optimization is not required in our implementation. Numerical results are used to illustrate the finite-sample performance of the proposed estimators.

Key Words: B-spline; Consistent variance estimation; Current status data; Efficient estimation; Semiparametric transformation models

1 Introduction

We consider the efficient estimation of the following semiparametric additive transformation model:

$H(U) = Z^\top \beta + \sum_{j=1}^{d} h_j(W_j) + \epsilon,$    (1)

(Corresponding Author: Department of Statistics, Purdue University, West Lafayette, IN 47907, chengg@purdue.edu. Department of Statistics, Purdue University, West Lafayette, IN 47907, wangxiao@purdue.edu.)

where $H(\cdot)$ is a monotone transformation function, the $h_j(\cdot)$'s are smooth regression functions (with possibly different degrees of smoothness), and $\epsilon$ has a known distribution $F(\cdot)$ with support $\mathbb{R}$. A wide range of survival models and econometric models can be incorporated into the above general transformation framework, e.g., Huang & Rossini (1997); Shen (1998); Huang (1999); Banerjee et al. (2006, 2009). In particular, the model (1) can be readily applied to a failure time $T$ by letting $U = \log T$. We can obtain the partly linear additive Cox model, i.e., Huang (1999), by assuming $F(s) = 1 - \exp(-e^s)$ and $H(u) = \log A(e^u)$, where $A$ is an unspecified cumulative hazard function. Specifically, the hazard function of $T$, given the covariates $(z, w)$, has the form

$\lambda(t \mid z, w) = a(t) \exp\Big\{\beta^\top z + \sum_{j=1}^{d} h_j(w_j)\Big\},$    (2)

where $a(t)$ is the baseline hazard function, and the regression coefficient and additive functions appearing in (2) are the negatives of $\beta$ and the $h_j$'s in (1). However, if we change the form of $F(s)$ to $e^s/(1+e^s)$, the model (1) becomes the partly linear additive proportional odds model.

Motivated by the close connection with survival models, we focus in this paper on current status data, which arise not only in survival analysis but also in demography, epidemiology, econometrics and bioassay. More specifically, we observe $X = (V, \Delta, Z, W)$, where $V \in \mathbb{R}$ is a random examination time and $\Delta = 1\{U \le V\}$. We assume that $U$ and $V$ are independent given $(Z, W)$. Under current status data, the model (1) is also related to the semiparametric binary model studied in econometrics. Using the link function $F(\cdot)$, we assume that the probability of $\Delta = 1$, given the covariates $(Z, W, V)$, has the expression

$P(\Delta = 1 \mid Z, W, V) = F\Big(\beta^\top Z + \sum_{j=1}^{d} h_j(W_j) + H(V)\Big).$    (3)

Note that Banerjee et al. (2006) and Banerjee et al. (2009) have done a great deal of statistical estimation and hypothesis testing on the model (3) (without the $h_j$ terms), assuming $F(\cdot)$ to be the complementary log-log function and the logistic function, respectively. An extensive discussion of the relation between (3) and survival models can be found in Doksum & Gasko (1990). Recently, a similar transformation model has been considered by Chen & Tong (2010), but for right censored data. They showed that the monotone transformation function is root-$n$ estimable there, which can never be achieved in the case of current status data. This is the key theoretical difference between the two types of survival data.
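To make the observation scheme concrete, the following minimal Python sketch (our illustration, not code from the paper) generates a single current status observation $X = (V, \Delta, Z, W)$ from model (1); the particular choices of $H$, $h$, $\beta$ and the logistic error distribution are assumptions made purely for illustration.

```python
# A minimal sketch (assumed quantities, not the authors' code) of how a current
# status observation X = (V, Delta, Z, W) arises from model (1): the event time
# U is never observed directly, only the indicator Delta = 1{U <= V}.
import numpy as np

rng = np.random.default_rng(0)

beta = np.array([0.5, -0.3])               # assumed regression vector
H = np.sinh                                 # an assumed strictly increasing H
H_inv = np.arcsinh                          # its inverse
h = lambda w: np.sin(np.pi * w) / 2         # an assumed smooth additive function
F = lambda s: 1.0 / (1.0 + np.exp(-s))      # logistic error distribution

Z = rng.uniform(0.0, 1.0, size=2)
W = rng.uniform(0.0, 1.0)
eps = rng.logistic()
U = H_inv(Z @ beta + h(W) + eps)            # model (1): H(U) = Z'beta + h(W) + eps
V = rng.uniform(0.5, 2.0)                   # random examination time
Delta = int(U <= V)                         # current status indicator 1{U <= V}
# Under (1), P(Delta = 1 | Z, W, V) = F(H(V) - Z'beta - h(W)); flipping the
# signs of beta and h, as done in Section 2.2, gives exactly the form in (3).
print(Delta, F(H(V) - Z @ beta - h(W)))
```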

In this paper, we employ the B-spline approach to simultaneously estimate the vector $\beta$, the monotone $H$ and the smooth $h_j$'s. The corresponding estimates are denoted $\hat\beta$, $\hat H$ and $\hat h_j$. In contrast, Ma & Kosorok (2005a) apply the penalized NPMLE approach to (1) (with $d = 1$), which yields a non-smooth step function $\check H$ and a penalized estimate $\check h$. Our B-spline framework has the following theoretical and computational advantages over the existing penalized NPMLE approach:

1. Our B-spline estimate $\hat H$ is smooth and uniformly consistent. However, $\check H$ is always discontinuous (regardless of the smoothness of the true function $H_0$) and has a bias which does not vanish asymptotically; see Page 2258 of Ma & Kosorok (2005a). We can also obtain a faster rate of convergence for $\hat H$ than that for $\check H$, i.e., $O_P(n^{-1/3})$, by using the B-spline estimation approach. Therefore, we expect more accurate inferences drawn from $\hat H$.

2. We are able to give an explicit B-spline estimate for the asymptotic variance of $\hat\beta$, based on which the asymptotic confidence interval for $\beta$ can be easily constructed. Under very weak conditions, its consistency is proven. However, the block jackknife approach in Ma & Kosorok (2005a) requires more computation, and is not even theoretically justified.

3. Our spline estimation algorithm requires much less computation than the isotonic type algorithm used in Ma & Kosorok (2005a), since the number of jumps in the step function is typically much larger than the number of knots we choose for estimating $H$ and the $h_j$'s.

In contrast with Huang (1999), we deal with current status data rather than right censored data, and thus we also need to estimate the monotone transformation function, which has been profiled out in their partial likelihood framework. Despite the non-root-$n$ convergence rates of $\hat H$ and the $\hat h_j$'s, we are able to show that $\hat\beta$ is root-$n$ consistent, asymptotically normal and semiparametric efficient. We derive the efficient information bound by taking the general two-stage projection approach from Sasieni (1992), which is needed due to the involvement of multiple nonparametric functions in semiparametric models. Interestingly, we observe a convergence-rate interference phenomenon for the B-spline estimators, i.e., the convergence rates of the nonparametric estimators are all slowed down to equal the slowest one. Moreover, by approximating $\log \dot H$ with a B-spline, we avoid the monotonicity constraint in the implementation, which is usually required in the literature, e.g., Zhang et al. (2010).

The remainder of the paper is organized as follows. Section 2 describes the B-spline estimation procedure. The asymptotic properties such as consistency and convergence rates of the estimates are obtained in Section 3. The asymptotic distribution of the parametric component is studied in Section 4, and its efficient information and the corresponding explicit B-spline estimate are given in Section 5. Numerical results, including simulation studies and a real data application, are presented in Section 6. We close with an appendix containing technical details.

2 Semiparametric B-spline Estimation

2.1 Assumptions

We first define some notation. For any vector $v$, $v^{\otimes 2} = v v^\top$. The notations $\gtrsim$ and $\lesssim$ mean greater than, or smaller than, up to a universal constant. We write $A_n \asymp B_n$ if $A_n \lesssim B_n$ and $A_n \gtrsim B_n$. The notations $\mathbb{P}_n$ and $\mathbb{G}_n$ are used for the empirical distribution and the empirical process of the observations, respectively. Furthermore, we use the operator notation for evaluating expectations. Thus, for every measurable function $f$ and true probability $P$,

$\mathbb{P}_n f = \frac{1}{n}\sum_{i=1}^{n} f(X_i), \qquad P f = \int f \, dP \qquad \text{and} \qquad \mathbb{G}_n f = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \{f(X_i) - P f\}.$

We next present some model assumptions.

M1. $U$ and $V$ are independent given $(Z, W)$.

M2. (a) The covariates $(Z, W)$ are assumed to belong to a bounded subset of $\mathbb{R}^{l+d}$, say $[0,1]^l \times [0,1]^d$. The support of $V$ is $[l_v, u_v]$, where $0 < l_v < u_v < +\infty$; (b) The joint density of $(Z, V, W)$ w.r.t. Lebesgue measure stays away from zero, and the joint density of $(V, W)$ stays away from infinity.

M3. $E\{Z - E(Z \mid V, W)\}^{\otimes 2}$ is strictly positive definite.

M4. The residual error distribution $F(\cdot)$ is assumed to be known and has support $\mathbb{R}$. Denote the first, second and third derivatives of $F$ as $f$, $\dot f$ and $\ddot f$, respectively. We assume that (a) $|f(u)| \vee |\dot f(u)| \vee |\ddot f(u)| \le M < \infty$ over the whole of $\mathbb{R}$ and $f(u)$ stays away from zero on any compact subset of $\mathbb{R}$; (b) $\{f^2(v) - \dot f(v) F(v)\} \wedge \{f^2(v) + \dot f(v)(1 - F(v))\} > 0$ for all $v \in \mathbb{R}$.

Since we employ smooth B-spline estimation rather than penalized NPML estimation, our residual error Condition M4 is much less restrictive than that in Ma & Kosorok (2005a). Note that Condition M4(b) ensures the concavity of the function $s \mapsto \delta \log F(s) + (1-\delta)\log\{1 - F(s)\}$ for $\delta = 0, 1$. It is easy to verify, after some algebra, that Condition M4 is satisfied in the following two general classes of residual error distribution functions.

F1. $F(s) = \gamma [2\Gamma(\gamma^{-1})]^{-1} \int_{-\infty}^{s} \exp(-|t|^{\gamma})\, dt$ for $\gamma > 1$ is a family of distributions which includes the standard normal distribution after appropriate rescaling ($\gamma = 2$); this corresponds to the probit model, see Kalbfleisch & Prentice (1980).

F2. $F(s) = 1 - [1 + \gamma e^{s}]^{-1/\gamma}$ is a Pareto distribution with parameter $\gamma \in (0, \infty)$ and corresponds to the odds-rate transformation family, see Dabrowska & Doksum (1988a,b). It includes the following two well-known special cases: (a) in the limit $\gamma \to 0$, it yields the extreme value distribution, i.e., $F(s) = 1 - \exp(-e^{s})$, which corresponds to the complementary log-log transformation, see Banerjee et al. (2006); (b) for $\gamma = 1$, it gives the logistic distribution, i.e., $F(s) = e^{s}/(1+e^{s})$, which corresponds to the logit transformation, see Banerjee et al. (2009).

2.2 B-spline Estimation Framework

From now on, we change the signs of $\beta$ and the $h_j$'s for simplicity of exposition. In addition, we recenter $H(v)$ to $H(v) - H(l_v)$ so that $H(l_v) = 0$ for the purpose of identifiability. The additional parameter $H(l_v)$ is absorbed into the vector $\beta$, i.e., the first coordinate of $z$ is set to one. Given a single observation at $x = (v, \delta, z, w)$, the log-likelihood of model (1) is written as

$l(\beta, h_1, \ldots, h_d, H) = \delta \log F\Big[H(v) + \beta^\top z + \sum_{j=1}^{d} h_j(w_j)\Big] + (1-\delta)\log\Big\{1 - F\Big[H(v) + \beta^\top z + \sum_{j=1}^{d} h_j(w_j)\Big]\Big\}.$    (4)

We assume that $\beta \in \mathcal{B}$, a bounded open subset of $\mathbb{R}^{l}$, and that its true value $\beta_0$ is an interior point of $\mathcal{B}$. Before specifying the parameter spaces for $H$ and the $h_j$'s, we first introduce the Hölder ball $\mathcal{H}^{r}_{c}(\mathcal{Y})$, a class of smooth functions widely used in nonparametric estimation, e.g., Stone (1982, 1985). Any $f \in \mathcal{H}^{r}_{c}(\mathcal{Y})$ is $J$ times continuously differentiable on $\mathcal{Y}$, where $J$ is the largest integer smaller than $r$, and its $J$-th derivative is uniformly Hölder continuous with exponent $\kappa = r - J \in (0, 1]$, i.e.,

$\sup_{y_1, y_2 \in \mathcal{Y},\, y_1 \neq y_2} \frac{|f^{(J)}(y_1) - f^{(J)}(y_2)|}{|y_1 - y_2|^{\kappa}} \le c.$

The functions in the Hölder ball can always be approximated by a basis expansion, i.e.,

$f(t) \approx \sum_{k=1}^{K} \gamma_k B_k(t) = \gamma^\top B(t),$    (5)

where $\gamma = (\gamma_1, \ldots, \gamma_K)^\top$ and $B(t) = (B_1(t), \ldots, B_K(t))^\top$. Indeed, if the degree of the B-spline is at least $r - 1$, there exists $\gamma$ such that

$\|f - \gamma^\top B\|_{\infty} \lesssim K^{-r} \quad \text{as } K \to \infty,$    (6)

where $\|\cdot\|_{\infty}$ denotes the supremum norm. Assume the following parameter space Condition P1 for the smooth $h_j$'s.

P1. For $j = 1, \ldots, d$ and some known $c_j$, we assume that the parameter space for $h_j$ is $\mathcal{H}_j$, where

$\mathcal{H}_j = \Big\{ h_j : h_j \in \mathcal{H}^{r_j}_{c_j}[0,1] \text{ with } r_j > 1/2 \text{ and } \int_0^1 h_j(w_j)\, dw_j = 0 \Big\},$

and that the corresponding spline space is

$\mathcal{H}_{jn} = \Big\{ h_j : h_j(w) = \gamma_j^\top B_j(w) \text{ with } \|h_j\|_{\infty} \le c_j \text{ and } \int_0^1 h_j(w_j)\, dw_j = 0 \Big\},$

based on a system of basis functions $B_j = (B_{j1}, \ldots, B_{jK_j})^\top$ of degree $d_j \ge r_j - 1$.

As seen from the previous examples, it is reasonable to assume that $H(\cdot)$ is differentiable and strictly increasing over $[l_v, u_v]$, i.e., $\dot H(v) \ge C_0 > 0$. Considering that $H(l_v) = 0$, we can thus write $H(v) = \int_{l_v}^{v} \exp(g(s))\, ds$, where $g(v) \equiv \log \dot H(v)$ is well defined. Such a reparametrization gets around the strict monotonicity and positivity constraints on $H$, and thus avoids constrained optimization in the computation. The parameter space Condition P2 for $g$ is specified below.

P2. For some known $c_0$, we assume that the parameter space for $g$ is $\mathcal{G}$, where

$\mathcal{G} = \big\{ g : g \in \mathcal{H}^{r_0}_{c_0}[l_v, u_v] \text{ with } r_0 > 1/2 \big\},$

and that the corresponding spline space is

$\mathcal{G}_n = \big\{ g : g(v) = \gamma_0^\top B_0(v) \text{ and } \|g\|_{\infty} \le c_0 \big\},$

based on a system of basis functions $B_0 = (B_{01}, \ldots, B_{0K_0})^\top$ of degree $d_0 \ge r_0 - 1$. Similarly, the corresponding space of transformation functions is $\{H(v) = \int_{l_v}^{v} \exp(g(s))\, ds : g \in \mathcal{G}_n\}$. By some algebra, we can show that any such $H$ belongs to $\mathcal{H}^{r_0+1}_{c}[l_v, u_v]$ for some $c < \infty$.

REMARK 1. Note that in the theoretical proofs and numerical calculations the exact values of the $c_j$'s are not necessary. Instead, only the boundedness condition, equivalently the compactness of the parameter spaces and spline spaces, is needed. Here we assume this boundedness condition, which can be relaxed by invoking chaining arguments, only to simplify our theoretical derivations.
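The reparametrization $H(v) = \int_{l_v}^{v} \exp\{\gamma_0^\top B_0(s)\}\, ds$ can be made concrete with a few lines of code. The sketch below is our own illustration under assumed values (support, degree, knot placement, coefficients), not the authors' implementation: it builds a quadratic B-spline basis with equally spaced knots, as later used in Section 6, and evaluates the induced transformation function by numerical integration. Any coefficient vector $\gamma_0$ yields a valid nondecreasing $H$ with $H(l_v) = 0$, which is why no monotonicity constraint is needed.

```python
# A minimal sketch (assumptions, not the authors' code) of the reparametrization
# H(v) = int_{l_v}^{v} exp(g(s)) ds with g approximated by a quadratic B-spline
# g(s) = gamma' B_0(s).  Monotonicity and positivity of H hold automatically.
import numpy as np
from scipy.interpolate import BSpline

l_v, u_v = 0.2, 1.8          # assumed support of the examination time V
degree = 2                   # quadratic splines, as used later in Section 6
n_basis = 5                  # K_0, the number of basis functions
# Equally spaced interior knots plus repeated boundary knots.
inner = np.linspace(l_v, u_v, n_basis - degree + 1)
knots = np.r_[[l_v] * degree, inner, [u_v] * degree]

def basis_matrix(s):
    """Evaluate the K_0 B-spline basis functions B_0(s) at the points s."""
    s = np.atleast_1d(s)
    eye = np.eye(n_basis)
    return np.column_stack([BSpline(knots, eye[k], degree)(s) for k in range(n_basis)])

def H_of(v, gamma, n_grid=200):
    """H(v) = int_{l_v}^{v} exp(gamma' B_0(s)) ds via the trapezoidal rule."""
    grid = np.linspace(l_v, v, n_grid)
    g_vals = basis_matrix(grid) @ gamma
    return np.trapz(np.exp(g_vals), grid)

gamma = np.array([-0.5, 0.0, 0.3, 0.1, 0.4])   # arbitrary spline coefficients
print(H_of(l_v, gamma), H_of(1.0, gamma), H_of(u_v, gamma))  # 0, then increasing
```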

In this paper, we propose the B-spline approach to estimate $H$ and the $h_j$'s as follows. Let $\mathcal{A} = \mathcal{B} \times \mathcal{G} \times \prod_{j=1}^{d} \mathcal{H}_j$ and $\mathcal{A}_n = \mathcal{B} \times \mathcal{G}_n \times \prod_{j=1}^{d} \mathcal{H}_{jn}$. Denote $\alpha = (\beta, g, h_1, \ldots, h_d)$ and its true value $\alpha_0 = (\beta_0, g_0, h_{01}, \ldots, h_{0d})$, where $g_0(\cdot) = \log \dot H_0(\cdot)$. The log-likelihood (4) for observation $i$ can thus be reparametrized as

$l_i(\alpha) = \delta_i \log F\Big[\beta^\top z_i + \int_{l_v}^{v_i} \exp(g(s))\, ds + \sum_{j=1}^{d} h_j(w_{ij})\Big] + (1-\delta_i)\log\Big\{1 - F\Big[\beta^\top z_i + \int_{l_v}^{v_i} \exp(g(s))\, ds + \sum_{j=1}^{d} h_j(w_{ij})\Big]\Big\}.$    (7)

The corresponding B-spline estimate $\hat\alpha$ is defined as

$\hat\alpha = \arg\max_{\alpha \in \mathcal{A}_n} \sum_{i=1}^{n} l_i(\alpha).$    (8)

We can also write $\hat\alpha = (\hat\beta, \hat g, \hat h_1, \ldots, \hat h_d) = (\hat\beta, \hat\gamma_0^\top B_0, \hat\gamma_1^\top B_1, \ldots, \hat\gamma_d^\top B_d)$. Then the estimate $\hat H(v) = \int_{l_v}^{v} \exp(\hat\gamma_0^\top B_0(s))\, ds$. Some tedious algebra reveals that the Hessian matrix of $l_i(\alpha)$ w.r.t. $(\beta, \gamma_0, \gamma_1, \ldots, \gamma_d)$ is indeed negative semidefinite under Condition M4(b), which guarantees the existence of $\hat\alpha$. See more discussion of computational feasibility in the simulation section. The above estimation procedure also applies to other linear sieves approximating the Hölder ball (or, more generally, the Hölder space), e.g., wavelets.
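The following self-contained sketch illustrates the estimator (8) in the simplest special case of one covariate and one additive function, with the complementary log-log link $F(s) = 1 - \exp(-e^s)$. The data-generating quantities, basis sizes and quadrature are assumptions made only for illustration (they are not the paper's settings), and the optimization is deliberately carried out without any monotonicity constraint thanks to the reparametrization above.

```python
# A minimal end-to-end sketch of the estimator (8), not the authors' code.
# It maximizes the reparametrized log-likelihood (7) with F(s) = 1 - exp(-e^s)
# by unconstrained optimization over (beta, gamma_0, gamma_1).
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, K0, K1, degree = 400, 5, 5, 2
l_v, u_v = 0.2, 1.8

def bspline_basis(x, a, b, K):
    """K clamped B-spline basis functions of the given degree on [a, b]."""
    inner = np.linspace(a, b, K - degree + 1)
    knots = np.r_[[a] * degree, inner, [b] * degree]
    return np.column_stack([BSpline(knots, np.eye(K)[k], degree)(x) for k in range(K)])

# --- simulated current status data (assumed design) ---------------------
Z = rng.uniform(0.5, 1.5, size=n)
W = rng.uniform(0.0, 1.0, size=n)
V = rng.uniform(l_v, u_v, size=n)                    # examination times
beta_true, h_true = 0.3, lambda w: np.sin(2 * np.pi * w) / 2
H_true = lambda v: np.log1p(v - l_v)                 # H(l_v) = 0, increasing
eps = np.log(rng.exponential(size=n))                # P(eps <= s) = 1 - exp(-e^s)
lin = beta_true * Z + h_true(W) + H_true(V)
Delta = (eps <= lin).astype(float)                   # so P(Delta=1|Z,W,V)=F(lin), as in (3)

# --- log-likelihood (7) after the reparametrization H(v) = int exp(g) ---
grid = np.linspace(l_v, u_v, 101)                    # quadrature grid for H
B0_grid, B1_W = bspline_basis(grid, l_v, u_v, K0), bspline_basis(W, 0.0, 1.0, K1)
B1_W = B1_W - B1_W.mean(axis=0)                      # crude centering of h
idx = np.searchsorted(grid, V)

def neg_loglik(par):
    beta, g0, g1 = par[0], par[1:1 + K0], par[1 + K0:]
    expg = np.exp(B0_grid @ g0)
    H_grid = np.r_[0.0, np.cumsum((expg[1:] + expg[:-1]) / 2 * np.diff(grid))]
    theta = beta * Z + B1_W @ g1 + H_grid[idx]        # theta_i in (7)
    F = np.clip(1.0 - np.exp(-np.exp(theta)), 1e-10, 1 - 1e-10)
    return -np.sum(Delta * np.log(F) + (1 - Delta) * np.log(1 - F))

fit = minimize(neg_loglik, np.zeros(1 + K0 + K1), method="BFGS")
print("estimated beta:", fit.x[0])
```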

3 Consistency and Rates of Convergence

In this section, we show that our B-spline estimate is consistent and that the convergence rates of the nonparametric estimates interfere with one another. Define

$d(\alpha, \alpha_0) = \|\beta - \beta_0\| + \|H - H_0\|_2 + \sum_{j=1}^{d} \|h_j - h_{0j}\|_2,$

where $\|\cdot\|_2$ is the $L_2$ norm. We now give the main theorem of this section.

THEOREM 1. Suppose that Conditions M1-M4 and P1-P2 hold. If $K_j \to \infty$ and $K_j/n \to 0$ for $j = 0, 1, \ldots, d$, then we have

$d(\hat\alpha, \alpha_0) = o_P(1).$    (9)

More specifically, we further prove that

$d(\hat\alpha, \alpha_0) = O_P\Big( \max_{0 \le j \le d}\big\{ K_j^{-r_j} \vee \sqrt{K_j/n} \big\} \Big).$    (10)

If we further require that $K_j \asymp n^{1/(2r_j+1)}$ for $j = 0, \ldots, d$, then we have

$d(\hat\alpha, \alpha_0) = O_P(n^{-r/(2r+1)}),$    (11)

where $r = \min_{0 \le j \le d}\{r_j\}$.

Under right censored data, Huang (1999) derived a convergence rate result similar to (10) for the partly linear additive Cox model by assuming equal $r_j$'s. According to Theorem 1, the smooth $\hat H$ achieves the rate of convergence $O_P(n^{-r/(2r+1)})$, which is no slower than the $n^{-1/3}$ rate derived in the penalized estimation context, see Ma & Kosorok (2005a), when we assume that $g_0$ and the $h_{0j}$'s are all at least continuously differentiable, i.e., $r \ge 1$. More importantly, we can further show that $\hat H$ is uniformly consistent, i.e., $\|\hat H - H_0\|_{\infty} = o_P(1)$, by applying Lemma 2 in Chen & Shen (1998), which gives $\|f\|_{\infty} \lesssim \|f\|_{L_2(\mathrm{Leb})}^{2r/(2r+d)}$ for any $f \in \mathcal{H}^{r}_{c}[a,b]^d$, and noting that $\hat H, H_0 \in \mathcal{H}^{r_0+1}_{c}[l_v, u_v]$ for some $c > 0$. The above theorem also holds when we employ a constrained monotone B-spline approximation for the transformation function, i.e., a spline with nondecreasing coefficients $\gamma_{01} \le \gamma_{02} \le \cdots \le \gamma_{0K_0}$. However, such constrained optimization usually requires additional computational effort, see Zhang et al. (2010).

REMARK 2. From the above Theorem 1, we observe an interesting convergence-rate interference phenomenon, i.e., the convergence rate of each B-spline estimate is forced to equal the slowest one. In Ma & Kosorok (2005a), the convergence rate of the penalized estimate $\check h$ is likewise slowed down to $O_P(n^{-1/3})$ by the NPMLE $\check H$, regardless of the smoothness of $h_0$. One possible route to achieving the optimal rate for each nonparametric estimate is to extend the recent mixed rate asymptotic results of Radchenko (2008) to the semiparametric setup.

Since we assume that $r_j > 1/2$, the convergence rate given in (11) is always $o_P(n^{-1/4})$. Such a rate is usually fast enough to guarantee the regular asymptotic behavior of $\hat\beta$, i.e., $\sqrt{n}$-consistency and asymptotic normality. Indeed, in the next section we improve the suboptimal rate for $\hat\beta$ in (11) to the optimal $\sqrt{n}$ rate, and further show that $\hat\beta$ is semiparametric efficient.

4 Weak Convergence of the Parametric Estimate

In this section, we study the weak convergence of the spline estimate $\hat\beta$ in the presence of multiple nonparametric nuisance functions. We first calculate the semiparametric efficient information based on the projection onto a nonorthogonal sumspace.

Let

$Q_\theta(x) = f(\theta)\Big( \frac{\delta}{F(\theta)} - \frac{1-\delta}{1-F(\theta)} \Big),$

where $\theta(z, v, w) = \beta^\top z + H(v) + \sum_{j=1}^{d} h_j(w_j)$. Denote $\theta_0$ as the true value of $\theta$. The score functions (operators) for $\beta$, $g$ and $h_j$ are separately calculated as

$\dot l_\beta(X; \alpha) = Z Q_\theta(X),$    (12)

$\dot l_g[a](X; \alpha) = \Big[\int_{l_v}^{V} \exp(g(s))\, a(s)\, ds\Big] Q_\theta(X),$    (13)

$\dot l_{h_j}[b_j](X; \alpha) = b_j(W_j) Q_\theta(X).$    (14)

We assume that $a \in L_2(H) \equiv \{a : \int_{l_v}^{u_v} a^2(s)\, dH(s) < \infty\}$ and $b_j \in L_2^{0}(w_j) \equiv \{b_j : \int_0^1 b_j(w_j)\, dw_j = 0 \text{ and } \int_0^1 b_j^2(w_j)\, dw_j < \infty\}$ so that all the score functions defined above are square integrable. To calculate the efficient score function $\tilde l_\beta$, we need to find the projection of $\dot l_\beta$ onto the sumspace $\mathcal{A} = \mathcal{A}_g + \mathcal{A}_{h_1} + \cdots + \mathcal{A}_{h_d}$, where $\mathcal{A}_g = \{\dot l_g[a] : a \in L_2(H)\}$ and $\mathcal{A}_{h_j} = \{\dot l_{h_j}[b_j] : b_j \in L_2^{0}(w_j)\}$. For simplicity, we write $\dot l_\beta(X; \alpha_0)$ and $\dot l_\beta(X; \alpha)$ as $\dot l_\beta^{0}$ and $\dot l_\beta$, respectively. The same notation rule applies to $\dot l_g[a](X; \alpha)$ and $\dot l_{h_j}[b_j](X; \alpha)$. We define

$\tilde l_\beta(X; \alpha) = \dot l_\beta(X; \alpha) - \dot l_g[\bar a^*](X; \alpha) - \sum_{j=1}^{d} \dot l_{h_j}[\bar b_j^*](X; \alpha),$

where $\bar a^* = (a_1^*, \ldots, a_l^*)$ and $\bar b_j^* = (b_{j1}^*, \ldots, b_{jl}^*)$, and $(a_k^*, b_{1k}^*, \ldots, b_{dk}^*)$ is the minimizer of

$(a_k, b_{1k}, \ldots, b_{dk}) \mapsto E\Big\{ [\dot l_\beta]_k - \dot l_g[a_k] - \sum_{j=1}^{d} \dot l_{h_j}[b_{jk}] \Big\}^2$

for $k = 1, \ldots, l$. Similarly, denote $\tilde l_\beta(X; \alpha_0)$ and $\tilde l_\beta(X; \alpha)$ as $\tilde l_\beta^{0}$ and $\tilde l_\beta$, respectively. By taking the two-stage projection approach from Sasieni (1992), we have

$\tilde l_\beta(X) = \Big( Z - b^*(W) - \frac{E\{(Z - b^*(W)) Q^2_{\theta_0}(X) \mid V\}}{E\{Q^2_{\theta_0}(X) \mid V\}} \Big) Q_{\theta_0}(X),$    (15)

where $b^*(W) = \sum_{j=1}^{d} b_j^*(W_j)$ satisfies

$E\Big\{ \Big[ Z - b^*(W) - \frac{E\{(Z - b^*(W)) Q^2_{\theta_0} \mid V\}}{E\{Q^2_{\theta_0} \mid V\}} \Big]_k Q^2_{\theta_0}\, b_{jk}(W_j) \Big\} = 0$    (16)

for every $b_{jk} \in L_2^{0}(w_j)$, $j = 1, \ldots, d$ and $k = 1, \ldots, l$. By slightly modifying the proof of Lemma 4 in Ma & Kosorok (2005a), we can show that the above nonorthogonal projection is well defined and that $b^*(\cdot)$ exists, by the alternating projection Theorem A.4.2 in Bickel et al. (1993).
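As a small sanity check on the building block $Q_\theta$, note that for the logistic link the score factor reduces to the familiar binary-regression residual. The snippet below verifies this numerically; it is our illustration, not part of the paper's derivation.

```python
# Small check (an illustration, not from the paper): for the logistic link
# F(s) = e^s / (1 + e^s) we have f = F(1 - F), so the score factor
# Q_theta(x) = f(theta) * (delta/F(theta) - (1-delta)/(1-F(theta)))
# simplifies to the binary-regression residual delta - F(theta).
import numpy as np

F = lambda s: 1.0 / (1.0 + np.exp(-s))
f = lambda s: F(s) * (1.0 - F(s))

def Q(theta, delta):
    return f(theta) * (delta / F(theta) - (1 - delta) / (1 - F(theta)))

theta = np.linspace(-2, 2, 5)
for delta in (0, 1):
    print(np.allclose(Q(theta, delta), delta - F(theta)))  # True, True
```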

Define $\Pi_j$ and $\Pi_a$ as the projection operators

$\Pi_j g \equiv \frac{E[g(V, W) Q^2_{\theta_0} \mid W_j = w_j]}{E[Q^2_{\theta_0} \mid W_j = w_j]}, \qquad \Pi_a g \equiv \frac{E[g(V, W) Q^2_{\theta_0} \mid V = v]}{E[Q^2_{\theta_0} \mid V = v]},$

respectively. Define

$D(v, w) = \frac{E[Z Q^2_{\theta_0} \mid V = v, W = w]}{E[Q^2_{\theta_0} \mid V = v, W = w]}, \qquad S(v, w_j) = \frac{E[Q^2_{\theta_0} \mid V = v, W_j = w_j]}{E[Q^2_{\theta_0} \mid W_j = w_j]},$

$T(w_i, w_j) = \frac{E[Q^2_{\theta_0} \mid W_i = w_i, W_j = w_j]}{E[Q^2_{\theta_0} \mid W_j = w_j]}, \qquad U(w_j, v) = \frac{E[Q^2_{\theta_0} \mid W_j = w_j, V = v]}{E[Q^2_{\theta_0} \mid V = v]}.$

We say a function $f(s, t)$ belongs to a uniform Hölder ball $\mathcal{H}^{r}_{c}(\mathcal{S} \times \mathcal{T})$ in $t$ relative to $s$ if it is $J$ times continuously differentiable w.r.t. $t$, where $J$ is the largest integer smaller than $r$, and its $J$-th partial derivative satisfies, with $\kappa = r - J$,

$\sup_{s \in \mathcal{S}} \sup_{t_1 \neq t_2} \frac{|f_t^{(J)}(s, t_1) - f_t^{(J)}(s, t_2)|}{|t_1 - t_2|^{\kappa}} \le c.$

Define $Sf(v, w_j) = S(v, w_j) f_{V|W_j}(v, w_j)$, $Tf(w_i, w_j) = T(w_i, w_j) f_{W_i|W_j}(w_i, w_j)$ and $Uf(w_j, v) = U(w_j, v) f_{W_j|V}(w_j, v)$, where $f_{V|W_j}$, $f_{W_i|W_j}$ and $f_{W_j|V}$ are the conditional densities of $V$ given $W_j$, of $W_i$ given $W_j$ and of $W_j$ given $V$ w.r.t. Lebesgue measure, respectively. Here, we impose model assumptions implying that both $b^*_{jk}$ and $a^*_k$ belong to some Hölder balls for any $j = 1, \ldots, d$ and $k = 1, \ldots, l$.

M5. We assume that $[\Pi_j D(v, w)]_k \in \mathcal{H}^{r_j}_{c_j}[0,1]$, that $Sf(v, w_j) \in \mathcal{H}^{r_j}_{c_j}([l_v, u_v] \times [0,1])$ in $w_j$ relative to $v$, and that $Tf(w_i, w_j) \in \mathcal{H}^{r_j}_{c_j}([0,1]^2)$ in $w_j$ relative to $w_i$ for some $0 < c_j < \infty$ and $j = 1, \ldots, d$.

M6. We assume that $[\Pi_a D(v, w)]_k \in \mathcal{H}^{r_0+1}_{c_0}[l_v, u_v]$ and that $Uf(w_j, v) \in \mathcal{H}^{r_0+1}_{c_0}([0,1] \times [l_v, u_v])$ in $v$ relative to $w_j$ for some $0 < c_0 < \infty$.

Note that we can simplify $Sf(v, w_j)$ ($Tf(w_i, w_j)$) to $S(v, w_j)$ ($T(w_i, w_j)$) in Condition M5, and simplify $Uf(w_j, v)$ to $U(w_j, v)$ in Condition M6, when we assume that $V$ and $W$ are independent and that the components of $W$ are pairwise independent.

THEOREM 2. Suppose that Conditions M1-M6 and P1-P2 hold. If $K_j \asymp n^{1/(2r_j+1)}$ and $\tilde I$ is invertible, then we have

$\sqrt{n}(\hat\beta - \beta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde I^{-1} \tilde l_\beta(X_i) + o_P(1) \stackrel{d}{\longrightarrow} N(0, \tilde I^{-1}),$    (17)

where $\tilde I$ is the efficient information matrix defined as $E\, \tilde l_\beta \tilde l_\beta^\top$.

5 B-spline Estimate of the Efficient Information

In this section, we give an explicit B-spline estimate of the efficient information as a by-product of establishing the asymptotic normality of $\hat\beta$. Indeed, it is simply the observed information matrix if we treat the semiparametric model as a parametric one after the B-spline approximation, i.e., $\mathcal{H}_j = \mathcal{H}_{jn}$ and $\mathcal{G} = \mathcal{G}_n$. Specifically, we treat $l_i(\alpha)$ defined in (7) as if it were a parametric likelihood $l_i(\beta, \gamma_0, \gamma_1, \ldots, \gamma_d)$. We construct the corresponding information estimator for $(\beta^\top, \gamma_0^\top, \gamma_1^\top, \ldots, \gamma_d^\top)^\top$:

$\hat J = \begin{pmatrix} \hat I_{11} & \hat I_{12} \\ \hat I_{21} & \hat I_{22} \end{pmatrix}_{(l + \sum_{j=0}^{d} K_j) \times (l + \sum_{j=0}^{d} K_j)},$

where $\hat I_{j,k} = \sum_{i=1}^{n} A_j(X_i; \hat\alpha) A_k(X_i; \hat\alpha)^\top / n$ for $j, k = 1, 2$, and

$A_1(X; \alpha) = \dot l_\beta(X; \alpha),$

$A_2(X; \alpha) = \big( \dot l_g[B_{01}], \ldots, \dot l_g[B_{0K_0}], \dot l_{h_1}[B_{11}], \ldots, \dot l_{h_d}[B_{dK_d}] \big)^\top.$

Parametric inference implies that the information estimator for $\beta$ is of the form

$\hat I = \hat I_{11} - \hat I_{12} \hat I_{22}^{-1} \hat I_{21}.$    (18)

Some calculations further reveal that

$\hat I = \mathbb{P}_n \Big[ \dot l_{\hat\beta} - \dot l_{\hat g}[(\hat\gamma_0^*)^\top B_0] - \sum_{j=1}^{d} \dot l_{\hat h_j}[(\hat\gamma_j^*)^\top B_j] \Big]^{\otimes 2},$    (19)

where $[\hat\gamma_j^*]_{K_j \times l} = (\hat\gamma_{j1}^*, \ldots, \hat\gamma_{jl}^*)$ for $j = 0, 1, \ldots, d$ and $(\hat\gamma_{0k}^{*\top}, \ldots, \hat\gamma_{dk}^{*\top})^\top = \hat I_{22}^{-1} \hat I_{21} 1_k$, where $1_k$ represents the $l$-vector with its $k$-th element equal to one and the others zero. We will use (18) as our estimator of $\tilde I$. We need the following additional assumption for Theorem 3.

M7. We assume that

$E \sup_{a_k \in \mathcal{G}_n} \Big[ \int_{l_v}^{V} \{\exp(g(s)) - \exp(g_0(s))\}\, a_k(s)\, ds \Big]^2 \lesssim \|H - H_0\|_2^2.$

THEOREM 3. Under Conditions M1-M7 and P1-P2, we have $\hat I \stackrel{P}{\to} \tilde I$.

Note that the consistency of a similar random-sieve efficient information estimate was also proven for linear regression models with current status data; see Theorem 3 of Shen (2000).
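In matrix form, (18) is simply a Schur complement of the averaged outer product of per-observation scores. The sketch below is our illustration with placeholder inputs, not the authors' code: in practice the score factors $Q_i = Q_{\theta}(X_i; \hat\alpha)$ and the derivative blocks $\partial\theta_i/\partial(\text{parameters})$ would be evaluated at the fitted spline coefficients.

```python
# A sketch (not the authors' code) of the explicit variance estimator (18):
# after the spline approximation, the observed information for
# (beta, gamma_0, ..., gamma_d) is the average outer product of per-observation
# scores, and the information for beta is the Schur complement
# I_hat = I_11 - I_12 I_22^{-1} I_21.
import numpy as np

def beta_information(Q, D_beta, D_nuis):
    """I_hat from (18).  Q: (n,) score factors; D_beta: (n, l) derivatives of
    theta w.r.t. beta; D_nuis: (n, m) derivatives w.r.t. all spline coefficients."""
    n = Q.shape[0]
    A1 = Q[:, None] * D_beta            # per-observation scores for beta
    A2 = Q[:, None] * D_nuis            # per-observation scores for the nuisance
    I11 = A1.T @ A1 / n
    I12 = A1.T @ A2 / n
    I22 = A2.T @ A2 / n
    return I11 - I12 @ np.linalg.solve(I22, I12.T)   # I_11 - I_12 I_22^{-1} I_21

# Shape-only demonstration with synthetic placeholders (l = 2, m = K_0 + K_1 = 10).
rng = np.random.default_rng(3)
n, l, m = 400, 2, 10
Q, D_beta, D_nuis = rng.normal(size=n), rng.normal(size=(n, l)), rng.normal(size=(n, m))
I_hat = beta_information(Q, D_beta, D_nuis)
se = np.sqrt(np.diag(np.linalg.inv(I_hat)) / n)      # standard errors for beta_hat
print(se)                                            # basis of Wald intervals beta_hat +/- 1.96*se
```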

6 Numerical Results

6.1 Simulations

We perform a Monte Carlo study to assess the finite-sample performance of our proposed method. To compare with the penalized NPMLE in Ma & Kosorok (2005a), we adopt the same setting used in their paper. We simulate the current status data from the partly linear additive Cox model, which is a special case of the general transformation model. We choose $H(u) = \log A(e^u)$, where $A(u) = e^{k}\{\exp(u/3) - 1\}$ with $k$ a fixed constant. The errors $\epsilon$ follow an extreme value distribution with $F(s) = 1 - \exp(-e^s)$. The regression coefficients are $\beta_1 = 0.3$ and $\beta_2 = 0.25$. The covariate $Z_1$ is Uniform$[0.5, 1.5]$ and $Z_2$ is Bernoulli with success probability 0.5. We choose $W$ as Uniform$[1, 10]$ and $h(w) = \sin(w/1.2 - 1) - k$. Censoring times follow a standard exponential distribution conditional on being in the interval $[0.2, 1.8]$. The sample sizes are $n = 400$ and $n = 1600$. We simulate 400 realizations for both sample sizes.

In practice, the locations and numbers of knots for $H$ and the $h_j$'s need to be determined. For simplicity, we use equally spaced knots for all functions. Common model selection methods, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), can be employed to select the number of knots. In this paper, we determine $K_0, K_1, \ldots, K_d$ by the AIC, given by

$\mathrm{AIC} = -2 \sum_{i=1}^{n} l_i(\hat\alpha) + 2\Big(l + \sum_{j=0}^{d} K_j\Big).$

In our simulation, we use a quadratic spline to approximate both the function $h$ and the function $g$ in $H$. Then $\mathrm{AIC} = -2\sum_{i=1}^{n} l_i(\hat\alpha) + 2(K_0 + K_1 + 2)$. Based on our experience, it is generally adequate to choose fewer than ten knots to achieve a reasonable approximation, provided that $h$ and $H$ are not overly erratic. Figure 1 shows the AIC scores under different combinations of $K_0$ and $K_1$ for one realization of the simulation with sample size $n = 1600$. It shows that the optimal choices for $K_0$ and $K_1$ are 5 and 5, respectively. The estimated $h$ and $H$ with various values of $K_0$ and $K_1$ are plotted in Figure 2. In the left panel of Figure 2, we fix $K_0 = 5$ and plot the estimated $h$ with $K_1 = 3, 5, 10$. When $K_1$ is small (e.g., $K_1 = 3$), there seems to be a large bias in our estimator. On the other hand, when $K_1$ is large (e.g., $K_1 = 10$), the estimator displays wiggly behavior. In the right panel of Figure 2, we fix $K_1 = 5$ and plot the estimated $H$ with $K_0 = 5, 7, 10$. As the number of knots increases, the estimated $H$ shows a similarly wiggly shape. Hence, the numbers of knots should be chosen with caution. It is worth noting that the values $K_0, K_1, \ldots, K_d$ selected by the AIC criterion can be regarded only as the minimum numbers of knots required. They may not be the optimal choices, since the concept of optimality is not well defined here. See Xue et al. (2004) for similar discussions.
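The knot-selection rule is easy to automate: refit the model over a grid of $(K_0, K_1)$ and keep the combination minimizing the AIC. The sketch below illustrates only the selection step; the maximized log-likelihood values are placeholder numbers for illustration, since in practice each one comes from re-solving (8) with the corresponding basis sizes.

```python
# A small sketch of the AIC-based knot selection described above (not the
# authors' code).  The maximized log-likelihood values below are placeholder
# numbers purely for illustration; in practice each value comes from refitting
# the model of Section 2.2 with the corresponding (K_0, K_1).
l_dim = 2                      # dimension of beta (two regression coefficients)
loglik = {(3, 3): -455.2, (3, 5): -452.9, (5, 3): -451.8,
          (5, 5): -448.1, (7, 5): -447.6, (5, 10): -446.9}

def aic(K0, K1, ll):
    # AIC = -2 * sum_i l_i(alpha_hat) + 2 * (l + K_0 + K_1)
    return -2.0 * ll + 2.0 * (l_dim + K0 + K1)

scores = {Ks: aic(*Ks, ll) for Ks, ll in loglik.items()}
best = min(scores, key=scores.get)
print(scores)
print("selected (K_0, K_1):", best)
```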

Figure 1: AIC scores under different combinations of $K_0$ and $K_1$.

Figure 2: Plot of the estimated $h$ and $H$ with various values of $K_0$ and $K_1$.

Table 1: Monte Carlo results for the partly linear Cox model with current status data, based on 400 replicates. For each of $\beta_1$ and $\beta_2$ and for sample sizes $n = 400$ and $n = 1600$, the reported quantities are Bias, SD, ESD, Coverage and ESD-WB, together with the joint coverage. SD: standard error; ESD: estimated standard error; ESD-WB: estimated standard error from the weighted bootstrap method.

Simulation results show that our B-spline estimation procedure performs quite well in the semiparametric transformation model. The biases and standard errors of the spline estimates of $\beta_1$ and $\beta_2$ are given in Table 1. The table shows that the sample biases of both $\hat\beta_1$ and $\hat\beta_2$ are small. The ratio of the standard errors for the two sample sizes is close to 2, a result consistent with the $\sqrt{n}$ convergence rate of $\hat\beta_1$ and $\hat\beta_2$. The estimated standard errors from (18) (denoted ESD) are also displayed in Table 1 and are very close to the simulation results. Although our proposed method tends to overestimate the standard error slightly, the overestimation lessens as the sample size increases. We also compare our results with the weighted bootstrap method in Ma & Kosorok (2005b), with weights drawn from the exponential distribution with mean one. The resulting estimated standard errors are again similar to those obtained using our explicit B-spline estimate. The 95% confidence intervals constructed from (18) generally have coverage close to the nominal value. Histograms of $\hat\beta_1$ and $\hat\beta_2$ are shown in Figure 3. It is clear that the marginal distributions of $\hat\beta_1$ and $\hat\beta_2$ are approximately Gaussian. The left panel of Figure 4 displays the spline estimate of $h(w)$, and the monotone estimate $\hat H$ is given in the right panel of Figure 4. The dashed line is the true function, the solid line is the average estimate over 400 realizations, and the dash-dotted lines form the 95% pointwise confidence band for $h(w)$ or $H(v)$ when the true model is known, obtained by taking the 2.5 and 97.5 percentiles of these 400 estimates at each $w$ or $v$.

As suggested by one of the referees, we also perform a Monte Carlo study including two nonparametric functions in the model.

Figure 3: Histograms of $\hat\beta_1$ and $\hat\beta_2$ based on samples of size 1600 and 400 replicates.

Figure 4: Left: estimate and pointwise confidence interval for $h$. Right: estimate and pointwise confidence interval for $H$. The solid line is the average estimate over 400 realizations with sample size $n = 1600$, and the dashed line is the true function. The dash-dotted lines are the 95% pointwise confidence intervals.

Figure 5: AIC scores under different combinations of $K_0$, $K_1$ and $K_2$.

Under the same setting as in the last study, the two nonparametric functions are $h_1(w_1) = \sin(w_1/1.2 - 1) - k$, with $w_1$ following a uniform distribution on $[1, 10]$, and $h_2(w_2) = 3w_2^2 - 1$, with $w_2$ following a uniform distribution on $[-1, 1]$. Figure 5 shows the AIC scores under different combinations of $K_0$, $K_1$ and $K_2$ for one realization of the simulation with sample size $n = 1600$. For illustration, we only plot two choices of $K_0$, where the top surface is for $K_0 = 10$ and the bottom surface is for $K_0 = 4$. The optimal choice by the AIC criterion is $(K_0, K_1, K_2) = (4, 5, 3)$. The spline estimates of $h_1$, $h_2$ and $H$ under the optimal numbers of knots are displayed in Figure 6, where the dotted lines are the true functions.

Compared with the penalized method in Ma & Kosorok (2005a), our spline-based method has four obvious advantages. First, the computation of our spline estimate $\hat H$ is much less expensive than the cumulative sum diagram approach used in Ma & Kosorok (2005a). This is because the number of B-spline basis functions, i.e., $K_0$, is often taken much smaller than the sample size $n$, so the dimension of the estimation problem is greatly reduced. Second, our estimate of the transformation function $H$ is smooth with a higher convergence rate, and we obtain a narrower confidence interval for $H$, shown in the right panel of Figure 4. Third, we can obtain an explicit consistent estimate $\hat I$, whereas the block jackknife approach proposed in Ma & Kosorok (2005a) is not theoretically justified. Last, we do not require constrained optimization in our implementation.

Figure 6: Estimates of $h_1$, $h_2$ and $H$. The dotted lines are the true functions.

Table 2: Estimates and their corresponding estimated standard errors for the parametric part for the calcification data. For each of the extreme value and logistic error distributions, the table reports $\hat\beta_1$, ESD($\hat\beta_1$), $\hat\beta_2$ and ESD($\hat\beta_2$). ESD: estimated standard error.

6.2 Application: Calcification Data

We illustrate the proposed method on a dataset from a calcification study. Yu et al. (2001) investigated the calcification of intraocular lenses, an infrequently reported complication of cataract treatment. The objective of the study is to understand the effect of some clinical variables on the time to calcification of the lenses after implantation. The patients were examined by an ophthalmologist to determine the status of calcification at a random time ranging from zero to thirty-six months after implantation of the intraocular lenses. The severity of calcification was graded into five categories ranging from zero to four. In our analysis, we simply treat those with severity > 1 as calcified and those with severity <= 1 as not calcified. This dataset can be treated as current status data because only the examination time and the calcification status at examination are available.

Figure 7: The spline estimates of $h(w)$ and $H(v)$ under two different assumptions on the error distribution: extreme value distribution (solid) and logistic distribution (small dashes).

The covariates of interest include $Z_1$, incision length; $Z_2$, gender (0 for female and 1 for male); and $W$, age at implantation divided by 10. The original dataset has 379 records. We remove the one record with a missing measurement, resulting in a sample size of $n = 378$. This dataset has been studied by Xue et al. (2004), Lam & Xue (2005) and Ma (2009). Xue et al. (2004) and Lam & Xue (2005) modeled the event time by a log-transformation, so a straightforward estimate of the hazard function is not available. Ma (2009) used a cure model to fit the data, assuming a generalized linear model for the cure probability; for subjects not cured, linear and partly linear Cox proportional hazards models are used to model the survival risk.

We fit this dataset using the semiparametric additive transformation model. We assume the error distribution $F$ to be one of two distributions: the extreme value distribution and the logistic distribution. We approximate $h$ and $\log \dot H$ by quadratic splines. The optimal numbers of knots for $h$ and $\log \dot H$ are 6 and 5, respectively. The estimates and their corresponding estimated standard errors for the parametric part are summarized in Table 2. The estimates of $h(w)$ based on the two error distributions are displayed in the left panel of Figure 7, and the estimates of $H(v)$ are plotted in the right panel of Figure 7. The analysis shows very similar results for the two error distributions. From Table 2, both incision length and gender are insignificant at the 5% level of significance. From the left panel of Figure 7, $h(w)$ increases steadily from age 50, achieves a peak at age 60, and decreases gradually thereafter, which means that patients aged around 60 tend to enjoy a longer time to calcification. The estimated transformation function $\hat H$ in the right panel of Figure 7 displays a nonlinear behavior, which indicates that the transformation is necessary. We can incorporate an unknown scale parameter into the residual error distribution $F(\cdot)$ to further improve the above analysis.

Our general B-spline estimation framework can also handle this type of transformation model easily.

Acknowledgement

The first author's research is supported by the National Science Foundation under grant DMS. The second author's research is supported by the National Science Foundation under grants CMMI and DMS. The authors would like to thank Professor Alexis K. F. Yu for providing the calcification data and Professor Donglin Zeng for helpful discussions. The authors also would like to thank the editor, the associate editor, and two reviewers for their helpful comments and suggestions, which led to a much improved presentation.

Appendix

Some useful Lemmas

We denote the $\epsilon$-covering number ($\epsilon$-bracketing number) of a class $\mathcal{A}$ with respect to a metric $d$ by $N(\epsilon, \mathcal{A}, d)$ ($N_B(\epsilon, \mathcal{A}, d)$). The corresponding $\epsilon$-entropy ($\epsilon$-bracketing entropy) is defined as $H(\epsilon, \mathcal{A}, d) = \log N(\epsilon, \mathcal{A}, d)$ ($H_B(\epsilon, \mathcal{A}, d) = \log N_B(\epsilon, \mathcal{A}, d)$). Define $\mathcal{G}_n(\delta_0; \|\cdot\|_\infty) = \{g : g(v) = \gamma_0^\top B_0(v) \text{ satisfying } \|g\|_\infty \le \delta_0\}$ and $\mathcal{H}_{jn}(\delta_j; \|\cdot\|_\infty) = \{h_j : h_j(w_j) = \gamma_j^\top B_j(w_j) \text{ satisfying } \|h_j\|_\infty \le \delta_j \text{ and } \int_0^1 h_j(w_j)\, dw_j = 0\}$. Obviously, $\mathcal{G}_n(c_0; \|\cdot\|_\infty) = \mathcal{G}_n$ and $\mathcal{H}_{jn}(c_j; \|\cdot\|_\infty) = \mathcal{H}_{jn}$. Lemma 1 follows from the B-spline approximation property (6). Lemma 2 is directly implied by Lemma 2.5 in Van de Geer (2000). Lemma 4 is adapted from Proposition 1 in Cheng & Huang (2010).

LEMMA 1. There exist $g_n \in \mathcal{G}_n$ and $h_{jn} \in \mathcal{H}_{jn}$ such that

$\|g_n - g_0\|_\infty \lesssim K_0^{-r_0},$    (A.1)

$\|H_n - H_0\|_\infty = O(K_0^{-r_0}),$    (A.2)

$\|h_{jn} - h_{0j}\|_\infty \lesssim K_j^{-r_j},$    (A.3)

$\sum_{j=1}^{d} \|h_{jn} - h_{0j}\|_\infty = O\big(\max_{j=0,\ldots,d}\{K_j^{-r_j}\}\big),$    (A.4)

where $H_n(v) = \int_{l_v}^{v} \exp(g_n(s))\, ds$.

LEMMA 2. For $1 \le j \le d$,

$H(\epsilon, \mathcal{G}_n(\delta_0; \|\cdot\|_\infty), \|\cdot\|_\infty) \lesssim K_0 \log(1 + 4\delta_0/\epsilon),$    (A.5)

$H(\epsilon, \mathcal{H}_{jn}(\delta_j; \|\cdot\|_\infty), \|\cdot\|_\infty) \lesssim K_j \log(1 + 4\delta_j/\epsilon).$    (A.6)

20 LEMMA 3. Let h = (h 1,..., h d ). Define K = {ζ(β, h, H) : β B, h d H jn, g G n, where the form of ζ is defined in (A.12). We have sup G n ζ = O P ( max ζ K j=,1,...,d {K1/2 j ). (A.7) Proof: Define l (β, h, H) = δf (β z + d h j(w j ) + H(v)) + (1 δ)[1 F (β z + d h j(w j ) + H(v))]. The construction of l ( ) implies that l (β, h n, H n ) l (β, h, H ) = O( max j=,1,...,d {K r j j ) (A.8) based on (A.2), (A.4) and M4. Thus, l (β, h n, H n ) is bounded away from zero for sufficiently large n. For any β 1, β 2 B, h 1, h 2 d H jn and g 1, g 2 G n, we have ζ(β 1, h 1, H 1 ) ζ(β 2, h 2, H 2 ) < l (β 1, h 1, H 1 ) l (β 2, h 2, H 2 ) < β 1 β 2 + h 1j h 2j + g 1 g 2. (A.9) The first and second inequalities in the above follow from the fact that l (β, h n, H n ) is strictly positive for sufficiently large n by (A.8), and Condition M4(a), respectively. As shown in (A.9), the functions in the class K are Lipschitz continuous in (β, h, g). Therefore, by combining Lemma 2 and Theorem in (Van de Geer & Wellner, 1996), we obtain that H B (ɛ, K, L 2 (P )) < max j d {K j log(1 + M/ɛ), where M = max j d {4c j. In the end, we apply Lemma in (Van de Geer & Wellner, 1996) to this uniformly bounded class of functions K to obtain (A.7). LEMMA 4. Suppose the following Conditions (B1)-(B3) hold. B1. P n l β = o P (n 1/2 ), P n lĝ[ā ] = o P (n 1/2 ) and P n lĥj [ b j ] = o P (n 1/2 ); B2. sup {α:d(α,α ) C 1 n r/(2r+1) G n ( l β (X; α) l β (X; α )) = o P (1); B3. P ( l β (X; α) l β (X; α )) = Ĩ(β β ) + o( β β ) + o(n 1/2 ) for α satisfying d(α, α ) C 1 n r/(2r+1). 2

21 If α is consistent and Ĩ is invertible, then we have n( β β ) = 1 n n i=1 Ĩ 1 l β (X i ) + o P (1) d N(, Ĩ 1 ). LEMMA 5. (i) If a(s, t) = a(s 1, s 2, t) H r c(s 1 S 2 T ) in t relative to s 1 and s 2, then S 1 a(s 1, s 2, t) ds 1 H r c (S 2 T ) in t relative to s 2. (ii) If a(s, t), b(s, t) H r c(s T ) in t relative to s, then c(s, t) a(s, t)b(s, t) H r c (S T ) in t relative to s. (iii) If a(s, t) H r c(s T ) in t relative to s and f( ) C β, then f(a(s, t)) H r c (S T ) in t relative to s. Proof: Let r be the largest integer smaller than r. Denote the m-th derivative of a(s, t) w.r.t. t as Dt m a(s, t) for m =, 1,..., r. (i) Note that Dt m a(s 1, s 2, t) is bounded for m r, by the dominated convergence theorem, we can take derivative inside the integral to obtain ( ) Dt m a(s 1, s 2, t) ds 1 = Dt m a(s 1, s 2, t) ds 1, S 1 S 1 which implies that Dt m ( S 1 a(s 1, s 2, t) ds 1 ) is bounded for m r. Using this and the fact that r ) r ( ) D t ( S 1 a(s 1, s 2, t 2 ) ds 1 D t S 1 a(s 1, s 2, t 1 ) ds 1 t 2 t 1 r r S 1 sup s 1,s 2 sup t 1 t 2 D r t a(s 1, s 2, t 2 ) D mα t a(s 1, s 2, t 1 ) t 2 t 1 r r ds 1 c <, for all s 2 and t 1 t 2, we conclude that S 1 a(s 1, s 2, t) ds 1 H r c (S 2 T ) in t relative to s 2 for some c <. (ii) The result is true because Dt m c = DtaD i j t b i+j=m is bounded for m r. Also we note that for i < r, It can then be easily verified that D i ta(s, t 2 ) D i ta(s, t 1 ) t 2 t 1 r r = sup s D sup t 1 t 2 r t t2 a(s, t) dt. t 2 t 1 r r t 1 Dt i+1 c(s, t 2 ) D r t c(s, t 1 ) t 2 t 1 r r <. 21

22 (iii) When < α 1, the result follows from the observation that f(a(s, t 2 )) f(a(s, t 1 )) = f(a(s, t 2)) f(a(s, t 1 )) a(s, t 2) a(s, t 1 ). t 2 t 1 β a(s, t 2 ) a(s, t 1 ) t 2 t 1 β Using the chain rule, the above observation and part (ii) of the lemma, the desired result can be obtained by induction for general β. Denote S k (X; α, w k ) = [ l β (X; α)] k l g [a k ](X; α) l hj [b jk ](X; α), where w k = (a k, b 1k,..., b dk ). Let W n = G n d H jn and N = {α A : d(α, α ) = o(1). LEMMA 6. Under Conditions M1-M7 & P1-P2, we have for all α N and k = 1,..., l. E sup w k W n S k (X; α, w k ) S k (X; α, w k ) 2 < d 2 (α, α ) Proof: In view of (12)-(14), we can bound the left hand side of (A.1) by { [ V ] 2 < Q θ Q θ E sup (exp(g(s)) exp(g (s)))a k (s)ds (Q θ Q θ) 2 a k G n l v [ V ] 2 +E sup a k G n l v [ V exp(g (s))a k (s)ds(q θ Q θ ) ] 2 +E sup a k G n (exp(g(s)) exp(g (s)))a k (s)dsq θ + E l v [ sup b 2 jk (Q θ Q θ ) 2] b jk H jn (A.1) after some algebra. The compactness of G n and H jn imply that the third and fifth term in the above are both of the order Q θ Q θ 2 2. For the second term, we can further bound it by [ V V ] E sup a 2 k(s)ds [exp(g(s)) exp(g (s))] 2 ds(q θ Q θ ) 2. a k G n l V l V Considering the compactness of G and G n, we know the second term is also of the order Q θ Q θ 2 2. Assumption M4(a) together with Cauchy-Schwartz inequality implies that Q θ Q θ 2 2 < β β 2 + H H d (h j h j ) 2 2. Since we assume that the density for W is bounded away from zero and infinity, we have that d (h j h j ) 2 2 < d h j h j 2 2 considering the identifiability condition 1 h j(w j )dw j =. Assumption M7 implies that the fourth term is of the order H H 2 2. Considering the form of d(α, α ), we conclude the whole proof. 22

23 Proof of Theorem 1 We show the estimation consistency (9) by first establishing { 2 P ( β β ) Z + (ĥj h jn )(W j ) + Ĥ(V ) H n(v ) = o P (1). (A.11) Combining (A.11) with the identifiability Condition M3, we directly obtain ( β β ) = o P (1) { d 2 which, in turn, implies that P (ĥj h jn )(W j ) + Ĥ(V ) H n(v ) = op (1). Considering the assumption M2(b) and that 1 h j(w j )dw j = for h j H j H jn, we can further show d ĥj h jn 2 + Ĥ H n 2 = o P (1). The spline approximation result (A.2) and (A.3) conclude the proof of (9). In the below, we will show (A.11) to complete the proof of (9). Recall that h = (h 1,..., h d ). Denote h, h n and ĥ as the corresponding true value, B-spline approximation and sieve estimate, respectively. Recall that l (β, h n, H n ) is bounded away from zero for sufficiently large n as implied by (A.8). Then, by the definition of α, we have P n log{l ( β, ĥ, Ĥ)/l (β, h n, H n ), which implies that, by the inequality that α log(x) log(1 + α(x 1)) for any x > and α (, 1), [ { ] l ( β, ĥ, Ĥ) P n log 1 + α l (β, h n, H n ) 1 P n ζ( β, ĥ, Ĥ). (A.12) Lemma 3 implies that (P n P )ζ( β, ĥ, Ĥ) = o P (1) since K j /n = o(1) for any j =, 1,..., d. Thus, P ζ( β, ĥ, Ĥ) o P (1) based on (A.12). Let U n (X) = l ( β, ĥ, Ĥ)/l (β, h n, H n ). Based on (A.8) we know P U n (X) = 1 + o P (1), which further implies P ζ( β, ĥ, Ĥ) o P (1) by the concavity of s log(s). This in turn implies that P ζ( β, ĥ, Ĥ) = o P (1). This forces P (β Z + d h jn(w j ) + H n (V )) ( β Z + d ĥj(w j ) + Ĥ(V )) = o P (1) by the strict concavity of s log s, Conditions M4(a), P1 and P2. It is easy to verify that ERn 2 = o P (1) if E R n = o P (1). Thus, we have shown (A.11) in the end. As for the convergence rate results (1) & (11), we first apply Theorem in Van de Geer & Wellner (1996) to establish θ θ 2 = O P (δ 1n δ 2n ), (A.13) where θ is the plug-in sieve estimate of θ and δ 1n = max j d { K j / n and δ 2n = max j d {K r j j. (A.14) 23

24 Following similar arguments in proving the consistency, we know that (A.13) implies (1) and (11) by choosing K j n 1/(2r j+1). In the below, we show (A.13) by verifying the conditions of Theorem in Van de Geer & Wellner (1996). We first need to show that P [l(α ) l(α)] > θ θ 2 2 (A.15) for every α in the neighborhood of α. Define q(δ, t) = δ log(f (t)) + (1 δ) log(1 F (t)) and q(δ, t) as its second derivative w.r.t. t. Since α maximizes α P l(α), we have [ q(δ, P [l(α ) l(α)] = P θ) ] (θ θ ) 2, 2 where θ is on the line segment between θ and θ. The compactness of the parameter spaces imply that P [l(α ) l(α)] θ θ 2 2. This completes the proof of (A.15). We next calculate the order of E sup θ θ 2 δ G n (l(α) l(α )) as a function of δ, denoted as φ n (δ), by the use of Lemma of Van de Geer & Wellner (1996). Let F 1n (δ) = {l(α) l(α ) : g G n, h j H jn, θ θ 2 δ. Using the same argument as that in the proof of Lemma 3, we obtain that H B (ɛ, F 1n (δ), L 2 (P )) is bounded by C max j d {K j log(1 + δ/ɛ). This leads to J B (δ, F 1n (δ), L 2 (P )) = δ 1 + HB (ɛ, F 1n (δ), L 2 (P ))dɛ C max j d { K j δ. The compactness of G n and H jn implies the uniform boundedness of any f F 1n (δ). Thus, Lemma of Van de Geer & Wellner (1996) gives φ n (δ) = max j d { K j δ + max j d {K j/ n. By solving δ1n 2 φ n (δ 1n ) n, we get the form of δ 1n in (A.14). We next show that P n l( α) P n l(α ) O P (δ2n). 2 The definition of α implies that P n [l( α) l(α )] A n + B n, where A n = (P n P ){l(β, H n, h n ) l(α ) and B n = P {l(β, H n, h n ) l(α ). A straightforward Taylor expansion gives { A n = (P n P ) l 2 (β, H n, h n )(H n H ) + l 2+j (β, H n, h n )(h jn h j ), where l t is the Fréchet derivative of l(β, H n, h n ) w.r.t. the t-th argument. Considering (A.2), (A.3) and the fact that < ɛ 1 q(δ, t) ɛ 2 < for t in some compacta of R 1, we have { l2 (β, H n, h n )(H n H ) + d l 2+j (β, H n, h 2 n )(h jn h j ) P max j d {K r (A.16) j j n ɛ 24

25 for any ɛ >. Let F 2n = {l(β, H, h) l(α ) : g G n, h j H jn, g g C K r, h j h j C j K r j j. Similar analysis in Lemma 3 show that the bracketing entropy integral (in terms of L 2 (P )) for F 2n is finite, thus yields that F 2n is P-Donsker. Combining this P-Donsker result and (A.16), we use Corollary of Van de Geer & Wellner (1996) to conclude that nan /(max j d {K r j j n ɛ ) = o P (1). By choosing some proper < ɛ < 1/2 satisfying n ɛ 1/2 = max j d {K r j j O(max j d {K 2r j j thus concludes the whole proof., we have A n = o P (max j d {K 2r j j ). We can also show B n ) by similar analysis of (A.15). This gives the form of δ 2n in (A.14), and Proof of Theorem 2 We apply Lemma 4 to prove this theorem by checking their Conditions B1 B3. To facilitate the understanding, we first sketch the verification of Condition B1 and then provide the details. To verify B1, we first know that P n l β = since β maximizes l(β, ĝ, ĥ1,..., ĥd), β is consistent and β is an interior point of B. We next show that b jk (a k ) belongs to Hr j c j [, 1] (H r c [l v, u v ]) for some < c j < and j =, 1,..., d such that there exists a b jkn H jn (a kn G n) satisfying b jk b jkn = O(n r j/(2r j +1) ) (A.17) a kn a k = O(n r /(2r +1) ) (A.18) by (6) and the assumption that K j n 1/(2r j+1). Since P n lĥj [b jkn ] = and P n lĝ[a kn ] = for any b jkn H jn and a kn G n, it remains to show P n { lĥj [b jkn ] lĥj [b jk ] = o P (n 1/2 ), (A.19) P n { lĝ[a kn ] lĝ[a k ] = o P (n 1/2 ) (A.2) for verifying Condition B1. Now we show b jk Hr j c j [, 1] and (A.19). Following the analysis in Page 2282 of Ma & Kosorok (25a), we can write, with ā I (v) = v l v exp(g (s))ā (s)ds, b j = Π j D(v, w) Π j ā I (v) Π j b i i j = Π j D(v, w) uv l v ā I (v)sf(v, w j)dv i j 1 b i (w i)t f(w i, w j )dw i. According to Lemma 5 and dominated convergence theorem, we know that b jk (w j) H r j c j [, 1] under Condition M5, b jk L 2(w j ) and a k L 2(H) (thus a Ik is uniformly bounded) for some 25

26 < c j <. As for (A.19), we first decompose its left hand side as I 1n + I 2n, where { I 1n = P lĥj [b jkn b jk ] l hj [b jkn b jk ], { I 2n = (P n P ) lĥj [b jkn b jk ]. By Cauchy-Schwartz Inequality, we have I < 1n b kjn b kj θ θ 2 based on Conditions M4(a), P1 & P2. Thus, (A.13) and (A.17) imply that I 1n = O P (n 2r/(2r+1) ) = o P (n 1/2 ) since r > 1/2. To show I 2n = o P (n 1/2 ), we need to make use of Lemma in Van de Geer & Wellner (1996). We first construct the following class of functions: I n = { f θ,bjkn (x) = l hj [b jkn b jk ](x; α) : α A n(n r 2r+1 ) and bjkn H jn ( n r j 2r j +1 ), where A n (δ) {α A n : d(α, α ) C 1 δ and H jn(δ) {b jkn H jn : b jkn b jk C 2 δ for some < C 1, C 2 <. Let Θ n (δ) = {β z + H(v) + d h j(w j ) : α A n (δ). It is easy to verify that, for every x, f θ1,b jkn1 (x) f θ2,b jkn2 (x) < θ 1 θ 2 + b jkn1 b jkn2, (A.21) where θ j Θ n (n r/(2r+1) ) for j = 1, 2. Let θ 1,..., θ N(ɛ,Θn(n r/(2r+1) ), ) and b 1 jkn,..., b N(ɛ,H jn (n r j /(2r j +1) ), ) jkn be the ɛ-cover for Θ n (n r/(2r+1) ) and H jn(n r j/(2r j +1) ), respectively. Thus, we can construct the bracket [f θ i,b l jkn 2Cɛ, f θ i,b l jkn + 2Cɛ] covering I n. The bracket size is 4Cɛ. Hence, we obtain < H B (ɛ, I n, L 2 (P X )) H(ɛ/(4C), Θ n (n r 2r+1 ), ) + H(ɛ/(4C), H jn(n 2r j +1 ), ) max {K j log(1 + n r/(2r+1) /ɛ) j d r j based on Lemma 2. The corresponding δ-bracketing entropy integral is calculated as J B (δ, I n, L 2 (P X )) δ 1 + HB (ɛ, I n, L 2 (P X )) < max { K j n r 4r+2 δ 1/2. (A.22) j d Now, it is ready to apply Lemma in Van de Geer & Wellner (1996) to show E G n In = o(1) implying I 2n = o P (n 1/2 ). Note that f 2 < b jkn bjk 2 and f b jkn bjk for any f I n, and thus δ and M in Lemma of Van de Geer & Wellner (1996) are both 26

27 chosen as K r j j, i.e., n r j/(2r j +1). Then, by Lemma of Van de Geer & Wellner (1996) and (A.22), we have that E G n In = O ( ( ) n r 1 4r+2 + r j 4r j +2 n 4r 1 4r+2 ) = o(1). This completes the proof of (A.19). We next show (A.2) by similar arguments. Similarly, we have ā I (v) = Π ad(v, w) 1 b j (w j)uf(w j, v)dw j. Recall that ā I (v) = v l v exp(g (s))ā (s)ds. Under Condition M6 and the assumption that g H r c [l v, u v ], we can show that a Ik Hr +1 c [l v, u v ], which implies that a k Hr c [l v, u v ] for some < c <, based on Lemma 5. We next show that I 1n = o P (n 1/2 ) and I 2n = o P (n 1/2 ), where { I 1n = P lĝ[a kn a k ] l g [a kn a k ], { I 2n = (P n P ) lĝ[a kn a k ]. Similarly, by Cauchy-Schwartz Inequality, we can show that [ v ] I 1n < a kn a k θ θ 2 + P (exp(ĝ) exp(g ))(s)(a kn a k )(s)ds l v ) a kn a k ( θ θ 2 + Ĥ H 2 < < O P (n r/(2r+1) ) = o P (n 1/2 ) by choosing K j n 1/(2rj+1). Following similar arguments in analyzing I 2n, we can show that I 2n = o P (n 1/2 ). Thus, we have verified Condition B1 in Lemma 4. We again apply Lemma of Van de Geer & Wellner (1996) to verify Assumption B2. The details are skipped due to the similarity of the previous analysis. It remains to verify Assumption B3. This can be easily established using the Taylor expansion in Banach space. However, we first need to reparameterize the efficient score function l β (X; α) as [ ] V l β (X; α ) = ZQ θ (X) ā (s)dh(s) + b j (W j) Q θ (X) l v l β (X; α ) l η [ c ](X; α ), 27

28 where α = (β, H, h 1,..., h d ), η = (H, h 1,..., h d ) and c = (ā, b 1,..., b d ). We first derive two useful equalities (A.26)-(A.27). Let E α be the expectation corresponding to the reparametrized likelihood under the parameter α. Since E α lβ (X; α ) =, we have t t=e α t lβ (X; α t ) =, (A.23) where αt = α + tɛ. Define l β,β and l β,η [c] as the first derivative of l β w.r.t. β and η (along the direction c), respectively. By setting ɛ = (ɛ β,,..., ) and ɛ = (, e) = (, H, b 1,..., b d ), respectively, some calculations reveal that E { lβ,β (X; α)ɛ β + E { lβ (X; α) l β(x; α)ɛ β =, (A.24) E { lβ,η [e](x; α ) + E { lβ (X; α ) l η[e](x; α ) = (A.25) based on (A.23). By considering the orthogonal property of l β and the above reparametrization, we obtain the following two useful facts: Ĩ = E { lβ,β (X; α), (A.26) E { lβ,η [e](x; α) = (A.27) based on (A.24) and (A.25). Define l β,α,α [h 1, h 2 ](X; α ) as the second order Fréchet derivative of l β w.r.t. α along the direction [h 1, h 2 ] at the point α. The same notation rule applies to l β,α,α [h 1, h 2 ](X; α ) and l η,α,α [h 1, h 2, h 3 ](X; α ). Now we are ready to express the Taylor expansion as follows. E[ l β (X; α) l β (X; α )] = E[ l β (X; α ) l β (X; α)] = E { lβ,β (X; α) (β β ) + E { lβ,η [η η ](X; α) + 1 { lβ,α 2 E,α [ α, α ](X; α ) = Ĩ(β β ) + 1 { 2 E lβ,α,α [ α, α ](X; α ) l η,α,α [ c, α, α ](X; α ), where α = α α and α lies between α and α. The last equation in the above follows from (A.26) & (A.27). Now we only need to show that the second term in the last equation is of the order o( β β ) + o(n 1/2 ). 28

CURE MODEL WITH CURRENT STATUS DATA

CURE MODEL WITH CURRENT STATUS DATA Statistica Sinica 19 (2009), 233-249 CURE MODEL WITH CURRENT STATUS DATA Shuangge Ma Yale University Abstract: Current status data arise when only random censoring time and event status at censoring are

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Likelihood Based Inference for Monotone Response Models

Likelihood Based Inference for Monotone Response Models Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 5, 25 Abstract The behavior of maximum likelihood estimates (MLE s) the likelihood ratio statistic

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Arnab Maity 1 Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A. amaity@stat.tamu.edu

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

arxiv:submit/ [math.st] 6 May 2011

arxiv:submit/ [math.st] 6 May 2011 A Continuous Mapping Theorem for the Smallest Argmax Functional arxiv:submit/0243372 [math.st] 6 May 2011 Emilio Seijo and Bodhisattva Sen Columbia University Abstract This paper introduces a version of

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Verifying Regularity Conditions for Logit-Normal GLMM

Verifying Regularity Conditions for Logit-Normal GLMM Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

UNIVERSITY OF CALIFORNIA, SAN DIEGO

UNIVERSITY OF CALIFORNIA, SAN DIEGO UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Concentration behavior of the penalized least squares estimator

Concentration behavior of the penalized least squares estimator Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar

More information

A Shape Constrained Estimator of Bidding Function of First-Price Sealed-Bid Auctions

A Shape Constrained Estimator of Bidding Function of First-Price Sealed-Bid Auctions A Shape Constrained Estimator of Bidding Function of First-Price Sealed-Bid Auctions Yu Yvette Zhang Abstract This paper is concerned with economic analysis of first-price sealed-bid auctions with risk

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Bayesian Aggregation for Extraordinarily Large Dataset

Bayesian Aggregation for Extraordinarily Large Dataset Bayesian Aggregation for Extraordinarily Large Dataset Guang Cheng 1 Department of Statistics Purdue University www.science.purdue.edu/bigdata Department Seminar Statistics@LSE May 19, 2017 1 A Joint Work

More information

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes Introduction Method Theoretical Results Simulation Studies Application Conclusions Introduction Introduction For survival data,

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased

More information

5.1 Approximation of convex functions

5.1 Approximation of convex functions Chapter 5 Convexity Very rough draft 5.1 Approximation of convex functions Convexity::approximation bound Expand Convexity simplifies arguments from Chapter 3. Reasons: local minima are global minima and

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data

asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data asymptotic normality of nonparametric M-estimators with applications to hypothesis testing for panel count data Xingqiu Zhao and Ying Zhang The Hong Kong Polytechnic University and Indiana University Abstract:

More information

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0, Accelerated failure time model: log T = β T Z + ɛ β estimation: solve where S n ( β) = n i=1 { Zi Z(u; β) } dn i (ue βzi ) = 0, Z(u; β) = j Z j Y j (ue βz j) j Y j (ue βz j) How do we show the asymptotics

More information

Effective Dimension and Generalization of Kernel Learning

Effective Dimension and Generalization of Kernel Learning Effective Dimension and Generalization of Kernel Learning Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, Y 10598 tzhang@watson.ibm.com Abstract We investigate the generalization performance

More information

Asymptotic statistics using the Functional Delta Method

Asymptotic statistics using the Functional Delta Method Quantiles, Order Statistics and L-Statsitics TU Kaiserslautern 15. Februar 2015 Motivation Functional The delta method introduced in chapter 3 is an useful technique to turn the weak convergence of random

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models

A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models A Least-Squares Approach to Consistent Information Estimation in Semiparametric Models Jian Huang Department of Statistics University of Iowa Ying Zhang and Lei Hua Department of Biostatistics University

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series

The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series The Restricted Likelihood Ratio Test at the Boundary in Autoregressive Series Willa W. Chen Rohit S. Deo July 6, 009 Abstract. The restricted likelihood ratio test, RLRT, for the autoregressive coefficient

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Estimation of Time Transformation Models with. Bernstein Polynomials

Estimation of Time Transformation Models with. Bernstein Polynomials Estimation of Time Transformation Models with Bernstein Polynomials Alexander C. McLain and Sujit K. Ghosh June 12, 2009 Institute of Statistics Mimeo Series #2625 Abstract Time transformation models assume

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Fall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links.

Fall, 2007 Nonlinear Econometrics. Theory: Consistency for Extremum Estimators. Modeling: Probit, Logit, and Other Links. 14.385 Fall, 2007 Nonlinear Econometrics Lecture 2. Theory: Consistency for Extremum Estimators Modeling: Probit, Logit, and Other Links. 1 Example: Binary Choice Models. The latent outcome is defined

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA

A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Statistica Sinica 23 (2013), 383-408 doi:http://dx.doi.org/10.5705/ss.2011.151 A FRAILTY MODEL APPROACH FOR REGRESSION ANALYSIS OF BIVARIATE INTERVAL-CENSORED SURVIVAL DATA Chi-Chung Wen and Yi-Hau Chen

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology

NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS. By Piet Groeneboom and Geurt Jongbloed Delft University of Technology NONPARAMETRIC CONFIDENCE INTERVALS FOR MONOTONE FUNCTIONS By Piet Groeneboom and Geurt Jongbloed Delft University of Technology We study nonparametric isotonic confidence intervals for monotone functions.

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

Euler Equations: local existence

Euler Equations: local existence Euler Equations: local existence Mat 529, Lesson 2. 1 Active scalars formulation We start with a lemma. Lemma 1. Assume that w is a magnetization variable, i.e. t w + u w + ( u) w = 0. If u = Pw then u

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data

Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data International Mathematical Forum, 3, 2008, no. 33, 1643-1654 Parameters Estimation for a Linear Exponential Distribution Based on Grouped Data A. Al-khedhairi Department of Statistics and O.R. Faculty

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

INDEX. Bolzano-Weierstrass theorem, for sequences, boundary points, bounded functions, 142 bounded sets, 42 43

INDEX. Bolzano-Weierstrass theorem, for sequences, boundary points, bounded functions, 142 bounded sets, 42 43 INDEX Abel s identity, 131 Abel s test, 131 132 Abel s theorem, 463 464 absolute convergence, 113 114 implication of conditional convergence, 114 absolute value, 7 reverse triangle inequality, 9 triangle

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Variable Selection and Model Choice in Survival Models with Time-Varying Effects Variable Selection and Model Choice in Survival Models with Time-Varying Effects Boosting Survival Models Benjamin Hofner 1 Department of Medical Informatics, Biometry and Epidemiology (IMBE) Friedrich-Alexander-Universität

More information

Likelihood Based Inference for Monotone Response Models

Likelihood Based Inference for Monotone Response Models Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 11, 2006 Abstract The behavior of maximum likelihood estimates (MLEs) and the likelihood ratio

More information

Nonparametric Identification and Estimation of a Transformation Model

Nonparametric Identification and Estimation of a Transformation Model Nonparametric and of a Transformation Model Hidehiko Ichimura and Sokbae Lee University of Tokyo and Seoul National University 15 February, 2012 Outline 1. The Model and Motivation 2. 3. Consistency 4.

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Adaptive Piecewise Polynomial Estimation via Trend Filtering

Adaptive Piecewise Polynomial Estimation via Trend Filtering Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

PENALIZED LOG-LIKELIHOOD ESTIMATION FOR PARTLY LINEAR TRANSFORMATION MODELS WITH CURRENT STATUS DATA 1

PENALIZED LOG-LIKELIHOOD ESTIMATION FOR PARTLY LINEAR TRANSFORMATION MODELS WITH CURRENT STATUS DATA 1 The Annals of Statistics 2005, Vol. 33, No. 5, 2256 2290 DOI: 10.1214/009053605000000444 Institute of Mathematical Statistics, 2005 PENALIZED LOG-LIKELIHOOD ESTIMATION FOR PARTLY LINEAR TRANSFORMATION

More information

Continuous Functions on Metric Spaces

Continuous Functions on Metric Spaces Continuous Functions on Metric Spaces Math 201A, Fall 2016 1 Continuous functions Definition 1. Let (X, d X ) and (Y, d Y ) be metric spaces. A function f : X Y is continuous at a X if for every ɛ > 0

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Calculus of Variations. Final Examination

Calculus of Variations. Final Examination Université Paris-Saclay M AMS and Optimization January 18th, 018 Calculus of Variations Final Examination Duration : 3h ; all kind of paper documents (notes, books...) are authorized. The total score of

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Maximum likelihood estimation of a log-concave density based on censored data

Maximum likelihood estimation of a log-concave density based on censored data Maximum likelihood estimation of a log-concave density based on censored data Dominic Schuhmacher Institute of Mathematical Statistics and Actuarial Science University of Bern Joint work with Lutz Dümbgen

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Consider Table 1 (Note connection to start-stop process).

Consider Table 1 (Note connection to start-stop process). Discrete-Time Data and Models Discretized duration data are still duration data! Consider Table 1 (Note connection to start-stop process). Table 1: Example of Discrete-Time Event History Data Case Event

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information