Statistica Sinica: Supplement

NONPARAMETRIC MODEL CHECKS OF SINGLE-INDEX ASSUMPTIONS

Samuel Maistre 1,2,3 and Valentin Patilea 1

1 CREST (Ensai), 2 Université de Lyon, 3 Université de Strasbourg

Supplementary Material

This supplementary material contains additional proofs, details and technical lemmas.

S1 Additional proofs and details

S1.1 Proof of Lemma A.1

Lemma A.1. Let $(U_1, Z_1, W_1)$ and $(U_2, Z_2, W_2)$ be two independent draws of $(U, Z, W)$. Let $K(\cdot)$ be a bounded, even, integrable function with positive, integrable Fourier transform. Assume $E(\|U w(Z)\|_H^2) < \infty$. Then, for any $h > 0$,

$$E[U \mid Z, W] = 0 \ \text{a.s.} \iff I(h) = 0.$$
Moreover, if $P(E[U \mid Z, W] = 0) < 1$, then $\inf_{h \in (0,1]} I(h) > 0$.

Proof of Lemma A.1. Using the Inverse Fourier Transform and the independence of $(U_1, Z_1, W_1)$ and $(U_2, Z_2, W_2)$,

$$\begin{aligned}
I(h) &= E\left[\langle U_1, U_2\rangle_H\, w(Z_1) w(Z_2)\, h^{-q} K((Z_1 - Z_2)/h)\, \varphi(W_1 - W_2)\right] \\
&= E\left[\langle U_1, U_2\rangle_H\, w(Z_1) w(Z_2) \int_{\mathbb{R}^q} e^{2\pi i v^\top (Z_1 - Z_2)}\, \mathcal{F}[K](vh)\, dv \int_{\mathbb{R}^r} e^{2\pi i s^\top (W_1 - W_2)}\, \mathcal{F}[\varphi](s)\, ds\right] \\
&= \int_{\mathbb{R}^q}\int_{\mathbb{R}^r} \left\| E\left[ E[U \mid Z, W]\, w(Z)\, e^{2\pi i \{v^\top Z + s^\top W\}} \right] \right\|_H^2 \mathcal{F}[K](vh)\, \mathcal{F}[\varphi](s)\, dv\, ds.
\end{aligned}$$

Clearly, for any $h > 0$, $I(h) = 0$ whenever $E[U \mid Z, W] = 0$ a.s. For the reverse implication, since $\mathcal{F}[\varphi], \mathcal{F}[K] > 0$ and $w(\cdot) > 0$, for any $h > 0$, $I(h) = 0$ implies

$$E\left[E[U \mid Z, W]\, w(Z)\, e^{2\pi i \{v^\top Z + s^\top W\}}\right] = 0, \qquad \forall v \in \mathbb{R}^q,\ s \in \mathbb{R}^r.$$

Then necessarily $E[U \mid Z, W]\, w(Z) = 0$ a.s., and thus $E[U \mid Z, W] = 0$ a.s.

In the case $P(E[U \mid Z, W] = 0) < 1$, by the Lebesgue Dominated Convergence Theorem, the map $h \mapsto I(h)$ is continuous on $(0, 1]$ and

$$\lim_{h \to 0} I(h) = \mathcal{F}[K](0) \int_{\mathbb{R}^q}\int_{\mathbb{R}^r} \left\| E\left[E[U \mid Z, W]\, w(Z)\, e^{2\pi i \{v^\top Z + s^\top W\}}\right]\right\|_H^2 \mathcal{F}[\varphi](s)\, dv\, ds.$$

Since the nonnegative-valued map $(v, s) \mapsto \| E[E[U \mid Z, W]\, w(Z)\, e^{2\pi i \{v^\top Z + s^\top W\}}]\|_H^2$ is continuous, and not identically equal to $0$ whenever $E[U \mid Z, W] \neq 0$, and $\mathcal{F}[K](0), \mathcal{F}[\varphi](\cdot) > 0$, $\lim_{h \to 0} I(h)$ is necessarily positive and $I(h)$ is bounded away from zero on the interval $(0, 1]$.

S1.2 Some details on equation (4.4)

Consider a sequence of alternatives

$$Y = m(Z(\beta_0)) + r_n\, \delta(Z(\beta_0), W(\beta_0)) + \varepsilon, \qquad n \geq 1,$$

with $E(\varepsilon \mid X) = 0$ a.s. and $\delta(\cdot)$ satisfying the conditions (4.3). In the following, we show that, for each $n$, $\beta_0$ is a solution of the equation

$$\frac{\partial}{\partial\beta}\, E\left[\{Y - r_\beta(Z(\beta))\}^2\right] = 0.$$
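For intuition, the characterization in Lemma A.1 can be illustrated through the sample analogue of $I(h)$. The sketch below is purely illustrative and relies on assumptions not made in the lemma itself: a Gaussian kernel $K$, the weight $\varphi(t) = e^{-t^2}$ (both with positive Fourier transforms), $w \equiv 1$, and scalar $U$, $Z$, $W$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 1000, 0.5

def I_n(U, Z, W, h):
    # Sample analogue of I(h) = E[U1 U2 h^{-1} K((Z1-Z2)/h) phi(W1-W2)],
    # with Gaussian K and phi(t) = exp(-t^2).
    dZ = (Z[:, None] - Z[None, :]) / h
    dW = W[:, None] - W[None, :]
    Kh = np.exp(-dZ**2 / 2) / (h * np.sqrt(2 * np.pi))
    phi = np.exp(-dW**2)
    G = np.outer(U, U) * Kh * phi
    np.fill_diagonal(G, 0.0)      # U-statistic form: drop the i = j terms
    return G.sum() / (n * (n - 1))

Z, W = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
I_null = I_n(eps, Z, W, h)        # E[U | Z, W] = 0 a.s.
I_alt = I_n(eps + W, Z, W, h)     # E[U | Z, W] = W, a deviation from the null

print(I_null, I_alt)
```

Under the null the statistic fluctuates around zero, while under the deviation it stabilizes at a strictly positive value, in line with the second part of the lemma.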
Under suitable conditions on the second-order derivative with respect to $\beta$, this justifies equation (4.4). For this purpose, we want to differentiate $r_\beta(Z(\beta))$. Let $Y^0 = m(Z(\beta_0)) + \varepsilon$ and define

$$r^0_\beta(Z(\beta)) = E[Y^0 \mid Z(\beta)] = E[m(Z(\beta_0)) \mid Z(\beta)].$$

Moreover, let $\delta_\beta(Z(\beta)) = E[\delta(Z(\beta_0), W(\beta_0)) \mid Z(\beta)]$, and notice that

$$r_\beta(Z(\beta)) = r^0_\beta(Z(\beta)) + r_n\, \delta_\beta(Z(\beta)),$$

and that, by the first condition in equation (4.3), $\delta_{\beta_0}(Z(\beta_0)) = 0$. Next, by standard results from single-index regression models applied to the response $Y^0$ (see, for instance, Horowitz (2009), chapter 2), one has

$$\left.\frac{\partial}{\partial\beta}\, r^0_\beta(Z(\beta))\right|_{\beta=\beta_0} = m'(Z(\beta_0))\{X - E[X \mid Z(\beta_0)]\}.$$

On the other hand, by the standard variance decomposition formula,

$$E\left[\{\delta(Z(\beta_0), W(\beta_0)) - \delta_\beta(Z(\beta))\}^2\right] \leq E\left[\delta^2(Z(\beta_0), W(\beta_0))\right] = E\left[\{\delta(Z(\beta_0), W(\beta_0)) - \delta_{\beta_0}(Z(\beta_0))\}^2\right],$$

and thus one can deduce that

$$E\left[\delta(Z(\beta_0), W(\beta_0))\left.\frac{\partial}{\partial\beta}\,\delta_\beta(Z(\beta))\right|_{\beta=\beta_0}\right] = 0. \tag{S1.1}$$

Using this identity and the second condition in equation (4.3), under suitable technical conditions, for any $n$, one can deduce

$$\begin{aligned}
\left.\frac{\partial}{\partial\beta}\, E\left[\{Y - r_\beta(Z(\beta))\}^2\right]\right|_{\beta=\beta_0}
&= -2\, E\left[\{Y - r_{\beta_0}(Z(\beta_0))\}\left.\frac{\partial}{\partial\beta}\, r_\beta(Z(\beta))\right|_{\beta=\beta_0}\right] \\
&= -2\, E\left[\{r_n\,\delta(Z(\beta_0), W(\beta_0)) + \varepsilon\}\left\{\left.\frac{\partial}{\partial\beta}\, r^0_\beta(Z(\beta))\right|_{\beta=\beta_0} + r_n\left.\frac{\partial}{\partial\beta}\,\delta_\beta(Z(\beta))\right|_{\beta=\beta_0}\right\}\right] \\
&= -2\, r_n\, E\left[\delta(Z(\beta_0), W(\beta_0))\, m'(Z(\beta_0))\{X - E[X \mid Z(\beta_0)]\}\right] \\
&\quad - 2\, r_n^2\, E\left[\delta(Z(\beta_0), W(\beta_0))\left.\frac{\partial}{\partial\beta}\,\delta_\beta(Z(\beta))\right|_{\beta=\beta_0}\right] = 0.
\end{aligned}$$

Let us end this part with a comment on the condition (4.3). In general, in the case of a sequence of alternatives, by standard tools for deriving asymptotic results, one can prove $\hat\beta - \bar\beta = O_P(n^{-1/2})$ for some $\bar\beta \in \mathcal{B}$ that could depend on $n$. The value $\bar\beta$ tends to $\beta_0$ at a rate depending on the rate at which the alternative hypotheses approach the null hypothesis. Following a common choice in the literature, herein we want to simplify the presentation and focus on the case where $\bar\beta$ does not depend on $n$. For this purpose, we impose
some orthogonality conditions on the function $\delta(\cdot)$ such that $\bar\beta = \beta_0$ when the index is estimated by semiparametric least squares. See, for instance, the second condition of equation (3.11) in Guerre and Lavergne (2005) for a similar condition.

S1.3 Some details on the proof of Proposition 1

Herein, we provide detailed justifications of the claim that the norm of any column of $A(\beta) - A(\tilde\beta)$ is bounded by $c\|\beta - \tilde\beta\|$. Firstly, recall that $\mathcal{B} \subset \{1\} \times \mathbb{R}^{p-1}$ or $\mathcal{B} \subset \{\gamma_1^{-1}\gamma : \gamma = (\gamma_1, \ldots, \gamma_p)^\top \in \mathbb{R}^p, \gamma_1 > 0\}$. Then the norm of every $\beta$ from the parameter space $\mathcal{B}$ is larger than or equal to 1. Moreover, if $\tilde\beta \in \mathcal{B}$ and $B(\tilde\beta, r) \subset \mathbb{R}^p$ is the ball centered at $\tilde\beta$ of radius $0 < r < 1/2$, then

$$\inf_{\beta \in B(\tilde\beta, r)} \frac{\langle \beta, \tilde\beta\rangle}{\|\beta\|\,\|\tilde\beta\|} > 0. \tag{S1.2}$$

Indeed, for any $\beta \in B(\tilde\beta, r)$, $\|\beta\| \geq \|\tilde\beta\| - r$ and thus $\|\beta\|/\|\tilde\beta\| \geq 1 - r$. Then, since for any $\beta \in B(\tilde\beta, r)$, $\|\beta - \tilde\beta\|^2 \leq r^2$, we have

$$\langle\beta, \tilde\beta\rangle = \frac12\left\{\|\beta\|^2 + \|\tilde\beta\|^2 - \|\beta - \tilde\beta\|^2\right\} \geq \frac{\|\tilde\beta\|^2}{2}.$$

Finally, divide by $\|\beta\|\,\|\tilde\beta\|$ to derive the inequality (S1.2).

Now, consider $\{v_1, \ldots, v_{p-1}\}$, a basis of the orthogonal subspace $\{\tilde\beta\}^\perp$. Then, for any $\beta \in B(\tilde\beta, r)$, the set of vectors $\{v_1, \ldots, v_{p-1}\} \cup \{\beta\}$ is a basis of $\mathbb{R}^p$. Indeed, equation (S1.2) implies that no $\beta \in B(\tilde\beta, r)$ could be spanned by $\{v_1, \ldots, v_{p-1}\}$. Finally, if $A(\beta)$ is built by the Gram–Schmidt process applied to $\{v_1, \ldots, v_{p-1}\} \cup \{\beta\}$, as described at the beginning of the proof of Proposition 1, and the vectors $\{v_1, \ldots, v_{p-1}\}$ are orthogonal, then the norm of any column of $A(\beta) - A(\tilde\beta)$ is bounded by $c\|\beta - \tilde\beta\|$ for some $c$ depending only on the initial $p-1$ independent vectors. The fact that $\{v_1, \ldots, v_{p-1}\}$ are orthogonal is not a real constraint since, starting from an arbitrary basis of $\{\tilde\beta\}^\perp$, one could first apply an orthogonalization process to that basis.

We consider the case $p = 3$; the case of larger $p$ can be derived similarly. Let $\beta \in B(\tilde\beta, 1/2)$ and consider $v, w$ two linearly independent vectors from the space $\{\tilde\beta\}^\perp$. The Gram–Schmidt process transforms the
basis $\{\beta, v, w\}$ as follows:

$$u_1 = \beta, \quad e_1 = e_1(\beta) = \frac{u_1}{\|u_1\|}; \qquad u_2 = v - \langle v, e_1\rangle e_1, \quad e_2 = e_2(\beta) = \frac{u_2}{\|u_2\|};$$

$$u_3 = w - \langle w, e_1\rangle e_1 - \langle w, e_2\rangle e_2, \quad e_3 = e_3(\beta) = \frac{u_3}{\|u_3\|}.$$

Since the matrix $A(\beta)$ is built with the columns $e_2(\beta)$ and $e_3(\beta)$, it remains to check the Lipschitz condition for $e_2(\beta)$ and $e_3(\beta)$ as functions of $\beta$. First, note that

$$\|e_1(\beta) - e_1(\tilde\beta)\| = \left\|\frac{\beta}{\|\beta\|} - \frac{\tilde\beta}{\|\tilde\beta\|}\right\| \leq \frac{1}{\|\beta\|}\|\beta - \tilde\beta\| + \left|\frac{1}{\|\beta\|} - \frac{1}{\|\tilde\beta\|}\right| \|\tilde\beta\| \leq c_1 \|\beta - \tilde\beta\|,$$

with $c_1 = 2/\|\tilde\beta\|$. Next, since $v \in \{\tilde\beta\}^\perp = \{e_1(\tilde\beta)\}^\perp$,

$$\langle v, e_1(\beta)\rangle = \langle v, e_1(\tilde\beta)\rangle + \langle v, e_1(\beta) - e_1(\tilde\beta)\rangle = \langle v, e_1(\beta) - e_1(\tilde\beta)\rangle,$$

and thus

$$|\langle v, e_1(\beta)\rangle| \leq \|v\|\,\|e_1(\beta) - e_1(\tilde\beta)\| \leq c_1 \|v\|\,\|\beta - \tilde\beta\| \leq c_1 r \|v\|.$$

Moreover, for any $\beta \in B(\tilde\beta, r)$,

$$\|u_2(\beta)\|^2 = \|v\|^2 - \langle v, e_1(\beta)\rangle^2 \geq (1 - c_1^2 r^2)\|v\|^2$$

and

$$\begin{aligned}
\left|\|u_2(\beta)\| - \|u_2(\tilde\beta)\|\right| &\leq \frac{\left|\|u_2(\beta)\|^2 - \|u_2(\tilde\beta)\|^2\right|}{\|u_2(\beta)\| + \|u_2(\tilde\beta)\|}
= \frac{\left|\langle v, e_1(\beta)\rangle - \langle v, e_1(\tilde\beta)\rangle\right|\,\left|\langle v, e_1(\beta)\rangle + \langle v, e_1(\tilde\beta)\rangle\right|}{\|u_2(\beta)\| + \|u_2(\tilde\beta)\|} \\
&\leq \frac{2\|v\|}{2(1 - c_1^2 r^2)^{1/2}\|v\|}\,\|v\|\,\|e_1(\beta) - e_1(\tilde\beta)\|
\leq \frac{c_1 \|v\|\,\|\beta - \tilde\beta\|}{(1 - c_1^2 r^2)^{1/2}}.
\end{aligned}$$

Then

$$\begin{aligned}
\|e_2(\beta) - e_2(\tilde\beta)\| &= \left\|\frac{v - \langle v, e_1(\beta)\rangle e_1(\beta)}{\|u_2(\beta)\|} - \frac{v - \langle v, e_1(\tilde\beta)\rangle e_1(\tilde\beta)}{\|u_2(\tilde\beta)\|}\right\| \\
&\leq \frac{\left|\|u_2(\tilde\beta)\| - \|u_2(\beta)\|\right|}{\|u_2(\beta)\|\,\|u_2(\tilde\beta)\|}\,\|v\|
+ \frac{\left\|\langle v, e_1(\beta)\rangle e_1(\beta) - \langle v, e_1(\tilde\beta)\rangle e_1(\tilde\beta)\right\|}{\|u_2(\beta)\|}
\leq c_2 \|\beta - \tilde\beta\|,
\end{aligned}$$

where $c_2$ is a positive constant that depends only on $r$, $\tilde\beta$ and $v$. The smaller $r$ is fixed, the larger the constant $c_2$ may have to be taken. Finally, to get the Lipschitz condition for the map $\beta \mapsto e_3(\beta)$, we first need to bound $\|u_3(\beta)\|$ from below. Using the orthogonality between $e_1(\beta)$ and $e_2(\beta)$, we get

$$\|u_3(\beta)\|^2 = \|w\|^2 - \langle w, e_1(\beta)\rangle^2 - \langle w, e_2(\beta)\rangle^2.$$

Since $w \in \{\tilde\beta\}^\perp = \{e_1(\tilde\beta)\}^\perp$ and $v$ and $w$ are orthogonal,

$$|\langle w, e_1(\beta)\rangle| \leq \|w\|\,\|e_1(\beta) - e_1(\tilde\beta)\| \leq c_1 \|w\|\,\|\beta - \tilde\beta\|$$
and

$$|\langle w, e_2(\beta)\rangle| = |\langle w, e_2(\beta) - e_2(\tilde\beta)\rangle| \leq \|w\|\,\|e_2(\beta) - e_2(\tilde\beta)\| \leq c_2 \|w\|\,\|\beta - \tilde\beta\|,$$

because $e_2(\tilde\beta) = v/\|v\|$ is orthogonal to $w$. Deduce that, for any $\beta \in B(\tilde\beta, r)$,

$$\|u_3(\beta)\|^2 \geq (1 - c_1^2 r^2 - c_2^2 r^2)\|w\|^2.$$

The Lipschitz condition for the map $\beta \mapsto e_3(\beta)$ follows after repeatedly applying the triangle inequality.

S1.4 Proof of Proposition 2

Proposition 2. Suppose the conditions in Assumption 1 in the Appendix are met and the null hypothesis (2.2) holds true. Consider $\beta_n$ such that $\beta_n - \beta_0 = O_P(n^{-1/2})$. Then

$$n h^{1/2}\, I_n^{\{l\}}(\beta_n) / \hat\omega_n^{\{l\}}(\beta_n) \to N(0, 1) \quad \text{in law under } H_0,$$

and

$$[\hat\omega_n^{\{l\}}(\beta_0)]^2 \to [\omega^{\{l\}}(\beta_0)]^2 = 2\int K^2(u)\, du \int\!\!\int \Gamma^2(s, t)\, ds\, dt\; E\left[\int f_{\beta_0}^4(z)\, \varphi^2(W_1(\beta_0) - W_2(\beta_0))\, \pi_{\beta_0}(z \mid W_1(\beta_0))\, \pi_{\beta_0}(z \mid W_2(\beta_0))\, dz\right],$$

in probability, where $\pi_{\beta_0}(\cdot \mid w)$ is the conditional density of $Z(\beta_0)$ knowing that $W(\beta_0) = w$, and, for $t, s \in [0, 1]$,

$$\Gamma(s, t) = E[\epsilon(s)\,\epsilon(t)], \qquad \epsilon(t) = \mathbb{1}\{\Phi(Y) \leq t\} - P[\Phi(Y) \leq t \mid X^\top\beta_0].$$

Proof of Proposition 2. Let us consider the simplified notation from equation (7.3) and simplify further in the case $\beta = \beta_0$: write $L_{ij} = L_{ij}(\beta_0, g)$, $K_{ij} = K_{ij}(\beta_0, h)$, and $\varphi_{ij} = \varphi(W_i(\beta_0) - W_j(\beta_0))$. Notice that

$$\begin{aligned}
I_n^{\{l\}}(\beta_0) = \frac{1}{n(n-1)h} \sum_{1\leq i\neq j\leq n} \Big\{
&\langle (r_i - \bar r_i)(\cdot\,; \beta_0), (r_j - \bar r_j)(\cdot\,; \beta_0)\rangle_{L^2}
+ \langle \epsilon_i(\cdot), \epsilon_j(\cdot)\rangle_{L^2}
+ \langle \bar\epsilon_i(\cdot), \bar\epsilon_j(\cdot)\rangle_{L^2} \\
&+ 2\langle \epsilon_i(\cdot), (r_j - \bar r_j)(\cdot\,; \beta_0)\rangle_{L^2}
- 2\langle \bar\epsilon_i(\cdot), (r_j - \bar r_j)(\cdot\,; \beta_0)\rangle_{L^2}
- 2\langle \epsilon_i(\cdot), \bar\epsilon_j(\cdot)\rangle_{L^2}
\Big\}\, \hat f_{\beta_0,i}\, \hat f_{\beta_0,j}\, K_{ij}\, \varphi_{ij} \\
= I_1(\beta_0) &+ I_2(\beta_0) + I_3(\beta_0) + 2 I_4(\beta_0) - 2 I_5(\beta_0) - 2 I_6(\beta_0)
\end{aligned} \tag{S1.3}$$
with

$$\hat f_{\beta,i} = \frac{1}{(n-1)g}\sum_{k\neq i} L_{ik}(\beta), \qquad r_i(t; \beta) = P\left[Y_i \leq \Phi^{-1}(t) \mid X_i^\top\beta\right],$$

$$\bar r_i(t; \beta) = \frac{1}{(n-1)g\,\hat f_{\beta,i}}\sum_{k\neq i} r_k(t; \beta)\, L_{ik}(\beta),$$

and $\bar\epsilon_i(\cdot)$ is defined as $\bar r_i(t; \beta)$ with $r_k(t; \beta)$ replaced by $\epsilon_k(\cdot)$. This decomposition of $I_n^{\{l\}}(\beta_0)$ is given by the identity

$$U_i\,\omega(Z_i)(\cdot\,; \beta_0) = \left[r_i(\cdot\,; \beta_0) - \bar r_i(\cdot\,; \beta_0) + \epsilon_i(\cdot) - \bar\epsilon_i(\cdot)\right]\hat f_{\beta_0,i}.$$

The terms $I_1(\beta_0)$ and $I_3(\beta_0)$ are treated in Lemmas 6 and 7 in Section S2. For $I_2(\beta_0)$, let us introduce

$$\omega_n^2(\beta) = \frac{2}{n(n-1)h}\sum_{i=1}^n\sum_{j\neq i}\int\!\!\int\Gamma^2(s, t)\, ds\, dt\;\hat f^2_{\beta,i}\,\hat f^2_{\beta,j}\, K^2_{ij}(\beta)\,\varphi^2_{ij}(\beta).$$

Proposition 3 below ensures that $n h^{1/2}\,\omega_n^{-1}(\beta_0)\, I_2(\beta_0) \to N(0, 1)$ in law. The terms $I_4(\beta_0)$, $I_5(\beta_0)$ and $I_6(\beta_0)$ can be shown to be negligible in a similar way as $I_1(\beta_0)$ and $I_3(\beta_0)$. Lemma 9 shows that $\omega_n^2(\beta_0) \to [\omega^{\{l\}}(\beta_0)]^2$ in probability, with $\omega^{\{l\}}(\beta_0) > 0$, and thus $I_j(\beta_0)/\omega_n(\beta_0)$ is of the same order as $I_j(\beta_0)$ for $j \in \{1, 3, 4, 5, 6\}$. Finally, it is easy to check that $\omega_n(\beta_0) - \hat\omega_n^{\{l\}}(\beta_0) = o_P(1)$. By Proposition 1, one can replace $\beta_0$ by $\beta_n$, an estimator of $\beta_0$ that converges in probability at the rate $O_P(n^{-1/2})$, and have

$$n h^{1/2}\, I_n^{\{l\}}(\beta_n)/\hat\omega_n^{\{l\}}(\beta_n) - n h^{1/2}\, I_n^{\{l\}}(\beta_0)/\hat\omega_n^{\{l\}}(\beta_0) = o_P(1).$$

Then the result of Proposition 2 follows.

Proposition 3. Under the conditions of Proposition 2, $n h^{1/2}\,\omega_n^{-1}(\beta_0)\, I_2(\beta_0) \to N(0, 1)$ in law.

Proof. $\{S_{n,m}, \mathcal{F}_{n,m}, 1 \leq m \leq n, n \geq 1\}$ is a martingale array with $S_{n,1} = 0$,

$$S_{n,m}(\beta_0) = \sum_{i=1}^m G_{n,i}(\beta_0), \qquad
G_{n,i}(\beta_0) = \frac{2 h^{1/2}}{\omega_n(n-1)h}\left\langle \epsilon_i(\cdot)\,\hat f_{\beta_0,i},\; \sum_{j=1}^{i-1}\epsilon_j(\cdot)\,\hat f_{\beta_0,j}\, K_{ij}\,\varphi_{ij}\right\rangle_{L^2},$$

and $\mathcal{F}_{n,m}$ is the $\sigma$-field generated by $\{X_1, \ldots, X_n, Y_1, \ldots, Y_m\}$. Thus

$$n h^{1/2}\,\omega_n^{-1}(\beta_0)\, I_2(\beta_0) = S_{n,n}(\beta_0).$$
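The equality $n h^{1/2}\omega_n^{-1}(\beta_0) I_2(\beta_0) = S_{n,n}(\beta_0)$ is purely algebraic and can be verified on simulated data. The sketch below replaces the $L^2(0,1)$-valued processes $\epsilon_i(\cdot)$ by scalars and uses arbitrary stand-ins for $\hat f_{\beta_0,i}$, $K_{ij}$, $\varphi_{ij}$ and $\omega_n$; these simplifications are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 50, 0.3
eps = rng.normal(size=n)                 # scalar stand-ins for eps_i(.)
f_hat = rng.uniform(0.5, 1.5, size=n)    # stand-ins for \hat f_{beta0, i}
X = rng.normal(size=n)
K = np.exp(-((X[:, None] - X[None, :]) / h) ** 2 / 2)   # symmetric K_ij
phi = np.exp(-(X[:, None] - X[None, :]) ** 2)           # symmetric phi_ij
omega_n = 1.7                            # any positive normalizer

# Degenerate second-order term I_2(beta0): sum over i != j
M = np.outer(eps * f_hat, eps * f_hat) * K * phi
np.fill_diagonal(M, 0.0)
I2 = M.sum() / (n * (n - 1) * h)

# Martingale increments G_{n,i}: each Y_i interacts with the past j < i only
G = np.array([2.0 * eps[i] * f_hat[i] *
              np.sum(eps[:i] * f_hat[:i] * K[i, :i] * phi[i, :i])
              for i in range(n)]) / (omega_n * (n - 1) * h ** 0.5)

S_nn = G.sum()
print(np.isclose(n * h ** 0.5 * I2 / omega_n, S_nn))   # True
```

Only the symmetry of $K_{ij}\varphi_{ij}$ in $(i, j)$ is used, which is why the sum over $i \neq j$ reduces to twice the sum over $j < i$.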
From Lemma 8 and the nesting of the $\sigma$-fields, $\mathcal{F}_{n,i} \subset \mathcal{F}_{n+1,i}$ for $1 \leq i \leq n$, $n \geq 1$, we have that the martingale array satisfies Corollary 3.1 of Hall and Heyde (1980) and the result follows.

S2 Technical lemmas

In the following results the kernels $L$ and $K$ are supposed to satisfy the conditions of Assumption 1-(f).

Lemma 1. Assume that $E[\exp(a\|X\|)] < \infty$ for some $a > 0$. Consider that $g \to 0$ and $n g^{4/3}/\log n \to \infty$. For any $t \in [0, 1]$, let $Y_k(t)$, $1 \leq k \leq n$, be i.i.d. random variables as in the proof of Proposition 1, such that $E[\sup_t |Y_k(t)|^a] < \infty$ for some $a > 8$. Moreover, assume that the maps $v \mapsto E[Y_k(t) \mid X^\top\tilde\beta = v]\, f_{\tilde\beta}(v)$, $v \in \mathbb{R}$, $t \in [0, 1]$, are uniformly Lipschitz (the Lipschitz constant does not depend on $t$). Then

$$\max_{1\leq i\leq n}\,\sup_{t\in[0,1]}\,\sup_{\beta\in\mathcal{B}_n}\left|\frac{1}{(n-1)g}\sum_{k\neq i} Y_k(t)\left[L_{ik}(\beta) - L_{ik}(\tilde\beta)\right]\right| = O_P(n^{-1/2} g^{-1/2} \log^{1/2} n + b_n).$$

Moreover,

$$\max_{1\leq i\leq n}\,\sup_{t\in[0,1]}\left|\frac{1}{(n-1)g}\sum_{k\neq i}\left\{Y_k(t) - E[Y_k(t) \mid X_k^\top\tilde\beta]\right\} L_{ik}(\tilde\beta)\right| = O_P(n^{-1/2} g^{-1/2} \log^{1/2} n).$$

Proof of Lemma 1. Recall that $Y_i(t) \equiv Y_i$ (in the case of SIM for mean regression) or $Y_i(t) = \mathbb{1}\{Y_i \leq \Phi^{-1}(t)\}$ (for the case of a single-index assumption on the conditional law), and $r_i(t; \tilde\beta) = E[Y_i(t) \mid Z_i(\tilde\beta)]$, $t \in [0, 1]$. For any $t \in [0, 1]$ we decompose

$$\begin{aligned}
\frac{1}{ng}\sum_{k\neq i} Y_k(t)\, L_{ik}(\beta)
&= \frac{1}{ng}\sum_{k=1}^n\left\{Y_k(t)\, L((X_i - X_k)^\top\beta/g) - E\left[Y(t)\, L((X_i - X)^\top\beta/g) \mid X_i\right]\right\} \\
&\quad + E\left[Y(t)\, g^{-1} L((X_i - X)^\top\beta/g) \mid X_i\right] - n^{-1} g^{-1} L(0)\, Y_i(t) \\
&= \Sigma_{1ni}(\beta, t) + \Sigma_{2ni}(\beta, t) - n^{-1} g^{-1} L(0)\, Y_i(t).
\end{aligned}$$

The moment condition on $Y$ guarantees that $\max_{1\leq i\leq n}\sup_t |Y_i(t)| = o_P(n^b)$ for some $0 < b < 1/8$. This and the fact that $n g^{4/3}/\log n \to \infty$ imply that $\max_{1\leq i\leq n}\sup_t n^{-1} g^{-1} |Y_i(t)| = o_P(n^{-1/2} g^{-1/2} \log^{1/2} n)$. On the other hand, by Lemma 5,

$$\max_{1\leq i\leq n}\,\sup_{t\in[0,1]}\,\sup_{\beta\in\mathcal{B}_n}\left|\Sigma_{2ni}(\beta, t) - \Sigma_{2ni}(\tilde\beta, t)\right| = O_P(b_n).$$

It remains to uniformly bound $\Sigma_{1ni}(\beta, t)$, and for this purpose we use empirical process tools. Let us introduce some notation. Let $\mathcal{G}$ be a class of functions of the observations with envelope function $G$ and let

$$J(\delta, \mathcal{G}, L_2) = \sup_Q \int_0^\delta \sqrt{1 + \log N(\varepsilon\|G\|_2, \mathcal{G}, L_2(Q))}\, d\varepsilon, \qquad 0 < \delta \leq 1,$$

denote the uniform entropy integral, where the supremum is taken over all
finitely discrete probability distributions $Q$ on the space of the observations, and $\|G\|_2$ denotes the norm of $G$ in $L_2(Q)$. Let $Z_1, \ldots, Z_n$ be a sample of independent observations and let

$$\mathbb{G}_n\gamma = \frac{1}{\sqrt n}\sum_{i=1}^n \gamma(Z_i), \qquad \gamma \in \mathcal{G},$$

be the empirical process indexed by $\mathcal{G}$. If the covering number $N(\varepsilon, \mathcal{G}, L_2(Q))$ is of polynomial order in $1/\varepsilon$, there exists a constant $c > 0$ such that $J(\delta, \mathcal{G}, L_2) \leq c\,\delta\sqrt{\log(1/\delta)}$ for $0 < \delta < 1/2$. Now, if $E\gamma^2 < \delta^2 E G^2$ for every $\gamma$ and some $0 < \delta < 1$, and $E G^{(4v-2)/(v-1)} < \infty$ for some $v > 1$, under mild additional measurability conditions that are satisfied in our context, Theorem 3.1 of van der Vaart and Wellner (2011) implies

$$\sup_{\gamma\in\mathcal{G}}|\mathbb{G}_n\gamma| = J(\delta, \mathcal{G}, L_2)\,\|G\|_2\left(1 + \frac{J^{1/v}(\delta, \mathcal{G}, L_2)}{\delta^2\sqrt n}\,\frac{\|G\|_{(4v-2)/(v-1)}^{v/(2v-1)}}{\|G\|_2^{1/v}}\right) O_P(1), \tag{S2.4}$$

where $\|G\|_2^2 = E G^2$ and the $O_P(1)$ term is independent of $n$. Note that the family $\mathcal{G}$ could change with $n$, as long as the envelope is the same for all $n$. We apply this result to the family of functions

$$\mathcal{G} = \{\gamma(\cdot\,; \beta, w, t) - \gamma(\cdot\,; \tilde\beta, w, t) : t \in [0, 1], \beta \in \mathcal{B}, w \in \mathbb{R}\},$$

where $\gamma(Y, X; \beta, w, t) = Y(t)\, L((X^\top\beta - w) g^{-1})$, for a sequence $g$ that converges to zero, and the envelope

$$G(Y, X) = \sup_{t\in[0,1]}|Y(t)|\,\sup_{w\in\mathbb{R}}|L(w)|.$$

Its entropy number is of polynomial order in $1/\varepsilon$, independently of $n$, since $L(\cdot)$ is of bounded variation and the families of indicator functions have polynomial complexity; see, for instance, van der Vaart (1998). Now, for any $\gamma \in \mathcal{G}$, $E\gamma^2 \leq C g\, E G^2$ for some constant $C$. Let $\delta = g^{1/2}$, so that $E\gamma^2 \leq C'\delta^2 E G^2$ for some constant $C'$, and take $v = 3/2$, which corresponds to $E G^8 < \infty$, guaranteed by our assumptions. Thus the bound in (S2.4) yields

$$\sup_{\gamma\in\mathcal{G}}\frac{1}{\sqrt n\, g}|\mathbb{G}_n\gamma| = n^{-1/2} g^{-1/2} \log^{1/2}(n)\left[1 + n^{-1/2} g^{-2/3} \log^{1/2}(n)\right] O_P(1),$$

where the $O_P(1)$ term is independent of $n$. Since $n g^{4/3}/\log n \to \infty$,

$$\max_{1\leq i\leq n}\,\sup_{t\in[0,1]}\,\sup_{\beta\in\mathcal{B}_n}\left|\Sigma_{1ni}(\beta, t) - \Sigma_{1ni}(\tilde\beta, t)\right| = O_P(n^{-1/2} g^{-1/2} \log^{1/2} n).$$

The second part of the statement is now obvious.
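The decomposition of the leave-one-out sum into $\Sigma_{1ni}$, $\Sigma_{2ni}$ and the diagonal correction is an exact identity, which holds whatever value is used for the conditional expectation term, since it enters $\Sigma_{1ni}$ and $\Sigma_{2ni}$ with opposite signs. A minimal numerical check, where the Gaussian $L$ and the arbitrary placeholder `a_i` for $E[Y\, L((X_i - X)^\top\beta/g) \mid X_i]$ are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, g, i = 30, 0.4, 0
X, Y = rng.normal(size=n), rng.normal(size=n)
L = lambda v: np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)

Lik = L((X[i] - X) / g)        # L((X_i - X_k)' beta / g), with beta = 1 here
a_i = 0.37                     # any placeholder for E[Y L((X_i - X)'beta/g) | X_i]

lhs = np.sum(np.delete(Y * Lik, i)) / (n * g)     # (1/(ng)) sum_{k != i}
Sigma1 = np.sum(Y * Lik - a_i) / (n * g)          # centered full sum
Sigma2 = a_i / g                                  # E[Y g^{-1} L(.) | X_i]
rhs = Sigma1 + Sigma2 - L(0.0) * Y[i] / (n * g)   # minus the k = i term
print(np.isclose(lhs, rhs))   # True
```

The placeholder cancels between the two terms, so the identity holds exactly for any `a_i`.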
Lemma 2. Assume that the density $f_{\tilde\beta}(\cdot)$ is Lipschitz. Then

$$\max_{1\leq i\leq n}\left|\frac{1}{(n-1)g}\sum_{k\neq i} L_{ik}(\tilde\beta) - f_{\tilde\beta}(X_i^\top\tilde\beta)\right| = O_P(n^{-1/2} g^{-1/2} \log^{1/2} n + g).$$

Proof of Lemma 2. We can write

$$\frac{1}{(n-1)g}\sum_{k\neq i} L_{ik}(\tilde\beta) - f_{\tilde\beta}(X_i^\top\tilde\beta)
= \frac{1}{n}\sum_{k=1}^n\left\{g^{-1} L_{ik}(\tilde\beta) - E\left[g^{-1} L_{ik}(\tilde\beta) \mid X_i\right]\right\}
+ E\left[g^{-1} L_{ik}(\tilde\beta) \mid X_i\right] - f_{\tilde\beta}(X_i^\top\tilde\beta) + O(n^{-1} g^{-1}).$$

By the empirical process arguments used in Lemma 1, the sum on the right-hand side of the display is of rate $O_P(n^{-1/2} g^{-1/2} \log^{1/2} n)$ uniformly with respect to $i$. The Lipschitz property of $f_{\tilde\beta}$ and the fact that $\int |v| L(v)\, dv < \infty$ guarantee that

$$\max_{1\leq i\leq n}\left|E\left[g^{-1} L_{ik}(\tilde\beta) \mid X_i\right] - f_{\tilde\beta}(X_i^\top\tilde\beta)\right| \leq C g$$

for some constant $C$.

Lemma 3. For any $t \in [0, 1]$, let $Y_k(t)$, $1 \leq k \leq n$, be an independent sample from a random variable $Y(t)$ defined as in the proof of Proposition 1. Let $r(v; t, \tilde\beta) = E[Y(t) \mid X^\top\tilde\beta = v]$, $v \in \mathbb{R}$, and assume that $r(\cdot\,; t, \tilde\beta)$
is twice differentiable and the second derivative is bounded by a constant independent of $t$. If $r'(v; t, \tilde\beta)$ is the first derivative of $r(\cdot\,; t, \tilde\beta)$, then, for any $t \in [0, 1]$,

$$\frac{1}{n-1}\sum_{k\neq i}\left\{r(X_i^\top\tilde\beta; t, \tilde\beta) - r(X_k^\top\tilde\beta; t, \tilde\beta)\right\}\frac{1}{g}L_{ik}(\tilde\beta)
= r'(X_i^\top\tilde\beta; t, \tilde\beta)\, g\, D_{1,ni} + g^2\, \widetilde D_{1,ni}(t),$$

where $\max_{1\leq i\leq n}|D_{1,ni}| = O_P(n^{-1/2} g^{-1/2} \log^{1/2} n)$ and $\max_{1\leq i\leq n}\sup_{t\in[0,1]}|\widetilde D_{1,ni}(t)| = O_P(1)$.

Proof of Lemma 3. By a Taylor expansion,

$$\begin{aligned}
\frac{1}{n-1}\sum_{k\neq i}\left\{r(X_i^\top\tilde\beta; t, \tilde\beta) - r(X_k^\top\tilde\beta; t, \tilde\beta)\right\}\frac{1}{g}L_{ik}(\tilde\beta)
&= r'(X_i^\top\tilde\beta; t, \tilde\beta)\,\frac{1}{n}\sum_{k=1}^n (X_i - X_k)^\top\tilde\beta\;\frac{1}{g}L_{ik}(\tilde\beta) \\
&\quad + \frac{1}{n}\sum_{k=1}^n r''(x_{ik}(t); t, \tilde\beta)\left[(X_i - X_k)^\top\tilde\beta\right]^2\frac{1}{g}L_{ik}(\tilde\beta),
\end{aligned}$$

where $r''$ stands for the second derivative with respect to $v$ and $x_{ik}(t)$ is a point between $X_i^\top\tilde\beta$ and $X_k^\top\tilde\beta$. Since $L(\cdot)$ is symmetric, by the empirical process arguments as in Lemma 1,

$$\max_{1\leq i\leq n}\left|\frac{1}{n}\sum_{k=1}^n \frac{(X_i - X_k)^\top\tilde\beta}{g}\,\frac{1}{g}L_{ik}(\tilde\beta)\right| = O_P(n^{-1/2} g^{-1/2} \log^{1/2} n).$$

The result follows taking absolute values in the last sum in the last display, using the boundedness of $r''$ and the fact that

$$\max_{1\leq i\leq n}\left|\frac{1}{n}\sum_{k=1}^n \frac{\left[(X_i - X_k)^\top\tilde\beta\right]^2}{g^2}\,\frac{1}{g}L_{ik}(\tilde\beta) - f_{\tilde\beta}(X_i^\top\tilde\beta)\int_{\mathbb{R}} v^2 L(v)\, dv\right| = o_P(1).$$

Lemma 4. Assume that $E[\exp(a\|X\|)] < \infty$ for some $a > 0$. Moreover, the kernels $K$ and $L$ are of bounded variation, differentiable except on at most a finite set of points, and $\int |u| K(u)\, du < \infty$. Let $\mathcal{B}_n$ be a subset of the parameter space such that the event defined in equation (7.2), with $b_n \to 0$ and $b_n n^{1/2}/\log n \to \infty$, has probability tending to 1. Let $K_{12}(\beta) = K((X_1 - X_2)^\top\beta/h)$, $L_{12}(\beta) = L((X_1 - X_2)^\top\beta/g)$ and $\varphi_{12}(\beta) = \varphi((X_1 - X_2)^\top A(\beta))$. If the density $f_{\tilde\beta}$ is Lipschitz with constant $C_{1,\tilde\beta}$, then there exists a constant $C$ depending only on $K$, $L$, $f_{\tilde\beta}$ and $C_{1,\tilde\beta}$ such that

$$P\left\{E\left[\sup_{\beta\in\mathcal{B}_n}\left|K_{12}(\beta)\varphi_{12}(\beta) - K_{12}(\tilde\beta)\varphi_{12}(\tilde\beta)\right| \,\Big|\, X_1\right] \leq C b_n h^{1/2}\right\} \to 1, \tag{S2.5}$$
$$E\left[\sup_{\beta\in\mathcal{B}_n}\left|K_{12}(\beta)\varphi_{12}(\beta) - K_{12}(\tilde\beta)\varphi_{12}(\tilde\beta)\right|\right] \leq C b_n h^{1/2}, \tag{S2.6}$$

$$P\left\{E\left[\sup_{\beta\in\mathcal{B}_n}\left|L_{12}(\beta) - L_{12}(\tilde\beta)\right|^2 \,\Big|\, X_1\right] \leq C b_n g^{-1}\right\} \to 1, \tag{S2.7}$$

$$P\left\{E\left[\sup_{\beta\in\mathcal{B}_n}\left|L_{13}(\beta) - L_{13}(\tilde\beta)\right|^2 K_{12}^2(\tilde\beta) \,\Big|\, X_2, X_3\right] \leq C h b_n g^{-1}\right\} \to 1, \tag{S2.8}$$

and

$$E\left[\sup_{\beta\in\mathcal{B}_n}\left|L_{13}(\beta) - L_{13}(\tilde\beta)\right|^2 K_{12}^2(\tilde\beta)\,\varphi_{12}^2(\tilde\beta)\right] \leq C h b_n g^{-1}. \tag{S2.9}$$

In Lemma 4 we provide different bounds for $L(\cdot)$ and $K(\cdot)$ because the bandwidths $g$ and $h$ have to satisfy the condition $h/g^2 \to 0$. Hence we need less restrictive conditions on the range of $h$ if we want to allow a larger domain for the pair $(g, h)$.

Proof of Lemma 4. Since the univariate kernel $K$ is of bounded variation, let $K_1$ and $K_2$ be non-decreasing bounded functions such that $K = K_1 - K_2$, and denote $K_{1h} = K_1(\cdot/h)$. Clearly, it is sufficient to prove the result
with $K_1$; similar arguments apply for $K_2$, and hence we get the results for $K$. For simpler writing we assume that $K$ is differentiable and let

$$K_1(x) = \int_{-\infty}^x [K'(t)]_+\, dt \qquad\text{and}\qquad K_2(x) = \int_{-\infty}^x [K'(t)]_-\, dt, \qquad x \in \mathbb{R}.$$

Here $[K']_+$ (resp. $[K']_-$) denotes the positive (resp. negative) part of $K'$. The general case, where a finite set of non-differentiability points is allowed, can be handled with obvious modifications. Let $K_{1h}(t) = K_1(t/h)$ and recall that $Z_i(\beta) = X_i^\top\beta$. Note that $|\exp(-t^2) - \exp(-s^2)| \leq 2|t - s|$. For any $\beta \in \mathcal{B}_n$ and any elementary event in the set $\mathcal{C}_n = \{\max_{1\leq i\leq n}\|X_i\| \leq c\log n\} \cap \mathcal{E}_n$, for some large constant $c$,

$$\begin{aligned}
\left|K_{1h}(Z_1(\beta) - Z_2(\beta))\,\varphi_{12}(\beta) - K_{1h}(Z_1(\tilde\beta) - Z_2(\tilde\beta))\,\varphi_{12}(\tilde\beta)\right|
\leq\; & 2 b_n\, K_{1h}\big(Z_1(\tilde\beta) - Z_2(\tilde\beta) + 2b_n\big) \\
&+ \left[K_{1h}\big(Z_1(\tilde\beta) - Z_2(\tilde\beta) + 2b_n\big) - K_{1h}\big(Z_1(\tilde\beta) - Z_2(\tilde\beta) - 2b_n\big)\right]\varphi_{12}(\tilde\beta).
\end{aligned}$$

The upper bound on the right-hand side is uniform with respect to $\beta$. By a suitable change of variable, and since the density $f_{\tilde\beta}$ is bounded, it is easy to check that $E[K_{1h}(Z_1(\tilde\beta) - Z_2(\tilde\beta) + 2b_n) \mid Z_1(\tilde\beta)]$ is bounded by a constant times $h$. Next, note that since $nh \to \infty$, there exists a constant $C'$ independent of $n$ such that on the set $\mathcal{C}_n$ we have $|Z_1(\tilde\beta) - Z_2(\tilde\beta) \pm 2b_n|/h \leq C' h^{-1/2}$. Then, applying twice a change of variables and using the Lipschitz property of $f_{\tilde\beta}$, on the set $\mathcal{C}_n$,

$$\begin{aligned}
E\big[\big(K_{1h}(Z_1(\tilde\beta) - Z_2(\tilde\beta) &+ 2b_n) - K_{1h}(Z_1(\tilde\beta) - Z_2(\tilde\beta) - 2b_n)\big)\,\mathbb{1}\{\mathcal{C}_n\} \,\big|\, Z_1(\tilde\beta)\big] \\
&\leq h\int_{[-C'h^{-1/2},\, C'h^{-1/2}]} K_1'(u)\left|f_{\tilde\beta}(2b_n + Z_1(\tilde\beta) - uh) - f_{\tilde\beta}(-2b_n + Z_1(\tilde\beta) - uh)\right| du \\
&\leq h\,\sup_{t\in\mathbb{R}}\left|f_{\tilde\beta}(2b_n + t) - f_{\tilde\beta}(-2b_n + t)\right|\int_{[-C'h^{-1/2},\, C'h^{-1/2}]} K_1'(u)\, du \leq C h^{1/2} b_n,
\end{aligned}$$

for some constant $C > 0$. Since, by a suitable choice of $c$, the probability of $\mathcal{C}_n^c$ given $Z_1(\tilde\beta)$ could be made smaller than any fixed negative power of $n$, and the probability of the event $\{|Z_1(\tilde\beta)| \geq c\log n\}$ could also be made very small, the bound in the last display implies the statement (S2.5). For the statement (S2.6), it suffices to take the expectation.

For the bound in equation (S2.7), recall that $L(t) = L(|t|)$ for any $t \in \mathbb{R}$, so that we can consider only nonnegative $t$. Moreover, without loss of generality, we can consider $L$ nonnegative and decreasing on $[0, \infty)$; otherwise, since $L$ is of bounded variation, it could be written as the difference of two nonnegative decreasing functions on $[0, \infty)$. Moreover, let $Z_{13}(\beta) = Z_1(\beta) - Z_3(\beta)$ and $L_{g,13}(\beta) = L(Z_{13}(\beta)/g)$. We split the problem into two cases: $Z_{13}(\beta) \leq Z_{13}(\tilde\beta)$ and $Z_{13}(\beta) > Z_{13}(\tilde\beta)$. Then, for $\beta \in \mathcal{B}_n$
and on the set $\mathcal{C}_n$ we have

$$\begin{aligned}
\left|L_{g,13}(\beta) - L_{g,13}(\tilde\beta)\right|\mathbb{1}\{Z_{13}(\beta) \leq Z_{13}(\tilde\beta)\}
\leq\; & [L(0) - L_{g,13}(\tilde\beta)]\,\mathbb{1}\{Z_{13}(\beta) \leq Z_{13}(\tilde\beta),\, Z_{13}(\tilde\beta) \leq 2b_n\} \\
&+ [L_{g,13}(\beta) - L_{g,13}(\tilde\beta)]\,\mathbb{1}\{Z_{13}(\beta) \leq Z_{13}(\tilde\beta),\, Z_{13}(\tilde\beta) \geq 2b_n\} \\
\leq\; & C b_n^2 g^{-2}\,\mathbb{1}\{Z_{13}(\tilde\beta) \leq 2b_n\} \\
&+ \left[L\big((Z_{13}(\tilde\beta) - 2b_n)/g\big) - L\big(Z_{13}(\tilde\beta)/g\big)\right]\mathbb{1}\{Z_{13}(\beta) \leq Z_{13}(\tilde\beta),\, Z_{13}(\tilde\beta) \geq 2b_n\} \\
=\; & C b_n^2 g^{-2}\,\mathbb{1}\{Z_{13}(\tilde\beta) \leq 2b_n\} + A_n
\end{aligned}$$

and

$$\left|L_{g,13}(\beta) - L_{g,13}(\tilde\beta)\right|\mathbb{1}\{Z_{13}(\beta) > Z_{13}(\tilde\beta)\}
\leq \left[L\big(Z_{13}(\tilde\beta)/g\big) - L\big((Z_{13}(\tilde\beta) + 2b_n)/g\big)\right]\mathbb{1}\{Z_{13}(\beta) > Z_{13}(\tilde\beta)\} = B_n,$$

for some constant $C$. Let us notice that

$$\begin{aligned}
A_n + B_n \leq\; & \left[L\big((Z_{13}(\tilde\beta) - 2b_n)/g\big) - L\big((Z_{13}(\tilde\beta) + 2b_n)/g\big)\right]\mathbb{1}\{Z_{13}(\tilde\beta) \geq 2b_n\} \\
&+ \left[L\big(Z_{13}(\tilde\beta)/g\big) - L\big((Z_{13}(\tilde\beta) + 2b_n)/g\big)\right]\mathbb{1}\{0 \leq Z_{13}(\tilde\beta) \leq 2b_n\} \\
\leq\; & \left[L\big((Z_{13}(\tilde\beta) - 2b_n)/g\big) - L\big((Z_{13}(\tilde\beta) + 2b_n)/g\big)\right]\mathbb{1}\{Z_{13}(\tilde\beta) \geq 2b_n\} + C b_n^2 g^{-2}\,\mathbb{1}\{Z_{13}(\tilde\beta) \leq 2b_n\} \\
=\; & D_n + C b_n^2 g^{-2}\,\mathbb{1}\{Z_{13}(\tilde\beta) \leq 2b_n\}.
\end{aligned}$$

On the other hand, $0 \leq D_n \leq 4 b_n g^{-1}\,|L'(\widetilde Z)|$, where $\widetilde Z$ is some value such that $|\widetilde Z - Z_{13}(\tilde\beta)/g| \leq 2 b_n g^{-1}$. Since, for some constant $c$, $|L'(v)| \leq c|v|$ in a neighborhood of the origin,

$$D_n \leq 4 b_n g^{-1}\left|L'\big(Z_{13}(\tilde\beta)/g\big)\right| + C' b_n^2 g^{-2},$$

for some constant $C'$. Since $L$ is bounded, deduce that $\sup_{\beta\in\mathcal{B}_n}|L_{g,13}(\beta) - L_{g,13}(\tilde\beta)|^2$ is bounded by $C b_n^2 g^{-1}\cdot g^{-1}|L'(Z_{13}(\tilde\beta)/g)| + o(b_n^2 g^{-1})$ for some constant $C$. Take the conditional expectation given $X_1$, which is the same as the conditional expectation given $Z_1(\tilde\beta)$, and deduce the bound in equation (S2.7).
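The monotonicity argument above bounds the kernel increment, uniformly over the perturbed index, by a bracket evaluated at $\tilde\beta$ only. The inequality can be checked numerically for any even kernel that is decreasing on $[0, \infty)$; the Gaussian-type kernel below is an assumption made for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
g, b = 0.3, 0.02
L = lambda t: np.exp(-t**2)     # even, decreasing on [0, inf)

z_tilde = rng.uniform(2 * b, 3.0, size=1000)    # Z_13(beta_tilde), with Z_13(beta_tilde) >= 2 b_n
shift = rng.uniform(-2 * b, 2 * b, size=1000)   # |Z_13(beta) - Z_13(beta_tilde)| <= 2 b_n
z = z_tilde + shift                             # Z_13(beta)

increment = np.abs(L(z / g) - L(z_tilde / g))
# Bracketing envelope at beta_tilde only, as in the bound for D_n
envelope = L((z_tilde - 2 * b) / g) - L((z_tilde + 2 * b) / g)
print(np.all(increment <= envelope + 1e-12))    # True
```

Both $Z_{13}(\beta)/g$ and $Z_{13}(\tilde\beta)/g$ lie in the interval $[(Z_{13}(\tilde\beta) - 2b_n)/g, (Z_{13}(\tilde\beta) + 2b_n)/g]$, on which $L$ is monotone, which is all the inequality uses.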
On the set of events $\mathcal{C}_n$,

$$\sup_{\beta\in\mathcal{B}_n}\left|L_{13}(\beta) - L_{13}(\tilde\beta)\right|\left|K_{12}(\tilde\beta)\right|
\leq \left\{D_n + 3 C b_n^2 g^{-2}\,\mathbb{1}\{Z_{13}(\tilde\beta) \leq 2b_n\}\right\}\left|K_{12}(\tilde\beta)\right|.$$

Take the conditional expectation and use a standard change of variables to derive the bound in equation (S2.8). Take the expectation and remember that $\varphi_{12}(\tilde\beta)$ is bounded to derive the moment bound in equation (S2.9).

Lemma 5. Under the conditions of Lemma 1,

$$\sup_{t\in[0,1]}\,\sup_{\beta\in\mathcal{B}_n}\,\max_{1\leq i\leq n}\left|\Sigma_{2ni}(\beta, t) - \Sigma_{2ni}(\tilde\beta, t)\right| = O_P(b_n).$$

Proof of Lemma 5. We can write

$$\left|\Sigma_{2ni}(\beta, t) - \Sigma_{2ni}(\tilde\beta, t)\right|
\leq E\left[|Y(t)|\, g^{-1}\left|L((X_i - X)^\top\beta/g) - L((X_i - X)^\top\tilde\beta/g)\right| \,\Big|\, X_i\right]
= E\left[E\{|Y(t)| \mid X\}\, g^{-1}\left|L((X_i - X)^\top\beta/g) - L((X_i - X)^\top\tilde\beta/g)\right| \,\Big|\, X_i\right].$$

Now we can apply the monotonicity argument used in Lemma 4 and deduce the bound.

Lemma 6. Under the conditions of Proposition 2, $I_1(\beta_0) = o_P(n^{-1} h^{-1/2})$.
Proof of Lemma 6. With the notation defined in equation (S1.3), we have

$$I_1(\beta_0) = \frac{1}{n(n-1)^3 g^2 h}\sum_{i=1}^n\sum_{j\neq i}\sum_{k\neq i}\sum_{l\neq j}\left\langle (r_i - r_k)(\cdot\,; \beta_0), (r_j - r_l)(\cdot\,; \beta_0)\right\rangle_{L^2} L_{ik}\, L_{jl}\, K_{ij}\,\varphi_{ij},$$

and if we denote by $I_{1,1}(\beta_0)$ the term where $i$, $j$, $k$ and $l$ are all different, then

$$E[I_{1,1}(\beta_0)] = \frac{(n-2)(n-3)}{(n-1)^2 g^2 h}\, E\left[\left\langle E[(r_i - r_k)(\cdot\,; \beta_0)\, L_{ik} \mid Z_i(\beta_0)],\, E[(r_j - r_l)(\cdot\,; \beta_0)\, L_{jl} \mid Z_j(\beta_0)]\right\rangle_{L^2} K_{ij}\,\varphi_{ij}\right] = O(g^4),$$

as soon as $g^{-1} E[(r_i - r_k)(t; \beta_0)\, L_{ik}(\beta_0) \mid Z_i(\beta_0)] = O(g^2)\, D(t; Z_i(\beta_0))$ with $D(\cdot)$ bounded, which is guaranteed by Assumption 1-(c). When $i$, $j$, $k$ and $l$ take no more than 3 different values, the number of terms is reduced by a factor $n$, and thus we have $E[I_{1,2}(\beta_0)] = O(n^{-1} g^{-1}) = o(n^{-1} h^{-1/2})$. Similar reasoning can be applied to prove that $E[I_1^2(\beta_0)] = o(n^{-2} h^{-1})$. See also Proposition A.1 in Fan and Li (1996).

Lemma 7. Under the conditions of Proposition 2, $I_3(\beta_0) = o_P(n^{-1} h^{-1/2})$.
Proof of Lemma 7. Write

$$\begin{aligned}
I_3(\beta_0) &= \frac{1}{n(n-1)^3 g^2 h}\sum_{i=1}^n\sum_{j\neq i}\sum_{k\neq i}\sum_{l\neq j}\left\langle\epsilon_k(\cdot), \epsilon_l(\cdot)\right\rangle_{L^2} L_{ik}\, L_{jl}\, K_{ij}\,\varphi_{ij} \\
&= \frac{1}{n(n-1)^3 g^2 h}\sum_{i=1}^n\sum_{j\neq i}\sum_{k\neq i}\sum_{l\neq j,\, l\neq k}\left\langle\epsilon_k(\cdot), \epsilon_l(\cdot)\right\rangle_{L^2} L_{ik}\, L_{jl}\, K_{ij}\,\varphi_{ij} \\
&\quad + \frac{1}{n(n-1)^3 g^2 h}\sum_{i=1}^n\sum_{j\neq i}\sum_{k\neq i,\, k\neq j}\left\|\epsilon_k(\cdot)\right\|^2_{L^2}\, L_{ik}\, L_{jk}\, K_{ij}\,\varphi_{ij} \\
&\quad + \frac{1}{n(n-1)^3 g^2 h}\sum_{i=1}^n\sum_{j\neq i}\left\|\epsilon_j(\cdot)\right\|^2_{L^2}\, L_{ij}\, L_{ji}\, K_{ij}\,\varphi_{ij} \\
&= I_{3,1}(\beta_0) + I_{3,2}(\beta_0) + I_{3,3}(\beta_0).
\end{aligned}$$

Then

$$E[I_{3,1}(\beta_0)] = \frac{1}{(n-1)^2 g^2 h}\, E\left[\left\langle\epsilon_1(\cdot), \epsilon_2(\cdot)\right\rangle_{L^2} L^2_{12}\, K_{12}\,\varphi_{12}\right]
= O(n^{-2} g^{-2})\, E\left[\left\langle\epsilon_1(\cdot), \epsilon_2(\cdot)\right\rangle_{L^2} h^{-1} K_{12}\right] = O(n^{-2} g^{-2}),$$

$E[I_{3,2}(\beta_0)] = O(n^{-1} g^{-1})$ and $E[I_{3,3}(\beta_0)] = O(n^{-2} g^{-2})$; thus $E[I_3(\beta_0)] = o(n^{-1} h^{-1/2})$. By quite straightforward but tedious calculations, it can be proved that $E[I_3^2(\beta_0)] = o(n^{-2} h^{-1})$, and the rate of $I_3(\beta_0)$ follows.
Lemma 8. Under the conditions of Proposition 2,

$$V_n^2(\beta_0) = \sum_{i=2}^n E\left[G_{n,i}^2(\beta_0) \mid \mathcal{F}_{n,i-1}\right] \to 1,$$

in probability, and the martingale difference array $\{G_{n,i}, \mathcal{F}_{n,i}, 1 \leq i \leq n\}$ satisfies the conditional Lindeberg condition: for all $\varepsilon > 0$,

$$\sum_{i=2}^n E\left[G_{n,i}^2\,\mathbb{1}(|G_{n,i}| > \varepsilon) \mid \mathcal{F}_{n,i-1}\right] \to 0,$$

in probability.

Proof of Lemma 8. First, decompose

$$\begin{aligned}
V_n^2(\beta_0) &= \frac{4}{\omega_n^2(n-1)^2 h}\sum_{i=2}^n\int\!\!\int\Gamma(s, t)\,\hat f^2_{\beta_0,i}\left(\sum_{j=1}^{i-1}\epsilon_j(s)\,\hat f_{\beta_0,j}\, K_{ij}\,\varphi_{ij}\right)\left(\sum_{j=1}^{i-1}\epsilon_j(t)\,\hat f_{\beta_0,j}\, K_{ij}\,\varphi_{ij}\right) ds\, dt \\
&= \frac{4}{\omega_n^2(n-1)^2 h}\sum_{i=2}^n\sum_{j=1}^{i-1}\int\!\!\int\Gamma(s, t)\,\hat f^2_{\beta_0,i}\,\epsilon_j(s)\,\epsilon_j(t)\,\hat f^2_{\beta_0,j}\, K^2_{ij}\,\varphi^2_{ij}\, ds\, dt \\
&\quad + \frac{8}{\omega_n^2(n-1)^2 h}\sum_{i=3}^n\sum_{j=2}^{i-1}\sum_{k=1}^{j-1}\int\!\!\int\Gamma(s, t)\,\hat f^2_{\beta_0,i}\,\epsilon_j(s)\,\epsilon_k(t)\,\hat f_{\beta_0,j}\,\hat f_{\beta_0,k}\, K_{ij}\, K_{ik}\,\varphi_{ij}\,\varphi_{ik}\, ds\, dt \\
&= A_n(\beta_0) + B_n(\beta_0).
\end{aligned} \tag{S2.10}$$
We have

$$E[A_n(\beta_0)] = E\left[E[A_n(\beta_0) \mid X_1, \ldots, X_n]\right]
= E\left[\frac{2n}{\omega_n^2(\beta_0)(n-1)h}\int\!\!\int\Gamma(s, t)\,\hat f^2_{\beta_0,i}\, E[\epsilon_j(s)\,\epsilon_j(t)]\,\hat f^2_{\beta_0,j}\, K^2_{ij}\,\varphi^2_{ij}\, ds\, dt\right] = \frac{n}{n-1} \to 1.$$

Moreover,

$$\begin{aligned}
\mathrm{Var}(A_n(\beta_0)) \leq\; & \frac{64\,\|\varphi\|_\infty^4}{(n-1)^4 h^2}\left\{\int\Gamma^2(s, t)\,\Gamma^2(u, v)\, ds\, dt\, du\, dv \sum_{i=3}^n\sum_{j=2}^{i-1}\sum_{j'=1}^{j-1} E\left[\omega_n^{-4}(\beta_0)\,\hat f^4_{\beta_0,i}\,\hat f^2_{\beta_0,j}\,\hat f^2_{\beta_0,j'}\, K^2_{ij}\, K^2_{ij'}\right]\right. \\
&+ \int\left|\Gamma(s, t)\,\Gamma(u, v)\, G(s, t, u, v)\right| ds\, dt\, du\, dv \sum_{i=3}^n\sum_{i'=2}^{i-1}\sum_{j=1}^{i'-1} E\left[\omega_n^{-4}(\beta_0)\,\hat f^2_{\beta_0,i}\,\hat f^2_{\beta_0,i'}\,\hat f^4_{\beta_0,j}\, K^2_{ij}\, K^2_{i'j}\right] \\
&\left.+ \int\left|\Gamma(s, t)\,\Gamma(u, v)\, G(s, t, u, v)\right| ds\, dt\, du\, dv \sum_{i=2}^n\sum_{j=1}^{i-1} E\left[\omega_n^{-4}(\beta_0)\,\hat f^4_{\beta_0,i}\,\hat f^4_{\beta_0,j}\, K^4_{ij}\right]\right\} = o\!\left(n^{-1} h^{-1/2}\right),
\end{aligned}$$

where $G(s, t, u, v) = E[\epsilon(s)\,\epsilon(t)\,\epsilon(u)\,\epsilon(v)]$. The decomposition of $E[B_n^2]$ involves the same type of terms and is therefore also of rate $o(n^{-1} h^{-1/2})$, so that the convergence of $V_n^2(\beta_0)$ is established.

For the conditional Lindeberg condition, we have, for all $\varepsilon > 0$, $n \geq 1$ and $1 < i \leq n$,

$$E\left[G_{n,i}^2\,\mathbb{1}(|G_{n,i}| > \varepsilon) \mid \mathcal{F}_{n,i-1}\right] \leq \frac{E\left[G_{n,i}^4 \mid \mathcal{F}_{n,i-1}\right]}{\varepsilon^2}.$$

Then

$$\sum_{i=2}^n E\left[G^2_{n,i}\,\mathbb{1}(|G_{n,i}| > \varepsilon) \mid \mathcal{F}_{n,i-1}\right]
\leq \frac{1}{\varepsilon^2}\sum_{i=2}^n E\left[G^4_{n,i} \mid \mathcal{F}_{n,i-1}\right]
\leq \frac{1}{\varepsilon^2}\,\frac{16}{\omega_n^4(\beta_0)(n-1)^4 h^2}\sum_{i=2}^n\int G(s_1, s_2, s_3, s_4)\,\hat f^4_{\beta_0,i}\prod_{k=1}^4\left(\sum_{j_k=1}^{i-1}\epsilon_{j_k}(s_k)\,\hat f_{\beta_0,j_k}\, K_{ij_k}\,\varphi_{ij_k}\right) ds_k.$$

The expectation of the last majorant is of rate

$$O(n^{-1})\int G(s_1, s_2, s_3, s_4)\,\Gamma(s_1, s_2)\,\Gamma(s_3, s_4)\, ds_1\, ds_2\, ds_3\, ds_4\; E\left[\hat f^4_{\beta_0,i}\,\hat f^2_{\beta_0,j}\,\hat f^2_{\beta_0,j'}\, h^{-1} K^2_{ij}\, h^{-1} K^2_{ij'}\,\varphi^2_{ij}\,\varphi^2_{ij'}\right]$$
$$+\; O(n^{-2} h^{-1})\int G^2(s_1, s_2, s_3, s_4)\, ds_1\, ds_2\, ds_3\, ds_4\; E\left[\hat f^4_{\beta_0,i}\,\hat f^4_{\beta_0,j}\, h^{-1} K^4_{ij}\,\varphi^4_{ij}\right] = o\!\left(n^{-1} h^{-1/2}\right).$$
Lemma 9. Under the conditions of Proposition 2, $\omega_n^2(\beta_0) \to \omega^2(\beta_0) > 0$, in probability.

Proof of Lemma 9. We have

$$E\left[\omega_n^2(\beta_0)\right] = 2\, E\left[\hat f^2_{\beta_0,i}\,\hat f^2_{\beta_0,j}\, h^{-1} K^2_{ij}(\beta_0)\,\varphi^2_{ij}(\beta_0)\right]\int\!\!\int\Gamma^2(s, t)\, ds\, dt.$$

On the other hand,

$$E\left[\hat f^2_{\beta_0,i}\,\hat f^2_{\beta_0,j}\, h^{-1} K^2_{ij}\,\varphi^2_{ij}\right]
= \frac{1}{(n-1)^4}\, E\left[\sum_{k\neq i}\sum_{l\neq j}\sum_{k'\neq i}\sum_{l'\neq j} \frac{L_{ik}}{g}\,\frac{L_{jl}}{g}\,\frac{L_{ik'}}{g}\,\frac{L_{jl'}}{g}\, h^{-1} K^2_{ij}\,\varphi^2_{ij}\right]
= \frac{(n-2)(n-3)(n-4)}{(n-1)^3}\,\bar\omega_n^2(\beta_0) + o\!\left(n^{-1} h^{-1/2}\right),$$

where

$$\begin{aligned}
\bar\omega_n^2(\beta_0) &= E\left[\int \frac{1}{g}L\!\left(\frac{z_i - z_k}{g}\right)\frac{1}{g}L\!\left(\frac{z_j - z_l}{g}\right)\frac{1}{g}L\!\left(\frac{z_i - z_{k'}}{g}\right)\frac{1}{g}L\!\left(\frac{z_j - z_{l'}}{g}\right)\frac{1}{h}K^2\!\left(\frac{z_i - z_j}{h}\right)\varphi_{ij}\right. \\
&\qquad\quad\left. \times\, f_{\beta_0}(z_k)\, f_{\beta_0}(z_l)\, f_{\beta_0}(z_{k'})\, f_{\beta_0}(z_{l'})\,\pi_{\beta_0}(z_i \mid W_i(\beta_0))\,\pi_{\beta_0}(z_j \mid W_j(\beta_0))\, dz_i\, dz_j\, dz_k\, dz_l\, dz_{k'}\, dz_{l'}\right] \\
&= E\left[\int f_{\beta_0}(z_i - g s_1)\, f_{\beta_0}(z_i - g s_2)\, f_{\beta_0}(z_j - g t_1)\, f_{\beta_0}(z_j - g t_2)\,\pi_{\beta_0}(z_i \mid W_i(\beta_0))\,\pi_{\beta_0}(z_j \mid W_j(\beta_0))\,\varphi_{ij}\right. \\
&\qquad\quad\left. \times\, L(s_1)\, L(t_1)\, L(s_2)\, L(t_2)\,\frac{1}{h}K^2\!\left(\frac{z_i - z_j}{h}\right) dz_i\, dz_j\, ds_1\, dt_1\, ds_2\, dt_2\right] \\
&= E\left[\int f_{\beta_0}(z_i - g s_1)\, f_{\beta_0}(z_i - g s_2)\, f_{\beta_0}(z_i - hu - g t_1)\, f_{\beta_0}(z_i - hu - g t_2)\,\pi_{\beta_0}(z_i \mid W_i(\beta_0))\,\pi_{\beta_0}(z_i - hu \mid W_j(\beta_0))\,\varphi_{ij}\right. \\
&\qquad\quad\left. \times\, L(s_1)\, L(t_1)\, L(s_2)\, L(t_2)\, K^2(u)\, dz_i\, ds_1\, dt_1\, ds_2\, dt_2\, du\right] \\
&\to E\left[\int f^4_{\beta_0}(z)\,\pi_{\beta_0}(z \mid W_i(\beta_0))\,\pi_{\beta_0}(z \mid W_j(\beta_0))\,\varphi_{ij}\, dz\right]\int K^2(u)\, du,
\end{aligned}$$

where the limit is obtained by standard arguments, using the uniform continuity of $f_{\beta_0}(\cdot)$ and $\pi_{\beta_0}(\cdot \mid w)$.
References

Fan, Y. and Li, Q. (1996). Consistent model specification tests: omitted variables and semiparametric functional forms. Econometrica 64, 865-890.

Guerre, E. and Lavergne, P. (2005). Data-driven rate-optimal specification testing in regression models. Ann. Statist. 33, 840-870.

Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Probability and Mathematical Statistics. Academic Press [Harcourt Brace Jovanovich Publishers], New York.

Horowitz, J. L. (2009). Semiparametric and Nonparametric Methods in Econometrics. Springer, New York.

van der Vaart, A. W. (1998). Asymptotic Statistics. Vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.

van der Vaart, A. W. and Wellner, J. A. (2011). A local maximal inequality under uniform entropy. Electron. J. Stat. 5, 192-203.
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More informationYour first day at work MATH 806 (Fall 2015)
Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies
More informationMathematics Department Stanford University Math 61CM/DM Inner products
Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector
More informationg(x) = P (y) Proof. This is true for n = 0. Assume by the inductive hypothesis that g (n) (0) = 0 for some n. Compute g (n) (h) g (n) (0)
Mollifiers and Smooth Functions We say a function f from C is C (or simply smooth) if all its derivatives to every order exist at every point of. For f : C, we say f is C if all partial derivatives to
More informationElements of Positive Definite Kernel and Reproducing Kernel Hilbert Space
Elements of Positive Definite Kernel and Reproducing Kernel Hilbert Space Statistical Inference with Reproducing Kernel Hilbert Space Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department
More informationAsymptotic Behavior of the Magnetization Near Critical and Tricritical Points via Ginzburg-Landau Polynomials
Asymptotic Behavior of the Magnetization Near Critical and Tricritical Points via Ginzburg-Landau Polynomials Richard S. Ellis rsellis@math.umass.edu Jonathan Machta 2 machta@physics.umass.edu Peter Tak-Hun
More informationAn introduction to some aspects of functional analysis
An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms
More informationRandom Bernstein-Markov factors
Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit
More informationMAT 257, Handout 13: December 5-7, 2011.
MAT 257, Handout 13: December 5-7, 2011. The Change of Variables Theorem. In these notes, I try to make more explicit some parts of Spivak s proof of the Change of Variable Theorem, and to supply most
More informationBrownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory
More informationFunctional Analysis HW #3
Functional Analysis HW #3 Sangchul Lee October 26, 2015 1 Solutions Exercise 2.1. Let D = { f C([0, 1]) : f C([0, 1])} and define f d = f + f. Show that D is a Banach algebra and that the Gelfand transform
More informationfor all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true
3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO
More informationBootstrap of residual processes in regression: to smooth or not to smooth?
Bootstrap of residual processes in regression: to smooth or not to smooth? arxiv:1712.02685v1 [math.st] 7 Dec 2017 Natalie Neumeyer Ingrid Van Keilegom December 8, 2017 Abstract In this paper we consider
More informationSUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1
SUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1 B Technical details B.1 Variance of ˆf dec in the ersmooth case We derive the variance in the case where ϕ K (t) = ( 1 t 2) κ I( 1 t 1), i.e. we take K as
More informationEconomics 204 Fall 2011 Problem Set 2 Suggested Solutions
Economics 24 Fall 211 Problem Set 2 Suggested Solutions 1. Determine whether the following sets are open, closed, both or neither under the topology induced by the usual metric. (Hint: think about limit
More informationEstimation of a quadratic regression functional using the sinc kernel
Estimation of a quadratic regression functional using the sinc kernel Nicolai Bissantz Hajo Holzmann Institute for Mathematical Stochastics, Georg-August-University Göttingen, Maschmühlenweg 8 10, D-37073
More informationReal Analysis Notes. Thomas Goller
Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................
More informationNOTES ON FRAMES. Damir Bakić University of Zagreb. June 6, 2017
NOTES ON FRAMES Damir Bakić University of Zagreb June 6, 017 Contents 1 Unconditional convergence, Riesz bases, and Bessel sequences 1 1.1 Unconditional convergence of series in Banach spaces...............
More informationANALYSIS QUALIFYING EXAM FALL 2016: SOLUTIONS. = lim. F n
ANALYSIS QUALIFYING EXAM FALL 206: SOLUTIONS Problem. Let m be Lebesgue measure on R. For a subset E R and r (0, ), define E r = { x R: dist(x, E) < r}. Let E R be compact. Prove that m(e) = lim m(e /n).
More information1 Directional Derivatives and Differentiability
Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=
More informationConvergence rates in weighted L 1 spaces of kernel density estimators for linear processes
Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,
More informationHILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define
HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,
More informationESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL
Statistica Sinica 19(29): Supplement S1-S1 ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL Dorota M. Dabrowska University of California, Los Angeles Supplementary Material This supplement
More informationNOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES
NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES JONATHAN LUK These notes discuss theorems on the existence, uniqueness and extension of solutions for ODEs. None of these results are original. The proofs
More informationLARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011
LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic
More informationPOINTWISE BOUNDS ON QUASIMODES OF SEMICLASSICAL SCHRÖDINGER OPERATORS IN DIMENSION TWO
POINTWISE BOUNDS ON QUASIMODES OF SEMICLASSICAL SCHRÖDINGER OPERATORS IN DIMENSION TWO HART F. SMITH AND MACIEJ ZWORSKI Abstract. We prove optimal pointwise bounds on quasimodes of semiclassical Schrödinger
More informationLaplace s Equation. Chapter Mean Value Formulas
Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic
More information08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms
(February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops
More informationTesting convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function
Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function By Y. Baraud, S. Huet, B. Laurent Ecole Normale Supérieure, DMA INRA,
More informationLinear Models Review
Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign
More informationANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2.
ANALYSIS QUALIFYING EXAM FALL 27: SOLUTIONS Problem. Determine, with justification, the it cos(nx) n 2 x 2 dx. Solution. For an integer n >, define g n : (, ) R by Also define g : (, ) R by g(x) = g n
More informationLecture 21 Representations of Martingales
Lecture 21: Representations of Martingales 1 of 11 Course: Theory of Probability II Term: Spring 215 Instructor: Gordan Zitkovic Lecture 21 Representations of Martingales Right-continuous inverses Let
More informationClosest Moment Estimation under General Conditions
Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids
More informationTHE INVERSE FUNCTION THEOREM
THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)
More informationON THE EQUIVALENCE OF CONGLOMERABILITY AND DISINTEGRABILITY FOR UNBOUNDED RANDOM VARIABLES
Submitted to the Annals of Probability ON THE EQUIVALENCE OF CONGLOMERABILITY AND DISINTEGRABILITY FOR UNBOUNDED RANDOM VARIABLES By Mark J. Schervish, Teddy Seidenfeld, and Joseph B. Kadane, Carnegie
More informationEstimates for probabilities of independent events and infinite series
Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences
More informationRefining the Central Limit Theorem Approximation via Extreme Value Theory
Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of
More information2 Lebesgue integration
2 Lebesgue integration 1. Let (, A, µ) be a measure space. We will always assume that µ is complete, otherwise we first take its completion. The example to have in mind is the Lebesgue measure on R n,
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationParcours OJD, Ecole Polytechnique et Université Pierre et Marie Curie 05 Mai 2015
Examen du cours Optimisation Stochastique Version 06/05/2014 Mastère de Mathématiques de la Modélisation F. Bonnans Parcours OJD, Ecole Polytechnique et Université Pierre et Marie Curie 05 Mai 2015 Authorized
More informationDEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE
Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results
Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics
More informationProblem Set. Problem Set #1. Math 5322, Fall March 4, 2002 ANSWERS
Problem Set Problem Set #1 Math 5322, Fall 2001 March 4, 2002 ANSWRS i All of the problems are from Chapter 3 of the text. Problem 1. [Problem 2, page 88] If ν is a signed measure, is ν-null iff ν () 0.
More informationIf g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get
18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationAnalysis Finite and Infinite Sets The Real Numbers The Cantor Set
Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered
More informationChapter One. The Calderón-Zygmund Theory I: Ellipticity
Chapter One The Calderón-Zygmund Theory I: Ellipticity Our story begins with a classical situation: convolution with homogeneous, Calderón- Zygmund ( kernels on R n. Let S n 1 R n denote the unit sphere
More information2. Function spaces and approximation
2.1 2. Function spaces and approximation 2.1. The space of test functions. Notation and prerequisites are collected in Appendix A. Let Ω be an open subset of R n. The space C0 (Ω), consisting of the C
More informationTools from Lebesgue integration
Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given
More informationCourse 212: Academic Year Section 1: Metric Spaces
Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........
More informationREAL AND COMPLEX ANALYSIS
REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any
More information17. Convergence of Random Variables
7. Convergence of Random Variables In elementary mathematics courses (such as Calculus) one speaks of the convergence of functions: f n : R R, then lim f n = f if lim f n (x) = f(x) for all x in R. This
More informationScalar multiplication and addition of sequences 9
8 Sequences 1.2.7. Proposition. Every subsequence of a convergent sequence (a n ) n N converges to lim n a n. Proof. If (a nk ) k N is a subsequence of (a n ) n N, then n k k for every k. Hence if ε >
More informationBreaking the curse of dimensionality in nonparametric testing
Breaking the curse of dimensionality in nonparametric testing Pascal Lavergne Simon Fraser University Valentin Patilea CREST-ENSAI November 2005 Abstract For tests based on nonparametric methods, power
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More information1. Introduction Boundary estimates for the second derivatives of the solution to the Dirichlet problem for the Monge-Ampere equation
POINTWISE C 2,α ESTIMATES AT THE BOUNDARY FOR THE MONGE-AMPERE EQUATION O. SAVIN Abstract. We prove a localization property of boundary sections for solutions to the Monge-Ampere equation. As a consequence
More informationRANDOM MATRICES: LAW OF THE DETERMINANT
RANDOM MATRICES: LAW OF THE DETERMINANT HOI H. NGUYEN AND VAN H. VU Abstract. Let A n be an n by n random matrix whose entries are independent real random variables with mean zero and variance one. We
More informationDerivatives of Harmonic Bergman and Bloch Functions on the Ball
Journal of Mathematical Analysis and Applications 26, 1 123 (21) doi:1.16/jmaa.2.7438, available online at http://www.idealibrary.com on Derivatives of Harmonic ergman and loch Functions on the all oo
More informationA Martingale Central Limit Theorem
A Martingale Central Limit Theorem Sunder Sethuraman We present a proof of a martingale central limit theorem (Theorem 2) due to McLeish (974). Then, an application to Markov chains is given. Lemma. For
More informationn E(X t T n = lim X s Tn = X s
Stochastic Calculus Example sheet - Lent 15 Michael Tehranchi Problem 1. Let X be a local martingale. Prove that X is a uniformly integrable martingale if and only X is of class D. Solution 1. If If direction:
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More informationLinear Algebra 2 Spectral Notes
Linear Algebra 2 Spectral Notes In what follows, V is an inner product vector space over F, where F = R or C. We will use results seen so far; in particular that every linear operator T L(V ) has a complex
More informationδ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by
. Sanov s Theorem Here we consider a sequence of i.i.d. random variables with values in some complete separable metric space X with a common distribution α. Then the sample distribution β n = n maps X
More informationIndex Models for Sparsely Sampled Functional Data. Supplementary Material.
Index Models for Sparsely Sampled Functional Data. Supplementary Material. May 21, 2014 1 Proof of Theorem 2 We will follow the general argument in the proof of Theorem 3 in Li et al. 2010). Write d =
More informationTrace Class Operators and Lidskii s Theorem
Trace Class Operators and Lidskii s Theorem Tom Phelan Semester 2 2009 1 Introduction The purpose of this paper is to provide the reader with a self-contained derivation of the celebrated Lidskii Trace
More informationMath 61CM - Solutions to homework 6
Math 61CM - Solutions to homework 6 Cédric De Groote November 5 th, 2018 Problem 1: (i) Give an example of a metric space X such that not all Cauchy sequences in X are convergent. (ii) Let X be a metric
More informationOctober 25, 2013 INNER PRODUCT SPACES
October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal
More informationAdditive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535
Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June
More informationThe Canonical Gaussian Measure on R
The Canonical Gaussian Measure on R 1. Introduction The main goal of this course is to study Gaussian measures. The simplest example of a Gaussian measure is the canonical Gaussian measure P on R where
More informationOn Riesz-Fischer sequences and lower frame bounds
On Riesz-Fischer sequences and lower frame bounds P. Casazza, O. Christensen, S. Li, A. Lindner Abstract We investigate the consequences of the lower frame condition and the lower Riesz basis condition
More informationLecture 9 Metric spaces. The contraction fixed point theorem. The implicit function theorem. The existence of solutions to differenti. equations.
Lecture 9 Metric spaces. The contraction fixed point theorem. The implicit function theorem. The existence of solutions to differential equations. 1 Metric spaces 2 Completeness and completion. 3 The contraction
More informationSPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS
SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint
More informationNotes 18 : Optional Sampling Theorem
Notes 18 : Optional Sampling Theorem Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Chapter 14], [Dur10, Section 5.7]. Recall: DEF 18.1 (Uniform Integrability) A collection
More informationC 1 convex extensions of 1-jets from arbitrary subsets of R n, and applications
C 1 convex extensions of 1-jets from arbitrary subsets of R n, and applications Daniel Azagra and Carlos Mudarra 11th Whitney Extension Problems Workshop Trinity College, Dublin, August 2018 Two related
More informationStanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon Measures
2 1 Borel Regular Measures We now state and prove an important regularity property of Borel regular outer measures: Stanford Mathematics Department Math 205A Lecture Supplement #4 Borel Regular & Radon
More informationConvex Geometry. Carsten Schütt
Convex Geometry Carsten Schütt November 25, 2006 2 Contents 0.1 Convex sets... 4 0.2 Separation.... 9 0.3 Extreme points..... 15 0.4 Blaschke selection principle... 18 0.5 Polytopes and polyhedra.... 23
More informationReal Analysis Problems
Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationTHE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, R. M. Dudley
THE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, 2011 R. M. Dudley 1 A. Dvoretzky, J. Kiefer, and J. Wolfowitz 1956 proved the Dvoretzky Kiefer Wolfowitz
More informationTESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS
TESTING REGRESSION MONOTONICITY IN ECONOMETRIC MODELS DENIS CHETVERIKOV Abstract. Monotonicity is a key qualitative prediction of a wide array of economic models derived via robust comparative statics.
More informationYour first day at work MATH 806 (Fall 2015)
Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies
More information