Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang
The supplementary material contains some additional simulation results in Section 8 and proofs of Theorems 3-6 in Section 9. For the sake of readership and completeness, we also provide a brief description of each model setting.

8 Additional Simulation Results

8.1 Example 1

We adopt the simple linear model from Fan and Lv (2008): $Y = 5X_1 + 5X_2 + 5X_3 + \epsilon$. The predictor vector $(X_1, \dots, X_p)$ is drawn from a multivariate normal distribution $N(0, \Sigma)$ whose covariance matrix $\Sigma = (\sigma_{ij})_{p \times p}$ has entries $\sigma_{ii} = 1$, $i = 1, \dots, p$, and $\sigma_{ij} = \rho$, $i \neq j$. The error term $\epsilon$ is independently generated from the standard normal distribution. We consider several combinations of $(p, n, \rho)$: $p = 100, 1000$, $n = 20, 50, 70$, and $\rho = 0, 0.1, 0.5, 0.9$.
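A minimal sketch of this setting, assuming only what the text states (the equicorrelated Gaussian design and the linear model); the SIS ranking by absolute marginal Pearson correlation is shown for illustration and is not the authors' code:

```python
import numpy as np

def generate_example1(n, p, rho, seed=0):
    """Fan-Lv model Y = 5*X1 + 5*X2 + 5*X3 + eps with equicorrelated
    N(0, Sigma) predictors, sigma_ij = rho for i != j (rho >= 0)."""
    rng = np.random.default_rng(seed)
    # Equicorrelated Gaussians via the one-factor representation
    # X_j = sqrt(rho)*Z0 + sqrt(1 - rho)*Z_j.
    z0 = rng.standard_normal((n, 1))
    z = rng.standard_normal((n, p))
    X = np.sqrt(rho) * z0 + np.sqrt(1.0 - rho) * z
    Y = 5 * (X[:, 0] + X[:, 1] + X[:, 2]) + rng.standard_normal(n)
    return X, Y

def sis_rank(X, Y):
    """Rank predictors by absolute marginal Pearson correlation (SIS)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    corr = Xc.T @ Yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(Yc))
    return np.argsort(-np.abs(corr))

X, Y = generate_example1(n=200, p=100, rho=0.0)
top3 = set(sis_rank(X, Y)[:3])  # with this much signal, should recover {0, 1, 2}
```

MDC-SIS and DC-SIS replace the Pearson correlation in `sis_rank` with the martingale difference correlation and distance correlation, respectively; the ranking-and-truncation logic is unchanged.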
Table 11: $P_a$ for Example 1 with d = 5, 10, 15 and p = 1000 (methods SIS, DC-SIS, MDC-SIS and SIRS at $\rho = 0, 0.1, 0.5, 0.9$; the numerical entries did not survive extraction and are not reproduced here).
Table 12: $P_a$ for Example 1 with d = n under several SNRs (signal-to-noise ratios) (methods SIS, DC-SIS, MDC-SIS and SIRS at $\rho = 0, 0.1, 0.5, 0.9$ for each SNR level; numerical entries not reproduced here).
Table 13: $P_a$ for Example 1 with p = 3000 and d = n/log(n) (methods SIS, DC-SIS, MDC-SIS and SIRS at $\rho = 0, 0.1, 0.5, 0.9$; numerical entries not reproduced here).
8.2 Example 2

In this example, we consider two nonlinear additive models, which have been analyzed in Meier, van de Geer and Bühlmann (2009) and Fan, Feng and Song (2011). Let $g_1(x) = x$, $g_2(x) = (2x-1)^2$, $g_3(x) = \sin(2\pi x)/(2 - \sin(2\pi x))$, and $g_4(x) = 0.1\sin(2\pi x) + 0.2\cos(2\pi x) + 0.3\sin^2(2\pi x) + 0.4\cos^3(2\pi x) + 0.5\sin^3(2\pi x)$. The following cases are studied:

Case 2.a: $Y = 5g_1(X_1) + 3g_2(X_2) + 4g_3(X_3) + 6g_4(X_4) + \epsilon$, where the covariates $X_j$, $j = 1, \dots, p$, are simulated as iid Unif(0,1), and $\epsilon$ is independent of the covariates and follows the standard normal distribution.

Case 2.b: The covariates and the error term are simulated as in Case 2.a, but the model structure is more involved, with 8 additional active predictor variables:
\[
Y = g_1(X_1) + g_2(X_2) + g_3(X_3) + g_4(X_4) + 1.5g_1(X_5) + 1.5g_2(X_6) + 1.5g_3(X_7) + 1.5g_4(X_8) + 2g_1(X_9) + 2g_2(X_{10}) + 2g_3(X_{11}) + 2g_4(X_{12}) + \epsilon.
\]

Table 14: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Example 2 with p = 2000 and n = 200 (methods SIS, DC-SIS, MDC-SIS, NIS and SIRS under Cases 2.a and 2.b; numerical entries not reproduced here).
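The additive functions and the Case 2.a sampler can be sketched as follows (the $g$-functions and coefficients are taken from the text; the sampler itself is illustrative, not the authors' code):

```python
import numpy as np

def g1(x):
    return x

def g2(x):
    return (2 * x - 1) ** 2

def g3(x):
    return np.sin(2 * np.pi * x) / (2 - np.sin(2 * np.pi * x))

def g4(x):
    s, c = np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)
    return 0.1 * s + 0.2 * c + 0.3 * s**2 + 0.4 * c**3 + 0.5 * s**3

def generate_case_2a(n=200, p=2000, seed=0):
    """Case 2.a: Y = 5 g1(X1) + 3 g2(X2) + 4 g3(X3) + 6 g4(X4) + eps,
    with X_j iid Unif(0,1) and eps standard normal."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, p))
    Y = (5 * g1(X[:, 0]) + 3 * g2(X[:, 1])
         + 4 * g3(X[:, 2]) + 6 * g4(X[:, 3])
         + rng.standard_normal(n))
    return X, Y

X, Y = generate_case_2a(n=100, p=500)
```

Note that $g_2$ vanishes at $x = 1/2$ and $g_3$ is strongly nonlinear, which is what makes marginal-correlation screening struggle on $X_2$ and $X_3$ relative to the model-free procedures.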
Table 15: The proportions of $P_s$ and $P_a$ for Example 2 with d = n/log n, p = 2000 and n = 200 (methods SIS, DC-SIS, MDC-SIS, NIS and SIRS; proportions reported for $X_1$ through $X_{12}$ and for ALL; numerical entries not reproduced here).

8.3 Example 4

This example consists of three cases:

Case 4.a: $Y = 2(X_1 + 0.8X_2 + 0.6X_3 + 0.4X_4 + 0.2X_5) + \exp(X_{20} + X_{21} + X_{22})\,\epsilon$.

Case 4.b: $Y = X_1 + 0.8\sin(X_2) + 0.6\exp(X_3) + 0.4X_4 + 0.2X_5 + \exp(X_{20} + X_{21} + X_{22})\,\epsilon$.

Case 4.c: $Y = X_1 X_2 + 0.6X_3 + 0.4X_4 + 0.2X_5 + \exp(X_{20} + X_{21} + X_{22})\,\epsilon$.

In the above models, the error $\epsilon \sim N(0, 1)$ and is independent of the covariates. The predictor vector follows the multivariate normal distribution with the correlation structure described in Example 1, but with $\rho = 0.8$. All models in this example are heteroscedastic, with the number of active variables being 5 at the median (i.e., $\tau = 0.5$) but 8 for other $\tau$'s. Case 4.a is adapted from an example used in Zhu et al. (2011). Cases 4.b and 4.c are modified versions of Case 4.a obtained by including nonlinear structure and interaction terms, respectively. We report S, $P_s$ and $P_a$ with d = n/log n for all three methods in Tables 6 and 7. The tables below report the minimum model size and the proportions of $P_s$ and $P_a$ for Cases 4.a, 4.b and 4.c with varying degrees of signal-to-noise ratio.
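The heteroscedastic structure of Case 4.a can be sketched numerically. The coefficients below follow the Zhu et al. (2011) example as reconstructed above; extraction damaged the original display, so treat them as an assumption rather than the authors' exact specification:

```python
import numpy as np

def generate_case_4a(n=4000, p=1000, rho=0.8, seed=0):
    """Heteroscedastic model adapted from Zhu et al. (2011); coefficients
    are illustrative:
    Y = 2*(X1 + 0.8*X2 + 0.6*X3 + 0.4*X4 + 0.2*X5)
        + exp(X20 + X21 + X22) * eps,   eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal((n, 1))
    z = rng.standard_normal((n, p))
    X = np.sqrt(rho) * z0 + np.sqrt(1 - rho) * z  # equicorrelated, rho = 0.8
    mean = 2 * (X[:, 0] + 0.8*X[:, 1] + 0.6*X[:, 2] + 0.4*X[:, 3] + 0.2*X[:, 4])
    scale = np.exp(X[:, 19] + X[:, 20] + X[:, 21])  # X20, X21, X22 (1-based)
    Y = mean + scale * rng.standard_normal(n)
    return X, Y

X, Y = generate_case_4a()
s = X[:, 19] + X[:, 20] + X[:, 21]
resid = Y - 2 * (X[:, 0] + 0.8*X[:, 1] + 0.6*X[:, 2] + 0.4*X[:, 3] + 0.2*X[:, 4])
# The residual spread grows with X20 + X21 + X22: those variables shift the
# conditional scale but not the conditional median, so they are active only
# away from tau = 0.5.
hi, lo = resid[s > np.median(s)], resid[s <= np.median(s)]
```

This is why the quantile-based procedures (SISQ, MDC-SISQ, QaSIS) are run at several $\tau$ levels: the scale variables $X_{20}, X_{21}, X_{22}$ are invisible at the median.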
Table 16: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Case 4.a with different SNR settings (methods SISQ, MDC-SISQ and QaSIS at several quantile levels $\tau$, together with DC-SIS and SIRS, for n = 100 and n = 200 under c = 0.5 and c = 2; numerical entries not reproduced here).

Table 17: The proportions of $P_s$ and $P_a$ for Case 4.a with d = n/log n and different SNR settings (proportions reported for $X_1$-$X_5$, $X_{20}$-$X_{22}$ and ALL; same methods and settings as Table 16; numerical entries not reproduced here).

Table 18: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Case 4.b with different SNR settings (same methods and settings as Table 16; numerical entries not reproduced here).

Table 19: The proportions of $P_s$ and $P_a$ for Case 4.b with d = n/log n and different SNR settings (same layout as Table 17; numerical entries not reproduced here).

Table 20: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Case 4.c with different SNR settings (same methods and settings as Table 16; numerical entries not reproduced here).

Table 21: The proportions of $P_s$ and $P_a$ for Case 4.c with d = n/log n and different SNR settings (same layout as Table 17; numerical entries not reproduced here).
8.4 Examples in He, Wang and Hong (2013)

Example HWH1 (additive model, n = 400, p = 1000). This example is adapted from Fan et al. (2011). Let $g_1(x) = x$, $g_2(x) = (2x-1)^2$, $g_3(x) = \sin(2\pi x)/(2 - \sin(2\pi x))$, and $g_4(x) = 0.1\sin(2\pi x) + 0.2\cos(2\pi x) + 0.3\sin^2(2\pi x) + 0.4\cos^3(2\pi x) + 0.5\sin^3(2\pi x)$. The following cases are studied:

Case 1.a: $Y = 5g_1(X_1) + 3g_2(X_2) + 4g_3(X_3) + 6g_4(X_4) + \sqrt{1.74}\,\epsilon$, where the vector of covariates X is generated from the multivariate normal distribution $N(0, \Sigma)$ with $\sigma_{ij} = \rho^{|i-j|}$. In Case 1.a we consider $\rho = 0$.

Case 1.b: same as Case 1.a except that $\rho = 0.8$.

Case 1.c: same as Case 1.b except that $\epsilon \sim$ Cauchy.

Example HWH2 (index model, n = 200, p = 2000). This example is adapted from Zhu et al. (2011). The random data are generated from $Y = 2(X_1 + 0.8X_2 + 0.6X_3 + 0.4X_4 + 0.2X_5) + \exp(X_{20} + X_{21} + X_{22})\,\epsilon$. This model is heteroscedastic: the number of active variables is 5 at the median but 8 elsewhere.

Example HWH3 (a more complex structure, n = 400, p = 5000).

Case 3.a: $Y = 2(X_1^2 + X_2^2) + \exp((X_1 + X_2 + X_{18} + X_{19} + \cdots + X_{30})/10)\,\epsilon$, where $\epsilon \sim N(0, 1)$ and X follows the multivariate normal distribution with the correlation structure described in Case 1.b. In this case, the number of active variables is 2 at the median but 15 elsewhere.

Case 3.b: same as Case 3.a, but with $2(X_1^2 + X_2^2)$ replaced by $2((X_1 + 1)^2 + (X_2 + 2)^2)$.
Table 22: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Example HWH1 (methods SISQ, MDC-SISQ and QaSIS at several quantile levels $\tau$, together with DC-SIS, SIRS and NIS, under Cases 1.a, 1.b and 1.c; numerical entries not reproduced here).

Table 23: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Example HWH2 (same methods as Table 22; numerical entries not reproduced here).

Table 24: The 5%, 25%, 50%, 75% and 95% quantiles of the minimum model size S for Example HWH3 (same methods as Table 22, under Cases 3.a and 3.b; numerical entries not reproduced here).
9 Technical Appendix

Proof of Theorem 3: Write $\mathrm{MDD}^2(V|U) = \|g_{V,U}(s) - g_V\, g_U(s)\|^2$ and $\mathrm{MDD}^2_n(V|U) = \|\xi_n(s)\|^2$, where
\[
\xi_n(s) = \frac{1}{n}\sum_{k=1}^n V_k e^{i\langle s, U_k\rangle} - \frac{1}{n}\sum_{k=1}^n V_k \cdot \frac{1}{n}\sum_{k=1}^n e^{i\langle s, U_k\rangle}.
\]
After an elementary transformation, $\xi_n(s)$ can be expressed as
\[
\xi_n(s) = \frac{1}{n}\sum_{k=1}^n \tilde U_k \tilde V_k - \frac{1}{n}\sum_{k=1}^n \tilde U_k \cdot \frac{1}{n}\sum_{k=1}^n \tilde V_k,
\]
where $\tilde U_k = \exp\{i\langle s, U_k\rangle\} - E[\exp\{i\langle s, U\rangle\}]$ and $\tilde V_k = V_k - E(V)$. Define the region $D(\delta) = \{s : \delta \le |s|_q \le 1/\delta\}$ for each $\delta > 0$, and let $\mathrm{MDD}^2_{n,\delta}(V|U) = \int_{D(\delta)} |\xi_n(s)|^2\, dw$, where $dw = w(s)\,ds$ and $w(s) = 1/(c_q |s|_q^{1+q})$. For any fixed $\delta > 0$, the weight function $w(s)$ is bounded on $D(\delta)$. Hence $\mathrm{MDD}^2_{n,\delta}(V|U)$ is a combination of V-statistics with finite expectation. By the SLLN for V-statistics, it follows that almost surely
\[
\lim_{n\to\infty} \mathrm{MDD}^2_{n,\delta}(V|U) = \mathrm{MDD}^2_{\cdot,\delta}(V|U) = \int_{D(\delta)} |g_{U,V}(s) - g_U(s) g_V|^2\, dw.
\]
Obviously $\mathrm{MDD}^2_{\cdot,\delta}(V|U)$ converges to $\mathrm{MDD}^2(V|U)$ as $\delta$ tends to zero. Now, it remains to show that
\[
\limsup_{\delta\to 0}\ \limsup_{n\to\infty}\ \big|\mathrm{MDD}^2_n(V|U) - \mathrm{MDD}^2_{n,\delta}(V|U)\big| = 0.
\]
For each $\delta > 0$,
\[
\mathrm{MDD}^2_n(V|U) - \mathrm{MDD}^2_{n,\delta}(V|U) = \int_{|s|_q < \delta} |\xi_n(s)|^2\, dw + \int_{|s|_q > 1/\delta} |\xi_n(s)|^2\, dw. \tag{8}
\]
For $z = (z_1, z_2, \dots, z_q) \in \mathbb{R}^q$, define the function $G(y) = \int_{|z|_q < y} \frac{1 - \cos z_1}{|z|_q^{1+q}}\, dz$. Clearly $G(y)$ is bounded by $c_q$ and $\lim_{y\to 0} G(y) = 0$. Applying the Cauchy-Schwarz inequality, we obtain
\[
|\xi_n(s)|^2 \le \frac{4}{n}\sum_{k=1}^n |\tilde U_k|^2 \cdot \frac{1}{n}\sum_{k=1}^n |\tilde V_k|^2. \tag{9}
\]
Hence the first summand in (8) satisfies
\[
\int_{|s|_q < \delta} |\xi_n(s)|^2\, dw \le \frac{4}{n}\sum_{k=1}^n \int_{|s|_q < \delta} \frac{|\tilde U_k|^2}{c_q |s|_q^{1+q}}\, ds \cdot \frac{1}{n}\sum_{k=1}^n |\tilde V_k|^2 \le \frac{4}{n}\sum_{k=1}^n 2E_U\{|U_k - U|_q\, G(|U_k - U|_q\, \delta)\} \cdot \frac{2}{n}\sum_{k=1}^n \big(V_k^2 + E(V^2)\big),
\]
where we used the inequalities $|a - b|^2 \le 2(a^2 + b^2)$ for $a, b \in \mathbb{R}$ and $(E(V))^2 \le E(V^2)$, as well as the fact that $\int_{|s|_q<\delta} \frac{|\tilde U_k|^2}{c_q |s|_q^{1+q}}\, ds \le 2E_U\{|U_k - U|_q\, G(|U_k - U|_q\, \delta)\}$, as presented in Székely et al. (2007). By the SLLN,
\[
\limsup_{n\to\infty} \int_{|s|_q<\delta} |\xi_n(s)|^2\, dw \le 8E\{|U_1 - U_2|_q\, G(|U_1 - U_2|_q\, \delta)\} \cdot 4E(V^2) \quad \text{a.s.}
\]
Therefore, by the Lebesgue dominated convergence theorem,
\[
\limsup_{\delta\to 0}\ \limsup_{n\to\infty} \int_{|s|_q<\delta} |\xi_n(s)|^2\, dw = 0 \quad \text{a.s.}
\]
Now, consider the second summand in (8). Using the fact that $|\tilde U_k|^2 \le 4$ and the inequality (9) again, we can derive that
\[
\int_{|s|_q > 1/\delta} |\xi_n(s)|^2\, dw \le 16 \int_{|s|_q > 1/\delta} \frac{1}{c_q |s|_q^{1+q}}\, ds \cdot \frac{1}{n}\sum_{k=1}^n |V_k - E(V)|^2 \le 16\, h(\delta)\, \frac{2}{n}\sum_{k=1}^n \{V_k^2 + E(V^2)\},
\]
where $h(\delta) = \int_{|s|_q > 1/\delta} \frac{1}{c_q |s|_q^{1+q}}\, ds$ goes to zero as $\delta \to 0$; compare page 2778 of Székely et al. (2007). Thus, almost surely $\limsup_{\delta\to 0} \limsup_{n\to\infty} \int_{|s|_q>1/\delta} |\xi_n(s)|^2\, dw = 0$, which implies that $\mathrm{MDD}_n(V|U) \to \mathrm{MDD}(V|U)$ a.s. The consistency of $\mathrm{MDC}_n(V|U)$ follows from the facts that $\mathrm{Var}_n(V) \to \mathrm{Var}(V)$ (SLLN) and $\mathrm{dVar}_n(U) \to \mathrm{dVar}(U)$ a.s. (Theorem 2 in Székely et al. (2007)). The proof is then complete.

Proof of Theorem 4: The argument is similar to that presented in the proofs of Theorem 5 and Corollary 2 of Székely et al. (2007).

(a) Define the process $\Gamma_n(s) = \sqrt{n}\,\xi_n(s) = \sqrt{n}\,\big(g^n_{U,V}(s) - g^n_U(s)\, g^n_V\big)$. After some straightforward calculation, we can derive that $E[\Gamma_n(s)] = 0$ and
\[
E[\Gamma_n(s)\overline{\Gamma_n(s_0)}] = \Big(\tfrac{n-1}{n}\Big)^2 F(s - s_0) + \tfrac{n-1}{n}\, g_U(s - s_0)\Big[\tfrac{1}{n}E(V^2) - (EV)^2\Big] + \tfrac{n-1}{n}\Big[(EV)^2 + \tfrac{n-2}{n}E(V^2)\Big] g_U(s)\overline{g_U(s_0)} - \Big(\tfrac{n-1}{n}\Big)^2 F(s)\overline{g_U(s_0)} - \Big(\tfrac{n-1}{n}\Big)^2 g_U(s)\overline{F(s_0)}.
\]
In particular,
\[
E|\Gamma_n(s)|^2 = \tfrac{n-1}{n}\, E(V^2)\Big(1 + \tfrac{n-2}{n}|g_U(s)|^2\Big) - \tfrac{n-1}{n}(EV)^2\big(1 - |g_U(s)|^2\big) - \Big(\tfrac{n-1}{n}\Big)^2\big[F(s)\overline{g_U(s)} + g_U(s)\overline{F(s)}\big].
\]
In the sequel, we construct a sequence of random variables $\{Q_n(\delta)\}$ such that

(i) $Q_n(\delta) \xrightarrow{D} Q(\delta)$ for each $\delta > 0$;
(ii) $\limsup_{n\to\infty} E|Q_n(\delta) - \|\Gamma_n\|^2| \to 0$ as $\delta \to 0$;
(iii) $E|Q(\delta) - \|\Gamma\|^2| \to 0$ as $\delta \to 0$.

Then the weak convergence of $\|\Gamma_n\|^2$ to $\|\Gamma\|^2$ follows from the converging-together theorem of Resnick (1999). Following the construction in Székely et al. (2007), we define
\[
Q_n(\delta) = \int_{D(\delta)} |\Gamma_n(s)|^2\, dw \quad\text{and}\quad Q(\delta) = \int_{D(\delta)} |\Gamma(s)|^2\, dw.
\]
Given $\epsilon = 1/p > 0$, $p \in \mathbb{N}$, choose a partition $\{D_k\}_{k=1}^N$ of $D(\delta)$ into $N = N(\epsilon)$ measurable sets with diameter at most $\epsilon$. Then $Q_n(\delta) = \sum_{k=1}^N \int_{D_k} |\Gamma_n(s)|^2\, dw$ and $Q(\delta) = \sum_{k=1}^N \int_{D_k} |\Gamma(s)|^2\, dw$. Define
\[
Q_n^p(\delta) = \sum_{k=1}^N \int_{D_k} |\Gamma_n(s_0(k))|^2\, dw \quad\text{and}\quad Q^p(\delta) = \sum_{k=1}^N \int_{D_k} |\Gamma(s_0(k))|^2\, dw,
\]
where $\{s_0(k)\}_{k=1}^N$ is a set of distinct points such that $s_0(k) \in D_k$. By the multivariate CLT and the continuous mapping theorem, $Q_n^p(\delta) \xrightarrow{D} Q^p(\delta)$ for any $p \in \mathbb{N}$. Then, in view of the converging-together theorem of Resnick (1999), (i) holds if we can show
\[
\limsup_{p\to\infty} E|Q^p(\delta) - Q(\delta)| = 0 \tag{10}
\]
and
\[
\limsup_{p\to\infty}\ \limsup_{n\to\infty} E|Q_n^p(\delta) - Q_n(\delta)| = 0. \tag{11}
\]
Let $\beta_n(\epsilon) = \sup_{s,s_0} E\big||\Gamma_n(s)|^2 - |\Gamma_n(s_0)|^2\big|$ and $\beta(\epsilon) = \sup_{s,s_0} E\big||\Gamma(s)|^2 - |\Gamma(s_0)|^2\big|$, where the supremum is taken over all $s$ and $s_0$ under the restrictions $\delta < |s|_q, |s_0|_q < 1/\delta$ and $|s - s_0|_q < \epsilon$. In view of the form of $\mathrm{Cov}_\Gamma(s, s_0)$ (defined after Theorem 3) and by applying the Cauchy-Schwarz inequality, we derive that
\[
\beta(\epsilon) = \sup_{s,s_0} E\big|(\Gamma(s) - \Gamma(s_0))\overline{\Gamma(s)} + \overline{\Gamma(s_0)}(\Gamma(s) - \Gamma(s_0))\big| \le \sup_{s,s_0} E^{1/2}|\Gamma(s) - \Gamma(s_0)|^2 \big(E^{1/2}|\Gamma(s)|^2 + E^{1/2}|\Gamma(s_0)|^2\big) \le C \sup_{s,s_0} E^{1/2}|\Gamma(s) - \Gamma(s_0)|^2 \le C \sup_{s,s_0} \big|\mathrm{Cov}_\Gamma(s,s) - \mathrm{Cov}_\Gamma(s,s_0) - \mathrm{Cov}_\Gamma(s_0,s) + \mathrm{Cov}_\Gamma(s_0,s_0)\big|^{1/2}.
\]
Since $g_U(s)$ and $F(s)$ are uniformly continuous in $s \in \mathbb{R}^q$, it can easily be shown that $\beta(\epsilon) \to 0$ as $\epsilon \to 0$. To show (10), we note that
\[
E|Q^p(\delta) - Q(\delta)| = E\Big|\int_{D(\delta)} |\Gamma(s)|^2\, dw - \sum_{k=1}^N \int_{D_k} |\Gamma(s_0(k))|^2\, dw\Big| = E\Big|\sum_{k=1}^N \int_{D_k} \big(|\Gamma(s)|^2 - |\Gamma(s_0(k))|^2\big)\, dw\Big| \le \beta(1/p) \int_{D(\delta)} \frac{1}{c_q |s|_q^{1+q}}\, ds \to 0, \quad\text{as } p \to \infty.
\]
Using exactly the same argument, we can show (11), and thus (i) holds. On the other hand,
\[
E\Big|\int_{D(\delta)} |\Gamma_n(s)|^2\, dw - \int_{\mathbb{R}^q} |\Gamma_n(s)|^2\, dw\Big| = E\int_{|s|_q < \delta} |\Gamma_n(s)|^2\, dw + E\int_{|s|_q > 1/\delta} |\Gamma_n(s)|^2\, dw.
\]
Following similar steps as in the proof of Theorem 3, we can derive that for any small $\epsilon$ there exist $\delta_0$ and $n_0$ such that when $n \ge n_0$ and $\delta \le \delta_0$, $E\int_{|s|_q<\delta} |\Gamma_n(s)|^2\, dw < \epsilon$ and $E\int_{|s|_q>1/\delta} |\Gamma_n(s)|^2\, dw < \epsilon$. Thus we complete our proof of (ii). A similar argument also applies to $Q(\delta)$, so (iii) holds. Therefore $n\,\mathrm{MDD}^2_n(V|U) = \|\Gamma_n\|^2 \xrightarrow{D} \|\Gamma\|^2$.

(b) According to the first assertion, under the assumption that $\mathrm{MDC}(V|U) = 0$, $n\,\mathrm{MDD}^2_n(V|U)$ converges in distribution to the quadratic form $\|\Gamma\|^2$. Note that
\[
E\|\Gamma\|^2 = \int_{\mathbb{R}^q} \mathrm{Cov}_\Gamma(s,s)\, dw = \int_{\mathbb{R}^q} \Big\{[E(V^2) - (EV)^2]\big(1 - |g_U(s)|^2\big) + 2E(V^2)|g_U(s)|^2 - F(s)\overline{g_U(s)} - g_U(s)\overline{F(s)}\Big\}\, dw.
\]
Under the assumption that $E(V^2|U) = E(V^2)$, $F(s) = E(V^2)\, g_U(s)$, which implies that $E\|\Gamma\|^2 = E|U - U'|_q\, [E(V^2) - (EV)^2]$. By the SLLN for V-statistics, $S_n \to E|U - U'|_q\, [E(V^2) - (EV)^2]$ a.s. Therefore $n\,\mathrm{MDD}^2_n(V|U)/S_n \xrightarrow{D} Q$, where $E[Q] = 1$ and $Q$ is a nonnegative quadratic form of centered Gaussian random variables, following the argument in the proof of Corollary 2 of Székely et al. (2007).
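The simplification in part (b) can be made explicit; this is a routine check, with the last step using the standard weight-function integral identity from Székely et al. (2007). When $F(s) = E(V^2)\, g_U(s)$,
\[
2E(V^2)|g_U(s)|^2 - F(s)\overline{g_U(s)} - g_U(s)\overline{F(s)} = 2E(V^2)|g_U(s)|^2 - 2E(V^2)|g_U(s)|^2 = 0,
\]
so the integrand reduces to $[E(V^2) - (EV)^2](1 - |g_U(s)|^2)$, and
\[
E\|\Gamma\|^2 = [E(V^2) - (EV)^2] \int_{\mathbb{R}^q} \frac{1 - |g_U(s)|^2}{c_q |s|_q^{1+q}}\, ds = [E(V^2) - (EV)^2]\, E|U - U'|_q.
\]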
(c) Suppose that $\mathrm{MDD}(V|U) > 0$. Then Theorem 3 implies that $\mathrm{MDD}^2_n(V|U) \to \mathrm{MDD}^2(V|U) > 0$ a.s., and therefore $n\,\mathrm{MDD}^2_n(V|U) \to \infty$ a.s. By the SLLN, $S_n$ converges to a constant, and therefore $n\,\mathrm{MDD}^2_n(V|U)/S_n \to \infty$.

Proof of Theorem 5: Our argument basically follows that in the proof of Theorem 1 of Li, Zhong and Zhu (2012), with a slight modification. For the sake of completeness, we present the details. In our proof, the positive constant $C$ is generic and its value may vary from place to place. We shall first show the uniform consistency of $\widehat\omega_j = (\mathrm{MDC}^j_n)^2$ under assumption (A1). Due to the similarity of its numerator and denominator, we only deal with its numerator, i.e., the uniform consistency of $(\mathrm{MDD}^j_n)^2$. Let $S^j_1 = E[Y Y' |X_j - X_j'|]$, $S^j_2 = E[Y Y']\, E|X_j - X_j'|$ and $S^j_3 = E[Y Y'' |X_j - X_j'|]$, where $(X_j', Y')$ and $(X_j'', Y'')$ are iid copies of $(X_j, Y)$. Correspondingly, denote their sample counterparts as
\[
S^j_{1n} = \frac{1}{n^2}\sum_{k,l=1}^n Y_k Y_l |X_{jk} - X_{jl}|, \qquad S^j_{2n} = \frac{1}{n^2}\sum_{k,l=1}^n Y_k Y_l \cdot \frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}|, \qquad S^j_{3n} = \frac{1}{n^3}\sum_{k,l,h=1}^n Y_k Y_h |X_{jk} - X_{jl}|.
\]
According to the proofs of Theorems 1 and 2, $\mathrm{MDD}^j$ and $\mathrm{MDD}^j_n$ can be expressed as $(\mathrm{MDD}^j)^2 = -S^j_1 - S^j_2 + 2S^j_3$ and $(\mathrm{MDD}^j_n)^2 = -S^j_{1n} - S^j_{2n} + 2S^j_{3n}$. We shall establish the consistency result for each part respectively.

Part I: Consistency of $S^j_{1n}$. Define the U-statistic $\widetilde S^j_{1n} = \{n(n-1)\}^{-1}\sum_{k\neq l} Y_k Y_l |X_{jk} - X_{jl}|$ with the kernel function $h_1(X_{jk}, Y_k; X_{jl}, Y_l) = Y_k Y_l |X_{jk} - X_{jl}|$. First, we shall show that the uniform consistency of $S^j_{1n}$ can be derived from that of $\widetilde S^j_{1n}$. By the Cauchy-Schwarz inequality,
\[
|S^j_1| = \big|E[Y Y' |X_j - X_j'|]\big| \le \big\{E[(Y Y')^2]\, E[|X_j - X_j'|^2]\big\}^{1/2} \le \big\{(E(Y^4))^{1/2} (E[(Y')^4])^{1/2}\, 4E(X_j^2)\big\}^{1/2} = 2\big(E(X_j^2)\, E(Y^4)\big)^{1/2}.
\]
Under assumption (A1), $\sup_p \max_{1\le j\le p} |S^j_1| < \infty$, i.e., $\{S^j_1\}_{j=1}^p$ are uniformly bounded. Thus, for any $\epsilon > 0$, there exists a sufficiently large $n$ such that $|S^j_1|/n \le \epsilon$ for any $j = 1, \dots, p$ (in the case $\epsilon = c n^{-\kappa}$, as will be specified later, this still holds). Then,
\[
P\big(|S^j_{1n} - S^j_1| \ge 2\epsilon\big) = P\Big(\Big|\tfrac{n-1}{n}\big(\widetilde S^j_{1n} - S^j_1\big) - \tfrac{1}{n} S^j_1\Big| \ge 2\epsilon\Big) \le P\big(|\widetilde S^j_{1n} - S^j_1| + |S^j_1|/n \ge 2\epsilon\big) \le P\big(|\widetilde S^j_{1n} - S^j_1| \ge \epsilon\big).
\]
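As a quick numerical sanity check on the decomposition $(\mathrm{MDD}^j_n)^2 = -S^j_{1n} - S^j_{2n} + 2S^j_{3n}$ (a sketch for scalar $X$, not part of the proof), the V-statistic can be computed both through the centered-distance form and through the three $S$-terms:

```python
import numpy as np

def mdd_sq(X, Y):
    """MDD_n^2(Y|X) via the centered-distance V-statistic:
    -(1/n^2) * sum_{k,l} (Y_k - Ybar)(Y_l - Ybar)|X_k - X_l|."""
    a = np.abs(X[:, None] - X[None, :])   # pairwise distances |X_k - X_l|
    yc = Y - Y.mean()
    return -(yc[:, None] * yc[None, :] * a).mean()

def mdd_sq_parts(X, Y):
    """The same quantity through -S1 - S2 + 2*S3."""
    a = np.abs(X[:, None] - X[None, :])
    s1 = (np.outer(Y, Y) * a).mean()
    s2 = np.outer(Y, Y).mean() * a.mean()
    # S3 = n^{-3} sum_{k,l,h} Y_k Y_h |X_k - X_l|
    #    = Ybar * n^{-2} sum_{k,l} Y_k |X_k - X_l|
    s3 = Y.mean() * (Y[:, None] * a).mean()
    return -s1 - s2 + 2 * s3

rng = np.random.default_rng(1)
X = rng.standard_normal(300)
Y = X**2 + 0.5 * rng.standard_normal(300)  # E(Y|X) depends on X
v1 = mdd_sq(X, Y)
v2 = mdd_sq_parts(X, Y)
```

The two forms agree up to floating-point error, and the statistic is nonnegative, consistent with its representation as $\int |\xi_n(s)|^2\, dw$ in the proof of Theorem 3.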
Next, we shall establish the uniform consistency of $\widetilde S^j_{1n}$ based on the theory of U-statistics. Write
\[
\widetilde S^j_{1n} = \{n(n-1)\}^{-1}\sum_{k\neq l} h_1 I\{|h_1| \le M\} + \{n(n-1)\}^{-1}\sum_{k\neq l} h_1 I\{|h_1| > M\} = \widetilde S^j_{1n,1} + \widetilde S^j_{1n,2}.
\]
Correspondingly, its population counterpart can be decomposed as $S^j_1 = E[h_1 I\{|h_1| \le M\}] + E[h_1 I\{|h_1| > M\}] = S^j_{1,1} + S^j_{1,2}$. Note that $\widetilde S^j_{1n,1}$ and $\widetilde S^j_{1n,2}$ are unbiased estimators of $S^j_{1,1}$ and $S^j_{1,2}$, respectively. To show the consistency of $\widetilde S^j_{1n,1}$, we note that any U-statistic can be expressed as an average of averages of iid random variables; see Serfling (1980, Section 5.1.6). Denote $m = \lfloor n/2 \rfloor$ and define
\[
\Omega(X_{j1}, Y_1; \dots; X_{jn}, Y_n) = \frac{1}{m}\sum_{r=0}^{m-1} h_1^{(r)} I\{|h_1^{(r)}| \le M\},
\]
where $h_1^{(r)} = h_1(X_{j,1+2r}, Y_{1+2r}; X_{j,2+2r}, Y_{2+2r})$. Then we have $\widetilde S^j_{1n,1} = (n!)^{-1}\sum_{n!} \Omega(X_{ji_1}, Y_{i_1}; \dots; X_{ji_n}, Y_{i_n})$, where $\sum_{n!}$ denotes summation over all $n!$ permutations $(i_1, \dots, i_n)$ of $(1, \dots, n)$. By Jensen's inequality, for $t > 0$,
\[
E[\exp(t \widetilde S^j_{1n,1})] = E\Big[\exp\Big\{t (n!)^{-1}\sum_{n!} \Omega(X_{ji_1}, Y_{i_1}; \dots; X_{ji_n}, Y_{i_n})\Big\}\Big] \le (n!)^{-1}\sum_{n!} \prod_{r=0}^{m-1} E\big[\exp\big(t h_1^{(r)} I\{|h_1^{(r)}| \le M\}/m\big)\big] = E^m\big[\exp\big(t h_1 I\{|h_1| \le M\}/m\big)\big],
\]
which entails that
\[
P(\widetilde S^j_{1n,1} - S^j_{1,1} \ge \epsilon) \le \exp(-t\epsilon)\exp(-t S^j_{1,1})\, E[\exp(t \widetilde S^j_{1n,1})] \le \exp(-t\epsilon)\, E^m\big\{\exp\big[t\big(h_1 I\{|h_1| \le M\} - S^j_{1,1}\big)/m\big]\big\} \le \exp(-t\epsilon)\exp\{t^2 M^2/(2m)\},
\]
where we have applied Markov's inequality and Hoeffding's inequality (see Lemma 1 of Li, Zhong and Zhu (2012)) in the first and third inequalities above, respectively. Choosing $t = \epsilon m/M^2$ and utilizing the symmetry of the U-statistic, we obtain $P(|\widetilde S^j_{1n,1} - S^j_{1,1}| \ge \epsilon) \le 2\exp\{-\epsilon^2 m/(2M^2)\}$.

Next, we turn to the other part, $\widetilde S^j_{1n,2}$. With the Cauchy-Schwarz inequality and Markov's inequality,
\[
(S^j_{1,2})^2 = \big(E[h_1 I\{|h_1| > M\}]\big)^2 \le E[h_1^2]\, P\{|h_1| > M\} \le E[h_1^2]\, E[|h_1|^q]\, M^{-q}
\]
for any $q \in \mathbb{N}$. By applying the inequality $|ab| \le (a^2 + b^2)/2$, $a, b \in \mathbb{R}$, twice, we get $|h_1(X_{jk}, Y_k; X_{jl}, Y_l)| \le Y_k^2 Y_l^2/2 + \tfrac{1}{2}|X_{jk} - X_{jl}|^2 \le Y_k^4 + Y_l^4 + X_{jk}^2 + X_{jl}^2$, which yields $E[|h_1|^q] \le 4^{q-1} E\big[Y_k^{4q} + Y_l^{4q} + X_{jk}^{2q} + X_{jl}^{2q}\big] < \infty$ by the $C_r$ inequality and assumption (A1). Thus, if we choose $M = n^\gamma$ for $0 < \gamma < 1/2 - \kappa$, then $|S^j_{1,2}| \le \epsilon/2$ for sufficiently large $n$ (in the case $\epsilon = c n^{-\kappa}$, as will be specified later, $q$ can be any integer greater than $2\kappa/\gamma$). Hence, $P(|\widetilde S^j_{1n,2} - S^j_{1,2}| \ge \epsilon) \le P(|\widetilde S^j_{1n,2}| \ge \epsilon/2)$. Since the event $\{|\widetilde S^j_{1n,2}| \ge \epsilon/2\}$ implies the event $\{Y_k^4 + X_{jk}^2 \ge M/2 \text{ for some } 1 \le k \le n\}$, we have that
\[
P\{|\widetilde S^j_{1n,2}| \ge \epsilon/2\} \le P\Big(\bigcup_{k=1}^n \{Y_k^4 + X_{jk}^2 \ge M/2\}\Big) \le \sum_{k=1}^n P\big(\{Y_k^4 + X_{jk}^2 \ge M/2\}\big) = n P\big(\{Y_k^4 + X_{jk}^2 \ge M/2\}\big),
\]
where we have applied Bonferroni's inequality in the second inequality above. Invoking assumption (A1) and Markov's inequality, there must exist a constant $C$ such that $P(Y_k^4 + X_{jk}^2 \ge M/2) \le P(Y_k^2 \ge \sqrt{M}/2) + P(X_{jk}^2 \ge M/4) \le C\exp(-s\sqrt{M}/2)$ for any $j$, $k$, and $s \in (0, 2s_0]$. Consequently, for sufficiently large $n$,
\[
\max_{1\le j\le p} P(|\widetilde S^j_{1n,2} - S^j_{1,2}| \ge \epsilon) \le \max_{1\le j\le p} P(|\widetilde S^j_{1n,2}| \ge \epsilon/2) \le \max_{1\le j\le p} n P(Y_k^4 + X_{jk}^2 \ge M/2) \le C n \exp(-s\sqrt{M}/2).
\]
In combination with the convergence result for $\widetilde S^j_{1n,1}$, we get that for large enough $n$,
\[
P(|S^j_{1n} - S^j_1| \ge 4\epsilon) \le P(|\widetilde S^j_{1n} - S^j_1| \ge 2\epsilon) \le P(|\widetilde S^j_{1n,1} - S^j_{1,1}| \ge \epsilon) + P(|\widetilde S^j_{1n,2} - S^j_{1,2}| \ge \epsilon) \le 2\exp(-\epsilon^2 n^{1-2\gamma}/4) + C n \exp(-s n^{\gamma/2}/2).
\]

Part II: Consistency of $S^j_{2n}$. Write $S^j_{2n} = S^j_{2n,1}\, S^j_{2n,2}$, where $S^j_{2n,1} = n^{-2}\sum_{k,l=1}^n |X_{jk} - X_{jl}|$ and $S^j_{2n,2} = n^{-2}\sum_{k,l=1}^n Y_k Y_l$. Similarly, write its population counterpart as $S^j_2 = S^j_{2,1}\, S^j_{2,2}$, where $S^j_{2,1} = E|X_j - X_j'|$ and $S^j_{2,2} = E(Y Y')$. Following arguments similar to those in Part I, we can show that
\[
P(|S^j_{2n,1} - S^j_{2,1}| \ge 4\epsilon) \le 2\exp(-\epsilon^2 n^{1-2\gamma}/4) + C n \exp(-s n^{2\gamma}/4),
\]
\[
P(|S^j_{2n,2} - S^j_{2,2}| \ge 4\epsilon) \le 2\exp(-\epsilon^2 n^{1-2\gamma}/4) + C n \exp(-s n^{\gamma}).
\]
Assumption (A1) ensures that $S^j_{2,1} = E|X_j - X_j'| \le (E|X_j - X_j'|^2)^{1/2} \le [4E(|X_j|^2)]^{1/2}$ and $|S^j_{2,2}| = |E(Y Y')| \le \tfrac{1}{2}E(Y^2 + Y'^2) = E(Y^2)$ are both uniformly bounded. Let $C$ be a sufficiently large constant which satisfies
\[
C > \max\Big(\{S^j_{2,1}\}_{j=1}^p,\ \{|S^j_{2,2}|\}_{j=1}^p,\ \{E[\exp(sX_j^2)]\}_{j=1}^p,\ E[\exp(sY^2)],\ 1\Big) \quad\text{for } s \in (0, 2s_0].
\]
Note that
\[
S^j_{2n} - S^j_2 = S^j_{2n,1} S^j_{2n,2} - S^j_{2,1} S^j_{2,2} = (S^j_{2n,1} - S^j_{2,1})(S^j_{2n,2} - S^j_{2,2}) + S^j_{2,1}(S^j_{2n,2} - S^j_{2,2}) + S^j_{2,2}(S^j_{2n,1} - S^j_{2,1}).
\]
Therefore, by utilizing the above inequalities repeatedly, we can show that
\[
P\big(|(S^j_{2n,1} - S^j_{2,1})(S^j_{2n,2} - S^j_{2,2})| \ge \epsilon\big) \le P\big(|S^j_{2n,1} - S^j_{2,1}| \ge \sqrt{\epsilon}\big) + P\big(|S^j_{2n,2} - S^j_{2,2}| \ge \sqrt{\epsilon}\big) \le 4\exp(-\epsilon n^{1-2\gamma}/64) + 2Cn\exp(-s n^{\gamma}),
\]
\[
P\big(|S^j_{2,1}(S^j_{2n,2} - S^j_{2,2})| \ge \epsilon\big) \le P\big(|S^j_{2n,2} - S^j_{2,2}| \ge \epsilon/C\big) \le 2\exp\big(-\epsilon^2 n^{1-2\gamma}/(64C^2)\big) + Cn\exp(-s n^{\gamma}),
\]
and
\[
P\big(|S^j_{2,2}(S^j_{2n,1} - S^j_{2,1})| \ge \epsilon\big) \le P\big(|S^j_{2n,1} - S^j_{2,1}| \ge \epsilon/C\big) \le 2\exp\big(-\epsilon^2 n^{1-2\gamma}/(64C^2)\big) + Cn\exp(-s n^{2\gamma}/4).
\]
It follows from Bonferroni's inequality that
\[
P(|S^j_{2n} - S^j_2| \ge 3\epsilon) \le P\big(|(S^j_{2n,1} - S^j_{2,1})(S^j_{2n,2} - S^j_{2,2})| \ge \epsilon\big) + P\big(|S^j_{2,1}(S^j_{2n,2} - S^j_{2,2})| \ge \epsilon\big) + P\big(|S^j_{2,2}(S^j_{2n,1} - S^j_{2,1})| \ge \epsilon\big) \le 8\exp\big(-\epsilon^2 n^{1-2\gamma}/(64C^2)\big) + 4Cn\exp(-s n^{\gamma}).
\]

Part III: Consistency of $S^j_{3n}$. Define the corresponding U-statistic
\[
\widetilde S^j_{3n} = \{n(n-1)(n-2)\}^{-1}\sum_{k<l<h}\big[Y_k Y_l |X_{jk} - X_{jh}| + Y_k Y_h |X_{jk} - X_{jl}| + Y_l Y_k |X_{jl} - X_{jh}| + Y_l Y_h |X_{jl} - X_{jk}| + Y_h Y_k |X_{jh} - X_{jl}| + Y_h Y_l |X_{jh} - X_{jk}|\big] = 6\{n(n-1)(n-2)\}^{-1}\sum_{k<l<h} h_3(X_{jk}, Y_k; X_{jl}, Y_l; X_{jh}, Y_h),
\]
where $h_3(X_{jk}, Y_k; X_{jl}, Y_l; X_{jh}, Y_h)$ is the kernel function. Following the same argument used to deal with $\widetilde S^j_{1n}$, we write $\widetilde S^j_{3n}$ as
\[
\widetilde S^j_{3n} = 6\{n(n-1)(n-2)\}^{-1}\sum_{k<l<h} h_3 I(|h_3| \le M) + 6\{n(n-1)(n-2)\}^{-1}\sum_{k<l<h} h_3 I(|h_3| > M) = \widetilde S^j_{3n,1} + \widetilde S^j_{3n,2},
\]
and its population counterpart as $S^j_3 = E[h_3 I\{|h_3| \le M\}] + E[h_3 I\{|h_3| > M\}] = S^j_{3,1} + S^j_{3,2}$. By the same argument used for $\widetilde S^j_{1n,1}$, we can show that $P(|\widetilde S^j_{3n,1} - S^j_{3,1}| \ge \epsilon) \le 2\exp\{-\epsilon^2 m'/(2M^2)\}$, where $m' = \lfloor n/3 \rfloor$ owing to the fact that $\widetilde S^j_{3n}$ is a third-order U-statistic. Now it remains to establish the uniform convergence of the other part, $\widetilde S^j_{3n,2}$. Note that $|h_3(X_{jk}, Y_k; X_{jl}, Y_l; X_{jh}, Y_h)| \le Y_k^4 + Y_l^4 + Y_h^4 + X_{jk}^2 + X_{jh}^2 + X_{jl}^2$, so the event $\{|\widetilde S^j_{3n,2}| \ge \epsilon/2\}$ implies the event $\{Y_k^4 + X_{jk}^2 > M/3 \text{ for some } 1 \le k \le n\}$. Therefore, following a similar argument as presented in Part I, we have
\[
P(|\widetilde S^j_{3n,2} - S^j_{3,2}| \ge \epsilon) \le P(|\widetilde S^j_{3n,2}| \ge \epsilon/2) \le P\Big(\bigcup_{k=1}^n \{Y_k^4 + X_{jk}^2 \ge M/3\}\Big) \le Cn\exp(-s\sqrt{M}/\sqrt{6})
\]
for any $j$, $k$ and $s \in (0, 2s_0]$. Combining the two convergence results for $\widetilde S^j_{3n,1}$ and $\widetilde S^j_{3n,2}$ with $M = n^\gamma$ for some $0 < \gamma < 1/2 - \kappa$, it follows that
\[
P(|\widetilde S^j_{3n} - S^j_3| \ge 2\epsilon) \le 2\exp(-\epsilon^2 n^{1-2\gamma}/6) + Cn\exp(-s n^{\gamma/2}/\sqrt{6}).
\]
Note that
\[
S^j_{3n} - S^j_3 = \frac{(n-1)(n-2)}{n^2}\big(\widetilde S^j_{3n} - S^j_3\big) - \frac{3n-2}{n^2}\, S^j_3 + \frac{n-1}{n^2}\big(\widetilde S^j_{1n} - S^j_1\big) + \frac{n-1}{n^2}\, S^j_1.
\]
Following a similar argument to that for $S^j_1$, we can show that $S^j_3$ is also uniformly bounded in $j$. Therefore, for sufficiently large $n$, $(3n-2)|S^j_3|/n^2$ and $(n-1)|S^j_1|/n^2$ are both smaller than $\epsilon$ (in the case $\epsilon = cn^{-\kappa}$, this also holds). Then,
\[
P(|S^j_{3n} - S^j_3| \ge 4\epsilon) \le P(|\widetilde S^j_{3n} - S^j_3| \ge \epsilon) + P(|\widetilde S^j_{1n} - S^j_1| \ge \epsilon) \le 4\exp(-\epsilon^2 n^{1-2\gamma}/24) + 2Cn\exp(-s n^{\gamma/2}/\sqrt{6}).
\]
This, together with the consistency results in Parts I and II, yields that
\[
P\big\{|(2S^j_{3n} - S^j_{1n} - S^j_{2n}) - (2S^j_3 - S^j_1 - S^j_2)| \ge \epsilon\big\} \le P\Big(|S^j_{3n} - S^j_3| \ge \tfrac{\epsilon}{4}\Big) + P\Big(|S^j_{2n} - S^j_2| \ge \tfrac{\epsilon}{4}\Big) + P\Big(|S^j_{1n} - S^j_1| \ge \tfrac{\epsilon}{4}\Big) = O\big\{\exp(-c_1 \epsilon^2 n^{1-2\gamma}) + n\exp(-c_2 n^{\gamma/2})\big\}
\]
for some positive constants $c_1$ and $c_2$, and the bound is uniform with respect to $j = 1, \dots, p$. Analyzing the denominator of $\widehat\omega_j$ would generate the same form of convergence rate, so we omit the details here. Let $\epsilon = cn^{-\kappa}$, where $\kappa$ satisfies $0 < \kappa + \gamma < 1/2$. We then have
\[
P\big\{\max_{1\le j\le p} |\widehat\omega_j - \omega_j| \ge cn^{-\kappa}\big\} \le p \max_{1\le j\le p} P\big\{|\widehat\omega_j - \omega_j| \ge cn^{-\kappa}\big\} \le O\big(p[\exp\{-c_1 n^{1-2(\kappa+\gamma)}\} + n\exp(-c_2 n^{\gamma/2})]\big),
\]
which finishes our proof of the first part of the theorem. If $\mathcal{D} \not\subseteq \widehat{\mathcal{D}}$, then there exists some $j \in \mathcal{D}$ such that $\widehat\omega_j < cn^{-\kappa}$. According to assumption (A2), this particular $j$ would make $|\widehat\omega_j - \omega_j| \ge cn^{-\kappa}$, which implies that
\[
A := \{\mathcal{D} \not\subseteq \widehat{\mathcal{D}}\} \subseteq \{|\widehat\omega_j - \omega_j| \ge cn^{-\kappa} \text{ for some } j \in \mathcal{D}\} =: B
\]
and hence $B^c \subseteq A^c$. Therefore,
\[
P(A^c) \ge P(B^c) = 1 - P(B) = 1 - P\big(|\widehat\omega_j - \omega_j| \ge cn^{-\kappa} \text{ for some } j \in \mathcal{D}\big) \ge 1 - s_n \max_{j\in\mathcal{D}} P\big(|\widehat\omega_j - \omega_j| \ge cn^{-\kappa}\big) \ge 1 - O\big(s_n[\exp\{-c_1 n^{1-2(\kappa+\gamma)}\} + n\exp(-c_2 n^{\gamma/2})]\big),
\]
where the first inequality above is due to Bonferroni's inequality. The proof is thus complete.

Proof of Theorem 6: We shall show the uniform consistency of $\widehat\omega_j(\widehat W) := \mathrm{MDC}^j_n(\widehat W)^2$ under assumptions (B1) and (B2). Due to the similarity of its numerator and denominator, we only deal with the numerator part, i.e., the consistency of $\mathrm{MDD}^j_n(\widehat W)^2$. First we demonstrate the consistency of $\mathrm{MDD}^j_n(W)^2$, and then study the difference between $\mathrm{MDD}^j_n(W)^2$ and $\mathrm{MDD}^j_n(\widehat W)^2$. Since $W$ and $\widehat W$ are uniformly bounded, we can adopt the argument in the proof of Theorem 1 of Li, Zhong and Zhu (2012) (see also the proof of Theorem 5 for a slightly modified argument, where the bound can be slightly improved under the assumption that the response variable is bounded) and get that for any $\gamma \in (0, 1/2 - \kappa)$ there exist positive constants $c_1$ and $c_2$ such that for a sufficiently small $\epsilon$ (say $\epsilon = cn^{-\kappa}$, as will be specified later),
\[
P\big(|\mathrm{MDD}^j_n(W)^2 - \mathrm{MDD}^j(W)^2| \ge \epsilon\big) \le C[\exp\{-c_1 \epsilon^2 n^{1-2\gamma}\} + n\exp(-c_2 n^{\gamma})]. \tag{12}
\]
Next we analyze the difference between $\mathrm{MDD}^j_n(W)^2$ and $\mathrm{MDD}^j_n(\widehat W)^2$. Denote
\[
\widehat T^j_{1n} = \frac{1}{n^2}\sum_{k,l=1}^n \widehat W_k \widehat W_l |X_{jk} - X_{jl}|, \qquad \widehat T^j_{2n} = \frac{1}{n^2}\sum_{k,l=1}^n \widehat W_k \widehat W_l \cdot \frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}|, \qquad \widehat T^j_{3n} = \frac{1}{n^3}\sum_{k,l,h=1}^n \widehat W_k \widehat W_h |X_{jk} - X_{jl}|.
\]
Similarly, $T^j_{1n}$, $T^j_{2n}$ and $T^j_{3n}$ are defined with $\{\widehat W_k\}_{k=1}^n$ replaced by $\{W_k\}_{k=1}^n$. Let $C_0 = \tau + 1$. By using the triangle inequality and the boundedness of $W_k$ and $\widehat W_k$, we can derive that
\[
|\mathrm{MDD}^j_n(\widehat W)^2 - \mathrm{MDD}^j_n(W)^2| \le |\widehat T^j_{1n} - T^j_{1n}| + |\widehat T^j_{2n} - T^j_{2n}| + 2|\widehat T^j_{3n} - T^j_{3n}|
\]
\[
= \Big|\frac{1}{n^2}\sum_{k,l=1}^n [\widehat W_k\widehat W_l - W_k W_l]\,|X_{jk} - X_{jl}|\Big| + \Big|\frac{1}{n^2}\sum_{k,l=1}^n [\widehat W_k\widehat W_l - W_k W_l]\Big| \cdot \frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}| + 2\Big|\frac{1}{n^3}\sum_{k,l,h=1}^n [\widehat W_k\widehat W_h - W_k W_h]\,|X_{jk} - X_{jl}|\Big|
\]
\[
\le \frac{1}{n^2}\sum_{k,l=1}^n \big|\widehat W_k(\widehat W_l - W_l) + W_l(\widehat W_k - W_k)\big|\,|X_{jk} - X_{jl}| + \frac{1}{n^2}\sum_{k,l=1}^n \big|\widehat W_k(\widehat W_l - W_l) + W_l(\widehat W_k - W_k)\big| \cdot \frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}| + \frac{2}{n^3}\sum_{k,l,h=1}^n \big|\widehat W_k(\widehat W_h - W_h) + W_h(\widehat W_k - W_k)\big|\,|X_{jk} - X_{jl}|
\]
\[
\le \frac{4C_0}{n^2}\sum_{k,l=1}^n |\widehat W_l - W_l|\,|X_{jk} - X_{jl}| + \frac{4C_0}{n}\sum_{k=1}^n |\widehat W_k - W_k| \cdot \frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}| =: \Delta_1 + \Delta_2.
\]
We first treat $\Delta_1$. By the Cauchy-Schwarz inequality, we have
\[
(\Delta_1)^2 \le 16C_0^2\, \frac{1}{n}\sum_{l=1}^n |\widehat W_l - W_l|^2 \cdot \frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}|^2 = 16C_0^2\, \frac{1}{n}\sum_{l=1}^n |\widehat W_l - W_l|^2 \Big[\frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}|^2 - E|X_{j1} - X_{j2}|^2\Big] + 16C_0^2\, \frac{1}{n}\sum_{l=1}^n |\widehat W_l - W_l|^2\, E|X_{j1} - X_{j2}|^2 =: D_1 + D_2.
\]
Noting that $n^{-1}\sum_{l=1}^n |\widehat W_l - W_l|^2 \le 4C_0^2$, we then have
\[
P(|D_1| \ge \epsilon^2/2) \le P\Big(\Big|\frac{1}{n^2}\sum_{k,l=1}^n |X_{jk} - X_{jl}|^2 - E|X_{j1} - X_{j2}|^2\Big| \ge \epsilon^2/(128 C_0^4)\Big) \le C\big(\exp(-c_1 \epsilon^4 n^{1-2\gamma}) + n\exp(-c_2 n^{2\gamma})\big)
\]
for some positive constants $c_1$ and $c_2$, based on equation (B.7) in Li, Zhong and Zhu (2012).
Under assumption (B2), there exists a positive constant $C_2 < \infty$ such that $E|X_{j1} - X_{j2}|^2 \le 4E|X_j|^2 < C_2$. Then, by Proposition 2,
\[
P(D_2 \ge \epsilon^2/2) \le P\Big(\frac{1}{n}\sum_{l=1}^n |\widehat W_l - W_l|^2 \ge \epsilon^2/(32 C_0^2 C_2)\Big) \le C\exp(-n c_3 \epsilon^4)
\]
for small enough $\epsilon$ and some $c_3 > 0$. Combining the probability bounds derived for $D_1$ and $D_2$,
\[
P(\Delta_1 \ge \epsilon) \le C\big(\exp(-c_1 \epsilon^4 n^{1-2\gamma}) + n\exp(-c_2 n^{2\gamma}) + \exp(-c_3 n\epsilon^4)\big) \le C\big(\exp(-c_1 \epsilon^4 n^{1-2\gamma}) + n\exp(-c_2 n^{2\gamma})\big),
\]
where the third term on the right-hand side can be absorbed into the first term. In a similar fashion, we can derive that $P(\Delta_2 \ge \epsilon) \le C\big(\exp(-c_1 \epsilon^2 n^{1-2\gamma}) + n\exp(-c_2 n^{2\gamma})\big)$ for some positive constants $c_1$, $c_2$. Consequently, in view of (12), we have that
\[
P\big(|\mathrm{MDD}^j_n(\widehat W)^2 - \mathrm{MDD}^j(W)^2| \ge 3\epsilon\big) \le P\big(|\mathrm{MDD}^j_n(W)^2 - \mathrm{MDD}^j(W)^2| \ge \epsilon\big) + P(\Delta_1 \ge \epsilon) + P(\Delta_2 \ge \epsilon) \le C\big(\exp(-c_1 \epsilon^4 n^{1-2\gamma}) + n\exp(-c_2 n^{\gamma})\big)
\]
for a sufficiently small $\epsilon$ and some positive constants $c_1$, $c_2$. The analysis of the denominator of $\mathrm{MDC}^j_n(\widehat W)^2$ would generate a similar form of convergence rate. Therefore, if we set $\epsilon = cn^{-\kappa}$, where $\kappa$ satisfies $0 < 2\kappa + \gamma < 1/2$, we would have
\[
P\big\{\max_{1\le j\le p} |\widehat\omega_j(\widehat W) - \omega_j(W)| \ge cn^{-\kappa}\big\} \le p\max_{1\le j\le p} P\big\{|\widehat\omega_j(\widehat W) - \omega_j(W)| \ge cn^{-\kappa}\big\} \le C\big(p[\exp(-c_1 n^{1-2(\gamma+2\kappa)}) + n\exp(-c_2 n^{\gamma})]\big),
\]
which proves the first assertion. The second assertion follows from the same argument used in proving the second statement of Theorem 5. The proof is complete.
References

Fan, J., Feng, Y., and Song, R. (2011), "Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models," Journal of the American Statistical Association, 106.

Fan, J., and Lv, J. (2008), "Sure Independence Screening for Ultra-High Dimensional Feature Space" (with discussions and rejoinder), Journal of the Royal Statistical Society, Series B, 70.

He, X., Wang, L., and Hong, H. G. (2013), "Quantile-Adaptive Model-Free Variable Screening for High-Dimensional Heterogeneous Data," Annals of Statistics, 41.

Li, R., Zhong, W., and Zhu, L. (2012), "Feature Screening via Distance Correlation Learning," Journal of the American Statistical Association, 107.

Meier, L., van de Geer, S., and Bühlmann, P. (2009), "High-Dimensional Additive Modeling," Annals of Statistics, 37.

Resnick, S. I. (1999), A Probability Path, Boston: Birkhäuser.

Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, New York: John Wiley & Sons.

Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007), "Measuring and Testing Dependence by Correlation of Distances," Annals of Statistics, 35.

Székely, G. J., and Rizzo, M. L. (2009), "Brownian Distance Covariance," The Annals of Applied Statistics, 3.

Zhu, L., Li, L., Li, R., and Zhu, L. (2011), "Model-Free Feature Screening for Ultrahigh-Dimensional Data," Journal of the American Statistical Association, 106.
More informationPreliminary Exam: Probability 9:00am 2:00pm, Friday, January 6, 2012
Preliminary Exam: Probability 9:00am 2:00pm, Friday, January 6, 202 The exam lasts from 9:00am until 2:00pm, with a walking break every hour. Your goal on this exam should be to demonstrate mastery of
More informationAdditive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535
Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June
More informationarxiv: v1 [stat.me] 11 May 2016
Submitted to the Annals of Statistics INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION arxiv:1605.03315v1 [stat.me] 11 May 2016 By Yinfei Kong, Daoji Li, Yingying
More informationarxiv: v5 [math.na] 16 Nov 2017
RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem
More informationSurvival impact index and ultrahigh-dimensional model-free screening with survival outcomes
Survival impact index and ultrahigh-dimensional model-free screening with survival outcomes Jialiang Li, National University of Singapore Qi Zheng, University of Louisville Limin Peng, Emory University
More informationERRATA: Probabilistic Techniques in Analysis
ERRATA: Probabilistic Techniques in Analysis ERRATA 1 Updated April 25, 26 Page 3, line 13. A 1,..., A n are independent if P(A i1 A ij ) = P(A 1 ) P(A ij ) for every subset {i 1,..., i j } of {1,...,
More informationPartial martingale difference correlation
Electronic Journal of Statistics Vol. 9 (2015) 1492 1517 ISSN: 1935-7524 DOI: 10.1214/15-EJS1047 Partial martingale difference correlation Trevor Park, Xiaofeng Shao and Shun Yao University of Illinois
More informationWiener Measure and Brownian Motion
Chapter 16 Wiener Measure and Brownian Motion Diffusion of particles is a product of their apparently random motion. The density u(t, x) of diffusing particles satisfies the diffusion equation (16.1) u
More informationBrownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory
More informationCHAPTER 3: LARGE SAMPLE THEORY
CHAPTER 3 LARGE SAMPLE THEORY 1 CHAPTER 3: LARGE SAMPLE THEORY CHAPTER 3 LARGE SAMPLE THEORY 2 Introduction CHAPTER 3 LARGE SAMPLE THEORY 3 Why large sample theory studying small sample property is usually
More informationn E(X t T n = lim X s Tn = X s
Stochastic Calculus Example sheet - Lent 15 Michael Tehranchi Problem 1. Let X be a local martingale. Prove that X is a uniformly integrable martingale if and only X is of class D. Solution 1. If If direction:
More informationApproximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work
Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto
More informationGeneralizing Distance Covariance to Measure. and Test Multivariate Mutual Dependence
Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence arxiv:1709.02532v5 [math.st] 25 Feb 2018 Ze Jin, David S. Matteson February 27, 2018 Abstract We propose three new measures
More informationCONDITIONAL MEAN AND QUANTILE DEPENDENCE TESTING IN HIGH DIMENSION
Submitted to the Annals of Statistics CONDITIONAL MEAN AND QUANTILE DEPENDENCE TESTING IN HIGH DIMENSION By Xianyang Zhang, Shun Yao, and Xiaofeng Shao Texas A&M University and University of Illinois at
More informationFeature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates
Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates Jingyuan Liu, Runze Li and Rongling Wu Abstract This paper is concerned with feature screening and variable selection
More information4 Sums of Independent Random Variables
4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables
More informationDoléans measures. Appendix C. C.1 Introduction
Appendix C Doléans measures C.1 Introduction Once again all random processes will live on a fixed probability space (Ω, F, P equipped with a filtration {F t : 0 t 1}. We should probably assume the filtration
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationStatistica Sinica Preprint No: SS
Statistica Sinica Preprint No: SS-2018-0176 Title A Lack-Of-Fit Test with Screening in Sufficient Dimension Reduction Manuscript ID SS-2018-0176 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202018.0176
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationSubmitted to the Brazilian Journal of Probability and Statistics
Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a
More informationMultivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013
Multivariate Gaussian Distribution Auxiliary notes for Time Series Analysis SF2943 Spring 203 Timo Koski Department of Mathematics KTH Royal Institute of Technology, Stockholm 2 Chapter Gaussian Vectors.
More informationSTOCHASTIC GEOMETRY BIOIMAGING
CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING 2018 www.csgb.dk RESEARCH REPORT Anders Rønn-Nielsen and Eva B. Vedel Jensen Central limit theorem for mean and variogram estimators in Lévy based
More informationAdditional proofs and details
Statistica Sinica: Supplement NONPARAMETRIC MODEL CHECKS OF SINGLE-INDEX ASSUMPTIONS Samuel Maistre 1,2,3 and Valentin Patilea 1 1 CREST (Ensai) 2 Université de Lyon 3 Université de Strasbourg Supplementary
More informationItô s formula. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Itô s formula Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Itô s formula Probability Theory
More information2 Two-Point Boundary Value Problems
2 Two-Point Boundary Value Problems Another fundamental equation, in addition to the heat eq. and the wave eq., is Poisson s equation: n j=1 2 u x 2 j The unknown is the function u = u(x 1, x 2,..., x
More information4 Uniform convergence
4 Uniform convergence In the last few sections we have seen several functions which have been defined via series or integrals. We now want to develop tools that will allow us to show that these functions
More informationAn Introduction to Signal Detection and Estimation - Second Edition Chapter III: Selected Solutions
An Introduction to Signal Detection and Estimation - Second Edition Chapter III: Selected Solutions H. V. Poor Princeton University March 17, 5 Exercise 1: Let {h k,l } denote the impulse response of a
More informationP (A G) dp G P (A G)
First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume
More informationLecture 16: Sample quantiles and their asymptotic properties
Lecture 16: Sample quantiles and their asymptotic properties Estimation of quantiles (percentiles Suppose that X 1,...,X n are i.i.d. random variables from an unknown nonparametric F For p (0,1, G 1 (p
More informationOn the martingales obtained by an extension due to Saisho, Tanemura and Yor of Pitman s theorem
On the martingales obtained by an extension due to Saisho, Tanemura and Yor of Pitman s theorem Koichiro TAKAOKA Dept of Applied Physics, Tokyo Institute of Technology Abstract M Yor constructed a family
More informationOnline Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω)
Online Appendix Proof of Lemma A.. he proof uses similar arguments as in Dunsmuir 979), but allowing for weak identification and selecting a subset of frequencies using W ω). It consists of two steps.
More informationDistance between multinomial and multivariate normal models
Chapter 9 Distance between multinomial and multivariate normal models SECTION 1 introduces Andrew Carter s recursive procedure for bounding the Le Cam distance between a multinomialmodeland its approximating
More informationGaussian vectors and central limit theorem
Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables
More informationDETERMINATION OF THE BLOW-UP RATE FOR THE SEMILINEAR WAVE EQUATION
DETERMINATION OF THE LOW-UP RATE FOR THE SEMILINEAR WAVE EQUATION y FRANK MERLE and HATEM ZAAG Abstract. In this paper, we find the optimal blow-up rate for the semilinear wave equation with a power nonlinearity.
More informationStochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions
International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015
More informationConcentration behavior of the penalized least squares estimator
Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar
More informationRegularity of the density for the stochastic heat equation
Regularity of the density for the stochastic heat equation Carl Mueller 1 Department of Mathematics University of Rochester Rochester, NY 15627 USA email: cmlr@math.rochester.edu David Nualart 2 Department
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationInternational Journal of Pure and Applied Mathematics Volume 21 No , THE VARIANCE OF SAMPLE VARIANCE FROM A FINITE POPULATION
International Journal of Pure and Applied Mathematics Volume 21 No. 3 2005, 387-394 THE VARIANCE OF SAMPLE VARIANCE FROM A FINITE POPULATION Eungchun Cho 1, Moon Jung Cho 2, John Eltinge 3 1 Department
More informationOn Expected Gaussian Random Determinants
On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose
More informationThe Central Limit Theorem Under Random Truncation
The Central Limit Theorem Under Random Truncation WINFRIED STUTE and JANE-LING WANG Mathematical Institute, University of Giessen, Arndtstr., D-3539 Giessen, Germany. winfried.stute@math.uni-giessen.de
More informationLecture Notes 3 Convergence (Chapter 5)
Lecture Notes 3 Convergence (Chapter 5) 1 Convergence of Random Variables Let X 1, X 2,... be a sequence of random variables and let X be another random variable. Let F n denote the cdf of X n and let
More informationKernel Method: Data Analysis with Positive Definite Kernels
Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University
More informationCombinatorial Dimension in Fractional Cartesian Products
Combinatorial Dimension in Fractional Cartesian Products Ron Blei, 1 Fuchang Gao 1 Department of Mathematics, University of Connecticut, Storrs, Connecticut 0668; e-mail: blei@math.uconn.edu Department
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationA Note on the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance Matrices
A Note on the Central Limit Theorem for the Eigenvalue Counting Function of Wigner and Covariance Matrices S. Dallaporta University of Toulouse, France Abstract. This note presents some central limit theorems
More informationSTAT 7032 Probability Spring Wlodek Bryc
STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,
More informationSPECTRAL GAP FOR ZERO-RANGE DYNAMICS. By C. Landim, S. Sethuraman and S. Varadhan 1 IMPA and CNRS, Courant Institute and Courant Institute
The Annals of Probability 996, Vol. 24, No. 4, 87 902 SPECTRAL GAP FOR ZERO-RANGE DYNAMICS By C. Landim, S. Sethuraman and S. Varadhan IMPA and CNRS, Courant Institute and Courant Institute We give a lower
More informationSTAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song
STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April
More informationRisk-Minimality and Orthogonality of Martingales
Risk-Minimality and Orthogonality of Martingales Martin Schweizer Universität Bonn Institut für Angewandte Mathematik Wegelerstraße 6 D 53 Bonn 1 (Stochastics and Stochastics Reports 3 (199, 123 131 2
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationLecture 5 Channel Coding over Continuous Channels
Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From
More informationConvergence at first and second order of some approximations of stochastic integrals
Convergence at first and second order of some approximations of stochastic integrals Bérard Bergery Blandine, Vallois Pierre IECN, Nancy-Université, CNRS, INRIA, Boulevard des Aiguillettes B.P. 239 F-5456
More informationStein s Method and Characteristic Functions
Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method
More informationGeneralized Gaussian Bridges of Prediction-Invertible Processes
Generalized Gaussian Bridges of Prediction-Invertible Processes Tommi Sottinen 1 and Adil Yazigi University of Vaasa, Finland Modern Stochastics: Theory and Applications III September 1, 212, Kyiv, Ukraine
More informationHomework # , Spring Due 14 May Convergence of the empirical CDF, uniform samples
Homework #3 36-754, Spring 27 Due 14 May 27 1 Convergence of the empirical CDF, uniform samples In this problem and the next, X i are IID samples on the real line, with cumulative distribution function
More informationA FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS
Yugoslav Journal of Operations Research 25 (205), Number, 57 72 DOI: 0.2298/YJOR3055034A A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM FOR P (κ)-horizontal LINEAR COMPLEMENTARITY PROBLEMS Soodabeh
More informationConsidering our result for the sum and product of analytic functions, this means that for (a 0, a 1,..., a N ) C N+1, the polynomial.
Lecture 3 Usual complex functions MATH-GA 245.00 Complex Variables Polynomials. Construction f : z z is analytic on all of C since its real and imaginary parts satisfy the Cauchy-Riemann relations and
More informationEstimation of the Bivariate and Marginal Distributions with Censored Data
Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new
More information(2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS
(2m)-TH MEAN BEHAVIOR OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS UNDER PARAMETRIC PERTURBATIONS Svetlana Janković and Miljana Jovanović Faculty of Science, Department of Mathematics, University
More informationResearch Article Exponential Inequalities for Positively Associated Random Variables and Applications
Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 008, Article ID 38536, 11 pages doi:10.1155/008/38536 Research Article Exponential Inequalities for Positively Associated
More informationResearch Article Existence and Uniqueness Theorem for Stochastic Differential Equations with Self-Exciting Switching
Discrete Dynamics in Nature and Society Volume 211, Article ID 549651, 12 pages doi:1.1155/211/549651 Research Article Existence and Uniqueness Theorem for Stochastic Differential Equations with Self-Exciting
More informationExercises Measure Theoretic Probability
Exercises Measure Theoretic Probability 2002-2003 Week 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary
More informationAn introduction to some aspects of functional analysis
An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms
More information9 Brownian Motion: Construction
9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of
More informationKai Lai Chung
First Prev Next Go To Go Back Full Screen Close Quit 1 Kai Lai Chung 1917-29 Mathematicians are more inclined to build fire stations than to put out fires. Courses from Chung First Prev Next Go To Go Back
More informationWeak convergence and Brownian Motion. (telegram style notes) P.J.C. Spreij
Weak convergence and Brownian Motion (telegram style notes) P.J.C. Spreij this version: December 8, 2006 1 The space C[0, ) In this section we summarize some facts concerning the space C[0, ) of real
More informationLinear Ordinary Differential Equations
MTH.B402; Sect. 1 20180703) 2 Linear Ordinary Differential Equations Preliminaries: Matrix Norms. Denote by M n R) the set of n n matrix with real components, which can be identified the vector space R
More informationI forgot to mention last time: in the Ito formula for two standard processes, putting
I forgot to mention last time: in the Ito formula for two standard processes, putting dx t = a t dt + b t db t dy t = α t dt + β t db t, and taking f(x, y = xy, one has f x = y, f y = x, and f xx = f yy
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More informationSEMI-INNER PRODUCTS AND THE NUMERICAL RADIUS OF BOUNDED LINEAR OPERATORS IN HILBERT SPACES
SEMI-INNER PRODUCTS AND THE NUMERICAL RADIUS OF BOUNDED LINEAR OPERATORS IN HILBERT SPACES S.S. DRAGOMIR Abstract. The main aim of this paper is to establish some connections that exist between the numerical
More informationMOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES
J. Korean Math. Soc. 47 1, No., pp. 63 75 DOI 1.4134/JKMS.1.47..63 MOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES Ke-Ang Fu Li-Hua Hu Abstract. Let X n ; n 1 be a strictly stationary
More informationExponential martingales: uniform integrability results and applications to point processes
Exponential martingales: uniform integrability results and applications to point processes Alexander Sokol Department of Mathematical Sciences, University of Copenhagen 26 September, 2012 1 / 39 Agenda
More informationINTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION 1
The Annals of Statistics 017, Vol. 45, No., 897 9 DOI: 10.114/16-AOS1474 Institute of Mathematical Statistics, 017 INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION
More informationLecture 12. F o s, (1.1) F t := s>t
Lecture 12 1 Brownian motion: the Markov property Let C := C(0, ), R) be the space of continuous functions mapping from 0, ) to R, in which a Brownian motion (B t ) t 0 almost surely takes its value. Let
More informationThe largest eigenvalues of the sample covariance matrix. in the heavy-tail case
The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)
More informationDetecting instants of jumps and estimating intensity of jumps from continuous or discrete data
Detecting instants of jumps and estimating intensity of jumps from continuous or discrete data Denis Bosq 1 Delphine Blanke 2 1 LSTA, Université Pierre et Marie Curie - Paris 6 2 LMA, Université d'avignon
More informationKrzysztof Burdzy University of Washington. = X(Y (t)), t 0}
VARIATION OF ITERATED BROWNIAN MOTION Krzysztof Burdzy University of Washington 1. Introduction and main results. Suppose that X 1, X 2 and Y are independent standard Brownian motions starting from 0 and
More informationSupplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data
Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department
More informationFast-slow systems with chaotic noise
Fast-slow systems with chaotic noise David Kelly Ian Melbourne Courant Institute New York University New York NY www.dtbkelly.com May 1, 216 Statistical properties of dynamical systems, ESI Vienna. David
More informationExistence and Uniqueness
Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect
More informationExercises in stochastic analysis
Exercises in stochastic analysis Franco Flandoli, Mario Maurelli, Dario Trevisan The exercises with a P are those which have been done totally or partially) in the previous lectures; the exercises with
More informationarxiv: v2 [math.co] 20 Jun 2018
ON ORDERED RAMSEY NUMBERS OF BOUNDED-DEGREE GRAPHS MARTIN BALKO, VÍT JELÍNEK, AND PAVEL VALTR arxiv:1606.0568v [math.co] 0 Jun 018 Abstract. An ordered graph is a pair G = G, ) where G is a graph and is
More information