arxiv: v1 [stat.ml] 5 Jan 2015

Size: px
Start display at page:

Download "arxiv: v1 [stat.ml] 5 Jan 2015"

Transcription

1 To appear on the Annals of Statistics SUPPLEMENTARY MATERIALS FOR INNOVATED INTERACTION SCREENING FOR HIGH-DIMENSIONAL NONLINEAR CLASSIFICATION arxiv: v1 [stat.ml] 5 Jan 015 By Yingying Fan, Yinfei Kong, Daoi Li and Zemin Zheng University of Southern California APPENDIX A: PROOFS FOR PROPOSITION 1 AND MAIN LEMMAS In this section, we prove all key Lemmas in the order they appear in the main text. Additional Lemmas and their proofs are provided in Appendix B. Deviation of sub-gaussian distribution. Recall that a random vector w = (W 1,,W p ) T R p is sub-gaussian if there exist some positive constants a and b such that P( v T w > t) aexp( bt ) for any t > 0 and any vector v R p satisfying v = 1. Suppose w = (W 1,,W p ) is sub-gaussian with constants a, b, mean µ and covariance matrix Σ. Let w 1,,w n be n independent copies of w. Since w is sub-gaussian, by Lemma 10, we have w µ is also sub-gaussian. Then there exists some constants ã and b such that P( v T (w i µ)(w i µ) T v > x) ãexp( bx) for all x > 0 and v = 1, which implies that E{exp[tv T (w i µ)(w i µ) T v]} < for all 0 < t < b and v = 1. Similar as in the proof of Lemma 3 in [1], we know that there exist some constants C > 0 and ρ > 0 depending on ã and b, such that for all 0 < x < ρ and any unit vector v = 1, (A.1) P{ (1/n) v T (w i µ)(w i µ) T v v T Σv > x} Ce nxρ/. 1

2 Y. FAN, Y. KONG, D. LI AND Z. ZHENG A.1. Proof of Proposition 1. The main idea is to prove that our test statistic D converges to its population counterpart D uniformly over. Since Condition 3 ensures that D is bounded from below for A 1 and D = 0 for A c 1, the uniform convergence of D will imply the results in Proposition 1. We proceed to prove the uniform convergence of D to D. To this end, we decompose the difference between them into the following five terms: (A.) D D log( σ /σ ) log [ (1) +π ( σ ) /(σ (1) ) ] log [ () +(1 π) ( σ ) /(σ () ) ] + n 1 /n π log [ ( σ (1) ) ] + n /n (1 π) log [ ( σ () ) ]. We will establish successively the deviation bounds of the terms on the right hand side above. The same notation C will be used to denote a generic constant without loss of generality. By Lemma 7, the estimators σ converge to σ uniformly over all = 1,,p, with probability at least 1 pexp( C τ 1,p n1 κ ). Define this event as E. We will condition on the event E hereafter. Since x 1 n log(1+x n) 1 as x n 0, it follows that log( σ /σ )/( σ /σ 1) 1 uniformlyfor all as n. Thus,uniformlyover all = 1,,p, with sufficiently large n, we have the following bound for the first term log( σ /σ ) on the right hand side of (A.) (A.3) P( log( σ /σ ) > 4 1 cn κ E) P( σ /σ 1 > 8 1 cn κ E), where constants c and κ are defined in Condition 3. Then (A.3) together with (A.7) in the proof of Lemma 7 entails that (A.4) P( log( σ /σ ) > 4 1 cn κ E) exp( C τ 1,p n1 κ ). Using the same arguments, for either k = 1 or, we can prove that (A.5) P( log[( σ (k) ) /(σ (k) ) ] > 4 1 cn κ E) exp( C τ 1,p n1 κ ). By the proof of Lemma 6, we know that τ,p πλ max (Ω 1 )+(1 π)λ max (Ω 1 Σ Ω 1 ). Since ( σ (1) ) and ( σ () ) can be bounded from above by λ max (Ω 1 ) and λ max (Ω 1 Σ Ω 1 ), respectively, we know that ( σ (k) ) can be bounded from

3 INNOVATED INTERACTION SCREENING 3 above by π 1 (1 π) 1 τ,p for k = 1,. As n 1 = n i, by Hoeffding s inequality we get (A.6) P( n 1 /n π log[( σ (1) ) ] > 4 1 cn κ E) P( 1 cn κ i π > n 4 log[π 1 (1 π) 1 τ,p ] E) nc n κ exp( 16log [π 1 (1 π) 1 τ,p ] ) exp{ Cn1 κ /log ( τ,p )}. Similarly, we have (A.7) P( n /n (1 π) log[( σ () ) ] > 4 1 cn κ E) exp{ Cn 1 κ /log ( τ,p )}. In view of (A.), we have (A.8) P( D D > cn κ E) P( log( σ /σ ) > 4 1 cn κ E)+P( log[( σ (1) ) /(σ (1) ) ] > 4 1 cn κ E) + P( log[( σ (1) ) /(σ (1) ) ] > 4 1 cn κ E)+P( n 1 n π log[( σ(1) ) ] > 4 1 cn κ E) + P( n n (1 π) log[( σ() ) ] > 4 1 cn κ E). Combining the probability bounds in (A.4), (A.5), (A.6) and (A.7) gives P( D D > cn κ E) 3exp( C τ 1,p n1 κ )+exp{ Cn 1 κ /log ( τ,p )} It follows that exp{ Cn 1 κ /[ τ 1,p +log ( τ,p )]}. P( max 1 p D D > cn κ E) p P( D D > cn κ E) =1 pexp{ Cn 1 κ /[ τ 1,p +log ( τ,p )]}, which is the deviation of the statistic D from its population counterpart D. By Lemma 7, P(E c ) pexp( C τ 1,p n1 κ ). Thus we have P(max 1 p D D > cn κ ) P(max 1 p D D > cn κ E)+P(E c ) pexp{ Cn 1 κ /[ τ 1,p +log ( τ,p )]}.

4 4 Y. FAN, Y. KONG, D. LI AND Z. ZHENG Therefore, for any p satisfying logp = O(n γ ) with γ > 0, γ +κ < 1 and τ 1,p +log ( τ,p ) = o(n 1 κ γ ), it follows that P( max 1 p D D > cn κ ) exp{n γ Cn 1 κ /[ τ 1,p +log ( τ,p )]} exp{ Cn 1 κ /[ τ 1,p +log ( τ,p )]}. By Condition 3 and its discussion, we know that D 3cn κ when A 1, and D = 0 otherwise. It follows that {min D < cn κ } {max D > cn κ } { max D D > cn κ }, A 1 A c 1 {1,,p} whichshowsthatwithprobabilityatleast1 exp{ Cn 1 κ /[ τ 1,p +log ( τ,p )]}, min D cn κ and max D cn κ A 1 A c 1 for sufficiently large n. As the same conditions hold for the covariance matrix Σ and the data after the second transformation, the results above also apply to the covariates in A with the test statistics calculated based on data transformed by Ω. This completes the proof of Proposition 1. A.. Lemma 1 and its proof. Lemma 1. Under model setting () and conditions in Theorem 1, for sufficiently large n, with probability at least 1 pexp( C τ 1,p n1 κ ), it holds that max 1 p ˆσ / σ 1 T n,p/6 for some positive constant C. Proof. Let Z = 1 n ( z 1,, z p ) with z = n z i/n for the original data matrix without transformation. We will first bound the term ˆσ σ by writing it as ˆσ σ = e T [ Ω 1 (Z Z) T (Z Z) Ω 1 Ω 1 (Z Z) T (Z Z)Ω 1 ]e /n. Through a further decomposition, it gives ˆσ σ = et ( Ω 1 Ω 1 )(Z Z) T (Z Z)( Ω 1 Ω 1 )e /n (A.9) +e T ( Ω 1 Ω 1 )(Z Z) T (Z Z)Ω 1 e /n.

5 INNOVATED INTERACTION SCREENING 5 We will then bound the two terms on the right hand side above separately. Recall that max denotes the componentwise infinity norm for a matrix. For the first term, we have e T ( Ω 1 Ω 1 )(Z Z) T (Z Z)( Ω 1 Ω 1 )e /n (Z Z) T (Z Z) max ( Ω 1 Ω 1 )e 1 /n (Z Z) T (Z Z) max [(K p +K p) C 1 K p (logp)/n] /n = ( (Z Z) T (Z Z) max /n) [(K p +K p ) C 1 K4 p (logp)/n], where the second inequality follows from the definition of acceptable estimator and the fact that Ω 1 Ω 1 is (K p +K p )-sparse. For the second term, similarly we get e T ( Ω 1 Ω 1 )(Z Z) T (Z Z)Ω 1 e /n (Z Z) T (Z Z) max ( Ω 1 Ω 1 )e 1 Ω 1 e 1 /n ( (Z Z) T (Z Z) max /n) [(K p +K p )C 1K p (logp)/n] Kp Ω 1 max. Since Ω 1 max is assumed to be upper bounded in Condition 4, and the above two bounds are independent of the index, in view of (A.9), we know that there exists some constant C such that (A.10) max 1 p ˆσ σ C( (Z Z) T (Z Z) max /n)[(k p +K p )K3 p (logp)/n] max{(k p +K p)k p (logp)/n,1}. This together with the definition of T n,p before Theorem 1 ensures that max 1 p ˆσ σ C( (Z Z) T (Z Z) max /n) T n,p τ 1,p /( C 1 τ,p ). By Lemma 6, σ are uniformly bounded from below by τ 1,p. By Lemma 7, max 1 p σ /σ 1 cn κ /8. Combining these two results entails that for n large enough, σ (1 cn κ /8)σ > τ 1,p/, uniformly for all, with probability at least 1 pexp( C τ 1,p n1 κ ). Denote by E the event that the results in Lemma 7 hold. By Lemma 8, when C 1 1c 3 C, we have (A.11) P(max 1 p ˆσ / σ 1 > T n,p/6 E) P(max 1 p ˆσ σ > T n,p τ 1,p /1 E) P( (Z Z) T (Z Z) max /n > c 3 τ,p E) p exp( C n).

6 6 Y. FAN, Y. KONG, D. LI AND Z. ZHENG Under the conditions in Theorem 1, we have logp = o(n γ ) with γ > 0 and γ +κ < 1. It follows that P( max 1 p ˆσ / σ 1 > T n,p /6) P( max 1 p ˆσ / σ 1 > T n,p /6 E)+P(E c ) p exp( C n)+pexp( C τ 1,pn 1 κ ) = pexp( C τ 1,pn 1 κ ), where we use the same notation C here to denote a generic constant without lost of generality. This completes the proof of Lemma 1. A.3. Lemma and its proof. Lemma. Under Condition 5, we have (A.1) Cn 1 Xδ +pen( θ) n 1 ε T X δ 1 +pen(θ 0 ), where C is some positive constant depending on the positive constant π min in Condition 6, and δ = θ θ 0 is the estimation error for the regularized estimator θ defined in (18), and ε = y E(y X) with y = ( 1,, n ) T. Proof. Definel n (θ) = n 1 n l n(x T i θ, i)wherel n (x T θ, ) = x T θ+ log[1+exp(x T θ)]. Then, in matrix form, l n (θ) can be rewritten as l n (θ) = n 1 {y T Xθ 1 T b(xθ)}, where y = ( 1,, n ) T is an n-dimensional response vector with i {0,1}, X = (x 1,,x n ) T = ( x 1,, x p ) is an n p augmented design matrix, 1 is an n-dimensional vector with each component being one, b(β) = (b(β 1 ),,b(β n )) T is a vector-valued function with β i = x T i θ and b(u) = log[1+exp(u)]. By the definition of θ, we have l n (θ)+pen(θ) l n (θ 0 )+pen(θ 0 ) where θ 0 is the true regression coefficient vector of θ. Rearranging terms yields (A.13) n 1 1 T [b(x θ) b(xθ 0 )]+pen( θ) n 1 y T Xδ +pen(θ 0 ), where δ = θ θ 0 is the estimation error. Applying Taylor expansion to the function of 1 T b(x θ) at θ 0 gives (A.14) 1 T [b(x θ) b(xθ 0 )] = [b (Xθ 0 )] T Xδ + 1 δ T X T HXδ, where H = H(X, θ 1,, θ n ) = diag{b (x T 1 θ 1 ),,b (x T n θ n )} is an n n diagonal matrix with θ i R p lying on the line segment adoining θ and θ 0, i = 1,,n. Combining (A.13) and (A.14) and rearranging terms yield (A.15) (n) 1 δ T X T HXδ +pen( θ) n 1 ε T Xδ +pen(θ 0 ),

7 INNOVATED INTERACTION SCREENING 7 where ε = y E(y X) = y b (Xθ 0 ). The right hand side of the above inequality can be bounded as (A.16) n 1 ε T Xδ +pen(θ 0 ) n 1 ε T X δ 1 +pen(θ 0 ). By Condition 5, we have δ T X T HXδ C Xδ for some positive constant C, which depends on the constant π min in Condition 5. This inequality, together with (A.15) and (A.16), completes the proof. A.4. Lemma 3 and its proof. Lemma 3. Assume that Condition 1 holds. If log(p) = o(n), then with probability 1 O(p c 1 ), we have n 1 ε T X 1 c 0 log(p)/n, where c0 is some positive constant and ε = y E(y X) with y = ( 1,, n ) T. Proof. Recall that X = (x 1,,x n ) T = ( x 1,, x p ). An application of the Bonferroni inequality gives that (A.17) P( n 1 ε T X > λ 0 ) p =1 P( n 1 ε T x > λ 0 ) for any λ 0. The key idea is to bound P( n 1 ε T x > λ 0 ). To this end, consider the following three cases. Case 1: = 1. In this case, x = 1, where 1 is a n-dimensional vector with each component being one. Recall that ε = (ε 1,,ε n ) T with ε i = i E( i x i ) and i {0,1}. So 1 ε i 1 for i = 1,,n. Thus, by Hoeffding s inequality [3], we have P( n 1 ε T x > λ 0 ) = P( n 1 ε T 1 > λ 0 ) exp ( nλ 0 /). Case : p + 1. In this case, x = (Z 1, 1,,Z n, 1 ) T. Thus, n 1 ε T x = n 1 n ε iz i, 1. From Lemma 9, under Condition 1, we have that z = (Z 1,,Z p ) T is sub-gaussian, that is, there exist some positive constants a 1 and b 1 such that P( v T z > t) a 1 exp( b 1 t )

8 8 Y. FAN, Y. KONG, D. LI AND Z. ZHENG for any vector v R p satisfying v = 1 and any t > 0. Therefore, for any p + 1, taking v = e 1 being a unit vector with the ( 1) component being one and zero elsewhere in the inequality above gives P( Z 1 > t) a 1 exp( b 1 t ) holds uniformly for all p+1. By Lemma 11, we have E(e b 1Z 1 / ) 1+a 1. This, together with the inequality ab (a +b )/ for any a,b > 0, gives e (b 1/) ε i Z i, 1 e b 1(ε i +Z i, 1 )/4 = e b 1ε i /4 e b 1Z i, 1 /4 (A.18) (e b 1ε i / +e b 1Z i, 1 / )/ (e b 1/ +1+a 1 )/ for all 1 i n and p + 1, where we have used the fact that 1 ε i 1 in the last inequality. Thus, it follows from Lemma 1 that for any 0 < λ 0 1, there exist some positive constants a and b such that P( n 1 ε T x λ 0 ) = P( n 1 ε i Z i, 1 λ 0 ) a exp( b nλ 0) for all p+1. Case 3: p + p. In this case, x = (Z 1,k Z 1,l,,Z n,k Z n,l ) T for some 1 k l p. We can use similar arguments for Case to bound P( n 1 ε T x λ 0 ). Similarly to (A.18), we have e (b 1/) ε i Z ik Z il e (b 1/) Z ik Z il e b 1(Z ik +Z il )/4 = e b 1Z ik /4 e b 1Z il /4 (e b 1Z ik / +e b 1Z il / )/ 1+a 1, for all 1 i n and all 1 k l p. This together with Lemma 1 gives that for any 0 < λ 0 1, there exist some positive constants a 3 and b 3 such that P( n 1 ε T x λ 0 ) = P( n 1 ε i Z ik Z il λ 0 ) a 3 exp( b 3 nλ 0 ) for all p+ p. Combining Cases 1-3 above yields (A.19) P( n 1 ε T x λ 0 ) a 4 exp( b 4 nλ 0) for any 0 λ 0 1, wherea 4 = max{,a,a 3 } and b 4 = min{1/,b,b 3 }. Let λ 0 = 1 c 0 log(p)/n with some positive constant c0. Since log(p) = o(n),

9 INNOVATED INTERACTION SCREENING 9 we have 0 < λ 0 1 for all sufficiently large n. In view of (A.17) and (A.19), we have P( n 1 ε T X > 1 c 0 log(p)/n) pa4 exp{ 4 1 b 4 c 0log(p)} 3a 4 p (4 1 b 4 c 0 ), wherec 0 > 8/b 4 andwehaveusedthefactthat p = 1+p+p(p+1)/ 3p. Thus, we conclude that n 1 ε T X c 0 log(p)/n holds with probability at least 1 O(p c 1 ) with c 1 = 4 1 b 4 c 0 > 0. A.5. Lemma 4 and its proof. Lemma 4. Assume that there exists some constant φ > 0 such that (A.0) δ T Σδ φ δ T Sδ S for any δ R p satisfying δ S c 1 4(s 1/ +λ 1 1 λ θ 0 ) δ S, where Σ = E(x T x). If both z (1) and z () are sub-gaussian, 5s 1/ + 4λ 1 1 λ θ 0 = O(n ξ/ ), and log(p) = o(n 1/ ξ ) with constant 0 ξ < 1/4, then with probability at least 1 O(p c ), n 1/ Xδ (φ/) δ S holds for any δ R p satisfying δ S c 1 4(s 1/ +λ 1 1 λ θ 0 ) δ S when n is sufficiently large. Proof. The idea is to show that the desired inequality holds conditioning on an event and the probability of this event occurring is at most O(p c ) with some positive constant c. Conditioning the event E 4 = { n 1 X T X Σ } < C 1 n ξ where positive constant C 1 will be specified later, we have δ T (n 1 X T X Σ)δ < C 1 n ξ δ 1 = C 1n ξ ( δ S 1 + δ S c 1 ) C 1 n ξ (s 1/ δ S + δ S c 1 ), where the last inequality follows from the Cauchy-Schwarz inequality. The following arguments are all conditioning on the event E 4. Thus, for any δ R p satisfying δ S c 1 4(s 1/ +λ 1 1 λ θ 0 ) δ S, we obtain δ T (n 1 X T X Σ)δ C 1 n ξ (5s 1/ +4λ 1 1 λ θ 0 ) δ S.

10 10 Y. FAN, Y. KONG, D. LI AND Z. ZHENG Since 5s 1/ + 4λ 1 1 λ θ 0 = O(n ξ/ ), there exists some positive constant C such that 5s 1/ +4λ 1 1 λ θ 0 C n ξ/. Thus, δ T (n 1 X T X Σ)δ C 1 C δ S for any δ R p satisfying δ S c 1 4(s 1/ + λ 1 1 λ θ 0 ) δ S. Note that n 1 δ T X T Xδ = δ T (n 1 X T X Σ)δ+δ T Σδ. This, together with the above inequality and the assumption (A.0), yields n 1 δ T X T Xδ C 1 C δ S +δ T Σδ (φ C 1 C ) δ S. Choose C 1 = 3φ /(4C ). Thus, with probability 1 P(Ec 4 ), we have n 1/ Xδ (φ/) δ S for any δ R p satisfying δ S c 1 4(s 1/ +λ 1 1 λ θ 0 ) δ S. It remains to show that P(E4 c) O(p c ) with some positive constant c. For any matrix A, denote by A the entrywise matrix infinity norm of A and (A) kl the (k,l) entry of A. Since both z (1) and z () are sub-gaussian, by Lemmas 9, 10, 11, and 13, we have P(E4) c = P { n 1 X T X Σ K /(C)n ξ} p p k=1 l=1 P{ (n 1 X T X Σ) kl K /(C )n ξ } p aexp( 4 1 bc K n1/ ξ ) O(p c ) for all n sufficiently large, where a,b, c are positive constants and the last inequality holds since p = 1+p+p(p+1)/ and log(p) = o(n 1/ ξ ). This completes the proof of Lemma 4. A.6. Lemma 5 and its proof. Lemma 5. Assume that w = (W 1,,W p ) T R p is sub-gaussian. Then for any positive constant c 1, there exists some positive constant C such that { } P max W > C log(p) = O(p c 1 ). 1 p Proof. Since w = (W 1,,W p ) T is sub-gaussian, there exist some positive constants c 1 and c such that P( v T w > t) c 1 exp( c t ) for any

11 INNOVATED INTERACTION SCREENING 11 v R p satisfying v = 1 and any t > 0. Taking v be a unit vector with th component 1 and all other components 0 yields P( W > t) c 1 exp( c t ) for all 1 p. An application of the Bonferroni inequality gives P( max 1 p W > t) p P( W > t) p c 1 exp( c t ). =1 If we choose t = C log(p) with some positive constant C = 1+c 1 / c, } then we have P {max 1 p W > C log(p) = O(p c 1 ). This completes the proof of Lemma 5. APPENDIX B: PROOFS FOR SECONARY LEMMAS B.1. Lemma 6 and its proof. Lemma 6. Under Condition, it holds that τ 1,p λ min [cov( z)] λ max [cov( z)] τ,p, where τ 1,p = {πτ 1,p +(1 π)τ 1τ,p } 1 and τ,p = {πτ 1 1 +(1 π)τ 1 τ,p+ π(1 π)τ 1 µ 1 } exp(1). Proof. In order to prove Lemma 6, we will first calculate the covariance matrix of z = z (1) + (1 )z (). Recall that µ = E(z () ) = 0 and is a Bernoulli variable taking value 1 with probability π. We will apply the formula cov(z) = cov[e(z )] + E[cov(z )] to calculate the covariance matrix of z. For the first term cov[e(z )], we can calculate it as E(z ) = E(z (1) )+(1 )E(z () ) = µ 1, which gives cov[e(z )] = cov( )µ 1 µ T 1 = π(1 π)µ 1µ T 1. For the second term E[cov(z )], since =, (1 ) = 1 and (1 ) = 0, we have cov(z ) = E(zz T ) E(z )E(z ) T = {E(z (1) z (1)T ) µ 1 µ T 1}+(1 ) E(z () z ()T ) = Σ 1 +(1 )Σ.

12 1 Y. FAN, Y. KONG, D. LI AND Z. ZHENG After taking expectation on both sides, we get E[cov(z )] = πσ 1 +(1 π)σ. Thus we have cov(z) = πσ 1 +(1 π)σ +π(1 π)µ 1 µ T 1. Recall that z = Ω 1 z. It follows that cov( z) = Ω 1 cov(z)ω 1 = πω 1 +(1 π)ω 1 Σ Ω 1 +π(1 π)ω 1 µ 1 µ T 1Ω 1. Therefore, under Condition, we get λ max [cov( z)] πλ max (Ω 1 )+(1 π)λ max (Ω 1 Σ Ω 1 )+π(1 π)λ max (Ω 1 µ 1 µ T 1 Ω 1) πτ 1 1 +(1 π)τ 1 τ,p +π(1 π)τ 1 µ 1 ; λ min [cov( z)] πλ min (Ω 1 )+(1 π)λ min (Ω 1 Σ Ω 1 )+π(1 π)λ min (Ω 1 µ 1 µ T 1 Ω 1) πτ 1,p +(1 π)τ 1τ,p. It completes the proof of Lemma 6. B.. Lemma 7 and its proof. Lemma 7. Under model setting () and conditions in Proposition 1, for sufficiently large n, with probability at least 1 pexp( C τ 1,p n1 κ ), it holds that max 1 p σ /σ 1 cn κ /8, for some positive constant C, where c and κ are constants defined in Condition 3. Proof. Wewillfirstdecompose σ σ intoseveral terms,andthenprove deviation bounds for each term. Denote Ω 1 by the operator norm of Ω 1. Note that under Condition, Ω 1 is bounded from above by constant τ 1 1. So z(k) = Ω 1 z (k) for k = 1, are also sub-gaussian distributed. Recall that Z = ZΩ 1 is the transformed data matrix. Denote by Z = ( z i ) n p. Then z i are independent and identically distributed across i with mixture sub-gaussian distribution and variance σ. Since σ is the pooled sample variance estimate for the th transformed feature Z, we have σ = ( z i z ) /n,

13 INNOVATED INTERACTION SCREENING 13 where z = n z i/n is the pooled sample mean estimate for Z. Let µ = E( z i ) and µ (1) = E( z (1) i ). It is clear that µ = π µ (1). By some simple calculation, we have the following decomposition for σ σ, σ σ = ([ z i µ ] σ )/n ( z µ ). Since z i = i z (1) i +(1 i ) z () i, we have z i µ = i ( z (1) i µ (1) )+(1 i ) z () i +( i π) µ (1), where z (1) i µ(1) and z () i are sub-gaussian distributed with mean 0 and variances (σ (1) ) and (σ () ), respectively. By the proof of Lemma 6, we know that σ = π(σ(1) ) + (1 π)(σ () ) + π(1 π)( µ (1) ). As i = i, (1 i ) = 1 i and i (1 i ) = 0, by replacing z i with i z (1) i +(1 i) z () i and spreading out the terms, we can further decompose σ σ as σ σ =1 n { i [( z (1) i ) (σ (1) ) ]+(1 i )[( z () i ) (σ () ) ] +[( i π) π(1 π)]( µ (1) ) +( i π)[(σ (1) ) (σ () ) ] +( i π) µ (1) [ i ( z (1) i )+(1 i ) z () i ]} ( z µ ). Let S = {1 i n : i = 1} and ε n be any positive sequence such that ε n 0 and ε n τ 1,p 0 as n. It follows from the above decomposition

14 14 Y. FAN, Y. KONG, D. LI AND Z. ZHENG that (A.1) P( σ σ > ε nσ /) P(1 n i S +P( 1 n i S c [( z () i ) (σ () ) ] > ε nσ 1 ) [( z (1) i ) (σ (1) ) ] > ε nσ 1 ) +P( 1 n [( i π) π(1 π)]( µ (1) ) > ε nσ 1 ) +P( 1 n ( +P( 1 n i nπ)[(σ (1) ) (σ () ) ] > ε nσ 1 ) ( i π) µ (1) [ i ( z (1) i )+(1 i ) z () i ] > ε nσ 1 ) +P{( z µ ) > ε nσ 1 }. We will bound the six terms on the right hand side above one by one using some deviation results. Thesame notation C will beusedto denote a generic positive constant without loss of generality. Recall that n 1 = n i and n = n n 1. By Lemma 6, σ are uniformly boundedfrombelowby τ 1,p.Thus,bythedeviationofsub-Gaussianin(A.1), conditioning on any realization of = ( 1,, n ) T, we obtain P( 1 n i S P( 1 n 1 i S exp( Cε n τ 1,p [( z (1) i ) (σ (1) ) ] > ε nσ 1 ) [( z (1) i ) (σ (1) ) ] > ε n τ 1,p 1 ) Cexp( ρε n τ 1,pn 1 /1 ) i ). Since ε n τ 1,p 0 as n and n i is a Binomial random variable with probability of success π, for sufficiently large n such that ε n τ 1,p is small

15 INNOVATED INTERACTION SCREENING 15 enough, taking expectation on both sides above yields (A.) P( 1 n i S [( z (1) i ) (σ (1) ) ] > ε nσ 1 ) E{exp( Cε n τ 1,p = {1 π +πexp( Cε n τ 1,p )}n = exp{nln[1 π +πexp( Cε n τ 1,p )]} i )} exp{ nπ[1 exp( Cε n τ 1,p )]} exp( nπcε n τ 1,p ) = exp( Cε n τ 1,p n), where we have used the inequalities that log(1+x) x for any x > 1, and 1 exp( x) x for sufficiently small x > 0. This gives an upper bound on the first term in (A.1). Similar to (A.), the second term in (A.1) can be bounded as (A.3) P( 1 n i S c [( z () i ) (σ () ) ] > ε nσ 1 ) exp( Cε n τ 1,p n). As z (1) is sub-gaussian distributed, it follows from Lemma 11 that µ (1) are uniformly bounded from above by some positive constant across. Since i are independently Bernoulli distributed with success probability π, we know that E{( i π) } = π(1 π) and ( i π) are i.i.d. and bounded from above by 1. By Hoeffding s inequality, we have the following bound for the third term in (A.1), (A.4) P( 1 n [( i π) π(1 π)]( µ (1) ) > ε nσ 1 ) P( 1 n ( i π) π(1 π) > ε n τ 1,p ε 1( µ (1) exp( n τ 1,p n ) ) 1 ( µ (1) ) 4) exp( Cε n τ 1,pn). Due to the fact that σ = π(σ(1) ) + (1 π)(σ () ) + π(1 π)( µ (1) ), we have σ π(1 π)[(σ (1) ) + (σ () ) ]. Similar to (A.4), by applying

16 16 Y. FAN, Y. KONG, D. LI AND Z. ZHENG Hoeffding s inequality, the fourth term in (A.1) can be bounded as (A.5) P( 1 n ( i nπ)[(σ (1) ) (σ () ) ] > ε nσ 1 ) P( 1 σ i π [ n π(1 π) ] > ε nσ 1 ) P( 1 n exp( nπ (1 π) ε n 1 ) exp( Cε n τ 1,p n). For the fifth term in (A.1), we first decompose it as (A.6) P( 1 n i π > π(1 π)ε n ) 1 ( i π) µ (1) [ i ( z (1) i )+(1 i ) z () i ] > ε nσ 1 ) P( 1 n (1 π)( z (1) i ) π z () i ] > ε nσ i S i S c 1 µ (1) ) P( 1 n (1 π)( z (1) i ) > ε nσ i S 4 µ (1) )+P( 1 n π z () i > ε nσ i S c 4 µ (1) ). Recall that σ π(σ(1) ) and max 1 p µ (1) can be bounded from above by some positive constant. Applying Bernstein s inequality to the sum of independent sub-gaussian random variables yields P( 1 n (1 π)( z (1) i ) > ε nσ i S 4 µ (1) ) P( 1 ( z (1) i ) πεn σ > n 1 i S σ (1) 48(1 π) µ (1) ) πε exp( n τ 1,pn 1 96 (1 π) ( µ (1) ) ) exp( Cεn τ 1,p i ). Applying the same argument as in (A.), taking expectation on both sides above then we can obtain P( 1 n i S (1 π)( z (1) i ) > ε nσ 4 µ (1) ) exp( Cε n τ 1,p n).

17 INNOVATED INTERACTION SCREENING 17 Similarly we have ( 1 P n π z () i > ε ) nσ i S c 4 µ (1) exp( Cε n τ 1,pn). In view of (A.6), the two bounds we obtained yield (A.7) P( 1 n ( i π) µ (1) [ i ( z (1) i )+(1 i ) z () i ] > ε nσ 1 ) exp( Cε n τ 1,pn), which is the upper bound for the fifth term in (A.1). Applying Bernstein s inequality similarly to the sixth term in (A.1) gives P{( z µ ) > ε nσ 1 } P( z µ εn > σ 1 ) exp( ε nn 4 ). Combining the six bounds for the terms in (A.1) that we have obtained yields (A.8) P( σ /σ 1 > ε n /) P( σ σ > ε n σ/) exp( Cε n τ 1,pn). It follows that (A.9) P(max 1 p σ /σ 1 > ε n/) P( σ /σ 1 > ε n/) 1 p pexp( Cε n τ 1,p n). Let ε n = cn κ /4 with constants c and κ defined in Condition 3. Since τ 1,p 1, we know that ε n τ 1,p 0 as n. Thus we can replace ε n with cn κ /4 in (A.9) and get (A.30) P( max 1 p σ /σ 1 > cn κ /8) pexp( C τ 1,pn 1 κ ). This completes the proof of Lemma 7. B.3. Lemma 8 and its proof. Lemma 8. Under model setting() and Condition, for sufficiently large n, with probability at least 1 p exp( C n), it holds that (Z Z) T (Z Z) max /n c 3 τ,p for some positive constants C and c 3 >, where Z = 1 n ( z 1,, z p ) with z = n z i/n and 1 n the n 1 column vector with all components 1.

18 18 Y. FAN, Y. KONG, D. LI AND Z. ZHENG Proof. Thedeviation boundof (Z Z) T (Z Z) max /n can beobtained by bounding each component of (Z Z) T (Z Z). Recall that µ = E( z ) = πµ (1) with µ (1) = E(z (1) i ). Note that the th diagonal component of (Z Z) T (Z Z)/n can be bounded as (A.31) (z i z ) /n = (z i µ ) /n ( z µ ) (z i µ ) /n, Let Z = (z i ) n p, then the components of each row in Z are independent and identically distributed (i.i.d.) with a mixture sub-gaussian distribution which can be written as z i = i z (1) i + (1 i )z () i, where i are i.i.d. Bernoulli random variables. Denote by Σ 1 = (σ (1) i ) p p and Σ = (σ () i ) p p. Then for each = 1,,p, the random variables z (1) i µ (1) and z () i are independent across i with mean 0 and variances σ (1) and σ (), respectively. We will then bound n (z i µ ) /n by some deviation results. The same notation C will be used to denote a generic constant without loss of generality. For any 1 i n, we have the following decomposition for n (z i µ ) /n, (z i µ ) /n = n 1 { i (z (1) (A.3) i µ (1) ) +(1 i )(z () i ) +( i π) (µ (1) +( i π)µ (1) [ i (z (1) i µ (1) )+(1 i )(z () i )]}. Recall that S = {1 i n : i = 1}, n 1 = n i and n = n n 1. By Condition, both σ (1) and σ () can be uniformly bounded from above by τ,p across. For the first term in (A.3) and any positive constant c 3, applying the argument with sub-gaussian deviation and Taylor expansion similarly as in (A.) gives P(n 1 { i (z (1) P(n 1 i S i µ (1) ) } > τ,p +c 3 ) P(n 1 i S [(z (1) i µ (1) ) σ (1) ] > c 3) exp( C n). Similarly, we have for the second term in (A.3), P(n 1 (1 i )(z () i ) > τ,p +c 3 ) exp( C n). ) (z (1) i µ (1) ) > σ (1) +c 3)

19 INNOVATED INTERACTION SCREENING 19 Since µ (1) are uniformly bounded from above by some positive constant across from Lemma 11, similar to (A.4), by Hoeffding s inequality, we have the following bound for the third term in (A.3), P{n 1 ( i π) (µ (1) ) > π(1 π)(µ (1) ) +c 3 } exp( C n). For the last term in (A.3), similar to (A.7), by Bernstein s inequality we have P{n 1 ( i π)µ (1) [ i (z (1) i µ (1) )+(1 i )(z () i )] > c 3} exp( C n). In view of (A.3), by the similar argument as in (A.1), combining the four bounds we have obtained gives P( (z i µ ) /n > τ,p +C 3 ) exp( C n), where C 3 = π(1 π)(µ (1) ) + 4c 3. As shown in (A.31), n (z i z ) /n can be bounded from above by n (z i µ ) /n. Thus we have P( (z i z ) /n > τ,p +C 3 ) exp( C n). Then for the (i,)th component of (Z Z) T (Z Z), by the Cauchy- Schwarz inequality, it follows that P( (z ki z i )(z k z )/n > τ,p +C 3 ) k=1 P{[n 1 (z ki z i ) ][n 1 (z k z ) ] > [τ,p +C 3 ] } k=1 k=1 P{n 1 (z ki z i ) > τ,p +C 3 }+P{n 1 (z k z ) > τ,p +C 3 } k=1 exp( C n). Sincetheaboveinequalityholdsforany(i,)th componentof(z Z) T (Z Z), similar to (A.9), we conclude that k=1 P( (Z Z) T (Z Z) max /n > τ,p +C 3 ) p exp( C n)

20 0 Y. FAN, Y. KONG, D. LI AND Z. ZHENG As τ,p is allowed to diverge, we can choose some constant c 3 > such that c 3 τ,p > τ,p +C 3, which gives (A.33) P( (Z Z) T (Z Z) max /n > c 3 τ,p ) p exp( C n). It completes the proof of Lemma 8. B.4. Lemma 9 and its proof. Lemma 9. Assume that z (1) R p and z () R p are sub-gaussian, and follows a Bernoulli distribution with probability of success π. Let z = z (1) +(1 )z (). Then z is also sub-gaussian. Proof. Since z (1) R p is sub-gaussian, there exists some positive constants a 1 and b 1 such that P( v T z (1) > t) a 1 exp( b 1 t ) for any vector v R p satisfying v = 1 and any t > 0. Similarly, there exists some positive constants a and b such that P( v T z () > t) a exp( b t ) for any vector v R p satisfying v = 1 and any t > 0 since z () R p is sub-gaussian. Let a 3 = max{a 1,a } and b 3 = min{b 1,b }. Then we have P( v T z (k) > t) a 3 exp( b 3 t ) for k = 1,. This, together with the law of total probability, yields P( v T z > t) =P( v T z > t = 1)P( = 1)+P( v T z > t = 0)P( = 0) =πp( v T z (1) > t)+(1 π)p( v T z () > t) a 3 exp( b 3 t ). Thus z is sub-gaussian. B.5. Lemma 10 and its proof. Lemma 10. Ifarandomvectorw=(W 1,,W p ) T R p issub-gaussian, then so is w E(w). Proof. If w is sub-gaussian, then there exist some positive constants a 1 and b 1 such that P( v T w > t) a 1 exp( b 1 t )

21 INNOVATED INTERACTION SCREENING 1 for any vector v R p satisfying v = 1 and any t > 0. Let µ = E(w). Thus from the Cauchy-Schwarz inequality and Lemma 11, we have (v T µ) = E (v T w) E[(v T w) ] (1+a 1 )c 1 with c = b 1 /. Note that ( a b ) ( a + b ) (a +b ) for any a and b. Thus, for any vector v R p satisfying v = 1 and any t > 0, P{ v T [w E(w)] > t} P{ v T w + v T µ > t} P{(v T w) +(v T µ) > t } P{e c(vt w) > e ct / c(v T µ) } e ct /+c(v T µ) E[e c(vt w) ] (1+a 1 )e 1+a1 e ct / = a exp( b t ) with a = (1 + a 1 )e 1+a 1 and b = c/ = b 1 /4. Thus w E(w) is sub- Gaussian. B.6. Lemma 11 and its proof. Lemma 11. Let W be a nonnegative random variable such that P(W > t) aexp( bt ) for any t > 0, where a and b are positive constants. Then E[exp( 1 bw )] 1 + a and E(W m ) (1 + a)(/b) m m! for any integer m 0. Proof. Denote by F(t) the cumulative distribution function of W. Then for all x > 0, we have 1 F(t) = P(W > t) aexp( bt ). For any constant 0 < c < b, integration by parts yields E(e cw ) = e ct d[1 F(t)] = 1+ cate (b c)t dt = 1+ ca b c. Thus letting c = b/ gives E[exp( 1 bw )] 1+a. Note that m b m E(W m ) m! k=0 0 cte ct [1 F(t)]dt E( 1 bw ) k = E[exp( 1 bw )] 1+a. k! This implies E(W m ) (1+a)(/b) m m! for any integer m 0. This completes the proof of Lemma 11.

22 Y. FAN, Y. KONG, D. LI AND Z. ZHENG B.7. Lemma 1. Lemma 1 (Lemma 8 of Hao and Zhang []). Let W 1,,W n be independent random variables with zero mean. If E[exp(T 0 W i α )] c 1 for some constants T 0 > 0, c 1 > 0 and 0 < α 1, then there exist positive constants c and c 3 such that for any 0 < ε 1. P( n 1 W i > ε) c exp( c 3 n α ε ) B.8. Lemma 13 and its proof. Lemma 13. Assume that max 1 p E[exp( c 1 Z1 )] c holds with some positive constants c 1 and c. If for each 1 p, the random variables Z 1,,Z n are independent and identically distributed, then { } P n 1 [Z i E(Z i )] ε c 3 exp( c 4 nε ) { } P n 1 [Z i Z ik E(Z i Z ik )] ε c 3 exp( c 4 nε ) { } P n 1 [Z i Z ik Z il E(Z i Z ik Z il )] ε c 3 exp( c 4 n /3 ε ) { } P n 1 [Z ik Z il Z ik Z il E(Z ik Z il Z ik Z il )] ε c 3 exp( c 4 n 1/ ε ) for any 0 < ε < 1, where 1,k,l,k,l p, and c 3 and c 4 are generic positive constants which may vary from line to line. Proof. The idea of the proof is to use Lemma 1 and similar to that for Lemma 9 in Hao and Zhang []. So we omit the details here. REFERENCES [1] Cai, T., Zhang, C.-H., and Zhou, H.(008). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38, [] Hao, N. and Zhang, H. H. (014). Interaction Screening for Ultra-High Dimensional Data. J. Amer. Statist. Assoc., to appear. [3] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58,

23 INNOVATED INTERACTION SCREENING 3 Data Sciences and Operations Department Marshall School of Business University of Southern California Los Angeles, CA USA fanyingy@marshall.usc.edu daoili@marshall.usc.edu Department of Mathematics University of Southern California Los Angeles, CA USA zeminzhe@usc.edu Department of Preventive Medicine Keck School of Medicine University of Southern California Los Angeles, CA USA yinfeiko@usc.edu

SUPPLEMENTARY MATERIAL TO INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION

SUPPLEMENTARY MATERIAL TO INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION IPDC SUPPLEMENTARY MATERIAL TO INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION By Yinfei Kong, Daoji Li, Yingying Fan and Jinchi Lv California State University

More information

INNOVATED INTERACTION SCREENING FOR HIGH-DIMENSIONAL NONLINEAR CLASSIFICATION 1

INNOVATED INTERACTION SCREENING FOR HIGH-DIMENSIONAL NONLINEAR CLASSIFICATION 1 The Annals of Statistics 2015, Vol. 43, No. 3, 1243 1272 DOI: 10.1214/14-AOS1308 Institute of Mathematical Statistics, 2015 INNOVATED INTERACTION SCREENING FOR HIGH-DIMENSIONAL NONLINEAR CLASSIFICATION

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang

Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang Supplementary Materials: Martingale difference correlation and its use in high dimensional variable screening by Xiaofeng Shao and Jingsi Zhang The supplementary material contains some additional simulation

More information

Assumptions for the theoretical properties of the

Assumptions for the theoretical properties of the Statistica Sinica: Supplement IMPUTED FACTOR REGRESSION FOR HIGH- DIMENSIONAL BLOCK-WISE MISSING DATA Yanqing Zhang, Niansheng Tang and Annie Qu Yunnan University and University of Illinois at Urbana-Champaign

More information

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 6: Probability and Random Processes Problem Set 3 Spring 9 Self-Graded Scores Due: February 8, 9 Submit your self-graded scores

More information

arxiv: v1 [stat.me] 11 May 2016

arxiv: v1 [stat.me] 11 May 2016 Submitted to the Annals of Statistics INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION arxiv:1605.03315v1 [stat.me] 11 May 2016 By Yinfei Kong, Daoji Li, Yingying

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

Submitted to the Brazilian Journal of Probability and Statistics

Submitted to the Brazilian Journal of Probability and Statistics Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

An Introduction to Signal Detection and Estimation - Second Edition Chapter III: Selected Solutions

An Introduction to Signal Detection and Estimation - Second Edition Chapter III: Selected Solutions An Introduction to Signal Detection and Estimation - Second Edition Chapter III: Selected Solutions H. V. Poor Princeton University March 17, 5 Exercise 1: Let {h k,l } denote the impulse response of a

More information

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Evan Kwiatkowski, Jan Mandel University of Colorado Denver December 11, 2014 OUTLINE 2 Data Assimilation Bayesian Estimation

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

04. Random Variables: Concepts

04. Random Variables: Concepts University of Rhode Island DigitalCommons@URI Nonequilibrium Statistical Physics Physics Course Materials 215 4. Random Variables: Concepts Gerhard Müller University of Rhode Island, gmuller@uri.edu Creative

More information

Universal examples. Chapter The Bernoulli process

Universal examples. Chapter The Bernoulli process Chapter 1 Universal examples 1.1 The Bernoulli process First description: Bernoulli random variables Y i for i = 1, 2, 3,... independent with P [Y i = 1] = p and P [Y i = ] = 1 p. Second description: Binomial

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

Gaussian Elimination for Linear Systems

Gaussian Elimination for Linear Systems Gaussian Elimination for Linear Systems Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 3, 2011 1/56 Outline 1 Elementary matrices 2 LR-factorization 3 Gaussian elimination

More information

Optimal rates of convergence for estimating Toeplitz covariance matrices

Optimal rates of convergence for estimating Toeplitz covariance matrices Probab. Theory Relat. Fields 2013) 156:101 143 DOI 10.1007/s00440-012-0422-7 Optimal rates of convergence for estimating Toeplitz covariance matrices T. Tony Cai Zhao Ren Harrison H. Zhou Received: 17

More information

Correlation dimension for self-similar Cantor sets with overlaps

Correlation dimension for self-similar Cantor sets with overlaps F U N D A M E N T A MATHEMATICAE 155 (1998) Correlation dimension for self-similar Cantor sets with overlaps by Károly S i m o n (Miskolc) and Boris S o l o m y a k (Seattle, Wash.) Abstract. We consider

More information

Tail bound inequalities and empirical likelihood for the mean

Tail bound inequalities and empirical likelihood for the mean Tail bound inequalities and empirical likelihood for the mean Sandra Vucane 1 1 University of Latvia, Riga 29 th of September, 2011 Sandra Vucane (LU) Tail bound inequalities and EL for the mean 29.09.2011

More information

The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression

The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression Cun-hui Zhang and Jian Huang Presenter: Quefeng Li Feb. 26, 2010 un-hui Zhang and Jian Huang Presenter: Quefeng The Sparsity

More information

arxiv: v1 [math.pr] 22 May 2008

arxiv: v1 [math.pr] 22 May 2008 THE LEAST SINGULAR VALUE OF A RANDOM SQUARE MATRIX IS O(n 1/2 ) arxiv:0805.3407v1 [math.pr] 22 May 2008 MARK RUDELSON AND ROMAN VERSHYNIN Abstract. Let A be a matrix whose entries are real i.i.d. centered

More information

THE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, R. M. Dudley

THE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, R. M. Dudley THE DVORETZKY KIEFER WOLFOWITZ INEQUALITY WITH SHARP CONSTANT: MASSART S 1990 PROOF SEMINAR, SEPT. 28, 2011 R. M. Dudley 1 A. Dvoretzky, J. Kiefer, and J. Wolfowitz 1956 proved the Dvoretzky Kiefer Wolfowitz

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

LIST OF FORMULAS FOR STK1100 AND STK1110

LIST OF FORMULAS FOR STK1100 AND STK1110 LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

A One-to-One Code and Its Anti-Redundancy

A One-to-One Code and Its Anti-Redundancy A One-to-One Code and Its Anti-Redundancy W. Szpankowski Department of Computer Science, Purdue University July 4, 2005 This research is supported by NSF, NSA and NIH. Outline of the Talk. Prefix Codes

More information

Selected Exercises on Expectations and Some Probability Inequalities

Selected Exercises on Expectations and Some Probability Inequalities Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ

More information

ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL

ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL Statistica Sinica 19(29): Supplement S1-S1 ESTIMATION IN A SEMI-PARAMETRIC TWO-STAGE RENEWAL REGRESSION MODEL Dorota M. Dabrowska University of California, Los Angeles Supplementary Material This supplement

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

5: MULTIVARATE STATIONARY PROCESSES

5: MULTIVARATE STATIONARY PROCESSES 5: MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalarvalued random variables on the same probability

More information

Invertibility of symmetric random matrices

Invertibility of symmetric random matrices Invertibility of symmetric random matrices Roman Vershynin University of Michigan romanv@umich.edu February 1, 2011; last revised March 16, 2012 Abstract We study n n symmetric random matrices H, possibly

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

Random Variables. P(x) = P[X(e)] = P(e). (1)

Random Variables. P(x) = P[X(e)] = P(e). (1) Random Variables Random variable (discrete or continuous) is used to derive the output statistical properties of a system whose input is a random variable or random in nature. Definition Consider an experiment

More information

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash

More information

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs De-biasing the Lasso: Optimal Sample Size for Gaussian Designs Adel Javanmard USC Marshall School of Business Data Science and Operations department Based on joint work with Andrea Montanari Oct 2015 Adel

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.

More information

arxiv: v1 [math.pr] 22 Dec 2018

arxiv: v1 [math.pr] 22 Dec 2018 arxiv:1812.09618v1 [math.pr] 22 Dec 2018 Operator norm upper bound for sub-gaussian tailed random matrices Eric Benhamou Jamal Atif Rida Laraki December 27, 2018 Abstract This paper investigates an upper

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

{η : η=linear combination of 1, Z 1,, Z n

{η : η=linear combination of 1, Z 1,, Z n If 3. Orthogonal projection. Conditional expectation in the wide sense Let (X n n be a sequence of random variables with EX n = σ n and EX n 0. EX k X j = { σk, k = j 0, otherwise, (X n n is the sequence

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

Lectures 6: Degree Distributions and Concentration Inequalities

Lectures 6: Degree Distributions and Concentration Inequalities University of Washington Lecturer: Abraham Flaxman/Vahab S Mirrokni CSE599m: Algorithms and Economics of Networks April 13 th, 007 Scribe: Ben Birnbaum Lectures 6: Degree Distributions and Concentration

More information

Chapter 7. Basic Probability Theory

Chapter 7. Basic Probability Theory Chapter 7. Basic Probability Theory I-Liang Chern October 20, 2016 1 / 49 What s kind of matrices satisfying RIP Random matrices with iid Gaussian entries iid Bernoulli entries (+/ 1) iid subgaussian entries

More information

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active

More information

1 Exercises for lecture 1

1 Exercises for lecture 1 1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Asymptotic Properties of an Approximate Maximum Likelihood Estimator for Stochastic PDEs

Asymptotic Properties of an Approximate Maximum Likelihood Estimator for Stochastic PDEs Asymptotic Properties of an Approximate Maximum Likelihood Estimator for Stochastic PDEs M. Huebner S. Lototsky B.L. Rozovskii In: Yu. M. Kabanov, B. L. Rozovskii, and A. N. Shiryaev editors, Statistics

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

Lecture 22 Girsanov s Theorem

Lecture 22 Girsanov s Theorem Lecture 22: Girsanov s Theorem of 8 Course: Theory of Probability II Term: Spring 25 Instructor: Gordan Zitkovic Lecture 22 Girsanov s Theorem An example Consider a finite Gaussian random walk X n = n

More information

Spectral representations and ergodic theorems for stationary stochastic processes

Spectral representations and ergodic theorems for stationary stochastic processes AMS 263 Stochastic Processes (Fall 2005) Instructor: Athanasios Kottas Spectral representations and ergodic theorems for stationary stochastic processes Stationary stochastic processes Theory and methods

More information

THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX

THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX MARK RUDELSON AND ROMAN VERSHYNIN Abstract. We prove an optimal estimate on the smallest singular value of a random subgaussian matrix, valid

More information

Nonparametric Bayesian Uncertainty Quantification

Nonparametric Bayesian Uncertainty Quantification Nonparametric Bayesian Uncertainty Quantification Lecture 1: Introduction to Nonparametric Bayes Aad van der Vaart Universiteit Leiden, Netherlands YES, Eindhoven, January 2017 Contents Introduction Recovery

More information

arxiv: v1 [math.ap] 17 May 2017

arxiv: v1 [math.ap] 17 May 2017 ON A HOMOGENIZATION PROBLEM arxiv:1705.06174v1 [math.ap] 17 May 2017 J. BOURGAIN Abstract. We study a homogenization question for a stochastic divergence type operator. 1. Introduction and Statement Let

More information

AP Calculus Testbank (Chapter 9) (Mr. Surowski)

AP Calculus Testbank (Chapter 9) (Mr. Surowski) AP Calculus Testbank (Chapter 9) (Mr. Surowski) Part I. Multiple-Choice Questions n 1 1. The series will converge, provided that n 1+p + n + 1 (A) p > 1 (B) p > 2 (C) p >.5 (D) p 0 2. The series

More information

THE EXPONENTIAL DISTRIBUTION ANALOG OF THE GRUBBS WEAVER METHOD

THE EXPONENTIAL DISTRIBUTION ANALOG OF THE GRUBBS WEAVER METHOD THE EXPONENTIAL DISTRIBUTION ANALOG OF THE GRUBBS WEAVER METHOD ANDREW V. SILLS AND CHARLES W. CHAMP Abstract. In Grubbs and Weaver (1947), the authors suggest a minimumvariance unbiased estimator for

More information

High-dimensional Covariance Estimation Based On Gaussian Graphical Models

High-dimensional Covariance Estimation Based On Gaussian Graphical Models High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix

More information

THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX

THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX THE SMALLEST SINGULAR VALUE OF A RANDOM RECTANGULAR MATRIX MARK RUDELSON AND ROMAN VERSHYNIN Abstract. We prove an optimal estimate of the smallest singular value of a random subgaussian matrix, valid

More information

ONLINE APPENDIX TO: NONPARAMETRIC IDENTIFICATION OF THE MIXED HAZARD MODEL USING MARTINGALE-BASED MOMENTS

ONLINE APPENDIX TO: NONPARAMETRIC IDENTIFICATION OF THE MIXED HAZARD MODEL USING MARTINGALE-BASED MOMENTS ONLINE APPENDIX TO: NONPARAMETRIC IDENTIFICATION OF THE MIXED HAZARD MODEL USING MARTINGALE-BASED MOMENTS JOHANNES RUF AND JAMES LEWIS WOLTER Appendix B. The Proofs of Theorem. and Proposition.3 The proof

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

6.1 Moment Generating and Characteristic Functions

6.1 Moment Generating and Characteristic Functions Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,

More information

arxiv: v6 [math.st] 25 Nov 2010

arxiv: v6 [math.st] 25 Nov 2010 Confidence Interval for the Mean of a Bounded Random Variable and Its Applications in Point arxiv:080.458v6 [math.st] 5 Nov 010 Estimation Xinjia Chen November, 010 Abstract In this article, we derive

More information

Random matrices: Distribution of the least singular value (via Property Testing)

Random matrices: Distribution of the least singular value (via Property Testing) Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued

More information

A simple branching process approach to the phase transition in G n,p

A simple branching process approach to the phase transition in G n,p A simple branching process approach to the phase transition in G n,p Béla Bollobás Department of Pure Mathematics and Mathematical Statistics Wilberforce Road, Cambridge CB3 0WB, UK b.bollobas@dpmms.cam.ac.uk

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 2: Introduction to statistical learning theory. 1 / 22 Goals of statistical learning theory SLT aims at studying the performance of

More information

MULTI-RESTRAINED STIRLING NUMBERS

MULTI-RESTRAINED STIRLING NUMBERS MULTI-RESTRAINED STIRLING NUMBERS JI YOUNG CHOI DEPARTMENT OF MATHEMATICS SHIPPENSBURG UNIVERSITY SHIPPENSBURG, PA 17257, U.S.A. Abstract. Given positive integers n, k, and m, the (n, k)-th m- restrained

More information

Large Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n

Large Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n Large Sample Theory In statistics, we are interested in the properties of particular random variables (or estimators ), which are functions of our data. In ymptotic analysis, we focus on describing the

More information

7. MULTIVARATE STATIONARY PROCESSES

7. MULTIVARATE STATIONARY PROCESSES 7. MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalar-valued random variables on the same probability

More information

Random regular digraphs: singularity and spectrum

Random regular digraphs: singularity and spectrum Random regular digraphs: singularity and spectrum Nick Cook, UCLA Probability Seminar, Stanford University November 2, 2015 Universality Circular law Singularity probability Talk outline 1 Universality

More information

Econ 5150: Applied Econometrics Dynamic Demand Model Model Selection. Sung Y. Park CUHK

Econ 5150: Applied Econometrics Dynamic Demand Model Model Selection. Sung Y. Park CUHK Econ 5150: Applied Econometrics Dynamic Demand Model Model Selection Sung Y. Park CUHK Simple dynamic models A typical simple model: y t = α 0 + α 1 y t 1 + α 2 y t 2 + x tβ 0 x t 1β 1 + u t, where y t

More information

March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.

March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang. Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)

More information

On the convergence of sequences of random variables: A primer

On the convergence of sequences of random variables: A primer BCAM May 2012 1 On the convergence of sequences of random variables: A primer Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM May 2012 2 A sequence a :

More information

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Statistica Sinica 213): Supplement NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Hokeun Sun 1, Wei Lin 2, Rui Feng 2 and Hongzhe Li 2 1 Columbia University and 2 University

More information

Random Methods for Linear Algebra

Random Methods for Linear Algebra Gittens gittens@acm.caltech.edu Applied and Computational Mathematics California Institue of Technology October 2, 2009 Outline The Johnson-Lindenstrauss Transform 1 The Johnson-Lindenstrauss Transform

More information

8 Laws of large numbers

8 Laws of large numbers 8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

19.1 Problem setup: Sparse linear regression

19.1 Problem setup: Sparse linear regression ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 19: Minimax rates for sparse linear regression Lecturer: Yihong Wu Scribe: Subhadeep Paul, April 13/14, 2016 In

More information

EE Applications of Convex Optimization in Signal Processing and Communications Dr. Andre Tkacenko, JPL Third Term

EE Applications of Convex Optimization in Signal Processing and Communications Dr. Andre Tkacenko, JPL Third Term EE 150 - Applications of Convex Optimization in Signal Processing and Communications Dr. Andre Tkacenko JPL Third Term 2011-2012 Due on Thursday May 3 in class. Homework Set #4 1. (10 points) (Adapted

More information

Estimation of Treatment Effects under Essential Heterogeneity

Estimation of Treatment Effects under Essential Heterogeneity Estimation of Treatment Effects under Essential Heterogeneity James Heckman University of Chicago and American Bar Foundation Sergio Urzua University of Chicago Edward Vytlacil Columbia University March

More information

MOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES

MOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES J. Korean Math. Soc. 47 1, No., pp. 63 75 DOI 1.4134/JKMS.1.47..63 MOMENT CONVERGENCE RATES OF LIL FOR NEGATIVELY ASSOCIATED SEQUENCES Ke-Ang Fu Li-Hua Hu Abstract. Let X n ; n 1 be a strictly stationary

More information

Chapter 6: Random Processes 1

Chapter 6: Random Processes 1 Chapter 6: Random Processes 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.

More information

Chp 4. Expectation and Variance

Chp 4. Expectation and Variance Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.

More information

Lecture 16: Sample quantiles and their asymptotic properties

Lecture 16: Sample quantiles and their asymptotic properties Lecture 16: Sample quantiles and their asymptotic properties Estimation of quantiles (percentiles Suppose that X 1,...,X n are i.i.d. random variables from an unknown nonparametric F For p (0,1, G 1 (p

More information

INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION 1

INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION 1 The Annals of Statistics 017, Vol. 45, No., 897 9 DOI: 10.114/16-AOS1474 Institute of Mathematical Statistics, 017 INTERACTION PURSUIT IN HIGH-DIMENSIONAL MULTI-RESPONSE REGRESSION VIA DISTANCE CORRELATION

More information

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS

SUPPLEMENT TO PAPER CONVERGENCE OF ADAPTIVE AND INTERACTING MARKOV CHAIN MONTE CARLO ALGORITHMS Submitted to the Annals of Statistics SUPPLEMENT TO PAPER CONERGENCE OF ADAPTIE AND INTERACTING MARKO CHAIN MONTE CARLO ALGORITHMS By G Fort,, E Moulines and P Priouret LTCI, CNRS - TELECOM ParisTech,

More information

Chapter 4 : Expectation and Moments

Chapter 4 : Expectation and Moments ECE5: Analysis of Random Signals Fall 06 Chapter 4 : Expectation and Moments Dr. Salim El Rouayheb Scribe: Serge Kas Hanna, Lu Liu Expected Value of a Random Variable Definition. The expected or average

More information

A remark on the maximum eigenvalue for circulant matrices

A remark on the maximum eigenvalue for circulant matrices IMS Collections High Dimensional Probability V: The Luminy Volume Vol 5 (009 79 84 c Institute of Mathematical Statistics, 009 DOI: 04/09-IMSCOLL5 A remark on the imum eigenvalue for circulant matrices

More information

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information