arxiv: v1 [math.st] 27 Sep 2018


Robust covariance estimation under $L_4$-$L_2$ norm equivalence

Shahar Mendelson, Nikita Zhivotovskiy

arXiv v1 [math.ST], 27 Sep 2018

Abstract

Let $X$ be a centered random vector taking values in $\mathbb{R}^d$ and let $\Sigma = \mathbb{E}(X \otimes X)$ be its covariance matrix. We show that if $X$ satisfies an $L_4$-$L_2$ norm equivalence, there is a covariance estimator $\hat\Sigma$ that exhibits the optimal performance one would expect had $X$ been a gaussian vector. The procedure also improves the current state-of-the-art regarding high probability bounds in the subgaussian case (sharp results were only known in expectation or with constant probability). In both scenarios the new bound does not depend explicitly on the dimension $d$, but rather on the effective rank of the covariance matrix $\Sigma$.

1 Introduction

The question of estimating the covariance of a random vector has been studied extensively in recent years (see, e.g., [2, 4, 10, 11, 12] and references therein). To formulate the problem, let $X$ be a zero mean random vector taking its values in $\mathbb{R}^d$ and denote the covariance matrix by $\Sigma = \mathbb{E}(X \otimes X)$. Given a sample $X_1, \ldots, X_N$ consisting of independent random vectors that are distributed according to $X$, the goal is to select a matrix $\hat\Sigma$ that approximates $\Sigma$. While there are various notions of approximation, the focus of this note is on approximation with respect to the ($\ell_2 \to \ell_2$) operator norm, which from here on we denote by $\|\cdot\|$.

One way of viewing the question of covariance estimation (with respect to any norm) is as a mean estimation problem. Indeed, if one sets $W = X \otimes X$, then $\mathbb{E}W = \Sigma$, and since one is given a sample $X_1, \ldots, X_N$, the vectors $(X_i \otimes X_i)_{i=1}^N$ are independent copies of $W$. Thus, a matrix $\hat W$ that is a good approximation of the mean $\mathbb{E}W$ with respect to the underlying norm is a solution to the problem of estimating the covariance of $X$ with respect to that norm.
An immediate outcome of this simple observation is that the empirical mean
\[ \hat\Sigma_N = \frac{1}{N}\sum_{i=1}^N W_i = \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i, \]
which is the trivial choice for estimating the true mean, is a poor estimator unless the random vector $W$ has a nice tail behavior (see, for example, the discussion in [5]). An example of a positive result of that flavour is Theorem 9 in [2], and to formulate it we need the following definition.

Author affiliations. Shahar Mendelson: Mathematical Sciences Institute, The Australian National University and Department of Mathematics, Technion, I.I.T, shahar.mendelson@gmail.com. Nikita Zhivotovskiy: Department of Mathematics, Technion, I.I.T, Skoltech and Higher School of Economics, nikita.zhivotovskiy@phystech.edu.

Definition 1.1. The effective rank of a positive semidefinite square matrix $A \in \mathbb{R}^{d \times d}$ is given by
\[ r(A) = \frac{\mathrm{Tr}(A)}{\|A\|}. \tag{1.1} \]
Clearly, $r(A) \le d$ but the gap between $r(A)$ and $d$ may be substantial.

Theorem 1.2 ([2]). For every $L \ge 1$ there exists a constant $c(L)$ for which the following holds. Let $X$ be an $L$-subgaussian random vector (see footnote 1). Then with probability at least $1-\delta$,
\[ \Big\| \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i - \Sigma \Big\| \le c(L)\|\Sigma\| \left( \sqrt{\frac{r(\Sigma)}{N}} + \frac{r(\Sigma)}{N} + \sqrt{\frac{\log(2/\delta)}{N}} + \frac{\log(2/\delta)}{N} \right). \tag{1.2} \]

It was shown in [2] that if $G$ is a zero mean gaussian vector (and in particular it satisfies the conditions of Theorem 1.2) with covariance $\Sigma$ then
\[ \mathbb{E}\Big\| \frac{1}{N}\sum_{i=1}^N G_i \otimes G_i - \Sigma \Big\| \sim \|\Sigma\| \max\left\{ \sqrt{\frac{r(\Sigma)}{N}}, \frac{r(\Sigma)}{N} \right\}. \]
Hence, there is no room for improvement in the deviation estimate of the empirical mean from the true one at the constant confidence level. Of course, that does not imply that the empirical mean is an optimal covariance estimator, even for a gaussian vector and at a constant level of confidence. Moreover, as we explain in what follows, there are far better covariance estimators than (1.2) when the confidence parameter $\delta$ is small.

Just as in the one-dimensional mean-estimation problem, once the problem is more heavy-tailed the performance of the empirical mean deteriorates quickly and a different procedure has to be used. That is also the case for covariance estimation. The current state-of-the-art for covariance estimation in heavy-tailed situations is [12] (see Corollary 4.1 there and similar results in [10, 11]), in which $X$ is assumed to satisfy an $L_4$-$L_2$ norm equivalence.

Definition 1.3. A random vector $X$ with mean $\mu$ satisfies an $L_4$-$L_2$ norm equivalence with constant $L$ if for every $t \in \mathbb{R}^d$,
\[ \big(\mathbb{E}\langle X-\mu, t\rangle^4\big)^{1/4} \le L \big(\mathbb{E}\langle X-\mu, t\rangle^2\big)^{1/2}. \]

Remark 1.4. Note that if $X$ is $L$-subgaussian then it satisfies an $L_4$-$L_2$ norm equivalence with constant $2L$.

Theorem 1.5 ([12]). For every $L \ge 1$ there are constants $c(L)$ and $c'(L)$ that depend only on $L$ and for which the following holds. Let $X$ satisfy an $L_4$-$L_2$ norm equivalence with constant $L$.
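For $0 < \delta < 1$ there is an estimator $\hat\Sigma_\delta$ that satisfies
\[ \|\hat\Sigma_\delta - \Sigma\| \le c(L)\|\Sigma\| \sqrt{\frac{r(\Sigma)\big(\log d + \log(1/\delta)\big)}{N}} \tag{1.3} \]
with probability at least $1-\delta$, provided that $N \ge c'(L)\, r(\Sigma)\big(\log d + \log(2/\delta)\big)$.

Remark 1.6. Let us mention that the procedure from [12] requires prior information on the values of $\|\Sigma\|$ and $r(\Sigma)$ up to some absolute multiplicative constant, an assumption we shall return to in what follows.

Footnote 1. Recall that a random vector $X$ with values in $\mathbb{R}^d$ and with mean $\mu$ is $L$-subgaussian if for every $t \in \mathbb{R}^d$ and every $p \ge 2$, $\big(\mathbb{E}|\langle X-\mu, t\rangle|^p\big)^{1/p} \le L\sqrt{p}\,\big(\mathbb{E}\langle X-\mu, t\rangle^2\big)^{1/2}$.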
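The effective rank $r(A) = \mathrm{Tr}(A)/\|A\|$ of Definition 1.1 is easy to evaluate numerically. The following is a minimal sketch, not part of the paper: the function names `operator_norm_psd` and `effective_rank` are hypothetical, and the operator norm is approximated by plain power iteration, which assumes the starting vector is not orthogonal to the top eigenvector.

```python
import math

def operator_norm_psd(a, iters=200):
    """Approximate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    d = len(a)
    v = [1.0 / math.sqrt(d)] * d          # starting vector (assumed not orthogonal to the top eigenvector)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:                 # zero matrix (or v in the kernel)
            return 0.0
        v = [x / norm_w for x in w]
        lam = norm_w                      # ||A v|| with ||v|| = 1 converges to the top eigenvalue
    return lam

def effective_rank(a):
    """r(A) = Tr(A) / ||A|| for a symmetric PSD matrix given as a list of rows."""
    trace = sum(a[i][i] for i in range(len(a)))
    return trace / operator_norm_psd(a)

# One dominant direction in dimension d = 20: r(A) is close to 2 although d = 20.
d = 20
spiked = [[(1.0 if i == 0 else 0.05) if i == j else 0.0 for j in range(d)] for i in range(d)]
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
print(effective_rank(spiked), effective_rank(identity))
```

The spiked example illustrates the gap mentioned after Definition 1.1: the trace is $1 + 19 \cdot 0.05 = 1.95$ and the operator norm is $1$, so the effective rank is about $1.95$, while for the identity it equals $d$.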

Note that if $\delta$ is smaller than $1/d$, the error guaranteed by Theorem 1.5 is of the order of
\[ \|\Sigma\| \sqrt{\frac{r(\Sigma)}{N}} \sqrt{\log(2/\delta)}, \tag{1.4} \]
which will turn out to be far from optimal. To put (1.4) in some perspective, let us examine possible benchmarks for general mean estimation problems and see how those compare with (1.2), (1.3) and (1.4) when applied to covariance estimation.

1.1 Optimality in mean estimation

Let $W$ be a random vector with mean $\mu$ and set $\|\cdot\|$ to be an arbitrary norm. Let $B^\circ$ be the unit ball of the dual norm to $\|\cdot\|$, and denote by $\hat\mu$ a mean-estimator constructed using an independent sample $W_1, \ldots, W_N$. As it happens, a lower bound on the performance of $\hat\mu$ is
\[ \frac{R}{\sqrt{N}} \sqrt{\log(2/\delta)}, \tag{1.5} \]
where
\[ R = \sup_{x^* \in B^\circ} \big( \mathbb{E}(x^*(W-\mu))^2 \big)^{1/2}. \tag{1.6} \]
Indeed, for every $x^* \in B^\circ$,
\[ \|\hat\mu - \mu\| \ge |x^*(\hat\mu - \mu)| = |x^*(\hat\mu) - x^*(\mu)|; \]
therefore, if there is a procedure for which $\|\hat\mu - \mu\| \le \varepsilon$ with probability $1-\delta$, then on the same event the procedure automatically performs with accuracy $\varepsilon$ and confidence $1-\delta$ for each one of the real-valued mean-estimation problems associated with the random variables $x^*(W)$, $x^* \in B^\circ$. By a lower bound from [1] on real-valued mean estimation problems, the best possible mean-estimation error for each $x^*(W)$ is of the order of $\sqrt{\mathrm{var}(x^*(W))}\,\sqrt{\log(2/\delta)/N}$, and taking the worst $x^* \in B^\circ$ leads to (1.5).

Although (1.5) is part of the story, it is unlikely to be the whole story. Intuitively, (1.5) takes into account the effect of one-dimensional marginals of $W$ rather than the entire geometry of the distribution. It stands to reason that an additional global parameter is called for, one that reflects the entire structure of $W$ and the geometry of the norm. Moreover, that parameter should reflect the difficulty of the estimation problem at the constant confidence level.

To give an example of such a result, a (sharp) lower bound from [1] on the mean estimation problem when $W$ is a gaussian random vector is the following: if $\|\hat\mu - \mu\| \le \varepsilon$ with probability at least $1-\delta$ then
\[ \varepsilon \ge \frac{c}{\sqrt{N}} \Big( \mathbb{E}\|W-\mu\| + R\sqrt{\log(2/\delta)} \Big); \tag{1.7} \]
hence, the global parameter in the gaussian case is just the mean $\mathbb{E}\|W-\mu\|$.
Let us examine (1.7) more carefully, in the hope that it would lead us towards the right answer for general random vectors. Note that by setting $\delta = \exp(-p)$, each gaussian marginal $x^*(W)$ satisfies that
\[ \sqrt{\log(2/\delta)}\,\big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2} \sim \sqrt{p}\,\big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2} \sim \big(\mathbb{E}|x^*(W-\mu)|^p\big)^{1/p}. \]

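At the same time, the strong-weak moment inequality (see footnote 2) for gaussian vectors (see, e.g., [3]) implies that
\[
\begin{aligned}
\Big( \mathbb{E}\Big\| \frac{1}{\sqrt{N}} \sum_{i=1}^N (W_i-\mu) \Big\|^p \Big)^{1/p}
&\le \mathbb{E}\Big\| \frac{1}{\sqrt{N}} \sum_{i=1}^N (W_i-\mu) \Big\| + c \sup_{x^* \in B^\circ} \Big( \mathbb{E}\Big| x^*\Big( \frac{1}{\sqrt{N}} \sum_{i=1}^N (W_i-\mu) \Big) \Big|^p \Big)^{1/p} \\
&= \mathbb{E}\|W-\mu\| + c \sup_{x^* \in B^\circ} \big( \mathbb{E}|x^*(W-\mu)|^p \big)^{1/p} \\
&\le \mathbb{E}\|W-\mu\| + c'\sqrt{p} \sup_{x^* \in B^\circ} \big( \mathbb{E}(x^*(W-\mu))^2 \big)^{1/2},
\end{aligned}
\]
where $c$ and $c'$ are absolute constants. Thus, the lower bound of (1.7) implies that the best possible performance of a mean estimator of a gaussian vector matches a strong-weak moment inequality.

This leads to a natural conjecture: that the best possible performance in a general mean estimation problem is given by a gaussian-like strong-weak moment inequality, and that there is a procedure that performs with that accuracy/confidence tradeoff.

Recently, a general mean estimation procedure was introduced in [5] that exhibits this type of strong-weak behavior. To formulate the result, let $W$ be an arbitrary random vector taking values in $\mathbb{R}^d$ and with mean $\mu$, let $G$ be the zero mean gaussian random vector with the same covariance as $W$ and set $Y_N = \frac{1}{\sqrt{N}} \sum_{i=1}^N (W_i - \mu)$, where $W_1, \ldots, W_N$ are independent copies of $W$. Let $\|\cdot\|$ be a norm, set $B^\circ$ to be the unit ball of the dual norm, and put $R = \sup_{x^* \in B^\circ} \big( \mathbb{E}(x^*(W-\mu))^2 \big)^{1/2}$.

Theorem 1.7 ([5]). For $0 < \delta < 1$ there is a procedure $\hat\mu_\delta$ such that, with probability at least $1-\delta$,
\[ \|\hat\mu_\delta - \mu\| \le \frac{c}{\sqrt{N}} \max\Big\{ \mathbb{E}\|Y_N\|,\ \mathbb{E}\|G\| + R\sqrt{\log(2/\delta)} \Big\}. \]

The mean estimation procedure is defined as follows. Let $T = \mathrm{ext}(B^\circ)$ be the set of extreme points of $B^\circ$. For the wanted confidence parameter $0 < \delta < 1$, let $n = \log(2/\delta)$ and set $m = N/n$. Let $(I_j)_{j=1}^n$ be the natural partition of $\{1, \ldots, N\}$ into blocks of cardinality $m$, and given a sample $W_1, \ldots, W_N$ set $Z_j = \frac{1}{m} \sum_{i \in I_j} W_i$. For $x^* \in T$ and $\varepsilon > 0$, set
\[ S_{x^*}(\varepsilon) = \big\{ y \in \mathbb{R}^d : |x^*(y) - x^*(Z_j)| \le \varepsilon \text{ for more than } n/2 \text{ blocks} \big\}, \]
and define $S(\varepsilon) = \bigcap_{x^* \in T} S_{x^*}(\varepsilon)$. Set $\varepsilon_0 = \inf\{\varepsilon > 0 : S(\varepsilon) \ne \emptyset\}$, and let $\hat\mu_\delta$ be any vector in $\bigcap_{\varepsilon > \varepsilon_0} S(\varepsilon)$.

Footnote 2. By strong moment we mean the $L_1$ norm of $\|W-\mu\|$, while the weak moment is just the largest $L_p$ norm of a marginal $x^*(W-\mu)$ for $x^* \in B^\circ$.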
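The block construction above becomes fully explicit when the set $T$ of extreme points is finite. As an illustration only, the sketch below assumes the norm is the sup-norm on $\mathbb{R}^d$, whose dual unit ball (the $\ell_1$ ball) has extreme points $\pm e_i$; in that special case $S(\varepsilon)$ decouples across coordinates, and a point of $S(\varepsilon_0)$ is obtained by taking, in each coordinate, the midpoint of the shortest interval containing more than half of the block means. The function names are hypothetical and the number of blocks is passed directly rather than computed from $\delta$.

```python
def block_means(sample, n_blocks):
    """Average the sample over n_blocks consecutive blocks of equal size m = N // n_blocks."""
    m = len(sample) // n_blocks
    d = len(sample[0])
    means = []
    for j in range(n_blocks):
        block = sample[j * m:(j + 1) * m]
        means.append([sum(w[i] for w in block) / m for i in range(d)])
    return means

def shortest_majority_window(values):
    """Midpoint and half-width of the shortest interval holding more than half of the values."""
    z = sorted(values)
    k = len(z) // 2 + 1                   # "more than n/2 blocks"
    start = min(range(len(z) - k + 1), key=lambda s: z[s + k - 1] - z[s])
    lo, hi = z[start], z[start + k - 1]
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def sup_norm_estimator(sample, n_blocks):
    """Return (mu_hat, eps0) for the sup-norm instance, where T = {+e_i, -e_i}."""
    means = block_means(sample, n_blocks)
    d = len(means[0])
    windows = [shortest_majority_window([z[i] for z in means]) for i in range(d)]
    mu_hat = [mid for mid, _ in windows]
    eps0 = max(half for _, half in windows)
    return mu_hat, eps0

# Three blocks with means 0, 1 and 100: the outlying block is ignored.
sample = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [100.0], [100.0], [100.0]]
print(sup_norm_estimator(sample, 3))      # ([0.5], 0.5)
```

For a general norm the set $T$ is infinite and no such closed form is available; the sketch is only meant to make the roles of $Z_j$, $S(\varepsilon)$ and $\varepsilon_0$ concrete.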

The main result of this note (which is formulated in the next section) is that the right application of Theorem 1.7 leads to an (almost) optimal covariance estimator: the procedure performs as if $X$ were a gaussian vector even if it only satisfies an $L_4$-$L_2$ norm equivalence, and the accuracy/confidence tradeoff obeys the strong-weak inequality one would expect.

1.2 From mean estimation to covariance estimation

In what follows, we assume without loss of generality that $X$ is symmetric, not only zero mean. We may do so because if $X$ is a centered random vector and $X'$ is an independent copy of $X$ then $(X - X')/\sqrt{2}$ is symmetric and has the same covariance as $X$.

The natural choice of a random vector in Theorem 1.7 is $W = X \otimes X$, but as it happens, a better alternative is to use a truncated version of $X$ instead of the original one:

Definition 1.8. Let
\[ \alpha = \left( \frac{N\,\mathrm{Tr}(\Sigma)\,\|\Sigma\|}{\gamma} \right)^{1/4} \]
and let $\tilde X = X \mathbb{1}_{\{\|X\|_2 \le \alpha\}}$. In the $L$-subgaussian case we set $\gamma = 1$, and when $X$ only satisfies an $L_4$-$L_2$ norm equivalence, let $\gamma = \log r(\Sigma)$. We also denote $\tilde\Sigma = \mathbb{E}(\tilde X \otimes \tilde X)$.

The main result of this note is that the procedure described in Theorem 1.7 for $W = \tilde X \otimes \tilde X$ is an optimal (or very close to optimal) covariance estimation procedure. Specifically, we prove the following, which improves both (1.2) and (1.3).

Theorem 1.9. Let $X$ be a zero mean random vector with an (unknown) covariance matrix $\Sigma$, denote by $\|\cdot\|$ the operator norm and, using the notation of Definition 1.8, set
\[ R^2 = \sup_{u,v \in S^{d-1}} \mathbb{E}\big( v^T (\tilde X \otimes \tilde X - \tilde\Sigma) u \big)^2, \]
where $S^{d-1}$ is the Euclidean unit sphere in $\mathbb{R}^d$. For any $0 < \delta < 1$, there is a procedure that receives as data the sample $X_1, \ldots, X_N$, returns a matrix $\hat\Sigma_\delta$ and satisfies that:

(1) If $X$ is $L$-subgaussian then with probability at least $1-\delta$,
\[ \|\hat\Sigma_\delta - \Sigma\| \le c(L) \left[ \|\Sigma\| \left( \sqrt{\frac{r(\Sigma)}{N}} + \frac{r(\Sigma)}{N} \right) + \frac{R}{\sqrt{N}} \sqrt{\log(2/\delta)} \right]; \]

(2) If $X$ satisfies an $L_4$-$L_2$ norm equivalence with constant $L$ and $N \ge c'(L)\, r(\Sigma) \log r(\Sigma)$ then with probability at least $1-\delta$,
\[ \|\hat\Sigma_\delta - \Sigma\| \le c(L) \left( \|\Sigma\| \sqrt{\frac{r(\Sigma) \log r(\Sigma)}{N}} + \frac{R}{\sqrt{N}} \sqrt{\log(2/\delta)} \right). \tag{1.8} \]

In both cases $R \le c(L)\|\Sigma\|$, and $c(L)$, $c'(L)$ are constants that depend only on $L$.

Remark 1.10. Note that the estimates in Theorem 1.9 do not depend on the dimension $d$; instead, they depend only on $r(\Sigma)$, which may be small even if $d$ tends to infinity. This is important in view of recent results on covariance estimation in Banach spaces [2].
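The truncation step is simple to state in code. The sketch below assumes the truncation level $\alpha = (N\,\mathrm{Tr}(\Sigma)\,\|\Sigma\|/\gamma)^{1/4}$ and that $\mathrm{Tr}(\Sigma)$ and $\|\Sigma\|$ are supplied by the caller (known up to constants, as the paper requires); the function names are hypothetical and not taken from the paper.

```python
import math

def truncation_level(n, trace, op_norm, gamma=1.0):
    """alpha = (N * Tr(Sigma) * ||Sigma|| / gamma)^(1/4).

    gamma = 1 corresponds to the subgaussian case; gamma = log r(Sigma)
    (which requires r(Sigma) > 1) corresponds to the L4-L2 case.
    """
    return (n * trace * op_norm / gamma) ** 0.25

def truncate(sample, alpha):
    """Replace every vector of Euclidean norm larger than alpha by the zero vector."""
    out = []
    for x in sample:
        norm = math.sqrt(sum(t * t for t in x))
        out.append(list(x) if norm <= alpha else [0.0] * len(x))
    return out

alpha = truncation_level(16, 1.0, 1.0)          # (16 * 1 * 1 / 1)^(1/4) = 2
print(alpha, truncate([[3.0, 4.0], [1.0, 0.0]], alpha))
```

In the example the vector $(3,4)$ has norm $5 > \alpha = 2$ and is zeroed out, while $(1,0)$ is kept. Note that $\alpha$ grows like $N^{1/4}$, so for a fixed distribution fewer and fewer sample points are affected as the sample size increases.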

The estimate in Theorem 1.9 is actually a strong-weak moment inequality as if $X$ were gaussian (at least up to the logarithmic term in (1.8)). Indeed, let $G$ be the zero mean gaussian random vector that has the same covariance as $X$ and set $N \ge r(\Sigma)$. As noted previously,
\[ \|\Sigma\| \sqrt{\frac{r(\Sigma)}{N}} \sim \mathbb{E}\Big\| \frac{1}{N} \sum_{i=1}^N G_i \otimes G_i - \Sigma \Big\|, \]
with the left-hand side being the strong term from Theorem 1.9.

Moreover, the term involving $R$ is actually the natural weak term associated with the operator norm. Indeed, recall the well-known fact that the dual norm to the operator norm is the nuclear norm. Since a linear functional $z$ acts on the matrix $x$ via trace duality (that is, $z(x) = [z, x] := \mathrm{Tr}(z^T x)$), it follows from [14] that the extreme points of the dual unit ball $B^\circ$ are $\{u \otimes v : u, v \in S^{d-1}\}$. Thus,
\[ R^2 = \sup_{x^* \in B^\circ} \mathbb{E}\big( x^*(\tilde X \otimes \tilde X - \tilde\Sigma) \big)^2 = \sup_{u,v \in S^{d-1}} \mathbb{E}\big( v^T (\tilde X \otimes \tilde X - \tilde\Sigma) u \big)^2, \]
and in particular, by (1.5) the weak term $(R/\sqrt{N})\sqrt{\log(2/\delta)}$ appearing in Theorem 1.9 is sharp. Thus, up to the logarithmic factor in (2), Theorem 1.9 implies that the estimator $\hat\Sigma_\delta$ performs as if $X$ were gaussian, even though it can be very far from gaussian.

Let us compare the outcome of Theorem 1.9 to the current state of the art we mentioned previously. In the subgaussian setup Theorem 1.9 improves Theorem 1.2 because there are situations in which $R$ is significantly smaller than $\|\Sigma\|$ (see such an example in what follows). Under an $L_4$-$L_2$ norm equivalence scenario the improvement is more dramatic: on top of an improvement in the logarithmic factor appearing in the strong term, the weak term $(R/\sqrt{N})\sqrt{\log(2/\delta)}$ is significantly smaller than the corresponding estimate of $\|\Sigma\|\sqrt{r(\Sigma)/N}\,\sqrt{\log(2/\delta)}$ from Theorem 1.5.

The proof of Theorem 1.9 is presented in the following section. We end this introduction with some notation. Throughout, absolute constants are denoted by $c, c', \ldots$, etc. Their value may change from line to line. Constants that depend on a parameter $L$ are denoted by $c(L)$; $a \lesssim b$ means that there is an absolute constant $c$ such that $a \le cb$, and $a \sim b$ means that $cb \le a \le c'b$.
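When the constants depend on $L$ we write $a \lesssim_L b$ and $a \sim_L b$ respectively. Finally, we define the $\psi_2$-norm of a real valued random variable $Y$ as
\[ \|Y\|_{\psi_2} = \inf\{ c > 0 : \mathbb{E}\exp(Y^2/c^2) \le 2 \}. \]
In what follows we use the well known fact that
\[ \|Y\|_{\psi_2} \sim \sup_{p \ge 2} \frac{(\mathbb{E}|Y|^p)^{1/p}}{\sqrt{p}}. \]

2 Proof of Theorem 1.9

We require several observations on properties of $\tilde X$. First, note that by the symmetry of $X$, $\tilde X$ is symmetric as well. Second, for every $p \ge 2$ and any $u \in \mathbb{R}^d$,
\[ \|\langle \tilde X, u\rangle\|_{L_p} = \big(\mathbb{E}|\langle \tilde X, u\rangle|^p\big)^{1/p} \le \big(\mathbb{E}|\langle X, u\rangle|^p\big)^{1/p}. \]
Therefore, if $X$ is $L$-subgaussian then $\|\langle \tilde X, u\rangle\|_{L_p} \le L\sqrt{p}\,\|\langle X, u\rangle\|_{L_2}$, and if $X$ satisfies an $L_4$-$L_2$ norm equivalence with constant $L$ then $\|\langle \tilde X, u\rangle\|_{L_4} \le L\|\langle X, u\rangle\|_{L_2}$.

More important features of $\tilde X$ have to do with its covariance matrix $\tilde\Sigma$: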

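Lemma 2.1. Assume that $X$ is zero mean and satisfies an $L_4$-$L_2$ norm equivalence with constant $L$. Using the notation of Definition 1.8 we have that
\[ \|\tilde\Sigma - \Sigma\| \le c(L)\|\Sigma\| \sqrt{\frac{\gamma\, r(\Sigma)}{N}}, \tag{2.1} \]
and
\[ |\mathrm{Tr}(\tilde\Sigma) - \mathrm{Tr}(\Sigma)| \le c(L)\,\mathrm{Tr}(\Sigma) \sqrt{\frac{\gamma\, r(\Sigma)}{N}}, \tag{2.2} \]
where $c(L)$ is a constant that depends only on $L$.

Proof. Observe that
\[
\begin{aligned}
\|\Sigma - \tilde\Sigma\| &= \sup_{u,v \in S^{d-1}} \big| u^T \big( \mathbb{E}(X \otimes X) - \mathbb{E}(\tilde X \otimes \tilde X) \big) v \big|
\le 2 \sup_{u,v \in S^{d-1}} \mathbb{E}\big| \langle X, u\rangle \langle X - \tilde X, v\rangle \big| \\
&= 2 \sup_{u,v \in S^{d-1}} \mathbb{E}\big| \langle X, u\rangle \langle X, v\rangle \big| \mathbb{1}_{\{\|X\|_2 > \alpha\}}
\le 2 \sup_{u,v \in S^{d-1}} \big(\mathbb{E}\langle X, u\rangle^4\big)^{1/4} \big(\mathbb{E}\langle X, v\rangle^4\big)^{1/4}\, \mathrm{Pr}^{1/2}(\|X\|_2 > \alpha).
\end{aligned}
\]
By the $L_4$-$L_2$ norm equivalence,
\[ \sup_{u \in S^{d-1}} \big(\mathbb{E}\langle X, u\rangle^4\big)^{1/4} \le L \sup_{u \in S^{d-1}} \big(\mathbb{E}\langle X, u\rangle^2\big)^{1/2} = L\|\Sigma\|^{1/2} \]
and
\[
\begin{aligned}
\mathbb{E}\|X\|_2^4 &= \mathbb{E}\Big( \sum_{i=1}^d \langle X, e_i\rangle^2 \Big)^2 = \mathbb{E}\sum_{i,j} \langle X, e_i\rangle^2 \langle X, e_j\rangle^2
\le \sum_{i,j} \big(\mathbb{E}\langle X, e_i\rangle^4\big)^{1/2} \big(\mathbb{E}\langle X, e_j\rangle^4\big)^{1/2} \\
&\le L^4 \sum_{i,j} \mathbb{E}\langle X, e_i\rangle^2\, \mathbb{E}\langle X, e_j\rangle^2 = L^4 \sum_{i,j} \Sigma_{ii}\Sigma_{jj} = L^4 \big(\mathrm{Tr}(\Sigma)\big)^2.
\end{aligned}
\tag{2.3}
\]
Recalling the definition of $\alpha$, we have that
\[ \mathrm{Pr}^{1/2}(\|X\|_2 > \alpha) \le \left( \frac{\mathbb{E}\|X\|_2^4}{\alpha^4} \right)^{1/2} \le \left( \frac{L^4 \gamma \big(\mathrm{Tr}(\Sigma)\big)^2}{N\,\mathrm{Tr}(\Sigma)\,\|\Sigma\|} \right)^{1/2} = L^2 \sqrt{\frac{\gamma\,\mathrm{Tr}(\Sigma)}{N\|\Sigma\|}} = L^2 \sqrt{\frac{\gamma\, r(\Sigma)}{N}}, \tag{2.4} \]
and combining the two observations,
\[ \|\tilde\Sigma - \Sigma\| \le c(L)\|\Sigma\| \sqrt{\frac{\gamma\, r(\Sigma)}{N}}, \tag{2.5} \]
as claimed.

Turning to the second part of the lemma, note that
\[ \mathrm{Tr}(\Sigma) = \sum_{i=1}^d \mathbb{E}\langle X, e_i\rangle^2 \quad \text{and} \quad \mathrm{Tr}(\tilde\Sigma) = \sum_{i=1}^d \mathbb{E}\langle X, e_i\rangle^2 \mathbb{1}_{\{\|X\|_2 \le \alpha\}}. \]
Therefore, by the $L_4$-$L_2$ norm equivalence and (2.4),
\[
\begin{aligned}
|\mathrm{Tr}(\tilde\Sigma) - \mathrm{Tr}(\Sigma)| &= \sum_{i=1}^d \mathbb{E}\langle X, e_i\rangle^2 \mathbb{1}_{\{\|X\|_2 > \alpha\}}
\le \sum_{i=1}^d \big(\mathbb{E}\langle X, e_i\rangle^4\big)^{1/2}\, \mathrm{Pr}^{1/2}(\|X\|_2 > \alpha) \\
&\le L^2 \sum_{i=1}^d \big(\mathbb{E}\langle X, e_i\rangle^2\big)\, \mathrm{Pr}^{1/2}(\|X\|_2 > \alpha)
\le c(L)\,\mathrm{Tr}(\Sigma) \sqrt{\frac{\gamma\, r(\Sigma)}{N}}.
\end{aligned}
\]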

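Clearly, by the first part of Lemma 2.1 it suffices to address the covariance estimation problem for the random vector $\tilde X$, since
\[ \|\hat\Sigma_\delta - \Sigma\| \le \|\hat\Sigma_\delta - \tilde\Sigma\| + \|\tilde\Sigma - \Sigma\|, \]
and $\|\tilde\Sigma - \Sigma\|$ is smaller than the wanted accuracy. Thus, from here on we set $W = \tilde X \otimes \tilde X \in \mathbb{R}^{d \times d}$, and the norm $\|\cdot\|$ is the operator norm. As a result, the estimation procedure of Theorem 1.7 is the following.

Let $X_1, \ldots, X_N$ be the given sample, let $\tilde X_i = X_i \mathbb{1}_{\{\|X_i\|_2 \le \alpha\}}$ where $\alpha$ is given in Definition 1.8, and set $0 < \delta < 1$. Let $n = \log(2/\delta)$ and split the sample into $n$ blocks $I_j$, each of cardinality $m = N/n$; set
\[ M_j = \frac{1}{m} \sum_{i \in I_j} \tilde X_i \otimes \tilde X_i. \]
Let $T = \{(u,v) : u, v \in S^{d-1}\}$ and for $\varepsilon > 0$ and a pair $(u,v)$ let
\[ S_{u,v}(\varepsilon) = \big\{ Y \in \mathbb{R}^{d \times d} : |v^T (M_j - Y) u| \le \varepsilon \text{ for more than } n/2 \text{ blocks} \big\}. \]
Set $S(\varepsilon) = \bigcap_{(u,v) \in T} S_{u,v}(\varepsilon)$. Let $\varepsilon_0 = \inf\{\varepsilon > 0 : S(\varepsilon) \ne \emptyset\}$, and choose $\hat\Sigma_\delta$ to be any matrix in $\bigcap_{\varepsilon > \varepsilon_0} S(\varepsilon)$.

Thanks to Theorem 1.7, the proof of Theorem 1.9 follows once sufficient control on $\mathbb{E}\|Y_N\|$, $\mathbb{E}\|G\|$ and $R$ is established in the two cases we are interested in.

Controlling $R$

The required estimate on $R$ is presented in the next lemma.

Lemma 2.2. Assume that $X$ is zero mean and satisfies an $L_4$-$L_2$ norm equivalence with constant $L$. Then
\[ R \le v(X) \lesssim_L \|\Sigma\|, \]
where $v^2(X) = \sup_{v \in S^{d-1}} \mathbb{E}\langle X, v\rangle^4$.

Proof. For every $u, v \in S^{d-1}$,
\[ \mathbb{E}\big( v^T (\tilde X \otimes \tilde X - \tilde\Sigma) u \big)^2 = \mathbb{E}\langle \tilde X, v\rangle^2 \langle \tilde X, u\rangle^2 - (v^T \tilde\Sigma u)^2 \le \mathbb{E}\langle \tilde X, v\rangle^2 \langle \tilde X, u\rangle^2 \le \big(\mathbb{E}\langle X, v\rangle^4\big)^{1/2} \big(\mathbb{E}\langle X, u\rangle^4\big)^{1/2}, \]
where we have used the fact that $\mathbb{E}\langle \tilde X, v\rangle \langle \tilde X, u\rangle = v^T \tilde\Sigma u$. Thus, $R \le v(X)$. Also, recalling that $X$ satisfies an $L_4$-$L_2$ norm equivalence,
\[ \mathbb{E}\langle X, v\rangle^4 \le L^4 \big(\mathbb{E}\langle X, v\rangle^2\big)^2 \le L^4 \|\Sigma\|^2, \]
implying that $v(X) \le L^2 \|\Sigma\|$, as claimed.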
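To make the sets $S_{u,v}(\varepsilon)$ from the procedure described earlier in this section concrete, the sketch below computes the block matrices $M_j$ and, for a candidate matrix $Y$, the smallest $\varepsilon$ for which $Y \in S_{u,v}(\varepsilon)$ for every pair in a given finite list. Replacing the full index set $T$ by a finite list of pairs is a simplifying assumption made here for illustration; it is not the paper's procedure, and the function names are hypothetical. The input is assumed to be the already truncated sample.

```python
def block_second_moments(sample, n_blocks):
    """M_j = (1/m) * sum over block j of x x^T, for consecutive blocks of size m = N // n_blocks."""
    m = len(sample) // n_blocks
    d = len(sample[0])
    blocks = []
    for j in range(n_blocks):
        acc = [[0.0] * d for _ in range(d)]
        for x in sample[j * m:(j + 1) * m]:
            for r in range(d):
                for c in range(d):
                    acc[r][c] += x[r] * x[c] / m
        blocks.append(acc)
    return blocks

def bilinear(v, mat, u):
    """Evaluate v^T mat u."""
    return sum(v[r] * mat[r][c] * u[c] for r in range(len(v)) for c in range(len(u)))

def acceptance_radius(candidate, blocks, pairs):
    """Smallest eps such that, for every (u, v) in pairs,
    |v^T (M_j - candidate) u| <= eps holds for more than half of the blocks."""
    k = len(blocks) // 2 + 1
    worst = 0.0
    for u, v in pairs:
        target = bilinear(v, candidate, u)
        devs = sorted(abs(bilinear(v, m_j, u) - target) for m_j in blocks)
        worst = max(worst, devs[k - 1])   # k-th smallest deviation
    return worst

# One-dimensional toy data: block second moments are 1, 4 and 100.
sample = [[1.0], [1.0], [2.0], [2.0], [10.0], [10.0]]
blocks = block_second_moments(sample, 3)
print(acceptance_radius([[2.5]], blocks, [([1.0], [1.0])]))   # 1.5
```

A candidate with a small acceptance radius agrees with most block matrices in every tested direction, which is the property that makes the estimator insensitive to a minority of corrupted or heavy-tailed blocks.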

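Controlling $\mathbb{E}\|G\|$ and $\mathbb{E}\|Y_N\|$

In the context of Theorem 1.7, $G$ is the zero mean gaussian vector on $\mathbb{R}^{d \times d}$ whose covariance coincides with that of $W = \tilde X \otimes \tilde X$. Instead of dealing with that vector directly, note that
\[ \mathbb{E}\|G\| \le \liminf_{N \to \infty} \mathbb{E}\|Y_N\|. \tag{2.6} \]
Indeed,
\[ \mathbb{E}\|G\| = \sup\Big\{ \mathbb{E}\max_{x^* \in T} x^*(G) : T \subset B^\circ,\ T \text{ is finite} \Big\}, \]
and by the multivariate CLT, $\big\{ N^{-1/2} \sum_{i=1}^N x^*(W_i - \mathbb{E}W) : x^* \in T \big\}$ converges weakly to $\{x^*(G) : x^* \in T\}$. Hence, (2.6) follows from tail integration. Thanks to (2.6), all that remains is to bound $\mathbb{E}\|Y_N\|$.

The subgaussian case

Fix an integer $N$ and note that
\[ \Big\| \frac{1}{N} \sum_{i=1}^N \tilde X_i \otimes \tilde X_i - \tilde\Sigma \Big\| = \sup_{u \in S^{d-1}} \Big| \frac{1}{N} \sum_{i=1}^N \langle \tilde X_i, u\rangle^2 - \mathbb{E}\langle \tilde X, u\rangle^2 \Big|, \tag{2.7} \]
which is the supremum of a quadratic empirical process indexed by $S^{d-1}$. Such empirical processes have been studied extensively (see, e.g., [6, 7, 8]), mainly using chaining methods. As it happens, quadratic subgaussian processes may be controlled in terms of a natural metric invariant of the indexing class, the so-called $\gamma_2$ functional (see footnote 3). In the case of (2.7), the indexing class is $S^{d-1}$, whose elements are viewed as linear functionals on $\mathbb{R}^d$, and the underlying metric is the $\psi_2$ norm endowed by the random vector $\tilde X$. By Corollary 1.9 from [8] we have that
\[ \mathbb{E}\sup_{u \in S^{d-1}} \Big| \frac{1}{N} \sum_{i=1}^N \langle \tilde X_i, u\rangle^2 - \mathbb{E}\langle \tilde X, u\rangle^2 \Big| \le c \left( D\,\frac{\gamma_2(S^{d-1}, \psi_2(\tilde X))}{\sqrt{N}} + \frac{\gamma_2^2(S^{d-1}, \psi_2(\tilde X))}{N} \right), \tag{2.8} \]
where $c$ is an absolute constant and
\[ D = D(S^{d-1}, \psi_2) = \sup_{u \in S^{d-1}} \|\langle \tilde X, u\rangle\|_{\psi_2} \sim \sup_{u \in S^{d-1}} \sup_{p \ge 2} \frac{\big(\mathbb{E}|\langle \tilde X, u\rangle|^p\big)^{1/p}}{\sqrt{p}}. \]

To estimate (2.8) one requires two facts (see, e.g., [15] for more details). Firstly, a general property of the $\gamma_2$ functional is monotonicity in $d$: if $(T, d)$ is a metric space and $d'$ is another metric on $T$ which satisfies that for every $t_1, t_2 \in T$, $d(t_1, t_2) \le \kappa\, d'(t_1, t_2)$, then $\gamma_2(T, d) \le \kappa\, \gamma_2(T, d')$. Here, we have that for every $p \ge 2$ and every $u \in \mathbb{R}^d$,
\[ \big(\mathbb{E}|\langle \tilde X, u\rangle|^p\big)^{1/p} \le \big(\mathbb{E}|\langle X, u\rangle|^p\big)^{1/p} \le L\sqrt{p}\,\big(\mathbb{E}\langle X, u\rangle^2\big)^{1/2}, \]
implying that $\|\langle \tilde X, u\rangle\|_{\psi_2} \lesssim L\|\langle X, u\rangle\|_{L_2}$; hence,
\[ \gamma_2(S^{d-1}, \psi_2(\tilde X)) \lesssim L\,\gamma_2(S^{d-1}, L_2(X)). \]
Secondly, by Talagrand's majorizing measures theorem, if $G$ is a zero mean gaussian random vector with the same covariance as $X$ then
\[ \gamma_2(S^{d-1}, L_2(X)) \le c\,\mathbb{E}\sup_{u \in S^{d-1}} \langle G, u\rangle \le c\,\big(\mathbb{E}\|G\|_2^2\big)^{1/2} = c\sqrt{\mathrm{Tr}(\Sigma)}, \]
for a suitable absolute constant $c$. Finally, again thanks to the fact that $X$ is $L$-subgaussian,
\[ D \lesssim L \sup_{u \in S^{d-1}} \|\langle X, u\rangle\|_{L_2} = L\|\Sigma\|^{1/2}. \]
Therefore, by (2.8), for every $N$,
\[ \mathbb{E}\|Y_N\| \le c(L) \left( \|\Sigma\|^{1/2} \sqrt{\mathrm{Tr}(\Sigma)} + \frac{\mathrm{Tr}(\Sigma)}{\sqrt{N}} \right), \]
and in particular,
\[ \liminf_{N \to \infty} \mathbb{E}\|Y_N\| \le c(L)\|\Sigma\|^{1/2} \sqrt{\mathrm{Tr}(\Sigma)}. \]
This completes the proof of the first part of Theorem 1.9.

Footnote 3. Rather than defining the $\gamma_2$ functional, we refer the reader to [15] for a detailed exposition on the topic, and to [8, 6, 7] for the study of the quadratic empirical process in this and more general situations.

$L_4$-$L_2$ norm equivalence

Just as in the subgaussian case, the key issue is finding a suitable estimate on $\mathbb{E}\|Y_N\|$. Thanks to the fact that $\tilde X$ is a truncated random vector one may apply a version of the matrix Bernstein inequality. We invoke a corollary from the survey [16] (which is a slightly modified version of the original result from [9]): if $Z$ is a random vector which satisfies that $\|Z \otimes Z\| \le \beta$ almost surely, and $B = \mathbb{E}(Z \otimes Z)^2$, then
\[ \mathbb{E}\Big\| \frac{1}{N} \sum_{i=1}^N Z_i \otimes Z_i - \mathbb{E}(Z \otimes Z) \Big\| \le c \left( \sqrt{\frac{\|B\| \log r(B)}{N}} + \frac{\beta \log r(B)}{N} \right). \tag{2.9} \]

In our case, $Z = X \mathbb{1}_{\{\|X\|_2 \le \alpha\}}$ for $\alpha$ as in Definition 1.8, and all that remains is to estimate $\|B\|$ and $r(B)$. It is straightforward to verify that
\[ c\,\|\tilde\Sigma\|\,\mathrm{Tr}(\tilde\Sigma) \le \|B\| \le c'(L)\,\|\Sigma\|\,\mathrm{Tr}(\Sigma) \quad \text{and} \quad \mathrm{Tr}(B) \le c'(L)\big(\mathrm{Tr}(\Sigma)\big)^2: \]
indeed, the upper estimates on $\|B\|$ and $\mathrm{Tr}(B)$ follow from a direct computation and the fact that $X$ satisfies an $L_4$-$L_2$ norm equivalence (see, e.g., Lemma 4.1 in [12]); the lower estimate is an outcome of the FKG inequality (see Corollary 5.1 in the supplementary material to [11]). Moreover, by Lemma 2.1 and using its notation, both $\|\tilde\Sigma\|$ and $\mathrm{Tr}(\tilde\Sigma)$ are equivalent to $\|\Sigma\|$ and $\mathrm{Tr}(\Sigma)$ respectively, as long as $N \ge c(L)\gamma\, r(\Sigma)$; hence, $r(B) \lesssim_L r(\Sigma)$. Finally, observe that $\|Z \otimes Z\| = \|Z\|_2^2 \le \alpha^2$.

By (2.9), the choice of the truncation level $\alpha$, and the fact that $N \gtrsim_L r(\Sigma) \log r(\Sigma)$,
\[ \mathbb{E}\|Y_N\| \le c(L) \left( \|\Sigma\|^{1/2} \sqrt{\mathrm{Tr}(\Sigma) \log r(\Sigma)} + \frac{\alpha^2 \log r(\Sigma)}{\sqrt{N}} \right) \le c(L)\|\Sigma\| \sqrt{r(\Sigma) \log r(\Sigma)}. \]
In particular,
\[ \liminf_{N \to \infty} \mathbb{E}\|Y_N\| \le c(L) \sqrt{\|\Sigma\|\,\mathrm{Tr}(\Sigma) \log r(\Sigma)}, \]
which completes the proof of the second part of Theorem 1.9.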
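For the reader's convenience, here is a worked version of the last step, showing why the truncation level balances the two terms of the matrix Bernstein bound. It is a sketch that assumes $\alpha = (N\,\mathrm{Tr}(\Sigma)\,\|\Sigma\|/\gamma)^{1/4}$ with $\gamma = \log r(\Sigma)$, and uses $\mathrm{Tr}(\Sigma) = r(\Sigma)\|\Sigma\|$.

```latex
\[
\alpha^2 = \sqrt{\frac{N\,\mathrm{Tr}(\Sigma)\,\|\Sigma\|}{\log r(\Sigma)}}
\quad\Longrightarrow\quad
\frac{\alpha^2 \log r(\Sigma)}{\sqrt{N}}
= \sqrt{\mathrm{Tr}(\Sigma)\,\|\Sigma\|\,\log r(\Sigma)}
= \|\Sigma\| \sqrt{r(\Sigma)\log r(\Sigma)},
\]
\[
\|\Sigma\|^{1/2}\sqrt{\mathrm{Tr}(\Sigma)\log r(\Sigma)}
= \|\Sigma\| \sqrt{r(\Sigma)\log r(\Sigma)} .
\]
```

Both terms are therefore of the same order, and dividing by $\sqrt{N}$ as in Theorem 1.7 gives the strong term $\|\Sigma\|\sqrt{r(\Sigma)\log r(\Sigma)/N}$ of (1.8).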

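Concluding remarks

The drawback of our estimator is that it requires prior information on $\mathrm{Tr}(\Sigma)$ and $\|\Sigma\|$. This issue has already been addressed in [11, 12] using Lepski's method. The alternative we present is to handle the problem by constructing appropriate median of means estimators $\hat\varphi_1$ and $\hat\varphi_2$, and for our purpose it suffices that $\hat\varphi_1 \sim \mathrm{Tr}(\Sigma)$ and $\hat\varphi_2 \sim \|\Sigma\|$ with high probability. The freedom to estimate the quantities in question up to an absolute multiplicative constant simplifies the problem considerably.

Consider the problem of trace estimation. Since $\mathrm{Tr}(\Sigma) = \mathbb{E}\sum_{i=1}^d \langle X, e_i\rangle^2$, a standard median-of-means estimator $\hat\varphi_1$ of $\mathbb{E}\sum_{i=1}^d \langle X, e_i\rangle^2$ satisfies that with probability at least $1-\delta$,
\[ |\hat\varphi_1 - \mathrm{Tr}(\Sigma)| \le c \sqrt{\mathrm{Var}\Big( \sum_{i=1}^d \langle X, e_i\rangle^2 \Big) \frac{\log(1/\delta)}{N}} \]
(see [13] for what is by now a standard argument). Using (2.3) we have $\mathrm{Var}\big( \sum_{i=1}^d \langle X, e_i\rangle^2 \big) \le (L^4 - 1)\big(\mathrm{Tr}(\Sigma)\big)^2$. Therefore,
\[ |\hat\varphi_1 - \mathrm{Tr}(\Sigma)| \le c(L)\,\mathrm{Tr}(\Sigma) \sqrt{\frac{\log(1/\delta)}{N}}, \]
which implies that in the regime $N \ge c'(L) \log(1/\delta)$, one has $\hat\varphi_1 \sim \mathrm{Tr}(\Sigma)$ with probability at least $1-\delta$.

The estimation of $\|\Sigma\|$ may be addressed in a similar fashion. Because it is not the focus of this note, and for the sake of brevity, we just sketch an argument that leads to a bound that depends on the dimension $d$, rather than on $r(\Sigma)$. The more accurate estimate can be derived from Theorem 2 in [5]. Let $\mathcal{N}$ be a minimal $1/4$ cover of $S^{d-1}$ with respect to the Euclidean norm. Thus, $\|\Sigma\| \sim \sup_{u \in \mathcal{N}} u^T \Sigma u$. For any fixed $u$ the median of means estimator $\hat\varphi_{2,u}$ of $\mathbb{E}\,u^T (X \otimes X) u$ satisfies that with probability at least $1-\delta$,
\[ |\hat\varphi_{2,u} - u^T \Sigma u| \le c(L)\|\Sigma\| \sqrt{\frac{\log(1/\delta)}{N}}, \]
because $\mathrm{Var}\big( u^T (X \otimes X) u \big) \le L^4 \|\Sigma\|^2$. Finally, recalling that $|\mathcal{N}| \le 9^d$, the union bound shows that with probability at least $1-\delta$,
\[ \sup_{u \in \mathcal{N}} |\hat\varphi_{2,u} - u^T \Sigma u| \le c'(L)\|\Sigma\| \sqrt{\frac{d + \log(1/\delta)}{N}}. \]
Therefore, when $N \ge c''(L)\big(\log(1/\delta) + d\big)$, one has that $\sup_{u \in \mathcal{N}} \hat\varphi_{2,u} \sim \|\Sigma\|$ with probability at least $1-\delta$.

Finally, let us give an example showing that there could be a substantial gap between $R$ and $\|\Sigma\|$ (as well as between $R$ and $v(X)$), which is a reason for the sub-optimality of Theorem 1.2 (Theorem 9 in [2]).

Example 2.3. Let $X = (X^{(1)}, \ldots, X^{(d)})$ where $X^{(i)} = \alpha_i \varepsilon_i$; $(\varepsilon_i)_{i=1}^d$ are independent, symmetric, $\{-1, 1\}$-valued random variables; and $\alpha_1 > \ldots > \alpha_d \ge 0$. Since the $X^{(i)}$'s are centered, independent and subgaussian with an absolute constant, $X$ is a centered, $L$-subgaussian random vector for some absolute constant $L$. If we set $\Sigma = \mathbb{E}(X \otimes X)$ then clearly $\|\Sigma\| = \alpha_1^2$, $r(\Sigma) = \sum_{i=1}^d \alpha_i^2 / \alpha_1^2$ and
\[
\begin{aligned}
\mathbb{E}\big( v^T (X \otimes X - \Sigma) u \big)^2 &= \mathbb{E}\Big( \sum_{i \ne j} v_i u_j X^{(i)} X^{(j)} \Big)^2 = \sum_{i \ne j} \alpha_i^2 \alpha_j^2 \big( v_i^2 u_j^2 + v_i v_j u_i u_j \big) \\
&\le (\alpha_1 \alpha_2)^2 \sum_{i,j} \big( (v_i u_j)^2 + |v_i v_j u_i u_j| \big) \le (\alpha_1 \alpha_2)^2 \big( \|v\|_2^2 \|u\|_2^2 + \langle |v|, |u| \rangle^2 \big) \le 2(\alpha_1 \alpha_2)^2.
\end{aligned}
\]
Therefore, by a suitable choice of $\alpha_2$,
\[ R \le \sqrt{2}\,\alpha_1 \alpha_2 \ll \alpha_1^2 = \|\Sigma\|, \tag{2.10} \]
and the gap between $R$ and $\|\Sigma\|$ may be arbitrarily large.

The inequality (2.10) is the best one can hope for. Indeed, let $Y$ be a centered random vector taking its values in $\mathbb{R}^d$ with $\Sigma = \mathbb{E}(Y \otimes Y)$. Then for $R = R(Y)$ it holds that
\[ \big\| \mathbb{E}(Y \otimes Y - \Sigma)^2 \big\| = \Big\| \mathbb{E}\,(Y \otimes Y - \Sigma) \Big( \sum_{i=1}^d e_i e_i^T \Big) (Y \otimes Y - \Sigma) \Big\| \le \sum_{i=1}^d \sup_{v \in S^{d-1}} \mathbb{E}\big( e_i^T (Y \otimes Y - \Sigma) v \big)^2 \le dR^2. \]
As before, Corollary 5.1 in [11] implies $\|\mathbb{E}(Y \otimes Y)^2\| \ge \mathrm{Tr}(\Sigma)\|\Sigma\|$. Therefore,
\[ dR^2 \ge \big\| \mathbb{E}(Y \otimes Y - \Sigma)^2 \big\| \ge \big\| \mathbb{E}(Y \otimes Y)^2 \big\| - \|\Sigma\|^2 \ge \mathrm{Tr}(\Sigma)\|\Sigma\| - \|\Sigma\|^2. \]
This gives the following general lower bound:
\[ R \ge \|\Sigma\| \sqrt{\frac{r(\Sigma) - 1}{d}}. \tag{2.11} \]
When all of $\alpha_2, \ldots, \alpha_d$ are of the same order, $R = R(X)$ satisfies (2.11) up to multiplicative constant factors.

References

[1] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, pages 1148-1185, 2012.
[2] V. Koltchinskii and K. Lounici. Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 2017.
[3] R. Latała and J. O. Wojtaszczyk. On the infimum convolution inequality. Studia Mathematica.
[4] K. Lounici. High-dimensional covariance matrix estimation with missing observations. Bernoulli, 2014.
[5] G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms.
[6] S. Mendelson. Empirical processes with a bounded $\psi_1$-diameter. Geometric and Functional Analysis.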

[7] S. Mendelson. Upper bounds on product and multiplier empirical processes. Stochastic Processes and their Applications, 126, 2016.
[8] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Reconstruction and subgaussian operators in asymptotic geometric analysis. Geometric and Functional Analysis, 17(4).
[9] S. Minsker. On some extensions of Bernstein's inequality for self-adjoint operators. Statistics and Probability Letters, 2017.
[10] S. Minsker. Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. Annals of Statistics, 46, 2018.
[11] S. Minsker and X. Wei. Estimation of the covariance structure of heavy-tailed distributions. NIPS, 2017.
[12] S. Minsker and X. Wei. Robust modifications of U-statistics and applications to covariance estimation problems.
[13] A. Nemirovski and D. Yudin. Problem complexity and method efficiency in optimization. John Wiley and Sons Inc., 1983.
[14] W. So. Facial structure of Schatten p-norms. Linear and Multilinear Algebra, 1990.
[15] M. Talagrand. Upper and lower bounds for stochastic processes: modern methods and classical problems, volume 60. Springer Science & Business Media, 2014.
[16] J. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning.


More information

MAJORIZING MEASURES WITHOUT MEASURES. By Michel Talagrand URA 754 AU CNRS

MAJORIZING MEASURES WITHOUT MEASURES. By Michel Talagrand URA 754 AU CNRS The Annals of Probability 2001, Vol. 29, No. 1, 411 417 MAJORIZING MEASURES WITHOUT MEASURES By Michel Talagrand URA 754 AU CNRS We give a reformulation of majorizing measures that does not involve measures,

More information

Sparse and Low Rank Recovery via Null Space Properties

Sparse and Low Rank Recovery via Null Space Properties Sparse and Low Rank Recovery via Null Space Properties Holger Rauhut Lehrstuhl C für Mathematik (Analysis), RWTH Aachen Convexity, probability and discrete structures, a geometric viewpoint Marne-la-Vallée,

More information

Invertibility of random matrices

Invertibility of random matrices University of Michigan February 2011, Princeton University Origins of Random Matrix Theory Statistics (Wishart matrices) PCA of a multivariate Gaussian distribution. [Gaël Varoquaux s blog gael-varoquaux.info]

More information

Anti-concentration Inequalities

Anti-concentration Inequalities Anti-concentration Inequalities Roman Vershynin Mark Rudelson University of California, Davis University of Missouri-Columbia Phenomena in High Dimensions Third Annual Conference Samos, Greece June 2007

More information

Refining the Central Limit Theorem Approximation via Extreme Value Theory

Refining the Central Limit Theorem Approximation via Extreme Value Theory Refining the Central Limit Theorem Approximation via Extreme Value Theory Ulrich K. Müller Economics Department Princeton University February 2018 Abstract We suggest approximating the distribution of

More information

Case study: stochastic simulation via Rademacher bootstrap

Case study: stochastic simulation via Rademacher bootstrap Case study: stochastic simulation via Rademacher bootstrap Maxim Raginsky December 4, 2013 In this lecture, we will look at an application of statistical learning theory to the problem of efficient stochastic

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

arxiv: v1 [math.st] 5 Jul 2007

arxiv: v1 [math.st] 5 Jul 2007 EXPLICIT FORMULA FOR COSTRUCTIG BIOMIAL COFIDECE ITERVAL WITH GUARATEED COVERAGE PROBABILITY arxiv:77.837v [math.st] 5 Jul 27 XIJIA CHE, KEMI ZHOU AD JORGE L. ARAVEA Abstract. In this paper, we derive

More information

arxiv: v1 [math.pr] 11 Feb 2019

arxiv: v1 [math.pr] 11 Feb 2019 A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm arxiv:190.03736v1 math.pr] 11 Feb 019 Chi Jin University of California, Berkeley chijin@cs.berkeley.edu Rong Ge Duke

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

Lecture 3. Random Fourier measurements

Lecture 3. Random Fourier measurements Lecture 3. Random Fourier measurements 1 Sampling from Fourier matrices 2 Law of Large Numbers and its operator-valued versions 3 Frames. Rudelson s Selection Theorem Sampling from Fourier matrices Our

More information

Matrix concentration inequalities

Matrix concentration inequalities ELE 538B: Mathematics of High-Dimensional Data Matrix concentration inequalities Yuxin Chen Princeton University, Fall 2018 Recap: matrix Bernstein inequality Consider a sequence of independent random

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Rademacher Complexity Bounds for Non-I.I.D. Processes

Rademacher Complexity Bounds for Non-I.I.D. Processes Rademacher Complexity Bounds for Non-I.I.D. Processes Mehryar Mohri Courant Institute of Mathematical ciences and Google Research 5 Mercer treet New York, NY 00 mohri@cims.nyu.edu Afshin Rostamizadeh Department

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Analysis of Thompson Sampling for the multi-armed bandit problem

Analysis of Thompson Sampling for the multi-armed bandit problem Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

Stability of optimization problems with stochastic dominance constraints

Stability of optimization problems with stochastic dominance constraints Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:

The Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply

More information

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables Deli Li 1, Yongcheng Qi, and Andrew Rosalsky 3 1 Department of Mathematical Sciences, Lakehead University,

More information

Notes on Gaussian processes and majorizing measures

Notes on Gaussian processes and majorizing measures Notes on Gaussian processes and majorizing measures James R. Lee 1 Gaussian processes Consider a Gaussian process {X t } for some index set T. This is a collection of jointly Gaussian random variables,

More information

Small ball probabilities and metric entropy

Small ball probabilities and metric entropy Small ball probabilities and metric entropy Frank Aurzada, TU Berlin Sydney, February 2012 MCQMC Outline 1 Small ball probabilities vs. metric entropy 2 Connection to other questions 3 Recent results for

More information

Packing-Dimension Profiles and Fractional Brownian Motion

Packing-Dimension Profiles and Fractional Brownian Motion Under consideration for publication in Math. Proc. Camb. Phil. Soc. 1 Packing-Dimension Profiles and Fractional Brownian Motion By DAVAR KHOSHNEVISAN Department of Mathematics, 155 S. 1400 E., JWB 233,

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

Supremum of simple stochastic processes

Supremum of simple stochastic processes Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there

More information

Gravitational allocation to Poisson points

Gravitational allocation to Poisson points Gravitational allocation to Poisson points Sourav Chatterjee joint work with Ron Peled Yuval Peres Dan Romik Allocation rules Let Ξ be a discrete subset of R d. An allocation (of Lebesgue measure to Ξ)

More information

GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. 2π) n

GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. 2π) n GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. A. ZVAVITCH Abstract. In this paper we give a solution for the Gaussian version of the Busemann-Petty problem with additional

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

A Bernstein-Chernoff deviation inequality, and geometric properties of random families of operators

A Bernstein-Chernoff deviation inequality, and geometric properties of random families of operators A Bernstein-Chernoff deviation inequality, and geometric properties of random families of operators Shiri Artstein-Avidan, Mathematics Department, Princeton University Abstract: In this paper we first

More information

A Lower Bound for the Size of Syntactically Multilinear Arithmetic Circuits

A Lower Bound for the Size of Syntactically Multilinear Arithmetic Circuits A Lower Bound for the Size of Syntactically Multilinear Arithmetic Circuits Ran Raz Amir Shpilka Amir Yehudayoff Abstract We construct an explicit polynomial f(x 1,..., x n ), with coefficients in {0,

More information

Fréchet differentiability of the norm of L p -spaces associated with arbitrary von Neumann algebras

Fréchet differentiability of the norm of L p -spaces associated with arbitrary von Neumann algebras Fréchet differentiability of the norm of L p -spaces associated with arbitrary von Neumann algebras (Part I) Fedor Sukochev (joint work with D. Potapov, A. Tomskova and D. Zanin) University of NSW, AUSTRALIA

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

The Moment Method; Convex Duality; and Large/Medium/Small Deviations

The Moment Method; Convex Duality; and Large/Medium/Small Deviations Stat 928: Statistical Learning Theory Lecture: 5 The Moment Method; Convex Duality; and Large/Medium/Small Deviations Instructor: Sham Kakade The Exponential Inequality and Convex Duality The exponential

More information

Sharp Generalization Error Bounds for Randomly-projected Classifiers

Sharp Generalization Error Bounds for Randomly-projected Classifiers Sharp Generalization Error Bounds for Randomly-projected Classifiers R.J. Durrant and A. Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/ axk

More information

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS*

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS* LARGE EVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILE EPENENT RANOM VECTORS* Adam Jakubowski Alexander V. Nagaev Alexander Zaigraev Nicholas Copernicus University Faculty of Mathematics and Computer Science

More information

Generalization theory

Generalization theory Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1

More information

On Some Extensions of Bernstein s Inequality for Self-Adjoint Operators

On Some Extensions of Bernstein s Inequality for Self-Adjoint Operators On Some Extensions of Bernstein s Inequality for Self-Adjoint Operators Stanislav Minsker e-mail: sminsker@math.duke.edu Abstract: We present some extensions of Bernstein s inequality for random self-adjoint

More information

THE UNIFORMISATION THEOREM OF RIEMANN SURFACES

THE UNIFORMISATION THEOREM OF RIEMANN SURFACES THE UNIFORISATION THEORE OF RIEANN SURFACES 1. What is the aim of this seminar? Recall that a compact oriented surface is a g -holed object. (Classification of surfaces.) It can be obtained through a 4g

More information

arxiv: v1 [math.pr] 22 Dec 2018

arxiv: v1 [math.pr] 22 Dec 2018 arxiv:1812.09618v1 [math.pr] 22 Dec 2018 Operator norm upper bound for sub-gaussian tailed random matrices Eric Benhamou Jamal Atif Rida Laraki December 27, 2018 Abstract This paper investigates an upper

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES Lithuanian Mathematical Journal, Vol. 4, No. 3, 00 AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES V. Bentkus Vilnius Institute of Mathematics and Informatics, Akademijos 4,

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

Structured signal recovery from non-linear and heavy-tailed measurements

Structured signal recovery from non-linear and heavy-tailed measurements Structured signal recovery from non-linear and heavy-tailed measurements Larry Goldstein* Stanislav Minsker* Xiaohan Wei # *Department of Mathematics # Department of Electrical Engineering UniversityofSouthern

More information

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA) The circular law Lewis Memorial Lecture / DIMACS minicourse March 19, 2008 Terence Tao (UCLA) 1 Eigenvalue distributions Let M = (a ij ) 1 i n;1 j n be a square matrix. Then one has n (generalised) eigenvalues

More information

TOPOLOGICAL EQUIVALENCE OF LINEAR ORDINARY DIFFERENTIAL EQUATIONS

TOPOLOGICAL EQUIVALENCE OF LINEAR ORDINARY DIFFERENTIAL EQUATIONS TOPOLOGICAL EQUIVALENCE OF LINEAR ORDINARY DIFFERENTIAL EQUATIONS ALEX HUMMELS Abstract. This paper proves a theorem that gives conditions for the topological equivalence of linear ordinary differential

More information

1 Cricket chirps: an example

1 Cricket chirps: an example Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number

More information

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion IEOR 471: Stochastic Models in Financial Engineering Summer 27, Professor Whitt SOLUTIONS to Homework Assignment 9: Brownian motion In Ross, read Sections 1.1-1.3 and 1.6. (The total required reading there

More information

CONSTRAINED PERCOLATION ON Z 2

CONSTRAINED PERCOLATION ON Z 2 CONSTRAINED PERCOLATION ON Z 2 ZHONGYANG LI Abstract. We study a constrained percolation process on Z 2, and prove the almost sure nonexistence of infinite clusters and contours for a large class of probability

More information

Concentration Inequalities for Random Matrices

Concentration Inequalities for Random Matrices Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department

More information

Iowa State University. Instructor: Alex Roitershtein Summer Homework #1. Solutions

Iowa State University. Instructor: Alex Roitershtein Summer Homework #1. Solutions Math 501 Iowa State University Introduction to Real Analysis Department of Mathematics Instructor: Alex Roitershtein Summer 015 EXERCISES FROM CHAPTER 1 Homework #1 Solutions The following version of the

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Chapter 6 Expectation and Conditional Expectation. Lectures Definition 6.1. Two random variables defined on a probability space are said to be

Chapter 6 Expectation and Conditional Expectation. Lectures Definition 6.1. Two random variables defined on a probability space are said to be Chapter 6 Expectation and Conditional Expectation Lectures 24-30 In this chapter, we introduce expected value or the mean of a random variable. First we define expectation for discrete random variables

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information

A note on the convex infimum convolution inequality

A note on the convex infimum convolution inequality A note on the convex infimum convolution inequality Naomi Feldheim, Arnaud Marsiglietti, Piotr Nayar, Jing Wang Abstract We characterize the symmetric measures which satisfy the one dimensional convex

More information

Tail inequalities for additive functionals and empirical processes of. Markov chains

Tail inequalities for additive functionals and empirical processes of. Markov chains Tail inequalities for additive functionals and empirical processes of geometrically ergodic Markov chains University of Warsaw Banff, June 2009 Geometric ergodicity Definition A Markov chain X = (X n )

More information

Stein s Method and Characteristic Functions

Stein s Method and Characteristic Functions Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method

More information

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function

Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of

More information

Concentration inequalities: basics and some new challenges

Concentration inequalities: basics and some new challenges Concentration inequalities: basics and some new challenges M. Ledoux University of Toulouse, France & Institut Universitaire de France Measure concentration geometric functional analysis, probability theory,

More information

1-Bit Matrix Completion

1-Bit Matrix Completion 1-Bit Matrix Completion Mark A. Davenport School of Electrical and Computer Engineering Georgia Institute of Technology Yaniv Plan Mary Wootters Ewout van den Berg Matrix Completion d When is it possible

More information

Gaussian Estimation under Attack Uncertainty

Gaussian Estimation under Attack Uncertainty Gaussian Estimation under Attack Uncertainty Tara Javidi Yonatan Kaspi Himanshu Tyagi Abstract We consider the estimation of a standard Gaussian random variable under an observation attack where an adversary

More information

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and

More information

An almost sure invariance principle for additive functionals of Markov chains

An almost sure invariance principle for additive functionals of Markov chains Statistics and Probability Letters 78 2008 854 860 www.elsevier.com/locate/stapro An almost sure invariance principle for additive functionals of Markov chains F. Rassoul-Agha a, T. Seppäläinen b, a Department

More information

The Canonical Gaussian Measure on R

The Canonical Gaussian Measure on R The Canonical Gaussian Measure on R 1. Introduction The main goal of this course is to study Gaussian measures. The simplest example of a Gaussian measure is the canonical Gaussian measure P on R where

More information

High-dimensional distributions with convexity properties

High-dimensional distributions with convexity properties High-dimensional distributions with convexity properties Bo az Klartag Tel-Aviv University A conference in honor of Charles Fefferman, Princeton, May 2009 High-Dimensional Distributions We are concerned

More information

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information