arxiv: v1 [math.st] 27 Sep 2018
Robust covariance estimation under $L_4$-$L_2$ norm equivalence

Shahar Mendelson, Nikita Zhivotovskiy

Abstract

Let $X$ be a centered random vector taking values in $\mathbb{R}^d$ and let $\Sigma = \mathbb{E}(X \otimes X)$ be its covariance matrix. We show that if $X$ satisfies an $L_4$-$L_2$ norm equivalence, there is a covariance estimator $\hat{\Sigma}$ that exhibits the optimal performance one would expect had $X$ been a gaussian vector. The procedure also improves the current state-of-the-art regarding high probability bounds in the subgaussian case (sharp results were only known in expectation or with constant probability). In both scenarios the new bound does not depend explicitly on the dimension $d$, but rather on the effective rank of the covariance matrix $\Sigma$.

1 Introduction

The question of estimating the covariance of a random vector has been studied extensively in recent years (see, e.g., [2, 4, 10, 11, 12] and references therein). To formulate the problem, let $X$ be a zero mean random vector taking its values in $\mathbb{R}^d$ and denote the covariance matrix by $\Sigma = \mathbb{E}(X \otimes X)$. Given a sample $X_1, \dots, X_N$ consisting of independent random vectors that are distributed according to $X$, the goal is to select a matrix $\hat{\Sigma}$ that approximates $\Sigma$. While there are various notions of approximation, the focus of this note is on approximation with respect to the ($\ell_2 \to \ell_2$) operator norm, which from here on we denote by $\|\cdot\|$.

One way of viewing the question of covariance estimation (with respect to any norm) is as a mean estimation problem. Indeed, if one sets $W = X \otimes X$, then $\mathbb{E}W = \Sigma$, and since one is given a sample $X_1, \dots, X_N$, the vectors $(X_i \otimes X_i)_{i=1}^N$ are independent copies of $W$. Thus, a matrix $\hat{W}$ that is a good approximation of the mean $\mathbb{E}W$ with respect to the underlying norm is a solution to the problem of estimating the covariance of $X$ with respect to that norm.
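The identification $W = X \otimes X$ can be made concrete with a small numerical sketch. The snippet below is only an illustration under assumptions that are not in the text: the function names, the gaussian test distribution and the specific diagonal $\Sigma$ are all hypothetical choices. It computes the average of the outer products $X_i \otimes X_i$ and measures the error in the operator norm.

```python
import numpy as np

def empirical_covariance(X):
    """Average of the outer products X_i (x) X_i for a centered sample X of shape (N, d)."""
    N = X.shape[0]
    return X.T @ X / N

def operator_norm(A):
    """Spectral (l2 -> l2) operator norm, i.e. the largest singular value of A."""
    return np.linalg.norm(A, 2)

# Hypothetical test case: a centered gaussian sample with a known diagonal covariance.
rng = np.random.default_rng(0)
d, N = 5, 20000
Sigma = np.diag([4.0, 2.0, 1.0, 0.5, 0.25])
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)

Sigma_hat = empirical_covariance(X)
err = operator_norm(Sigma_hat - Sigma)
print(f"operator-norm error of the empirical mean: {err:.4f}")
```

Writing the estimator as `X.T @ X / N` is the same as averaging the $N$ outer products, which is the mean-estimation viewpoint described above.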
An immediate outcome of this simple observation is that the empirical mean $\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^N W_i = \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i$, which is the trivial choice for estimating the true mean, is a poor estimator unless the random vector $W$ has a nice tail behavior (see, for example, the discussion in [5]). An example of a positive result of that flavour is Theorem 9 in [2], and to formulate it we need the following definition.

Author affiliations: Shahar Mendelson, Mathematical Sciences Institute, The Australian National University, and Department of Mathematics, Technion, I.I.T, shahar.mendelson@gmail.com. Nikita Zhivotovskiy, Department of Mathematics, Technion, I.I.T, Skoltech and Higher School of Economics, nikita.zhivotovskiy@phystech.edu.
Definition 1.1. The effective rank of a positive semidefinite square matrix $A \in \mathbb{R}^{d \times d}$ is given by

$$r(A) = \frac{\operatorname{Tr}(A)}{\|A\|}. \tag{1.1}$$

Clearly, $r(A) \le d$ but the gap between $r(A)$ and $d$ may be substantial.

Theorem 1.2 ([2]). For every $L \ge 1$ there exists a constant $c(L)$ for which the following holds. Let $X$ be an $L$-subgaussian random vector$^1$. Then with probability at least $1-\delta$,

$$\Big\| \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i - \Sigma \Big\| \le c(L)\|\Sigma\| \left( \sqrt{\frac{r(\Sigma)}{N}} + \frac{r(\Sigma)}{N} + \sqrt{\frac{\log(2/\delta)}{N}} + \frac{\log(2/\delta)}{N} \right). \tag{1.2}$$

It was shown in [2] that if $G$ is a zero mean gaussian vector (and in particular it satisfies the conditions of Theorem 1.2) with covariance $\Sigma$ then

$$\mathbb{E}\Big\| \frac{1}{N}\sum_{i=1}^N G_i \otimes G_i - \Sigma \Big\| \sim \|\Sigma\| \max\left\{ \sqrt{\frac{r(\Sigma)}{N}}, \frac{r(\Sigma)}{N} \right\}.$$

Hence, there is no room for improvement in the deviation estimate of the empirical mean from the true one at the constant confidence level. Of course, that does not imply that the empirical mean is an optimal covariance estimator, even for a gaussian vector and at a constant level of confidence. Moreover, as we explain in what follows, there are far better covariance estimates than (1.2) when the confidence parameter $\delta$ is small.

Just as in the one-dimensional mean-estimation problem, once the problem is more heavy-tailed the performance of the empirical mean deteriorates quickly and a different procedure has to be used. That is also the case for covariance estimation. The current state-of-the-art for covariance estimation in heavy-tailed situations is [12] (see Corollary 4.1 there and similar results in [10, 11]), in which $X$ is assumed to satisfy an $L_4$-$L_2$ norm equivalence.

Definition 1.3. A random vector $X$ with mean $\mu$ satisfies an $L_4$-$L_2$ norm equivalence with constant $L$ if for every $t \in \mathbb{R}^d$,

$$\big(\mathbb{E}\langle X-\mu, t\rangle^4\big)^{1/4} \le L \big(\mathbb{E}\langle X-\mu, t\rangle^2\big)^{1/2}.$$

Remark 1.4. Note that if $X$ is $L$-subgaussian then it satisfies an $L_4$-$L_2$ norm equivalence with constant $2L$.

Theorem 1.5 ([12]). For every $L \ge 1$ there are constants $c(L)$ and $c'(L)$ that depend only on $L$ and for which the following holds. Let $X$ satisfy an $L_4$-$L_2$ norm equivalence with constant $L$.
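For $0 < \delta < 1$ there is an estimator $\hat{\Sigma}_\delta$ that satisfies

$$\|\hat{\Sigma}_\delta - \Sigma\| \le c(L)\|\Sigma\| \sqrt{\frac{r(\Sigma)\big(\log d + \log(1/\delta)\big)}{N}} \tag{1.3}$$

with probability at least $1-\delta$, provided that $N \ge c'(L)\, r(\Sigma)\big(\log d + \log(2/\delta)\big)$.

Remark 1.6. Let us mention that the procedure from [12] requires prior information on the values of $\|\Sigma\|$ and $r(\Sigma)$ up to some absolute multiplicative constant, an assumption we shall return to in what follows.

$^1$ Recall that a random vector $X$ with values in $\mathbb{R}^d$ and with mean $\mu$ is $L$-subgaussian if for every $t \in \mathbb{R}^d$ and every $p \ge 2$, $\big(\mathbb{E}|\langle X-\mu, t\rangle|^p\big)^{1/p} \le L\sqrt{p}\,\big(\mathbb{E}\langle X-\mu, t\rangle^2\big)^{1/2}$.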
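The effective rank of Definition 1.1 is easy to compute numerically. The following is a minimal sketch, assuming the reading $r(A) = \operatorname{Tr}(A)/\|A\|$ with $\|\cdot\|$ the spectral norm; the function name `effective_rank` and the example matrices are illustrative and do not appear in the paper.

```python
import numpy as np

def effective_rank(A):
    """Effective rank Tr(A) / ||A|| of a nonzero positive semidefinite matrix A."""
    A = np.asarray(A, dtype=float)
    op_norm = np.linalg.norm(A, 2)  # largest singular value
    if op_norm == 0.0:
        raise ValueError("effective rank is undefined for the zero matrix")
    return float(np.trace(A) / op_norm)

# The identity has full effective rank, while a fast-decaying spectrum gives
# an effective rank far below the ambient dimension.
d = 1000
decaying = np.diag(1.0 / np.arange(1, d + 1) ** 2)
print(effective_rank(np.eye(4)))
print(effective_rank(decaying))
```

The second matrix illustrates the gap between $r(A)$ and $d$: its effective rank stays below $\pi^2/6$ regardless of how large $d$ is.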
Note that if $\delta$ is smaller than $1/d$, the error guaranteed by Theorem 1.5 is of the order of

$$\|\Sigma\| \sqrt{\frac{r(\Sigma)\log(2/\delta)}{N}}, \tag{1.4}$$

which will turn out to be far from optimal. To put (1.4) in some perspective, let us examine possible benchmarks for general mean estimation problems and see how those compare with (1.2), (1.3) and (1.4) when applied to covariance estimation.

1.1 Optimality in mean estimation

Let $W$ be a random vector with mean $\mu$ and set $\|\cdot\|$ to be an arbitrary norm. Let $B^\circ$ be the unit ball of the dual norm to $\|\cdot\|$, and denote by $\hat{\mu}$ a mean-estimator constructed using an independent sample $W_1, \dots, W_N$. As it happens, a lower bound on the performance of $\hat{\mu}$ is

$$R \sqrt{\frac{\log(2/\delta)}{N}}, \tag{1.5}$$

where

$$R = \sup_{x^* \in B^\circ} \big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2}. \tag{1.6}$$

Indeed, for every $x^* \in B^\circ$,

$$\|\hat{\mu} - \mu\| \ge |x^*(\hat{\mu} - \mu)| = |x^*(\hat{\mu}) - x^*(\mu)|;$$

therefore, if there is a procedure for which $\|\hat{\mu} - \mu\| \le \varepsilon$ with probability $1-\delta$, then on the same event the procedure automatically performs with accuracy $\varepsilon$ and confidence $1-\delta$ for each one of the real-valued mean-estimation problems associated with the random variables $x^*(W)$, $x^* \in B^\circ$. By a lower bound from [1] on real-valued mean estimation problems, the best possible mean-estimation error for each $x^*(W)$ is $\sim \sqrt{\operatorname{var}(x^*(W))}\,\sqrt{\log(2/\delta)/N}$, and taking the worst $x^* \in B^\circ$ leads to (1.5).

Although (1.5) is part of the story, it is unlikely to be the whole story. Intuitively, (1.5) takes into account the effect of one-dimensional marginals of $W$ rather than the entire geometry of the distribution. It stands to reason that an additional global parameter is called for, one that reflects the entire structure of $W$ and the geometry of the norm. Moreover, that parameter should reflect the difficulty of the estimation problem at the constant confidence level.

To give an example of such a result, a (sharp) lower bound from [] on the mean estimation problem when $W$ is a gaussian random vector is the following: if $\|\hat{\mu} - \mu\| \le \varepsilon$ with probability at least $1-\delta$ then

$$\varepsilon \ge \frac{c}{\sqrt{N}} \Big( \mathbb{E}\|W - \mu\| + R\sqrt{\log(2/\delta)} \Big); \tag{1.7}$$

hence, the global parameter in the gaussian case is just the mean $\mathbb{E}\|W - \mu\|$.
Let us examine (1.7) more carefully, in the hope that it would lead us towards the right answer for general random vectors. Note that by setting $\delta = \exp(-p)$, the gaussian random variable $W$ satisfies that

$$\sqrt{\log(2/\delta)}\,\big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2} \sim \sqrt{p}\,\big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2} \sim \big(\mathbb{E}|x^*(W-\mu)|^p\big)^{1/p}.$$
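At the same time, the strong-weak moment inequality$^2$ for gaussian vectors (see, e.g., [3]) implies that

$$\Big(\mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^N W_i - \mu\Big\|^p\Big)^{1/p} \le \mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^N W_i - \mu\Big\| + c \sup_{x^* \in B^\circ} \Big(\mathbb{E}\Big|x^*\Big(\frac{1}{N}\sum_{i=1}^N W_i - \mu\Big)\Big|^p\Big)^{1/p}$$
$$= \frac{1}{\sqrt{N}}\Big( \mathbb{E}\|W-\mu\| + c \sup_{x^* \in B^\circ} \big(\mathbb{E}|x^*(W-\mu)|^p\big)^{1/p} \Big) \le \frac{1}{\sqrt{N}}\Big( \mathbb{E}\|W-\mu\| + c'\sqrt{p} \sup_{x^* \in B^\circ} \big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2} \Big),$$

where $c$ and $c'$ are absolute constants. Thus, the lower bound of (1.7) implies that the best possible performance of a mean estimator of a gaussian vector matches a strong-weak moment inequality. This leads to a natural conjecture: that the best possible performance in a general mean estimation problem is given by a gaussian-like strong-weak moment inequality, and that there is a procedure that performs with that accuracy/confidence tradeoff.

Recently, a general mean estimation procedure was introduced in [5] that exhibits this type of strong-weak behavior. To formulate the result, let $W$ be an arbitrary random vector taking values in $\mathbb{R}^d$ and with mean $\mu$, let $G$ be the zero mean gaussian random vector with the same covariance as $W$ and set $Y_N = \frac{1}{\sqrt{N}}\sum_{i=1}^N (W_i - \mu)$, where $W_1, \dots, W_N$ are independent copies of $W$. Let $\|\cdot\|$ be a norm, set $B^\circ$ to be the unit ball of the dual norm, and put $R = \sup_{x^* \in B^\circ} \big(\mathbb{E}(x^*(W-\mu))^2\big)^{1/2}$.

Theorem 1.7 ([5]). For $0 < \delta < 1$ there is a procedure $\hat{\mu}_\delta$ such that, with probability at least $1-\delta$,

$$\|\hat{\mu}_\delta - \mu\| \le \frac{c}{\sqrt{N}} \Big( \max\{\mathbb{E}\|G\|, \mathbb{E}\|Y_N\|\} + R\sqrt{\log(2/\delta)} \Big).$$

The mean estimation procedure is defined as follows: let $T = \operatorname{ext}(B^\circ)$ be the set of extreme points in $B^\circ$. For the wanted confidence parameter $0 < \delta < 1$, let $n = \log(2/\delta)$ and set $m = N/n$. Let $(I_j)_{j=1}^n$ be the natural partition of $\{1, \dots, N\}$ to blocks of cardinality $m$ and, given a sample $W_1, \dots, W_N$, set $Z_j = \frac{1}{m}\sum_{i \in I_j} W_i$. For $x^* \in T$ and $\varepsilon > 0$, set

$$S_{x^*}(\varepsilon) = \big\{ y \in \mathbb{R}^d : |x^*(y) - x^*(Z_j)| \le \varepsilon \text{ for more than } n/2 \text{ blocks} \big\},$$

and define $S(\varepsilon) = \bigcap_{x^* \in T} S_{x^*}(\varepsilon)$. Set $\varepsilon_0 = \inf\{\varepsilon > 0 : S(\varepsilon) \neq \emptyset\}$, and let $\hat{\mu}_\delta$ be any vector in $\bigcap_{\varepsilon > \varepsilon_0} S(\varepsilon)$.

$^2$ By strong moment we mean the $L_1$ norm of $\|W - \mu\|$, while the weak moment is just the largest $L_p$ norm of a marginal $x^*(W - \mu)$ for $x^* \in B^\circ$.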
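The sets $S_{x^*}(\varepsilon)$ can be explored numerically when $T$ is replaced by a finite family of functionals. The sketch below is only that: the true procedure ranges over all extreme points of $B^\circ$, which is not computationally tractable in general, and the helper names `block_means`, `in_S` and `smallest_eps` are hypothetical. It forms the block means $Z_j$ and tests whether a candidate $y$ lies in $S(\varepsilon)$.

```python
import numpy as np

def block_means(W, n):
    """Split the sample W (shape (N, d)) into n consecutive blocks of size
    m = N // n and return the block means Z_j as an array of shape (n, d)."""
    N = W.shape[0]
    m = N // n
    return W[: n * m].reshape(n, m, -1).mean(axis=1)

def in_S(y, Z, T, eps):
    """True if, for every functional x in the finite set T (rows of T),
    |x(y) - x(Z_j)| <= eps holds for more than n/2 of the blocks."""
    n = Z.shape[0]
    gaps = np.abs((Z - y) @ T.T)           # gaps[j, k] = |x_k(Z_j) - x_k(y)|
    counts = (gaps <= eps).sum(axis=0)     # agreeing blocks per functional
    return bool(np.all(counts > n / 2))

def smallest_eps(y, Z, T):
    """Smallest eps for which y belongs to S(eps), for the finite set T."""
    n = Z.shape[0]
    gaps = np.sort(np.abs((Z - y) @ T.T), axis=0)
    # "more than n/2 blocks" means at least floor(n/2) + 1 blocks, so the
    # relevant order statistic sits at 0-based index n // 2.
    return float(gaps[n // 2].max())

# Tiny deterministic example: three blocks of two points each in R^2,
# tested against the two coordinate functionals.
W = np.arange(12.0).reshape(6, 2)
Z = block_means(W, 3)
T = np.eye(2)
y = np.array([5.0, 6.0])
print(Z, smallest_eps(y, Z, T))
```

Minimising `smallest_eps` over candidate points `y` gives a finite-$T$ analogue of choosing a vector in $\bigcap_{\varepsilon > \varepsilon_0} S(\varepsilon)$; how to carry out that minimisation is left open here.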
The main result of this note (which is formulated in the next section) is that the right application of Theorem 1.7 leads to an (almost) optimal covariance estimator: the procedure performs as if $X$ were a gaussian vector even if it only satisfies an $L_4$-$L_2$ norm equivalence, and the accuracy/confidence tradeoff obeys the strong-weak inequality one would expect.

1.2 From mean estimation to covariance estimation

In what follows, we assume without loss of generality that $X$ is symmetric, not only zero mean. We may do so because if $X$ is a centered random vector and $X'$ is an independent copy of $X$ then $(X - X')/\sqrt{2}$ is symmetric and has the same covariance as $X$.

The natural choice of a random vector in Theorem 1.7 is $W = X \otimes X$, but as it happens, a better alternative is to use a truncated version of $X$ instead of the original one:

Definition 1.8. Let

$$\alpha = \left( \frac{N \operatorname{Tr}(\Sigma)\|\Sigma\|}{\gamma} \right)^{1/4},$$

and let $\widetilde{X} = X \mathbb{1}_{\{\|X\|_2 \le \alpha\}}$. In the $L$-subgaussian case we set $\gamma = 1$, and when $X$ only satisfies an $L_4$-$L_2$ norm equivalence, let $\gamma = \log r(\Sigma)$. We also denote $\widetilde{\Sigma} = \mathbb{E}(\widetilde{X} \otimes \widetilde{X})$.

The main result of this note is that the procedure described in Theorem 1.7 for $W = \widetilde{X} \otimes \widetilde{X}$ is an optimal (or very close to optimal) covariance estimation procedure. Specifically, we prove the following, which improves both (1.2) and (1.3).

Theorem 1.9. Let $X$ be a zero mean random vector with an (unknown) covariance matrix $\Sigma$, denote by $\|\cdot\|$ the operator norm and, using the notation of Definition 1.8, set

$$R^2 = \sup_{u,v \in S^{d-1}} \mathbb{E}\big(v^T(\widetilde{X} \otimes \widetilde{X} - \widetilde{\Sigma})u\big)^2,$$

where $S^{d-1}$ is the Euclidean unit sphere in $\mathbb{R}^d$. For any $0 < \delta < 1$, there is a procedure that receives as data the sample $X_1, \dots, X_N$, returns a matrix $\hat{\Sigma}_\delta$ and satisfies that:

(1) If $X$ is $L$-subgaussian then with probability at least $1-\delta$,

$$\|\hat{\Sigma}_\delta - \Sigma\| \le c(L)\left[ \|\Sigma\|\left( \sqrt{\frac{r(\Sigma)}{N}} + \frac{r(\Sigma)}{N} \right) + R\sqrt{\frac{\log(2/\delta)}{N}} \right];$$

(2) If $X$ satisfies an $L_4$-$L_2$ norm equivalence with constant $L$ and $N \ge c'(L)\, r(\Sigma)\log r(\Sigma)$ then with probability at least $1-\delta$,

$$\|\hat{\Sigma}_\delta - \Sigma\| \le c(L)\left( \|\Sigma\|\sqrt{\frac{r(\Sigma)\log r(\Sigma)}{N}} + R\sqrt{\frac{\log(2/\delta)}{N}} \right). \tag{1.8}$$

In both cases $R \le c(L)\|\Sigma\|$, and $c(L)$, $c'(L)$ are constants that depend only on $L$.

Remark 1.10.
Note that the estimates in Theorem 1.9 do not depend on the dimension $d$; instead, they depend only on $r(\Sigma)$, which may be small even if $d$ tends to infinity. This is important in view of recent results on covariance estimation in Banach spaces [2].
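The truncation step of Definition 1.8 can be sketched in a few lines. The snippet assumes that the truncation level reads $\alpha = (N \operatorname{Tr}(\Sigma)\|\Sigma\|/\gamma)^{1/4}$, which is the form consistent with the computation in (2.4), and that $\operatorname{Tr}(\Sigma)$ and $\|\Sigma\|$ are known up to constants. The function names are hypothetical.

```python
import numpy as np

def truncation_level(trace, op_norm, N, gamma=1.0):
    """alpha = (N * Tr(Sigma) * ||Sigma|| / gamma)^(1/4).
    gamma = 1 in the subgaussian case; gamma = log r(Sigma) under an
    L4-L2 norm equivalence (assumed reading of Definition 1.8)."""
    return (N * trace * op_norm / gamma) ** 0.25

def truncate(X, alpha):
    """Replace every sample point whose Euclidean norm exceeds alpha by zero,
    i.e. return the rows X_i * 1{||X_i||_2 <= alpha}."""
    norms = np.linalg.norm(X, axis=1)
    return X * (norms <= alpha)[:, None]

# Example: with alpha = 2 the first point (norm 5) is zeroed, the second kept.
X = np.array([[3.0, 4.0], [0.0, 1.0]])
alpha = truncation_level(trace=1.0, op_norm=1.0, N=16)
print(alpha, truncate(X, alpha))
```

Note that truncation here sets large points to zero rather than rescaling them to norm $\alpha$, matching the indicator in the definition.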
The estimate in Theorem 1.9 is actually a strong-weak moment inequality as if $X$ were gaussian (at least up to the logarithmic term in (1.8)). Indeed, let $G$ be the zero mean gaussian random vector that has the same covariance as $X$ and set $N \ge r(\Sigma)$. As noted previously,

$$\|\Sigma\|\sqrt{\frac{r(\Sigma)}{N}} \sim \mathbb{E}\Big\| \frac{1}{N}\sum_{i=1}^N G_i \otimes G_i - \Sigma \Big\|,$$

with the left-hand side being the strong term from Theorem 1.9. Moreover, the term involving $R$ is actually the natural weak term associated with the operator norm. Indeed, recall the well-known fact that the dual norm to the operator norm is the nuclear norm. Since a linear functional $z$ acts on the matrix $x$ via trace duality, that is, $z(x) = [z, x] := \operatorname{Tr}(z^T x)$, it follows from [14] that the extreme points of the dual unit ball $B^\circ$ are $\{u \otimes v : u, v \in S^{d-1}\}$. Thus,

$$R^2 = \sup_{x^* \in B^\circ} \mathbb{E}\big(x^*(\widetilde{X} \otimes \widetilde{X} - \widetilde{\Sigma})\big)^2 = \sup_{u,v \in S^{d-1}} \mathbb{E}\big(v^T(\widetilde{X} \otimes \widetilde{X} - \widetilde{\Sigma})u\big)^2,$$

and in particular, by (1.5) the weak term $(R/\sqrt{N})\sqrt{\log(2/\delta)}$ appearing in Theorem 1.9 is sharp. Thus, up to the logarithmic factor in (2), Theorem 1.9 implies that the estimator $\hat{\Sigma}_\delta$ performs as if $X$ were gaussian, even though it can be very far from gaussian.

Let us compare the outcome of Theorem 1.9 to the current state of the art we mentioned previously. In the subgaussian setup Theorem 1.9 improves Theorem 1.2 because there are situations in which $R$ is significantly smaller than $\|\Sigma\|$ (see such an example in what follows). Under an $L_4$-$L_2$ norm equivalence scenario the improvement is more dramatic: on top of an improvement in the logarithmic factor appearing in the strong term, the weak term $(R/\sqrt{N})\sqrt{\log(2/\delta)}$ is significantly smaller than the corresponding estimate of $\|\Sigma\|\sqrt{r(\Sigma)/N}\,\sqrt{\log(2/\delta)}$ from Theorem 1.5.

The proof of Theorem 1.9 is presented in the following section. We end this introduction with some notation. Throughout, absolute constants are denoted by $c, c', \dots$, etc. Their value may change from line to line. Constants that depend on a parameter $L$ are denoted by $c(L)$; $a \lesssim b$ means that there is an absolute constant $c$ such that $a \le cb$, and $a \sim b$ means that $cb \le a \le c'b$.
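When the constants depend on $L$ we write $a \lesssim_L b$ and $a \sim_L b$ respectively. Finally, we define the $\psi_2$-norm of a real valued random variable $Y$ as $\|Y\|_{\psi_2} = \inf\{c > 0 : \mathbb{E}\exp(Y^2/c^2) \le 2\}$. In what follows we use the well known fact that

$$\|Y\|_{\psi_2} \sim \sup_{p \ge 2} \frac{(\mathbb{E}|Y|^p)^{1/p}}{\sqrt{p}}.$$

2 Proof of Theorem 1.9

We require several observations on properties of $\widetilde{X}$. First, note that by the symmetry of $X$, $\widetilde{X}$ is symmetric as well. Second, for every $p \ge 2$ and any $u \in \mathbb{R}^d$,

$$\|\langle \widetilde{X}, u\rangle\|_{L_p} = \big(\mathbb{E}|\langle \widetilde{X}, u\rangle|^p\big)^{1/p} \le \big(\mathbb{E}|\langle X, u\rangle|^p\big)^{1/p}.$$

Therefore, if $X$ is $L$-subgaussian then $\|\langle \widetilde{X}, u\rangle\|_{L_p} \le L\sqrt{p}\,\|\langle X, u\rangle\|_{L_2}$, and if $X$ satisfies an $L_4$-$L_2$ norm equivalence with constant $L$ then $\|\langle \widetilde{X}, u\rangle\|_{L_4} \le L\|\langle X, u\rangle\|_{L_2}$.

More important features of $\widetilde{X}$ have to do with its covariance matrix $\widetilde{\Sigma}$: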
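Lemma 2.1. Assume that $X$ is zero mean and satisfies an $L_4$-$L_2$ norm equivalence with constant $L$. Using the notation of Definition 1.8 we have that

$$\|\widetilde{\Sigma} - \Sigma\| \le c(L)\|\Sigma\|\sqrt{\frac{\gamma r(\Sigma)}{N}}, \tag{2.1}$$

and

$$|\operatorname{Tr}(\widetilde{\Sigma}) - \operatorname{Tr}(\Sigma)| \le c(L)\operatorname{Tr}(\Sigma)\sqrt{\frac{\gamma r(\Sigma)}{N}}, \tag{2.2}$$

where $c(L)$ is a constant that depends only on $L$.

Proof. Observe that

$$\|\widetilde{\Sigma} - \Sigma\| = \sup_{u,v \in S^{d-1}} \big| u^T\big(\mathbb{E}(X \otimes X) - \mathbb{E}(\widetilde{X} \otimes \widetilde{X})\big)v \big| \le 2\sup_{u,v \in S^{d-1}} \mathbb{E}\big|\langle X, u\rangle\langle X - \widetilde{X}, v\rangle\big|$$
$$= 2\sup_{u,v \in S^{d-1}} \mathbb{E}\big|\langle X, u\rangle\langle X, v\rangle\big|\mathbb{1}_{\{\|X\|_2 > \alpha\}} \le 2\sup_{u,v \in S^{d-1}} \big(\mathbb{E}\langle X, u\rangle^4\big)^{1/4}\big(\mathbb{E}\langle X, v\rangle^4\big)^{1/4}\Pr{}^{1/2}(\|X\|_2 > \alpha).$$

By the $L_4$-$L_2$ norm equivalence,

$$\sup_{u \in S^{d-1}} \big(\mathbb{E}\langle X, u\rangle^4\big)^{1/4} \le L\sup_{u \in S^{d-1}} \big(\mathbb{E}\langle X, u\rangle^2\big)^{1/2} = L\|\Sigma\|^{1/2}$$

and

$$\mathbb{E}\|X\|_2^4 = \mathbb{E}\Big(\sum_{i=1}^d \langle X, e_i\rangle^2\Big)^2 = \mathbb{E}\sum_{i,j}\langle X, e_i\rangle^2\langle X, e_j\rangle^2 \le \sum_{i,j}\big(\mathbb{E}\langle X, e_i\rangle^4\big)^{1/2}\big(\mathbb{E}\langle X, e_j\rangle^4\big)^{1/2}$$
$$\le L^2\sum_{i,j}\mathbb{E}\langle X, e_i\rangle^2\,\mathbb{E}\langle X, e_j\rangle^2 = L^2\sum_{i,j}\Sigma_{ii}\Sigma_{jj} = L^2\big(\operatorname{Tr}(\Sigma)\big)^2. \tag{2.3}$$

Recalling the definition of $\alpha$, we have that

$$\Pr{}^{1/2}(\|X\|_2 > \alpha) \le \left(\frac{\mathbb{E}\|X\|_2^4}{\alpha^4}\right)^{1/2} \le \left(\frac{L^2\gamma\big(\operatorname{Tr}(\Sigma)\big)^2}{N\operatorname{Tr}(\Sigma)\|\Sigma\|}\right)^{1/2} = L\sqrt{\frac{\gamma\operatorname{Tr}(\Sigma)}{N\|\Sigma\|}} = L\sqrt{\frac{\gamma r(\Sigma)}{N}}, \tag{2.4}$$

and combining the two observations,

$$\|\widetilde{\Sigma} - \Sigma\| \le c(L)\|\Sigma\|\sqrt{\frac{\gamma r(\Sigma)}{N}}, \tag{2.5}$$

as claimed.

Turning to the second part of the lemma, note that

$$\operatorname{Tr}(\Sigma) = \sum_{i=1}^d \mathbb{E}\langle X, e_i\rangle^2 \quad\text{and}\quad \operatorname{Tr}(\widetilde{\Sigma}) = \sum_{i=1}^d \mathbb{E}\langle X, e_i\rangle^2\mathbb{1}_{\{\|X\|_2 \le \alpha\}}.$$

Therefore, by the $L_4$-$L_2$ norm equivalence and (2.4),

$$|\operatorname{Tr}(\widetilde{\Sigma}) - \operatorname{Tr}(\Sigma)| = \sum_{i=1}^d \mathbb{E}\langle X, e_i\rangle^2\mathbb{1}_{\{\|X\|_2 > \alpha\}} \le \sum_{i=1}^d \big(\mathbb{E}\langle X, e_i\rangle^4\big)^{1/2}\Pr{}^{1/2}(\|X\|_2 > \alpha)$$
$$\le L^2\sum_{i=1}^d \big(\mathbb{E}\langle X, e_i\rangle^2\big)\Pr{}^{1/2}(\|X\|_2 > \alpha) \le c(L)\operatorname{Tr}(\Sigma)\sqrt{\frac{\gamma r(\Sigma)}{N}}.$$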
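Clearly, by the first part of Lemma 2.1 it suffices to address the covariance estimation problem for the random vector $\widetilde{X}$, since

$$\|\hat{\Sigma}_\delta - \Sigma\| \le \|\hat{\Sigma}_\delta - \widetilde{\Sigma}\| + \|\widetilde{\Sigma} - \Sigma\|,$$

and $\|\widetilde{\Sigma} - \Sigma\|$ is smaller than the wanted accuracy. Thus, from here on we set $W = \widetilde{X} \otimes \widetilde{X} \in \mathbb{R}^{d \times d}$, and the norm $\|\cdot\|$ is the operator norm. As a result, the estimation procedure of Theorem 1.7 is as follows.

Let $X_1, \dots, X_N$ be the given sample, let $\widetilde{X}_i = X_i\mathbb{1}_{\{\|X_i\|_2 \le \alpha\}}$ where $\alpha$ is given in Definition 1.8, and set $0 < \delta < 1$. Let $n = \log(2/\delta)$ and split the sample to $n$ blocks $I_j$, each of cardinality $m = N/n$; set

$$M_j = \frac{1}{m}\sum_{i \in I_j} \widetilde{X}_i \otimes \widetilde{X}_i.$$

Let $T = \{(u, v) : u, v \in S^{d-1}\}$ and for $\varepsilon > 0$ and a pair $(u, v)$ let

$$S_{u,v}(\varepsilon) = \big\{ Y \in \mathbb{R}^{d \times d} : |v^T(M_j - Y)u| \le \varepsilon \text{ for more than } n/2 \text{ blocks} \big\}.$$

Set $S(\varepsilon) = \bigcap_{(u,v) \in T} S_{u,v}(\varepsilon)$. Let $\varepsilon_0 = \inf\{\varepsilon > 0 : S(\varepsilon) \neq \emptyset\}$, and choose $\hat{\Sigma}_\delta$ to be any matrix in $\bigcap_{\varepsilon > \varepsilon_0} S(\varepsilon)$.

Thanks to Theorem 1.7, the proof of Theorem 1.9 follows once sufficient control on $\mathbb{E}\|Y_N\|$, $\mathbb{E}\|G\|$ and $R$ is established in the two cases we are interested in.

Controlling $R$

The required estimate on $R$ is presented in the next lemma.

Lemma 2.2. Assume that $X$ is zero mean and satisfies an $L_4$-$L_2$ norm equivalence with constant $L$. Then

$$R \le v(X) \lesssim_L \|\Sigma\|,$$

where $v^2(X) = \sup_{v \in S^{d-1}} \mathbb{E}\langle X, v\rangle^4$.

Proof. For every $u, v \in S^{d-1}$,

$$\mathbb{E}\big(v^T(\widetilde{X} \otimes \widetilde{X} - \widetilde{\Sigma})u\big)^2 = \mathbb{E}\langle \widetilde{X}, v\rangle^2\langle \widetilde{X}, u\rangle^2 - (v^T\widetilde{\Sigma}u)^2 \le \mathbb{E}\langle \widetilde{X}, v\rangle^2\langle \widetilde{X}, u\rangle^2 \le \big(\mathbb{E}\langle X, v\rangle^4\big)^{1/2}\big(\mathbb{E}\langle X, u\rangle^4\big)^{1/2},$$

where we have used the fact that $\mathbb{E}\langle \widetilde{X}, v\rangle\langle \widetilde{X}, u\rangle = v^T\widetilde{\Sigma}u$. Thus, $R \le v(X)$. Also, recalling that $X$ satisfies an $L_4$-$L_2$ norm equivalence,

$$\mathbb{E}\langle X, v\rangle^4 \le L^4\big(\mathbb{E}\langle X, v\rangle^2\big)^2 \le L^4\|\Sigma\|^2,$$

implying that $v(X) \le L^2\|\Sigma\|$, as claimed.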
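Controlling $\mathbb{E}\|G\|$ and $\mathbb{E}\|Y_N\|$

In the context of Theorem 1.7, $G$ is the zero mean gaussian vector on $\mathbb{R}^{d \times d}$ whose covariance coincides with that of $W = \widetilde{X} \otimes \widetilde{X}$. Instead of dealing with that vector directly, note that

$$\mathbb{E}\|G\| \le \liminf_{N \to \infty} \mathbb{E}\|Y_N\|. \tag{2.6}$$

Indeed,

$$\mathbb{E}\|G\| = \sup_{T \subset B^\circ,\ T \text{ finite}} \mathbb{E}\max_{x^* \in T} x^*(G),$$

and by the multivariate CLT, $\big\{N^{-1/2}\sum_{i=1}^N x^*(W_i - \mathbb{E}W) : x^* \in T\big\}$ converges weakly to $\{x^*(G) : x^* \in T\}$. Hence, (2.6) follows from tail integration. Thanks to (2.6), all that remains is to bound $\mathbb{E}\|Y_N\|$.

The subgaussian case

Fix an integer $N$ and note that

$$\Big\|\frac{1}{N}\sum_{i=1}^N \widetilde{X}_i \otimes \widetilde{X}_i - \widetilde{\Sigma}\Big\| = \sup_{u \in S^{d-1}}\Big|\frac{1}{N}\sum_{i=1}^N \langle \widetilde{X}_i, u\rangle^2 - \mathbb{E}\langle \widetilde{X}, u\rangle^2\Big|, \tag{2.7}$$

which is the supremum of a quadratic empirical process indexed by $S^{d-1}$. Such empirical processes have been studied extensively (see, e.g., [6, 7, 8]), mainly using chaining methods. As it happens, quadratic subgaussian processes may be controlled in terms of a natural metric invariant of the indexing class, the so-called $\gamma_2$ functional. (Rather than defining the $\gamma_2$ functional, we refer the reader to [15] for a detailed exposition on the topic, and to [8, 6, 7] for the study of the quadratic empirical process in this and more general situations.) In the case of (2.7), the indexing class is $S^{d-1}$, whose elements are viewed as linear functionals on $\mathbb{R}^d$, and the underlying metric is the $\psi_2$ norm endowed by the random vector $\widetilde{X}$. By Corollary 1.9 from [8] we have that

$$\mathbb{E}\sup_{u \in S^{d-1}}\Big|\frac{1}{N}\sum_{i=1}^N \langle \widetilde{X}_i, u\rangle^2 - \mathbb{E}\langle \widetilde{X}, u\rangle^2\Big| \le c\left( D\,\frac{\gamma_2(S^{d-1}, \psi_2(\widetilde{X}))}{\sqrt{N}} + \frac{\gamma_2^2(S^{d-1}, \psi_2(\widetilde{X}))}{N} \right), \tag{2.8}$$

where $c$ is an absolute constant and

$$D = D(S^{d-1}, \psi_2) = \sup_{u \in S^{d-1}}\|\langle \widetilde{X}, u\rangle\|_{\psi_2} \sim \sup_{u \in S^{d-1}}\sup_{p \ge 2}\frac{\big(\mathbb{E}|\langle \widetilde{X}, u\rangle|^p\big)^{1/p}}{\sqrt{p}}.$$

To estimate (2.8) one requires two facts (see, e.g., [15] for more details). Firstly, a general property of the $\gamma_2$ functional is monotonicity in $d$: if $(T, d)$ is a metric space and $d'$ is another metric on $T$ which satisfies that for every $t_1, t_2 \in T$, $d(t_1, t_2) \le \kappa d'(t_1, t_2)$, then $\gamma_2(T, d) \le \kappa\gamma_2(T, d')$. Here, we have that for every $p \ge 2$ and every $u \in \mathbb{R}^d$,

$$\big(\mathbb{E}|\langle \widetilde{X}, u\rangle|^p\big)^{1/p} \le \big(\mathbb{E}|\langle X, u\rangle|^p\big)^{1/p} \le L\sqrt{p}\,\big(\mathbb{E}\langle X, u\rangle^2\big)^{1/2},$$

implying that $\|\langle \widetilde{X}, u\rangle\|_{\psi_2} \lesssim L\|\langle X, u\rangle\|_{L_2}$;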
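hence, $\gamma_2(S^{d-1}, \psi_2(\widetilde{X})) \lesssim L\gamma_2(S^{d-1}, L_2(X))$. Secondly, by Talagrand's majorizing measures theorem, if $G$ is a zero mean gaussian random vector with the same covariance as $X$ then

$$\gamma_2(S^{d-1}, L_2(X)) \le c\,\mathbb{E}\sup_{u \in S^{d-1}}\langle G, u\rangle \le c\big(\mathbb{E}\|G\|_2^2\big)^{1/2} = c\sqrt{\operatorname{Tr}(\Sigma)},$$

for a suitable absolute constant $c$. Finally, again thanks to the fact that $X$ is $L$-subgaussian,

$$D \lesssim L\sup_{u \in S^{d-1}}\|\langle X, u\rangle\|_{L_2} = L\|\Sigma\|^{1/2}.$$

Therefore, by (2.8), for every $N$,

$$\mathbb{E}\|Y_N\| \le c(L)\left( \|\Sigma\|^{1/2}\sqrt{\operatorname{Tr}(\Sigma)} + \frac{\operatorname{Tr}(\Sigma)}{\sqrt{N}} \right),$$

and in particular,

$$\liminf_{N \to \infty}\mathbb{E}\|Y_N\| \le c(L)\|\Sigma\|^{1/2}\sqrt{\operatorname{Tr}(\Sigma)}.$$

This completes the proof of the first part of Theorem 1.9.

$L_4$-$L_2$ norm equivalence

Just as in the subgaussian case, the key issue is finding a suitable estimate on $\mathbb{E}\|Y_N\|$. Thanks to the fact that $\widetilde{X}$ is a truncated random vector one may apply a version of the matrix Bernstein inequality. We invoke a corollary from the survey [16] (which is a slightly modified version of the original result from [9]): if $Z$ is a random vector which satisfies that $\|Z \otimes Z\| \le \beta$ almost surely, and $B = \mathbb{E}(Z \otimes Z)^2$, then

$$\mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^N Z_i \otimes Z_i - \mathbb{E}(Z \otimes Z)\Big\| \le c\left( \sqrt{\frac{\|B\|\log r(B)}{N}} + \frac{\beta\log r(B)}{N} \right). \tag{2.9}$$

In our case, $Z = X\mathbb{1}_{\{\|X\|_2 \le \alpha\}}$ for $\alpha$ as in Definition 1.8, and all that remains is to estimate $\|B\|$ and $r(B)$. It is straightforward to verify that

$$c\|\widetilde{\Sigma}\|\operatorname{Tr}(\widetilde{\Sigma}) \le \|B\| \le c'(L)\|\Sigma\|\operatorname{Tr}(\Sigma) \quad\text{and}\quad \operatorname{Tr}(B) \le c'(L)\big(\operatorname{Tr}(\Sigma)\big)^2:$$

indeed, the upper estimates on $\|B\|$ and $\operatorname{Tr}(B)$ follow from a direct computation and the fact that $X$ satisfies an $L_4$-$L_2$ norm equivalence (see, e.g., Lemma 4.1 in [12]); the lower estimate is an outcome of the FKG inequality (see Corollary 5.1 in the supplementary material to [11]). Moreover, by Lemma 2.1 and using its notation, both $\|\widetilde{\Sigma}\|$ and $\operatorname{Tr}(\widetilde{\Sigma})$ are equivalent to $\|\Sigma\|$ and $\operatorname{Tr}(\Sigma)$ respectively, as long as $N \ge c(L)\gamma r(\Sigma)$; hence, $r(B) \lesssim_L r(\Sigma)$. Finally, observe that $\|Z \otimes Z\| = \|Z\|_2^2 \le \alpha^2$.

By (2.9), the choice of the truncation level $\alpha$, and the fact that $N \gtrsim_L r(\Sigma)\log r(\Sigma)$,

$$\mathbb{E}\|Y_N\| \le c(L)\left( \|\Sigma\|^{1/2}\sqrt{\operatorname{Tr}(\Sigma)\log r(\Sigma)} + \frac{\alpha^2\log r(\Sigma)}{\sqrt{N}} \right) \le c(L)\|\Sigma\|\sqrt{r(\Sigma)\log r(\Sigma)}.$$

In particular,

$$\liminf_{N \to \infty}\mathbb{E}\|Y_N\| \le c(L)\sqrt{\|\Sigma\|\operatorname{Tr}(\Sigma)\log r(\Sigma)},$$

which completes the proof of the second part of Theorem 1.9.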
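The final bound on $\mathbb{E}\|Y_N\|$ in the $L_4$-$L_2$ case rests on a short computation that the text leaves implicit. A worked version is sketched below, assuming the truncation level of Definition 1.8 is $\alpha = (N\operatorname{Tr}(\Sigma)\|\Sigma\|/\gamma)^{1/4}$ with $\gamma = \log r(\Sigma)$, which is the form suggested by (2.4); both terms of the matrix Bernstein bound then turn out to be of the same order.

```latex
\begin{align*}
\frac{\alpha^2 \log r(\Sigma)}{\sqrt{N}}
  &= \sqrt{\frac{N \operatorname{Tr}(\Sigma)\,\|\Sigma\|}{\log r(\Sigma)}}
     \cdot \frac{\log r(\Sigma)}{\sqrt{N}}
   = \sqrt{\operatorname{Tr}(\Sigma)\,\|\Sigma\| \log r(\Sigma)} \\
  &= \|\Sigma\| \sqrt{\frac{\operatorname{Tr}(\Sigma)}{\|\Sigma\|} \log r(\Sigma)}
   = \|\Sigma\| \sqrt{r(\Sigma) \log r(\Sigma)}, \\[4pt]
\|\Sigma\|^{1/2} \sqrt{\operatorname{Tr}(\Sigma) \log r(\Sigma)}
  &= \|\Sigma\| \sqrt{r(\Sigma) \log r(\Sigma)} .
\end{align*}
```

Dividing by $\sqrt{N}$, as Theorem 1.7 requires, gives the strong term $\|\Sigma\|\sqrt{r(\Sigma)\log r(\Sigma)/N}$ of (1.8).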
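Concluding remarks

The drawback of our estimator is that it requires prior information on $\operatorname{Tr}(\Sigma)$ and $\|\Sigma\|$. This issue has already been addressed in [11, 12] using Lepski's method. The alternative we present is to handle the problem by constructing appropriate median of means estimators $\hat{\varphi}_1$ and $\hat{\varphi}_2$, and for our purpose it suffices that $\hat{\varphi}_1 \sim \operatorname{Tr}(\Sigma)$ and $\hat{\varphi}_2 \sim \|\Sigma\|$ with high probability. The freedom to estimate the quantities in question up to an absolute multiplicative constant simplifies the problem considerably.

Consider the problem of trace estimation. Since $\operatorname{Tr}(\Sigma) = \mathbb{E}\sum_{i=1}^d \langle X, e_i\rangle^2$, a standard median-of-means estimator $\hat{\varphi}_1$ of $\mathbb{E}\sum_{i=1}^d \langle X, e_i\rangle^2$ satisfies that with probability at least $1-\delta$,

$$|\hat{\varphi}_1 - \operatorname{Tr}(\Sigma)| \le c\sqrt{\operatorname{Var}\Big(\sum_{i=1}^d \langle X, e_i\rangle^2\Big)\frac{\log(1/\delta)}{N}}$$

(see [13] for what is by now a standard argument). Using (2.3) we have $\operatorname{Var}\big(\sum_{i=1}^d \langle X, e_i\rangle^2\big) \le (L^2 - 1)\big(\operatorname{Tr}(\Sigma)\big)^2$. Therefore,

$$|\hat{\varphi}_1 - \operatorname{Tr}(\Sigma)| \le c(L)\operatorname{Tr}(\Sigma)\sqrt{\frac{\log(1/\delta)}{N}},$$

which implies that in the regime $N \ge c'(L)\log(1/\delta)$, one has $\hat{\varphi}_1 \sim \operatorname{Tr}(\Sigma)$ with probability at least $1-\delta$.

The estimation of $\|\Sigma\|$ may be addressed in a similar fashion. Because it is not the focus of this note and for the sake of brevity we just sketch an argument that leads to a bound that depends on the dimension $d$, rather than on $r(\Sigma)$. The more accurate estimate can be derived from Theorem 2 in [5]. Let $\mathcal{N}$ be a minimal $1/4$-cover of $S^{d-1}$ with respect to the Euclidean norm. Thus, $\|\Sigma\| \sim \sup_{u \in \mathcal{N}} u^T\Sigma u$. For any fixed $u \in \mathcal{N}$ the median of means estimator $\hat{\varphi}_{2,u}$ of $\mathbb{E}\,u^T(X \otimes X)u$ satisfies that with probability at least $1-\delta$,

$$|\hat{\varphi}_{2,u} - u^T\Sigma u| \le c(L)\|\Sigma\|\sqrt{\frac{\log(1/\delta)}{N}},$$

because $\operatorname{Var}\big(u^T(X \otimes X)u\big) \le L^4\|\Sigma\|^2$. Finally, recalling that $|\mathcal{N}| \le 9^d$, the union bound shows that with probability at least $1-\delta$,

$$\sup_{u \in \mathcal{N}}|\hat{\varphi}_{2,u} - u^T\Sigma u| \le c'(L)\|\Sigma\|\sqrt{\frac{d + \log(1/\delta)}{N}}.$$

Therefore, when $N \ge c''(L)(\log(1/\delta) + d)$, one has that $\sup_{u \in \mathcal{N}}\hat{\varphi}_{2,u} \sim \|\Sigma\|$ with probability at least $1-\delta$.

Finally, let us give an example showing that there could be a substantial gap between $R$ and $\|\Sigma\|$ (as well as between $R$ and $v(X)$), which is a reason for the sub-optimality of Theorem 1.2 (Theorem 9 in [2]).

Example 2.3.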
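Let $X = (X^{(1)}, \dots, X^{(d)})$ where $X^{(i)} = \alpha_i\varepsilon_i$; $(\varepsilon_i)_{i=1}^d$ are independent, symmetric, $\{-1, 1\}$-valued random variables; and $\alpha_1 > \dots > \alpha_d \ge 0$. Since the $X^{(i)}$'s are centered, independent and subgaussian with an absolute constant, $X$ is a centered, $L$-subgaussian random vector for some absolute constant $L$. If we set $\Sigma = \mathbb{E}(X \otimes X)$ then clearly $\|\Sigma\| = \alpha_1^2$, $r(\Sigma) = \sum_{i=1}^d \alpha_i^2/\alpha_1^2$ and

$$\mathbb{E}\big(v^T(X \otimes X - \Sigma)u\big)^2 = \mathbb{E}\Big(\sum_{i \neq j} v_iu_jX^{(i)}X^{(j)}\Big)^2 = \sum_{i \neq j}\alpha_i^2\alpha_j^2\big(v_i^2u_j^2 + v_iv_ju_iu_j\big)$$
$$\le (\alpha_1\alpha_2)^2\Big(\sum_{i,j}(v_iu_j)^2 + \sum_{i,j}|v_iv_ju_iu_j|\Big) \le (\alpha_1\alpha_2)^2\big(\|v\|_2^2\|u\|_2^2 + \langle |v|, |u|\rangle^2\big) \le 2(\alpha_1\alpha_2)^2,$$

where $|v|$ and $|u|$ denote the vectors of coordinate-wise absolute values. Therefore, by a suitable choice of $\alpha_2$,

$$R \le \sqrt{2}\,\alpha_1\alpha_2 \ll \alpha_1^2 = \|\Sigma\|, \tag{2.10}$$

and the gap between $R$ and $\|\Sigma\|$ may be arbitrarily large.

The inequality (2.10) is the best one can hope for. Indeed, let $Y$ be a centered random vector taking its values in $\mathbb{R}^d$ with $\Sigma = \mathbb{E}(Y \otimes Y)$. Then for $R = R(Y)$ it holds that

$$\big\|\mathbb{E}(Y \otimes Y - \Sigma)^2\big\| = \Big\|\sum_{i=1}^d \mathbb{E}(Y \otimes Y - \Sigma)e_ie_i^T(Y \otimes Y - \Sigma)\Big\| \le \sum_{i=1}^d \sup_{v \in S^{d-1}}\mathbb{E}\big(e_i^T(Y \otimes Y - \Sigma)v\big)^2 \le dR^2.$$

As before, Corollary 5.1 in [11] implies $\|\mathbb{E}(Y \otimes Y)^2\| \ge \operatorname{Tr}(\Sigma)\|\Sigma\|$. Therefore,

$$dR^2 \ge \big\|\mathbb{E}(Y \otimes Y - \Sigma)^2\big\| = \big\|\mathbb{E}(Y \otimes Y)^2 - \Sigma^2\big\| \ge \operatorname{Tr}(\Sigma)\|\Sigma\| - \|\Sigma\|^2.$$

This gives the following general lower bound:

$$R \ge \|\Sigma\|\sqrt{\frac{r(\Sigma) - 1}{d}}. \tag{2.11}$$

When all $\alpha_2, \dots, \alpha_d$ are of the same order, $R = R(X)$ satisfies (2.11) up to multiplicative constant factors.

References

[1] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, pages 1148-1185, 2012.
[2] V. Koltchinskii and K. Lounici. Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 2017.
[3] R. Latała and J. O. Wojtaszczyk. On the infimum convolution inequality. Studia Mathematica.
[4] K. Lounici. High-dimensional covariance matrix estimation with missing observations. Bernoulli, 2014.
[5] G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms.
[6] S. Mendelson. Empirical processes with a bounded $\psi_1$-diameter. Geometric and Functional Analysis.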
[7] S. Mendelson. Upper bounds on product and multiplier empirical processes. Stochastic Processes and their Applications, 126, 2016.
[8] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Reconstruction and subgaussian operators in asymptotic geometric analysis. Geometric and Functional Analysis, 17(4).
[9] S. Minsker. On some extensions of Bernstein's inequality for self-adjoint operators. Statistics and Probability Letters, 2017.
[10] S. Minsker. Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. Annals of Statistics, 46, 2018.
[11] S. Minsker and X. Wei. Estimation of the covariance structure of heavy-tailed distributions. NIPS, 2017.
[12] S. Minsker and X. Wei. Robust modifications of U-statistics and applications to covariance estimation problems.
[13] A. Nemirovski and D. Yudin. Problem complexity and method efficiency in optimization. John Wiley and Sons Inc., 1983.
[14] W. So. Facial structure of Schatten p-norms. Linear and Multilinear Algebra, 1990.
[15] M. Talagrand. Upper and lower bounds for stochastic processes: modern methods and classical problems, volume 60. Springer Science & Business Media, 2014.
[16] J. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning.
Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationarxiv: v1 [math.st] 5 Jul 2007
EXPLICIT FORMULA FOR COSTRUCTIG BIOMIAL COFIDECE ITERVAL WITH GUARATEED COVERAGE PROBABILITY arxiv:77.837v [math.st] 5 Jul 27 XIJIA CHE, KEMI ZHOU AD JORGE L. ARAVEA Abstract. In this paper, we derive
More informationarxiv: v1 [math.pr] 11 Feb 2019
A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm arxiv:190.03736v1 math.pr] 11 Feb 019 Chi Jin University of California, Berkeley chijin@cs.berkeley.edu Rong Ge Duke
More informationIntroduction and Preliminaries
Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis
More informationLecture 3. Random Fourier measurements
Lecture 3. Random Fourier measurements 1 Sampling from Fourier matrices 2 Law of Large Numbers and its operator-valued versions 3 Frames. Rudelson s Selection Theorem Sampling from Fourier matrices Our
More informationMatrix concentration inequalities
ELE 538B: Mathematics of High-Dimensional Data Matrix concentration inequalities Yuxin Chen Princeton University, Fall 2018 Recap: matrix Bernstein inequality Consider a sequence of independent random
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationRademacher Complexity Bounds for Non-I.I.D. Processes
Rademacher Complexity Bounds for Non-I.I.D. Processes Mehryar Mohri Courant Institute of Mathematical ciences and Google Research 5 Mercer treet New York, NY 00 mohri@cims.nyu.edu Afshin Rostamizadeh Department
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More informationStability of optimization problems with stochastic dominance constraints
Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More informationThe Dirichlet s P rinciple. In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation:
Oct. 1 The Dirichlet s P rinciple In this lecture we discuss an alternative formulation of the Dirichlet problem for the Laplace equation: 1. Dirichlet s Principle. u = in, u = g on. ( 1 ) If we multiply
More informationOn the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables
On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables Deli Li 1, Yongcheng Qi, and Andrew Rosalsky 3 1 Department of Mathematical Sciences, Lakehead University,
More informationNotes on Gaussian processes and majorizing measures
Notes on Gaussian processes and majorizing measures James R. Lee 1 Gaussian processes Consider a Gaussian process {X t } for some index set T. This is a collection of jointly Gaussian random variables,
More informationSmall ball probabilities and metric entropy
Small ball probabilities and metric entropy Frank Aurzada, TU Berlin Sydney, February 2012 MCQMC Outline 1 Small ball probabilities vs. metric entropy 2 Connection to other questions 3 Recent results for
More informationPacking-Dimension Profiles and Fractional Brownian Motion
Under consideration for publication in Math. Proc. Camb. Phil. Soc. 1 Packing-Dimension Profiles and Fractional Brownian Motion By DAVAR KHOSHNEVISAN Department of Mathematics, 155 S. 1400 E., JWB 233,
More informationSliced Inverse Regression
Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed
More informationSupremum of simple stochastic processes
Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there
More informationGravitational allocation to Poisson points
Gravitational allocation to Poisson points Sourav Chatterjee joint work with Ron Peled Yuval Peres Dan Romik Allocation rules Let Ξ be a discrete subset of R d. An allocation (of Lebesgue measure to Ξ)
More informationGAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. 2π) n
GAUSSIAN MEASURE OF SECTIONS OF DILATES AND TRANSLATIONS OF CONVEX BODIES. A. ZVAVITCH Abstract. In this paper we give a solution for the Gaussian version of the Busemann-Petty problem with additional
More information4 Sums of Independent Random Variables
4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables
More informationA Bernstein-Chernoff deviation inequality, and geometric properties of random families of operators
A Bernstein-Chernoff deviation inequality, and geometric properties of random families of operators Shiri Artstein-Avidan, Mathematics Department, Princeton University Abstract: In this paper we first
More informationA Lower Bound for the Size of Syntactically Multilinear Arithmetic Circuits
A Lower Bound for the Size of Syntactically Multilinear Arithmetic Circuits Ran Raz Amir Shpilka Amir Yehudayoff Abstract We construct an explicit polynomial f(x 1,..., x n ), with coefficients in {0,
More informationFréchet differentiability of the norm of L p -spaces associated with arbitrary von Neumann algebras
Fréchet differentiability of the norm of L p -spaces associated with arbitrary von Neumann algebras (Part I) Fedor Sukochev (joint work with D. Potapov, A. Tomskova and D. Zanin) University of NSW, AUSTRALIA
More informationSpecification Test for Instrumental Variables Regression with Many Instruments
Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing
More informationNotes 1 : Measure-theoretic foundations I
Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,
More informationThe Moment Method; Convex Duality; and Large/Medium/Small Deviations
Stat 928: Statistical Learning Theory Lecture: 5 The Moment Method; Convex Duality; and Large/Medium/Small Deviations Instructor: Sham Kakade The Exponential Inequality and Convex Duality The exponential
More informationSharp Generalization Error Bounds for Randomly-projected Classifiers
Sharp Generalization Error Bounds for Randomly-projected Classifiers R.J. Durrant and A. Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/ axk
More informationLARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS*
LARGE EVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILE EPENENT RANOM VECTORS* Adam Jakubowski Alexander V. Nagaev Alexander Zaigraev Nicholas Copernicus University Faculty of Mathematics and Computer Science
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationOn Some Extensions of Bernstein s Inequality for Self-Adjoint Operators
On Some Extensions of Bernstein s Inequality for Self-Adjoint Operators Stanislav Minsker e-mail: sminsker@math.duke.edu Abstract: We present some extensions of Bernstein s inequality for random self-adjoint
More informationTHE UNIFORMISATION THEOREM OF RIEMANN SURFACES
THE UNIFORISATION THEORE OF RIEANN SURFACES 1. What is the aim of this seminar? Recall that a compact oriented surface is a g -holed object. (Classification of surfaces.) It can be obtained through a 4g
More informationarxiv: v1 [math.pr] 22 Dec 2018
arxiv:1812.09618v1 [math.pr] 22 Dec 2018 Operator norm upper bound for sub-gaussian tailed random matrices Eric Benhamou Jamal Atif Rida Laraki December 27, 2018 Abstract This paper investigates an upper
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationMultivariate Distributions
IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationAN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES
Lithuanian Mathematical Journal, Vol. 4, No. 3, 00 AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES V. Bentkus Vilnius Institute of Mathematics and Informatics, Akademijos 4,
More informationSOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS
SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary
More informationFORMULATION OF THE LEARNING PROBLEM
FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we
More informationStructured signal recovery from non-linear and heavy-tailed measurements
Structured signal recovery from non-linear and heavy-tailed measurements Larry Goldstein* Stanislav Minsker* Xiaohan Wei # *Department of Mathematics # Department of Electrical Engineering UniversityofSouthern
More informationThe circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)
The circular law Lewis Memorial Lecture / DIMACS minicourse March 19, 2008 Terence Tao (UCLA) 1 Eigenvalue distributions Let M = (a ij ) 1 i n;1 j n be a square matrix. Then one has n (generalised) eigenvalues
More informationTOPOLOGICAL EQUIVALENCE OF LINEAR ORDINARY DIFFERENTIAL EQUATIONS
TOPOLOGICAL EQUIVALENCE OF LINEAR ORDINARY DIFFERENTIAL EQUATIONS ALEX HUMMELS Abstract. This paper proves a theorem that gives conditions for the topological equivalence of linear ordinary differential
More information1 Cricket chirps: an example
Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number
More informationIEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion
IEOR 471: Stochastic Models in Financial Engineering Summer 27, Professor Whitt SOLUTIONS to Homework Assignment 9: Brownian motion In Ross, read Sections 1.1-1.3 and 1.6. (The total required reading there
More informationCONSTRAINED PERCOLATION ON Z 2
CONSTRAINED PERCOLATION ON Z 2 ZHONGYANG LI Abstract. We study a constrained percolation process on Z 2, and prove the almost sure nonexistence of infinite clusters and contours for a large class of probability
More informationConcentration Inequalities for Random Matrices
Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic
More informationBrownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory
More informationSupplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data
Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department
More informationIowa State University. Instructor: Alex Roitershtein Summer Homework #1. Solutions
Math 501 Iowa State University Introduction to Real Analysis Department of Mathematics Instructor: Alex Roitershtein Summer 015 EXERCISES FROM CHAPTER 1 Homework #1 Solutions The following version of the
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationChapter 6 Expectation and Conditional Expectation. Lectures Definition 6.1. Two random variables defined on a probability space are said to be
Chapter 6 Expectation and Conditional Expectation Lectures 24-30 In this chapter, we introduce expected value or the mean of a random variable. First we define expectation for discrete random variables
More informationGaussian Models (9/9/13)
STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate
More informationA note on the convex infimum convolution inequality
A note on the convex infimum convolution inequality Naomi Feldheim, Arnaud Marsiglietti, Piotr Nayar, Jing Wang Abstract We characterize the symmetric measures which satisfy the one dimensional convex
More informationTail inequalities for additive functionals and empirical processes of. Markov chains
Tail inequalities for additive functionals and empirical processes of geometrically ergodic Markov chains University of Warsaw Banff, June 2009 Geometric ergodicity Definition A Markov chain X = (X n )
More informationStein s Method and Characteristic Functions
Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method
More informationProblem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function
Problem 3. Give an example of a sequence of continuous functions on a compact domain converging pointwise but not uniformly to a continuous function Solution. If we does not need the pointwise limit of
More informationConcentration inequalities: basics and some new challenges
Concentration inequalities: basics and some new challenges M. Ledoux University of Toulouse, France & Institut Universitaire de France Measure concentration geometric functional analysis, probability theory,
More information1-Bit Matrix Completion
1-Bit Matrix Completion Mark A. Davenport School of Electrical and Computer Engineering Georgia Institute of Technology Yaniv Plan Mary Wootters Ewout van den Berg Matrix Completion d When is it possible
More informationGaussian Estimation under Attack Uncertainty
Gaussian Estimation under Attack Uncertainty Tara Javidi Yonatan Kaspi Himanshu Tyagi Abstract We consider the estimation of a standard Gaussian random variable under an observation attack where an adversary
More informationLecture 13 October 6, Covering Numbers and Maurey s Empirical Method
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and
More informationAn almost sure invariance principle for additive functionals of Markov chains
Statistics and Probability Letters 78 2008 854 860 www.elsevier.com/locate/stapro An almost sure invariance principle for additive functionals of Markov chains F. Rassoul-Agha a, T. Seppäläinen b, a Department
More informationThe Canonical Gaussian Measure on R
The Canonical Gaussian Measure on R 1. Introduction The main goal of this course is to study Gaussian measures. The simplest example of a Gaussian measure is the canonical Gaussian measure P on R where
More informationHigh-dimensional distributions with convexity properties
High-dimensional distributions with convexity properties Bo az Klartag Tel-Aviv University A conference in honor of Charles Fefferman, Princeton, May 2009 High-Dimensional Distributions We are concerned
More informationUpper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1
Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.
More informationThe main results about probability measures are the following two facts:
Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a
More information