18.S096: Concentration Inequalities, Scalar and Matrix Versions


Topics in Mathematics of Data Science (Fall 2015)
Afonso S. Bandeira
bandeira@mit.edu
October 15, 2015

These are lecture notes, not in final form, and they will be continuously edited and/or corrected, as I am sure they contain many typos. Please let me know if you find any typo/mistake. Also, I am posting short descriptions of these notes, together with the open problems, on my Blog, see [Ban15a].

4.1 Large Deviation Inequalities

Concentration and large deviation inequalities are among the most useful tools for understanding the performance of some algorithms. In a nutshell, they control the probability of a random variable being very far from its expectation.

The simplest such inequality is Markov's inequality:

Theorem 4.1 (Markov's Inequality). Let $X \geq 0$ be a non-negative random variable with $\mathbb{E}[X] < \infty$. Then,
$$\operatorname{Prob}\{X > t\} \leq \frac{\mathbb{E}[X]}{t}.$$

Proof. Let $t > 0$. Define a random variable $Y_t$ as
$$Y_t = \begin{cases} 0 & \text{if } X \leq t \\ t & \text{if } X > t. \end{cases}$$
Clearly, $Y_t \leq X$, hence $\mathbb{E}[Y_t] \leq \mathbb{E}[X]$, and
$$t \operatorname{Prob}\{X > t\} = \mathbb{E}[Y_t] \leq \mathbb{E}[X],$$
concluding the proof.

Markov's inequality can be used to obtain many more concentration inequalities. Chebyshev's inequality is a simple inequality that controls fluctuations from the mean.

Theorem 4.2 (Chebyshev's inequality). Let $X$ be a random variable with $\mathbb{E}[X^2] < \infty$. Then,
$$\operatorname{Prob}\{|X - \mathbb{E} X| > t\} \leq \frac{\operatorname{Var}(X)}{t^2}.$$

Proof. Apply Markov's inequality to the random variable $(X - \mathbb{E}[X])^2$ to get:
$$\operatorname{Prob}\{|X - \mathbb{E} X| > t\} = \operatorname{Prob}\left\{ (X - \mathbb{E} X)^2 > t^2 \right\} \leq \frac{\mathbb{E}\left[ (X - \mathbb{E} X)^2 \right]}{t^2} = \frac{\operatorname{Var}(X)}{t^2}.$$
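
As a quick numerical sanity check (not part of the original argument), the following short simulation compares Chebyshev's bound with the true tail of a random variable; the choice of distribution (Exponential(1), so that $\mathbb{E}X = \operatorname{Var}X = 1$) and the sample size are arbitrary illustrative choices.

```python
# Empirical check of Chebyshev's inequality (Theorem 4.2): for X ~ Exponential(1)
# we have E[X] = 1 and Var(X) = 1, so Prob{|X - 1| > t} <= 1/t^2.
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)
for t in [1.0, 2.0, 3.0, 5.0]:
    empirical = np.mean(np.abs(X - 1.0) > t)
    print(f"t = {t}: empirical tail = {empirical:.5f}, Chebyshev bound = {1.0 / t**2:.5f}")
```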

4.1.1 Sums of independent random variables

In what follows we will show two useful inequalities involving sums of independent random variables. The intuitive idea is that if we have a sum of independent random variables $X = X_1 + \cdots + X_n$, where the $X_i$ are i.i.d. centered random variables, then while the value of $X$ can be of order $O(n)$, it will very likely be of order $O(\sqrt{n})$ (note that this is the order of its standard deviation). The inequalities that follow are ways of very precisely controlling the probability of $X$ being larger than $O(\sqrt{n})$. While we could use, for example, Chebyshev's inequality for this, in the inequalities that follow the probabilities will be exponentially small, rather than quadratically small, which will be crucial in many applications to come.

Theorem 4.3 (Hoeffding's Inequality). Let $X_1, X_2, \ldots, X_n$ be independent bounded random variables, i.e., $|X_i| \leq a$ and $\mathbb{E}[X_i] = 0$. Then,
$$\operatorname{Prob}\left\{ \left| \sum_{i=1}^n X_i \right| > t \right\} \leq 2 \exp\left( -\frac{t^2}{2na^2} \right).$$

The inequality implies that fluctuations larger than $O(\sqrt{n})$ have small probability. For example, for $t = a\sqrt{2n\log n}$ we get that the probability is at most $\frac{2}{n}$.

Proof. We first get a probability bound for the event $\sum_{i=1}^n X_i > t$. The proof, again, will follow from Markov. Since we want an exponentially small probability, we use a classical trick that involves exponentiating with any $\lambda > 0$ and then choosing the optimal $\lambda$:
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} = \operatorname{Prob}\left\{ e^{\lambda \sum_{i=1}^n X_i} > e^{\lambda t} \right\} \leq \frac{\mathbb{E}\left[ e^{\lambda \sum_{i=1}^n X_i} \right]}{e^{t\lambda}} = e^{-t\lambda} \prod_{i=1}^n \mathbb{E}\left[ e^{\lambda X_i} \right], \qquad (4.3)$$
where the penultimate step follows from Markov's inequality and the last equality follows from independence of the $X_i$'s.

We now use the fact that $|X_i| \leq a$ to bound $\mathbb{E}[e^{\lambda X_i}]$. Because the function $f(x) = e^{\lambda x}$ is convex,
$$e^{\lambda x} \leq \frac{a+x}{2a} e^{\lambda a} + \frac{a-x}{2a} e^{-\lambda a}, \quad \text{for all } x \in [-a, a].$$
Since, for all $i$, $\mathbb{E}[X_i] = 0$, we get
$$\mathbb{E}\left[ e^{\lambda X_i} \right] \leq \mathbb{E}\left[ \frac{a+X_i}{2a} e^{\lambda a} + \frac{a-X_i}{2a} e^{-\lambda a} \right] = \frac{1}{2}\left( e^{\lambda a} + e^{-\lambda a} \right) = \cosh(\lambda a).$$
Note that
$$\cosh(x) \leq e^{x^2/2}, \quad \text{for all } x \in \mathbb{R}$$
(this follows immediately from the Taylor expansions $\cosh(x) = \sum_{n=0}^\infty \frac{x^{2n}}{(2n)!}$, $e^{x^2/2} = \sum_{n=0}^\infty \frac{x^{2n}}{2^n n!}$, and $(2n)! \geq 2^n n!$). Hence,
$$\mathbb{E}\left[ e^{\lambda X_i} \right] \leq e^{(\lambda a)^2/2}.$$
Together with (4.3), this gives
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq e^{-t\lambda} \prod_{i=1}^n e^{(\lambda a)^2/2} = e^{-t\lambda} e^{n\frac{(\lambda a)^2}{2}}.$$
This inequality holds for any choice of $\lambda \geq 0$, so we choose the value of $\lambda$ that minimizes
$$\min_{\lambda}\left\{ n\frac{\lambda^2 a^2}{2} - t\lambda \right\}.$$
Differentiating readily shows that the minimizer is given by
$$\lambda^* = \frac{t}{na^2},$$
which satisfies $\lambda^* > 0$. For this choice of $\lambda^*$,
$$n\frac{(\lambda^*)^2 a^2}{2} - t\lambda^* = \frac{1}{n}\left( \frac{t^2}{2a^2} - \frac{t^2}{a^2} \right) = -\frac{t^2}{2na^2}.$$
Thus,
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq e^{-\frac{t^2}{2na^2}}.$$
By using the same argument on $\sum_{i=1}^n (-X_i)$, and union bounding over the two events, we get
$$\operatorname{Prob}\left\{ \left| \sum_{i=1}^n X_i \right| > t \right\} \leq 2 e^{-\frac{t^2}{2na^2}}.$$
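
As a small illustration (not from the original notes), the following sketch compares the empirical tail of a sum of bounded centered random variables with the bound of Theorem 4.3. The distribution of the $X_i$ (uniform on $[-a,a]$), the value of $n$, and the thresholds are arbitrary choices.

```python
# Empirical check of Hoeffding's inequality (Theorem 4.3).
import numpy as np

rng = np.random.default_rng(0)
n, a, trials = 200, 1.0, 100_000
S = rng.uniform(-a, a, size=(trials, n)).sum(axis=1)   # independent copies of sum_i X_i
for t in [10.0, 20.0, 30.0]:
    empirical = np.mean(np.abs(S) > t)
    hoeffding = 2 * np.exp(-t**2 / (2 * n * a**2))
    print(f"t = {t}: empirical = {empirical:.5f}, Hoeffding bound = {hoeffding:.5f}")
```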

Remark 4.4. Let's say that we have random variables $r_1, \ldots, r_n$ i.i.d. distributed as
$$r_i = \begin{cases} -1 & \text{with probability } p/2 \\ 0 & \text{with probability } 1-p \\ 1 & \text{with probability } p/2. \end{cases}$$
Then, $\mathbb{E}\, r_i = 0$ and $|r_i| \leq 1$, so Hoeffding's inequality gives:
$$\operatorname{Prob}\left\{ \left| \sum_{i=1}^n r_i \right| > t \right\} \leq 2 \exp\left( -\frac{t^2}{2n} \right).$$
Intuitively, the smaller $p$ is, the more concentrated $\left| \sum_{i=1}^n r_i \right|$ should be; however, Hoeffding's inequality does not capture this behavior. A natural way to quantify this intuition is by noting that the variance of $\sum_{i=1}^n r_i$ depends on $p$, as $\operatorname{Var}(r_i) = p$.

The inequality that follows, Bernstein's inequality, uses the variance of the summands to improve over Hoeffding's inequality. The way this is going to be achieved is by strengthening the proof above; more specifically, in step (4.3) we will use the bound on the variance to get a better estimate on $\mathbb{E}[e^{\lambda X_i}]$, essentially by realizing that if $X_i$ is centered, $\mathbb{E} X_i^2 = \sigma^2$, and $|X_i| \leq a$, then, for $k \geq 2$, $\mathbb{E} X_i^k \leq \sigma^2 a^{k-2}$.

Theorem 4.5 (Bernstein's Inequality). Let $X_1, X_2, \ldots, X_n$ be independent centered bounded random variables, i.e., $|X_i| \leq a$ and $\mathbb{E}[X_i] = 0$, with variance $\mathbb{E}[X_i^2] = \sigma^2$. Then,
$$\operatorname{Prob}\left\{ \left| \sum_{i=1}^n X_i \right| > t \right\} \leq 2 \exp\left( -\frac{t^2}{2n\sigma^2 + \frac{2}{3}at} \right).$$

Remark 4.6. Before proving Bernstein's Inequality, note that on the example of Remark 4.4 we get
$$\operatorname{Prob}\left\{ \left| \sum_{i=1}^n r_i \right| > t \right\} \leq 2 \exp\left( -\frac{t^2}{2np + \frac{2}{3}t} \right),$$
which exhibits a dependence on $p$ and, for small values of $p$, is considerably smaller than what Hoeffding's inequality gives.
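
Before turning to the proof, here is a small numerical illustration (not part of the original notes) of the gap discussed in Remarks 4.4 and 4.6, using the sparse Rademacher variables of Remark 4.4; the values of $p$, $n$, $t$ and the number of trials are arbitrary.

```python
# Compare the Hoeffding bound (Remark 4.4) and the Bernstein bound (Remark 4.6)
# with the empirical tail of |sum_i r_i|.
import numpy as np

rng = np.random.default_rng(1)
n, p, trials = 1000, 0.01, 500_000
K = rng.binomial(n, p, size=trials)      # how many r_i are nonzero in each trial
S = 2 * rng.binomial(K, 0.5) - K         # sum of the nonzero +/-1 entries
for t in [10, 15, 20]:
    empirical = np.mean(np.abs(S) > t)
    hoeffding = 2 * np.exp(-t**2 / (2 * n))
    bernstein = 2 * np.exp(-t**2 / (2 * n * p + (2 / 3) * t))
    print(f"t = {t}: empirical = {empirical:.2e}, Hoeffding = {hoeffding:.3f}, Bernstein = {bernstein:.2e}")
```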

Proof. As before, we will prove
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq \exp\left( -\frac{t^2}{2n\sigma^2 + \frac{2}{3}at} \right),$$
and then union bound with the same result for $-\sum_{i=1}^n X_i$, to prove the theorem.

For any $\lambda > 0$ we have
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} = \operatorname{Prob}\left\{ e^{\lambda \sum_i X_i} > e^{\lambda t} \right\} \leq \frac{\mathbb{E}\left[ e^{\lambda \sum_i X_i} \right]}{e^{\lambda t}} = e^{-\lambda t} \prod_{i=1}^n \mathbb{E}\left[ e^{\lambda X_i} \right].$$
Now comes the source of the improvement over Hoeffding's:
$$\mathbb{E}\left[ e^{\lambda X_i} \right] = \mathbb{E}\left[ 1 + \lambda X_i + \sum_{m=2}^{\infty} \frac{\lambda^m X_i^m}{m!} \right] \leq 1 + \sum_{m=2}^{\infty} \frac{\lambda^m a^{m-2} \sigma^2}{m!} = 1 + \frac{\sigma^2}{a^2} \sum_{m=2}^{\infty} \frac{(\lambda a)^m}{m!} = 1 + \frac{\sigma^2}{a^2}\left( e^{\lambda a} - 1 - \lambda a \right).$$
Therefore,
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq e^{-\lambda t}\left[ 1 + \frac{\sigma^2}{a^2}\left( e^{\lambda a} - 1 - \lambda a \right) \right]^n.$$
We will use a few simple inequalities (that can be easily proved with calculus), such as
$$1 + x \leq e^x, \quad \text{for all } x \in \mathbb{R}$$
(in fact, $y = 1 + x$ is a tangent line to the graph of $f(x) = e^x$). This means that
$$1 + \frac{\sigma^2}{a^2}\left( e^{\lambda a} - 1 - \lambda a \right) \leq e^{\frac{\sigma^2}{a^2}\left( e^{\lambda a} - 1 - \lambda a \right)},$$
which readily implies
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq e^{-\lambda t} e^{n\frac{\sigma^2}{a^2}\left( e^{\lambda a} - 1 - \lambda a \right)}.$$
As before, we try to find the value of $\lambda > 0$ that minimizes
$$\min_{\lambda}\left\{ -\lambda t + n\frac{\sigma^2}{a^2}\left( e^{\lambda a} - 1 - \lambda a \right) \right\}.$$
Differentiation gives
$$-t + n\frac{\sigma^2}{a^2}\left( a e^{\lambda a} - a \right) = 0,$$
which implies that the optimal choice of $\lambda$ is given by
$$\lambda^* = \frac{1}{a}\log\left( 1 + \frac{at}{n\sigma^2} \right).$$
If we set
$$u = \frac{at}{n\sigma^2}, \qquad (4.4)$$
then $\lambda^* = \frac{1}{a}\log(1+u)$. Now, the value of the minimum is given by
$$-\lambda^* t + n\frac{\sigma^2}{a^2}\left( e^{\lambda^* a} - 1 - \lambda^* a \right) = -\frac{n\sigma^2}{a^2}\left[ (1+u)\log(1+u) - u \right].$$
This means that
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq \exp\left( -\frac{n\sigma^2}{a^2}\left\{ (1+u)\log(1+u) - u \right\} \right).$$
The rest of the proof follows by noting that, for every $u > 0$,
$$(1+u)\log(1+u) - u \geq \frac{u^2}{2 + \frac{2u}{3}}, \qquad (4.5)$$
which implies
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i > t \right\} \leq \exp\left( -\frac{n\sigma^2}{a^2} \frac{u^2}{2 + \frac{2u}{3}} \right) = \exp\left( -\frac{t^2}{2n\sigma^2 + \frac{2}{3}at} \right).$$

4.2 Gaussian Concentration

One of the most important results in concentration of measure is Gaussian concentration. Although it is a concentration result specific to normally distributed random variables, it will be very useful throughout these lectures. Intuitively, it says that if $F : \mathbb{R}^n \to \mathbb{R}$ is a function that is stable in terms of its input, then $F(g)$ is very well concentrated around its mean, where $g \sim \mathcal{N}(0, I)$. More precisely:

Theorem 4.7 (Gaussian Concentration). Let $X = [X_1, \ldots, X_n]^T$ be a vector with i.i.d. standard Gaussian entries and $F : \mathbb{R}^n \to \mathbb{R}$ a $\sigma$-Lipschitz function (i.e., $|F(x) - F(y)| \leq \sigma \|x - y\|$, for all $x, y \in \mathbb{R}^n$). Then, for every $t \geq 0$,
$$\operatorname{Prob}\left\{ |F(X) - \mathbb{E} F(X)| \geq t \right\} \leq 2 \exp\left( -\frac{t^2}{2\sigma^2} \right).$$

For the sake of simplicity we will show the proof for a slightly weaker bound (in terms of the constant inside the exponent):
$$\operatorname{Prob}\left\{ |F(X) - \mathbb{E} F(X)| \geq t \right\} \leq 2 \exp\left( -\frac{2}{\pi^2}\frac{t^2}{\sigma^2} \right).$$
This exposition follows closely the proof of Theorem 2.1.12 in [Tao12] and the original argument is due to Maurey and Pisier. For a proof with the optimal constants see, for example, Theorem 3.25 in these notes [vH14]. We will also assume the function $F$ is smooth — this is actually not a restriction, as a limiting argument can generalize the result from smooth functions to general Lipschitz functions.

Proof. If $F$ is smooth, then it is easy to see that the Lipschitz property implies that, for every $x \in \mathbb{R}^n$, $\|\nabla F(x)\| \leq \sigma$. By subtracting a constant from $F$, we can assume that $\mathbb{E} F(X) = 0$. Also, it is enough to show a one-sided bound
$$\operatorname{Prob}\left\{ F(X) - \mathbb{E} F(X) \geq t \right\} \leq \exp\left( -\frac{2}{\pi^2}\frac{t^2}{\sigma^2} \right),$$
since obtaining the same bound for $-F(X)$ and taking a union bound gives the result.

We start by using the same idea as in the proof of the large deviation inequalities above; for any $\lambda > 0$, Markov's inequality implies that
$$\operatorname{Prob}\{F(X) \geq t\} = \operatorname{Prob}\{\exp(\lambda F(X)) \geq \exp(\lambda t)\} \leq \frac{\mathbb{E}[\exp(\lambda F(X))]}{\exp(\lambda t)}.$$
This means we need to upper bound $\mathbb{E}[\exp(\lambda F(X))]$ using a bound on $\|\nabla F\|$.

The idea is to introduce a random independent copy $Y$ of $X$. Since $\exp(-\lambda x)$ is convex, Jensen's inequality implies that
$$\mathbb{E}[\exp(-\lambda F(Y))] \geq \exp(-\lambda \mathbb{E} F(Y)) = \exp(0) = 1.$$
Hence, since $X$ and $Y$ are independent,
$$\mathbb{E}\left[ \exp\left( \lambda\left[ F(X) - F(Y) \right] \right) \right] = \mathbb{E}[\exp(\lambda F(X))]\, \mathbb{E}[\exp(-\lambda F(Y))] \geq \mathbb{E}[\exp(\lambda F(X))].$$
Now we use the Fundamental Theorem of Calculus along a circular arc from $Y$ to $X$:
$$F(X) - F(Y) = \int_0^{\pi/2} \frac{\partial}{\partial \theta} F\left( Y\cos\theta + X\sin\theta \right) d\theta.$$
The advantage of using the circular arc is that, for any $\theta$, $X_\theta := Y\cos\theta + X\sin\theta$ is another random variable with the same distribution. Also, its derivative with respect to $\theta$, $X'_\theta = -Y\sin\theta + X\cos\theta$, also is. Moreover, $X_\theta$ and $X'_\theta$ are independent. In fact, note that they are jointly Gaussian and
$$\mathbb{E}\left[ X_\theta (X'_\theta)^T \right] = \mathbb{E}\left[ (Y\cos\theta + X\sin\theta)(-Y\sin\theta + X\cos\theta)^T \right] = 0.$$
We use Jensen's inequality again, now with respect to the integral:
$$\exp\left( \lambda\left[ F(X) - F(Y) \right] \right) = \exp\left( \frac{\pi}{2}\cdot\frac{1}{\pi/2}\int_0^{\pi/2} \lambda \frac{\partial}{\partial \theta} F(X_\theta)\, d\theta \right) \leq \frac{1}{\pi/2}\int_0^{\pi/2} \exp\left( \lambda\frac{\pi}{2}\frac{\partial}{\partial \theta} F(X_\theta) \right) d\theta.$$

Using the chain rule,
$$\exp\left( \lambda\left[ F(X) - F(Y) \right] \right) \leq \frac{2}{\pi}\int_0^{\pi/2} \exp\left( \lambda\frac{\pi}{2}\nabla F(X_\theta)\cdot X'_\theta \right) d\theta,$$
and taking expectations,
$$\mathbb{E}\exp\left( \lambda\left[ F(X) - F(Y) \right] \right) \leq \frac{2}{\pi}\int_0^{\pi/2} \mathbb{E}\exp\left( \lambda\frac{\pi}{2}\nabla F(X_\theta)\cdot X'_\theta \right) d\theta.$$
If we condition on $X_\theta$, then, since $\left\| \lambda\frac{\pi}{2}\nabla F(X_\theta) \right\| \leq \lambda\frac{\pi}{2}\sigma$, the random variable $\lambda\frac{\pi}{2}\nabla F(X_\theta)\cdot X'_\theta$ is Gaussian with variance at most $\left( \lambda\frac{\pi}{2}\sigma \right)^2$. This directly implies that, for every value of $X_\theta$,
$$\mathbb{E}_{X'_\theta}\exp\left( \lambda\frac{\pi}{2}\nabla F(X_\theta)\cdot X'_\theta \right) \leq \exp\left( \frac{1}{2}\left( \lambda\frac{\pi}{2}\sigma \right)^2 \right).$$
Taking expectation now in $X_\theta$, and putting everything together, gives
$$\mathbb{E}[\exp(\lambda F(X))] \leq \exp\left( \frac{1}{2}\left( \lambda\frac{\pi}{2}\sigma \right)^2 \right),$$
which means that
$$\operatorname{Prob}\{F(X) \geq t\} \leq \exp\left( \frac{1}{2}\left( \lambda\frac{\pi}{2}\sigma \right)^2 - \lambda t \right).$$
Optimizing for $\lambda$ gives $\lambda^* = \frac{t}{(\pi/2)^2\sigma^2}$, which gives
$$\operatorname{Prob}\{F(X) \geq t\} \leq \exp\left( -\frac{2}{\pi^2}\frac{t^2}{\sigma^2} \right).$$

4.2.1 Spectral norm of a Wigner Matrix

We give an illustrative example of the utility of Gaussian concentration. Let $W \in \mathbb{R}^{n\times n}$ be a standard Gaussian Wigner matrix, a symmetric matrix with (otherwise) independent Gaussian entries: the off-diagonal entries have unit variance and the diagonal entries have variance $2$. Then $\|W\|$ depends on $\frac{n(n+1)}{2}$ independent standard Gaussian random variables, and it is easy to see that it is a $\sqrt{2}$-Lipschitz function of these variables, since
$$\big|\, \|W_1\| - \|W_2\| \,\big| \leq \|W_1 - W_2\| \leq \|W_1 - W_2\|_F.$$
The symmetry of the matrix and the variance $2$ of the diagonal entries are responsible for the extra factor of $\sqrt{2}$. Using Gaussian Concentration (Theorem 4.7) we immediately get
$$\operatorname{Prob}\left\{ \|W\| \geq \mathbb{E}\|W\| + t \right\} \leq \exp\left( -\frac{t^2}{4} \right).$$
Since $\mathbb{E}\|W\| \leq 2\sqrt{n}$ (it is an excellent exercise to prove this using Slepian's inequality), we get:

Proposition 4.8. Let $W \in \mathbb{R}^{n\times n}$ be a standard Gaussian Wigner matrix, a symmetric matrix with (otherwise) independent Gaussian entries: the off-diagonal entries have unit variance and the diagonal entries have variance $2$. Then,
$$\operatorname{Prob}\left\{ \|W\| \geq 2\sqrt{n} + t \right\} \leq \exp\left( -\frac{t^2}{4} \right).$$

Note that this gives an extremely precise control of the fluctuations of $\|W\|$. In fact, for $t = 2\sqrt{\log n}$ this gives
$$\operatorname{Prob}\left\{ \|W\| \geq 2\sqrt{n} + 2\sqrt{\log n} \right\} \leq \exp\left( -\frac{4\log n}{4} \right) = \frac{1}{n}.$$
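
As an illustrative sanity check (not from the original notes), one can sample standard Gaussian Wigner matrices and compare $\|W\|$ with $2\sqrt{n}$, as in Proposition 4.8; the dimension and the number of samples below are arbitrary choices.

```python
# Sample standard Gaussian Wigner matrices and compare ||W|| with 2*sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 20
norms = []
for _ in range(trials):
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)          # symmetric; off-diagonal variance 1, diagonal variance 2
    norms.append(np.linalg.norm(W, 2))  # spectral norm
print(f"2*sqrt(n) = {2*np.sqrt(n):.2f}, mean ||W|| = {np.mean(norms):.2f}, max ||W|| = {np.max(norms):.2f}")
```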

4.2.2 Talagrand's concentration inequality

A remarkable result by Talagrand [Tal95], Talagrand's concentration inequality, provides an analogue of Gaussian concentration for bounded random variables.

Theorem 4.9 (Talagrand concentration inequality, Theorem 2.1.13 in [Tao12]). Let $K > 0$, and let $X_1, \ldots, X_n$ be independent bounded random variables, $|X_i| \leq K$ for all $1 \leq i \leq n$. Let $F : \mathbb{R}^n \to \mathbb{R}$ be a $\sigma$-Lipschitz and convex function. Then, for any $t \geq 0$,
$$\operatorname{Prob}\left\{ |F(X) - \mathbb{E}[F(X)]| \geq tK \right\} \leq c_1 \exp\left( -c_2 \frac{t^2}{\sigma^2} \right),$$
for positive constants $c_1$ and $c_2$.

Other useful similar inequalities (with explicit constants) are available in [Mas00].

4.3 Other useful large deviation inequalities

This section contains, without proof, some scalar large deviation inequalities that I have found useful.

4.3.1 Additive Chernoff Bound

The additive Chernoff bound, also known as the Chernoff-Hoeffding theorem, concerns Bernoulli random variables.

Theorem 4.10. Given $0 < p < 1$ and $X_1, \ldots, X_n$ i.i.d. random variables distributed as a Bernoulli($p$) random variable (meaning that each is $1$ with probability $p$ and $0$ with probability $1-p$), then, for any $\varepsilon > 0$:
$$\operatorname{Prob}\left\{ \frac{1}{n}\sum_{i=1}^n X_i \geq p + \varepsilon \right\} \leq \left[ \left( \frac{p}{p+\varepsilon} \right)^{p+\varepsilon} \left( \frac{1-p}{1-p-\varepsilon} \right)^{1-p-\varepsilon} \right]^n,$$
$$\operatorname{Prob}\left\{ \frac{1}{n}\sum_{i=1}^n X_i \leq p - \varepsilon \right\} \leq \left[ \left( \frac{p}{p-\varepsilon} \right)^{p-\varepsilon} \left( \frac{1-p}{1-p+\varepsilon} \right)^{1-p+\varepsilon} \right]^n.$$
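
The following short sketch (not from the original notes) evaluates the additive Chernoff bound of Theorem 4.10 numerically and compares it with the empirical upper tail of the Bernoulli sample mean; the values of $n$, $p$, $\varepsilon$ and the number of trials are arbitrary illustrative choices.

```python
# Evaluate the additive Chernoff bound (Theorem 4.10) vs. the empirical upper tail.
import numpy as np

rng = np.random.default_rng(2)
n, p, eps, trials = 200, 0.3, 0.1, 200_000
means = rng.binomial(n, p, size=trials) / n
empirical = np.mean(means >= p + eps)
q = p + eps
chernoff = ((p / q) ** q * ((1 - p) / (1 - q)) ** (1 - q)) ** n
print(f"empirical = {empirical:.2e}, Chernoff bound = {chernoff:.2e}")
```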

4.3.2 Multiplicative Chernoff Bound

There is also a multiplicative version (see, for example, Lemma 2.3.3 in [Dur06]), which is particularly useful.

Theorem 4.11. Let $X_1, \ldots, X_n$ be independent random variables taking values in $\{0, 1\}$ (meaning they are Bernoulli distributed, but not necessarily identically distributed). Let $X = \sum_{i=1}^n X_i$ and $\mu = \mathbb{E} X$. Then, for any $\delta > 0$:
$$\operatorname{Prob}\left\{ X > (1+\delta)\mu \right\} < \left[ \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right]^{\mu},$$
$$\operatorname{Prob}\left\{ X < (1-\delta)\mu \right\} < \left[ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right]^{\mu}.$$

4.3.3 Deviation bounds on $\chi^2$ variables

A particularly useful deviation inequality is Lemma 1 in Laurent and Massart [LM00]:

Theorem 4.12 (Lemma 1 in Laurent and Massart [LM00]). Let $X_1, \ldots, X_n$ be i.i.d. standard Gaussian random variables ($\mathcal{N}(0,1)$), and $a_1, \ldots, a_n$ non-negative numbers. Let
$$Z = \sum_{k=1}^n a_k\left( X_k^2 - 1 \right).$$
The following inequalities hold for any $x > 0$:
$$\operatorname{Prob}\left\{ Z \geq 2\|a\|_2\sqrt{x} + 2\|a\|_\infty x \right\} \leq \exp(-x),$$
$$\operatorname{Prob}\left\{ Z \leq -2\|a\|_2\sqrt{x} \right\} \leq \exp(-x),$$
where $\|a\|_2^2 = \sum_{k=1}^n a_k^2$ and $\|a\|_\infty = \max_{1\leq k\leq n} a_k$.

Note that if $a_k = 1$ for all $k$, then $Z$ is a centered $\chi^2$ with $n$ degrees of freedom, so this theorem immediately gives a deviation inequality for $\chi^2$ random variables.
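
As a quick sanity check (not part of the original notes), the following sketch verifies the Laurent-Massart bounds empirically in the special case $a_k = 1$, i.e. $Z = \chi^2_n - n$; the values of $n$, $x$ and the number of trials are arbitrary.

```python
# Empirical check of Theorem 4.12 for a_k = 1, i.e. Z = chi^2_n - n.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 500_000
Z = rng.chisquare(n, size=trials) - n
a2, ainf = np.sqrt(n), 1.0                 # ||a||_2 and ||a||_inf for a_k = 1
for x in [1.0, 3.0, 6.0]:
    upper = np.mean(Z >= 2 * a2 * np.sqrt(x) + 2 * ainf * x)
    lower = np.mean(Z <= -2 * a2 * np.sqrt(x))
    print(f"x = {x}: upper tail = {upper:.2e}, lower tail = {lower:.2e}, bound exp(-x) = {np.exp(-x):.2e}")
```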

4.4 Matrix Concentration

In many important applications, some of which we will see in the coming lectures, one needs to use a matrix version of the inequalities above. Given $\{X_k\}_{k=1}^n$ independent random symmetric $d\times d$ matrices, one is interested in deviation inequalities for
$$\lambda_{\max}\left( \sum_{k=1}^n X_k \right).$$
For example, a very useful adaptation of Bernstein's inequality exists for this setting.

Theorem 4.13 (Theorem 1.4 in [Tro12]). Let $\{X_k\}_{k=1}^n$ be a sequence of independent random symmetric $d\times d$ matrices. Assume that each $X_k$ satisfies $\mathbb{E} X_k = 0$ and $\lambda_{\max}(X_k) \leq R$ almost surely. Then, for all $t \geq 0$,
$$\operatorname{Prob}\left\{ \lambda_{\max}\left( \sum_{k=1}^n X_k \right) \geq t \right\} \leq d\cdot\exp\left( -\frac{t^2}{2\sigma^2 + \frac{2}{3}Rt} \right), \quad \text{where } \sigma^2 = \left\| \sum_{k=1}^n \mathbb{E}\left( X_k^2 \right) \right\|.$$
Note that $\|A\|$ denotes the spectral norm of $A$.

In what follows we will state and prove various matrix concentration results, somewhat similar to Theorem 4.13. Motivated by the derivation of Proposition 4.8, which allowed us to easily transform bounds on the expected spectral norm of a random matrix into tail bounds, we will mostly focus on bounding the expected spectral norm. Tropp's monograph [Tro15b] is a nice introduction to matrix concentration and includes a proof of Theorem 4.13 as well as many other useful inequalities.
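
To make the statement concrete, here is a small numerical sketch (not from the original notes) in the special case $X_k = \varepsilon_k A_k$, with $\varepsilon_k$ Rademacher and $A_k$ fixed symmetric matrices; the dimension, the number of summands, and the matrices $A_k$ are arbitrary illustrative choices.

```python
# Empirical look at matrix Bernstein (Theorem 4.13) for X_k = eps_k * A_k.
import numpy as np

rng = np.random.default_rng(4)
d, n, trials = 20, 200, 2000
A = rng.standard_normal((n, d, d)) / np.sqrt(d)
A = (A + A.transpose(0, 2, 1)) / 2                      # fixed symmetric A_k
R = max(np.linalg.norm(A[k], 2) for k in range(n))      # lambda_max(X_k) <= ||A_k|| <= R
sigma2 = np.linalg.norm(sum(A[k] @ A[k] for k in range(n)), 2)   # ||sum E X_k^2||

lam_max = []
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = np.tensordot(eps, A, axes=1)                    # sum_k eps_k A_k
    lam_max.append(np.linalg.eigvalsh(S)[-1])
t = np.quantile(lam_max, 0.99)                          # an empirically "rare" level
bound = d * np.exp(-t**2 / (2 * sigma2 + (2 / 3) * R * t))
print(f"t = {t:.2f}: empirical Prob = {np.mean(np.array(lam_max) >= t):.3f}, Bernstein bound = {bound:.3f}")
```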

A particularly important inequality of this type is for Gaussian series; it is intimately related to the non-commutative Khintchine inequality [Pis03], and for that reason we will often refer to it as Non-commutative Khintchine (see, for example, (4.9) in [Tro12]).

Theorem 4.14 (Non-commutative Khintchine (NCK)). Let $A_1, \ldots, A_n \in \mathbb{R}^{d\times d}$ be symmetric matrices and $g_1, \ldots, g_n \sim \mathcal{N}(0,1)$ i.i.d., then:
$$\mathbb{E}\left\| \sum_{k=1}^n g_k A_k \right\| \leq \left( 2 + 2\log(2d) \right)^{1/2} \sigma, \quad \text{where } \sigma^2 = \left\| \sum_{k=1}^n A_k^2 \right\|. \qquad (4.6)$$

Note that, akin to Proposition 4.8, we can also use Gaussian concentration to get a tail bound on $\left\| \sum_{k=1}^n g_k A_k \right\|$. We consider the function
$$F : \mathbb{R}^n \to \mathbb{R}, \qquad F(g) = \left\| \sum_{k=1}^n g_k A_k \right\|.$$
We now estimate its Lipschitz constant; let $g, h \in \mathbb{R}^n$, then
$$\left| \left\| \sum_{k=1}^n g_k A_k \right\| - \left\| \sum_{k=1}^n h_k A_k \right\| \right| \leq \left\| \sum_{k=1}^n (g_k - h_k) A_k \right\| = \max_{v:\|v\|=1} v^T\left( \sum_{k=1}^n (g_k - h_k) A_k \right)v = \max_{v:\|v\|=1} \sum_{k=1}^n (g_k - h_k)\, v^T A_k v \leq \max_{v:\|v\|=1} \sqrt{\sum_{k=1}^n (g_k - h_k)^2}\, \sqrt{\sum_{k=1}^n \left( v^T A_k v \right)^2} = \left( \max_{v:\|v\|=1} \sqrt{\sum_{k=1}^n \left( v^T A_k v \right)^2} \right) \|g - h\|,$$
where the first inequality made use of the triangular inequality and the last one of the Cauchy-Schwarz inequality.

This motivates us to define a new parameter, the weak variance $\sigma_*$.

Definition 4.15 (Weak Variance (see, for example, [Tro15b])). Given symmetric matrices $A_1, \ldots, A_n \in \mathbb{R}^{d\times d}$, we define the weak variance parameter as
$$\sigma_*^2 = \max_{v:\|v\|=1} \sum_{k=1}^n \left( v^T A_k v \right)^2.$$

This means that, using Gaussian concentration (and setting $t = u\sigma_*$), we have
$$\operatorname{Prob}\left\{ \left\| \sum_{k=1}^n g_k A_k \right\| \geq \left( 2 + 2\log(2d) \right)^{1/2}\sigma + u\sigma_* \right\} \leq \exp\left( -\frac{1}{2}u^2 \right). \qquad (4.7)$$
This means that, although the expected value of $\left\| \sum_{k=1}^n g_k A_k \right\|$ is controlled by the parameter $\sigma$, its fluctuations seem to be controlled by $\sigma_*$. We compare the two quantities in the following proposition.

Proposition 4.16. Given symmetric matrices $A_1, \ldots, A_n \in \mathbb{R}^{d\times d}$, recall that
$$\sigma^2 = \left\| \sum_{k=1}^n A_k^2 \right\| \quad \text{and} \quad \sigma_*^2 = \max_{v:\|v\|=1} \sum_{k=1}^n \left( v^T A_k v \right)^2.$$
We have $\sigma_* \leq \sigma$.

Proof. Using the Cauchy-Schwarz inequality,
$$\sigma_*^2 = \max_{v:\|v\|=1} \sum_{k=1}^n \left( v^T\left[ A_k v \right] \right)^2 \leq \max_{v:\|v\|=1} \sum_{k=1}^n \|v\|^2\, \|A_k v\|^2 = \max_{v:\|v\|=1} \sum_{k=1}^n v^T A_k^2 v = \max_{v:\|v\|=1} v^T\left( \sum_{k=1}^n A_k^2 \right)v = \left\| \sum_{k=1}^n A_k^2 \right\| = \sigma^2.$$
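
The following sketch (not from the original notes) estimates $\mathbb{E}\left\| \sum_k g_k A_k \right\|$ by Monte Carlo for a generic choice of fixed symmetric matrices $A_k$, and compares it with the NCK bound of Theorem 4.14; the dimensions and the number of trials are arbitrary.

```python
# Monte Carlo estimate of E||sum_k g_k A_k|| vs. the NCK bound sqrt(2 + 2 log(2d)) * sigma.
import numpy as np

rng = np.random.default_rng(5)
d, n, trials = 30, 100, 500
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2                       # fixed symmetric A_k
sigma = np.sqrt(np.linalg.norm(sum(A[k] @ A[k] for k in range(n)), 2))
norms = []
for _ in range(trials):
    g = rng.standard_normal(n)
    norms.append(np.linalg.norm(np.tensordot(g, A, axes=1), 2))
print(f"Monte Carlo E||sum g_k A_k|| = {np.mean(norms):.2f}")
print(f"NCK bound sqrt(2 + 2 log(2d)) * sigma = {np.sqrt(2 + 2*np.log(2*d)) * sigma:.2f}")
```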

4.5 Optimality of matrix concentration result for gaussian series

The following simple calculation is suggestive that the parameter $\sigma$ in Theorem 4.14 is indeed the correct parameter to understand $\mathbb{E}\left\| \sum_{k=1}^n g_k A_k \right\|$:
$$\mathbb{E}\left\| \sum_{k=1}^n g_k A_k \right\|^2 = \mathbb{E}\left\| \left( \sum_{k=1}^n g_k A_k \right)^2 \right\| = \mathbb{E}\max_{v:\|v\|=1} v^T\left( \sum_{k=1}^n g_k A_k \right)^2 v \geq \max_{v:\|v\|=1} \mathbb{E}\, v^T\left( \sum_{k=1}^n g_k A_k \right)^2 v = \max_{v:\|v\|=1} v^T\left( \sum_{k=1}^n A_k^2 \right)v = \sigma^2. \qquad (4.8)$$
But a natural question is whether the logarithmic term is needed. Motivated by this question we will explore a couple of examples.

Example 4.17. We can write a $d\times d$ Wigner matrix $W$ as a Gaussian series by taking, for $i \leq j$, the matrices $A^{(ij)}$ defined as
$$A^{(ij)} = e_i e_j^T + e_j e_i^T, \text{ if } i \neq j, \quad \text{and} \quad A^{(ii)} = \sqrt{2}\, e_i e_i^T.$$
It is not difficult to see that, in this case,
$$\sum_{i\leq j} \left( A^{(ij)} \right)^2 = (d+1)\, I_{d\times d},$$
meaning that $\sigma = \sqrt{d+1}$. This means that Theorem 4.14 gives us
$$\mathbb{E}\|W\| \lesssim \sqrt{d\log d},$$

however, we know that $\mathbb{E}\|W\| \asymp \sqrt{d}$, meaning that the bound given by NCK (Theorem 4.14) is, in this case, suboptimal by a logarithmic factor. (By $a \asymp b$ we mean $a \lesssim b$ and $a \gtrsim b$.)

The next example will show that the logarithmic factor is in fact needed in some examples.

Example 4.18. Consider $A_k = e_k e_k^T \in \mathbb{R}^{d\times d}$ for $k = 1, \ldots, d$. The matrix $\sum_{k=1}^d g_k A_k$ corresponds to a diagonal matrix with independent standard Gaussian random variables as diagonal entries, and so its spectral norm is given by $\max_k |g_k|$. It is known that $\mathbb{E}\max_{1\leq k\leq d} |g_k| \asymp \sqrt{\log d}$. On the other hand, a direct calculation shows that $\sigma = 1$. This shows that the logarithmic factor cannot, in general, be removed.

This motivates the question of trying to understand when it is that the extra dimensional factor is needed. For both these examples, the resulting matrix $X = \sum_{k} g_k A_k$ has independent entries (except for the fact that it is symmetric). The case of independent entries [RS13, Seg00, Lat05, BvH15] is now somewhat understood:

Theorem 4.19 ([BvH15]). If $X$ is a $d\times d$ random symmetric matrix with independent Gaussian entries (except for the symmetry constraint) whose entry $i,j$ has variance $b_{ij}$, then
$$\mathbb{E}\|X\| \lesssim \max_{1\leq i\leq d} \sqrt{\sum_{j=1}^d b_{ij}} + \max_{ij} \sqrt{b_{ij}}\, \sqrt{\log d}.$$

Remark 4.20. $X$ in the theorem above can be written in terms of a Gaussian series by taking
$$A^{(ij)} = \sqrt{b_{ij}}\left( e_i e_j^T + e_j e_i^T \right), \text{ for } i < j, \quad \text{and} \quad A^{(ii)} = \sqrt{b_{ii}}\, e_i e_i^T.$$
One can then compute $\sigma$ and $\sigma_*$:
$$\sigma^2 = \max_{1\leq i\leq d} \sum_{j=1}^d b_{ij} \quad \text{and} \quad \sigma_*^2 \lesssim \max_{ij} b_{ij}.$$
This means that, when the random matrix in NCK (Theorem 4.14) has independent entries (modulo symmetry), then
$$\mathbb{E}\|X\| \lesssim \sigma + \sqrt{\log d}\, \sigma_*. \qquad (4.9)$$
Theorem 4.19, together with a recent improvement of Theorem 4.14 by Tropp [Tro15c] (we briefly discuss this improvement in Remark 4.32), motivates the bold possibility of (4.9) holding in more generality.

Conjecture 4.21. Let $A_1, \ldots, A_n \in \mathbb{R}^{d\times d}$ be symmetric matrices and $g_1, \ldots, g_n \sim \mathcal{N}(0,1)$ i.i.d., then:
$$\mathbb{E}\left\| \sum_{k=1}^n g_k A_k \right\| \lesssim \sigma + (\log d)^{1/2}\sigma_*.$$

While it may very well be that this Conjecture 4.21 is false, no counterexample is known to date.

Open Problem 4.1 (Improvement on Non-Commutative Khintchine Inequality). Prove or disprove Conjecture 4.21.

I would also be pretty excited to see interesting examples that satisfy the bound in Conjecture 4.21 while such a bound would not trivially follow from Theorems 4.14 or 4.19.

4.5.1 An interesting observation regarding random matrices with independent entries

For the independent entries setting, Theorem 4.19 is tight (up to constants) for a wide range of variance profiles $\{b_{ij}\}_{i\leq j}$ — the details are available as Corollary 3.15 in [BvH15]; the basic idea is that if the largest variance is comparable to the variance of a sufficient number of entries, then the bound in Theorem 4.19 is tight up to constants.

However, the situation is not as well understood when the variance profiles $\{b_{ij}\}_{i\leq j}$ are arbitrary. Since the spectral norm of a matrix is always at least the $\ell_2$ norm of a row, the following lower bound holds (for $X$ a symmetric random matrix with independent Gaussian entries):
$$\mathbb{E}\|X\| \geq \mathbb{E}\max_k \|X e_k\|.$$
Observations in papers of Latała [Lat05] and Riemer and Schütt [RS13], together with the results in [BvH15], motivate the conjecture that this lower bound is always tight (up to constants).

Open Problem 4.2 (Latała-Riemer-Schütt). Given $X$ a symmetric random matrix with independent Gaussian entries, is the following true?
$$\mathbb{E}\|X\| \lesssim \mathbb{E}\max_k \|X e_k\|.$$

The results in [BvH15] answer this in the positive for a large range of variance profiles, but not in full generality. Recently, van Handel [vH15] proved this conjecture in the positive with an extra factor of $\log\log d$. More precisely, that
$$\mathbb{E}\|X\| \lesssim \log\log d\; \mathbb{E}\max_k \|X e_k\|,$$
where $d$ is the number of rows (and columns) of $X$.

4.6 A matrix concentration inequality for Rademacher Series

In what follows, we closely follow [Tro15a] and present an elementary proof of a few useful matrix concentration inequalities. We start with a Master Theorem of sorts for Rademacher series (the Rademacher analogue of Theorem 4.14).

Theorem 4.22. Let $H_1, \ldots, H_n \in \mathbb{R}^{d\times d}$ be symmetric matrices and $\varepsilon_1, \ldots, \varepsilon_n$ i.i.d. Rademacher random variables (meaning $= +1$ with probability $1/2$ and $= -1$ with probability $1/2$), then:
$$\mathbb{E}\left\| \sum_{k=1}^n \varepsilon_k H_k \right\| \leq \left( 1 + 2\lceil \log d \rceil \right)^{1/2}\sigma, \quad \text{where } \sigma^2 = \left\| \sum_{k=1}^n H_k^2 \right\|. \qquad (4.10)$$

Before proving this theorem, we first take a small detour in discrepancy theory, followed by derivations, using this theorem, of a couple of useful matrix concentration inequalities.

4.6.1 A small detour on discrepancy theory

The following conjecture appears in a nice blog post of Raghu Meka [Mek14].

Conjecture 4.23 (Matrix Six-Deviations Suffice). There exists a universal constant $C$ such that, for any choice of $n$ symmetric matrices $H_1, \ldots, H_n \in \mathbb{R}^{n\times n}$ satisfying $\|H_k\| \leq 1$ (for all $k = 1, \ldots, n$), there exist $\varepsilon_1, \ldots, \varepsilon_n \in \{\pm 1\}$ such that
$$\left\| \sum_{k=1}^n \varepsilon_k H_k \right\| \leq C\sqrt{n}.$$

Open Problem 4.3. Prove or disprove Conjecture 4.23.

Note that, when the matrices $H_k$ are diagonal, this problem corresponds to Spencer's Six Standard Deviations Suffice Theorem [Spe85].

Remark 4.24. Also, using Theorem 4.22, it is easy to show that if one picks $\varepsilon_i$ as i.i.d. Rademacher random variables, then with positive probability (via the probabilistic method) the inequality will be satisfied with an extra $\sqrt{\log n}$ term. In fact one has
$$\mathbb{E}\left\| \sum_{k=1}^n \varepsilon_k H_k \right\| \lesssim \sqrt{\log n}\, \left\| \sum_{k=1}^n H_k^2 \right\|^{1/2} \leq \sqrt{\log n}\, \sqrt{n\max_k \left\| H_k^2 \right\|} \leq \sqrt{\log n}\, \sqrt{n}.$$

Remark 4.25. Remark 4.24 motivates asking whether Conjecture 4.23 can be strengthened to ask for $\varepsilon_1, \ldots, \varepsilon_n$ such that
$$\left\| \sum_{k=1}^n \varepsilon_k H_k \right\| \lesssim \left\| \sum_{k=1}^n H_k^2 \right\|^{1/2}. \qquad (4.11)$$
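
The following small simulation (not from the original notes) illustrates Remark 4.24: for random Rademacher signs and matrices with $\|H_k\| \leq 1$, the norm $\left\| \sum_k \varepsilon_k H_k \right\|$ stays below the guaranteed $O(\sqrt{n\log n})$ level; the particular $H_k$ used are an arbitrary illustrative choice.

```python
# Remark 4.24 guarantees E||sum_k eps_k H_k|| <~ sqrt(n log n) when ||H_k|| <= 1;
# this prints the relevant quantities for comparison.
import numpy as np

rng = np.random.default_rng(6)
n, trials = 100, 50
H = rng.standard_normal((n, n, n))
H = (H + H.transpose(0, 2, 1)) / 2
H = np.array([Hk / np.linalg.norm(Hk, 2) for Hk in H])   # normalize so that ||H_k|| = 1
norms = []
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(np.tensordot(eps, H, axes=1), 2))
print(f"sqrt(n) = {np.sqrt(n):.1f}, sqrt(n log n) = {np.sqrt(n*np.log(n)):.1f}, "
      f"mean ||sum eps_k H_k|| = {np.mean(norms):.1f}")
```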

4.6.2 Back to matrix concentration

Using Theorem 4.22, we will prove the following theorem.

Theorem 4.26. Let $T_1, \ldots, T_n \in \mathbb{R}^{d\times d}$ be random independent positive semidefinite matrices, then
$$\mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \leq \left[ \left\| \sum_{i=1}^n \mathbb{E}\, T_i \right\|^{1/2} + \sqrt{C(d)}\left( \mathbb{E}\max_i \|T_i\| \right)^{1/2} \right]^2, \quad \text{where } C(d) := 4 + 8\lceil \log d \rceil. \qquad (4.12)$$

A key step in the proof of Theorem 4.26 is an idea that is extremely useful in probability, the trick of symmetrization. For this reason we isolate it in a lemma.

Lemma 4.27 (Symmetrization). Let $T_1, \ldots, T_n$ be independent random matrices (note that they don't necessarily need to be positive semidefinite, for the sake of this lemma) and $\varepsilon_1, \ldots, \varepsilon_n$ i.i.d. Rademacher random variables (independent also from the matrices). Then,
$$\mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \leq \left\| \sum_{i=1}^n \mathbb{E}\, T_i \right\| + 2\,\mathbb{E}\left\| \sum_{i=1}^n \varepsilon_i T_i \right\|.$$

Proof. The triangular inequality gives
$$\mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \leq \left\| \sum_{i=1}^n \mathbb{E}\, T_i \right\| + \mathbb{E}\left\| \sum_{i=1}^n \left( T_i - \mathbb{E}\, T_i \right) \right\|.$$
Let us now introduce, for each $i$, a random matrix $T'_i$ identically distributed to $T_i$ and independent of everything else (all $2n$ matrices are independent). Then,
$$\mathbb{E}_T\left\| \sum_{i=1}^n \left( T_i - \mathbb{E}\, T_i \right) \right\| = \mathbb{E}_T\left\| \sum_{i=1}^n \left( T_i - \mathbb{E}_{T'}\, T'_i \right) \right\| = \mathbb{E}_T\left\| \mathbb{E}_{T'}\sum_{i=1}^n \left( T_i - T'_i \right) \right\| \leq \mathbb{E}_T\,\mathbb{E}_{T'}\left\| \sum_{i=1}^n \left( T_i - T'_i \right) \right\|,$$
where we use the notation $\mathbb{E}_a$ to mean that the expectation is taken with respect to the variable $a$, and the last step follows from Jensen's inequality with respect to $T'$.

Since $T_i - T'_i$ is a symmetric random variable, it is identically distributed to $\varepsilon_i\left( T_i - T'_i \right)$, which gives
$$\mathbb{E}\left\| \sum_{i=1}^n \left( T_i - T'_i \right) \right\| = \mathbb{E}\left\| \sum_{i=1}^n \varepsilon_i\left( T_i - T'_i \right) \right\| \leq \mathbb{E}\left\| \sum_{i=1}^n \varepsilon_i T_i \right\| + \mathbb{E}\left\| \sum_{i=1}^n \varepsilon_i T'_i \right\| = 2\,\mathbb{E}\left\| \sum_{i=1}^n \varepsilon_i T_i \right\|,$$
concluding the proof.

Proof (of Theorem 4.26). Using Lemma 4.27 and Theorem 4.22 we get
$$\mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \leq \left\| \sum_{i=1}^n \mathbb{E}\, T_i \right\| + \sqrt{C(d)}\,\mathbb{E}\left\| \sum_{i=1}^n T_i^2 \right\|^{1/2}.$$
The trick now is to make a term like the one on the left-hand side appear on the right-hand side. For that we start by noting (you can see Fact 3 in [Tro15a] for an elementary proof) that, since $T_i \succeq 0$,
$$\left\| \sum_{i=1}^n T_i^2 \right\| \leq \max_i \|T_i\| \left\| \sum_{i=1}^n T_i \right\|.$$
This means that
$$\mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \leq \left\| \sum_{i=1}^n \mathbb{E}\, T_i \right\| + \sqrt{C(d)}\,\mathbb{E}\left[ \left( \max_i \|T_i\| \right)^{1/2} \left\| \sum_{i=1}^n T_i \right\|^{1/2} \right].$$
Further applying the Cauchy-Schwarz inequality for the expectation gives
$$\mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \leq \left\| \sum_{i=1}^n \mathbb{E}\, T_i \right\| + \sqrt{C(d)}\left( \mathbb{E}\max_i \|T_i\| \right)^{1/2} \left( \mathbb{E}\left\| \sum_{i=1}^n T_i \right\| \right)^{1/2}.$$
Now that the term $\mathbb{E}\left\| \sum_{i=1}^n T_i \right\|$ appears on the right-hand side, the proof can be finished with a simple application of the quadratic formula (see Section 6.1 in [Tro15a] for details).
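
To see how a bound of this shape gets used, here is a small illustration (not from the original notes) with rank-one summands $T_i = x_i x_i^T$, the sample-covariance setting; the distribution of the $x_i$, the dimensions, and the value of $C(d)$ used below follow the reconstruction above and are otherwise arbitrary choices.

```python
# Illustration of Theorem 4.26 for T_i = x_i x_i^T with ||x_i|| = 1 and E T_i = I/d.
import numpy as np

rng = np.random.default_rng(7)
d, n = 50, 2000
X = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(d)    # rows x_i with ||x_i|| = 1
S = X.T @ X                                              # sum_i x_i x_i^T
sum_ET = n / d                                           # ||sum_i E T_i|| = ||(n/d) I|| = n/d
max_T = 1.0                                              # ||T_i|| = ||x_i||^2 = 1
C_d = 4 + 8 * np.ceil(np.log(d))
bound = (np.sqrt(sum_ET) + np.sqrt(C_d) * np.sqrt(max_T)) ** 2
print(f"||sum T_i|| = {np.linalg.norm(S, 2):.1f}, bound from (4.12) = {bound:.1f}")
```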

We now show an inequality for general symmetric matrices.

Theorem 4.28. Let $Y_1, \ldots, Y_n \in \mathbb{R}^{d\times d}$ be random independent centered symmetric matrices, then
$$\mathbb{E}\left\| \sum_{i=1}^n Y_i \right\| \leq \sqrt{C(d)}\,\sigma + C(d)\, L, \quad \text{where, as in (4.12), } C(d) := 4 + 8\lceil \log d \rceil, \text{ and}$$
$$\sigma^2 = \left\| \sum_{i=1}^n \mathbb{E}\, Y_i^2 \right\| \quad \text{and} \quad L^2 = \mathbb{E}\max_i \|Y_i\|^2. \qquad (4.13)$$

Proof. Using the Symmetrization Lemma 4.27 and Theorem 4.22, we get
$$\mathbb{E}\left\| \sum_{i=1}^n Y_i \right\| \leq 2\,\mathbb{E}_Y\left[ \mathbb{E}_\varepsilon\left\| \sum_{i=1}^n \varepsilon_i Y_i \right\| \right] \leq \sqrt{C(d)}\,\mathbb{E}\left\| \sum_{i=1}^n Y_i^2 \right\|^{1/2}.$$
Jensen's inequality gives
$$\mathbb{E}\left\| \sum_{i=1}^n Y_i^2 \right\|^{1/2} \leq \left( \mathbb{E}\left\| \sum_{i=1}^n Y_i^2 \right\| \right)^{1/2},$$
and the proof can be concluded by noting that $Y_i^2 \succeq 0$ and using Theorem 4.26.

Remark 4.29 (The rectangular case). One can extend Theorem 4.28 to general rectangular matrices $S_1, \ldots, S_n \in \mathbb{R}^{d_1\times d_2}$ by setting
$$Y_i = \begin{bmatrix} 0 & S_i \\ S_i^T & 0 \end{bmatrix},$$
and noting that $\|Y_i\| = \|S_i\|$ and
$$Y_i^2 = \begin{bmatrix} S_i S_i^T & 0 \\ 0 & S_i^T S_i \end{bmatrix}, \quad \text{so} \quad \left\| Y_i^2 \right\| = \max\left\{ \left\| S_i^T S_i \right\|, \left\| S_i S_i^T \right\| \right\} = \|S_i\|^2.$$
We defer the details to [Tro15a].

In order to prove Theorem 4.22, we will use an AM-GM like inequality for matrices for which, unlike the one in Open Problem 0.2 in [Ban15b], an elementary proof is known.

Lemma 4.30. Given symmetric matrices $H, W, Y \in \mathbb{R}^{d\times d}$ and non-negative integers $r, q$ satisfying $q \leq 2r$,
$$\operatorname{Tr}\left[ H W^q H Y^{2r-q} \right] + \operatorname{Tr}\left[ H W^{2r-q} H Y^q \right] \leq \operatorname{Tr}\left[ H^2\left( W^{2r} + Y^{2r} \right) \right],$$
and summing over $q$ gives
$$\sum_{q=0}^{2r} \operatorname{Tr}\left[ H W^q H Y^{2r-q} \right] \leq \frac{2r+1}{2}\operatorname{Tr}\left[ H^2\left( W^{2r} + Y^{2r} \right) \right].$$

We refer to Fact 4 in [Tro15a] for an elementary proof, but note that it is a matrix analogue of the inequality
$$\mu^\theta \lambda^{1-\theta} + \mu^{1-\theta}\lambda^\theta \leq \lambda + \mu,$$
for $\mu, \lambda \geq 0$ and $0 \leq \theta \leq 1$, which can be easily shown by adding the two AM-GM inequalities $\mu^\theta\lambda^{1-\theta} \leq \theta\mu + (1-\theta)\lambda$ and $\mu^{1-\theta}\lambda^\theta \leq (1-\theta)\mu + \theta\lambda$.

Proof (of Theorem 4.22). Let $X = \sum_{k=1}^n \varepsilon_k H_k$. Then, for any positive integer $p$,
$$\mathbb{E}\|X\| \leq \left( \mathbb{E}\|X\|^{2p} \right)^{\frac{1}{2p}} = \left( \mathbb{E}\left\| X^{2p} \right\| \right)^{\frac{1}{2p}} \leq \left( \mathbb{E}\operatorname{Tr} X^{2p} \right)^{\frac{1}{2p}},$$
where the first inequality follows from Jensen's inequality and the last from $X^{2p} \succeq 0$ and the observation that the trace of a positive semidefinite matrix is at least its spectral norm. In the sequel, we upper bound $\mathbb{E}\operatorname{Tr} X^{2p}$. We introduce $X_{+i}$ and $X_{-i}$ as $X$ conditioned on $\varepsilon_i$ being, respectively, $+1$ or $-1$. More precisely,
$$X_{+i} = H_i + \sum_{j\neq i} \varepsilon_j H_j \quad \text{and} \quad X_{-i} = -H_i + \sum_{j\neq i} \varepsilon_j H_j.$$
Then, we have
$$\mathbb{E}\operatorname{Tr} X^{2p} = \mathbb{E}\operatorname{Tr}\left[ X X^{2p-1} \right] = \mathbb{E}\operatorname{Tr}\left[ \sum_{i=1}^n \varepsilon_i H_i X^{2p-1} \right].$$
Note that
$$\mathbb{E}_{\varepsilon_i}\operatorname{Tr}\left[ \varepsilon_i H_i X^{2p-1} \right] = \frac{1}{2}\operatorname{Tr}\left[ H_i\left( X_{+i}^{2p-1} - X_{-i}^{2p-1} \right) \right];$$
this means that
$$\mathbb{E}\operatorname{Tr} X^{2p} = \sum_{i=1}^n \frac{1}{2}\,\mathbb{E}\operatorname{Tr}\left[ H_i\left( X_{+i}^{2p-1} - X_{-i}^{2p-1} \right) \right],$$
where the expectation can be taken over $\varepsilon_j$ for $j\neq i$.

Now we rewrite $X_{+i}^{2p-1} - X_{-i}^{2p-1}$ as a telescopic sum:
$$X_{+i}^{2p-1} - X_{-i}^{2p-1} = \sum_{q=0}^{2p-2} X_{+i}^q\left( X_{+i} - X_{-i} \right) X_{-i}^{2p-2-q}.$$
This gives
$$\mathbb{E}\operatorname{Tr} X^{2p} = \sum_{i=1}^n \sum_{q=0}^{2p-2} \frac{1}{2}\,\mathbb{E}\operatorname{Tr}\left[ H_i X_{+i}^q\left( X_{+i} - X_{-i} \right) X_{-i}^{2p-2-q} \right].$$
Since $X_{+i} - X_{-i} = 2H_i$, we get
$$\mathbb{E}\operatorname{Tr} X^{2p} = \sum_{i=1}^n \sum_{q=0}^{2p-2} \mathbb{E}\operatorname{Tr}\left[ H_i X_{+i}^q H_i X_{-i}^{2p-2-q} \right]. \qquad (4.14)$$
We now make use of Lemma 4.30 (see Remark 4.32 regarding the suboptimality of this step) to get
$$\mathbb{E}\operatorname{Tr} X^{2p} \leq \sum_{i=1}^n \frac{2p-1}{2}\,\mathbb{E}\operatorname{Tr}\left[ H_i^2\left( X_{+i}^{2p-2} + X_{-i}^{2p-2} \right) \right]. \qquad (4.15)$$

Hence,
$$\sum_{i=1}^n \frac{2p-1}{2}\,\mathbb{E}\operatorname{Tr}\left[ H_i^2\left( X_{+i}^{2p-2} + X_{-i}^{2p-2} \right) \right] = (2p-1)\sum_{i=1}^n \mathbb{E}\operatorname{Tr}\left[ H_i^2\,\mathbb{E}_{\varepsilon_i} X^{2p-2} \right] = (2p-1)\sum_{i=1}^n \mathbb{E}\operatorname{Tr}\left[ H_i^2 X^{2p-2} \right] = (2p-1)\,\mathbb{E}\operatorname{Tr}\left[ \left( \sum_{i=1}^n H_i^2 \right) X^{2p-2} \right].$$
Since $X^{2p-2} \succeq 0$, we have
$$\operatorname{Tr}\left[ \left( \sum_{i=1}^n H_i^2 \right) X^{2p-2} \right] \leq \left\| \sum_{i=1}^n H_i^2 \right\| \operatorname{Tr} X^{2p-2} = \sigma^2 \operatorname{Tr} X^{2p-2}, \qquad (4.16)$$
which gives
$$\mathbb{E}\operatorname{Tr} X^{2p} \leq (2p-1)\,\sigma^2\,\mathbb{E}\operatorname{Tr} X^{2p-2}. \qquad (4.17)$$
Applying this inequality recursively, we get
$$\mathbb{E}\operatorname{Tr} X^{2p} \leq \left[ (2p-1)(2p-3)\cdots 3\cdot 1 \right]\sigma^{2p}\,\mathbb{E}\operatorname{Tr} X^0 = (2p-1)!!\,\sigma^{2p}\, d.$$
Hence,
$$\mathbb{E}\|X\| \leq \left( \mathbb{E}\operatorname{Tr} X^{2p} \right)^{\frac{1}{2p}} \leq \left[ (2p-1)!! \right]^{\frac{1}{2p}}\sigma\, d^{\frac{1}{2p}}.$$
Taking $p = \lceil \log d \rceil$ and using the fact that $(2p-1)!! \leq \left( \frac{2p+1}{e} \right)^p$ (see [Tro15a] for an elementary proof, consisting essentially of taking logarithms and comparing the sum with an integral), we get
$$\mathbb{E}\|X\| \leq \left( \frac{2\lceil \log d \rceil + 1}{e} \right)^{1/2}\sigma\, d^{\frac{1}{2\lceil \log d \rceil}} \leq \left( 1 + 2\lceil \log d \rceil \right)^{1/2}\sigma.$$

Remark 4.31. A similar argument can be used to prove Theorem 4.14 (the Gaussian series case) based on Gaussian integration by parts; see Section 7 in [Tro15c].

Remark 4.32. Note that, up until the step from (4.14) to (4.15), all steps are equalities, suggesting that this step may be the lossy step responsible for the suboptimal dimensional factor in several cases (although (4.16) can also potentially be lossy, it is not uncommon that $\sum_i H_i^2$ is a multiple of the identity matrix, which would render this step also an equality). In fact, Joel Tropp [Tro15c] recently proved an improvement over the NCK inequality that, essentially, consists in replacing inequality (4.15) with a tighter argument. In a nutshell, the idea is that, if the $H_i$'s are non-commutative, most summands in (4.14) are actually expected to be smaller than the ones corresponding to $q = 0$ and $q = 2p-2$, which are the ones that appear in (4.15).

4.7 Other Open Problems

4.7.1 Oblivious Sparse Norm-Approximating Projections

There is an interesting random matrix problem related to Oblivious Sparse Norm-Approximating Projections (OSNAPs) [NN], a form of dimension reduction useful for fast linear algebra. In a nutshell, the idea is to try to find random matrices $\Pi$ that achieve dimension reduction, meaning $\Pi \in \mathbb{R}^{m\times n}$ with $m \ll n$, and that preserve the norm of every point in a certain subspace [NN]; moreover, for the sake of computational efficiency, these matrices should be sparse, to allow for faster matrix-vector multiplication. In some sense, this is a generalization of the ideas of the Johnson-Lindenstrauss Lemma and Gordon's Escape through the Mesh Theorem that we will discuss in the next section.

Open Problem 4.4 (OSNAP [NN]). Let $s \leq d \leq m \leq n$.

1. Let $\Pi \in \mathbb{R}^{m\times n}$ be a random matrix with i.i.d. entries
$$\Pi_{ri} = \frac{\delta_{ri}\sigma_{ri}}{\sqrt{s}},$$
where $\sigma_{ri}$ is a Rademacher random variable and
$$\delta_{ri} = \begin{cases} 1 & \text{with probability } \frac{s}{m} \\ 0 & \text{with probability } 1 - \frac{s}{m}. \end{cases}$$
Prove or disprove: there exist positive universal constants $c_1$ and $c_2$ such that, for any $U \in \mathbb{R}^{n\times d}$ for which $U^T U = I_{d\times d}$,
$$\operatorname{Prob}\left\{ \left\| (\Pi U)^T \Pi U - I \right\| \geq \varepsilon \right\} < \delta,$$
for $m \geq c_1 \frac{d + \log\frac{1}{\delta}}{\varepsilon^2}$ and $s \geq c_2 \frac{\log\frac{d}{\delta}}{\varepsilon}$.

2. Same setting as in 1., but conditioning on
$$\sum_{r=1}^m \delta_{ri} = s, \text{ for all } i,$$
meaning that each column of $\Pi$ has exactly $s$ non-zero elements, rather than $s$ on average. The conjecture is then slightly different: Prove or disprove: there exist positive universal constants $c_1$ and $c_2$ such that, for any $U \in \mathbb{R}^{n\times d}$ for which $U^T U = I_{d\times d}$, and for $m \geq c_1 \frac{d + \log\frac{1}{\delta}}{\varepsilon^2}$ and $s \geq c_2 \frac{\log\frac{d}{\delta}}{\varepsilon}$,
$$\operatorname{Prob}\left\{ \left\| (\Pi U)^T \Pi U - I \right\| \geq \varepsilon \right\} < \delta.$$

3. The conjecture in 1., but for the specific choice of $U$:
$$U = \begin{bmatrix} I_{d\times d} \\ 0_{(n-d)\times d} \end{bmatrix}.$$
In this case, the object in question is a sum of rank-1 independent matrices. More precisely, $z_1, \ldots, z_m \in \mathbb{R}^d$ (corresponding to the first $d$ coordinates of each of the $m$ rows of $\Pi$) are i.i.d. random vectors with i.i.d. entries
$$(z_k)_j = \begin{cases} -\frac{1}{\sqrt{s}} & \text{with probability } \frac{s}{2m} \\ 0 & \text{with probability } 1 - \frac{s}{m} \\ \frac{1}{\sqrt{s}} & \text{with probability } \frac{s}{2m}. \end{cases}$$
Note that $\mathbb{E}\, z_k z_k^T = \frac{1}{m} I_{d\times d}$. The conjecture is then that there exist positive universal constants $c_1$ and $c_2$ such that
$$\operatorname{Prob}\left\{ \left\| \sum_{k=1}^m \left[ z_k z_k^T - \mathbb{E}\, z_k z_k^T \right] \right\| \geq \varepsilon \right\} < \delta,$$
for $m \geq c_1 \frac{d + \log\frac{1}{\delta}}{\varepsilon^2}$ and $s \geq c_2 \frac{\log\frac{d}{\delta}}{\varepsilon}$.

I think this would be an interesting question even for fixed $\delta$, say $\delta = 0.1$, or even simply to understand the value of
$$\mathbb{E}\left\| \sum_{k=1}^m \left[ z_k z_k^T - \mathbb{E}\, z_k z_k^T \right] \right\|.$$
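
The following sketch (not from the original notes) draws the sparse random vectors $z_k$ of part 3 and evaluates the quantity in question; since $\sum_k \mathbb{E}\left[ z_k z_k^T \right] = I$, the deviation is simply $\left\| \sum_k z_k z_k^T - I \right\|$. All parameter values ($d$, $m$, $s$, number of trials) are arbitrary illustrative choices.

```python
# Quick empirical look at the quantity in part 3 of Open Problem 4.4.
import numpy as np

rng = np.random.default_rng(8)
d, m, s, trials = 50, 1000, 5, 20
devs = []
for _ in range(trials):
    mask = rng.random((m, d)) < s / m                   # delta_ri, nonzero w.p. s/m
    signs = rng.choice([-1.0, 1.0], size=(m, d))        # Rademacher signs
    Z = mask * signs / np.sqrt(s)                       # rows are the vectors z_k
    devs.append(np.linalg.norm(Z.T @ Z - np.eye(d), 2))
print(f"mean of ||sum_k z_k z_k^T - I|| over {trials} trials: {np.mean(devs):.3f}")
```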

4.7.2 k-lifts of graphs

Given a graph $G$ on $n$ nodes and with maximum degree $\Delta$, and an integer $k$, a random $k$-lift $G^{(k)}$ of $G$ is a graph on $kn$ nodes obtained by replacing each edge of $G$ by a random $k\times k$ bipartite matching. More precisely, the adjacency matrix $A^{(k)}$ of $G^{(k)}$ is an $nk\times nk$ matrix with $k\times k$ blocks given by
$$A^{(k)}_{ij} = A_{ij}\,\Pi_{ij},$$
where $\Pi_{ij}$ is uniformly randomly drawn from the set of permutation matrices on $k$ elements, and all the $\Pi_{ij}$ are independent, except for the constraint $\Pi_{ji} = \Pi_{ij}^T$. In other words,
$$A^{(k)} = \sum_{i<j} A_{ij}\left( e_i e_j^T \otimes \Pi_{ij} + e_j e_i^T \otimes \Pi_{ij}^T \right),$$
where $\otimes$ corresponds to the Kronecker product. Note that
$$\mathbb{E}\, A^{(k)} = A \otimes \left( \frac{1}{k} J \right),$$
where $J = 11^T$ is the $k\times k$ all-ones matrix.

Open Problem 4.5 (Random k-lifts of graphs). Give a tight upper bound to
$$\mathbb{E}\left\| A^{(k)} - \mathbb{E}\, A^{(k)} \right\|.$$
Oliveira [Oli10] gives a bound that is essentially of the form $\sqrt{\Delta\log(nk)}$, while the results in [ABG12] suggest that one may expect more concentration for large $k$. It is worth noting that the case of $k = 2$ can essentially be reduced to a problem where the entries of the random matrix are independent, and the results in [BvH15] can be applied to, in some cases, remove the logarithmic factor.

4.8 Another open problem

Feige [Fei05] posed the following remarkable conjecture (we thank Francisco Unda and Philippe Rigollet for suggesting this problem):

Conjecture 4.33. Given $n$ independent random variables $X_1, \ldots, X_n$ such that, for all $i$, $X_i \geq 0$ and $\mathbb{E}\, X_i = 1$, we have
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i < n + 1 \right\} \geq e^{-1}.$$

Note that, if the $X_i$ are i.i.d. and $X_i = n+1$ with probability $\frac{1}{n+1}$ and $X_i = 0$ otherwise, then
$$\operatorname{Prob}\left\{ \sum_{i=1}^n X_i < n+1 \right\} = \left( 1 - \frac{1}{n+1} \right)^n \approx e^{-1}.$$

Open Problem 4.6. Prove or disprove Conjecture 4.33.

References

[ABG12] L. Addario-Berry and S. Griffiths. The spectrum of random lifts. Available at arXiv [math.CO], 2012.

[Ban15a] A. S. Bandeira. Relax and Conquer BLOG: 18.S096 Concentration Inequalities, Scalar and Matrix Versions. 2015.

[Ban15b] A. S. Bandeira. Relax and Conquer BLOG: Ten Lectures and Forty-two Open Problems in Mathematics of Data Science. 2015.

[BvH15] A. S. Bandeira and R. van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Annals of Probability, to appear, 2015.

[Dur06] R. Durrett. Random Graph Dynamics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York, NY, USA, 2006.

[Fei05] U. Feige. On sums of independent random variables with unbounded variance, and estimating the average degree in a graph. 2005.

[Lat05] R. Latała. Some estimates of norms of random matrices. Proc. Amer. Math. Soc., 133(5), electronic, 2005.

[LM00] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Ann. Statist., 2000.

[Mas00] P. Massart. About the constants in Talagrand's concentration inequalities for empirical processes. The Annals of Probability, 28, 2000.

[Mek14] R. Meka. Windows on Theory BLOG: Discrepancy and Beating the Union Bound. http://windowsontheory.org/2014/02/07/discrepancy-and-beating-the-union-bound/, 2014.

[NN] J. Nelson and L. Nguyen. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. Available at arXiv [cs.DS].

[Oli10] R. I. Oliveira. The spectrum of random k-lifts of large graphs (with possibly large k). Journal of Combinatorics, 2010.

[Pis03] G. Pisier. Introduction to Operator Space Theory, volume 294 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2003.

[RS13] S. Riemer and C. Schütt. On the expectation of the norm of random matrices with non-identically distributed entries. Electron. J. Probab., 18, 2013.

[Seg00] Y. Seginer. The expected norm of random matrices. Combin. Probab. Comput., 9, 2000.

[Spe85] J. Spencer. Six standard deviations suffice. Trans. Amer. Math. Soc., 289, 1985.

[Tal95] M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Inst. Hautes Études Sci. Publ. Math., 81:73-205, 1995.

[Tao12] T. Tao. Topics in Random Matrix Theory. Graduate Studies in Mathematics. American Mathematical Society, 2012.

[Tro12] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4), 2012.

[Tro15a] J. A. Tropp. The expected norm of a sum of independent random matrices: An elementary approach. Available at arXiv [math.PR], 2015.

[Tro15b] J. A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 2015.

[Tro15c] J. A. Tropp. Second-order matrix concentration inequalities. In preparation, 2015.

[vH14] R. van Handel. Probability in high dimensions. ORF 570 Lecture Notes, Princeton University, 2014.

[vH15] R. van Handel. On the spectral norm of inhomogeneous random matrices. Available online at arXiv [math.PR], 2015.


Fast Angular Synchronization for Phase Retrieval via Incomplete Information Fast Angular Synchronization for Phase Retrieval via Incomplete Information Aditya Viswanathan a and Mark Iwen b a Department of Mathematics, Michigan State University; b Department of Mathematics & Department

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Assignment 4: Solutions

Assignment 4: Solutions Math 340: Discrete Structures II Assignment 4: Solutions. Random Walks. Consider a random walk on an connected, non-bipartite, undirected graph G. Show that, in the long run, the walk will traverse each

More information

Constructive bounds for a Ramsey-type problem

Constructive bounds for a Ramsey-type problem Constructive bounds for a Ramsey-type problem Noga Alon Michael Krivelevich Abstract For every fixed integers r, s satisfying r < s there exists some ɛ = ɛ(r, s > 0 for which we construct explicitly an

More information

Random matrices: Distribution of the least singular value (via Property Testing)

Random matrices: Distribution of the least singular value (via Property Testing) Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued

More information

Tail Inequalities Randomized Algorithms. Sariel Har-Peled. December 20, 2002

Tail Inequalities Randomized Algorithms. Sariel Har-Peled. December 20, 2002 Tail Inequalities 497 - Randomized Algorithms Sariel Har-Peled December 0, 00 Wir mssen wissen, wir werden wissen (We must know, we shall know) David Hilbert 1 Tail Inequalities 1.1 The Chernoff Bound

More information

CUT-OFF FOR QUANTUM RANDOM WALKS. 1. Convergence of quantum random walks

CUT-OFF FOR QUANTUM RANDOM WALKS. 1. Convergence of quantum random walks CUT-OFF FOR QUANTUM RANDOM WALKS AMAURY FRESLON Abstract. We give an introduction to the cut-off phenomenon for random walks on classical and quantum compact groups. We give an example involving random

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 5 Instructor: Farid Alizadeh Scribe: Anton Riabov 10/08/2001 1 Overview We continue studying the maximum eigenvalue SDP, and generalize

More information

Graph Sparsification III: Ramanujan Graphs, Lifts, and Interlacing Families

Graph Sparsification III: Ramanujan Graphs, Lifts, and Interlacing Families Graph Sparsification III: Ramanujan Graphs, Lifts, and Interlacing Families Nikhil Srivastava Microsoft Research India Simons Institute, August 27, 2014 The Last Two Lectures Lecture 1. Every weighted

More information

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min Spectral Graph Theory Lecture 2 The Laplacian Daniel A. Spielman September 4, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The notes written before

More information

Generalized eigenvector - Wikipedia, the free encyclopedia

Generalized eigenvector - Wikipedia, the free encyclopedia 1 of 30 18/03/2013 20:00 Generalized eigenvector From Wikipedia, the free encyclopedia In linear algebra, for a matrix A, there may not always exist a full set of linearly independent eigenvectors that

More information

Invertibility of random matrices

Invertibility of random matrices University of Michigan February 2011, Princeton University Origins of Random Matrix Theory Statistics (Wishart matrices) PCA of a multivariate Gaussian distribution. [Gaël Varoquaux s blog gael-varoquaux.info]

More information

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms Tim Roughgarden March 3, 2016 1 Preamble In CS109 and CS161, you learned some tricks of

More information

On Systems of Diagonal Forms II

On Systems of Diagonal Forms II On Systems of Diagonal Forms II Michael P Knapp 1 Introduction In a recent paper [8], we considered the system F of homogeneous additive forms F 1 (x) = a 11 x k 1 1 + + a 1s x k 1 s F R (x) = a R1 x k

More information

Matrices and Vectors

Matrices and Vectors Matrices and Vectors James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University November 11, 2013 Outline 1 Matrices and Vectors 2 Vector Details 3 Matrix

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms 南京大学 尹一通 Martingales Definition: A sequence of random variables X 0, X 1,... is a martingale if for all i > 0, E[X i X 0,...,X i1 ] = X i1 x 0, x 1,...,x i1, E[X i X 0 = x 0, X 1

More information

1 Large Deviations. Korea Lectures June 2017 Joel Spencer Tuesday Lecture III

1 Large Deviations. Korea Lectures June 2017 Joel Spencer Tuesday Lecture III Korea Lectures June 017 Joel Spencer Tuesday Lecture III 1 Large Deviations We analyze how random variables behave asymptotically Initially we consider the sum of random variables that take values of 1

More information

Random Matrices: Invertibility, Structure, and Applications

Random Matrices: Invertibility, Structure, and Applications Random Matrices: Invertibility, Structure, and Applications Roman Vershynin University of Michigan Colloquium, October 11, 2011 Roman Vershynin (University of Michigan) Random Matrices Colloquium 1 / 37

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Positive entries of stable matrices

Positive entries of stable matrices Positive entries of stable matrices Shmuel Friedland Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago Chicago, Illinois 60607-7045, USA Daniel Hershkowitz,

More information