A NOTE ON SUMS OF INDEPENDENT RANDOM MATRICES AFTER AHLSWEDE-WINTER

1. The method

Ahlswede and Winter [1] proposed a new approach to deviation inequalities for sums of independent random matrices. The purpose of this note is to indicate how this method implies Rudelson's sampling theorems for random vectors in isotropic position.

Let $X_1, \ldots, X_n$ be independent random $d \times d$ real matrices, and let $S_n = X_1 + \cdots + X_n$. We will be interested in the magnitude of the deviation $\|S_n - \mathbb{E} S_n\|$ in the operator norm.

1.1. Real valued random variables. The Ahlswede-Winter method [1] is parallel to the classical approach to deviation inequalities for real valued random variables, which we briefly outline. Let $X_1, \ldots, X_n$ be independent mean zero random variables. We are interested in the magnitude of $S_n = \sum_i X_i$. For simplicity, we shall assume that $|X_i| \le 1$ a.s.; this hypothesis can be relaxed to some control of the moments, precisely to having a sub-exponential tail.

Fix $t > 0$ and let $\lambda > 0$ be a parameter to be chosen later. We want to estimate

    $p := \mathbb{P}(S_n > t) = \mathbb{P}(e^{\lambda S_n} > e^{\lambda t})$.

By Markov's inequality and independence, we have

    $p \le e^{-\lambda t}\, \mathbb{E} e^{\lambda S_n} = e^{-\lambda t} \prod_i \mathbb{E} e^{\lambda X_i}$.

Next, Taylor's expansion together with the mean zero and boundedness hypotheses can be used to show that, for every $i$,

    $\mathbb{E} e^{\lambda X_i} \le e^{\lambda^2 \operatorname{Var} X_i}$ for $0 \le \lambda \le 1$.

This yields

    $p \le e^{-\lambda t + \lambda^2 \sigma^2}$, where $\sigma^2 := \sum_i \operatorname{Var} X_i$.

The optimal choice of the parameter, $\lambda = \min(t/2\sigma^2, 1)$, implies Chernoff's inequality

    $p \le \max\big(e^{-t^2/4\sigma^2},\, e^{-t/2}\big)$.
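To see the scalar bound in action, here is a small numerical sketch (an illustration added to this write-up, not part of the original argument): it draws sums of independent Rademacher signs, which are mean zero and bounded by 1, and compares the empirical tail with the bound $\max(e^{-t^2/4\sigma^2}, e^{-t/2})$. The distribution, sample size, and values of $t$ are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 100, 50_000
    # X_i are independent Rademacher signs: mean zero, |X_i| <= 1, Var X_i = 1.
    X = rng.choice([-1.0, 1.0], size=(trials, n))
    S = X.sum(axis=1)
    sigma2 = float(n)                      # sigma^2 = sum_i Var X_i

    for t in (20.0, 30.0, 40.0):
        empirical = (S > t).mean()
        bound = max(np.exp(-t**2 / (4 * sigma2)), np.exp(-t / 2))
        print(f"t = {t:4.0f}   P(S_n > t) ~ {empirical:.2e}   bound = {bound:.2e}")

As expected, the bound is loose in the constants but shows the same Gaussian-type decay in $t$ as the empirical tail.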

1.2. Random matrices. Now we try to generalize this method to the case where the $X_i \in M_d$ are independent mean zero random matrices, where $M_d$ denotes the class of symmetric $d \times d$ matrices.

Some of the matrix calculus is straightforward. Thus, for $A \in M_d$, the matrix exponential $e^A$ is defined as usual by Taylor's series. Recall that $e^A$ has the same eigenvectors as $A$, and eigenvalues $e^{\lambda_i(A)}$. The partial order $A \ge B$ means $A - B \ge 0$, i.e. $A - B$ is positive semidefinite. The non-straightforward part is that, in general, $e^{A+B} \ne e^A e^B$. However, the Golden-Thompson inequality (see [3]) states that

    $\operatorname{tr} e^{A+B} \le \operatorname{tr}(e^A e^B)$

holds for arbitrary $A, B \in M_d$ (and in fact with an arbitrary unitarily invariant norm replacing the trace). Therefore, for $S_n = X_1 + \cdots + X_n$ and for $I_d$ the identity matrix in $M_d$, we have

    $p := \mathbb{P}(S_n \nleq t I_d) = \mathbb{P}(e^{\lambda S_n} \nleq e^{\lambda t} I_d) \le \mathbb{P}(\operatorname{tr} e^{\lambda S_n} > e^{\lambda t}) \le e^{-\lambda t}\, \mathbb{E} \operatorname{tr} e^{\lambda S_n}$.

This estimate is not sharp: $e^{\lambda S_n} \nleq e^{\lambda t} I_d$ means that the biggest eigenvalue of $e^{\lambda S_n}$ exceeds $e^{\lambda t}$, while $\operatorname{tr} e^{\lambda S_n} > e^{\lambda t}$ means that the sum of all $d$ eigenvalues exceeds the same quantity. This will be responsible for the (sometimes inevitable) loss of the $\log d$ factor in Rudelson's selection theorem.

Since $S_n = X_n + S_{n-1}$, we can use the Golden-Thompson inequality to separate the last term from the sum:

    $\mathbb{E} \operatorname{tr} e^{\lambda S_n} \le \mathbb{E} \operatorname{tr}(e^{\lambda X_n} e^{\lambda S_{n-1}})$.

Now, using independence and the fact that expectation and trace commute, we continue as

    $= \mathbb{E}_{n-1} \operatorname{tr}\big( \mathbb{E}_n e^{\lambda X_n} \cdot e^{\lambda S_{n-1}} \big) \le \| \mathbb{E}_n e^{\lambda X_n} \| \, \mathbb{E}_{n-1} \operatorname{tr}(e^{\lambda S_{n-1}})$,

where $\mathbb{E}_n$ denotes the expectation with respect to $X_n$ and we used that $\operatorname{tr}(AB) \le \|A\| \operatorname{tr} B$ for $B \ge 0$. Continuing by induction, we arrive (since $\operatorname{tr} I_d = d$) at

    $\mathbb{E} \operatorname{tr} e^{\lambda S_n} \le d \prod_{i=1}^n \| \mathbb{E} e^{\lambda X_i} \|$.

We have proved that

    $\mathbb{P}(S_n \nleq t I_d) \le d\, e^{-\lambda t} \prod_{i=1}^n \| \mathbb{E} e^{\lambda X_i} \|$.

Repeating the argument for $-S_n$ and using that $-t I_d \le S_n \le t I_d$ is equivalent to $\|S_n\| \le t$, we have shown that

(1)    $\mathbb{P}(\|S_n\| > t) \le 2d\, e^{-\lambda t} \prod_{i=1}^n \| \mathbb{E} e^{\lambda X_i} \|$.
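Since the Golden-Thompson inequality is the one genuinely matrix-specific ingredient above, a quick numerical sanity check may be reassuring (an illustration added to this write-up; the dimension and the Gaussian entries are arbitrary choices):

    import numpy as np

    def sym_expm(M):
        # matrix exponential of a symmetric matrix via its eigendecomposition
        w, V = np.linalg.eigh(M)
        return (V * np.exp(w)) @ V.T

    rng = np.random.default_rng(1)
    d = 5
    for trial in range(3):
        A = rng.standard_normal((d, d)); A = (A + A.T) / 2   # random symmetric A
        B = rng.standard_normal((d, d)); B = (B + B.T) / 2   # random symmetric B
        lhs = np.trace(sym_expm(A + B))                      # tr e^{A+B}
        rhs = np.trace(sym_expm(A) @ sym_expm(B))            # tr(e^A e^B)
        print(f"tr e^(A+B) = {lhs:9.3f}   <=   tr(e^A e^B) = {rhs:9.3f}")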

Remark. As in the real valued case, full independence is never needed in the above argument; it works equally well for martingales.

The main result is:

Theorem 1 (Chernoff-type inequality). Let $X_i \in M_d$ be independent mean zero random matrices with $\|X_i\| \le 1$ for all $i$ a.s. Let $S_n = X_1 + \cdots + X_n$ and $\sigma^2 = \sum_{i=1}^n \| \operatorname{Var} X_i \|$. Then for every $t > 0$ we have

    $\mathbb{P}(\|S_n\| > t) \le 2d \max\big(e^{-t^2/4\sigma^2},\, e^{-t/2}\big)$.

To prove this theorem, we have to estimate $\| \mathbb{E} e^{\lambda X_i} \|$ in (1), which is easy. For example, if $Z \in M_d$ and $\|Z\| \le 1$, then the Taylor series expansion shows that $e^Z \le I_d + Z + Z^2$. Therefore, we have:

Lemma 2. Let $Z \in M_d$ be a mean zero random matrix with $\|Z\| \le 1$ a.s. Then $\mathbb{E} e^Z \le e^{\operatorname{Var} Z}$.

Proof. Using the mean zero assumption, we have $\mathbb{E} e^Z \le \mathbb{E}(I_d + Z + Z^2) = I_d + \operatorname{Var} Z \le e^{\operatorname{Var} Z}$.

Let $0 < \lambda \le 1$. Then $\|\lambda X_i\| \le 1$ a.s., so by the Theorem's hypotheses and Lemma 2,

    $\| \mathbb{E} e^{\lambda X_i} \| \le \| e^{\lambda^2 \operatorname{Var} X_i} \| = e^{\lambda^2 \| \operatorname{Var} X_i \|}$.

Hence by (1),

    $\mathbb{P}(\|S_n\| > t) \le 2d\, e^{-\lambda t + \lambda^2 \sigma^2}$.

With the optimal choice $\lambda := \min(t/2\sigma^2, 1)$, the conclusion of the Theorem follows.

Problem: does the Theorem hold with $\sigma^2$ replaced by $\| \sum_{i=1}^n \operatorname{Var} X_i \|$? If so, this would generalize Pisier and Lust-Piquard's non-commutative Khinchine inequality.

Corollary 3. Let $X_i \in M_d$ be independent random matrices with $X_i \ge 0$ and $\|X_i\| \le 1$ for all $i$ a.s. Let $S_n = X_1 + \cdots + X_n$ and $E = \sum_{i=1}^n \| \mathbb{E} X_i \|$. Then for every $\varepsilon \in (0,1)$ we have

    $\mathbb{P}(\|S_n - \mathbb{E} S_n\| > \varepsilon E) \le 2d\, e^{-\varepsilon^2 E/4}$.

Proof. Applying the Theorem to $X_i - \mathbb{E} X_i$, we have

(2)    $\mathbb{P}(\|S_n - \mathbb{E} S_n\| > t) \le 2d \max\big(e^{-t^2/4\sigma^2},\, e^{-t/2}\big)$.

Note that $\|X_i\| \le 1$ implies

    $\operatorname{Var} X_i \le \mathbb{E} X_i^2 \le \mathbb{E}(\|X_i\| X_i) \le \mathbb{E} X_i$.

Therefore, $\sigma^2 \le E$. Now we use (2) with $t = \varepsilon E$ and note that $t^2/4\sigma^2 = \varepsilon^2 E^2/4\sigma^2 \ge \varepsilon^2 E/4$, while $t/2 = \varepsilon E/2 \ge \varepsilon^2 E/4$ since $\varepsilon \le 1$.

Remark. The hypothesis $\|X_i\| \le 1$ can be relaxed throughout to $\|X_i\|_{\psi_1} \le 1$; one just has to be more careful with the Taylor series.
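The following Monte Carlo sketch (an illustration added to this write-up; the diagonal-sign model, the dimension, and the thresholds are arbitrary choices) compares the empirical tail of $\|S_n\|$ with the bound of Theorem 1, using independent diagonal matrices of Rademacher signs, which are symmetric, mean zero, and of norm at most 1:

    import numpy as np

    rng = np.random.default_rng(2)
    d, n, trials = 5, 100, 10_000
    # X_i = diag(eps_{i1}, ..., eps_{id}) with independent Rademacher signs:
    # mean zero, ||X_i|| = 1, Var X_i = I_d, so sigma^2 = sum_i ||Var X_i|| = n.
    signs = rng.choice([-1.0, 1.0], size=(trials, n, d))
    S_diag = signs.sum(axis=1)               # diagonal of S_n, one row per trial
    norms = np.abs(S_diag).max(axis=1)       # operator norm of a diagonal matrix
    sigma2 = float(n)

    for t in (25.0, 30.0, 35.0):
        empirical = (norms > t).mean()
        bound = 2 * d * max(np.exp(-t**2 / (4 * sigma2)), np.exp(-t / 2))
        print(f"t = {t:4.0f}   P(||S_n|| > t) ~ {empirical:.2e}   bound = {min(bound, 1.0):.2e}")

In this diagonal example $\|S_n\|$ is the maximum absolute value of $d$ independent scalar sums, so the dimensional factor in the bound is doing real work.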

2. Applications

Let $x$ be a random vector in isotropic position in $\mathbb{R}^d$, i.e. $\mathbb{E}\, x \otimes x = I_d$. Denote $\|x\|_{\psi_1} = M$. Then the random matrix $X := M^{-2}\, x \otimes x$ satisfies the hypotheses of Corollary 3 (see the Remark following it). Clearly,

    $\mathbb{E} X = M^{-2} I_d$, so $E = n/M^2$ and $\mathbb{E} S_n = (n/M^2) I_d$.

Then Corollary 3 gives

    $\mathbb{P}\big( \|S_n - \mathbb{E} S_n\| > \varepsilon \|\mathbb{E} S_n\| \big) \le 2d\, e^{-\varepsilon^2 n/4M^2}$.

We have thus proved:

Corollary 4. Let $x$ be a random vector in isotropic position in $\mathbb{R}^d$ such that $M := \|x\|_{\psi_1} < \infty$, and let $x_1, \ldots, x_n$ be independent copies of $x$. Then for every $\varepsilon \in (0,1)$ we have

    $\mathbb{P}\Big( \Big\| \frac{1}{n} \sum_{i=1}^n x_i \otimes x_i - I_d \Big\| > \varepsilon \Big) \le 2d\, e^{-\varepsilon^2 n/4M^2}$.

The probability in the Corollary is smaller than 1 provided that the number of samples satisfies $n \gtrsim \varepsilon^{-2} M^2 \log d$. This is a version of Rudelson's sampling theorem, where $M$ played the role of $(\mathbb{E}\|x\|^{\log n})^{1/\log n}$.
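As a quick illustration of Corollary 4 (added to this write-up; standard Gaussian vectors are merely one convenient example of an isotropic distribution, and the dimension and sample sizes are arbitrary choices), one can watch the empirical covariance approach the identity in operator norm as $n$ grows:

    import numpy as np

    rng = np.random.default_rng(3)
    d = 20
    # Standard Gaussian vectors are in isotropic position: E[x x^T] = I_d.
    for n in (200, 2_000, 20_000):
        x = rng.standard_normal((n, d))
        emp_cov = x.T @ x / n                                # (1/n) sum_i x_i x_i^T
        dev = np.linalg.norm(emp_cov - np.eye(d), ord=2)     # operator-norm deviation
        print(f"n = {n:6d}   ||(1/n) sum_i x_i x_i^T - I_d|| = {dev:.3f}")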

One can also deduce the main Lemma in [2] from Corollary 3 in the previous section. Given vectors $x_i$ in $\mathbb{R}^d$, we are interested in the magnitude of $\big\| \sum_{i=1}^N g_i\, x_i \otimes x_i \big\|$, where the $g_i$ are independent standard Gaussian random variables. For normalization, we can assume that $\sum_{i=1}^N x_i \otimes x_i = A$ with $\|A\| = 1$. Denote $M := \max_i \|x_i\|$ and consider the random matrix $X$ that takes the value $M^{-2}\, x_i \otimes x_i$ with probability $1/N$ for each $i$. As before, let $X_1, X_2, \ldots, X_n$ be independent copies of $X$, and let $S_n = X_1 + \cdots + X_n$. This time we let $n \to \infty$: by the Central Limit Theorem, the properly scaled sum $S_n - \mathbb{E} S_n$ converges in distribution to $\sum_{i=1}^N g_i\, x_i \otimes x_i$. One then chooses the parameters correctly to produce a version of the main Lemma in [2]. We omit the details.

References

[1] R. Ahlswede, A. Winter, Strong converse for identification via quantum channels, IEEE Trans. Information Theory 48 (2002), 569-579.
[2] M. Rudelson, Random vectors in the isotropic position, J. Funct. Anal. 164 (1999), 60-72.
[3] A. Wigderson, D. Xiao, Derandomizing the Ahlswede-Winter matrix-valued Chernoff bound using pessimistic estimators, and applications, Theory of Computing 4 (2008), 53-76.