Stein s Method for Matrix Concentration

Similar documents
Matrix Completion and Matrix Concentration

MATRIX CONCENTRATION INEQUALITIES VIA THE METHOD OF EXCHANGEABLE PAIRS 1

Using Friendly Tail Bounds for Sums of Random Matrices

Matrix Concentration. Nick Harvey University of British Columbia

arxiv: v2 [math.pr] 27 Mar 2014

arxiv: v1 [math.pr] 11 Feb 2019

A Matrix Expander Chernoff Bound

Matrix concentration inequalities

A NOTE ON SUMS OF INDEPENDENT RANDOM MATRICES AFTER AHLSWEDE-WINTER

Divide-and-Conquer Matrix Factorization

Sparsification by Effective Resistance Sampling

DERIVING MATRIX CONCENTRATION INEQUALITIES FROM KERNEL COUPLINGS. Daniel Paulin Lester Mackey Joel A. Tropp

On Some Extensions of Bernstein s Inequality for Self-Adjoint Operators

User-Friendly Tail Bounds for Sums of Random Matrices

On the Spectra of General Random Graphs

Stein Couplings for Concentration of Measure

Stein s Method: Distributional Approximation and Concentration of Measure

Concentration of Measures by Bounded Couplings

Efron Stein Inequalities for Random Matrices

Lecture 3. Random Fourier measurements

User-Friendly Tools for Random Matrices

An iterative hard thresholding estimator for low rank matrix recovery

User-Friendly Tools for Random Matrices

Technical Report No January 2011

Concentration Inequalities for Random Matrices

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1

Random regular digraphs: singularity and spectrum

arxiv: v7 [math.pr] 15 Jun 2011

Multivariate trace inequalities. David Sutter, Mario Berta, Marco Tomamichel

EFRON STEIN INEQUALITIES FOR RANDOM MATRICES

arxiv: v1 [math.pr] 22 May 2008

OXPORD UNIVERSITY PRESS

Handout 6: Some Applications of Conic Linear Programming

arxiv: v6 [math.pr] 16 Jan 2011

ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications

OPTIMAL PERTURBATION OF UNCERTAIN SYSTEMS

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Spectral k-support Norm Regularization

A note on the growth rate in the Fazekas Klesov general law of large numbers and on the weak law of large numbers for tail series

Kernel Method: Data Analysis with Positive Definite Kernels

Concentration of Measures by Bounded Size Bias Couplings

STAT 200C: High-dimensional Statistics

Concentration inequalities for non-lipschitz functions

Lecture 9: Matrix approximation continued

Intrinsic Volumes of Convex Cones Theory and Applications

Sparse and Low Rank Recovery via Null Space Properties

A Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression

COR-OPT Seminar Reading List Sp 18

On a class of stochastic differential equations in a financial network model

Convergence Rates of Kernel Quadrature Rules

Bounds for all eigenvalues of sums of Hermitian random matrices

Handout 8: Dealing with Data Uncertainty

Fundamentals of Matrices

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas

On the exponential convergence of. the Kaczmarz algorithm

Preface to the Second Edition...vii Preface to the First Edition... ix

Operator-valued extensions of matrix-norm inequalities

Different quantum f-divergences and the reversibility of quantum operations

Invertibility of random matrices

Notes on matrix arithmetic geometric mean inequalities

The Moment Method; Convex Duality; and Large/Medium/Small Deviations

Preliminary Examination, Numerical Analysis, August 2016

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

Itô isomorphisms for L p -valued Poisson stochastic integrals

8. Diagonalization.

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

Stein s Method: Distributional Approximation and Concentration of Measure

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

arxiv: v1 [math.na] 1 Sep 2018

Spectral Partitiong in a Stochastic Block Model

Dissertation Defense

The circular law. Lewis Memorial Lecture / DIMACS minicourse March 19, Terence Tao (UCLA)

OPTIMAL PERTURBATION OF UNCERTAIN SYSTEMS

Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving. 22 Element-wise Sampling of Graphs and Linear Equation Solving

Reconstruction from Anisotropic Random Measurements

THE PAVING PROPERTY FOR UNIFORMLY BOUNDED MATRICES

Analysis of Robust PCA via Local Incoherence

Compressed Sensing and Robust Recovery of Low Rank Matrices

Error Bound for Compound Wishart Matrices

The Hilbert Space of Random Variables

Subset Selection. Deterministic vs. Randomized. Ilse Ipsen. North Carolina State University. Joint work with: Stan Eisenstat, Yale

Spectral inequalities and equalities involving products of matrices

Fast Angular Synchronization for Phase Retrieval via Incomplete Information

Non commutative Khintchine inequalities and Grothendieck s theo

STAT 200C: High-dimensional Statistics

Extreme eigenvalues of Erdős-Rényi random graphs

18.S096: Concentration Inequalities, Scalar and Matrix Versions

Overparametrization for Landscape Design in Non-convex Optimization

Lecture 1 and 2: Random Spanning Trees

WEAK TYPE ESTIMATES ASSOCIATED TO BURKHOLDER S MARTINGALE INEQUALITY

Assessing the dependence of high-dimensional time series via sample autocovariances and correlations

Random Matrices: Invertibility, Structure, and Applications

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018

Multivariate Trace Inequalities

RandNLA: Randomized Numerical Linear Algebra

Mahler Conjecture and Reverse Santalo

Convexity of the Joint Numerical Range

Structured Random Matrices

Extremal properties of the variance and the quantum Fisher information; Phys. Rev. A 87, (2013).

The Knaster problem and the geometry of high-dimensional cubes

Transcription:

Stein s Method for Matrix Concentration Lester Mackey Collaborators: Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A. Tropp University of California, Berkeley California Institute of Technology BEARS 2012 Mackey (UC Berkeley) Stein s Method BEARS 2012 1 / 20

Motivation Concentration Inequalities Matrix concentration P{ X EX t} δ P{λ max (X EX) t} δ Non-asymptotic control of random matrices with complex distributions Applications Matrix estimation from sparse random measurements (Gross, 2011; Recht, 2009; Mackey, Talwalkar, and Jordan, 2011) Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan, 2008; Hsu, Kakade, and Zhang, 2011b) Convex relaxation of robust or chance-constrained optimization (Nemirovski, 2007; So, 2011; Cheung, So, and Wang, 2011) Random graph analysis (Christofides and Markström, 2008; Oliveira, 2009) Mackey (UC Berkeley) Stein s Method BEARS 2012 2 / 20

Motivation Concentration Inequalities Matrix concentration P{λ max (X EX) t} δ Difficulty: Matrix multiplication is not commutative Past approaches (Oliveira, 2009; Tropp, 2011; Hsu, Kakade, and Zhang, 2011a) Deep results from matrix analysis Sums of independent matrices and matrix martingales This work Stein s method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration Improved exponential tail inequalities (Hoeffding, Bernstein) Polynomial moment inequalities (Khintchine, Rosenthal) Dependent sums and more general matrix functionals Mackey (UC Berkeley) Stein s Method BEARS 2012 3 / 20

Roadmap Motivation 1 Motivation 2 Stein s Method Background and Notation 3 Exponential Tail Inequalities 4 Polynomial Moment Inequalities 5 Extensions Mackey (UC Berkeley) Stein s Method BEARS 2012 4 / 20

Notation Background Hermitian matrices: H d = {A C d d : A = A } All matrices in this talk are Hermitian. Maximum eigenvalue: λ max ( ) Trace: trb, the sum of the diagonal entries of B Spectral norm: B, the maximum singular value of B Schatten p-norm: B p := ( tr B p) 1/p for p 1 Mackey (UC Berkeley) Stein s Method BEARS 2012 5 / 20

Matrix Stein Pair Background Definition (Exchangeable Pair) (Z,Z ) is an exchangeable pair if (Z,Z ) d = (Z,Z). Definition (Matrix Stein Pair) Let (Z,Z ) be an auxiliary exchangeable pair, and let Ψ : Z H d be a measurable function. Define the random matrices X := Ψ(Z) and X := Ψ(Z ). (X,X ) is a matrix Stein pair with scale factor α (0,1] if E[X Z] = (1 α)x. Matrix Stein pairs are exchangeable pairs Matrix Stein pairs always have zero mean Mackey (UC Berkeley) Stein s Method BEARS 2012 6 / 20

Background The Conditional Variance Definition (Conditional Variance) Suppose that (X,X ) is a matrix Stein pair with scale factor α, constructed from the exchangeable pair (Z,Z ). The conditional variance is the random matrix X := X (Z) := 1 2α E[ (X X ) 2 Z ]. X is a stochastic estimate for the variance, EX 2 Control over X yields control over λ max (X) Mackey (UC Berkeley) Stein s Method BEARS 2012 7 / 20

Exponential Tail Inequalities Exponential Concentration for Random Matrices Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012) Let (X,X ) be a matrix Stein pair with X H d. Suppose that X cx +vi almost surely for c,v 0. Then, for all t 0, { } t 2 P{λ max (X) t} d exp. 2v +2ct Control over the conditional variance X yields Gaussian tail for λ max (X) for small t, Poisson tail for large t When d = 1, reduces to scalar result of Chatterjee (2007) The dimensional factor d cannot be removed Mackey (UC Berkeley) Stein s Method BEARS 2012 8 / 20

Exponential Tail Inequalities Application: Matrix Hoeffding Inequality Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012) Let (Y k ) k 1 be independent matrices in H d satisfying EY k = 0 and Yk 2 A 2 k for deterministic matrices (A k ) k 1. Define the variance parameter σ 2 := 1 ( ) A 2 2 k +EYk 2. k Then, for all t 0, ( P {λ max k Y k ) } t d e t2 /2σ 2. Improves upon the matrix Hoeffding inequality of Tropp (2011) Optimal constant 1/2 in the exponent Variance parameter σ 2 smaller than the bound k A2 k Tighter than classical Hoeffding inequality (1963) when d = 1 Mackey (UC Berkeley) Stein s Method BEARS 2012 9 / 20

Exponential Tail Inequalities Exponential Concentration: Proof Sketch 1. Matrix Laplace transform method (Ahlswede & Winter, 2002) Relate tail probability to the trace of the mgf of X where m(θ) := Etre θx How to bound the trace mgf? P{λ max (X) t} inf θ>0 e θt m(θ) Past approaches: Golden-Thompson, Lieb s concavity theorem Chatterjee s strategy for scalar concentration Control mgf growth by bounding derivative m (θ) = EtrXe θx for θ R. Rewrite using exchangeable pairs Mackey (UC Berkeley) Stein s Method BEARS 2012 10 / 20

Exponential Tail Inequalities Method of Exchangeable Pairs Lemma Suppose that (X,X ) is a matrix Stein pair with scale factor α. Let F : H d H d be a measurable function satisfying Then Intuition E (X X )F(X) <. E[X F(X)] = 1 2α E[(X X )(F(X) F(X ))]. (1) Can characterize the distribution of a random matrix by integrating it against a class of test functions F Eq. 1 allows us to estimate this integral using the smoothness properties of F and the discrepancy X X Mackey (UC Berkeley) Stein s Method BEARS 2012 11 / 20

Exponential Tail Inequalities Exponential Concentration: Proof Sketch 2. Method of Exchangeable Pairs Rewrite the derivative of the trace mgf m (θ) = EtrXe θx = 1 2α Etr[ (X X ) ( e θx e θx )]. Goal: Use the smoothness of F(X) = e θx to bound the derivative Mackey (UC Berkeley) Stein s Method BEARS 2012 12 / 20

Exponential Tail Inequalities Mean Value Trace Inequality Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012) Suppose that g : R R is a weakly increasing function and that h : R R is a function whose derivative h is convex. For all matrices A,B H d, it holds that tr[(g(a) g(b)) (h(a) h(b))] 1 2 tr[(g(a) g(b)) (A B) (h (A)+h (B))]. Standard matrix functions: If g : R R, then g(λ 1 ) 1 g(a) := Q... Q when A := Q λ... g(λd) λ d Q Inequality does not hold without the trace For exponential concentration we let g(a) = A and h(b) = e θb Mackey (UC Berkeley) Stein s Method BEARS 2012 13 / 20

Exponential Tail Inequalities Exponential Concentration: Proof Sketch 3. Mean Value Trace Inequality Bound the derivative of the trace mgf m (θ) = 1 2α Etr[ (X X ) ( e θx e θx )] θ 4α Etr[ (X X ) 2 (e θx +e θx )] = θ Etr [ X e θx]. 4. Conditional Variance Bound: X cx +vi Yields differential inequality m (θ) cθ m (θ)+vθ m(θ). Solve to bound m(θ) and thereby bound P{λ max (X) t} Mackey (UC Berkeley) Stein s Method BEARS 2012 14 / 20

Polynomial Moment Inequalities Polynomial Moments for Random Matrices Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012) Let p = 1 or p 1.5. Suppose that (X,X ) is a matrix Stein pair where E X 2p 2p <. Then ( E X 1/2p 2p) 2p 1 (E X p 1/2p p). Moral: The conditional variance controls the moments of X Generalizes Chatterjee s version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973) See also Pisier & Xu (1997); Junge & Xu (2003, 2008) Proof techniques mirror those for exponential concentration Also holds for infinite dimensional Schatten-class operators Mackey (UC Berkeley) Stein s Method BEARS 2012 15 / 20

Polynomial Moment Inequalities Application: Matrix Khintchine Inequality Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012) Let (ε k ) k 1 be an independent sequence of Rademacher random variables and (A k ) k 1 be a deterministic sequence of Hermitian matrices. Then if p = 1 or p 1.5, ( E k ) εka 2p 1/2p k ( 1/2 2p 1 k) k A2. 2p 2p Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis e.g., Used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007) Stein s method offers an unusually concise proof The constant 2p 1 is within e of optimal Mackey (UC Berkeley) Stein s Method BEARS 2012 16 / 20

Extensions Extensions Refined Exponential Concentration Relate trace mgf of conditional variance to trace mgf of X Yields matrix generalization of classical Bernstein inequality Offers tool for unbounded random matrices General Complex Matrices Map any matrix B C d 1 d 2 [ to a Hermitian ] matrix via dilation 0 B D(B) := B H d 1+d 2. 0 Preserves spectral information: λ max (D(B)) = B Dependent Sequences Sums of conditionally zero-mean random matrices Combinatorial matrix statistics (e.g., sampling w/o replacement) Matrix-valued functions satisfying a self-reproducing property Yields a dependent bounded differences inequality for matrices Mackey (UC Berkeley) Stein s Method BEARS 2012 17 / 20

The End Extensions Thanks! Mackey (UC Berkeley) Stein s Method BEARS 2012 18 / 20

References I Extensions Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3): 569 579, Mar. 2002. Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19 42, 1973. doi: 10.1214/aop/1176997023. Chatterjee, S. Stein s method for concentration inequalities. Probab. Theory Related Fields, 138:305 321, 2007. Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011. Christofides, D. and Markström, K. Expansion properties of random cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88 100, 2008. Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844 881, 2008. Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548 1566, Mar. 2011. Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13 30, 1963. Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arxiv:1104.1672, 2011a. Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arxiv:1104.1672v3[math.pr], 2011b. Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948 995, 2003. Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227 282, 2008. Lust-Piquard, F. Inégalités de Khintchine dans C p (1 < p < ). C. R. Math. Acad. Sci. Paris, 303(7):289 292, 1986. Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241 260, 1991. Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24. 2011. Mackey (UC Berkeley) Stein s Method BEARS 2012 19 / 20

References II Extensions Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arxiv, Jan. 2012. Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283 317, January 2007. ISSN 0025-5610. doi: 10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726. Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arxiv:0911.0600, Nov. 2009. Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667 698, 1997. Recht, B. A simpler approach to matrix completion. arxiv:0910.0651v2[cs.it], 2009. Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007. (electronic). So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130 (1):125 151, 2011. Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press. Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011. Mackey (UC Berkeley) Stein s Method BEARS 2012 20 / 20