Stein's Method for Matrix Concentration


Slide 1. Stein's Method for Matrix Concentration
Lester Mackey
Collaborators: Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A. Tropp
University of California, Berkeley / California Institute of Technology
BEARS 2012

Slide 2. Motivation: Concentration Inequalities
Matrix concentration: the scalar statement $\mathbb{P}\{|X - \mathbb{E}X| \ge t\} \le \delta$ becomes $\mathbb{P}\{\lambda_{\max}(X - \mathbb{E}X) \ge t\} \le \delta$.
Goal: non-asymptotic control of random matrices with complex distributions.
Applications:
- Matrix estimation from sparse random measurements (Gross, 2011; Recht, 2009; Mackey, Talwalkar, and Jordan, 2011)
- Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan, 2008; Hsu, Kakade, and Zhang, 2011b)
- Convex relaxation of robust or chance-constrained optimization (Nemirovski, 2007; So, 2011; Cheung, So, and Wang, 2011)
- Random graph analysis (Christofides and Markström, 2008; Oliveira, 2009)

Slide 3. Motivation: Concentration Inequalities (continued)
Matrix concentration: $\mathbb{P}\{\lambda_{\max}(X - \mathbb{E}X) \ge t\} \le \delta$.
Difficulty: matrix multiplication is not commutative.
Past approaches (Oliveira, 2009; Tropp, 2011; Hsu, Kakade, and Zhang, 2011a):
- Deep results from matrix analysis
- Sums of independent matrices and matrix martingales
This work: Stein's method of exchangeable pairs (Stein, 1972), as advanced by Chatterjee (2007) for scalar concentration, yielding
- Improved exponential tail inequalities (Hoeffding, Bernstein)
- Polynomial moment inequalities (Khintchine, Rosenthal)
- Dependent sums and more general matrix functionals

Slide 4. Roadmap
1. Motivation
2. Stein's Method Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Extensions

Slide 5. Background: Notation
- Hermitian matrices: $\mathbb{H}^d = \{A \in \mathbb{C}^{d \times d} : A = A^*\}$. All matrices in this talk are Hermitian.
- Maximum eigenvalue: $\lambda_{\max}(\cdot)$
- Trace: $\operatorname{tr} B$, the sum of the diagonal entries of $B$
- Spectral norm: $\|B\|$, the maximum singular value of $B$
- Schatten p-norm: $\|B\|_p := (\operatorname{tr} |B|^p)^{1/p}$ for $p \ge 1$

Slide 6. Background: Matrix Stein Pair
Definition (Exchangeable Pair). $(Z, Z')$ is an exchangeable pair if $(Z, Z') \overset{d}{=} (Z', Z)$.
Definition (Matrix Stein Pair). Let $(Z, Z')$ be an auxiliary exchangeable pair, and let $\Psi : \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random matrices $X := \Psi(Z)$ and $X' := \Psi(Z')$. Then $(X, X')$ is a matrix Stein pair with scale factor $\alpha \in (0, 1]$ if $\mathbb{E}[X' \mid Z] = (1 - \alpha) X$.
- Matrix Stein pairs are exchangeable pairs.
- Matrix Stein pairs always have zero mean.
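The following minimal Python/NumPy sketch is my own illustration, not part of the slides: it builds the standard Rademacher-series example $X = \sum_k \varepsilon_k A_k$, where redrawing one uniformly chosen sign yields a matrix Stein pair with scale factor $\alpha = 1/n$, and verifies $\mathbb{E}[X' \mid Z] = (1 - \alpha)X$ by exact enumeration. The coefficient matrices and dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6
# Deterministic Hermitian coefficient matrices A_1, ..., A_n
A = []
for _ in range(n):
    M = rng.standard_normal((d, d))
    A.append((M + M.T) / 2)

eps = rng.choice([-1.0, 1.0], size=n)            # Z = (eps_1, ..., eps_n)
X = sum(e * Ak for e, Ak in zip(eps, A))         # X = Psi(Z) = sum_k eps_k A_k

# Z' redraws one uniformly chosen sign; average X' = Psi(Z') over the
# 2n equally likely (index, new sign) outcomes to compute E[X' | Z] exactly.
EX_prime = np.zeros((d, d))
for j in range(n):
    for s in (-1.0, 1.0):
        eps_new = eps.copy()
        eps_new[j] = s
        EX_prime += sum(e * Ak for e, Ak in zip(eps_new, A)) / (2 * n)

alpha = 1.0 / n
print(np.allclose(EX_prime, (1 - alpha) * X))    # True: E[X'|Z] = (1 - alpha) X
```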

Slide 7. Background: The Conditional Variance
Definition (Conditional Variance). Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix
$$\Delta_X := \Delta_X(Z) := \frac{1}{2\alpha}\, \mathbb{E}\bigl[(X - X')^2 \mid Z\bigr].$$
- $\Delta_X$ is a stochastic estimate of the variance $\mathbb{E}X^2$.
- Control over $\Delta_X$ yields control over $\lambda_{\max}(X)$.
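Continuing the same hypothetical Rademacher-series construction, this self-contained sketch evaluates the conditional variance by enumerating the resampling outcomes and confirms that $\Delta_X$ equals the deterministic matrix $\sum_k A_k^2$, the quantity that feeds the Hoeffding bound two slides later.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 6
A = []
for _ in range(n):
    M = rng.standard_normal((d, d))
    A.append((M + M.T) / 2)
eps = rng.choice([-1.0, 1.0], size=n)

# Delta_X = (1 / (2*alpha)) * E[(X - X')^2 | Z] with alpha = 1/n:
# average (X - X')^2 over the 2n equally likely resampling outcomes.
alpha = 1.0 / n
second_moment = np.zeros((d, d))
for j in range(n):
    for s in (-1.0, 1.0):
        diff = (eps[j] - s) * A[j]               # X - X' when sign j is redrawn to s
        second_moment += (diff @ diff) / (2 * n)
Delta_X = second_moment / (2 * alpha)

print(np.allclose(Delta_X, sum(Ak @ Ak for Ak in A)))   # True: Delta_X = sum_k A_k^2
```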

Slide 8. Exponential Tail Inequalities: Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012). Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that $\Delta_X \preceq cX + vI$ almost surely for $c, v \ge 0$. Then, for all $t \ge 0$,
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \exp\Bigl(\frac{-t^2}{2v + 2ct}\Bigr).$$
- Control over the conditional variance $\Delta_X$ yields a Gaussian tail for $\lambda_{\max}(X)$ for small $t$ and a Poisson tail for large $t$.
- When $d = 1$, this reduces to the scalar result of Chatterjee (2007).
- The dimensional factor $d$ cannot be removed.

Slide 9. Exponential Tail Inequalities: Application, the Matrix Hoeffding Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012). Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying $\mathbb{E}Y_k = 0$ and $Y_k^2 \preceq A_k^2$ for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter
$$\sigma^2 := \frac{1}{2} \Bigl\| \sum\nolimits_k \bigl(A_k^2 + \mathbb{E}Y_k^2\bigr) \Bigr\|.$$
Then, for all $t \ge 0$,
$$\mathbb{P}\Bigl\{\lambda_{\max}\Bigl(\sum\nolimits_k Y_k\Bigr) \ge t\Bigr\} \le d\, e^{-t^2/(2\sigma^2)}.$$
- Improves upon the matrix Hoeffding inequality of Tropp (2011).
- Optimal constant 1/2 in the exponent.
- The variance parameter $\sigma^2$ is smaller than the bound $\|\sum_k A_k^2\|$.
- Tighter than the classical Hoeffding inequality (1963) when $d = 1$.
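As a hedged numerical illustration (again not from the talk), the sketch below samples $\sum_k Y_k$ with $Y_k = \varepsilon_k A_k$, for which $Y_k^2 = A_k^2$ and $\mathbb{E}Y_k^2 = A_k^2$ hold exactly, and compares the empirical tail of $\lambda_{\max}$ with the bound $d\, e^{-t^2/(2\sigma^2)}$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, trials = 4, 10, 20000
A = []
for _ in range(n):
    M = rng.standard_normal((d, d)) / np.sqrt(n)
    A.append((M + M.T) / 2)

# For Y_k = eps_k A_k we have Y_k^2 = A_k^2 and E Y_k^2 = A_k^2,
# so sigma^2 = (1/2) * || sum_k (A_k^2 + E Y_k^2) || = || sum_k A_k^2 ||.
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), ord=2)

lam_max = np.empty(trials)
for i in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = sum(e * Ak for e, Ak in zip(eps, A))
    lam_max[i] = np.linalg.eigvalsh(S).max()

for t in (1.0, 1.5, 2.0):
    empirical = (lam_max >= t).mean()
    bound = d * np.exp(-t**2 / (2 * sigma2))
    print(f"t={t:.1f}  empirical={empirical:.4f}  Hoeffding bound={bound:.4f}")
```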

Slide 10. Exponential Tail Inequalities: Exponential Concentration, Proof Sketch
1. Matrix Laplace transform method (Ahlswede & Winter, 2002). Relate the tail probability to the trace of the moment generating function (mgf) of $X$:
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t}\, m(\theta), \quad \text{where } m(\theta) := \mathbb{E}\operatorname{tr} e^{\theta X}.$$
How to bound the trace mgf?
- Past approaches: the Golden-Thompson inequality, Lieb's concavity theorem.
- Chatterjee's strategy for scalar concentration: control mgf growth by bounding the derivative
$$m'(\theta) = \mathbb{E}\operatorname{tr} X e^{\theta X} \quad \text{for } \theta \in \mathbb{R},$$
rewritten using exchangeable pairs.
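A short sketch of the Laplace transform step (my own illustration under the same Rademacher-series assumption): estimate the trace mgf $m(\theta)$ by Monte Carlo, minimize $e^{-\theta t} m(\theta)$ over a grid of $\theta$, and check that the result dominates the empirical tail probability.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, trials = 4, 10, 5000
A = []
for _ in range(n):
    M = rng.standard_normal((d, d)) / np.sqrt(n)
    A.append((M + M.T) / 2)

def sample_eigs():
    """Eigenvalues of one draw of X = sum_k eps_k A_k."""
    eps = rng.choice([-1.0, 1.0], size=n)
    return np.linalg.eigvalsh(sum(e * Ak for e, Ak in zip(eps, A)))

samples = [sample_eigs() for _ in range(trials)]
thetas = np.linspace(0.1, 10.0, 100)
# Trace mgf m(theta) = E tr exp(theta X) = E sum_i exp(theta * lambda_i(X))
m = np.array([np.mean([np.exp(th * ev).sum() for ev in samples]) for th in thetas])

t = 1.5
laplace_bound = np.min(np.exp(-thetas * t) * m)     # inf_theta e^{-theta t} m(theta)
empirical_tail = np.mean([ev.max() >= t for ev in samples])
print(f"empirical P[lmax >= {t}] = {empirical_tail:.4f}  <=  Laplace bound {laplace_bound:.4f}")
```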

Slide 11. Exponential Tail Inequalities: Method of Exchangeable Pairs
Lemma. Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying $\mathbb{E}\|(X - X') F(X)\| < \infty$. Then
$$\mathbb{E}[X\, F(X)] = \frac{1}{2\alpha}\, \mathbb{E}\bigl[(X - X')\bigl(F(X) - F(X')\bigr)\bigr]. \tag{1}$$
Intuition:
- One can characterize the distribution of a random matrix by integrating it against a class of test functions $F$.
- Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$.

Slide 12. Exponential Tail Inequalities: Exponential Concentration, Proof Sketch (continued)
2. Method of exchangeable pairs. Rewrite the derivative of the trace mgf:
$$m'(\theta) = \mathbb{E}\operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha}\, \mathbb{E}\operatorname{tr}\bigl[(X - X')\bigl(e^{\theta X} - e^{\theta X'}\bigr)\bigr].$$
Goal: use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative.

Slide 13. Exponential Tail Inequalities: Mean Value Trace Inequality
Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012). Suppose that $g : \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h : \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$, it holds that
$$\operatorname{tr}\bigl[(g(A) - g(B))(h(A) - h(B))\bigr] \le \frac{1}{2} \operatorname{tr}\bigl[(g(A) - g(B))(A - B)(h'(A) + h'(B))\bigr].$$
Standard matrix functions: if $g : \mathbb{R} \to \mathbb{R}$, then
$$g(A) := Q \begin{bmatrix} g(\lambda_1) & & \\ & \ddots & \\ & & g(\lambda_d) \end{bmatrix} Q^* \quad \text{when} \quad A := Q \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} Q^*.$$
- The inequality does not hold without the trace.
- For exponential concentration we let $g(A) = A$ and $h(B) = e^{\theta B}$.
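Two quick numerical checks (illustrative only, not part of the slides): a helper that applies a scalar function to a Hermitian matrix through its eigendecomposition, and a verification of the mean value trace inequality for the choices $g(a) = a$ and $h(b) = e^{\theta b}$ used on this slide, on randomly drawn Hermitian $A$ and $B$.

```python
import numpy as np

def matrix_function(f, A):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    lam, Q = np.linalg.eigh(A)
    return (Q * f(lam)) @ Q.conj().T   # Q diag(f(lambda)) Q*

rng = np.random.default_rng(4)
d, theta = 5, 0.7
A = rng.standard_normal((d, d)); A = (A + A.T) / 2
B = rng.standard_normal((d, d)); B = (B + B.T) / 2

g = lambda x: x                        # weakly increasing
h = lambda x: np.exp(theta * x)        # derivative h' is convex
dh = lambda x: theta * np.exp(theta * x)

lhs = np.trace((matrix_function(g, A) - matrix_function(g, B))
               @ (matrix_function(h, A) - matrix_function(h, B)))
rhs = 0.5 * np.trace((matrix_function(g, A) - matrix_function(g, B))
                     @ (A - B)
                     @ (matrix_function(dh, A) + matrix_function(dh, B)))
print(lhs <= rhs + 1e-10)              # True: mean value trace inequality
```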

Slide 14. Exponential Tail Inequalities: Exponential Concentration, Proof Sketch (continued)
3. Mean value trace inequality. Bound the derivative of the trace mgf:
$$m'(\theta) = \frac{1}{2\alpha}\, \mathbb{E}\operatorname{tr}\bigl[(X - X')\bigl(e^{\theta X} - e^{\theta X'}\bigr)\bigr] \le \frac{\theta}{4\alpha}\, \mathbb{E}\operatorname{tr}\bigl[(X - X')^2\bigl(e^{\theta X} + e^{\theta X'}\bigr)\bigr] = \theta\, \mathbb{E}\operatorname{tr}\bigl[\Delta_X\, e^{\theta X}\bigr].$$
4. Conditional variance bound. $\Delta_X \preceq cX + vI$ yields the differential inequality
$$m'(\theta) \le c\theta\, m'(\theta) + v\theta\, m(\theta).$$
Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$; a short reconstruction of this final step follows.
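For completeness, here is a reconstruction of how the differential inequality is solved and optimized, following Chatterjee's scalar argument; the endpoints match the theorem on Slide 8, while the intermediate steps are my own paraphrase rather than a quote from the talk.

```latex
% Rearrange the differential inequality for 0 <= theta < 1/c:
\frac{d}{d\theta} \log m(\theta) = \frac{m'(\theta)}{m(\theta)} \le \frac{v\theta}{1 - c\theta}.
% Integrate from 0 to theta, using m(0) = E tr I = d and 1/(1-cs) <= 1/(1-c\theta) for s <= theta:
\log m(\theta) \le \log d + \frac{v\theta^2}{2(1 - c\theta)}.
% Plug into the Laplace transform bound and choose theta = t/(v + ct):
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
  \le d \exp\Bigl(-\theta t + \frac{v\theta^2}{2(1 - c\theta)}\Bigr)
  = d \exp\Bigl(-\frac{t^2}{2v + 2ct}\Bigr).
```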

Slide 15. Polynomial Moment Inequalities: Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012). Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E}\|X\|_{2p}^{2p} < \infty$. Then
$$\bigl(\mathbb{E}\|X\|_{2p}^{2p}\bigr)^{1/2p} \le \sqrt{2p - 1}\, \bigl(\mathbb{E}\|\Delta_X\|_p^p\bigr)^{1/2p}.$$
Moral: the conditional variance controls the moments of $X$.
- Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973).
- See also Pisier & Xu (1997); Junge & Xu (2003, 2008).
- The proof techniques mirror those for exponential concentration.
- Also holds for infinite-dimensional Schatten-class operators.

Slide 16. Polynomial Moment Inequalities: Application, the Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012). Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,
$$\Bigl(\mathbb{E}\Bigl\|\sum\nolimits_k \varepsilon_k A_k\Bigr\|_{2p}^{2p}\Bigr)^{1/2p} \le \sqrt{2p - 1}\, \Bigl\|\Bigl(\sum\nolimits_k A_k^2\Bigr)^{1/2}\Bigr\|_{2p}.$$
- The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis, used e.g. in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007).
- Stein's method offers an unusually concise proof.
- The constant $\sqrt{2p - 1}$ is within a factor of $\sqrt{e}$ of optimal.
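A hedged Monte Carlo comparison of the two sides of the matrix Khintchine bound, with $p = 2$ for concreteness; the matrices and sample sizes are arbitrary choices of mine, and the Schatten norms are computed from eigenvalues.

```python
import numpy as np

def schatten(A, p):
    """Schatten p-norm of a Hermitian matrix: (sum_i |lambda_i|^p)^(1/p)."""
    return (np.abs(np.linalg.eigvalsh(A)) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(5)
d, n, p, trials = 4, 8, 2, 20000
A = []
for _ in range(n):
    M = rng.standard_normal((d, d))
    A.append((M + M.T) / 2)

# Left-hand side: (E || sum_k eps_k A_k ||_{2p}^{2p})^{1/(2p)} by Monte Carlo
vals = np.empty(trials)
for i in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = sum(e * Ak for e, Ak in zip(eps, A))
    vals[i] = schatten(S, 2 * p) ** (2 * p)
lhs = vals.mean() ** (1.0 / (2 * p))

# Right-hand side: sqrt(2p - 1) * || (sum_k A_k^2)^{1/2} ||_{2p}
V = sum(Ak @ Ak for Ak in A)                      # positive semidefinite
lam, Q = np.linalg.eigh(V)
sqrtV = (Q * np.sqrt(np.clip(lam, 0, None))) @ Q.T
rhs = np.sqrt(2 * p - 1) * schatten(sqrtV, 2 * p)

print(f"lhs = {lhs:.4f} <= rhs = {rhs:.4f}")
```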

Slide 17. Extensions
Refined exponential concentration:
- Relate the trace mgf of the conditional variance to the trace mgf of $X$.
- Yields a matrix generalization of the classical Bernstein inequality.
- Offers a tool for unbounded random matrices.
General complex matrices: map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via the dilation
$$\mathcal{D}(B) := \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}.$$
The dilation preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$ (see the numerical check below).
Dependent sequences:
- Sums of conditionally zero-mean random matrices
- Combinatorial matrix statistics (e.g., sampling without replacement)
- Matrix-valued functions satisfying a self-reproducing property
- Yields a dependent bounded-differences inequality for matrices
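A minimal sketch of the Hermitian dilation mentioned above, checking $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$ on a random complex rectangular matrix.

```python
import numpy as np

def dilation(B):
    """Hermitian dilation D(B) = [[0, B], [B*, 0]] of a d1 x d2 matrix."""
    d1, d2 = B.shape
    top = np.hstack([np.zeros((d1, d1), dtype=B.dtype), B])
    bottom = np.hstack([B.conj().T, np.zeros((d2, d2), dtype=B.dtype)])
    return np.vstack([top, bottom])

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
D = dilation(B)
lam_max = np.linalg.eigvalsh(D).max()
spec_norm = np.linalg.norm(B, ord=2)        # largest singular value of B
print(np.isclose(lam_max, spec_norm))       # True: lambda_max(D(B)) = ||B||
```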

Slide 18. The End
Thanks!

Slide 19. References I
- Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3), Mar. 2002.
- Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19-42, 1973.
- Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138, 2007.
- Cheung, S.-S., So, A. M.-C., and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Preprint, 2011.
- Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88-100, 2008.
- Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30, 2008.
- Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3), Mar. 2011.
- Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
- Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv preprint, 2011a.
- Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv preprint (v3), 2011b.
- Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2), 2003.
- Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167, 2008.
- Lust-Piquard, F. Inégalités de Khintchine dans C_p (1 < p < ∞). C. R. Math. Acad. Sci. Paris, 303(7), 1986.
- Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2), 1991.
- Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems, 2011.

Slide 20. References II
- Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. arXiv preprint, Jan. 2012.
- Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109, Jan. 2007.
- Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. arXiv preprint, Nov. 2009.
- Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3), 1997.
- Recht, B. A simpler approach to matrix completion. arXiv preprint, 2009.
- Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4), Article 21, Jul. 2007.
- So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1), 2011.
- Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
- Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., Aug. 2011.
