Matrix concentration inequalities

1 ELE 538B: Mathematics of High-Dimensional Data
Matrix concentration inequalities
Yuxin Chen, Princeton University, Fall 2018

2 Recap: matrix Bernstein inequality

Consider a sequence of independent random matrices $\{X_l \in \mathbb{R}^{d_1 \times d_2}\}$:
- $\mathbb{E}[X_l] = 0$ and $\|X_l\| \le B$ for each $l$
- variance statistic: $v := \max\Big\{ \big\|\mathbb{E}\big[\sum_l X_l X_l^\top\big]\big\|,\ \big\|\mathbb{E}\big[\sum_l X_l^\top X_l\big]\big\| \Big\}$

Theorem 3.1 (Matrix Bernstein inequality). For all $\tau \ge 0$,
$$\mathbb{P}\Big\{\Big\|\sum_l X_l\Big\| \ge \tau\Big\} \le (d_1 + d_2)\, \exp\Big(\frac{-\tau^2/2}{v + B\tau/3}\Big)$$

3 Recap: matrix Bernstein inequality

[figure: tail behavior of the matrix Bernstein bound, showing a Gaussian tail in the moderate-deviation regime and an exponential tail in the large-deviation regime]

4 This lecture: a detailed introduction to matrix Bernstein

[book cover: An Introduction to Matrix Concentration Inequalities, Joel Tropp, 2015]

5 Outline

- Matrix theory background
- Matrix Laplace transform method
- Matrix Bernstein inequality
- Application: random features

6 Matrix theory background

7 Matrix function

Suppose the eigendecomposition of a symmetric matrix $A \in \mathbb{R}^{d \times d}$ is
$$A = U \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} U^\top$$
Then we can define
$$f(A) := U \begin{bmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_d) \end{bmatrix} U^\top$$
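For concreteness, here is a minimal NumPy sketch of this spectral definition (our own illustration, not from the slides; the helper name `matrix_function` is ours):

```python
# Spectral definition of a matrix function: apply f to the eigenvalues
# and rotate back. Assumes A is symmetric.
import numpy as np
from scipy.linalg import expm

def matrix_function(A, f):
    """Compute f(A) = U diag(f(lambda_1), ..., f(lambda_d)) U^T."""
    eigvals, U = np.linalg.eigh(A)            # eigendecomposition of symmetric A
    return U @ np.diag(f(eigvals)) @ U.T

# Sanity check: the spectral matrix exponential matches scipy's expm.
A = np.random.randn(4, 4); A = (A + A.T) / 2
assert np.allclose(matrix_function(A, np.exp), expm(A))
```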

8 Examples of matrix functions

Let $f(a) = c_0 + \sum_{k=1}^{\infty} c_k a^k$; then $f(A) := c_0 I + \sum_{k=1}^{\infty} c_k A^k$.

- matrix exponential: $e^A := I + \sum_{k=1}^{\infty} \frac{1}{k!} A^k$ (why?)
  monotonicity: if $A \preceq H$, then $\operatorname{tr} e^A \le \operatorname{tr} e^H$
- matrix logarithm: $\log(e^A) := A$
  monotonicity: if $0 \preceq A \preceq H$, then $\log A \preceq \log H$
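A small numerical check of the first monotonicity fact (our own demo, not from the slides; it verifies only the trace-exponential statement, under the construction $H = A + P$ with $P$ PSD so that $A \preceq H$ by construction):

```python
# Check: A <= H in the semidefinite order implies tr e^A <= tr e^H.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d)); A = (A + A.T) / 2
P = rng.standard_normal((d, d)); P = P @ P.T       # P is PSD, hence A <= A + P
H = A + P
print(np.trace(expm(A)) <= np.trace(expm(H)))      # True
```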

9 Matrix moments and cumulants

Let $X$ be a random symmetric matrix. Then:
- matrix moment generating function (MGF): $M_X(\theta) := \mathbb{E}[e^{\theta X}]$
- matrix cumulant generating function (CGF): $\Xi_X(\theta) := \log \mathbb{E}[e^{\theta X}]$

10 Matrix Laplace transform method

11 Matrix Laplace transform

A key step for a scalar random variable $Y$: by Markov's inequality,
$$\mathbb{P}\{Y \ge t\} \le \inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\big[e^{\theta Y}\big]$$
This can be generalized to the matrix case.

12 Matrix Laplace transform

Lemma 3.2. Let $Y$ be a random symmetric matrix. For all $t \in \mathbb{R}$,
$$\mathbb{P}\{\lambda_{\max}(Y) \ge t\} \le \inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\big[\operatorname{tr} e^{\theta Y}\big]$$
We can thus control the extreme eigenvalues of $Y$ via the trace of the matrix MGF.

13 Proof of Lemma 3.2

For any $\theta > 0$,
$$\mathbb{P}\{\lambda_{\max}(Y) \ge t\} = \mathbb{P}\big\{e^{\theta \lambda_{\max}(Y)} \ge e^{\theta t}\big\} \le \frac{\mathbb{E}\big[e^{\theta \lambda_{\max}(Y)}\big]}{e^{\theta t}} \quad \text{(Markov's inequality)}$$
$$= \frac{\mathbb{E}\big[e^{\lambda_{\max}(\theta Y)}\big]}{e^{\theta t}} = \frac{\mathbb{E}\big[\lambda_{\max}(e^{\theta Y})\big]}{e^{\theta t}} \quad \big(e^{\lambda_{\max}(Z)} = \lambda_{\max}(e^{Z})\big)$$
$$\le \frac{\mathbb{E}\big[\operatorname{tr} e^{\theta Y}\big]}{e^{\theta t}}$$
This completes the proof, since the bound holds for any $\theta > 0$.
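The two spectral facts used above are easy to check numerically; a quick sketch (our own, not from the slides):

```python
# Check: e^{lambda_max(Z)} = lambda_max(e^Z), and lambda_max(e^Z) <= tr e^Z
# for symmetric Z (all eigenvalues of e^Z are positive).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 4)); Z = (Z + Z.T) / 2

lam_max = np.linalg.eigvalsh(Z)[-1]                    # eigvalsh sorts ascending
print(np.isclose(np.exp(lam_max), np.linalg.eigvalsh(expm(Z))[-1]))  # True
print(np.exp(lam_max) <= np.trace(expm(Z)))                          # True
```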

14 Issues of the matrix MGF

The Laplace transform method is effective for controlling an independent sum when the MGF decomposes. In the scalar case, where $X = X_1 + \cdots + X_n$ with independent $\{X_l\}$:
$$M_X(\theta) = \mathbb{E}\big[e^{\theta X_1 + \cdots + \theta X_n}\big] = \mathbb{E}\big[e^{\theta X_1}\big] \cdots \mathbb{E}\big[e^{\theta X_n}\big] = \prod_{l=1}^{n} M_{X_l}(\theta)$$
so one can look at each $X_l$ separately.

Issues in the matrix setting:
- $e^{X_1 + X_2} \neq e^{X_1} e^{X_2}$ unless $X_1$ and $X_2$ commute
- $\operatorname{tr}\, e^{X_1 + \cdots + X_n} \nleq \operatorname{tr}\big(e^{X_1} \cdots e^{X_n}\big)$ in general
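The first failure is easy to see numerically; a quick illustration (our own, not from the slides):

```python
# e^{X1+X2} != e^{X1} e^{X2} for non-commuting symmetric matrices,
# while the identity does hold when the two matrices commute.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
X1 = rng.standard_normal((3, 3)); X1 = (X1 + X1.T) / 2
X2 = rng.standard_normal((3, 3)); X2 = (X2 + X2.T) / 2

print(np.allclose(expm(X1 + X2), expm(X1) @ expm(X2)))   # False in general
print(np.allclose(expm(X1 + X1), expm(X1) @ expm(X1)))   # True: X1 commutes with itself
```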

15 Subadditivity of the matrix CGF

Fortunately, the matrix CGF satisfies certain subadditivity rules, allowing us to decompose independent matrix components.

Lemma 3.3. Consider a finite sequence $\{X_l\}_{1 \le l \le n}$ of independent random symmetric matrices. Then for any $\theta \in \mathbb{R}$,
$$\underbrace{\mathbb{E}\Big[\operatorname{tr}\, e^{\theta \sum_l X_l}\Big]}_{=\,\operatorname{tr} \exp\left(\Xi_{\sum_l X_l}(\theta)\right)} \le \underbrace{\operatorname{tr} \exp\Big(\sum_l \log \mathbb{E}\big[e^{\theta X_l}\big]\Big)}_{=\,\operatorname{tr} \exp\left(\sum_l \Xi_{X_l}(\theta)\right)}$$
This is a deep result based on Lieb's Theorem!

16 Lieb's Theorem

Theorem 3.4 (Lieb '73). Fix a symmetric matrix $H$. Then
$$A \mapsto \operatorname{tr} \exp(H + \log A)$$
is concave on the positive-semidefinite cone.

[photo: Elliott Lieb]

Lieb's Theorem immediately implies (exercise: Jensen's inequality)
$$\mathbb{E}\big[\operatorname{tr} \exp(H + X)\big] \le \operatorname{tr} \exp\big(H + \log \mathbb{E}\big[e^{X}\big]\big) \tag{3.1}$$

17 Proof of Lemma 3.3

$$\mathbb{E}\big[\operatorname{tr}\, e^{\theta \sum_l X_l}\big] = \mathbb{E}\Big[\operatorname{tr} \exp\Big(\theta \textstyle\sum_{l=1}^{n-1} X_l + \theta X_n\Big)\Big]$$
$$\le \mathbb{E}\Big[\operatorname{tr} \exp\Big(\theta \textstyle\sum_{l=1}^{n-1} X_l + \log \mathbb{E}\big[e^{\theta X_n}\big]\Big)\Big] \quad \text{(condition on } X_1, \ldots, X_{n-1} \text{ and apply (3.1))}$$
$$\le \mathbb{E}\Big[\operatorname{tr} \exp\Big(\theta \textstyle\sum_{l=1}^{n-2} X_l + \log \mathbb{E}\big[e^{\theta X_{n-1}}\big] + \log \mathbb{E}\big[e^{\theta X_n}\big]\Big)\Big] \le \cdots$$
$$\le \operatorname{tr} \exp\Big(\textstyle\sum_{l=1}^{n} \log \mathbb{E}\big[e^{\theta X_l}\big]\Big)$$

18 Master bounds

Combining the Laplace transform method with the subadditivity of the CGF yields:

Theorem 3.5 (Master bounds for a sum of independent matrices). Consider a finite sequence $\{X_l\}$ of independent random symmetric matrices. Then
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum_l X_l\Big) \ge t\Big\} \le \inf_{\theta > 0} \frac{\operatorname{tr} \exp\big(\sum_l \log \mathbb{E}\big[e^{\theta X_l}\big]\big)}{e^{\theta t}}$$
This is a general result underlying the proofs of the matrix Bernstein inequality and beyond (e.g. matrix Chernoff).

19 Matrix Bernstein inequality

20 Matrix CGF

$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum_l X_l\Big) \ge t\Big\} \le \inf_{\theta > 0} \frac{\operatorname{tr} \exp\big(\sum_l \log \mathbb{E}\big[e^{\theta X_l}\big]\big)}{e^{\theta t}}$$
To invoke the master bound, one needs to control the matrix CGF $\log \mathbb{E}\big[e^{\theta X_l}\big]$; this is the main step in proving matrix Bernstein.

21 Symmetric case

Consider a sequence of independent random symmetric matrices $\{X_l \in \mathbb{R}^{d \times d}\}$:
- $\mathbb{E}[X_l] = 0$ and $\lambda_{\max}(X_l) \le B$ for each $l$
- variance statistic: $v := \big\| \mathbb{E}\big[\sum_l X_l^2\big] \big\|$

Theorem 3.6 (Matrix Bernstein inequality: symmetric case). For all $\tau \ge 0$,
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum_l X_l\Big) \ge \tau\Big\} \le d \exp\Big(\frac{-\tau^2/2}{v + B\tau/3}\Big)$$
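To see the theorem in action, here is a minimal Monte Carlo sketch (our own illustration; the dimensions and the Rademacher-series construction below are arbitrary choices, not from the slides) comparing the empirical tail of $\lambda_{\max}(\sum_l X_l)$ with the bound of Theorem 3.6:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 10, 200, 2000

# Fixed symmetric matrices A_l; X_l = eps_l * A_l with Rademacher signs eps_l,
# so E[X_l] = 0, lambda_max(X_l) <= max spectral radius of the A_l,
# and E[X_l^2] = A_l^2.
As = []
for _ in range(n):
    M = rng.standard_normal((d, d)) / np.sqrt(d * n)
    As.append((M + M.T) / 2)

B = max(np.abs(np.linalg.eigvalsh(A)).max() for A in As)   # per-summand bound
v = np.linalg.norm(sum(A @ A for A in As), 2)              # v = ||sum_l E[X_l^2]||

def lam_max_of_sum():
    eps = rng.choice([-1.0, 1.0], size=n)
    return np.linalg.eigvalsh(sum(e * A for e, A in zip(eps, As)))[-1]

lam = np.array([lam_max_of_sum() for _ in range(trials)])
for tau in [1.5, 2.0, 2.5]:
    empirical = (lam >= tau).mean()
    bound = d * np.exp(-(tau**2 / 2) / (v + B * tau / 3))
    print(f"tau={tau}: empirical {empirical:.4f}, Bernstein bound {min(bound, 1):.4f}")
```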

22 Bounding matrix CGF

For bounded random matrices, one can control the matrix CGF as follows:

Lemma 3.7. Suppose $\mathbb{E}[X] = 0$ and $\lambda_{\max}(X) \le B$. Then for $0 < \theta < 3/B$,
$$\log \mathbb{E}\big[e^{\theta X}\big] \preceq \frac{\theta^2/2}{1 - \theta B/3}\, \mathbb{E}[X^2]$$

23 Proof of Theorem 3.6

Let $g(\theta) := \frac{\theta^2/2}{1 - \theta B/3}$. It follows from the master bound that
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum_{i=1}^{n} X_i\Big) \ge t\Big\} \le \inf_{\theta > 0} \frac{\operatorname{tr} \exp\big(\sum_{i=1}^{n} \log \mathbb{E}\big[e^{\theta X_i}\big]\big)}{e^{\theta t}}$$
$$\overset{\text{Lemma 3.7}}{\le} \inf_{0 < \theta < 3/B} \frac{\operatorname{tr} \exp\big(g(\theta) \sum_{i=1}^{n} \mathbb{E}[X_i^2]\big)}{e^{\theta t}} \le \inf_{0 < \theta < 3/B} \frac{d \exp\big(g(\theta)\, v\big)}{e^{\theta t}}$$
Taking $\theta = \frac{t}{v + Bt/3}$ and simplifying the above expression, we establish matrix Bernstein.
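The final simplification is routine algebra; a short symbolic check (our own, not from the slides) that this choice of $\theta$ turns the exponent $g(\theta)v - \theta t$ into the Bernstein exponent:

```python
# Verify symbolically: with theta = t/(v + B*t/3),
#   g(theta)*v - theta*t = -t^2/2 / (v + B*t/3).
import sympy as sp

t, v, B = sp.symbols('t v B', positive=True)
theta = t / (v + B * t / 3)
g = (theta**2 / 2) / (1 - theta * B / 3)
print(sp.simplify(g * v - theta * t + (t**2 / 2) / (v + B * t / 3)))  # 0
```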

24 Proof of Lemma 3.7

Define $f(x) = \frac{e^{\theta x} - 1 - \theta x}{x^2}$; then for any $X$ with $\lambda_{\max}(X) \le B$:
$$e^{\theta X} = I + \theta X + \big(e^{\theta X} - I - \theta X\big) = I + \theta X + X f(X) X \preceq I + \theta X + f(B)\, X^2$$
In addition, we note an elementary inequality (using $k! \ge 2 \cdot 3^{k-2}$ for $k \ge 2$): for any $0 < \theta < 3/B$,
$$f(B) = \frac{e^{\theta B} - 1 - \theta B}{B^2} = \frac{1}{B^2} \sum_{k=2}^{\infty} \frac{(\theta B)^k}{k!} \le \frac{\theta^2}{2} \sum_{k=2}^{\infty} \frac{(\theta B)^{k-2}}{3^{k-2}} = \frac{\theta^2/2}{1 - \theta B/3}$$
$$\Longrightarrow \quad e^{\theta X} \preceq I + \theta X + \frac{\theta^2/2}{1 - \theta B/3}\, X^2$$
Since $X$ is zero-mean, one further has
$$\mathbb{E}\big[e^{\theta X}\big] \preceq I + \frac{\theta^2/2}{1 - \theta B/3}\, \mathbb{E}[X^2] \preceq \exp\Big(\frac{\theta^2/2}{1 - \theta B/3}\, \mathbb{E}[X^2]\Big)$$

25 Application: random features

26 Kernel trick

A modern idea in machine learning: replace the inner product by a kernel evaluation (i.e. a certain similarity measure).

Advantage: work beyond the Euclidean domain via task-specific similarity measures.

27 Similarity measure

Define the similarity measure $\Phi$:
- $\Phi(x, x) = 1$
- $|\Phi(x, y)| \le 1$
- $\Phi(x, y) = \Phi(y, x)$

Example: angular similarity
$$\Phi(x, y) = \frac{2}{\pi} \arcsin\Big(\frac{\langle x, y\rangle}{\|x\|_2 \|y\|_2}\Big) = 1 - \frac{2\,\angle(x, y)}{\pi}$$

28 Kernel matrix

Consider $N$ data points $x_1, \ldots, x_N \in \mathbb{R}^d$. Then the kernel matrix $G \in \mathbb{R}^{N \times N}$ is
$$G_{i,j} = \Phi(x_i, x_j), \qquad 1 \le i, j \le N$$
The kernel $\Phi$ is said to be positive definite if $G \succeq 0$ for any $\{x_i\}$.

Challenge: kernel matrices are usually large; the cost of constructing $G$ is $O(dN^2)$.
Question: can we approximate $G$ more efficiently?
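A small sketch of building the angular-similarity kernel matrix (our own illustration; the helper name `angular_kernel_matrix` and the sizes are ours), where the $O(dN^2)$ cost shows up in the pairwise Gram computation:

```python
import numpy as np

def angular_kernel_matrix(X):
    """X: (N, d) data matrix. Returns the N x N angular-similarity kernel."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalize rows
    C = np.clip(Xn @ Xn.T, -1.0, 1.0)                   # cosines of pairwise angles
    return (2 / np.pi) * np.arcsin(C)                   # = 1 - 2*angle/pi

rng = np.random.default_rng(0)
G = angular_kernel_matrix(rng.standard_normal((100, 5)))
print(G.shape, np.linalg.eigvalsh(G).min() >= -1e-8)    # PSD up to round-off
```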

29 Random features

Introduce a random variable $w$ and a feature map $\psi$ such that
$$\Phi(x, y) = \mathbb{E}_w\big[\psi(x; w)\, \psi(y; w)\big] \qquad \text{(decouples } x \text{ and } y\text{)}$$

Example (angular similarity): with $w$ uniformly drawn from the unit sphere,
$$\Phi(x, y) = 1 - \frac{2\,\angle(x, y)}{\pi} = \mathbb{E}_w\big[\operatorname{sgn}\langle x, w\rangle\, \operatorname{sgn}\langle y, w\rangle\big] \tag{3.2}$$
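Identity (3.2) is easy to verify by simulation; a quick Monte Carlo check (our own, with an arbitrary sample size $m$):

```python
# Check (3.2): E_w[sgn<x,w> sgn<y,w>] = 1 - 2*angle(x,y)/pi,
# for w uniform on the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 200_000
x, y = rng.standard_normal(d), rng.standard_normal(d)

W = rng.standard_normal((m, d))                  # normalized Gaussian rows are
W /= np.linalg.norm(W, axis=1, keepdims=True)    # uniform on the unit sphere

mc = np.mean(np.sign(W @ x) * np.sign(W @ y))
angle = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(mc, 1 - 2 * angle / np.pi)                 # the two should nearly agree
```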

30 Random features

Introduce a random variable $w$ and a feature map $\psi$ such that
$$\Phi(x, y) = \mathbb{E}_w\big[\psi(x; w)\, \psi(y; w)\big]$$
This results in a random feature vector
$$z = \begin{bmatrix} z_1 \\ \vdots \\ z_N \end{bmatrix} = \begin{bmatrix} \psi(x_1; w) \\ \vdots \\ \psi(x_N; w) \end{bmatrix}$$
The rank-1 matrix $zz^\top$ is an unbiased estimate of $G$, i.e. $G = \mathbb{E}[zz^\top]$.

31 Example

Angular similarity:
$$\Phi(x, y) = 1 - \frac{2\,\angle(x, y)}{\pi} = \mathbb{E}_w\big[\operatorname{sign}\langle x, w\rangle\, \operatorname{sign}\langle y, w\rangle\big]$$
where $w$ is uniformly drawn from the unit sphere. As a result, the random feature map is
$$\psi(x; w) = \operatorname{sign}\langle x, w\rangle$$

32 Random feature approximation

Generate $n$ independent copies of $R = zz^\top$, i.e. $\{R_l\}_{1 \le l \le n}$.
Estimator of the kernel matrix $G$:
$$\widehat{G} = \frac{1}{n} \sum_{l=1}^{n} R_l$$
Question: how many random features are needed to guarantee accurate estimation?
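A compact sketch of this estimator with sign features, compared against the exact angular-similarity kernel (our own illustration; the sizes $N$, $d$, $n$ are arbitrary; note $\operatorname{sign}\langle x, w\rangle$ depends only on the direction of $w$, so unnormalized Gaussian directions suffice):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, n = 200, 10, 2000

X = rng.standard_normal((N, d))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
G = (2 / np.pi) * np.arcsin(np.clip(Xn @ Xn.T, -1.0, 1.0))   # exact kernel, O(d N^2)

W = rng.standard_normal((n, d))     # n random directions w_l (scale is irrelevant)
Z = np.sign(X @ W.T)                # (N, n): column l is z_l, entries sign<x_i, w_l>
G_hat = (Z @ Z.T) / n               # (1/n) * sum_l z_l z_l^T

err = np.linalg.norm(G_hat - G, 2) / np.linalg.norm(G, 2)
print(f"relative spectral-norm error: {err:.3f}")
```

Increasing `n` drives the error down at the rate quantified on the next two slides.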

33 Statistical guarantees for random feature approximation

Consider the angular similarity example (3.2). To begin with,
$$\mathbb{E}[R_l^2] = \mathbb{E}\big[zz^\top zz^\top\big] = N\, \mathbb{E}\big[zz^\top\big] = NG \quad \Longrightarrow \quad v = \Big\|\frac{1}{n^2} \sum_{l=1}^{n} \mathbb{E}[R_l^2]\Big\| = \frac{N}{n} \|G\|$$
(here $z^\top z = \|z\|_2^2 = N$, since each entry of $z$ is $\pm 1$). Next,
$$\Big\|\frac{1}{n} R\Big\| = \frac{1}{n} \|z\|_2^2 = \frac{N}{n} =: B$$
Applying the matrix Bernstein inequality yields, with high probability,
$$\big\|\widehat{G} - G\big\| \lesssim \sqrt{v \log N} + B \log N \asymp \sqrt{\frac{N}{n} \|G\| \log N} + \frac{N}{n} \log N \asymp \sqrt{\frac{N}{n} \|G\| \log N}$$
for sufficiently large $n$.

34 Sample complexity

Define the intrinsic dimension of $G$ as
$$\operatorname{intdim}(G) = \frac{\operatorname{tr} G}{\|G\|} = \frac{N}{\|G\|}$$
(here $\operatorname{tr} G = N$, since the diagonal entries of $G$ equal 1). If $n \gtrsim \varepsilon^{-2}\, \operatorname{intdim}(G) \log N$, then we have
$$\frac{\big\|\widehat{G} - G\big\|}{\|G\|} \le \varepsilon$$
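As a rule of thumb, this sample-complexity bound translates into a one-line sizing helper (our own sketch; the function name and the absence of constants are ours, so treat the output as an order-of-magnitude guide):

```python
import numpy as np

def suggested_num_features(G, eps):
    """n ~ eps^{-2} * intdim(G) * log N, with intdim(G) = tr(G)/||G||."""
    N = G.shape[0]
    intdim = np.trace(G) / np.linalg.norm(G, 2)
    return int(np.ceil(intdim * np.log(N) / eps**2))

# Usage: n = suggested_num_features(G, eps=0.1) for a kernel matrix G.
```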

35 Reference

[1] "An introduction to matrix concentration inequalities," J. Tropp, Foundations and Trends in Machine Learning, 2015.
[2] "Convex trace functions and the Wigner-Yanase-Dyson conjecture," E. Lieb, Advances in Mathematics, 1973.
[3] "User-friendly tail bounds for sums of random matrices," J. Tropp, Foundations of Computational Mathematics, 2012.
[4] "Random features for large-scale kernel machines," A. Rahimi and B. Recht, NIPS, 2007.
