Matrix concentration inequalities
1 ELE 538B: Mathematics of High-Dimensional Data. Matrix concentration inequalities. Yuxin Chen, Princeton University, Fall 2018.
2 Recap: matrix Bernstein inequality. Consider a sequence of independent random matrices $\{X_l \in \mathbb{R}^{d_1 \times d_2}\}$ with $\mathbb{E}[X_l] = 0$ and $\|X_l\| \le B$ for each $l$, and with variance statistic
$$v := \max\left\{ \Big\| \sum\nolimits_l \mathbb{E}\big[X_l X_l^\top\big] \Big\|,\ \Big\| \sum\nolimits_l \mathbb{E}\big[X_l^\top X_l\big] \Big\| \right\}$$
Theorem 3.1 (Matrix Bernstein inequality). For all $\tau \ge 0$,
$$\mathbb{P}\Big\{ \Big\| \sum\nolimits_l X_l \Big\| \ge \tau \Big\} \le (d_1 + d_2) \exp\left( \frac{-\tau^2/2}{v + B\tau/3} \right)$$
3 Recap: matrix Bernstein inequality. [Figure: the tail bound as a function of $\tau$, showing a Gaussian tail in the moderate-deviation regime (where $v$ dominates) and an exponential tail in the large-deviation regime (where $B\tau/3$ dominates).]
4 This lecture: a detailed introduction to matrix Bernstein, following An Introduction to Matrix Concentration Inequalities, Joel Tropp, 2015.
5 Outline
- Matrix theory background
- Matrix Laplace transform method
- Matrix Bernstein inequality
- Application: random features
6 Matrix theory background
7 Matrix function. Suppose the eigendecomposition of a symmetric matrix $A \in \mathbb{R}^{d \times d}$ is
$$A = U \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} U^\top$$
Then we can define
$$f(A) := U \begin{bmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_d) \end{bmatrix} U^\top$$
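To make the spectral definition concrete, here is a minimal NumPy sketch (an addition to the transcript, not from the slides); `matrix_function` is a hypothetical helper name:

```python
# A minimal sketch of evaluating a matrix function via the eigendecomposition,
# assuming NumPy is available; `matrix_function` is a hypothetical helper name.
import numpy as np

def matrix_function(A, f):
    """Return f(A) = U diag(f(lambda_1), ..., f(lambda_d)) U^T for symmetric A."""
    lam, U = np.linalg.eigh(A)        # A = U diag(lam) U^T
    return (U * f(lam)) @ U.T         # scale columns of U by f(lam), then apply U^T

# Example: the matrix exponential, defined spectrally.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
expA = matrix_function(A, np.exp)
```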
8 Examples of matrix functions. Let $f(a) = c_0 + \sum_{k=1}^\infty c_k a^k$; then $f(A) := c_0 I + \sum_{k=1}^\infty c_k A^k$.
- Matrix exponential: $e^A := I + \sum_{k=1}^\infty \frac{1}{k!} A^k$ (why is this well defined for every symmetric $A$?)
  - Monotonicity: if $A \preceq H$, then $\operatorname{tr} e^A \le \operatorname{tr} e^H$
- Matrix logarithm: $\log(e^A) := A$
  - Monotonicity: if $0 \preceq A \preceq H$, then $\log A \preceq \log H$
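A quick numerical sanity check of the trace-exponential monotonicity rule (an addition to the transcript, assuming NumPy and SciPy are available):

```python
# Sanity check: if A <= H in the semidefinite order, then tr e^A <= tr e^H.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 5
M = rng.standard_normal((d, d)); A = (M + M.T) / 2    # random symmetric A
P = rng.standard_normal((d, d)); H = A + P @ P.T      # H = A + (PSD), so A <= H
assert np.trace(expm(A)) <= np.trace(expm(H)) + 1e-10
```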
9 Matrix moments and cumulants. Let $X$ be a random symmetric matrix. Then:
- matrix moment generating function (MGF): $M_X(\theta) := \mathbb{E}\big[e^{\theta X}\big]$
- matrix cumulant generating function (CGF): $\Xi_X(\theta) := \log \mathbb{E}\big[e^{\theta X}\big]$
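As a sketch (an addition to the transcript, assuming NumPy/SciPy), the matrix MGF and CGF can be estimated by Monte Carlo; the random sign matrix $X = \epsilon A$ below is an illustrative choice, for which the exact MGF is $\frac{1}{2}(e^{\theta A} + e^{-\theta A})$:

```python
# Monte Carlo estimate of the matrix MGF and CGF for X = eps * A,
# with P(eps = +1) = P(eps = -1) = 1/2.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5],
              [0.5, -1.0]])
theta = 0.3
samples = [expm(theta * rng.choice([-1.0, 1.0]) * A) for _ in range(20000)]
M_est = np.mean(samples, axis=0)                     # matrix MGF  M_X(theta)
Xi_est = logm(M_est)                                 # matrix CGF  Xi_X(theta)
M_exact = (expm(theta * A) + expm(-theta * A)) / 2   # exact MGF for this distribution
print(np.linalg.norm(M_est - M_exact))               # small for many samples
```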
10 Matrix Laplace transform method
11 Matrix Laplace transform. A key step for a scalar random variable $Y$: by Markov's inequality,
$$\mathbb{P}\{Y \ge t\} \le \inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\big[e^{\theta Y}\big]$$
This can be generalized to the matrix case.
12 Matrix Laplace transform. Lemma 3.2. Let $Y$ be a random symmetric matrix. For all $t \in \mathbb{R}$,
$$\mathbb{P}\{\lambda_{\max}(Y) \ge t\} \le \inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\big[\operatorname{tr} e^{\theta Y}\big]$$
Thus we can control the extreme eigenvalues of $Y$ via the trace of the matrix MGF.
13 Proof of Lemma 3.2. For any $\theta > 0$,
$$\mathbb{P}\{\lambda_{\max}(Y) \ge t\} = \mathbb{P}\big\{e^{\theta \lambda_{\max}(Y)} \ge e^{\theta t}\big\} \le \frac{\mathbb{E}\big[e^{\theta \lambda_{\max}(Y)}\big]}{e^{\theta t}} \qquad \text{(Markov's inequality)}$$
$$= \frac{\mathbb{E}\big[e^{\lambda_{\max}(\theta Y)}\big]}{e^{\theta t}} = \frac{\mathbb{E}\big[\lambda_{\max}\big(e^{\theta Y}\big)\big]}{e^{\theta t}} \qquad \big(e^{\lambda_{\max}(Z)} = \lambda_{\max}(e^Z)\big)$$
$$\le \frac{\mathbb{E}\big[\operatorname{tr} e^{\theta Y}\big]}{e^{\theta t}}$$
This completes the proof, since the bound holds for every $\theta > 0$.
14 Issues of the matrix MGF. The Laplace transform method is effective for controlling an independent sum when the MGF decomposes. In the scalar case, where $X = X_1 + \cdots + X_n$ with independent $\{X_l\}$:
$$M_X(\theta) = \mathbb{E}\big[e^{\theta X_1 + \cdots + \theta X_n}\big] = \mathbb{E}\big[e^{\theta X_1}\big] \cdots \mathbb{E}\big[e^{\theta X_n}\big] = \prod_{l=1}^{n} M_{X_l}(\theta)$$
so we can look at each $X_l$ separately. Issues in the matrix setting:
- $e^{X_1 + X_2} \ne e^{X_1} e^{X_2}$ unless $X_1$ and $X_2$ commute
- $\operatorname{tr} e^{X_1 + \cdots + X_n} \not\le \operatorname{tr}\big(e^{X_1} \cdots e^{X_n}\big)$ in general (the Golden-Thompson inequality gives this for $n = 2$, but it does not extend to $n \ge 3$)
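Both points can be checked numerically; a small illustration (an addition to the transcript, assuming NumPy/SciPy):

```python
# e^{X1+X2} != e^{X1} e^{X2} for non-commuting X1, X2, while the two-matrix
# Golden-Thompson inequality tr e^{X1+X2} <= tr(e^{X1} e^{X2}) still holds.
import numpy as np
from scipy.linalg import expm

X1 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
X2 = np.array([[1.0, 0.0],
               [0.0, -1.0]])          # X1 and X2 do not commute
lhs, rhs = expm(X1 + X2), expm(X1) @ expm(X2)
print(np.allclose(lhs, rhs))                       # False
print(np.trace(lhs) <= np.trace(rhs) + 1e-12)      # True (Golden-Thompson)
```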
15 Subadditivity of the matrix CGF. Fortunately, the matrix CGF satisfies a subadditivity rule, allowing us to decompose independent matrix components. Lemma 3.3. Consider a finite sequence $\{X_l\}_{1 \le l \le n}$ of independent random symmetric matrices. Then for any $\theta \in \mathbb{R}$,
$$\underbrace{\mathbb{E}\Big[\operatorname{tr} e^{\theta \sum_l X_l}\Big]}_{=\, \operatorname{tr} \exp\left(\Xi_{\sum_l X_l}(\theta)\right)} \le \underbrace{\operatorname{tr} \exp\Big( \sum\nolimits_l \log \mathbb{E}\big[e^{\theta X_l}\big] \Big)}_{=\, \operatorname{tr} \exp\left( \sum_l \Xi_{X_l}(\theta) \right)}$$
This is a deep result, based on Lieb's Theorem!
16 Lieb's Theorem. Theorem 3.4 (Lieb '73). Fix a symmetric matrix $H$. Then
$$A \mapsto \operatorname{tr} \exp(H + \log A)$$
is concave on the positive-semidefinite cone. Lieb's Theorem immediately implies (exercise: apply Jensen's inequality)
$$\mathbb{E}\big[\operatorname{tr} \exp(H + X)\big] \le \operatorname{tr} \exp\big(H + \log \mathbb{E}\big[e^X\big]\big) \qquad (3.1)$$
17 Proof of Lemma 3.3. Condition on $X_1, \dots, X_{n-1}$ and apply (3.1) to $X_n$, then iterate:
$$\mathbb{E}\Big[\operatorname{tr} e^{\theta \sum_l X_l}\Big] = \mathbb{E}\Big[\operatorname{tr} \exp\Big(\theta \sum\nolimits_{l=1}^{n-1} X_l + \theta X_n\Big)\Big]$$
$$\le \mathbb{E}\Big[\operatorname{tr} \exp\Big(\theta \sum\nolimits_{l=1}^{n-1} X_l + \log \mathbb{E}\big[e^{\theta X_n}\big]\Big)\Big] \qquad \text{(by (3.1))}$$
$$\le \mathbb{E}\Big[\operatorname{tr} \exp\Big(\theta \sum\nolimits_{l=1}^{n-2} X_l + \log \mathbb{E}\big[e^{\theta X_{n-1}}\big] + \log \mathbb{E}\big[e^{\theta X_n}\big]\Big)\Big]$$
$$\le \cdots \le \operatorname{tr} \exp\Big(\sum\nolimits_{l=1}^{n} \log \mathbb{E}\big[e^{\theta X_l}\big]\Big)$$
18 Master bounds. Combining the Laplace transform method with the subadditivity of the CGF yields: Theorem 3.5 (Master bounds for a sum of independent matrices). Consider a finite sequence $\{X_l\}$ of independent random symmetric matrices. Then
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum\nolimits_l X_l\Big) \ge t\Big\} \le \inf_{\theta > 0} e^{-\theta t}\, \operatorname{tr} \exp\Big(\sum\nolimits_l \log \mathbb{E}\big[e^{\theta X_l}\big]\Big)$$
This general result underlies the proofs of the matrix Bernstein inequality and beyond (e.g. matrix Chernoff).
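The master bound can be evaluated numerically for simple distributions. A sketch (an addition to the transcript, assuming NumPy/SciPy), reusing the random sign matrices from the MGF example above:

```python
# Evaluate the master bound for a sum of n i.i.d. random sign matrices
# X_l = eps_l * A, whose exact MGF is E[e^{theta X}] = (e^{theta A} + e^{-theta A}) / 2.
import numpy as np
from scipy.linalg import expm, logm

A = np.array([[1.0, 0.5],
              [0.5, -1.0]])
n, t = 20, 10.0

def master_bound(theta):
    log_mgf = logm((expm(theta * A) + expm(-theta * A)) / 2)   # log E[e^{theta X_l}]
    return np.exp(-theta * t) * np.trace(expm(n * log_mgf))

thetas = np.linspace(0.01, 3.0, 300)
print(min(master_bound(th) for th in thetas))   # upper-bounds P{lambda_max(sum_l X_l) >= t}
```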
19 Matrix Bernstein inequality
20 Matrix CGF.
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum\nolimits_l X_l\Big) \ge t\Big\} \le \inf_{\theta > 0} e^{-\theta t}\, \operatorname{tr} \exp\Big(\sum\nolimits_l \underbrace{\log \mathbb{E}\big[e^{\theta X_l}\big]}_{\text{matrix CGF}}\Big)$$
To invoke the master bound, one needs to control the matrix CGF; this is the main step in proving matrix Bernstein.
21 Symmetric case. Consider a sequence of independent random symmetric matrices $\{X_l \in \mathbb{R}^{d \times d}\}$ with $\mathbb{E}[X_l] = 0$ and $\lambda_{\max}(X_l) \le B$ for each $l$, and with variance statistic $v := \big\| \sum_l \mathbb{E}[X_l^2] \big\|$. Theorem 3.6 (Matrix Bernstein inequality: symmetric case). For all $\tau \ge 0$,
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum\nolimits_l X_l\Big) \ge \tau\Big\} \le d \exp\Big(\frac{-\tau^2/2}{v + B\tau/3}\Big)$$
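A simulation sketch (an addition to the transcript, assuming NumPy) comparing the empirical tail of $\lambda_{\max}(\sum_l X_l)$ with the bound of Theorem 3.6, for the toy ensemble $X_l = \epsilon_l\, e_{k_l} e_{k_l}^\top$ (a random sign placed on a uniformly random diagonal entry), where $B = 1$ and $\sum_l \mathbb{E}[X_l^2] = (n/d) I$ gives $v = n/d$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 10, 200, 5000
lmax = np.empty(trials)
for trial in range(trials):
    diag = np.zeros(d)
    ks = rng.integers(0, d, size=n)            # random coordinate for each summand
    eps = rng.choice([-1.0, 1.0], size=n)      # random sign for each summand
    np.add.at(diag, ks, eps)                   # accumulate signs per coordinate
    lmax[trial] = diag.max()                   # lambda_max of the (diagonal) sum

B, v = 1.0, n / d
tau = 3.0 * np.sqrt(v)
print(np.mean(lmax >= tau))                               # empirical tail probability
print(d * np.exp(-(tau**2 / 2) / (v + B * tau / 3)))      # matrix Bernstein upper bound
```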
22 Bounding the matrix CGF. For bounded random matrices, one can control the matrix CGF as follows. Lemma 3.7. Suppose $\mathbb{E}[X] = 0$ and $\lambda_{\max}(X) \le B$. Then for $0 < \theta < 3/B$,
$$\log \mathbb{E}\big[e^{\theta X}\big] \preceq \frac{\theta^2/2}{1 - \theta B/3}\, \mathbb{E}\big[X^2\big]$$
23 Proof of Theorem 3.6. Let $g(\theta) := \frac{\theta^2/2}{1 - \theta B/3}$. It then follows from the master bound that
$$\mathbb{P}\Big\{\lambda_{\max}\Big(\sum\nolimits_{i=1}^{n} X_i\Big) \ge t\Big\} \le \inf_{\theta > 0} e^{-\theta t}\, \operatorname{tr} \exp\Big(\sum\nolimits_{i=1}^{n} \log \mathbb{E}\big[e^{\theta X_i}\big]\Big)$$
$$\overset{\text{Lemma 3.7}}{\le} \inf_{0 < \theta < 3/B} e^{-\theta t}\, \operatorname{tr} \exp\Big(g(\theta) \sum\nolimits_{i=1}^{n} \mathbb{E}\big[X_i^2\big]\Big) \le \inf_{0 < \theta < 3/B} e^{-\theta t}\, d \exp\big(g(\theta)\, v\big)$$
Taking $\theta = \frac{t}{v + Bt/3}$ and simplifying the above expression, we establish matrix Bernstein.
24 Proof of Lemma 3.7. Define $f(x) = \frac{e^{\theta x} - 1 - \theta x}{x^2}$, which is increasing in $x$; then for any $X$ with $\lambda_{\max}(X) \le B$:
$$e^{\theta X} = I + \theta X + \big(e^{\theta X} - I - \theta X\big) = I + \theta X + X f(X)\, X \preceq I + \theta X + f(B)\, X^2$$
In addition, we note an elementary inequality: for any $0 < \theta < 3/B$ (using $k! \ge 2 \cdot 3^{k-2}$),
$$f(B) = \frac{e^{\theta B} - 1 - \theta B}{B^2} = \frac{1}{B^2} \sum_{k=2}^{\infty} \frac{(\theta B)^k}{k!} \le \frac{\theta^2}{2} \sum_{k=2}^{\infty} \frac{(\theta B)^{k-2}}{3^{k-2}} = \frac{\theta^2/2}{1 - \theta B/3}$$
$$\Longrightarrow \quad e^{\theta X} \preceq I + \theta X + \frac{\theta^2/2}{1 - \theta B/3}\, X^2$$
Since $X$ is zero-mean, one further has (using $I + A \preceq e^A$)
$$\mathbb{E}\big[e^{\theta X}\big] \preceq I + \frac{\theta^2/2}{1 - \theta B/3}\, \mathbb{E}\big[X^2\big] \preceq \exp\Big(\frac{\theta^2/2}{1 - \theta B/3}\, \mathbb{E}\big[X^2\big]\Big)$$
25 Application: random features
26 Kernel trick. A modern idea in machine learning: replace the inner product by a kernel evaluation (i.e. a certain similarity measure). Advantage: this lets us work beyond the Euclidean domain via task-specific similarity measures.
27 Similarity measure. Define a similarity measure $\Phi$ satisfying
- $\Phi(x, x) = 1$
- $-1 \le \Phi(x, y) \le 1$
- $\Phi(x, y) = \Phi(y, x)$
Example: angular similarity
$$\Phi(x, y) = \frac{2}{\pi} \arcsin\Big(\frac{\langle x, y\rangle}{\|x\|_2 \|y\|_2}\Big) = 1 - \frac{2\angle(x, y)}{\pi}$$
28 Kernel matrix. Consider $N$ data points $x_1, \dots, x_N \in \mathbb{R}^d$. Then the kernel matrix $G \in \mathbb{R}^{N \times N}$ is given by
$$G_{i,j} = \Phi(x_i, x_j), \qquad 1 \le i, j \le N$$
Kernel $\Phi$ is said to be positive definite if $G \succeq 0$ for any $\{x_i\}$. Challenge: kernel matrices are usually large; the cost of constructing $G$ is $O(d N^2)$. Question: can we approximate $G$ more efficiently?
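A direct construction of the angular-similarity kernel matrix, as a sketch (an addition to the transcript, assuming NumPy; `angular_kernel` is a hypothetical helper name), which indeed costs $O(dN^2)$:

```python
import numpy as np

def angular_kernel(X):
    """X: (N, d) data matrix. Returns G with G_ij = (2/pi) arcsin(<x_i, x_j>/(|x_i||x_j|))."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # row-normalize the data
    C = np.clip(Xn @ Xn.T, -1.0, 1.0)                   # pairwise cosines
    return (2.0 / np.pi) * np.arcsin(C)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
G = angular_kernel(X)                                   # note G_ii = 1 on the diagonal
```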
29 Random features. Introduce a random variable $w$ and a feature map $\psi$ such that
$$\Phi(x, y) = \mathbb{E}_w\big[\psi(x; w)\, \psi(y; w)\big]$$
which decouples $x$ and $y$. Example (angular similarity):
$$\Phi(x, y) = 1 - \frac{2\angle(x, y)}{\pi} = \mathbb{E}_w\big[\operatorname{sgn}\langle x, w\rangle\, \operatorname{sgn}\langle y, w\rangle\big] \qquad (3.2)$$
with $w$ uniformly drawn from the unit sphere.
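A quick Monte Carlo check of identity (3.2) for a single pair (an addition to the transcript, assuming NumPy); Gaussian vectors are used since their directions are uniform on the sphere and $\operatorname{sgn}\langle x, w\rangle$ depends only on the direction of $w$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 200_000
x, y = rng.standard_normal(d), rng.standard_normal(d)
W = rng.standard_normal((m, d))                       # m random directions
est = np.mean(np.sign(W @ x) * np.sign(W @ y))        # E_w[sgn<x,w> sgn<y,w>]
cos = np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1.0, 1.0)
print(est, (2.0 / np.pi) * np.arcsin(cos))            # agree up to O(1/sqrt(m)) error
```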
30 Random features (continued). Such a feature map $\psi$ yields a random feature vector
$$z = \begin{bmatrix} z_1 \\ \vdots \\ z_N \end{bmatrix} = \begin{bmatrix} \psi(x_1; w) \\ \vdots \\ \psi(x_N; w) \end{bmatrix}$$
and the rank-1 matrix $zz^\top$ is an unbiased estimate of $G$, i.e. $G = \mathbb{E}\big[zz^\top\big]$.
31 Example. Angular similarity:
$$\Phi(x, y) = 1 - \frac{2\angle(x, y)}{\pi} = \mathbb{E}_w\big[\operatorname{sign}\langle x, w\rangle\, \operatorname{sign}\langle y, w\rangle\big]$$
where $w$ is uniformly drawn from the unit sphere. As a result, the random feature map is $\psi(x; w) = \operatorname{sign}\langle x, w\rangle$.
32 Random feature approximation. Generate $n$ independent copies of $R = zz^\top$, i.e. $\{R_l\}_{1 \le l \le n}$, and estimate the kernel matrix $G$ by
$$\hat{G} = \frac{1}{n} \sum_{l=1}^{n} R_l$$
Question: how many random features are needed to guarantee accurate estimation?
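A sketch of this estimator (an addition to the transcript, assuming NumPy and reusing the hypothetical `angular_kernel` from above): stacking the $n$ sign features as columns of $Z$ gives $\hat{G} = \frac{1}{n} Z Z^\top = \frac{1}{n}\sum_l z_l z_l^\top$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))                    # N = 500 points in R^20
n = 2000
W = rng.standard_normal((X.shape[1], n))              # column w_l: a random direction
Z = np.sign(X @ W)                                    # Z[i, l] = sign<x_i, w_l>
G_hat = (Z @ Z.T) / n                                 # average of n rank-1 estimates
err = np.linalg.norm(G_hat - angular_kernel(X), 2)    # spectral-norm error, shrinks with n
print(err)
```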
33 Statistical guarantees for random feature approximation. Consider the angular similarity example (3.2), where $z \in \{\pm 1\}^N$ and hence $\|z\|_2^2 = N$. To begin with,
$$\mathbb{E}\big[R_l^2\big] = \mathbb{E}\big[zz^\top zz^\top\big] = N\, \mathbb{E}\big[zz^\top\big] = NG \quad \Longrightarrow \quad v = \Big\| \frac{1}{n^2} \sum\nolimits_{l=1}^{n} \mathbb{E}\big[R_l^2\big] \Big\| = \frac{N}{n} \|G\|$$
Next,
$$\Big\|\frac{1}{n} R\Big\| = \frac{1}{n} \|z\|_2^2 = \frac{N}{n} =: B$$
Applying the matrix Bernstein inequality (to the centered summands $\frac{1}{n}(R_l - G)$) yields: with high probability,
$$\big\|\hat{G} - G\big\| \lesssim \sqrt{v \log N} + B \log N \asymp \sqrt{\frac{N}{n} \|G\| \log N} + \frac{N}{n} \log N \asymp \sqrt{\frac{N}{n} \|G\| \log N}$$
where the last step holds for sufficiently large $n$ (the second term is then dominated by the first).
34 Sample complexity. Define the intrinsic dimension of $G$ as
$$\operatorname{intdim}(G) = \frac{\operatorname{tr} G}{\|G\|} = \frac{N}{\|G\|}$$
(here $\operatorname{tr} G = N$ since $G_{ii} = \Phi(x_i, x_i) = 1$). If $n \gtrsim \varepsilon^{-2}\, \operatorname{intdim}(G) \log N$, then we have
$$\frac{\big\|\hat{G} - G\big\|}{\|G\|} \le \varepsilon$$
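The intrinsic dimension is cheap to compute; a sketch (an addition to the transcript, assuming NumPy and reusing the hypothetical `angular_kernel` from above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
G = angular_kernel(X)
intdim = np.trace(G) / np.linalg.norm(G, 2)   # tr(G) = N here since G_ii = 1
print(intdim)                                  # compare with N = 500
```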
35 References
[1] An Introduction to Matrix Concentration Inequalities, J. Tropp, Foundations and Trends in Machine Learning, 2015.
[2] Convex trace functions and the Wigner-Yanase-Dyson conjecture, E. Lieb, Advances in Mathematics, 1973.
[3] User-friendly tail bounds for sums of random matrices, J. Tropp, Foundations of Computational Mathematics, 2012.
[4] Random features for large-scale kernel machines, A. Rahimi and B. Recht, NIPS, 2007.