Estimation of large dimensional sparse covariance matrices

1 Estimation of large dimensional sparse covariance matrices. Department of Statistics, UC Berkeley. May 5, 2009.

2 Sample covariance matrix and its eigenvalues. Data: n × p matrix X, holding n independent, identically distributed observations of a random vector (X_i)_{i=1}^n in R^p. Suppose the X_i's have covariance matrix Σ_p. Often interested in estimating Σ_p (e.g. for PCA); its eigenvalues: λ_i. Standard estimator: Σ̂_p = (X − X̄)'(X − X̄)/(n − 1); its eigenvalues: l_i.
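A minimal numpy sketch of the estimator above (names, seed, and dimensions are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.standard_normal((n, p))                    # n i.i.d. observations in R^p

Xbar = X.mean(axis=0)                              # vector of column means
Sigma_hat = (X - Xbar).T @ (X - Xbar) / (n - 1)    # standard estimator of Sigma_p
l = np.linalg.eigvalsh(Sigma_hat)[::-1]            # eigenvalues l_1 >= ... >= l_p
```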

3 Role of eigenvalues: examples. Principal Component Analysis (PCA): large eigenvalues. Optimization problems: example in risk management. Invest a_j in X_{t,j}. Risk measured by var(X_t' a) = a' Σ_p a. (Constraints on a in more realistic versions: e.g. Σ_j a_j = 1.) Role of small (and in fact all) eigenvalues.
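For the risk-management example, the minimizer of a' Σ_p a under Σ_j a_j = 1 has the closed form a = Σ_p^{-1} 1 / (1' Σ_p^{-1} 1). A hedged sketch (assumes Sigma is positive definite; the function name is mine):

```python
import numpy as np

def min_variance_weights(Sigma):
    """argmin of a' Sigma a subject to sum(a) == 1."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)   # Sigma^{-1} 1, without forming the inverse
    return w / w.sum()                 # rescale so the weights sum to 1
```

Note that the solution involves Σ_p^{-1}, so estimation errors in the small eigenvalues of an estimated Σ_p are magnified: hence the role of the small eigenvalues.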

4 Classical results from multivariate statistics. Classical theory: p fixed, n → ∞. Fact: Σ̂ is an unbiased, √n-consistent estimator of Σ: √n (Σ̂ − Σ) = O_P(1). Also, fluctuation theory (CLT) for the largest eigenvalues (Anderson, '63), under e.g. normality of the data. Much more is known about estimation of covariance matrices: Efron, Haff, Morris, Stein, etc. Will focus here on the large n, large p situation (quite common now), where p/n has a finite non-zero limit.

5 Visual example. [Figure: histogram of the eigenvalues of X'X/n, with X_i ∼ N(0, Id), p = 2000; the "true" (i.e. population) eigenvalues superimposed.]

6 Graphical illustration: n = 500, p = 200. [Figure: Marčenko–Pastur law and histogram of empirical eigenvalues. X: 500 × 200 matrix with i.i.d. N(0,1) entries; histogram of the eigenvalues of X'X/n with the corresponding Marčenko–Pastur density superimposed.] Note: Σ = Id, so all population (or "true") eigenvalues equal 1.

7 Surprising? The previous plots are perhaps surprising: in particular, the CLT implies that σ̂_ij − σ_ij = O_P(n^{−1/2}), i.e. good entrywise estimation, but spectral estimation is poor.

8 Importance of ρ_n = p/n: case of Σ = Id. Suppose the entries of the n × p matrix X are i.i.d. with mean 0, sd 1, and finite 4th moment. Population covariance = Id. Let ρ_n = p/n. Theorem (Marčenko–Pastur, '67): let l_i be the (decreasing) eigenvalues of (1/n) X'X. Assume ρ_n → ρ ∈ (0, 1]. If F_p(x) = (1/p) #{l_i ≤ x}, then F_p ⇒ F_ρ in probability, where F_ρ has a known density. Support: [a, b], with a = (1 − ρ^{1/2})^2, b = (1 + ρ^{1/2})^2. Bias/inconsistency problems for these asymptotics.
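A quick numerical check of the support statement, under the theorem's assumptions (parameters and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 1000                        # rho = p/n = 0.5
X = rng.standard_normal((n, p))          # i.i.d. entries, mean 0, sd 1
l = np.linalg.eigvalsh(X.T @ X / n)      # eigenvalues of X'X/n

rho = p / n
a, b = (1 - rho**0.5) ** 2, (1 + rho**0.5) ** 2
print(l.min(), a)   # smallest eigenvalue close to a ~ 0.0858
print(l.max(), b)   # largest eigenvalue close to b ~ 2.9142
```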

9 Marčenko–Pastur law illustration. Population covariance: Σ = Id. Here the density of the limiting spectral distribution is computable. [Figure: density of the Marčenko–Pastur law for varying ρ.]

10 Remark about extensions/limitations. RMT also gives results on the limiting spectral distribution of the eigenvalues when Σ ≠ Id: models of the form X_i = Σ^{1/2} Y_i, with the entries of Y_i i.i.d. (say 4 + ɛ moments). Problem: geometric implications for the data; near orthogonality and near constant norm of X_i/√p when λ_1(Σ) stays bounded. Two models can have the same Σ and different limiting spectral distributions.

11 Estimation problems. The previous results highlight the fact that naive estimation of the covariance matrix is inconsistent in high dimension. Questions: can we find procedures that consistently estimate spectral properties of these matrices (i.e. also eigenspaces), and that are more robust to distributional assumptions? Can we exploit good entrywise estimation to improve spectral estimation?

12 Previous work: related ideas. Banding scheme proposed by P. Bickel and L. Levina ('06): consider population covariance matrices with small entries away from the diagonal. Technically: Σ_p satisfies max_i Σ_{j: |i−j|>m} |σ(i, j)| < C m^{−α}. Recommend banding: set σ̂_{i,j} = 0 if |i − j| > k, otherwise keep the estimate from Σ̂_p. Call this estimator B_k(Σ̂_p). Theorem (Bickel and Levina): if k ≍ (n^{−1} log p)^{−1/(2(α+1))}, plus tail decay conditions, then ‖B_k(Σ̂_p) − Σ_p‖_2 → 0.
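A minimal sketch of the banding operator B_k (assuming the variables come with a fixed, meaningful ordering; the helper name is mine):

```python
import numpy as np

def band(S, k):
    """B_k(S): keep entries with |i - j| <= k, zero out the rest."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)
```

Banding is fast but depends on the ordering of the variables, which motivates the permutation-insensitive thresholding approach on the next slides.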

13 Sparsity: introduction. Oft-made assumption: many σ(i, j) are 0 or small. Appeal of thresholding (i.e. keep large values, set small ones to 0). Aim: find simple methods for improving estimation. Practical requirements: speed, parallelizability, limited assumptions. Theoretical requirements: good convergence properties in spectral norm (‖·‖_2); estimator not sensitive to the ordering of the variables.

14 What is a good estimator? Requirement: ‖Σ̂_p − Σ_p‖_2 → 0, where (for symmetric matrices) ‖M‖_2 = σ_1(M) = λ_1^{1/2}(M'M) = max_i |λ_i(M)|. Convergence in ‖·‖_2 implies: consistency of all eigenvalues (Weyl's inequality); consistency of stable subspaces corresponding to separated eigenvalues (Davis–Kahan sin θ theorem). Our aim: a permutation-equivariant, operator-norm consistent estimator. Our strategy: (hard) thresholding.
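A small numerical illustration of the Weyl bound |λ_i(A + E) − λ_i(A)| ≤ ‖E‖_2 (random symmetric matrices; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 100
A = rng.standard_normal((p, p)); A = (A + A.T) / 2          # symmetric matrix
E = 0.01 * rng.standard_normal((p, p)); E = (E + E.T) / 2   # small symmetric perturbation

shift = np.abs(np.linalg.eigvalsh(A + E) - np.linalg.eigvalsh(A))
assert shift.max() <= np.linalg.norm(E, 2) + 1e-10          # Weyl's inequality
```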

15 Standard notion of sparsity: ill-suited for spectral problems. Standard notion: count the number of non-zero elements. E_1: 1's on the diagonal, entries 1/√p filling the first row and first column. E_2: 1's on the diagonal, entries 1/√p on the first off-diagonals (tridiagonal). Both have comparably many non-zero entries, yet: eigenvalues of E_1: 1 ± √((p−1)/p), and 1 (multiplicity p − 2); eigenvalues of E_2: 1 + (2/√p) cos(πk/(p+1)), k = 1, …, p.

16 Adjacency matrices. E_1 (entries 1/√p in the first row and column, 1's on the diagonal) has adjacency matrix A_1: 1's wherever E_1 is non-zero, 0's elsewhere.

17 Adjacency matrices. E_2 (tridiagonal, off-diagonal entries 1/√p, 1's on the diagonal) has adjacency matrix A_2: the 0–1 tridiagonal matrix.

18 Adjacency matrices and graphs: comparison via closed walks of length k. Closed walk: a walk that starts and finishes at the same vertex. Length of a walk: the number of vertices it traverses.

19 Adjacency matrices and graphs: connection with the spectrum of covariance matrices. Closed walk of length k, γ: i_1 → i_2 → … → i_k → i_{k+1} = i_1. Weight of γ: w_γ = σ(i_1, i_2) ⋯ σ(i_k, i_1). [Figure: a 6-vertex graph with edges and loops labeled by the entries σ(i, j).] Then trace(Σ^k) = Σ_i λ_i^k(Σ) = Σ_{γ ∈ C_p(k)} w_γ.

20 Notion of sparsity compatible with spectral analysis: a proposal. Given Σ_p, a p × p covariance matrix, compute the adjacency matrix A_p = (1_{σ(i,j) ≠ 0}) and associate a graph G_p to it. Consider C_p(k) = {closed walks of length k on the graph with adjacency matrix A_p} and φ_p(k) = Card{C_p(k)} = trace(A_p^k). Call the sequence Σ_p β-sparse if, for all k ∈ 2N, φ_p(k) ≤ f(k) p^{β(k−1)+1}, where f(k) is independent of p and 0 ≤ β < 1.
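The definition can be probed numerically: φ_p(k) is just the trace of the k-th power of the 0–1 adjacency matrix. A sketch (function name is mine):

```python
import numpy as np

def phi(Sigma, k):
    """phi_p(k) = trace(A_p^k): the number of closed walks of length k."""
    A = (Sigma != 0).astype(float)               # adjacency matrix A_p
    return np.trace(np.linalg.matrix_power(A, k))

# Diagonal example from the next slide: phi(k) = p for all k.
print(phi(np.eye(50), 4))                        # 50.0
```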

21 Examples: computation of sparsity coefficients. Diagonal matrix: A_p = Id_p, so φ(k) = p for all k; sparsity coefficient: 0. Matrices with at most M non-zero elements in each row: φ(k) ≤ p M^{k−1}; sparsity coefficient: 0. Matrices with at most M p^α non-zero elements in each row: φ(k) ≤ M^{k−1} p · p^{α(k−1)}; sparsity coefficient: α.

22 Assumptions underlying the results. In all that follows:
1. the Σ_p(i, i) stay bounded;
2. the X_{i,j} have infinitely many moments;
3. the rows of the (n × p) data matrix X are i.i.d.;
4. p/n → l ∈ (0, ∞).

23 Simple case: gap in the entries of the covariance matrix. Gaussian MLE, centered case: X_{i,j} centered; cov(X_i) = Σ_p. Suppose Σ_p is β-sparse, with β = 1/2 − η and η > 0. Theorem: let S_p = (1/n) Σ_{i=1}^n X_i X_i' = X'X/n, and T_α(S_p)(i, j) = S_p(i, j) 1_{|S_p(i,j)| > C n^{−α}}, the thresholded version of S_p at level C n^{−α}. Then, if α = β + ɛ with ɛ < η/2, ‖T_α(S_p) − Σ_p‖_2 → 0 in probability.
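A minimal sketch of the hard-thresholding estimator T_α(S_p) (the constant C is a tuning choice; the theorem suggests taking α slightly above β):

```python
import numpy as np

def hard_threshold(S, n, alpha, C=1.0):
    """T_alpha(S)(i,j) = S(i,j) * 1{|S(i,j)| > C * n**(-alpha)}."""
    level = C * n ** (-alpha)
    return np.where(np.abs(S) > level, S, 0.0)
```

The estimator is permutation-equivariant: relabeling the variables permutes the rows and columns of S and of T_α(S) in the same way.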

24 Beyond truly sparse matrices: approximation by sparse matrices. How does thresholding perform on matrices that are merely approximated by sparse matrices? Suppose T_{α_1}(Σ_p) = Σ̃_p is β-sparse, and that ‖Σ̃_p − Σ_p‖_2 → 0. Suppose there exist α_0 < α_1 < 1/2 − δ_0 such that the adjacency matrix of the (i, j)'s with C n^{−α_1} < |σ(i, j)| < C n^{−α_0} is γ-sparse, with γ ≤ α_0 − ζ_0, ζ_0 > 0. Proposition: then the conclusions of the theorem above apply: for α ∈ (α_0, α_1), ‖T_α(S_p) − Σ_p‖_2 → 0 in probability.

25 Complements. The theorems apply to the sample covariance matrix, i.e. (X − X̄)'(X − X̄)/(n − 1), not just the Gaussian MLE; also to correlation matrices. A finite number of moments is possible; p = n^r for certain r depending on β. The proof provides finite-(n, p) bounds on the deviation probabilities of ‖T_α(S_p) − Σ_p‖_2. Example: Σ_p^{(1)}(i, j) = ρ^{|i−j|}, and Σ_p = P Σ_p^{(1)} P', with P a random permutation matrix. It can be approximated by Σ̃_p = T_{n^{−1/2+ɛ}}(Σ_p); Σ̃_p is asymptotically 0-sparse, so the proposition above applies when the moment conditions are satisfied.
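The permuted-Toeplitz example is easy to generate, and makes the point that thresholding ignores the variable ordering while banding does not (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p, rho = 200, 0.5
i, j = np.indices((p, p))
Sigma1 = rho ** np.abs(i - j)        # Sigma^(1)(i,j) = rho^|i-j|

perm = rng.permutation(p)            # encodes the random permutation matrix P
Sigma = Sigma1[np.ix_(perm, perm)]   # P Sigma^(1) P'
```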

26 Sharpness of the 1/2-sparse assumption. Consider Σ_p with 1's on the diagonal and first row and column (1, α_2, α_3, …, α_p), all other entries 0, with e.g. α_i = 1/√p. Σ_p is 1/2-sparse (eigenvalues of A_p: 1 ± √(p − 1), and 1's). The oracle estimator O(Σ̂_p) of Σ_p is inconsistent in the Gaussian case: ‖O(Σ̂_p) − Σ_p‖_2^2 = Σ_{i=2}^p (α̂_i − α_i)^2 ↛ 0. So the 1/2 − η sparsity assumption is sharp.

27 Practical observations: Asymptopia? Practical implementation. Practicalities: implemented the thresholding technique using FDR at level 1/√p. Theory: great results possible. Practice: good results (n, p from a few 100's to a few 1000's), though maybe not as good as hoped for. Practice is very good when there are few coefficients to estimate, far away from σ/√n (unsurprisingly); making mistakes is OK. Softer techniques (Huber-type) might be good alternatives, especially for non-sparse matrices. Eigenvector practice is somewhat disappointing: in hard situations it performs OK; in less hard situations, results are comparable to or a bit worse than sparse PCA.

28 Conclusions. Highlighted some difficulties in covariance estimation in the large n, large p setting. Proposed a notion of sparsity compatible with spectral analysis.

29 Real world example. Data analysis: portfolio of industry indexes. Data: daily returns of 48 indexes, by industry (Agriculture, Toys, Beer, Soda, etc.), from Kenneth French's website at Dartmouth. p = 48; 2 years of observations, n = 504. Effect of shrinkage on the smallest eigenvalue: empirical l vs. shrunk l = 0.27. Through subsampling, get a (0.205, 0.303) 95% CI.

30 Data analysis: portfolio of industry indexes. [Figure: scree plot for the Industry Index portfolio data, smallest eigenvalue highlighted.]

31 Data analysis: portfolio of industry indexes. [Figure: subsampling distribution of the adjusted estimator of the smallest eigenvalue; industry indexes, n = 504, p = 48.]

32 (Easy) Example 1: Toeplitz(1, 0.3, 0.4), n = p = 500. [Figure: eigenvalues of the thresholded matrix vs. population eigenvalues.]

33 (Easy) Example 1: Toeplitz(1, 0.3, 0.4), n = p = 500. [Figure: oracle eigenvalues, thresholded-matrix eigenvalues, and population eigenvalues.]

34 (Easy) Example 1: Toeplitz(1, 0.3, 0.4), n = p = 500. [Figure: sample covariance eigenvalues, thresholded-matrix eigenvalues, and population eigenvalues.]

35 Example 2: Toeplitz(2, 0.2, 0.3, 0, −0.4), n = p. [Figure: population spectrum vs. spectrum of the thresholded matrix.]

36 Example 2: Toeplitz(2, 0.2, 0.3, 0, −0.4), n = p. [Figure: population spectrum, sample covariance spectrum, and spectrum of the thresholded matrix.]

37 Example 2: Toeplitz(2, 0.2, 0.3, 0, −0.4), independent simulation. [Figure: black line: population eigenvalues; blue: thresholded-matrix eigenvalues; green +: oracle banded matrix; red ·: oracle matrix.]

38 Estimating the theoretical frontier: practical implementation. Data: from the Fama–French website; one year, n = 252; p = 48 industry indexes. Orders of magnitude: α̂ = 0.040, β̂ = …, γ̂ = …, â = …, b̂ = …, ĉ = 0.563; also δ̂ = …, d̂ = 0.0208.

39 Estimating the theoretical frontier: practical implementation, 252 days, 47 assets; raw data. [Figure: correction to the empirical frontier, one year of data; target returns vs. σ, comparing the naive plug-in estimator with the corrected estimator.]

40 Estimating the theoretical frontier: practical implementation, 252 days, 47 assets; simulations. [Figure: correction of the efficient frontier, simulation results; curves: true efficient frontier; mean, median, and 95% CI of the corrected frontier; mean and 95% CI of the uncorrected frontier.]

41 Eigenfaces: an interesting example of PCA. Question: compression of digitized pictures of human faces. Idea: gather a database of faces. Example: the ORL face database, 10 pictures of 40 distinct subjects. Pictures are, say, 92 × 112 = 10304 pixels, 256 levels of gray. Code them as vectors; get a data matrix X. Run Principal Component Analysis on the corresponding matrix. Eigenvectors corresponding to large eigenvalues are "eigenfaces". Need only a few numbers to approximate a new face: the coefficients of its projection on the eigenfaces.
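A hedged sketch of the eigenface computation via the SVD (X is assumed to hold one vectorized face per row; image loading is omitted and the function name is mine):

```python
import numpy as np

def eigenfaces(X, k):
    """Return the mean face, the top-k eigenfaces, and per-image coefficients."""
    mean_face = X.mean(axis=0)
    Xc = X - mean_face                                 # center the images
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    faces = Vt[:k]                                     # top-k right singular vectors
    coeffs = Xc @ faces.T                              # projection coefficients
    return mean_face, faces, coeffs
```

A new face x is then approximated by mean_face + ((x − mean_face) @ faces.T) @ faces, i.e. by just k numbers.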

42 Some eigenfaces for the ORL face database. Data obtained from Wikipedia; copyright AT&T Laboratories Cambridge.
