Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008
1 Regularized Estimation of High Dimensional Covariance Matrices
Peter Bickel, Cambridge, January 2008
With thanks to E. Levina (joint collaboration, slides), I. M. Johnstone (slides), and Choongsoon Bae (slides).
2 Outline
1. Why estimate covariance matrices?
2. Some examples
3. Pathologies of the empirical covariance matrix
4. Sparsity in various guises
5. A brief review of methods
6. Asymptotic theory for banding, thresholding, and SPICE (lasso-penalized Gaussian log-likelihood)
7. Simulations and data
8. Some future directions
3 Model
X_1, ..., X_n: p-vectors, i.i.d. F.
Base model: F = N_p(µ, Σ).
More generally, E_F ||X||^2 < ∞,
µ(F) = E_F X,  Σ(F) = E_F (X − µ)(X − µ)^T.
Asymptotic theory: n → ∞, p → ∞, Σ ∈ T_{p,n}.
Extensions to stationary X_1, ..., X_n: Wei Biao Wu (2007).
4 Why Estimate the Covariance Matrix?
Principal component analysis (PCA).
Linear or quadratic discriminant analysis (LDA/QDA).
Inferring independence and conditional independence in Gaussian models (graphical models).
Inference about the mean (e.g. a longitudinal mean response curve).
The covariance itself is usually not the end goal:
PCA: estimation of the eigenstructure.
LDA/QDA, graphical models: estimation of inverses.
5 Covariance Matrices in Climate Research: Example 1
Data courtesy of Serge Guillas.
Empirical orthogonal functions (EOFs), aka principal components.
X_1, ..., X_n: p-dimensional stationary vector time series.
Monthly January temperature on a 5° × 5° latitude/longitude grid: p = 2592, n = 157.
6 Example 1 EOF pattern #1 (clim.pact)
7 Example 1
X_k = {X(i, j) : i = latitude, j = longitude}.
X_{n×p} = (X_1^T, ..., X_n^T)^T,  E(X) = 0.
Σ = Var(X) = E(XX^T) = ∑_{j=1}^p λ_j e_j e_j^T.
e_1, ..., e_p: principal components;  λ_1 > ... > λ_p: eigenvalues.
Goal: estimate and interpret e_j, j = 1, ..., K, with K such that ∑_{j=1}^K λ_j / ∑_{j=1}^p λ_j is large.
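The EOF computation on this slide is just an eigendecomposition of the empirical covariance of the (centered) data matrix. A minimal numerical sketch, with a synthetic one-pattern field standing in for the gridded temperature data (names and the toy data are illustrative, not from the talk):

```python
import numpy as np

def eof_decomposition(X, K):
    """Leading K empirical orthogonal functions (EOFs) of the data in X.

    X is an (n, p) matrix whose rows are observations (e.g. gridded
    temperature fields flattened to length-p vectors). Returns the top K
    eigenvalues and eigenvectors of the empirical covariance, plus the
    fraction of total variance they explain.
    """
    Xc = X - X.mean(axis=0)                 # center each grid point
    Sigma_hat = Xc.T @ Xc / X.shape[0]      # empirical covariance, p x p
    lam, E = np.linalg.eigh(Sigma_hat)      # eigh returns ascending order
    lam, E = lam[::-1], E[:, ::-1]          # sort descending
    return lam[:K], E[:, :K], lam[:K].sum() / lam.sum()

# toy field: one dominant spatial pattern plus white noise
rng = np.random.default_rng(0)
p, n = 50, 200
pattern = np.sin(np.linspace(0.0, np.pi, p))
X = 3.0 * rng.normal(size=(n, 1)) * pattern + rng.normal(size=(n, p))
lam, E, frac = eof_decomposition(X, K=1)
```

Here the first EOF recovers the planted sine pattern and accounts for most of the total variance, which is exactly the "∑_{j≤K} λ_j / ∑_j λ_j large" criterion on the slide.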
8 Covariance Matrices in Climate Research: Example 2
Data assimilation.
X_j = average pressure, temperature, ... in a 50 km × 50 km variable block of atmosphere, j ≤ J ≈ 10^7; computer model.
X_i = X(t_i), i = 1, ..., T.  Y(t_i): data vectors.
Ensemble: X_j^U(t), 1 ≤ j ≤ n.
Data assimilated: X_j^F(t), 1 ≤ j ≤ n, uses Σ̂^{-1} as the estimate of the true [Var(X_1^U)]^{-1} in the Kalman gain.
9 Pathologies of the Empirical Covariance Matrix
Observe X_1, ..., X_n, i.i.d. p-variate random variables.
Σ̂ = (1/n) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)^T.
The MLE for the Gaussian model; (almost) unbiased; well behaved (and well studied) for fixed p and n → ∞.
But: very noisy if p is large.
Singular if p > n, so Σ̂^{-1} is not uniquely defined.
Computational issues with Σ̂^{-1} for large p.
LDA completely breaks down if p/n → ∞.
The eigenstructure can be inconsistent if p/n → γ > 0.
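The singularity for p > n is easy to see numerically: after centering, the sample covariance has rank at most n − 1. A quick illustration (an illustrative sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 100                        # more variables than observations
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / n             # empirical covariance, 100 x 100

# after centering, the rank is at most n - 1 = 49 < p, so Sigma_hat
# is singular and has no unique inverse
rank = np.linalg.matrix_rank(Sigma_hat)
```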
10 Eigenvalues
Description of the spreading phenomenon via the empirical distribution function of the eigenvalues {l̂_j}_{j=1}^p:
G_p(t) = p^{-1} #{l̂_j ≤ t} → G(t) = ∫ g(t) dt.
Marčenko–Pastur (1967): for F = N_p(0, I) and p/n → γ ≤ 1,
g_MP(t) = √((b_+ − t)(t − b_−)) / (2πγt),  b_± = (1 ± √γ)^2.
[Figure: histograms of sample eigenvalues for n = p and n = 4p against the Marčenko–Pastur density.]
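The spreading can be checked by simulation. A sketch with γ = 1/2: every population eigenvalue equals 1, yet the sample eigenvalues spread over roughly the Marčenko–Pastur support [(1 − √γ)², (1 + √γ)²]:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 200                        # gamma = p/n = 0.5
X = rng.normal(size=(n, p))            # true covariance is the identity
eigs = np.linalg.eigvalsh(X.T @ X / n)

gamma = p / n
b_lo = (1 - np.sqrt(gamma)) ** 2       # approx 0.086
b_hi = (1 + np.sqrt(gamma)) ** 2       # approx 2.914
# every population eigenvalue is 1, yet the sample eigenvalues spread
# over (approximately) the interval [b_lo, b_hi]
```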
11 Eigenvectors
D. Paul (2006): for N_p(0, Σ) with Σ = diag(λ, 1, ..., 1), let ê be the sample eigenvector paired with λ̂ and e the population eigenvector paired with λ. If 1 < λ < 1 + √γ, then ||e − ê||^2 → 2 a.s. (the sample eigenvector is asymptotically orthogonal to the truth).
12 Fisher's LDA
Classification into two populations using X_1, ..., X_n ~ N_p(µ_1, Σ) and X_{n+1}, ..., X_{2n} ~ N_p(µ_2, Σ).
The minimax behaviour over T_{p,n} = {Σ : λ_min(Σ) ≥ c > 0} of LDA based on the MLE Σ̂^{-1} (Moore–Penrose inverse when p > n) is equivalent to coin tossing if p/n → ∞ (Bickel and Levina (2004)).
True for any rule!
What if T consists of sparse matrices?
13 Eigenstructure Sparsity
Write Σ = Σ_0 + σ^2 I_p, where Σ_0 = ∑_{j=1}^M λ_j θ_j θ_j^T and λ_1 ≥ ... ≥ λ_M > 0, with {θ_j} orthonormal.
Equivalently, X_i = µ + ∑_{j=1}^M √λ_j v_{ji} θ_j + σ Z_i, i = 1, ..., n, with the v_{ji} i.i.d. N(0, 1).
14 Eigenstructure Sparsity, defined
p, n large, but:
(i) M small and fixed;
(ii) θ_j sparse: well approximated by θ_{js} with ||θ_{js}||_0 ≤ s, s small.
Here ||v||_0 = number of nonzero coordinates of v.
15 Label-Dependent Sparsity of Σ
Write Σ = [σ_ij]; σ_ij is small (effectively 0) when |i − j| is large.
More generally, given a metric m on the index set J: σ_ij is small if m(i, j) is large.
Note: σ_ij = 0 ⇔ X_i ⊥ X_j under Gaussianity.
16 Label-Dependent Sparsity of Σ^{-1}
If Σ^{-1} = [σ^{ij}]: σ^{ij} is small for many (i, j) pairs, e.g. when |i − j| is large.
Note: σ^{ij} = 0 ⇔ X_i ⊥ X_j | {X_k : k ≠ i, j} in the Gaussian case.
17 Examples of Label-Dependent Sparsity
(i) X stationary: σ(i, j) = σ(|i − j|), with spectral density f satisfying 0 < ε ≤ f ≤ 1/ε < ∞. Ergodic ARMA processes satisfy this.
(ii) Σ = S + K, corresponding to X = Y + Z with Y, Z independent (S ↔ Y, K ↔ Z): S = [s(i − j)] as in (i), K Hilbert–Schmidt, ∑_{i,j} K^2(i, j) < ∞, ∑_i s^2(i) < ∞ (Z mean 0, nonstationary).
S = 0 ⇒ Σ singular, but this corresponds to functional data analysis.
A typical goal: estimate Σ^{-1} for prediction.
18 Permutation-Invariant Sparsity of Σ or Σ^{-1}
a) Each row of Σ is sparse or sparsely approximable: e.g. if σ_i = {σ_ij : 1 ≤ j ≤ p}, then ||σ_i||_0 ≤ s for all i.
b) Each row of Σ^{-1} is sparsely approximable.
a) roughly implies b) if λ_min(Σ) ≥ δ > 0.
A more general notion (El Karoui (2007), Annals of Statistics, to appear): (roughly) the number of loops of length k in the undirected graph with edge i ~ j iff σ_ij ≠ 0 grows more slowly than p^{k/2} as k → ∞.
The corresponding definition for Σ^{-1} is not equivalent in general.
19 Permutation-Invariant Sparsity of Σ^{-1}
Σ^{-1} corresponds to a graphical model with neighbourhoods N(i) = {j : σ^{ij} ≠ 0, j ≠ i}.
Sparsity means the neighbourhood sizes are small relative to p.
Example: gene networks.
Goal: determine N(i), i = 1, ..., p.
20 Construction of Estimates of Σ, or of Parameters of Σ, which Work Well when T has Sparse Members
Eigenstructure sparsity.
a) Sparse PCA: d'Aspremont et al., SIAM Review (2007); Zou, Hastie, Tibshirani (2006). PCA with a lasso penalty on the coordinates of the eigenvectors. Focus on computation; no asymptotics.
21 Eigenstructure sparsity, continued
b) Johnstone and Lu, JASA (2006). Sparse PCA for Σ corresponding to signals X = (X(t_1), ..., X(t_p))^T.
Represent the data in a wavelet basis and keep only k << p coefficients; PCA is then applied to the covariance matrix of the reduced representation.
Asymptotics for large p, and examples.
22 Label-dependent sparsity: banding
Bickel and Levina (2004), Bernoulli; Bickel and Levina (2007), Annals of Statistics (to appear).
a) Banding Σ. For any M = [m_ij], define B_k(M) = [m_ij 1(|i − j| ≤ k)].
With Σ̂ = (1/n) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)^T, the estimate is B_k(Σ̂), k = o(p).
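The banding operator B_k is a one-line mask. A sketch applying it to the sample covariance of AR(1) data (the AR(1) covariance σ_ij = ρ^{|i−j|} is the simulation example used later in the talk; the code itself is illustrative):

```python
import numpy as np

def band(M, k):
    """Banding operator B_k(M): zero all entries with |i - j| > k."""
    i, j = np.indices(M.shape)
    return np.where(np.abs(i - j) <= k, M, 0.0)

rng = np.random.default_rng(3)
p, n, rho = 30, 200, 0.7
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = np.cov(X, rowvar=False)
Sigma_band = band(Sigma_hat, k=5)      # banded estimate B_k(Sigma_hat)
```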
23 Label-dependent sparsity, continued
b) Banding Σ^{-1}. Write Σ^{-1} = A A^T with A lower triangular (Cholesky factor).
Estimate: Σ̂^{-1} = Â_k Â_k^T, where the banded factor Â_k is obtained by regression.
Wu and Pourahmadi (2003), Biometrika; Huang, Liu, Pourahmadi (2007).
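A sketch of the regression idea behind banding Σ^{-1}: regress each coordinate on its k closest predecessors, collect the negated coefficients in a unit lower-triangular banded factor T and the residual variances in D, then use the modified Cholesky identity Σ^{-1} ≈ T^T D^{-1} T. (Function and variable names are illustrative, not from the cited papers.)

```python
import numpy as np

def banded_precision(X, k):
    """Banded regression estimate of the inverse covariance.

    Regress each coordinate on its k closest predecessors; the negated
    coefficients fill a unit lower-triangular banded factor T and the
    residual variances fill D, giving Sigma^{-1} ~ T' D^{-1} T
    (modified Cholesky decomposition).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Xc[:, 0].var()
    for j in range(1, p):
        lo = max(0, j - k)
        Z = Xc[:, lo:j]                          # k closest predecessors
        beta, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
        T[j, lo:j] = -beta
        d[j] = (Xc[:, j] - Z @ beta).var()       # residual variance
    return T.T @ np.diag(1.0 / d) @ T

rng = np.random.default_rng(7)
p, n, rho = 20, 300, 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Omega_hat = banded_precision(X, k=2)   # bandwidth-2 precision estimate
```

By construction the resulting estimate is symmetric, positive definite, and banded (bandwidth 2k), unlike a naively banded inverse of Σ̂.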
24 Permutation-invariant sparsity
(a) Thresholding. For any M = [m_ij], define T_t(M) = [m_ij 1(|m_ij| ≥ t)].
Estimate: T_t(Σ̂), t = o(1).
El Karoui (2007); Bickel and Levina (2007).
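Hard thresholding is equally simple. A sketch using a threshold of the order √(log p / n) suggested by the theory later in the talk (the toy covariance and constant 2 are illustrative choices):

```python
import numpy as np

def threshold(M, t):
    """Thresholding operator T_t(M): keep entries with |m_ij| >= t."""
    return np.where(np.abs(M) >= t, M, 0.0)

rng = np.random.default_rng(4)
p, n = 40, 200
Sigma = np.eye(p)
Sigma[0, 1] = Sigma[1, 0] = 0.6        # a single strong off-diagonal entry
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = np.cov(X, rowvar=False)

t = 2.0 * np.sqrt(np.log(p) / n)       # threshold of order sqrt(log p / n)
Sigma_thresh = threshold(Sigma_hat, t)
# the true signal at (0, 1) survives; almost all pure-noise entries vanish
```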
25 Permutation-invariant sparsity, continued
b) SPICE. Estimate Σ^{-1}, assuming Σ^{-1} is sparse. Let
F(P, M) = trace(MP) − log det(P) + λ ||P||_1,  where ||M||_1 = ∑_{i,j} |m_ij|.
a) P̂^{(1)} = argmin {F(P, Σ̂) : P ≻ 0}.
Yuan and Lin (2007), Biometrika: fixed-p asymptotics. Banerjee et al. (2006), ICML: computation.
b) With D̂ = diag(σ̂_ii^{1/2}) and the sample correlation matrix R̂ = D̂^{-1} Σ̂ D̂^{-1}:
R̂^{(2)} = argmin {F(R, R̂) : R ≻ 0, r_ii = 1 for all i},  P̂^{(2)} = D̂^{-1} R̂^{(2)} D̂^{-1}.
a), b): Rothman et al. (2007), submitted to JRSS(B); asymptotics.
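A small sanity check of the SPICE objective: with λ = 0 it reduces to the negative Gaussian log-likelihood in the precision matrix, which is minimized at P = M^{-1} (the gradient of trace(MP) − log det P is M − P^{-1}). The sketch below evaluates F and confirms the unpenalized minimizer numerically (an illustration of the objective only, not the optimization algorithms of the cited papers):

```python
import numpy as np

def spice_objective(P, M, lam):
    """F(P, M) = trace(M P) - log det(P) + lam * sum_ij |P_ij|."""
    sign, logdet = np.linalg.slogdet(P)
    if sign <= 0:
        raise ValueError("P must be positive definite")
    return np.trace(M @ P) - logdet + lam * np.abs(P).sum()

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
M = A @ A.T + 4.0 * np.eye(4)          # well-conditioned stand-in for Sigma_hat
P_star = np.linalg.inv(M)              # unpenalized (lam = 0) minimizer
f_star = spice_objective(P_star, M, lam=0.0)
f_other = spice_objective(P_star + 0.05 * np.eye(4), M, lam=0.0)
# f_star < f_other: perturbing P away from M^{-1} increases the objective
```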
26 Permutation-invariant sparsity, continued
c) Methods for neighbourhood estimation.
Meinshausen and Bühlmann (2006); Zhao and Yu (2007): regression methods + lasso.
Wainwright (2006): bounds.
Kalisch and Bühlmann (2007): use the PC algorithm of Spirtes et al. (2000) to (roughly) hierarchically test whether partial correlation coefficients are 0.
27 Some Asymptotic Theory: Matrix Norms
Operator norms of M = [m_ij]_{p×p} acting on (R^p, || · ||_r), where ||x||_r^r = ∑_{j=1}^p |x_j|^r for x = (x_1, ..., x_p):
||M||_{(2,2)} = λ_max^{1/2}(M M^T)
||M||_{(1,1)} = max_j ∑_{i=1}^p |m_ij|
||M||_{(∞,∞)} = max_i ∑_{j=1}^p |m_ij|
Also: ||M||_∞ = max_{i,j} |m_ij|, and ||M||_F^2 = ∑_{i,j} m_ij^2 (Frobenius norm).
28 Properties
For any operator norm, ||AB|| ≤ ||A|| ||B||. Henceforth ||M|| denotes ||M||_{(2,2)}.
If M (p × p) is symmetric, ||M|| = max{|λ_max(M)|, |λ_min(M)|}.
Moreover, ||M|| ≤ [||M||_{(1,1)} ||M||_{(∞,∞)}]^{1/2} and ||M|| ≤ ||M||_F.
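These norms and inequalities are straightforward to verify numerically for a random symmetric matrix; a sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(6, 6))
M = (A + A.T) / 2.0                    # symmetric test matrix

op_norm  = np.linalg.norm(M, 2)        # ||M||_(2,2): largest singular value
one_norm = np.abs(M).sum(axis=0).max() # ||M||_(1,1): max absolute column sum
inf_norm = np.abs(M).sum(axis=1).max() # ||M||_(inf,inf): max absolute row sum
fro_norm = np.linalg.norm(M, 'fro')    # Frobenius norm
lam = np.linalg.eigvalsh(M)            # real eigenvalues (M symmetric)
# for symmetric M: op_norm equals max |eigenvalue|, and
# op_norm^2 <= one_norm * inf_norm, op_norm <= fro_norm
```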
29 Basic Property of || · ||
Let A_n, B_n be symmetric with ||A_n − B_n|| → 0, and suppose λ_1(B_n) > λ_2(B_n) > ... > λ_k(B_n) > λ_{k+1}(B_n), with λ_j(A_n) defined analogously. Suppose λ_{j+1}(B_n) < λ_j(B_n) − Δ for 1 ≤ j ≤ k, with the dimension of B_n arbitrary and k, Δ > 0 fixed. Then:
a) λ_j(A_n) − λ_j(B_n) = O(||A_n − B_n||);
b) if E_{jA} (respectively E_{jB}) is the projection operator onto the eigenspace corresponding to λ_j(A_n) (resp. λ_j(B_n)), then ||E_{jA} − E_{jB}|| = O(||A_n − B_n||),
with the constants depending on j and Δ.
30 B-L (2006) Main Result I
Banded estimator: Σ̂_{k,p}(i, j) = Σ̂_p(i, j) 1(|i − j| ≤ k).
Let U(ε_0, α, C) = {Σ : 0 < ε_0 ≤ λ_min(Σ) ≤ λ_max(Σ) ≤ 1/ε_0, and max_j ∑_{i: |i−j| > k} |σ_ij| ≤ C k^{−α} for all k > 0}.
Theorem 1. If X is Gaussian and k_n ≍ (n^{−1} log p)^{−1/(2(α+1))}, then, uniformly on Σ ∈ U(ε_0, α, C),
||Σ̂_{k_n,p} − Σ|| = O_P((n^{−1} log p)^{α/(2(α+1))}) = ||Σ̂_{k_n,p}^{−1} − Σ^{−1}||.
The banded estimator and its inverse are consistent if (log p)/n → 0.
Remark: the condition λ_min ≥ ε_0 is not needed if only convergence of Σ̂_{k_n,p} to Σ is required.
31 An Approximation Theorem (Bickel and Lindner (2007))
Suppose ||B_k − A|| ≤ δ for some B_k banded of width 2k, with ||A|| = M < ∞ and ||A^{−1}|| = m^{−1} < ∞; let κ = M/m. Then there exist N(m, M, δ) and a banded matrix B_{kN} such that
||B_{kN}^{−1} − A^{−1}|| ≤ c(m, M, δ) δ,
where c(m, M, δ) tends to c(κ) as δ → 0. B_{kN} has a simply computable form.
32 Thresholding
Theorem (Thresholding; B-L, submitted to Annals of Statistics). Let
U_t(q, c_0(p), M) = {Σ : σ_ii ≤ M, ∑_{j=1}^p |σ_ij|^q ≤ c_0(p) for all i},  0 ≤ q < 1.
If t_n = M √(log p / n), then
||T_{t_n}(Σ̂) − Σ|| = O_P(c_0(p) (log p / n)^{(1−q)/2}).
Note: for q = 0, c_0(p) is the number of nonzero entries per row.
More subtle consistency results: N. El Karoui (2007).
33 SPICE (Rothman et al.)
If S = {(i, j) : σ^{ij} ≠ 0}, |S| ≤ s, ||Σ^{−1}|| ≤ m^{−1}, ||Σ|| ≤ M, and X is Gaussian, then
||P̂^{(1)} − Σ^{−1}||_F^2 = O_P((p + s) (log p) / n),
||P̂^{(2)} − Σ^{−1}||_F^2 = O_P(s (log p) / n).
34 Choosing the Banding (or Thresholding or SPICE) Parameter
Ideally we want to minimize the risk R(k) = E ||Σ̂_k − Σ||.
Estimate it via a resampling scheme: split the data into two samples of sizes n_1 and n_2, N times at random. Let Σ̂_1^{(ν)}, Σ̂_2^{(ν)} be the two sample covariance matrices from the ν-th split. The risk can be estimated by
R̂(k) = (1/N) ∑_{ν=1}^N ||B_k(Σ̂_1^{(ν)}) − Σ̂_2^{(ν)}||.
We used n_1 = n/3, N = 50, and the L_1 matrix norm instead of L_2.
Oracle properties hold for the Frobenius norm (Bickel and Levina (2007), submitted to Annals of Statistics).
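A sketch of this split-sample risk estimate for banding (function names and the smaller N here are illustrative; the slide's choices were n_1 = n/3, N = 50, and the (1,1) norm):

```python
import numpy as np

def band(M, k):
    """Banding operator B_k(M)."""
    i, j = np.indices(M.shape)
    return np.where(np.abs(i - j) <= k, M, 0.0)

def estimate_risk(X, ks, n1=None, N=50, seed=0):
    """Split-sample estimate of the banding risk R(k) in the (1,1) norm."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    n1 = n1 or n // 3
    risks = np.zeros(len(ks))
    for _ in range(N):
        idx = rng.permutation(n)
        S1 = np.cov(X[idx[:n1]], rowvar=False)   # band this covariance
        S2 = np.cov(X[idx[n1:]], rowvar=False)   # compare against this one
        for a, k in enumerate(ks):
            risks[a] += np.abs(band(S1, k) - S2).sum(axis=0).max()
    return risks / N

rng = np.random.default_rng(8)
p, n, rho = 20, 200, 0.7
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
ks = list(range(10))
risks = estimate_risk(X, ks, N=20)
k_hat = ks[int(np.argmin(risks))]      # data-driven choice of bandwidth
```

For strongly correlated AR(1) data, k = 0 discards large covariances and should be visibly worse than a moderate bandwidth, which is what drives the minimum of R̂(k).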
35 Simulation Examples
Covariance of AR(1): Σ ∈ U, σ_ij = ρ^{|i−j|}; n = 100, p = 10, 100, 200; ρ = 0.1, 0.5, 0.9.
Fractional Gaussian noise (FGN): long-range dependence, not in U,
σ_ij = ½ [(|i − j| + 1)^{2H} − 2|i − j|^{2H} + (|i − j| − 1)^{2H}],
where H ∈ [0.5, 1] is the Hurst parameter; H = 0.5 gives white noise, H = 1 perfect dependence.
n = 100, p = 10, 100, 200; H = 0.5, 0.6, 0.7, 0.8, 0.9.
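Both simulation covariances can be built directly from their formulas; a sketch (for the FGN entry at |i − j| = 0 the code uses ||i − j| − 1|, so that the diagonal works out to 1):

```python
import numpy as np

def ar1_cov(p, rho):
    """AR(1) covariance: sigma_ij = rho^{|i-j|}."""
    d = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return rho ** d

def fgn_cov(p, H):
    """Fractional Gaussian noise covariance with Hurst parameter H:
    sigma_ij = 0.5 * ((d+1)^{2H} - 2 d^{2H} + |d-1|^{2H}), d = |i-j|."""
    d = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)
    return 0.5 * ((d + 1) ** (2 * H) - 2 * d ** (2 * H) + np.abs(d - 1) ** (2 * H))

Sigma_ar = ar1_cov(10, 0.5)
Sigma_fgn = fgn_cov(10, 0.8)           # long-range dependent example
```

The two boundary cases on the slide fall out immediately: fgn_cov(p, 0.5) is the identity (white noise) and fgn_cov(p, 1.0) is the all-ones matrix (perfect dependence).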
36 True and Estimated Risk for AR(1)
[Figure: true risk and estimated risk R̂(k) as functions of k, in nine panels for p = 10, 100, 200 and ρ = 0.1, 0.5, 0.9.]
37 (1,1) Loss and Choice of k for AR(1)
[Table: AR(1), oracle k_1 and estimated k̂ with the corresponding (1,1)-norm loss values, reported as mean (SD); columns p, ρ, k_1, k̂, and the losses of Σ̂_{k̂} and Σ̂_{k_1}, where k_1 = argmin_k ||Σ̂_k − Σ||_{(1,1)}.]
38 Operator Norm Results for Various Estimates and Situations
[Table: operator norm loss for AR(1), averages and SEs over 100 replications; columns p, ρ, Sample, Band, Band-P, Thresh, SPICE.]
39 Operator Norm Results for Various Estimates and Situations
[Table: operator norm loss for FGN, averages and SEs over 100 replications; columns p, H, Sample, Band, Band-P, Thresh, SPICE.]
40 Example 1 continued EOF pattern #1 (Thresholding)
41 Example 1 continued EOF pattern #2 (Thresholding)
42 Some Important Directions
Choice of k and other regularization parameters.
Regularized estimation of canonical correlations.
Regularization for independent component analysis.
Estimation of parameters governing independence and conditional independence in graphical models.
The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework
More informationSparse Covariance Matrix Estimation with Eigenvalue Constraints
Sparse Covariance Matrix Estimation with Eigenvalue Constraints Han Liu and Lie Wang 2 and Tuo Zhao 3 Department of Operations Research and Financial Engineering, Princeton University 2 Department of Mathematics,
More informationA Consistent Model Selection Criterion for L 2 -Boosting in High-Dimensional Sparse Linear Models
A Consistent Model Selection Criterion for L 2 -Boosting in High-Dimensional Sparse Linear Models Tze Leung Lai, Stanford University Ching-Kang Ing, Academia Sinica, Taipei Zehao Chen, Lehman Brothers
More informationarxiv: v2 [math.st] 2 Jul 2017
A Relaxed Approach to Estimating Large Portfolios Mehmet Caner Esra Ulasan Laurent Callot A.Özlem Önder July 4, 2017 arxiv:1611.07347v2 [math.st] 2 Jul 2017 Abstract This paper considers three aspects
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationGlobal Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures
Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures T. Tony Cai 1 1 Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, USA, 19104;
More informationThe lasso, persistence, and cross-validation
The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University
More informationTuning-parameter selection in regularized estimations of large covariance matrices
Journal of Statistical Computation and Simulation, 2016 Vol. 86, No. 3, 494 509, http://dx.doi.org/10.1080/00949655.2015.1017823 Tuning-parameter selection in regularized estimations of large covariance
More informationSparse statistical modelling
Sparse statistical modelling Tom Bartlett Sparse statistical modelling Tom Bartlett 1 / 28 Introduction A sparse statistical model is one having only a small number of nonzero parameters or weights. [1]
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon
More informationEstimation of the Global Minimum Variance Portfolio in High Dimensions
Estimation of the Global Minimum Variance Portfolio in High Dimensions Taras Bodnar, Nestor Parolya and Wolfgang Schmid 07.FEBRUARY 2014 1 / 25 Outline Introduction Random Matrix Theory: Preliminary Results
More informationGeneralized Power Method for Sparse Principal Component Analysis
Generalized Power Method for Sparse Principal Component Analysis Peter Richtárik CORE/INMA Catholic University of Louvain Belgium VOCAL 2008, Veszprém, Hungary CORE Discussion Paper #2008/70 joint work
More information6-1. Canonical Correlation Analysis
6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another
More informationHomework 1. Yuan Yao. September 18, 2011
Homework 1 Yuan Yao September 18, 2011 1. Singular Value Decomposition: The goal of this exercise is to refresh your memory about the singular value decomposition and matrix norms. A good reference to
More informationPaper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
More informationAn Introduction to Graphical Lasso
An Introduction to Graphical Lasso Bo Chang Graphical Models Reading Group May 15, 2015 Bo Chang (UBC) Graphical Lasso May 15, 2015 1 / 16 Undirected Graphical Models An undirected graph, each vertex represents
More informationAssessing the dependence of high-dimensional time series via sample autocovariances and correlations
Assessing the dependence of high-dimensional time series via sample autocovariances and correlations Johannes Heiny University of Aarhus Joint work with Thomas Mikosch (Copenhagen), Richard Davis (Columbia),
More informationStability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models arxiv:1006.3316v1 [stat.ml] 16 Jun 2010 Contents Han Liu, Kathryn Roeder and Larry Wasserman Carnegie Mellon
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationNon white sample covariance matrices.
Non white sample covariance matrices. S. Péché, Université Grenoble 1, joint work with O. Ledoit, Uni. Zurich 17-21/05/2010, Université Marne la Vallée Workshop Probability and Geometry in High Dimensions
More informationSparse Principal Component Analysis Formulations And Algorithms
Sparse Principal Component Analysis Formulations And Algorithms SLIDE 1 Outline 1 Background What Is Principal Component Analysis (PCA)? What Is Sparse Principal Component Analysis (spca)? 2 The Sparse
More informationInference in high-dimensional graphical models arxiv: v1 [math.st] 25 Jan 2018
Inference in high-dimensional graphical models arxiv:1801.08512v1 [math.st] 25 Jan 2018 Jana Janková Seminar for Statistics ETH Zürich Abstract Sara van de Geer We provide a selected overview of methodology
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More information