Permutation-invariant regularization of large covariance matrices. Liza Levina
Permutation-invariant regularization of large covariance matrices
Liza Levina, Department of Statistics, University of Michigan
Joint work with Adam Rothman, Amy Wagaman, Ji Zhu (University of Michigan) and Peter Bickel (UC Berkeley)
Outline
- Introduction
- Motivation: a result for ordered variables
- Covariance Thresholding
- Generalized Thresholding
- Discovering Variable Orderings
- Regularization of the inverse
Why estimate covariance?
- Principal component analysis (PCA)
- Linear or quadratic discriminant analysis (LDA/QDA)
- Inferring independence and conditional independence in Gaussian graphical models
- Inference about the mean (e.g., a longitudinal mean response curve)
The covariance itself is not always the end goal: PCA requires estimation of the eigenstructure, while LDA/QDA and conditional independence require the inverse (a.k.a. the concentration or precision matrix).
What's wrong with the sample covariance?
Observe $X_1, \ldots, X_n$, i.i.d. $p$-variate random variables, and form
$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T$$
This is the MLE, (almost) unbiased, and well-behaved (and well studied) for fixed $p$ as $n$ grows. But it is very noisy when $p$ is large:
- Eigenvalues are overdispersed unless $p/n \to 0$ (Marčenko and Pastur, 1967; Wachter, 1978; Geman, 1980; Bai and Yin, 1993; Johnstone, 2001; Paul, 2004; El Karoui, 2007)
- Eigenvectors are not consistent (Johnstone and Lu, 2004; Paul, 2006)
- LDA breaks down if $p/n \to \infty$ (Bickel and Levina, 2004)
- Singular if $p > n$
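These failure modes are easy to see numerically. A minimal sketch (not from the talk; the dimensions and random seed are arbitrary choices): draw i.i.d. data whose true covariance is the identity and check that the sample covariance is singular with a badly overdispersed spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                       # more variables than observations
X = rng.standard_normal((n, p))      # true covariance is the identity: all eigenvalues 1

# Sample covariance, centered at the sample mean
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n

eigvals = np.linalg.eigvalsh(S)
# The rank of S is at most n - 1 < p, so S is singular,
# while the top eigenvalue lands far above the true value 1
print(eigvals.min(), eigvals.max())
```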
Desirable features for a covariance estimator
- Positive definite and well-conditioned
- Inverse easily available (in some applications)
- Sparse covariance or inverse (in some applications)
- Ultimately, consistency and good rates for large $p$
Two general classes of estimators
- Variables have a natural notion of distance (time series, spatial data, longitudinal data, spectroscopy): exploit the weaker dependence between distant variables
- The variable order is meaningless (genetics, finance, social sciences): estimators should be invariant to variable permutations
Estimators for ordered variables
Exploit the weaker dependence between distant variables:
- Banding or tapering the covariance (Bickel and Levina, 2008; Furrer and Bengtsson, 2007): make the entries far away from the diagonal smaller
- The concentration matrix is usually regularized via its Cholesky factor (Wu and Pourahmadi, 2003; Bickel and Levina, 2008; Huang et al., 2006; Levina et al., 2008)
Bickel and Levina (2008): a result for banding/tapering
- Replace $\hat\Sigma$ with $\hat\Sigma * R$, where $*$ denotes the Schur (element-wise) product
- If $R$ is positive definite, so is $\hat\Sigma * R$
- Banding: $R_k(i,j) = \mathbf{1}(|i-j| \le k)$ gives $B_k(\hat\Sigma)$ (not necessarily positive definite)
- Other, smoother tapers guarantee positive definiteness
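The banding operator is a one-liner in practice. A sketch (the function name and the toy AR(1)-type matrix are ours):

```python
import numpy as np

def band(S, k):
    """Banding operator B_k: zero out all entries with |i - j| > k."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)

# Toy example: band a covariance with sigma_ij = 0.7^|i-j|
i, j = np.indices((5, 5))
Sigma = 0.7 ** np.abs(i - j)
B1 = band(Sigma, 1)   # keeps the diagonal and first off-diagonals only
```

As the slide notes, $B_k(\hat\Sigma)$ need not be positive definite even when $\hat\Sigma$ is, which is why smoother tapers are of interest.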
All results are in the operator norm, a.k.a. the matrix 2-norm: for symmetric $M$, $\|M\| = \max_i |\lambda_i(M)|$. Convergence in operator norm implies convergence of eigenvalues and eigenvectors.
Results are uniform over a class of approximately "bandable" covariance matrices as $p, n \to \infty$:
$$\mathcal{U}(\alpha, M, C) = \Big\{ \Sigma : \lambda_{\max}(\Sigma) \le M,\ \max_j \sum_{i: |i-j| > k} |\sigma_{ij}| \le C k^{-\alpha} \text{ for all } k \ge 0 \Big\}$$
Theorem 1: If $X$ is Gaussian and $k \asymp \left(\frac{\log p}{n}\right)^{-\frac{1}{2(\alpha+1)}}$, then, uniformly on $\Sigma \in \mathcal{U}(\alpha, M, C)$,
$$\|B_k(\hat\Sigma_p) - \Sigma_p\| = O_P\left(\Big(\frac{\log p}{n}\Big)^{\frac{\alpha}{2(\alpha+1)}}\right)$$
- For approximately bandable matrices, the banded estimator is consistent if $\frac{\log p}{n} \to 0$
- The theorem also holds for general tapering
- If we assume $\lambda_{\min}(\Sigma) \ge \varepsilon > 0$, the same rate holds for the inverse
- In practice the tuning parameter $k$ is selected by cross-validation
Covariance thresholding
Bickel and Levina (2007); El Karoui (2007).
- If the variable order is meaningless, estimators should be invariant to variable permutations
- Consider thresholding as a permutation-invariant analogue of banding:
$$T_t(\hat\Sigma) = \big[\hat\sigma_{ij}\,\mathbf{1}(|\hat\sigma_{ij}| \ge t)\big]$$
- This has been done in practice for a long time
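Hard thresholding of the sample covariance is equally simple to write down; a sketch (function name and toy matrix ours):

```python
import numpy as np

def hard_threshold(S, t):
    """T_t: keep entries with |sigma_ij| >= t, set the rest to zero."""
    S = np.asarray(S, dtype=float)
    return S * (np.abs(S) >= t)

Sigma_hat = np.array([[1.0, 0.4, 0.05],
                      [0.4, 1.0, 0.02],
                      [0.05, 0.02, 1.0]])
T = hard_threshold(Sigma_hat, 0.1)   # small entries zeroed, large ones untouched
```

Unlike banding, the result does not depend on the order of the variables: permuting rows and columns commutes with thresholding.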
A class of approximately sparse matrices (a permutation-invariant analogue of "approximately bandable"): for $0 \le q < 1$,
$$\mathcal{U}_\tau\big(q, c_0(p), M\big) = \Big\{\Sigma : \sigma_{ii} \le M,\ \sum_{j=1}^p |\sigma_{ij}|^q \le c_0(p) \text{ for all } i\Big\}$$
- If $q = 0$, $\mathcal{U}_\tau(0, c_0(p), M)$ is a class of sparse matrices
- $\mathcal{U} \subset \mathcal{U}_\tau$ for appropriately chosen constants (approximately bandable matrices are approximately sparse)
Theorem 2: Suppose $X$ is Gaussian. Uniformly on $\mathcal{U}_\tau\big(q, c_0(p), M\big)$, for a sufficiently large constant $M'$, if $t = M'\sqrt{\frac{\log p}{n}}$ and $\frac{\log p}{n} = o(1)$, then
$$\|T_t(\hat\Sigma_p) - \Sigma_p\| = O_P\left(c_0(p)\Big(\frac{\log p}{n}\Big)^{\frac{1-q}{2}}\right)$$
- For approximately sparse matrices, the thresholded estimator is consistent if $\frac{\log p}{n} \to 0$ (and $c_0(p)$ does not grow too fast)
- If we assume $\lambda_{\min}(\Sigma) \ge \varepsilon > 0$, the same rate holds for the inverse
- The threshold is chosen by cross-validation
Climate data example
- January mean temperatures from 1850 to 2006 ($n = 157$)
- The region covered by the $p = 2592$ recording stations extends over a wide range of longitudes and up to 87.5 degrees latitude
- Not all the stations were in place for the entire period
- Look at EOFs (empirical orthogonal functions), the principal components of the spatial covariance matrix
Sample Covariance EOFs
[Maps over longitude and latitude of EOF #1 (14.82% of variance) and EOF #2 (8.68% of variance)]
Thresholded Covariance EOFs
[Maps over longitude and latitude of EOF #1 (8.49% of variance) and EOF #2 (7.82% of variance)]
Generalized thresholding
Rothman, Levina, Zhu (2008)
- Theorem 2 applies to hard thresholding
- In wavelets and regression, smoother penalties often do better
We define generalized thresholding via a function $s_t: \mathbb{R} \to \mathbb{R}$ such that
(i) $|s_t(z)| \le |z|$ (shrinkage)
(ii) $s_t(z) = 0$ for $|z| \le t$ (thresholding)
(iii) $|s_t(z) - z| \le t$ for all $z \in \mathbb{R}$ and $t \ge 0$ (limited shrinkage)
It also makes sense to require $s_t(z) = \mathrm{sign}(z)\, s_t(|z|)$. The covariance estimator is $S_t(\hat\Sigma) = [s_t(\hat\sigma_{ij})]$.
Examples of generalized thresholding
- Hard thresholding: $s_t(z) = z\,\mathbf{1}(|z| > t)$
- Soft thresholding (Donoho and Johnstone, 1994, 1995): $s_t(z) = \mathrm{sign}(z)(|z| - t)_+$, equivalent to minimizing quadratic loss with a lasso penalty:
$$S_t(\hat\Sigma) = \arg\min_S \sum_{i,j} (s_{ij} - \hat\sigma_{ij})^2 + t \sum_{i,j} |s_{ij}|$$
Comparison of generalized thresholding operators
[Plot of $s_t(|z|)$ against $|z|$ for hard, adaptive lasso, soft, and SCAD thresholding]
More examples
- SCAD (Fan, 1997; Fan and Li, 2001): a linear interpolation between soft and hard thresholding; equivalent to quadratic loss with the SCAD penalty
- Adaptive lasso (Zou, 2006), which goes back to Breiman (1995): equivalent to quadratic loss with a weighted lasso penalty,
$$S_t(\hat\Sigma) = \arg\min_S \sum_{i,j} (s_{ij} - \hat\sigma_{ij})^2 + t \sum_{i,j} w_{ij} |s_{ij}|$$
with weights $w_{ij} = |\hat\sigma_{ij}|^{-\eta}$, which penalize small elements more
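The three smoother rules can be written as element-wise functions. A sketch (function names are ours; the SCAD constant $a = 3.7$ is the value recommended by Fan and Li, and the closed-form SCAD rule below is the standard one, not taken from these slides):

```python
import numpy as np

def soft(z, t):
    """Soft thresholding: sign(z) * (|z| - t)_+."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso(z, t, eta=1.0):
    """Adaptive-lasso thresholding with weights |z|^(-eta):
    small entries are penalized, and hence shrunk, more."""
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    w = np.where(absz > 0, absz, 1.0) ** (-eta)   # safe at z = 0
    return np.sign(z) * np.maximum(absz - t * w, 0.0)

def scad(z, t, a=3.7):
    """SCAD thresholding: soft near zero, identity far away,
    a linear interpolation in between."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 2 * t, soft(z, t),
           np.where(np.abs(z) <= a * t,
                    ((a - 1) * z - np.sign(z) * a * t) / (a - 2),
                    z))
```

Applied element-wise to $\hat\Sigma$, each rule gives one member of the family $S_t(\hat\Sigma) = [s_t(\hat\sigma_{ij})]$.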
Theorem 3:
(a) Under the same conditions, the hard thresholding theorem holds for generalized thresholding.
(b) Sparsistency: with probability tending to 1, $s_t(\hat\sigma_{ij}) = 0$ for all $(i,j)$ such that $\sigma_{ij} = 0$.
(c) If we additionally assume that all non-zero elements satisfy $|\sigma_{ij}| > \tau$, where $\sqrt{n}(\tau - t) \to \infty$, we also have, with probability tending to 1, $s_t(\hat\sigma_{ij}) \ne 0$ for all $(i,j)$ such that $\sigma_{ij} \ne 0$.
Tuning parameters have to be selected by cross-validation.
Some generalized thresholding results
- SCAD and adaptive lasso do somewhat better than hard and soft thresholding as measured by operator loss
- Sparsity: look at the true positive rate (fraction of non-zeros estimated as non-zero) vs. the false positive rate (fraction of zeros estimated as non-zero)
- Example: MA(1) (tri-diagonal matrix, $\sigma_{ii} = 1$, $\sigma_{i,i-1} = 0.3$)
[ROC curves (true positive rate vs. false positive rate) for soft, hard, adaptive lasso, and SCAD thresholding, at p = 30 and p = 200]
Discovering Variable Orderings
Wagaman and Levina (2007)
- Rate comparisons, simulations, and intuition suggest that if a structured order exists, it is better to use it than to ignore it
- Goal: reorder the variables to make the covariance matrix as close to bandable as possible
- Main idea: use correlations as a measure of similarity and embed the variables in one dimension; the coordinates provide an ordering
The Isomap algorithm
- The classical tool is multi-dimensional scaling (MDS)
- An alternative is Isomap (Tenenbaum et al., 2000), developed for manifold embeddings
Given a dissimilarity measure $d(i,j)$ (we use $1 - \hat\rho_{ij}$):
1. For each point, find its $r$ nearest neighbors and construct a neighborhood graph with dissimilarities as edge weights.
2. Estimate the geodesic distance $d_r(i,j)$ between each pair of points $i, j$ by computing the shortest-path graph distance from $i$ to $j$.
3. Apply metric MDS to the estimated geodesic distances.
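A compact numpy-only sketch of these three steps in the 1-d case (our implementation, not the authors' code: Floyd-Warshall for step 2 and classical metric MDS for step 3; it assumes the r-NN graph is connected, which is why the Isoband estimator on the next slide runs it separately per connected component):

```python
import numpy as np

def isomap_1d(D, r):
    """1-d Isomap embedding from a p x p dissimilarity matrix D.

    1. r-nearest-neighbor graph with dissimilarities as edge weights.
    2. Geodesic distances via Floyd-Warshall shortest paths.
    3. Classical (metric) MDS: double-center the squared geodesic
       distances and take the leading eigenvector.
    """
    p = D.shape[0]
    G = np.full((p, p), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(p):                       # step 1: keep edges to r nearest neighbors
        nbrs = np.argsort(D[i])[1:r + 1]     # skip index 0, the point itself
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[nbrs, i]
    for k in range(p):                       # step 2: Floyd-Warshall
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    J = np.eye(p) - np.ones((p, p)) / p      # step 3: classical MDS
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    return V[:, -1] * np.sqrt(max(w[-1], 0.0))

# Points on a line: sorting the 1-d coordinates should recover their order
x = np.arange(6, dtype=float)
D = np.abs(x[:, None] - x[None, :]) / 5.0
coords = isomap_1d(D, 2)
order = np.argsort(coords)
```

The embedding is only defined up to sign, so the recovered order may come out reversed; for reordering variables either direction is equally good.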
The Isoband estimator
1. Construct the neighborhood graph using $d(i,j) = 1 - \hat\rho_{ij}$.
2. Apply Isomap to each connected component of the graph.
3. Reorder the variables within each component according to their 1-d embedding coordinates and apply banding.
4. Concatenate the components into a block-diagonal estimator.
A bootstrap approach to improve the nearest-neighbor graph
- There is noise in the $k$-NN links based on sample correlations
- Main idea: create $T$ bootstrap replicates and only include an edge if it appears in at least $cT$ of the replicates, where $0 < c < 1$ is a tuning parameter (we use $c = 0.5$)
- Similar to bagging
- Improves detection of independent blocks and of the ordering, but adds computational cost
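The edge-stabilization step above can be sketched as follows (our function name and defaults; $T$ and $c$ are the slide's tuning parameters, and the dissimilarity is the $1 - \hat\rho_{ij}$ used throughout this section):

```python
import numpy as np

def stable_knn_graph(X, r, T=50, c=0.5, seed=0):
    """Bagged nearest-neighbor graph: an r-NN edge (under d = 1 - rho_hat)
    is kept only if it appears in at least c*T bootstrap replicates.
    Returns a symmetric boolean adjacency matrix."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(T):
        Xb = X[rng.integers(0, n, n)]               # resample rows with replacement
        D = 1.0 - np.corrcoef(Xb, rowvar=False)     # dissimilarity d(i,j) = 1 - rho_hat
        for i in range(p):
            nbrs = np.argsort(D[i])[1:r + 1]        # r nearest neighbors of variable i
            counts[i, nbrs] += 1
    A = counts >= c * T
    np.fill_diagonal(A, False)
    return A | A.T                                  # symmetrize

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
A = stable_knn_graph(X, r=2)
```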
A simulation example
AR(1): $\sigma_{ij} = \rho^{|i-j|}$, $n = 100$, $\rho = 0.7$; average (SE) operator norm loss over 50 replications:

p    Sample    Band.      Band.Perm.  MDS        Isoband    Bootst.    Thresh.
…    …(.05)    0.89(.04)  0.84(.05)   0.92(.04)  0.87(.04)  0.90(.04)  0.87(.05)
…    …(.06)    1.65(.04)  4.63(.01)   3.41(.05)  1.82(.04)  1.67(.04)  2.85(.03)
…    …(.07)    1.79(.04)  4.67(.00)   3.97(.03)  2.10(.06)  1.85(.05)  3.02(.02)
Assessing sparsity: a block-diagonal example
- Three AR blocks of size 50, 30, 20, with $\rho = 0.7, 0.8, 0.9$
- Heatmap of the percentage of times each element was estimated as zero (black is 0%, white is 100%)
- As a benchmark, apply banding to the known blocks separately, each one in the correct order
[Heatmaps: (a) banding known blocks, (b) thresholding, (c) Isoband, (d) bootstrap]
Estimators of the inverse invariant under variable permutations
- The inverse $\Omega = \Sigma^{-1}$ (a.k.a. the concentration or precision matrix) is needed in graphical models and LDA/QDA
- Graphical models mostly focus on testing for/finding zeros rather than estimating the matrix (Drton and Perlman, 2004; Dobra et al., 2004; Meinshausen and Bühlmann, 2006; Kalisch and Bühlmann, 2007)
- Testing for zeros can be followed by constrained estimation (Chaudhuri, Drton and Richardson, 2007), but the resulting estimator is hard to analyze
A Sparse Permutation-Invariant Estimator
The negative log-likelihood, up to a constant:
$$\ell(X; \Omega) = \mathrm{tr}(\Omega\hat\Sigma) - \log|\Omega|$$
Add a lasso penalty on the off-diagonal elements of $\Omega$ to force regularization and sparsity. SPICE:
$$\hat\Omega_\lambda = \arg\min_{\Omega \succ 0} \Big\{ \mathrm{tr}(\Omega\hat\Sigma) - \log|\Omega| + \lambda \sum_{j \ne j'} |\omega_{jj'}| \Big\}$$
The optimization problem has been studied by several authors; Yuan and Lin (2007) provided a fixed-$p$ analysis.
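As an illustration only (not the solver used in the paper; a plain proximal-gradient scheme with a step size of our choosing), the SPICE objective can be minimized by alternating a gradient step on the smooth part, whose gradient is $\hat\Sigma - \Omega^{-1}$, with soft thresholding of the off-diagonal entries:

```python
import numpy as np

def spice_prox_grad(S, lam, step=0.1, n_iter=500):
    """Minimize tr(Omega S) - log det(Omega) + lam * sum_{j != j'} |omega_jj'|
    by proximal gradient descent. The prox of the off-diagonal l1 penalty is
    soft thresholding that leaves the diagonal untouched. The step size must
    be small enough that the iterates stay positive definite."""
    p = S.shape[0]
    Omega = np.eye(p)
    off = 1.0 - np.eye(p)                # penalize off-diagonal entries only
    for _ in range(n_iter):
        G = S - np.linalg.inv(Omega)     # gradient of the smooth part
        Z = Omega - step * G
        Omega = np.sign(Z) * np.maximum(np.abs(Z) - step * lam * off, 0.0)
        Omega = (Omega + Omega.T) / 2    # keep symmetry against round-off
    return Omega

# With a strong penalty the off-diagonal is killed and Omega_jj -> 1 / S_jj
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
Omega_hat = spice_prox_grad(S, lam=1.0)
```

The specialized solvers on the later slide (Cholesky-parametrized coordinate descent, glasso) do the same minimization far more efficiently and with positive definiteness guaranteed.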
High-Dimensional Analysis of SPICE
Rothman, Bickel, Levina, Zhu (2007)
To get the best rate, standardize first: run SPICE on the correlation matrix, then multiply the variances back in to obtain $\tilde\Omega_\lambda$.
Theorem 4: Let $\Omega = \Sigma^{-1}$, and take $\lambda \asymp \sqrt{\frac{\log p}{n}}$. Assume
(A1) $0 < \varepsilon_0 \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le \varepsilon_0^{-1}$;
(A2) $\mathrm{card}\{(i,j) : \Omega_{ij} \ne 0,\ i \ne j\} \le s$.
Then
$$\|\tilde\Omega_\lambda - \Omega\| = O_P\left(\sqrt{\frac{(1+s)\log p}{n}}\right)$$
SPICE is consistent as long as $\frac{s \log p}{n} \to 0$. Compare to $c_0(p)\sqrt{\frac{\log p}{n}}$ for thresholding.
Solving the optimization problem
- Optimization is non-trivial and computationally expensive
- Yuan and Lin (2007): the Maxdet algorithm (interior point optimization)
- Banerjee et al. (2005): Nesterov's convex optimization
- Our approach (Rothman et al., 2008): parametrize via the Cholesky factor to enforce positive definiteness, use local quadratic approximation and coordinate-wise descent
- Friedman, Hastie, and Tibshirani (2008): apply the coordinate-wise descent lasso algorithm (R package glasso)
Colon tumor classification example
- 40 colon adenocarcinoma and 22 healthy tissue samples (Alon et al., 1999); Affymetrix gene expression data
- Top p genes selected by a two-sample t-test (out of 2000)
- 100 random splits into training (2/3) and test (1/3) sets
- LDA classification
- Consider two ways of selecting the tuning parameter
Colon tumor classification errors
Tuning parameter for SPICE chosen by (A) 5-fold CV on the training data, maximizing the likelihood; (B) 5-fold CV on the training data, minimizing the classification error; (C) minimizing the classification error on the test data.

p    Naive Bayes   A           B           C
…    …(0.8)        13.5(0.6)   14.6(0.6)   9.7(0.5)
…    …(0.8)        12.1(0.7)   14.7(0.7)   9.0(0.6)
…    …(0.8)        18.7(0.8)   16.9(0.9)   9.1(0.5)
…    …(1.0)        18.3(0.7)   18.0(0.7)   10.2(0.5)
Conclusions
- Regularization is necessary
- High-dimensional asymptotics give trade-offs of dimension vs. sample size vs. sparsity, and conditions on the covariance model
- Structure helps regularization
- Efficient optimization and low computational complexity are very important
Thank you
More informationSparse Covariance Matrix Estimation with Eigenvalue Constraints
Sparse Covariance Matrix Estimation with Eigenvalue Constraints Han Liu and Lie Wang 2 and Tuo Zhao 3 Department of Operations Research and Financial Engineering, Princeton University 2 Department of Mathematics,
More informationLecture 14: Variable Selection - Beyond LASSO
Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)
More informationMATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models
1/13 MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models Dominique Guillot Departments of Mathematical Sciences University of Delaware May 4, 2016 Recall
More informationApproximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.
Using Quadratic Approximation Inderjit S. Dhillon Dept of Computer Science UT Austin SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina Sept 12, 2012 Joint work with C. Hsieh, M. Sustik and
More informationGraphical Model Selection
May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationComparisons of penalized least squares. methods by simulations
Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy
More informationLatent Variable Graphical Model Selection Via Convex Optimization
Latent Variable Graphical Model Selection Via Convex Optimization The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationarxiv: v1 [stat.me] 16 Feb 2018
Vol., 2017, Pages 1 26 1 arxiv:1802.06048v1 [stat.me] 16 Feb 2018 High-dimensional covariance matrix estimation using a low-rank and diagonal decomposition Yilei Wu 1, Yingli Qin 1 and Mu Zhu 1 1 The University
More informationRobust and sparse Gaussian graphical modelling under cell-wise contamination
Robust and sparse Gaussian graphical modelling under cell-wise contamination Shota Katayama 1, Hironori Fujisawa 2 and Mathias Drton 3 1 Tokyo Institute of Technology, Japan 2 The Institute of Statistical
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationLECTURE NOTE #11 PROF. ALAN YUILLE
LECTURE NOTE #11 PROF. ALAN YUILLE 1. NonLinear Dimension Reduction Spectral Methods. The basic idea is to assume that the data lies on a manifold/surface in D-dimensional space, see figure (1) Perform
More informationPathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationGRAPH-BASED REGULARIZATION OF LARGE COVARIANCE MATRICES.
GRAPH-BASED REGULARIZATION OF LARGE COVARIANCE MATRICES. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationORIE 4741: Learning with Big Messy Data. Regularization
ORIE 4741: Learning with Big Messy Data Regularization Professor Udell Operations Research and Information Engineering Cornell October 26, 2017 1 / 24 Regularized empirical risk minimization choose model
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More informationApproximating the Covariance Matrix with Low-rank Perturbations
Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu
More informationRobust Variable Selection Through MAVE
Robust Variable Selection Through MAVE Weixin Yao and Qin Wang Abstract Dimension reduction and variable selection play important roles in high dimensional data analysis. Wang and Yin (2008) proposed sparse
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationSparse PCA in High Dimensions
Sparse PCA in High Dimensions Jing Lei, Department of Statistics, Carnegie Mellon Workshop on Big Data and Differential Privacy Simons Institute, Dec, 2013 (Based on joint work with V. Q. Vu, J. Cho, and
More informationDimension Reduction and Low-dimensional Embedding
Dimension Reduction and Low-dimensional Embedding Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/26 Dimension
More informationarxiv: v1 [math.st] 13 Feb 2012
Sparse Matrix Inversion with Scaled Lasso Tingni Sun and Cun-Hui Zhang Rutgers University arxiv:1202.2723v1 [math.st] 13 Feb 2012 Address: Department of Statistics and Biostatistics, Hill Center, Busch
More informationSelection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty
Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationRegularization and Variable Selection via the Elastic Net
p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction
More informationComputational Statistics and Data Analysis. SURE-tuned tapering estimation of large covariance matrices
Computational Statistics and Data Analysis 58(2013) 339 351 Contents lists available at SciVerse ScienceDirect Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
More information