High-dimensional covariance estimation based on Gaussian graphical models
1 High-dimensional covariance estimation based on Gaussian graphical models

Shuheng Zhou
Department of Statistics, The University of Michigan, Ann Arbor

IMA workshop on High Dimensional Phenomena, Sept. 26, 2011

Joint work with Philipp Rütimann, Min Xu, and Peter Bühlmann
2 Problem definition

Want to estimate the covariance matrix for Gaussian distributions: e.g., stock prices.

Take a random sample of vectors $X^{(1)}, \dots, X^{(n)}$ i.i.d. $N_p(0, \Sigma_0)$, where $p$ is understood to depend on $n$. Let $\Theta_0 := \Sigma_0^{-1}$ denote the concentration matrix.

Sparsity: certain elements of $\Theta_0$ are assumed to be zero.

Task: use the sample to obtain a set of zeros, and then an estimator for $\Theta_0$ (and $\Sigma_0$) based on the given pattern of zeros. Show consistency in predictive risk and in estimating $\Theta_0$ and $\Sigma_0$ as $n, p \to \infty$.
3 Gaussian graphical model: representation

Let $X$ be a $p$-dimensional Gaussian random vector, $X = (X_1, \dots, X_p) \sim N(0, \Sigma_0)$, where $\Sigma_0 = \Theta_0^{-1}$.

In the Gaussian graphical model $G = (V, E_0)$ with $|V| = p$: a pair $(i, j)$ is NOT contained in $E_0$ (i.e., $\theta_{0,ij} = 0$) iff $X_i \perp X_j \mid \{X_k;\ k \in V \setminus \{i, j\}\}$.

Define the predictive risk with respect to $\Sigma_0$ as
$$R(\Sigma) = \mathrm{tr}(\Sigma^{-1}\Sigma_0) + \log|\Sigma| = -2\,\mathbb{E}_0\big(\log f_\Sigma(X)\big) - p\log(2\pi),$$
where the Gaussian log-likelihood using $\Sigma$ is
$$\log f_\Sigma(X) = -\frac{p}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2} X^T \Sigma^{-1} X$$
4 Penalized maximum likelihood estimators

To estimate a sparse model (i.e., $|\Theta_0|_0$ is small), recent work has considered $\ell_1$-penalized maximum likelihood estimators. Let $|\Theta|_1 = |\mathrm{vec}\,\Theta|_1 = \sum_{i \neq j} |\theta_{ij}|$, and
$$\hat\Theta_n = \arg\min_{\Theta \succ 0} \Big\{ \mathrm{tr}(\Theta \hat S_n) - \log|\Theta| + \lambda_n |\Theta|_1 \Big\},$$
where $\hat S_n = n^{-1} \sum_{r=1}^n X^{(r)} (X^{(r)})^T$ is the sample covariance.

The graph $\hat G_n$ is determined by the non-zeros of $\hat\Theta_n$.

References: Yuan-Lin 07, d'Aspremont-Banerjee-El Ghaoui 08, Friedman-Hastie-Tibshirani 08, Rothman et al. 08, Zhou-Lafferty-Wasserman 08, and Ravikumar et al. 08
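The $\ell_1$-penalized estimator above is what scikit-learn implements as `GraphicalLasso`; a minimal sketch on simulated data (the chain-graph $\Theta_0$ and the value of `alpha` are our own illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# A sparse chain graph on 3 variables: theta_{0,13} = 0
Theta0 = np.array([[ 2.0, -0.8,  0.0],
                   [-0.8,  2.0, -0.8],
                   [ 0.0, -0.8,  2.0]])
Sigma0 = np.linalg.inv(Theta0)

X = rng.multivariate_normal(np.zeros(3), Sigma0, size=2000)

# alpha plays the role of lambda_n; larger alpha -> sparser Theta_hat
model = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = model.precision_   # estimate of Theta_0
Sigma_hat = model.covariance_  # estimate of Sigma_0
```

The non-edge entry $\hat\theta_{13}$ is shrunk towards zero while the true edges retain clearly negative entries.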
5 Predictive risks

Fix a point of interest with $f_0 = N(0, \Sigma_0)$. For a given $L_n$, consider a constrained set of positive definite matrices:
$$\Gamma_n = \{\Sigma : \Sigma \succ 0,\ |\Sigma^{-1}|_1 \le L_n\}$$

Define the oracle estimator as $\Sigma^* = \arg\min_{\Sigma \in \Gamma_n} R(\Sigma)$; recall $R(\Sigma) = \mathrm{tr}(\Sigma^{-1}\Sigma_0) + \log|\Sigma|$.

Define $\hat\Sigma_n$ as the minimizer of $R_n(\Sigma)$ subject to $\Sigma \in \Gamma_n$:
$$\hat\Sigma_n = \arg\min_{\Sigma \in \Gamma_n} \underbrace{\Big\{ \mathrm{tr}(\Sigma^{-1} \hat S_n) + \log|\Sigma| \Big\}}_{R_n(\Sigma)}$$

$R_n(\Sigma)$ is the negative Gaussian log-likelihood function (up to scaling and constants) and $\hat S_n$ is the sample covariance.
6 Risk consistency

Persistence Theorem: Let $p < n^\xi$ for some $\xi > 0$. Given $\Gamma_n = \{\Sigma : \Sigma \succ 0,\ |\Sigma^{-1}|_1 \le L_n\}$ with $L_n = o\big(\sqrt{n/\log n}\big)$ as $n \to \infty$, we have
$$R(\hat\Sigma_n) - R(\Sigma^*_n) \stackrel{P}{\to} 0,$$
where $R(\Sigma) = \mathrm{tr}(\Sigma^{-1}\Sigma_0) + \log|\Sigma|$ and $\Sigma^*_n = \arg\min_{\Sigma \in \Gamma_n} R(\Sigma)$.

Persistence answers the asymptotic question: how large may the set $\Gamma_n$ be, so that it is still possible to select empirically a predictor whose risk is close to that of the best predictor in the set (see Greenshtein-Ritov 04)?
7 Non-edges act as the constraints

Suppose we obtain an edge set $E$ such that $E_0 \subseteq E$. Define the estimator for the concentration matrix $\Theta_0$ as
$$\hat\Theta_n(E) = \arg\min_{\Theta \in M_E} \Big( \mathrm{tr}(\Theta \hat S_n) - \log|\Theta| \Big),$$
where $M_E = \{\Theta \succ 0 \text{ and } \theta_{ij} = 0\ \forall (i,j) \notin E,\ i \neq j\}$.

Theorem. Assume that $0 < \varphi_{\min}(\Sigma_0) \le \varphi_{\max}(\Sigma_0) < \infty$. Suppose that $E_0 \subseteq E$ and $|E \setminus E_0| = O(S)$, where $S = |E_0|$. Then
$$\|\hat\Theta_n(E) - \Theta_0\|_F = O_P\Big( \sqrt{(p+S)\log\max(n,p)/n} \Big)$$

This is the same rate as Rothman et al. 08 for the $\ell_1$-penalized likelihood estimate.
8 Get rid of the dependency on p

Theorem. Assume that $0 < \varphi_{\min}(\Sigma_0) \le \varphi_{\max}(\Sigma_0) < \infty$ and that $\Sigma_{0,ii} = 1$ for all $i$. Suppose we obtain an edge set $E$ such that $E_0 \subseteq E$ and $|E \setminus E_0| = O(S)$, where $S := |E_0| = \sum_{i=1}^p s_i$. Then
$$\|\hat\Theta_n(E) - \Theta_0\|_F = O_P\Big( \sqrt{S \log\max(n,p)/n} \Big)$$

In the likelihood function, $\hat S_n$ will be replaced by the sample correlation matrix
$$\hat\Gamma_n = \mathrm{diag}(\hat S_n)^{-1/2}\, \hat S_n\, \mathrm{diag}(\hat S_n)^{-1/2}$$
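Replacing $\hat S_n$ by the sample correlation matrix is a one-line diagonal rescaling; a small numpy sketch (the data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Mean-zero data with unequal column scales
X = rng.standard_normal((200, 5)) * np.array([1.0, 2.0, 0.5, 3.0, 1.5])
S_hat = X.T @ X / X.shape[0]          # sample covariance (mean-zero model)

d = np.sqrt(np.diag(S_hat))
Gamma_hat = S_hat / np.outer(d, d)    # diag(S)^(-1/2) S diag(S)^(-1/2)
```

$\hat\Gamma_n$ has unit diagonal and entries in $[-1, 1]$, regardless of the original scales.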
9 Main questions

- How to select an edge set $E$ so that we estimate $\Theta_0$ well?
- What assumptions do we need to impose on $\Sigma_0$ or $\Theta_0$?
- How does $n$ scale with $p$, $|E|$, or the maximum node degree $\deg(G)$?
- What if some edges have very small weights? How to ensure that $|E \setminus E_0|$ is small?
- How does the edge-constrained maximum likelihood estimate behave with respect to $E_0 \setminus E$ and $E \setminus E_0$?
10 Outline

- Introduction
- The regression model
- The method
- Theoretical results
- Conclusion
11 A Regression Model

We assume a multivariate Gaussian model $X = (X_1, \dots, X_p) \sim N_p(0, \Sigma_0)$, where $\Sigma_{0,ii} = 1$.

Consider a regression formulation of the model: for all $i = 1, \dots, p$,
$$X_i = \sum_{j \neq i} \beta^i_j X_j + V_i,$$
where $\beta^i_j = -\theta_{0,ij}/\theta_{0,ii}$, and $V_i \sim N(0, \sigma^2_{V_i})$ is independent of $\{X_j;\ j \neq i\}$, for which we assume that there exists $v^2 > 0$ such that for all $i$, $\mathrm{Var}(V_i) = 1/\theta_{0,ii} \ge v^2$.

Recall: $X_i \perp X_j \mid \{X_k;\ k \in V \setminus \{i, j\}\} \iff \theta_{0,ij} = 0 \iff \beta^i_j = 0 \text{ and } \beta^j_i = 0$
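The identities $\beta^i_j = -\theta_{0,ij}/\theta_{0,ii}$ and $\mathrm{Var}(V_i) = 1/\theta_{0,ii}$ follow from the Gaussian conditional distribution and can be checked numerically; a sketch with an arbitrary positive definite $\Theta_0$ (the values are illustrative):

```python
import numpy as np

# An arbitrary positive definite concentration matrix (values illustrative)
Theta0 = np.array([[ 2.0, -0.5,  0.0],
                   [-0.5,  2.0, -0.7],
                   [ 0.0, -0.7,  2.0]])
Sigma0 = np.linalg.inv(Theta0)

i, rest = 0, [1, 2]
# Population regression of X_i on the rest: beta = Sigma_{rr}^{-1} Sigma_{r,i}
beta = np.linalg.solve(Sigma0[np.ix_(rest, rest)], Sigma0[rest, i])
beta_from_theta = -Theta0[rest, i] / Theta0[i, i]   # the slide's identity

# Residual variance Var(V_i) = Sigma_ii - Sigma_{i,r} beta = 1 / theta_{0,ii}
resid_var = Sigma0[i, i] - Sigma0[i, rest] @ beta
```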
12 Want to recover the support of $\beta^i$

Take a random sample of size $n$, and use the sample to estimate $\beta^i$ for all $i$; that is, we have for each variable $X_i$,
$$\underbrace{X_i}_{n \times 1} = \underbrace{X_{\backslash i}}_{n \times (p-1)} \underbrace{\beta^i}_{(p-1) \times 1} + \underbrace{\epsilon}_{n \times 1},$$
where we assume $p > n$; that is, we are given high-dimensional data $X$.

Lasso (Tibshirani 96), a.k.a. Basis Pursuit (Chen, Donoho, and Saunders 98, and others):
$$\hat\beta^i = \arg\min_\beta \|X_i - X_{\backslash i}\beta\|_2^2 / (2n) + \lambda_n \|\beta\|_1$$
13 Meinshausen and Bühlmann 06

Perform $p$ regressions using the Lasso to obtain $p$ vectors of regression coefficients $\hat\beta^1, \dots, \hat\beta^p$, where for each $i$, $\hat\beta^i = \{\hat\beta^i_j;\ j \in \{1, \dots, p\} \setminus \{i\}\}$.

Then estimate the edge set by the OR rule: estimate an edge between nodes $i$ and $j$ iff $\hat\beta^i_j \neq 0$ or $\hat\beta^j_i \neq 0$.

Under sparsity and Neighborhood Stability conditions, they show $P(\hat E_n = E_0) \to 1$ as $n \to \infty$.
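A minimal sketch of the nodewise-Lasso-plus-OR-rule recipe using scikit-learn's `Lasso` (the helper name `mb_or_rule` and all parameter values are our own illustrative choices, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import Lasso

def mb_or_rule(X, lam):
    """Nodewise Lasso (Meinshausen-Buhlmann): OR rule on the p supports."""
    n, p = X.shape
    keep = np.zeros((p, p), dtype=bool)
    for i in range(p):
        rest = [j for j in range(p) if j != i]
        # sklearn's Lasso minimizes ||y - Zb||_2^2 / (2n) + alpha * ||b||_1
        beta = Lasso(alpha=lam).fit(X[:, rest], X[:, i]).coef_
        keep[i, rest] = beta != 0
    return keep | keep.T   # OR rule: edge iff either regression selects it

rng = np.random.default_rng(2)
Theta0 = np.array([[ 2.0, -0.9,  0.0],
                   [-0.9,  2.0, -0.9],
                   [ 0.0, -0.9,  2.0]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Theta0), size=500)
E_hat = mb_or_rule(X, lam=np.sqrt(2 * np.log(3) / 500))
```

The two strong edges of the chain are recovered; whether spurious edges appear depends on the penalty level, which motivates the thresholding step below.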
14 Sparsity

At row $i$, define $s^i_{0,n}$ as the smallest integer such that
$$\sum_{j=1, j \neq i}^{p} \min\{\theta^2_{0,ij},\ \lambda^2 \theta_{0,ii}\} \le s^i_{0,n}\, \lambda^2 \theta_{0,ii}$$

The essential sparsity $s^i_{0,n}$ at row $i$ counts all $(i,j)$ such that $|\theta_{0,ij}| \ge \lambda \sqrt{\theta_{0,ii}} \iff |\beta^i_j| \ge \lambda \sigma_{V_i}$.

Define $S_{0,n} = \sum_{i=1}^p s^i_{0,n}$ as the essential sparsity of the graph, which counts all $(i,j)$ such that $|\theta_{0,ij}| \ge \lambda \min\big(\sqrt{\theta_{0,ii}}, \sqrt{\theta_{0,jj}}\big) \iff |\beta^i_j| \ge \lambda \sigma_{V_i}$ or $|\beta^j_i| \ge \lambda \sigma_{V_j}$.

Aim to keep $\le 2S_{0,n}$ edges in $E$.
15 Defining $2s_0$

Let $0 \le s_0 \le s$ be the smallest integer such that
$$\sum_{i=1}^{p-1} \min(\beta_i^2,\ \lambda^2 \sigma^2) \le s_0 \lambda^2 \sigma^2, \quad \text{where } \lambda = \sqrt{2\log p / n}$$

If we order the $\beta_j$'s in decreasing order of magnitude, $|\beta_1| \ge |\beta_2| \ge \dots \ge |\beta_{p-1}|$, then $|\beta_j| < \lambda\sigma$ for all $j > s_0$.

[Figure: sorted coefficient magnitudes against the threshold $\lambda\sigma$, marking $s_0$, $2s_0$, and $s$; here $p = 512$, $n = 500$, $s = 96$, $\sigma = 1$]

This notion of sparsity has been used in linear regression (Candès-Tao 07, Z09, Z10).
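The definition of $s_0$ can be computed directly; a small numpy sketch mirroring the slide's setting $p = 512$, $n = 500$, $s = 96$, $\sigma = 1$ (the helper name `essential_sparsity` and the split into strong and weak coefficients are our own illustrative choices):

```python
import numpy as np

def essential_sparsity(beta, lam, sigma):
    """Smallest s0 with sum_j min(beta_j^2, lam^2 sigma^2) <= s0 lam^2 sigma^2."""
    total = np.minimum(beta ** 2, (lam * sigma) ** 2).sum()
    return int(np.ceil(total / (lam * sigma) ** 2))

p, n, sigma = 512, 500, 1.0
lam = np.sqrt(2 * np.log(p) / n)   # roughly 0.158

# s = 96 nonzero coefficients, but only 8 are "strong" (above lam * sigma)
beta = np.zeros(p - 1)
beta[:8] = 1.0
beta[8:96] = 0.01
s0 = essential_sparsity(beta, lam, sigma)   # much smaller than s = 96
```

Only the strong coefficients (plus the aggregated mass of the weak ones) contribute, so $s_0$ is far below the nominal support size $s$.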
16 Selection: individual neighborhood

We use the Lasso in combination with thresholding (Z09, Z10) for inferring the graph. Let $\lambda = \sqrt{2\log p / n}$.

For each of the nodewise regressions, obtain an estimator $\hat\beta^i_{\mathrm{init}}$ using the Lasso with penalty parameter $\lambda_n \propto \lambda$:
$$\hat\beta^i_{\mathrm{init}} = \arg\min_{\beta^i} \sum_{r=1}^n \Big( X^{(r)}_i - \sum_{j \neq i} \beta^i_j X^{(r)}_j \Big)^2 + \lambda_n \sum_{j \neq i} |\beta^i_j|$$

Threshold $\hat\beta^i_{\mathrm{init}}$ with $\tau \propto \lambda$ to get the zero set: let $D_i = \{j : j \neq i,\ |\hat\beta^i_{j,\mathrm{init}}| < \tau\}$.
17 Selection: joining the neighborhoods

Define the total zero set as $D = \{(i,j) : i \neq j,\ j \in D_i \text{ and } i \in D_j\}$.

Select the edge set $E := \{(i,j) : i, j = 1, \dots, p,\ i \neq j,\ (i,j) \notin D\}$. That is, the edge set is the union of the (thresholded) neighborhoods across all nodes in the graph.

This reflects the idea that the essential sparsity $S_{0,n}$ of the graph counts all $(i,j)$ such that $|\theta_{0,ij}| \ge \lambda \min\big(\sqrt{\theta_{0,ii}}, \sqrt{\theta_{0,jj}}\big)$.
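A sketch of the two selection steps, thresholding each nodewise Lasso fit and then joining the neighborhoods, with an illustrative helper name and parameter values of our own choosing (not the authors' code):

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_edges(X, lam_n, tau):
    """Nodewise Lasso, threshold at tau, then join the neighborhoods."""
    n, p = X.shape
    keep = np.zeros((p, p), dtype=bool)   # entries below tau form the zero set D_i
    for i in range(p):
        rest = [j for j in range(p) if j != i]
        beta = Lasso(alpha=lam_n).fit(X[:, rest], X[:, i]).coef_
        keep[i, rest] = np.abs(beta) >= tau
    # (i, j) is a non-edge only if BOTH regressions threshold it out
    return keep | keep.T

rng = np.random.default_rng(3)
p, n = 8, 400
Theta0 = 2.0 * np.eye(p)
Theta0[0, 1] = Theta0[1, 0] = -0.9       # a single true edge
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta0), size=n)

lam = np.sqrt(2 * np.log(p) / n)
E_hat = select_edges(X, lam_n=lam, tau=0.5 * lam)
```

Thresholding removes small spurious coefficients that survive the Lasso, while the strong edge is clearly above $\tau$.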
18 Example: a star graph

Construct $\Sigma_0$ from a model used in Ravikumar et al. 08:
$$\Sigma_0 = \begin{pmatrix}
1 & \rho & \rho & \rho & \cdots & 0 \\
\rho & 1 & \rho^2 & \rho^2 & & \vdots \\
\rho & \rho^2 & 1 & \rho^2 & & \\
\rho & \rho^2 & \rho^2 & 1 & & \\
\vdots & & & & \ddots & \\
0 & \cdots & & & & 1
\end{pmatrix}_{p \times p}$$
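A sketch constructing this $\Sigma_0$ with numpy (the helper name `star_covariance` is ours; node 1 is the hub with $s$ spokes, and the remaining nodes are isolated):

```python
import numpy as np

def star_covariance(p, s, rho):
    """Sigma_0 for a star graph: node 1 is the hub, nodes 2..s+1 its spokes."""
    Sigma = np.eye(p)
    Sigma[0, 1:s + 1] = Sigma[1:s + 1, 0] = rho    # hub-spoke covariance rho
    spokes = np.full((s, s), rho ** 2)             # spoke-spoke covariance rho^2
    np.fill_diagonal(spokes, 1.0)
    Sigma[1:s + 1, 1:s + 1] = spokes
    return Sigma                                   # remaining nodes isolated

Sigma0 = star_covariance(p=8, s=4, rho=0.5)
Theta0 = np.linalg.inv(Sigma0)
```

By construction each spoke is $\rho$ times the hub plus independent noise, so $\Sigma_0$ is positive definite and $\Theta_0$ has the star sparsity pattern: spoke-spoke entries of the concentration matrix vanish.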
19 Example: original graph

[Figure: the true graph; $p = 128$, $n = 96$, $s = 8$, $\rho = 0.5$; $\lambda_n = 2\sqrt{2\log p / n}$, $\tau = \sqrt{\log p / n}$]
20 Example: estimated graph with n = 96

[Figure: estimated graph; $\lambda_n = 2\sqrt{2\log p / n}$]
21-27 Example: estimated graphs

[Figures: a sequence of estimated graphs, each with $\lambda_n = 2\sqrt{2\log p / n}$]
28 Example: estimated graph

[Figure: estimated graph; $\tau = \sqrt{\log p / n}$]
29 Gelato: estimation of edge weights

Given a graph with edge set $E$, we estimate the concentration matrix by maximum likelihood. Denote the sample correlation matrix by
$$\hat\Gamma_n = \mathrm{diag}(\hat S_n)^{-1/2}\, \hat S_n\, \mathrm{diag}(\hat S_n)^{-1/2}$$

The estimator for the concentration matrix $\Theta_0$ is
$$\hat\Theta_n(E) = \arg\min_{\Theta \in M_{p,E}} \Big( \mathrm{tr}(\Theta \hat\Gamma_n) - \log|\Theta| \Big),$$
where $M_{p,E} = \{\Theta \in \mathbb{R}^{p \times p};\ \Theta \succ 0 \text{ and } \theta_{ij} = 0 \text{ for all } (i,j) \in D\}$ and $D := \{(i,j) : i, j = 1, \dots, p,\ (i,j) \notin E \text{ and } i \neq j\}$.
30 Likelihood equations

Let $\mathrm{diag}(\hat S_n)^{1/2} = \mathrm{diag}(\hat\sigma_1, \dots, \hat\sigma_p)$. The following relationships hold for the maximum likelihood estimate $\hat\Theta_n$ and $\hat\Sigma_n = (\hat\Theta_n)^{-1}$:
$$\hat\Sigma_{n,ii} = 1, \quad i = 1, \dots, p$$
$$\hat\Sigma_{n,ij} = \hat\Gamma_{n,ij} = \hat S_{n,ij} / (\hat\sigma_i \hat\sigma_j), \quad (i,j) \in E$$
$$\hat\Theta_{n,ij} = 0, \quad (i,j) \in D$$

This is also known as the positive definite matrix completion problem.
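These likelihood equations can be verified on a toy problem by solving the edge-constrained maximum likelihood problem directly; the sketch below parametrizes the free entries of $\Theta$ and minimizes $\mathrm{tr}(\Theta\hat\Gamma_n) - \log|\Theta|$ with a generic scipy solver (our own illustrative implementation, not the authors' algorithm):

```python
import numpy as np
from scipy.optimize import minimize

def constrained_mle(Gamma, edges):
    """Minimize tr(Theta Gamma) - log|Theta| over Theta with zeros off E."""
    p = Gamma.shape[0]
    free = [(i, i) for i in range(p)] + list(edges)  # free entries of Theta

    def unpack(x):
        T = np.zeros((p, p))
        for v, (i, j) in zip(x, free):
            T[i, j] = T[j, i] = v
        return T

    def fun(x):
        T = unpack(x)
        sign, logdet = np.linalg.slogdet(T)
        if sign <= 0:                                # outside the PD cone
            return np.inf, np.zeros_like(x)
        S = np.linalg.inv(T)
        # d/dtheta_ij [tr(T Gamma) - log|T|] = Gamma_ij - S_ij (doubled off-diagonal)
        g = np.array([(2.0 - (i == j)) * (Gamma[i, j] - S[i, j])
                      for (i, j) in free])
        return np.trace(T @ Gamma) - logdet, g

    x0 = np.r_[np.ones(p), np.zeros(len(edges))]     # start at the identity
    return unpack(minimize(fun, x0, jac=True, method="BFGS").x)

# Chain graph 0 - 1 - 2; (0, 2) is a non-edge
Gamma = np.array([[1.0, 0.5, 0.1],
                  [0.5, 1.0, 0.4],
                  [0.1, 0.4, 1.0]])
Theta_hat = constrained_mle(Gamma, edges=[(0, 1), (1, 2)])
Sigma_hat = np.linalg.inv(Theta_hat)
```

At the optimum, $\hat\Sigma_n$ matches $\hat\Gamma_n$ on the diagonal and on $E$, while the non-edge entry $\hat\Sigma_{n,02}$ is filled in by positive definite matrix completion ($0.5 \cdot 0.4 = 0.2$ for the chain, not $\hat\Gamma_{n,02} = 0.1$).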
31 Set of assumptions

Let $c$, $C$ be absolute constants.

(A0) The size of the neighborhood for each node $i \in V$ is bounded by an integer $s < p$, and the sample size satisfies $n \ge Cs \log(Cp/s)$.

(A1) The dimension and the number of sufficiently strong edges $S_{0,n}$ satisfy: $p = o(e^{cn})$ for some $0 < c < 1$ and $S_{0,n} = o(n / \log\max(n,p))$ as $n \to \infty$.

(A2) The minimal and maximal eigenvalues of $\Sigma_0$ are bounded away from 0 and $\infty$, and $\Sigma_{0,ii} = 1$ for all $i$.
32 The main theorem: selection

Assume that (A0) and (A2) hold. Let $\lambda = \sqrt{2\log p / n}$. Let $d$, $C$, $D$ depend on sparse and restricted eigenvalues of $\Sigma_0$, and let $\lambda_n = d\lambda$ and $\tau = D\lambda_n$ be appropriately chosen. Denote the estimated edge set by $E = \hat E_n(\lambda_n, \tau)$.

Then with high probability, $|E| \le 2S_{0,n}$, where $|E \setminus E_0| \le S_{0,n}$, and
$$\|\Theta_{0,D}\|_F \le C \lambda_n \min\Big\{ \sqrt{S_{0,n} \big(\max_{i=1,\dots,p} \theta^2_{0,ii}\big)},\ \sqrt{s_0}\, \|\mathrm{diag}(\Theta_0)\|_F \Big\},$$
where $s_0 = \max_i s^i_{0,n}$ denotes the maximum essential node degree.
33 Example: $p = 128$, $s = 12$, $\rho = 0.5$

$\lambda_n = 2\sqrt{2\log p / n}$, $\tau = f\sqrt{2\log p / n}$

[Figure: false positive rate (FPR) and false negative rate (FNR) against $n$, for the Lasso and for thresholding at $f = 0.30$ and $f = 0.35$]
34 The main theorem: estimation

Assume that, in addition, (A1) holds. Then for $\hat\Theta_n$ and $\hat\Sigma_n = (\hat\Theta_n)^{-1}$,
$$\|\hat\Theta_n - \Theta_0\|_F = O_P\Big( \sqrt{S_{0,n}\log\max(n,p)/n} \Big)$$
$$\|\hat\Sigma_n - \Sigma_0\|_F = O_P\Big( \sqrt{S_{0,n}\log\max(n,p)/n} \Big)$$
$$R(\hat\Theta_n) - R(\Theta_0) = O_P\big( S_{0,n}\log\max(n,p)/n \big)$$

So $\|\hat\Theta_n - \Theta_0\|_2,\ \|\hat\Sigma_n - \Sigma_0\|_2 = O_P\Big( \sqrt{S_{0,n}\log\max(n,p)/n} \Big)$.
35 Obtaining an edge set E

Let $S^i = \{j : j \neq i,\ \beta^i_j \neq 0\}$ and $s^i = |S^i|$. Let $D$, $\lambda_n$, $C$ be the same as in the main theorem.

For each of the nodewise regressions, we apply the same thresholding rule to obtain a subset $I^i$ as follows:
$$I^i = \{j : j \neq i,\ |\hat\beta^i_{j,\mathrm{init}}| \ge \tau = D\lambda_n\}, \quad \text{and} \quad D_i := \{1, \dots, i-1, i+1, \dots, p\} \setminus I^i$$

Then we have with high probability, $|I^i| \le 2s^i_0$ and $|I^i \cup S^i| \le |S^i| + s^i_0$, and
$$\|\beta^i_D\|_2 \le C\lambda_n \sqrt{s^i_0} \quad \text{and} \quad \|\Theta^i_{0,D}\|_2 \le C\lambda_n \sqrt{\theta_{0,ii}}\, \sqrt{s^i_0}$$

The proof follows from results in Z10 on the thresholded Lasso estimator.
36 Oracle inequalities for the Lasso

Theorem (Z10). Under (A0) and (A2), for all nodewise regressions, the Lasso estimator achieves squared $\ell_2$ loss of $O_P(s_0 \sigma^2 \log p / n)$.

[Figure: sorted coefficient magnitudes with $s_0$, $2s_0$, and $s$ marked; $p = 512$, $n = 500$, $s = 96$, $\sigma = 1$]
37 Constructing a pivot point

Now clearly, by the OR rule, we have $E = \{(i,j) : j \in I^i,\ i = 1, \dots, p\}$ and
$$|E| \le \sum_{i=1}^p |I^i| \le \sum_{i=1}^p 2s^i_0 = 2S_0$$

Given a $2S_0$-sparse set of edges $E$, define a sparse approximation $\tilde\Theta_0$ of $\Theta_0$ which is identical to $\Theta_0$ on $E$ and the diagonal, and zero elsewhere:
$$\tilde\Theta_0 = \mathrm{diag}(\Theta_0) + \Theta_{0,E} = \mathrm{diag}(\Theta_0) + \Theta_{0,\, E \cap E_0}$$

Then $|\tilde\Theta_0|_0 = p + 2|E \cap E_0| \le p + 4S_0$, with $(s+1)$-sparse row (column) vectors.
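The pivot $\tilde\Theta_0$ is simply a masked copy of $\Theta_0$, kept on the diagonal and on $E$; a numpy sketch (the matrix and edge set are illustrative), which also exhibits the $\ell_0$ count $p + 2|E \cap E_0|$:

```python
import numpy as np

# Keep Theta_0 on the diagonal and on the selected edges E, zero elsewhere
Theta0 = np.array([[ 2.0, -0.8,  0.3],
                   [-0.8,  2.0, -0.8],
                   [ 0.3, -0.8,  2.0]])
E = {(0, 1), (1, 2)}                     # selected edges (illustrative)

mask = np.eye(3, dtype=bool)
for (i, j) in E:
    mask[i, j] = mask[j, i] = True
Theta0_tilde = np.where(mask, Theta0, 0.0)   # the pivot: bias is Theta0 off E
```

Here the dropped entry $(0, 2)$ is exactly the bias of the sparse approximation, and $|\tilde\Theta_0|_0 = p + 2|E| = 3 + 4 = 7$.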
38 $\tilde\Theta_0$ as a sparse approximation

The bias is small:
$$\|\tilde\Theta_0 - \Theta_0\|_F \le C \max_{i=1,\dots,p}(\theta_{0,ii})\, \sqrt{S_0 \log p / n}$$

For $q = 1, 2$:
$$\|\tilde\Theta_0 - \Theta_0\|_q \le C \max_{i=1,\dots,p}(\theta_{0,ii})\, s_0 \lambda_n$$

Note that each row vector would be $2s$-sparse if we applied the AND rule, however at the cost of a larger bias.
39 $\tilde\Theta_0$ as a pivot

The sparsity and the small bias allow us to bound
$$\|\hat\Theta_n(E) - \tilde\Theta_0\|_F = O_P\Big( \underbrace{\sqrt{S_{0,n}\log\max(n,p)/n}}_{r_n} \Big),$$
where we use the fact that both the estimator $\hat\Theta_n(E)$ and the pivot $\tilde\Theta_0$ are sparse.

By the triangle inequality, we conclude that
$$\|\hat\Theta_n(E) - \Theta_0\|_F \le \|\hat\Theta_n(E) - \tilde\Theta_0\|_F + \|\tilde\Theta_0 - \Theta_0\|_F = O_P\Big( \sqrt{S_{0,n}\log\max(n,p)/n} \Big)$$
40 Generalization of the estimation step

Assume that (A1) and (A2) hold. Let $\sigma^2_{\max} := \max_i \Sigma_{0,ii} < \infty$ and $\sigma^2_{\min} := \min_i \Sigma_{0,ii} > 0$. Let $W = \mathrm{diag}(\Sigma_0)^{1/2}$. Suppose that we obtain an edge set $E$ whose size $|E|$ is a linear function of $S_{0,n}$.

For $\tilde\Theta_0 = \mathrm{diag}(\Theta_0) + \Theta_{0,E}$:
$$\|\tilde\Theta_0 - \Theta_0\|_F \le C\sqrt{2 S_{0,n} \log(p)/n}$$

We note that this is equivalent to assuming
$$\|\tilde\Omega_0 - \Omega_0\|_F \le C\sqrt{2 S_{0,n} \log(p)/n},$$
where $\Omega_0 = W \Theta_0 W$ and $\tilde\Omega_0 = W \tilde\Theta_0 W$.
41 Generalization of the estimation step

Theorem. Suppose the sample size satisfies $n > M S_{0,n} \log\max(n,p)$ for a sufficiently large constant $M$. Then
$$\|\hat\Omega_n(E) - \Omega_0\|_F = O_P\Big( \sqrt{2 S_{0,n}\log\max(p,n)/n} \Big),$$
where $\hat\Omega_n(E)$ is the maximum likelihood estimator based on the sample correlation matrix $\hat\Gamma_n$:
$$\hat\Omega_n(E) = \arg\min_{\Omega \in M_{p,E}} \Big( \mathrm{tr}(\Omega \hat\Gamma_n) - \log|\Omega| \Big)$$
42 Generalization of the estimation step

Given $\hat W = \mathrm{diag}(\hat S_n)^{1/2}$ and $\hat\Omega_n(E)$, compute
$$\hat\Theta_n = \hat W^{-1} \hat\Omega_n \hat W^{-1} \quad \text{and} \quad \hat\Sigma_n = \hat W \big(\hat\Omega_n(E)\big)^{-1} \hat W,$$
for which the following hold:
$$\hat\Sigma_{n,ij} = \hat S_{n,ij} \quad \forall (i,j) \in E \cup \{(i,i) : i = 1, \dots, p\}$$
$$\hat\Theta_{n,ij} = 0 \quad \forall (i,j) \in D$$

Following the bound on $\hat\Omega_n(E)$ and arguments in Rothman et al. (2008),
$$\|\hat\Theta_n - \Theta_0\|_2 = O_P\Big( \sqrt{S_{0,n}\log\max(n,p)/n} \Big)$$
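The rescaling between correlation scale and covariance scale is a pair of diagonal conjugations; a numpy sanity check where $\hat\Omega$ is taken to be the exact inverse sample correlation (a stand-in for the edge-constrained MLE; data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 4)) * np.array([1.0, 3.0, 0.5, 2.0])  # unequal scales
S_hat = X.T @ X / X.shape[0]                  # sample covariance (mean-zero model)

w = np.sqrt(np.diag(S_hat))
W_hat, W_inv = np.diag(w), np.diag(1.0 / w)   # \hat W = diag(S_n)^{1/2}

Gamma_hat = W_inv @ S_hat @ W_inv             # sample correlation
Omega_hat = np.linalg.inv(Gamma_hat)          # stand-in for \hat Omega_n(E)

Theta_hat = W_inv @ Omega_hat @ W_inv         # back to concentration scale
Sigma_hat = W_hat @ np.linalg.inv(Omega_hat) @ W_hat
```

With the exact inverse as the stand-in, the conjugations recover $\hat S_n$ and $\hat S_n^{-1}$ exactly, confirming that the rescaling is the identity on the unconstrained problem.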
43 Generalization of the estimation error

For the Frobenius norm and the risk to converge to zero, (A1) is to be replaced by: $p \le n^c$ for some $0 < c < 1$ and $p + S_{0,n} = o(n/\log\max(n,p))$ as $n \to \infty$.

In this case, we have
$$\|\hat\Theta_n - \Theta_0\|_F = O_P\Big( \sqrt{(p+S_{0,n})\log\max(n,p)/n} \Big)$$
$$\|\hat\Sigma_n - \Sigma_0\|_F = O_P\Big( \sqrt{(p+S_{0,n})\log\max(n,p)/n} \Big)$$
$$R(\hat\Theta_n) - R(\Theta_0) = O_P\big( (p+S_{0,n})\log\max(n,p)/n \big)$$

We could achieve these rates with
$$\hat\Theta_n(E) = \arg\min_{\Theta \in M_{p,E}} \Big( \mathrm{tr}(\Theta \hat S_n) - \log|\Theta| \Big)$$
44 Conclusion

- Gelato separates the tasks of model selection and (inverse) covariance estimation.
- Thresholding plays a key role in obtaining a sparse approximation of the graph with a small bias, using a very small amount of sample.
- With stronger conditions on the sample size, convergence rates in terms of operator and Frobenius norms and KL divergence are established.
- The method is feasible in high dimensions: $p > n$ is allowed.
45 Related work on inverse/covariance estimation

- Regression-based selection/estimation: Meinshausen-Bühlmann 06, Peng et al. 09, Yuan 10, Verzelen 10, Cai-Liu-Luo 11
- Penalized likelihood methods based on the $\ell_1$-norm penalty: Yuan-Lin 07, d'Aspremont-Banerjee-El Ghaoui 08, Friedman-Hastie-Tibshirani 07, Rothman et al. 08, Zhou-Lafferty-Wasserman 08, Ravikumar et al. 08
- Nonconvex: Lam-Fan 09
- Sparse covariance selection/estimation: Bickel and Levina 06, 08; El Karoui 08; Levina and Vershynin 10; and more
46 References

- Rudelson, M. and Zhou, S. (2011). Reconstruction from anisotropic random measurements. ArXiv preprint.
- Zhou, S. (2009). Restricted eigenvalue conditions on subgaussian random matrices. ArXiv preprint (v2).
- Zhou, S. (2009). Thresholding procedures for high dimensional variable selection and statistical estimation. In Advances in Neural Information Processing Systems 22. MIT Press.
- Zhou, S. (2010). Thresholded Lasso for high dimensional variable selection and statistical estimation. ArXiv preprint.
- Zhou, S., Rütimann, P., Xu, M. and Bühlmann, P. (2011). High-dimensional covariance estimation based on Gaussian graphical models. ArXiv preprint (v2).
More informationThe picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R
The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework
More informationComposite Loss Functions and Multivariate Regression; Sparse PCA
Composite Loss Functions and Multivariate Regression; Sparse PCA G. Obozinski, B. Taskar, and M. I. Jordan (2009). Joint covariate selection and joint subspace selection for multiple classification problems.
More informationarxiv: v1 [math.st] 8 Jan 2008
arxiv:0801.1158v1 [math.st] 8 Jan 2008 Hierarchical selection of variables in sparse high-dimensional regression P. J. Bickel Department of Statistics University of California at Berkeley Y. Ritov Department
More informationGeneral principles for high-dimensional estimation: Statistics and computation
General principles for high-dimensional estimation: Statistics and computation Martin Wainwright Statistics, and EECS UC Berkeley Joint work with: Garvesh Raskutti, Sahand Negahban Pradeep Ravikumar, Bin
More informationEstimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation
Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation T. Tony Cai 1, Zhao Ren 2 and Harrison H. Zhou 3 University of Pennsylvania, University of
More informationStatistical Machine Learning for Structured and High Dimensional Data
Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More informationAn algorithm for the multivariate group lasso with covariance estimation
An algorithm for the multivariate group lasso with covariance estimation arxiv:1512.05153v1 [stat.co] 16 Dec 2015 Ines Wilms and Christophe Croux Leuven Statistics Research Centre, KU Leuven, Belgium Abstract
More informationSparse Covariance Matrix Estimation with Eigenvalue Constraints
Sparse Covariance Matrix Estimation with Eigenvalue Constraints Han Liu and Lie Wang 2 and Tuo Zhao 3 Department of Operations Research and Financial Engineering, Princeton University 2 Department of Mathematics,
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationMarkov Network Estimation From Multi-attribute Data
Mladen Kolar mladenk@cs.cmu.edu Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 117 USA Han Liu hanliu@princeton.edu Department of Operations Research and Financial Engineering,
More informationLinear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1
Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1 ( OWL ) Regularization Mário A. T. Figueiredo Instituto de Telecomunicações and Instituto Superior Técnico, Universidade de
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationHigh-dimensional variable selection via tilting
High-dimensional variable selection via tilting Haeran Cho and Piotr Fryzlewicz September 2, 2010 Abstract This paper considers variable selection in linear regression models where the number of covariates
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationSmoothly Clipped Absolute Deviation (SCAD) for Correlated Variables
Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)
More informationAdaptive estimation of the copula correlation matrix for semiparametric elliptical copulas
Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Department of Mathematics Department of Statistical Science Cornell University London, January 7, 2016 Joint work
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationAdaptive First-Order Methods for General Sparse Inverse Covariance Selection
Adaptive First-Order Methods for General Sparse Inverse Covariance Selection Zhaosong Lu December 2, 2008 Abstract In this paper, we consider estimating sparse inverse covariance of a Gaussian graphical
More informationInference for High Dimensional Robust Regression
Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:
More informationSparse estimation of high-dimensional covariance matrices
Sparse estimation of high-dimensional covariance matrices by Adam J. Rothman A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) in The
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More informationStability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models Han Liu Kathryn Roeder Larry Wasserman Carnegie Mellon University Pittsburgh, PA 15213 Abstract A challenging
More informationThe Iterated Lasso for High-Dimensional Logistic Regression
The Iterated Lasso for High-Dimensional Logistic Regression By JIAN HUANG Department of Statistics and Actuarial Science, 241 SH University of Iowa, Iowa City, Iowa 52242, U.S.A. SHUANGE MA Division of
More informationTheory and Applications of High Dimensional Covariance Matrix Estimation
1 / 44 Theory and Applications of High Dimensional Covariance Matrix Estimation Yuan Liao Princeton University Joint work with Jianqing Fan and Martina Mincheva December 14, 2011 2 / 44 Outline 1 Applications
More informationDelta Theorem in the Age of High Dimensions
Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationGeneralized Concomitant Multi-Task Lasso for sparse multimodal regression
Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias https://mathurinm.github.io INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort
More information