DISCUSSION: LATENT VARIABLE GRAPHICAL MODEL SELECTION VIA CONVEX OPTIMIZATION. By Zhao Ren and Harrison H. Zhou Yale University


Submitted to the Annals of Statistics

DISCUSSION: LATENT VARIABLE GRAPHICAL MODEL SELECTION VIA CONVEX OPTIMIZATION

By Zhao Ren and Harrison H. Zhou, Yale University

1. Introduction. We would like to congratulate the authors for their refreshing contribution to this high-dimensional latent variable graphical model selection problem. The problem of estimating covariance and concentration matrices is fundamentally important in several classical statistical methodologies and many applications. Recently, sparse concentration matrix estimation has received considerable attention, partly due to its connection to sparse structure learning for Gaussian graphical models. See, for example, Meinshausen and Bühlmann (2006) and Ravikumar et al. (2008). Cai, Liu and Zhou (2012) considered rate-optimal estimation.

The authors extended the current scope to include latent variables. They assume that the fully observed Gaussian graphical model has a naturally sparse dependence graph. However, only partial observations are available, for which the graph is usually no longer sparse. Let $X$ be a $(p + r)$-variate Gaussian vector with a sparse concentration matrix $S_{(O,H)}$. We only observe $X_O$, $p$ out of the whole $p + r$ variables, and denote its covariance matrix by $\Sigma^*_O$. In this case the concentration matrix $(\Sigma^*_O)^{-1}$ is usually not sparse. Let $S^*$ be the concentration matrix of the observed variables conditioned on the latent variables, which is a submatrix of $S_{(O,H)}$ and hence has a sparse structure, and let $L^*$ be the summary of the marginalization over the latent variables; its rank corresponds to the number of latent variables $r$, which we usually assume to be small. The authors observed that $(\Sigma^*_O)^{-1}$ can be decomposed as the difference of the sparse matrix $S^*$ and the rank-$r$ matrix $L^*$, i.e., $(\Sigma^*_O)^{-1} = S^* - L^*$. Following traditional wisdom, the authors naturally proposed a regularized maximum likelihood approach to estimate both the sparse structure $S^*$ and the low-rank part $L^*$,
\[
\min_{(S,L):\,S-L\succ 0,\,L\succeq 0} \operatorname{tr}\bigl((S-L)\hat\Sigma_O\bigr) - \log\det(S-L) + \chi\bigl(\gamma\|S\|_1 + \operatorname{tr}(L)\bigr),
\]
where $\hat\Sigma_O$ is the sample covariance matrix, $\|S\|_1 = \sum_{i,j}|s_{ij}|$, and $\gamma$ and $\chi$ are regularization tuning parameters.
Here $\operatorname{tr}(L)$ is the trace of $L$.

The research was supported in part by NSF Career Award DMS-0645676 and NSF FRG Grant DMS-0854975.
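The decomposition $(\Sigma^*_O)^{-1} = S^* - L^*$ driving this program is the Schur-complement identity for the observed block of the full concentration matrix. The following minimal numpy sketch (our own illustration; the random full concentration matrix $K$ is an assumption, not from the paper) verifies the identity and the rank of the marginalization term:

```python
# Numerical check of the identity (Sigma_O)^{-1} = S* - L*, where S* is the
# observed block of the full concentration matrix K and L* comes from
# marginalizing out the r latent variables (Schur complement).
import numpy as np

rng = np.random.default_rng(0)
po, r = 5, 2                                   # observed and latent dimensions
A = rng.standard_normal((po + r, po + r))
K = A @ A.T + (po + r) * np.eye(po + r)        # full concentration matrix, positive definite
Sigma = np.linalg.inv(K)                       # full covariance of (X_O, X_H)

Sigma_O = Sigma[:po, :po]                      # covariance of the observed block
S_star = K[:po, :po]                           # sparse part: conditional concentration matrix
L_star = K[:po, po:] @ np.linalg.inv(K[po:, po:]) @ K[po:, :po]  # low-rank part

# Schur complement: (Sigma_O)^{-1} = S* - L*, and rank(L*) equals r
assert np.allclose(np.linalg.inv(Sigma_O), S_star - L_star)
assert np.linalg.matrix_rank(L_star) == r
```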

The notation $A \succ 0$ means that $A$ is positive definite, and $A \succeq 0$ denotes that $A$ is positive semidefinite. There is an obvious identifiability problem if we want to estimate both the sparse and low-rank components: a matrix can be both sparse and low-rank. By exploring the geometric properties of the tangent spaces for sparse and low-rank components, the authors gave a beautiful sufficient condition for identifiability, and then provided very involved theoretical justifications based on that sufficient condition, which is beyond our ability to digest in a short period of time, in the sense that we do not fully understand why those technical assumptions were needed in the analysis of their approach. Thus we decided to look at a relatively simple but potentially practical model, with the hope of still capturing the essence of the problem, and to see how well their regularized procedure works.

Let $\|\cdot\|_{L_1}$ denote the matrix $\ell_1$ norm, i.e., $\|S\|_{L_1} = \max_{1\le i\le p}\sum_{j=1}^p |s_{ij}|$. We assume that $S^*$ is in the following uniformity class,
\[
(1)\qquad \mathcal{U}(s_0(p), M_p) = \Bigl\{ S = (s_{ij}) : S \succ 0,\ \|S\|_{L_1} \le M_p,\ \max_{1\le i\le p}\sum_{j=1}^p 1\{s_{ij}\ne 0\} \le s_0(p) \Bigr\},
\]
where we allow $s_0(p)$ and $M_p$ to grow as $n$ and $p$ increase. This uniformity class was considered in Ravikumar et al. (2008) and Cai, Liu and Luo (2011). For the low-rank matrix $L^*$, we assume that the effect of marginalization over the latent variables spreads out, i.e., the row/column spaces of $L^*$ are not closely aligned with the coordinate axes, which resolves the identifiability problem. Let the eigendecomposition of $L^*$ be
\[
(2)\qquad L^* = \sum_{i=1}^{r_0(p)} \lambda_i u_i u_i^T,
\]
where $r_0(p)$ is the rank of $L^*$. We assume that there exists a universal constant $c_0$ such that $\|u_i\|_\infty \le c_0/\sqrt{p}$ for all $i$, and that $\|L^*\|_{L_1}$ is bounded by $M_p$, which can be shown to be bounded by $c_0 r_0(p)\lambda_{\max}(L^*)$. A similar incoherence assumption on the $u_i$ was used in Candès and Recht (2009). We further assume that
\[
(3)\qquad \lambda_{\max}(\Sigma^*_O) \le M \quad\text{and}\quad \lambda_{\min}(\Sigma^*_O) \ge 1/M
\]
for some universal constant $M$. As discussed in the paper, the goals in latent variable model selection are to obtain sign consistency for the sparse matrix $S^*$ as well as rank consistency for the low-rank positive semidefinite matrix $L^*$.
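A member of this model class is easy to construct numerically. The sketch below is our own illustration (the choices $p = 50$, a tridiagonal $S^*$ with $s_0(p) = 3$, and eigenvalue $0.3$ for $L^*$ are hypothetical): the QR factorization of a Gaussian matrix produces orthonormal eigenvectors whose entries are, with high probability, of order $c_0/\sqrt{p}$, giving an approximately incoherent low-rank part.

```python
# Sketch: construct a sparse S* in the uniformity class (1) and an
# approximately incoherent low-rank L* as in (2). Parameter choices are
# illustrative, not prescribed by the discussion.
import numpy as np

def make_model(p=50, r0=2, rho=0.2, lam=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Tridiagonal S*: at most 3 nonzeros per row, diagonally dominant hence S* > 0
    S = np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))
    # Orthonormal, spread-out eigenvectors via QR of a random Gaussian matrix;
    # entries are O(1/sqrt(p)) up to log factors with high probability
    U, _ = np.linalg.qr(rng.standard_normal((p, r0)))
    L = lam * U @ U.T                       # rank r0, all nonzero eigenvalues equal to lam
    return S, L

S, L = make_model()
# S - L is a valid concentration matrix (Sigma_O*)^{-1}: it must be positive definite
assert np.all(np.linalg.eigvalsh(S - L) > 0)
```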

Denote the minimum magnitude of the nonzero entries of $S^*$ by $\theta$, i.e., $\theta = \min_{i,j} |s^*_{ij}|\,1\{s^*_{ij}\ne 0\}$, and the minimum nonzero eigenvalue of $L^*$ by $\sigma$, i.e., $\sigma = \min_{1\le i\le r_0}\lambda_i$. To obtain theoretical guarantees of consistency for the model described in (1), (2) and (3), in addition to the strong irrepresentability condition, which seems difficult to check in practice, the authors require the following assumptions (by a translation of the conditions in the paper to this model) for $\theta$, $\sigma$ and $n$:

(1) $\theta \gtrsim \sqrt{p/n}$, which is needed even when $s_0(p)$ is constant;
(2) $\sigma \gtrsim s_0^3(p)\sqrt{p/n}$ under the additional strong assumptions on the Fisher information matrix $\Sigma^*_O \otimes \Sigma^*_O$ (see the footnote for Corollary 4.2);
(3) $n \gtrsim s_0^4(p)\,p$.

However, for sparse graphical model selection without latent variables, either the $\ell_1$-regularized maximum likelihood approach (see Ravikumar et al. (2008)) or CLIME (see Cai, Liu and Luo (2011)) can be shown to be sign consistent if the minimum magnitude $\theta$ of the nonzero entries of the concentration matrix is of the order $\sqrt{(\log p)/n}$ when $M_p$ is bounded, which inspires us to study rate optimality for this latent variable graphical model selection problem. In this discussion, we propose a procedure that yields an algebraically consistent estimate of the latent variable Gaussian graphical model under much weaker conditions on both $\theta$ and $\sigma$. For example, for a wide range of $s_0(p)$, we only require $\theta$ to be of the order $\sqrt{(\log p)/n}$ and $\sigma$ of the order $\sqrt{p/n}$ to consistently estimate the support of $S^*$ and the rank of $L^*$. That means the regularized maximum likelihood approach could be far from optimal, but we do not know yet whether the sub-optimality is due to the procedure or to their theoretical analysis.

2. Latent Variable Model Selection Consistency. In this section, we propose a procedure to obtain an algebraically consistent estimate of the latent variable Gaussian graphical model.
The condition on $\theta$ for recovering the support of $S^*$ is reduced to that of Cai, Liu and Luo (2011), who studied sparse graphical model selection without latent variables, and the condition on $\sigma$ is just of the order $\sqrt{p/n}$, which is smaller than the $s_0^3(p)\sqrt{p/n}$ assumed in the paper when $s_0(p) \to \infty$. When $M_p$ is bounded, our results can be shown to be rate-optimal by the lower bounds stated in Remarks 2 and 4, for which we do not give proofs due to space limitations.

2.1. Sign Consistency Procedure for $S^*$. We propose a CLIME-like estimator of $S^*$ by solving the following linear optimization problem,
\[
\min \|S\|_1 \quad\text{subject to}\quad \|\hat\Sigma_O S - I\|_\infty \le \tau,\ S \in \mathbb{R}^{p\times p},
\]

where $\hat\Sigma_O = (\hat\sigma_{ij})$ is the sample covariance matrix. The tuning parameter $\tau$ is chosen as $\tau = C_1 M_p \sqrt{(\log p)/n}$ for some large constant $C_1$. Let $\hat S^1 = (\hat s^1_{ij})$ be the solution. The CLIME-like estimator $\hat S = (\hat s_{ij})$ is obtained by symmetrizing $\hat S^1$ as follows,
\[
\hat s_{ij} = \hat s_{ji} = \hat s^1_{ij}\,1\{|\hat s^1_{ij}| \le |\hat s^1_{ji}|\} + \hat s^1_{ji}\,1\{|\hat s^1_{ij}| > |\hat s^1_{ji}|\}.
\]
In other words, between $\hat s^1_{ij}$ and $\hat s^1_{ji}$ we take the one with smaller magnitude. We define a thresholding estimator $\tilde S = (\tilde s_{ij})$ with
\[
(4)\qquad \tilde s_{ij} = \hat s_{ij}\,1\{|\hat s_{ij}| > 9 M_p \tau\}
\]
to estimate the support of $S^*$.

Theorem 1. Suppose that $S^* \in \mathcal{U}(s_0(p), M_p)$,
\[
(5)\qquad \sqrt{(\log p)/n} = o(1), \quad\text{and}\quad \|L^*\|_\infty \le M_p \tau.
\]
With probability greater than $1 - C_s p^{-6}$ for some constant $C_s$ depending on $M$ only, we have $\|\hat S - S^*\|_\infty \le 9 M_p \tau$. Hence if the minimum magnitude of the nonzero entries satisfies $\theta > 18 M_p \tau$, we obtain the sign consistency $\operatorname{sign}(\tilde S) = \operatorname{sign}(S^*)$. In particular, if $M_p$ is at the constant level, then to consistently recover the support of $S^*$ we only need $\theta \gtrsim \sqrt{(\log p)/n}$.

Proof. The proof is similar to that of Theorem 7 in Cai, Liu and Luo (2011). The sub-Gaussian condition with spectral norm upper bound $M$ implies that each empirical covariance $\hat\sigma_{ij}$ satisfies the following large deviation result,
\[
P\bigl(|\hat\sigma_{ij} - \sigma_{ij}| > t\bigr) \le C_s \exp\bigl(-(8/C_2^2)\,n t^2\bigr), \quad\text{for } |t| \le \phi,
\]
where $C_s$, $C_2$ and $\phi$ depend on $M$ only. See, for example, Bickel and Levina (2008). In particular, for $t = C_2\sqrt{(\log p)/n}$, which is less than $\phi$ by our assumption, we have
\[
(6)\qquad P\bigl(\|\hat\Sigma_O - \Sigma^*_O\|_\infty > t\bigr) \le \sum_{i,j} P\bigl(|\hat\sigma_{ij} - \sigma_{ij}| > t\bigr) \le p^2\, C_s\, p^{-8}.
\]
Let $A = \{\|\hat\Sigma_O - \Sigma^*_O\|_\infty \le C_2\sqrt{(\log p)/n}\}$.

Equation (6) implies $P(A) \ge 1 - C_s p^{-6}$. On the event $A$, we will show
\[
(7)\qquad \|(S^* - L^*) - \hat S^1\|_\infty \le 8 M_p \tau,
\]
which immediately yields
\[
\|S^* - \hat S\|_\infty \le \|(S^* - L^*) - \hat S^1\|_\infty + \|L^*\|_\infty \le 8 M_p \tau + M_p \tau = 9 M_p \tau.
\]
Now we establish Equation (7). On the event $A$, for some large constant $C_1 \ge 2C_2$, the choice of $\tau$ yields
\[
(8)\qquad 2 M_p \|\hat\Sigma_O - \Sigma^*_O\|_\infty \le \tau.
\]
By the matrix $\ell_1$ norm assumption, we obtain
\[
(9)\qquad \|(\Sigma^*_O)^{-1}\|_{L_1} \le \|S^*\|_{L_1} + \|L^*\|_{L_1} \le 2 M_p.
\]
From (8) and (9) we have
\[
\|\hat\Sigma_O (S^* - L^*) - I\|_\infty = \|(\hat\Sigma_O - \Sigma^*_O)(\Sigma^*_O)^{-1}\|_\infty \le \|\hat\Sigma_O - \Sigma^*_O\|_\infty \|(\Sigma^*_O)^{-1}\|_{L_1} \le \tau,
\]
which implies
\[
(10)\qquad \|\hat\Sigma_O (S^* - L^*) - \hat\Sigma_O \hat S^1\|_\infty \le \|\hat\Sigma_O (S^* - L^*) - I\|_\infty + \|\hat\Sigma_O \hat S^1 - I\|_\infty \le 2\tau.
\]
From the definition of $\hat S^1$ we obtain
\[
(11)\qquad \|\hat S^1\|_{L_1} \le \|S^* - L^*\|_{L_1} \le 2 M_p,
\]
which, together with Equations (8) and (10), implies
\[
\|\Sigma^*_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le \|\hat\Sigma_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty + \|(\hat\Sigma_O - \Sigma^*_O)\bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le 2\tau + 4 M_p \|\hat\Sigma_O - \Sigma^*_O\|_\infty \le 4\tau.
\]
Thus we have
\[
\|(S^* - L^*) - \hat S^1\|_\infty = \|(\Sigma^*_O)^{-1}\,\Sigma^*_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le \|(\Sigma^*_O)^{-1}\|_{L_1} \|\Sigma^*_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le 8 M_p \tau.
\]
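The procedure of Section 2.1 amounts to a column-by-column linear program followed by symmetrization and entrywise thresholding. The implementation below is our own sketch using scipy's linprog (the tuning values, and the small identity-matrix demo at the end, are illustrative assumptions):

```python
# Sketch of the CLIME-like step: for each column j solve
#   min ||beta||_1  subject to  ||Sigma_hat beta - e_j||_inf <= tau,
# then symmetrize by keeping the smaller-magnitude entry, and threshold.
import numpy as np
from scipy.optimize import linprog

def clime_like(Sigma_hat, tau):
    p = Sigma_hat.shape[0]
    S1 = np.zeros((p, p))
    # Write beta = u - v with u, v >= 0; the sup-norm constraint becomes
    # two stacks of linear inequalities on (u, v).
    A = np.vstack([Sigma_hat, -Sigma_hat])
    A_ub = np.hstack([A, -A])
    c = np.ones(2 * p)                      # objective: sum(u) + sum(v) = ||beta||_1
    for j in range(p):
        e = np.zeros(p); e[j] = 1.0
        b_ub = np.concatenate([e + tau, -e + tau])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        S1[:, j] = res.x[:p] - res.x[p:]
    # Symmetrization: between entries (i,j) and (j,i), keep the smaller magnitude
    return np.where(np.abs(S1) <= np.abs(S1.T), S1, S1.T)

def threshold(S_hat, level):
    # Thresholding step (4): kill entries with magnitude at most `level` (= 9 M_p tau)
    return S_hat * (np.abs(S_hat) > level)

# Demo with Sigma_hat = I and tau = 0.1: each diagonal entry shrinks to 0.9,
# off-diagonal entries are driven exactly to zero.
S_demo = clime_like(np.eye(3), 0.1)
```

With an identity input the solution is available in closed form (each coordinate is pulled toward zero until the constraint $|\beta_j - 1| \le \tau$ binds), which makes the sketch easy to sanity-check.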

Remark 1. By the choice of our $\tau$ and the eigendecomposition of $L^*$, the condition $\|L^*\|_\infty \le M_p\tau$ holds when $r_0(p)\,c_0^2/p \le C_1 M_p^2 \sqrt{(\log p)/n}$, i.e., $n \lesssim p^2 (\log p)\, M_p^4 / r_0^2(p)$. If $M_p$ is slowly increasing (for instance $M_p \le p^{1/4 - \tau}$ for any small $\tau > 0$), the minimum requirement $\theta \gtrsim M_p^2\sqrt{(\log p)/n}$ is weaker than the $\theta \gtrsim \sqrt{p/n}$ required in Corollary 4.2. Furthermore, it can be shown that the optimal rate of the minimum magnitude of nonzero entries for sign consistency is $\theta \asymp M_p\sqrt{(\log p)/n}$, as in Cai, Liu and Zhou (2012).

Remark 2. Cai, Liu and Zhou (2012) showed that the minimum requirement $\theta \gtrsim M_p\sqrt{(\log p)/n}$ is necessary for sign consistency for sparse concentration matrices. Let $\mathcal{U}_S(c)$ denote the class of concentration matrices defined in (1) and (2), satisfying assumption (5) and $\theta > c M_p\sqrt{(\log p)/n}$. We can show that there exists some constant $c_1 > 0$ such that for all $0 < c < c_1$,
\[
\liminf_{n\to\infty} \inf_{(\hat S, \hat L)} \sup_{\mathcal{U}_S(c)} P\bigl(\operatorname{sign}(\hat S) \ne \operatorname{sign}(S^*)\bigr) > 0,
\]
similarly to Cai, Liu and Zhou (2012).

2.2. Rank Consistency Procedure for $L^*$. In this section we propose a procedure to estimate $L^*$ and its rank. We note that with high probability $\hat\Sigma_O$ is invertible; we then define $\hat L = \tilde S - (\hat\Sigma_O)^{-1}$, where $\tilde S$ is defined in (4). Denote the eigendecomposition of $\hat L$ by $\hat L = \sum_{i=1}^p \lambda_i(\hat L)\,\upsilon_i \upsilon_i^T$, and let $\lambda_i(\tilde L) = \lambda_i(\hat L)\,1\{\lambda_i(\hat L) > C_3\sqrt{p/n}\}$, where the constant $C_3$ will be specified later. Define $\tilde L = \sum_{i=1}^p \lambda_i(\tilde L)\,\upsilon_i \upsilon_i^T$. The following theorem shows that $\tilde L$ is a consistent estimator of $L^*$ under the spectral norm, and that with high probability $\operatorname{rank}(L^*) = \operatorname{rank}(\tilde L)$.

Theorem 2. Under the conditions of Theorem 1, assume further that
\[
(12)\qquad \sqrt{p/n} \le \frac{1}{16\sqrt{2}\,M^2}, \quad\text{and}\quad M_p^2\, s_0(p) \le \sqrt{p/\log p}.
\]
Then there exists some constant $C_3$ such that $\|\hat L - L^*\| \le C_3\sqrt{p/n}$ with probability greater than $1 - 2e^{-p} - C_s p^{-6}$. Hence if $\sigma > 2C_3\sqrt{p/n}$, we have $\operatorname{rank}(L^*) = \operatorname{rank}(\tilde L)$ with high probability.

Proof. From Corollary 5.5 of the paper and our assumption on the sample size, we have
\[
P\Bigl(\|\hat\Sigma_O - \Sigma^*_O\| \ge M\sqrt{128\,p/n}\Bigr) \le 2\exp(-p).
\]
Note that $\lambda_{\min}(\Sigma^*_O) \ge 1/M$, and $M\sqrt{128\,p/n} \le 1/(2M)$ under assumption (12), so $\lambda_{\min}(\hat\Sigma_O) \ge 1/(2M)$ with high probability, which yields the same rate of convergence for the concentration matrix, since
\[
(13)\qquad \|(\hat\Sigma_O)^{-1} - (\Sigma^*_O)^{-1}\| \le \|(\hat\Sigma_O)^{-1}\|\,\|(\Sigma^*_O)^{-1}\|\,\|\hat\Sigma_O - \Sigma^*_O\| \le 2M\cdot M\cdot M\sqrt{128\,p/n} = 16\sqrt{2}\,M^3\sqrt{p/n}.
\]
From Theorem 1 we know that $\operatorname{sign}(\tilde S) = \operatorname{sign}(S^*)$ and $\|\tilde S - S^*\|_\infty \le 9 M_p \tau$ with probability greater than $1 - C_s p^{-6}$. Since $\|B\| \le \|B\|_{L_1}$ for any symmetric matrix $B$, we then have
\[
(14)\qquad \|\tilde S - S^*\| \le \|\tilde S - S^*\|_{L_1} \le s_0(p)\cdot 9 M_p \tau = 9 C_1 M_p^2\, s_0(p) \sqrt{(\log p)/n}.
\]
Equations (13) and (14), together with the assumption $M_p^2 s_0(p) \le \sqrt{p/\log p}$, imply
\[
\|\hat L - L^*\| \le \|(\hat\Sigma_O)^{-1} - (\Sigma^*_O)^{-1}\| + \|\tilde S - S^*\| \le 16\sqrt{2}\,M^3\sqrt{p/n} + 9 C_1 M_p^2\, s_0(p)\sqrt{(\log p)/n} \le C_3\sqrt{p/n}
\]
with probability greater than $1 - 2e^{-p} - C_s p^{-6}$.

Remark 3. We should emphasize that in order to consistently estimate the rank of $L^*$ we need only $\sigma > 2C_3\sqrt{p/n}$, which is smaller than the $s_0^3(p)\sqrt{p/n}$ required in the paper (see the footnote for Corollary 4.2), as long as $M_p^2 s_0(p) \le \sqrt{p/\log p}$. In particular, we do not explicitly constrain the rank $r_0(p)$. One special case is that in which $M_p$ is constant and $s_0(p) \asymp p^{1/2-\tau}$ for some small $\tau > 0$, for which our requirement on $\sigma$ is of the order $\sqrt{p/n}$, but the assumption in the paper is of the order $p^{3(1/2-\tau)}\sqrt{p/n}$.
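The rank-consistency procedure of Section 2.2 is an eigenvalue hard-thresholding of $\hat L = \tilde S - (\hat\Sigma_O)^{-1}$. The numpy sketch below is our own illustration (the choice $C_3 = 1.0$ and the noiseless demo where the population covariance and the true $S^*$ are plugged in are hypothetical):

```python
# Sketch of the rank-consistency procedure: form L_hat = S_tilde - inv(Sigma_hat),
# then keep only eigenvalues above C3 * sqrt(p/n).
import numpy as np

def estimate_low_rank(Sigma_hat, S_tilde, n, C3=1.0):
    p = Sigma_hat.shape[0]
    L_hat = S_tilde - np.linalg.inv(Sigma_hat)
    w, V = np.linalg.eigh(L_hat)                 # eigenvalues ascending
    w_kept = w * (w > C3 * np.sqrt(p / n))       # hard-threshold the spectrum
    L_tilde = (V * w_kept) @ V.T                 # rebuild the truncated matrix
    rank_hat = int(np.sum(w_kept > 0))
    return L_tilde, rank_hat

# Noiseless demo: with the population Sigma and the true S*, the procedure
# recovers L* and its rank exactly.
rng = np.random.default_rng(1)
p, n = 5, 10000
U, _ = np.linalg.qr(rng.standard_normal((p, 2)))
L_true = 0.3 * U @ U.T                           # rank-2 L* with sigma = 0.3
S_true = 2.0 * np.eye(p)
Sigma_true = np.linalg.inv(S_true - L_true)
L_tilde, rank_hat = estimate_low_rank(Sigma_true, S_true, n)
```

In this noiseless setting the threshold $C_3\sqrt{p/n} \approx 0.022$ sits well below $\sigma = 0.3$, so the two nonzero eigenvalues survive and the three near-zero ones are removed, mirroring the condition $\sigma > 2C_3\sqrt{p/n}$ of Theorem 2.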

Remark 4. Let $\mathcal{U}_L(c)$ denote the class of concentration matrices defined in (1), (2) and (3), satisfying assumptions (12) and (5) and $\sigma > c\sqrt{p/n}$. We can show that there exists some constant $c_2 > 0$ such that for all $0 < c < c_2$,
\[
\liminf_{n\to\infty} \inf_{(\hat S, \hat L)} \sup_{\mathcal{U}_L(c)} P\bigl(\operatorname{rank}(\hat L) \ne \operatorname{rank}(L^*)\bigr) > 0.
\]
The proof of this lower bound is based on a modification of a lower bound argument in a personal communication of T. Tony Cai (2011).

3. Concluding Remarks and Further Questions. In this discussion we attempt to understand the optimality of the results in the present paper by studying a relatively simple model. Our preliminary analysis seems to indicate that the results in the paper are sub-optimal. In particular, we tend to conclude that the assumptions on $\theta$ and $\sigma$ in the paper can potentially be weakened very much. However, it is not clear to us whether the sub-optimality is due to the methodology or just to its theoretical analysis. We want to emphasize that the preliminary results in this discussion can be strengthened, but for simplicity of the discussion we choose to present weaker but simpler results, hopefully shedding some light on understanding optimality in estimation.

REFERENCES

[1] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199-227.
[2] Cai, T. T., Liu, W. and Luo, X. (2011). A constrained l1 minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594-607.
[3] Cai, T. T., Liu, W. and Zhou, H. H. (2012). Optimal estimation of large sparse precision matrices. Manuscript.
[4] Cai, T. T. (2011). Personal communication.
[5] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772.
[6] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462.
[7] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2008). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Preprint.
Department of Statistics, Yale University, New Haven, CT 06511, USA
E-mail: zhao.ren@yale.edu
E-mail: huibin.zhou@yale.edu