DISCUSSION: LATENT VARIABLE GRAPHICAL MODEL SELECTION VIA CONVEX OPTIMIZATION. By Zhao Ren and Harrison H. Zhou Yale University


Submitted to the Annals of Statistics

DISCUSSION: LATENT VARIABLE GRAPHICAL MODEL SELECTION VIA CONVEX OPTIMIZATION

By Zhao Ren and Harrison H. Zhou, Yale University

1. Introduction. We would like to congratulate the authors for their refreshing contribution to this high-dimensional latent variable graphical model selection problem. The problem of estimating covariance and concentration matrices is fundamentally important in several classical statistical methodologies and many applications. Recently, sparse concentration matrix estimation has received considerable attention, partly due to its connection to sparse structure learning for Gaussian graphical models. See, for example, Meinshausen and Bühlmann (2006) and Ravikumar et al. (2008). Cai, Liu and Zhou (2012) considered rate-optimal estimation.

The authors extended the current scope to include latent variables. They assume that the fully observed Gaussian graphical model has a naturally sparse dependence graph. However, only partial observations are available, for which the graph is usually no longer sparse. Let $X$ be a $(p + r)$-variate Gaussian vector with a sparse concentration matrix $S_{(O,H)}$. We only observe $X_O$, $p$ out of the whole $p + r$ variables, and denote its covariance matrix by $\Sigma^*_O$. In this case the concentration matrix $(\Sigma^*_O)^{-1}$ is usually not sparse. Let $S^*$ be the concentration matrix of the observed variables conditioned on the latent variables, which is a submatrix of $S_{(O,H)}$ and hence has a sparse structure, and let $L^*$ be the summary of the marginalization over the latent variables; its rank corresponds to the number of latent variables $r$, which we usually assume to be small. The authors observed that $(\Sigma^*_O)^{-1}$ can be decomposed as the difference of the sparse matrix $S^*$ and the rank-$r$ matrix $L^*$, i.e., $(\Sigma^*_O)^{-1} = S^* - L^*$. Following traditional wisdom, the authors naturally proposed a regularized maximum likelihood approach to estimate both the sparse structure $S^*$ and the low-rank part $L^*$,
\[
\min_{(S,L):\,S-L\succ 0,\,L\succeq 0} \operatorname{tr}\bigl((S-L)\hat\Sigma_O\bigr) - \log\det(S-L) + \chi\bigl(\gamma\|S\|_1 + \operatorname{tr}(L)\bigr),
\]
where $\hat\Sigma_O$ is the sample covariance matrix, $\|S\|_1 = \sum_{i,j}|s_{ij}|$, and $\gamma$ and $\chi$ are regularization tuning parameters.
Here $\operatorname{tr}(L)$ is the trace of $L$.

The research was supported in part by NSF Career Award DMS-0645676 and NSF FRG Grant DMS-0854975.
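The decomposition $(\Sigma^*_O)^{-1} = S^* - L^*$ driving this program is the Schur-complement identity for the observed block of the full concentration matrix. The following minimal numpy sketch (our own illustration; the random full concentration matrix $K$ is an assumption, not from the paper) verifies the identity and the rank of the marginalization term:

```python
# Numerical check of the identity (Sigma_O)^{-1} = S* - L*, where S* is the
# observed block of the full concentration matrix K and L* comes from
# marginalizing out the r latent variables (Schur complement).
import numpy as np

rng = np.random.default_rng(0)
po, r = 5, 2                                   # observed and latent dimensions
A = rng.standard_normal((po + r, po + r))
K = A @ A.T + (po + r) * np.eye(po + r)        # full concentration matrix, positive definite
Sigma = np.linalg.inv(K)                       # full covariance of (X_O, X_H)

Sigma_O = Sigma[:po, :po]                      # covariance of the observed block
S_star = K[:po, :po]                           # sparse part: conditional concentration matrix
L_star = K[:po, po:] @ np.linalg.inv(K[po:, po:]) @ K[po:, :po]  # low-rank part

# Schur complement: (Sigma_O)^{-1} = S* - L*, and rank(L*) equals r
assert np.allclose(np.linalg.inv(Sigma_O), S_star - L_star)
assert np.linalg.matrix_rank(L_star) == r
```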

The notation $A \succ 0$ means that $A$ is positive definite, and $A \succeq 0$ denotes that $A$ is positive semidefinite. There is an obvious identifiability problem if we want to estimate both the sparse and low-rank components: a matrix can be both sparse and low-rank. By exploring the geometric properties of the tangent spaces for sparse and low-rank components, the authors gave a beautiful sufficient condition for identifiability, and then provided very involved theoretical justifications based on that sufficient condition, which is beyond our ability to digest in a short period of time, in the sense that we do not fully understand why those technical assumptions were needed in the analysis of their approach. Thus we decided to look at a relatively simple but potentially practical model, with the hope of still capturing the essence of the problem, and to see how well their regularized procedure works.

Let $\|\cdot\|_{L_1}$ denote the matrix $\ell_1$ norm, i.e., $\|S\|_{L_1} = \max_{1\le i\le p}\sum_{j=1}^p |s_{ij}|$. We assume that $S^*$ is in the following uniformity class,
\[
(1)\qquad \mathcal{U}(s_0(p), M_p) = \Bigl\{ S = (s_{ij}) : S \succ 0,\ \|S\|_{L_1} \le M_p,\ \max_{1\le i\le p}\sum_{j=1}^p 1\{s_{ij}\ne 0\} \le s_0(p) \Bigr\},
\]
where we allow $s_0(p)$ and $M_p$ to grow as $n$ and $p$ increase. This uniformity class was considered in Ravikumar et al. (2008) and Cai, Liu and Luo (2011). For the low-rank matrix $L^*$, we assume that the effect of marginalization over the latent variables spreads out, i.e., the row/column spaces of $L^*$ are not closely aligned with the coordinate axes, which resolves the identifiability problem. Let the eigendecomposition of $L^*$ be
\[
(2)\qquad L^* = \sum_{i=1}^{r_0(p)} \lambda_i u_i u_i^T,
\]
where $r_0(p)$ is the rank of $L^*$. We assume that there exists a universal constant $c_0$ such that $\|u_i\|_\infty \le c_0/\sqrt{p}$ for all $i$, and that $\|L^*\|_{L_1}$ is bounded by $M_p$, which can be shown to be bounded by $c_0 r_0(p)\lambda_{\max}(L^*)$. A similar incoherence assumption on the $u_i$ was used in Candès and Recht (2009). We further assume that
\[
(3)\qquad \lambda_{\max}(\Sigma^*_O) \le M \quad\text{and}\quad \lambda_{\min}(\Sigma^*_O) \ge 1/M
\]
for some universal constant $M$. As discussed in the paper, the goals in latent variable model selection are to obtain sign consistency for the sparse matrix $S^*$ as well as rank consistency for the low-rank positive semidefinite matrix $L^*$.
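A member of this model class is easy to construct numerically. The sketch below is our own illustration (the choices $p = 50$, a tridiagonal $S^*$ with $s_0(p) = 3$, and eigenvalue $0.3$ for $L^*$ are hypothetical): the QR factorization of a Gaussian matrix produces orthonormal eigenvectors whose entries are, with high probability, of order $c_0/\sqrt{p}$, giving an approximately incoherent low-rank part.

```python
# Sketch: construct a sparse S* in the uniformity class (1) and an
# approximately incoherent low-rank L* as in (2). Parameter choices are
# illustrative, not prescribed by the discussion.
import numpy as np

def make_model(p=50, r0=2, rho=0.2, lam=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # Tridiagonal S*: at most 3 nonzeros per row, diagonally dominant hence S* > 0
    S = np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))
    # Orthonormal, spread-out eigenvectors via QR of a random Gaussian matrix;
    # entries are O(1/sqrt(p)) up to log factors with high probability
    U, _ = np.linalg.qr(rng.standard_normal((p, r0)))
    L = lam * U @ U.T                       # rank r0, all nonzero eigenvalues equal to lam
    return S, L

S, L = make_model()
# S - L is a valid concentration matrix (Sigma_O*)^{-1}: it must be positive definite
assert np.all(np.linalg.eigvalsh(S - L) > 0)
```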

Denote the minimum magnitude of the nonzero entries of $S^*$ by $\theta$, i.e., $\theta = \min_{i,j} |s^*_{ij}|\,1\{s^*_{ij}\ne 0\}$, and the minimum nonzero eigenvalue of $L^*$ by $\sigma$, i.e., $\sigma = \min_{1\le i\le r_0}\lambda_i$. To obtain theoretical guarantees of consistency for the model described in (1), (2) and (3), in addition to the strong irrepresentability condition, which seems difficult to check in practice, the authors require the following assumptions (by a translation of the conditions in the paper to this model) for $\theta$, $\sigma$ and $n$:

(1) $\theta \gtrsim \sqrt{p/n}$, which is needed even when $s_0(p)$ is constant;
(2) $\sigma \gtrsim s_0^3(p)\sqrt{p/n}$ under the additional strong assumptions on the Fisher information matrix $\Sigma^*_O \otimes \Sigma^*_O$ (see the footnote for Corollary 4.2);
(3) $n \gtrsim s_0^4(p)\,p$.

However, for sparse graphical model selection without latent variables, either the $\ell_1$-regularized maximum likelihood approach (see Ravikumar et al. (2008)) or CLIME (see Cai, Liu and Luo (2011)) can be shown to be sign consistent if the minimum magnitude $\theta$ of the nonzero entries of the concentration matrix is of the order $\sqrt{(\log p)/n}$ when $M_p$ is bounded, which inspires us to study rate optimality for this latent variable graphical model selection problem. In this discussion, we propose a procedure that yields an algebraically consistent estimate of the latent variable Gaussian graphical model under much weaker conditions on both $\theta$ and $\sigma$. For example, for a wide range of $s_0(p)$, we only require $\theta$ to be of the order $\sqrt{(\log p)/n}$ and $\sigma$ of the order $\sqrt{p/n}$ to consistently estimate the support of $S^*$ and the rank of $L^*$. That means the regularized maximum likelihood approach could be far from optimal, but we do not know yet whether the sub-optimality is due to the procedure or to their theoretical analysis.

2. Latent Variable Model Selection Consistency. In this section, we propose a procedure to obtain an algebraically consistent estimate of the latent variable Gaussian graphical model.
The condition on $\theta$ for recovering the support of $S^*$ is reduced to that of Cai, Liu and Luo (2011), who studied sparse graphical model selection without latent variables, and the condition on $\sigma$ is just of the order $\sqrt{p/n}$, which is smaller than the $s_0^3(p)\sqrt{p/n}$ assumed in the paper when $s_0(p) \to \infty$. When $M_p$ is bounded, our results can be shown to be rate-optimal by the lower bounds stated in Remarks 2 and 4, for which we do not give proofs due to space limitations.

2.1. Sign Consistency Procedure for $S^*$. We propose a CLIME-like estimator of $S^*$ by solving the following linear optimization problem,
\[
\min \|S\|_1 \quad\text{subject to}\quad \|\hat\Sigma_O S - I\|_\infty \le \tau,\ S \in \mathbb{R}^{p\times p},
\]

where $\hat\Sigma_O = (\hat\sigma_{ij})$ is the sample covariance matrix. The tuning parameter $\tau$ is chosen as $\tau = C_1 M_p \sqrt{(\log p)/n}$ for some large constant $C_1$. Let $\hat S^1 = (\hat s^1_{ij})$ be the solution. The CLIME-like estimator $\hat S = (\hat s_{ij})$ is obtained by symmetrizing $\hat S^1$ as follows,
\[
\hat s_{ij} = \hat s_{ji} = \hat s^1_{ij}\,1\{|\hat s^1_{ij}| \le |\hat s^1_{ji}|\} + \hat s^1_{ji}\,1\{|\hat s^1_{ij}| > |\hat s^1_{ji}|\}.
\]
In other words, between $\hat s^1_{ij}$ and $\hat s^1_{ji}$ we take the one with smaller magnitude. We define a thresholding estimator $\tilde S = (\tilde s_{ij})$ with
\[
(4)\qquad \tilde s_{ij} = \hat s_{ij}\,1\{|\hat s_{ij}| > 9 M_p \tau\}
\]
to estimate the support of $S^*$.

Theorem 1. Suppose that $S^* \in \mathcal{U}(s_0(p), M_p)$,
\[
(5)\qquad \sqrt{(\log p)/n} = o(1), \quad\text{and}\quad \|L^*\|_\infty \le M_p \tau.
\]
With probability greater than $1 - C_s p^{-6}$ for some constant $C_s$ depending on $M$ only, we have $\|\hat S - S^*\|_\infty \le 9 M_p \tau$. Hence if the minimum magnitude of the nonzero entries satisfies $\theta > 18 M_p \tau$, we obtain the sign consistency $\operatorname{sign}(\tilde S) = \operatorname{sign}(S^*)$. In particular, if $M_p$ is at the constant level, then to consistently recover the support of $S^*$ we only need $\theta \gtrsim \sqrt{(\log p)/n}$.

Proof. The proof is similar to that of Theorem 7 in Cai, Liu and Luo (2011). The sub-Gaussian condition with spectral norm upper bound $M$ implies that each empirical covariance $\hat\sigma_{ij}$ satisfies the following large deviation result,
\[
P\bigl(|\hat\sigma_{ij} - \sigma_{ij}| > t\bigr) \le C_s \exp\bigl(-(8/C_2^2)\,n t^2\bigr), \quad\text{for } |t| \le \phi,
\]
where $C_s$, $C_2$ and $\phi$ depend on $M$ only. See, for example, Bickel and Levina (2008). In particular, for $t = C_2\sqrt{(\log p)/n}$, which is less than $\phi$ by our assumption, we have
\[
(6)\qquad P\bigl(\|\hat\Sigma_O - \Sigma^*_O\|_\infty > t\bigr) \le \sum_{i,j} P\bigl(|\hat\sigma_{ij} - \sigma_{ij}| > t\bigr) \le p^2\, C_s\, p^{-8}.
\]
Let $A = \{\|\hat\Sigma_O - \Sigma^*_O\|_\infty \le C_2\sqrt{(\log p)/n}\}$.

Equation (6) implies $P(A) \ge 1 - C_s p^{-6}$. On the event $A$, we will show
\[
(7)\qquad \|(S^* - L^*) - \hat S^1\|_\infty \le 8 M_p \tau,
\]
which immediately yields
\[
\|S^* - \hat S\|_\infty \le \|(S^* - L^*) - \hat S^1\|_\infty + \|L^*\|_\infty \le 8 M_p \tau + M_p \tau = 9 M_p \tau.
\]
Now we establish Equation (7). On the event $A$, for some large constant $C_1 \ge 2C_2$, the choice of $\tau$ yields
\[
(8)\qquad 2 M_p \|\hat\Sigma_O - \Sigma^*_O\|_\infty \le \tau.
\]
By the matrix $\ell_1$ norm assumption, we obtain
\[
(9)\qquad \|(\Sigma^*_O)^{-1}\|_{L_1} \le \|S^*\|_{L_1} + \|L^*\|_{L_1} \le 2 M_p.
\]
From (8) and (9) we have
\[
\|\hat\Sigma_O (S^* - L^*) - I\|_\infty = \|(\hat\Sigma_O - \Sigma^*_O)(\Sigma^*_O)^{-1}\|_\infty \le \|\hat\Sigma_O - \Sigma^*_O\|_\infty \|(\Sigma^*_O)^{-1}\|_{L_1} \le \tau,
\]
which implies
\[
(10)\qquad \|\hat\Sigma_O (S^* - L^*) - \hat\Sigma_O \hat S^1\|_\infty \le \|\hat\Sigma_O (S^* - L^*) - I\|_\infty + \|\hat\Sigma_O \hat S^1 - I\|_\infty \le 2\tau.
\]
From the definition of $\hat S^1$ we obtain
\[
(11)\qquad \|\hat S^1\|_{L_1} \le \|S^* - L^*\|_{L_1} \le 2 M_p,
\]
which, together with Equations (8) and (10), implies
\[
\|\Sigma^*_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le \|\hat\Sigma_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty + \|(\hat\Sigma_O - \Sigma^*_O)\bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le 2\tau + 4 M_p \|\hat\Sigma_O - \Sigma^*_O\|_\infty \le 4\tau.
\]
Thus we have
\[
\|(S^* - L^*) - \hat S^1\|_\infty = \|(\Sigma^*_O)^{-1}\,\Sigma^*_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le \|(\Sigma^*_O)^{-1}\|_{L_1} \|\Sigma^*_O \bigl((S^* - L^*) - \hat S^1\bigr)\|_\infty \le 8 M_p \tau.
\]
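The procedure of Section 2.1 amounts to a column-by-column linear program followed by symmetrization and entrywise thresholding. The implementation below is our own sketch using scipy's linprog (the tuning values, and the small identity-matrix demo at the end, are illustrative assumptions):

```python
# Sketch of the CLIME-like step: for each column j solve
#   min ||beta||_1  subject to  ||Sigma_hat beta - e_j||_inf <= tau,
# then symmetrize by keeping the smaller-magnitude entry, and threshold.
import numpy as np
from scipy.optimize import linprog

def clime_like(Sigma_hat, tau):
    p = Sigma_hat.shape[0]
    S1 = np.zeros((p, p))
    # Write beta = u - v with u, v >= 0; the sup-norm constraint becomes
    # two stacks of linear inequalities on (u, v).
    A = np.vstack([Sigma_hat, -Sigma_hat])
    A_ub = np.hstack([A, -A])
    c = np.ones(2 * p)                      # objective: sum(u) + sum(v) = ||beta||_1
    for j in range(p):
        e = np.zeros(p); e[j] = 1.0
        b_ub = np.concatenate([e + tau, -e + tau])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        S1[:, j] = res.x[:p] - res.x[p:]
    # Symmetrization: between entries (i,j) and (j,i), keep the smaller magnitude
    return np.where(np.abs(S1) <= np.abs(S1.T), S1, S1.T)

def threshold(S_hat, level):
    # Thresholding step (4): kill entries with magnitude at most `level` (= 9 M_p tau)
    return S_hat * (np.abs(S_hat) > level)

# Demo with Sigma_hat = I and tau = 0.1: each diagonal entry shrinks to 0.9,
# off-diagonal entries are driven exactly to zero.
S_demo = clime_like(np.eye(3), 0.1)
```

With an identity input the solution is available in closed form (each coordinate is pulled toward zero until the constraint $|\beta_j - 1| \le \tau$ binds), which makes the sketch easy to sanity-check.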

Remark 1. By the choice of our $\tau$ and the eigendecomposition of $L^*$, the condition $\|L^*\|_\infty \le M_p\tau$ holds when $r_0(p)\,c_0^2/p \le C_1 M_p^2 \sqrt{(\log p)/n}$, i.e., $n \lesssim p^2 (\log p)\, M_p^4 / r_0^2(p)$. If $M_p$ is slowly increasing (for instance $M_p \le p^{1/4 - \tau}$ for any small $\tau > 0$), the minimum requirement $\theta \gtrsim M_p^2\sqrt{(\log p)/n}$ is weaker than the $\theta \gtrsim \sqrt{p/n}$ required in Corollary 4.2. Furthermore, it can be shown that the optimal rate of the minimum magnitude of nonzero entries for sign consistency is $\theta \asymp M_p\sqrt{(\log p)/n}$, as in Cai, Liu and Zhou (2012).

Remark 2. Cai, Liu and Zhou (2012) showed that the minimum requirement $\theta \gtrsim M_p\sqrt{(\log p)/n}$ is necessary for sign consistency for sparse concentration matrices. Let $\mathcal{U}_S(c)$ denote the class of concentration matrices defined in (1) and (2), satisfying assumption (5) and $\theta > c M_p\sqrt{(\log p)/n}$. We can show that there exists some constant $c_1 > 0$ such that for all $0 < c < c_1$,
\[
\liminf_{n\to\infty} \inf_{(\hat S, \hat L)} \sup_{\mathcal{U}_S(c)} P\bigl(\operatorname{sign}(\hat S) \ne \operatorname{sign}(S^*)\bigr) > 0,
\]
similarly to Cai, Liu and Zhou (2012).

2.2. Rank Consistency Procedure for $L^*$. In this section we propose a procedure to estimate $L^*$ and its rank. We note that with high probability $\hat\Sigma_O$ is invertible; we then define $\hat L = \tilde S - (\hat\Sigma_O)^{-1}$, where $\tilde S$ is defined in (4). Denote the eigendecomposition of $\hat L$ by $\hat L = \sum_{i=1}^p \lambda_i(\hat L)\,\upsilon_i \upsilon_i^T$, and let $\lambda_i(\tilde L) = \lambda_i(\hat L)\,1\{\lambda_i(\hat L) > C_3\sqrt{p/n}\}$, where the constant $C_3$ will be specified later. Define $\tilde L = \sum_{i=1}^p \lambda_i(\tilde L)\,\upsilon_i \upsilon_i^T$. The following theorem shows that $\tilde L$ is a consistent estimator of $L^*$ under the spectral norm, and that with high probability $\operatorname{rank}(L^*) = \operatorname{rank}(\tilde L)$.

Theorem 2. Under the conditions of Theorem 1, assume further that
\[
(12)\qquad \sqrt{p/n} \le \frac{1}{16\sqrt{2}\,M^2}, \quad\text{and}\quad M_p^2\, s_0(p) \le \sqrt{p/\log p}.
\]
Then there exists some constant $C_3$ such that $\|\hat L - L^*\| \le C_3\sqrt{p/n}$ with probability greater than $1 - 2e^{-p} - C_s p^{-6}$. Hence if $\sigma > 2C_3\sqrt{p/n}$, we have $\operatorname{rank}(L^*) = \operatorname{rank}(\tilde L)$ with high probability.

Proof. From Corollary 5.5 of the paper and our assumption on the sample size, we have
\[
P\Bigl(\|\hat\Sigma_O - \Sigma^*_O\| \ge M\sqrt{128\,p/n}\Bigr) \le 2\exp(-p).
\]
Note that $\lambda_{\min}(\Sigma^*_O) \ge 1/M$, and $M\sqrt{128\,p/n} \le 1/(2M)$ under assumption (12), so $\lambda_{\min}(\hat\Sigma_O) \ge 1/(2M)$ with high probability, which yields the same rate of convergence for the concentration matrix, since
\[
(13)\qquad \|(\hat\Sigma_O)^{-1} - (\Sigma^*_O)^{-1}\| \le \|(\hat\Sigma_O)^{-1}\|\,\|(\Sigma^*_O)^{-1}\|\,\|\hat\Sigma_O - \Sigma^*_O\| \le 2M\cdot M\cdot M\sqrt{128\,p/n} = 16\sqrt{2}\,M^3\sqrt{p/n}.
\]
From Theorem 1 we know that $\operatorname{sign}(\tilde S) = \operatorname{sign}(S^*)$ and $\|\tilde S - S^*\|_\infty \le 9 M_p \tau$ with probability greater than $1 - C_s p^{-6}$. Since $\|B\| \le \|B\|_{L_1}$ for any symmetric matrix $B$, we then have
\[
(14)\qquad \|\tilde S - S^*\| \le \|\tilde S - S^*\|_{L_1} \le s_0(p)\cdot 9 M_p \tau = 9 C_1 M_p^2\, s_0(p) \sqrt{(\log p)/n}.
\]
Equations (13) and (14), together with the assumption $M_p^2 s_0(p) \le \sqrt{p/\log p}$, imply
\[
\|\hat L - L^*\| \le \|(\hat\Sigma_O)^{-1} - (\Sigma^*_O)^{-1}\| + \|\tilde S - S^*\| \le 16\sqrt{2}\,M^3\sqrt{p/n} + 9 C_1 M_p^2\, s_0(p)\sqrt{(\log p)/n} \le C_3\sqrt{p/n}
\]
with probability greater than $1 - 2e^{-p} - C_s p^{-6}$.

Remark 3. We should emphasize that in order to consistently estimate the rank of $L^*$ we need only $\sigma > 2C_3\sqrt{p/n}$, which is smaller than the $s_0^3(p)\sqrt{p/n}$ required in the paper (see the footnote for Corollary 4.2), as long as $M_p^2 s_0(p) \le \sqrt{p/\log p}$. In particular, we do not explicitly constrain the rank $r_0(p)$. One special case is that in which $M_p$ is constant and $s_0(p) \asymp p^{1/2-\tau}$ for some small $\tau > 0$, for which our requirement on $\sigma$ is of the order $\sqrt{p/n}$, but the assumption in the paper is of the order $p^{3(1/2-\tau)}\sqrt{p/n}$.
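The rank-consistency procedure of Section 2.2 is an eigenvalue hard-thresholding of $\hat L = \tilde S - (\hat\Sigma_O)^{-1}$. The numpy sketch below is our own illustration (the choice $C_3 = 1.0$ and the noiseless demo where the population covariance and the true $S^*$ are plugged in are hypothetical):

```python
# Sketch of the rank-consistency procedure: form L_hat = S_tilde - inv(Sigma_hat),
# then keep only eigenvalues above C3 * sqrt(p/n).
import numpy as np

def estimate_low_rank(Sigma_hat, S_tilde, n, C3=1.0):
    p = Sigma_hat.shape[0]
    L_hat = S_tilde - np.linalg.inv(Sigma_hat)
    w, V = np.linalg.eigh(L_hat)                 # eigenvalues ascending
    w_kept = w * (w > C3 * np.sqrt(p / n))       # hard-threshold the spectrum
    L_tilde = (V * w_kept) @ V.T                 # rebuild the truncated matrix
    rank_hat = int(np.sum(w_kept > 0))
    return L_tilde, rank_hat

# Noiseless demo: with the population Sigma and the true S*, the procedure
# recovers L* and its rank exactly.
rng = np.random.default_rng(1)
p, n = 5, 10000
U, _ = np.linalg.qr(rng.standard_normal((p, 2)))
L_true = 0.3 * U @ U.T                           # rank-2 L* with sigma = 0.3
S_true = 2.0 * np.eye(p)
Sigma_true = np.linalg.inv(S_true - L_true)
L_tilde, rank_hat = estimate_low_rank(Sigma_true, S_true, n)
```

In this noiseless setting the threshold $C_3\sqrt{p/n} \approx 0.022$ sits well below $\sigma = 0.3$, so the two nonzero eigenvalues survive and the three near-zero ones are removed, mirroring the condition $\sigma > 2C_3\sqrt{p/n}$ of Theorem 2.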

Remark 4. Let $\mathcal{U}_L(c)$ denote the class of concentration matrices defined in (1), (2) and (3), satisfying assumptions (12) and (5) and $\sigma > c\sqrt{p/n}$. We can show that there exists some constant $c_2 > 0$ such that for all $0 < c < c_2$,
\[
\liminf_{n\to\infty} \inf_{(\hat S, \hat L)} \sup_{\mathcal{U}_L(c)} P\bigl(\operatorname{rank}(\hat L) \ne \operatorname{rank}(L^*)\bigr) > 0.
\]
The proof of this lower bound is based on a modification of a lower bound argument in a personal communication of T. Tony Cai (2011).

3. Concluding Remarks and Further Questions. In this discussion we attempt to understand the optimality of the results in the present paper by studying a relatively simple model. Our preliminary analysis seems to indicate that the results in the paper are sub-optimal. In particular, we tend to conclude that the assumptions on $\theta$ and $\sigma$ in the paper can potentially be weakened very much. However, it is not clear to us whether the sub-optimality is due to the methodology or just to its theoretical analysis. We want to emphasize that the preliminary results in this discussion can be strengthened, but for simplicity of the discussion we choose to present weaker but simpler results, hopefully shedding some light on understanding optimality in estimation.

REFERENCES

[1] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199-227.
[2] Cai, T. T., Liu, W. and Luo, X. (2011). A constrained l1 minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594-607.
[3] Cai, T. T., Liu, W. and Zhou, H. H. (2012). Optimal estimation of large sparse precision matrices. Manuscript.
[4] Cai, T. T. (2011). Personal communication.
[5] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772.
[6] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462.
[7] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2008). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Preprint.
Department of Statistics, Yale University, New Haven, CT 06511, USA
E-mail: zhao.ren@yale.edu
E-mail: huibin.zhou@yale.edu