arxiv: v3 [math.st] 16 Jun 2015

Geometric Inference for General High-Dimensional Linear Inverse Problems

T. Tony Cai, Tengyuan Liang and Alexander Rakhlin
Department of Statistics, The Wharton School, University of Pennsylvania

Abstract

This paper presents a unified geometric framework for the statistical analysis of a general ill-posed linear inverse model. Special cases include noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation, and noisy matrix completion. We propose computationally feasible convex programs for statistical inference, including estimation, confidence intervals and hypothesis testing. We develop a theoretical framework that characterizes the local rate of convergence for estimation and provides guarantees for statistical inference. Our results are built on the local conic geometry and duality. The difficulty of statistical inference is captured by the geometric characterization of the local tangent cone through the Gaussian width and the Sudakov minoration estimate.

1 Introduction

A wide range of applications has driven significant recent interest in high-dimensional linear inverse problems such as noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation, and noisy matrix completion. This interest spans several fields, including statistics, applied mathematics, computer science, and electrical engineering. These problems are often studied case by case, and the focus so far has been mainly on estimation. Similarities in the technical analyses have been suggested heuristically. However, a general unified theory for statistical inference, including estimation, confidence intervals and hypothesis testing, has yet to be developed.

The research of Tony Cai was supported in part by NSF Grants DMS and DMS, and NIH Grant R01 CA. Tengyuan Liang acknowledges the support of the Winkelman Fellowship. Alexander Rakhlin gratefully acknowledges the support of NSF under grant CAREER DMS.

In this paper, we consider a general linear inverse model

$$Y = \mathcal{X}(M) + Z, \qquad (1)$$

where $M \in \mathbb{R}^p$ is the vectorized version of the parameter of interest, $\mathcal{X}: \mathbb{R}^p \to \mathbb{R}^n$ is a linear operator, and $Z \in \mathbb{R}^n$ is a noise vector. We observe $(\mathcal{X}, Y)$ and wish to recover the unknown parameter $M$. We focus in particular on the high-dimensional setting, where the ambient dimension $p$ of the parameter $M$ is much larger than the sample size $n$, i.e., the dimension of $Y$. In such a setting, the parameter of interest $M$ is commonly assumed to have a certain low complexity structure with respect to a given atom set $\mathcal{A}$. This structure captures the true dimension of the statistical estimation problem. A number of high-dimensional inference problems actively studied in the recent literature can be seen as special cases of this general linear inverse model.

High-Dimensional Linear Regression/Noisy Compressed Sensing. In high-dimensional linear regression, one observes $(X, Y)$ with

$$Y = XM + Z, \qquad (2)$$

where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ with $p \gg n$, $M \in \mathbb{R}^p$ is a sparse signal, and $Z \in \mathbb{R}^n$ is a noise vector. The goal is to recover the unknown sparse signal of interest $M \in \mathbb{R}^p$ from the observation $(X, Y)$ through an efficient algorithm. Many estimation methods have been developed and analyzed, including $\ell_1$-regularized procedures such as the Lasso and the Dantzig selector. See, for example, Tibshirani (1996); Candès and Tao (2007); Bickel et al. (2009); Bühlmann and van de Geer (2011) and the references therein. Confidence intervals and hypothesis testing for high-dimensional linear regression have also been actively studied in the last few years. A common approach first constructs a de-biased Lasso or de-biased scaled-Lasso estimator. Inference is then based on the asymptotic normality of low-dimensional functionals of the de-biased estimator. See, for example, Bühlmann (2013); Zhang and Zhang (2014); van de Geer et al. (2014); Javanmard and Montanari (2014).

Trace Regression. Accurate recovery of a low-rank matrix from a small number of linear measurements has a wide range of applications and has drawn much recent attention in several fields. See, for example, Recht et al. (2010); Koltchinskii (2011); Rohde et al. (2011); Koltchinskii et al. (2011); Candès and Plan (2011). In trace regression, one observes $(X_i, Y_i)$, $i = 1, \ldots, n$, with

$$Y_i = \mathrm{Tr}(X_i^\top M) + Z_i, \qquad (3)$$

where $Y_i \in \mathbb{R}$, $X_i \in \mathbb{R}^{p_1 \times p_2}$ are measurement matrices, and $Z_i$ are noise. The goal is to recover the unknown matrix $M \in \mathbb{R}^{p_1 \times p_2}$, which is assumed to be of low rank. Here the dimension of the parameter $M$ is $p = p_1 p_2$. A number of constrained and penalized nuclear norm minimization methods have been introduced and studied in both the noiseless and noisy settings. See the aforementioned references for further details.
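The sketch below makes the vectorization in model (1) concrete for trace regression (3). It uses only the Python standard library. It draws synthetic observations $Y_i = \mathrm{Tr}(X_i^\top M) + Z_i$ and checks that $\mathrm{Tr}(X_i^\top M) = \mathrm{Vec}(X_i)^\top \mathrm{Vec}(M)$, which is why (3) is an instance of (1) with $p = p_1 p_2$. The helper names, the i.i.d. $N(0,1)$ measurement entries and the noise scale are illustrative assumptions, not choices made in the paper.

```python
import random

def trace_inner(X, M):
    """Tr(X^T M) for equally sized matrices stored as lists of rows.

    The j-th diagonal entry of X^T M is sum_i X[i][j] * M[i][j], so the
    trace is the sum of entrywise products."""
    return sum(x * m for row_x, row_m in zip(X, M) for x, m in zip(row_x, row_m))

def vec(A):
    """Column-stacking vectorization Vec(A)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def trace_regression_sample(M, n, noise_sd, rng):
    """Draw n pairs (X_i, y_i) with y_i = Tr(X_i^T M) + z_i.

    Illustrative assumption: X_i has i.i.d. N(0, 1) entries and
    z_i ~ N(0, noise_sd^2)."""
    p1, p2 = len(M), len(M[0])
    samples = []
    for _ in range(n):
        X = [[rng.gauss(0.0, 1.0) for _ in range(p2)] for _ in range(p1)]
        samples.append((X, trace_inner(X, M) + rng.gauss(0.0, noise_sd)))
    return samples

rng = random.Random(0)
u, v = [1.0, 0.5], [1.0, -1.0, 2.0]
M = [[a * b for b in v] for a in u]          # rank-one 2 x 3 parameter
data = trace_regression_sample(M, n=5, noise_sd=0.1, rng=rng)
X0, _ = data[0]
lhs = trace_inner(X0, M)                      # Tr(X_1^T M)
rhs = sum(a * b for a, b in zip(vec(X0), vec(M)))  # Vec(X_1)^T Vec(M)
print(abs(lhs - rhs) < 1e-12)
```

Stacking the $n$ row vectors $\mathrm{Vec}(X_i)^\top$ gives the matrix form of $\mathcal{X}$ in (1).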

Sign Vector Recovery. Sign vector recovery is similar to high-dimensional regression, except that the signal of interest is a sign vector. More specifically, in sign vector recovery, one observes $(X, Y)$ with

$$Y = XM + Z, \qquad (4)$$

where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $M \in \{+1, -1\}^p$ is a sign vector, and $Z \in \mathbb{R}^n$ is a noise vector. The goal is to recover the unknown sign signal of interest $M$. Exhaustive search over the parameter set is computationally prohibitive. The noiseless case of (4) is known as the generalized multi-knapsack problem (Khuri et al., 1994; Mangasarian and Recht, 2011). It can be solved through an integer program, which is known to be computationally difficult even for checking the uniqueness of the solution; see Prokopyev et al. (2005); Valiant and Vazirani (1986).

Orthogonal Matrix Recovery. In some applications, the matrix of interest in trace regression is known to be an orthogonal/rotation matrix (Ten Berge, 1977; Gower and Dijksterhuis, 2004). More specifically, in orthogonal matrix recovery, we observe $(X_i, Y_i)$, $i = 1, \ldots, n$, as in the trace regression model (3). Here $X_i \in \mathbb{R}^{m \times m}$ are measurement matrices and $M \in \mathbb{R}^{m \times m}$ is an orthogonal matrix. The goal is to recover the unknown $M$ using an efficient algorithm. Computational difficulties arise because of the non-convex constraint. See Chandrasekaran et al. (2012).

Matrix Completion. Matrix completion aims to recover a low-rank matrix from observations of a subset of its entries. It can be viewed as a special case of the trace regression model (3) with measurement matrices of the form $e_{i_k} e_{j_k}^\top$ for $k = 1, \ldots, n$. Here $e_i$ is the $i$-th standard basis vector, and $i_1, \ldots, i_n$ and $j_1, \ldots, j_n$ are drawn randomly with replacement from $\{1, \ldots, p_1\}$ and $\{1, \ldots, p_2\}$, respectively. That is, individual entries of the matrix $M$ are observed at randomly selected positions. The goal is to recover the low-rank matrix $M$ from the partial observations $Y$. See Candès and Recht (2009); Recht (2011) for matrix recovery in the noiseless case and Candès and Plan (2010); Chatterjee (2012); Cai and Zhou (2013) for the noisy case.
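The sketch below shows how matrix completion fits the trace regression model (3). It builds the measurement matrices $e_{i_k} e_{j_k}^\top$, with positions drawn uniformly with replacement. It then confirms that each noiseless measurement $\mathrm{Tr}((e_{i_k} e_{j_k}^\top)^\top M)$ simply reads off the entry $M_{i_k j_k}$. Indices are 0-based in the code, noise is omitted for clarity, and the function names are hypothetical.

```python
import random

def basis_matrix(i, j, p1, p2):
    """The p1 x p2 measurement matrix e_i e_j^T (0-based indices)."""
    E = [[0.0] * p2 for _ in range(p1)]
    E[i][j] = 1.0
    return E

def trace_inner(X, M):
    """Tr(X^T M) = sum of entrywise products of X and M."""
    return sum(x * m for row_x, row_m in zip(X, M) for x, m in zip(row_x, row_m))

def sample_completion(M, n, rng):
    """Noiseless matrix-completion data: n positions drawn uniformly with
    replacement, each observed through the trace regression model (3)."""
    p1, p2 = len(M), len(M[0])
    obs = []
    for _ in range(n):
        i, j = rng.randrange(p1), rng.randrange(p2)
        obs.append((i, j, trace_inner(basis_matrix(i, j, p1, p2), M)))
    return obs

M = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]      # rank-one 2 x 3 matrix
obs = sample_completion(M, n=10, rng=random.Random(1))
print(all(y == M[i][j] for i, j, y in obs))  # each measurement is one entry
```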
Other high-dimensional inference problems are closely connected to the structured linear inverse model (1). Examples include high-dimensional covariance matrix estimation where the covariance matrix of interest is banded/sparse/spiked (Karoui, 2008; Cai et al., 2010, 2013, 2014), and sparse plus low-rank decomposition in robust principal component analysis (Candès et al., 2011). Another example is the demixing problem with sparse noise and a sparse parameter (Amelunxen et al., 2013). We will discuss these connections in detail later in the paper.

This general class of high-dimensional linear inverse problems raises several fundamental questions.

Statistical Questions: How well can the parameter $M$ be estimated? What is the intrinsic difficulty of the estimation problem? In general, how can we provide inference guarantees for $M$, i.e., confidence intervals and hypothesis tests?

Computational Questions: Are there computationally efficient (polynomial-time) algorithms that are also sharp in terms of statistical estimation and inference?

1.1 High-Dimensional Linear Inverse Problems

Linear inverse problems have been well studied in the classical setting, where the parameter of interest lies in a convex set. See, for example, Tikhonov and Arsenin (1977), O'Sullivan (1986), and Johnstone and Silverman (1990). In particular, Donoho (1994) developed an elegant geometric characterization of the minimax theory for estimating a linear functional over a convex parameter space, in terms of the modulus of continuity. However, the theory relies critically on the convexity of the parameter space. As shown in Cai and Low (2004a,b), functional estimation and confidence intervals behave significantly differently even when the parameter space is the union of two convex sets. For the high-dimensional linear inverse problems considered in the present paper, the parameter space is highly non-convex. The theory and techniques developed in the classical setting are therefore not readily applicable.

For high-dimensional linear inverse problems such as those mentioned earlier, the parameter space has low complexity. Exhaustive search often yields a solution that is optimal in terms of statistical accuracy. However, it is computationally prohibitive and requires prior knowledge of the true low complexity. In recent years, a powerful approach in individual cases has been to relax the problem to a convex program, such as $\ell_1$ or nuclear norm minimization, and solve it with optimization techniques. Unified approaches to signal recovery have recently appeared both in the applied mathematics literature (Chandrasekaran et al., 2012; Amelunxen et al., 2013; Oymak et al., 2013) and in the statistics literature (Negahban et al., 2012). Oymak et al. (2013) studied the generalized LASSO problem through conic geometry, with a simple bound in terms of the $\ell_2$ norm of the noise vector.
Chandrasekaran et al. (2012) introduced the notion of atomic norm to define a low complexity structure. They showed that the Gaussian width captures the minimum sample size required to ensure recovery. Amelunxen et al. (2013) studied the phase transition of convex algorithms for a wide range of problems. These papers suggest that, in the noiseless or deterministic noise settings, the geometry of the local tangent cone determines the minimum number of samples needed for successful recovery. Negahban et al. (2012) studied regularized M-estimation with a decomposable norm penalty in the additive Gaussian noise setting. Another line of research focuses on a detailed analysis of Empirical Risk Minimization (ERM) (Lecué and Mendelson, 2013). Here, the objective function is the excess risk for the squared error loss. The excess risk is shown to have the rate $n^{-1/2}$ or $n^{-1}$ in terms of the sample size $n$. The analysis is based on empirical processes indexed by general subgaussian function classes, with a proper localization radius around the best parameter. In addition to convexity, ERM requires prior knowledge of the size of the bounded parameter set of interest. This knowledge is not needed for

the algorithm we propose in the present paper.

Compared with estimation, few methods and theoretical results exist for confidence intervals and hypothesis testing in these linear inverse models. For high-dimensional linear regression specifically, confidence intervals and significance testing have drawn increasing attention recently. Bühlmann (2013) studied a bias correction method based on ridge estimation. Zhang and Zhang (2014) proposed bias correction via the score vector, using the scaled Lasso as the initial estimator. van de Geer et al. (2014) and Javanmard and Montanari (2014) de-sparsified the Lasso by constructing a near inverse of the Gram matrix. One uses the node-wise Lasso and the other an $\ell_\infty$-constrained quadratic program, with similar theoretical guarantees. To the best of our knowledge, inference procedures for other high-dimensional linear inverse models are yet to be developed.

1.2 Geometric Characterization of Linear Inverse Problems

Under the linear inverse model (1), the parameter $M$ is assumed to have a certain low complexity structure with respect to a given atom set in a high-dimensional Euclidean space. This assumption introduces a non-convex constraint, which makes the inverse problem difficult. However, a proper convex relaxation based on the general atom structure yields a computationally feasible solution. Our goal is to recover, and make inference on, the parameter $M$ efficiently from the observation $(\mathcal{X}, Y)$. This problem can also be framed in the language of geometric functional analysis (Ledoux and Talagrand, 1991; Vershynin, 2011). For point estimation, we study how the local convex geometry around the true parameter affects the estimation procedure and the intrinsic estimation difficulty. We measure these through the local upper bound and the local minimax lower bound, respectively. The local tangent cone plays a key role in our analysis. For statistical inference, we develop general procedures induced by the convex geometry. These procedures efficiently answer inferential questions such as confidence intervals and hypothesis testing.
We are also interested in the sample size condition that the local convex geometry imposes for valid inference guarantees. Complexity measures such as the Gaussian width and the Rademacher complexity are well studied in empirical process theory (Ledoux and Talagrand, 1991; Talagrand, 1996). They are known to capture the difficulty of the estimation problem. Covering/packing entropy and volume ratio (Yang and Barron, 1999; Vershynin, 2011; Ma and Wu, 2013) are also widely used in geometric functional analysis to measure complexity. In this paper, we show how these geometric quantities affect computationally efficient estimation/inference procedures. We also show how they determine the intrinsic difficulty of the estimation/inference problem.

Our main result can be summarized as follows. We propose unified convex algorithms for estimation and inference and then analyze their theoretical properties. The local tangent cone $T_{\mathcal{A}}(M)$ is formally defined in (8), and $B_2^p$ below denotes the Euclidean ball in $\mathbb{R}^p$. Several geometric quantities on this cone capture the rate of convergence of the linear

inverse problem. They are the Gaussian width $w(B_2^p \cap T_{\mathcal{A}}(M))$, the Sudakov minoration estimate $e(B_2^p \cap T_{\mathcal{A}}(M))$, and the volume ratio $v(B_2^p \cap T_{\mathcal{A}}(M))$, all defined in Section 2.2. For the upper bound, suppose $n \gtrsim w^2(B_2^p \cap T_{\mathcal{A}}(M))$. Then, with overwhelming probability, the estimation error of our algorithm under the $\ell_2$ norm is of order

$$\sigma\, \frac{\gamma_{\mathcal{A}}(M)\, w(\mathcal{X}\mathcal{A})}{\sqrt{n}},$$

where $\gamma_{\mathcal{A}}(M)$ is the local asphericity ratio defined in (15). The minimax lower bound for estimation under the $\ell_2$ norm over the local tangent cone $T_{\mathcal{A}}(M)$ satisfies

$$\gtrsim \frac{\sigma}{\sqrt{n}} \left[ e(B_2^p \cap T_{\mathcal{A}}(M)) \vee v(B_2^p \cap T_{\mathcal{A}}(M)) \right].$$

For statistical inference, we establish asymptotic normality for any low-dimensional linear functional of the parameter $M$ under the condition

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$

Compare this with the condition for consistency of point estimation:

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}(M)\, w(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$

The sufficient conditions for valid inference and for estimation consistency differ critically: inference requires a more stringent condition on the sample size than estimation does. Intuitively, statistical inference is purely geometrized by the Gaussian width and the Sudakov minoration estimate.

1.3 Our Contributions

The main contributions of the present paper are two-fold.

Unified convex algorithms for estimation and inference. We propose a general, computationally feasible convex program that simultaneously achieves a near-optimal rate of convergence for a collection of high-dimensional linear inverse problems. We also provide a general convex feasibility program that yields inference guarantees, such as confidence intervals and hypothesis tests, for any finite linear contrast.

Local geometric theory: upper and lower bounds, confidence intervals and hypothesis testing. We provide a unified theoretical framework for analyzing high-dimensional linear inverse problems based on the local conic geometry and duality. Point estimation and statistical inference are adaptive: their difficulty (rate of convergence, conditions on the sample size, etc.) automatically adapts to the low complexity structure of the true parameter. The inference guarantee and estimation consistency are closely related, and both rely on conditions induced by the local conic geometry.
We show that the minimax lower bound for estimation over the local tangent cone is captured by the Sudakov minoration estimate or the volume ratio. The results geometrize statistical inference for general linear inverse problems with low complexity structure.

1.4 Organization of the Paper

The rest of the paper is structured as follows. Section 2 first reviews notation, definitions, and basic convex geometry. It then formally presents convex programs that recover the parameter $M$ and provide inference guarantees for $M$ from the observation $(\mathcal{X}, Y)$. Section 3 studies the properties of the proposed procedures. Under the Gaussian setting, it develops a geometric theory covering the local upper bound, the minimax lower bound, confidence intervals and hypothesis testing. The end of that section also includes applications to particular high-dimensional estimation problems. Section 4 extends the geometric theory beyond the Gaussian case and discusses the relation between the upper and lower bounds. Section 5 contains further discussion. The proofs of the main results are given in Section 6 and Appendices A and B.

2 Preliminaries and Algorithms

This section reviews notation and definitions used in the rest of the paper. In particular, we introduce the basics of convex geometry. These include geometric quantities that later sections show to be instrumental in characterizing the difficulty of statistical estimation and inference. We then collect some known results on the complexity measures (Gaussian width, Sudakov estimate and volume ratio) that will be used repeatedly. Finally, we formally introduce our general estimation and inference programs based on convex geometry and duality.

In this paper, we use $\|\cdot\|_{\ell_q}$ to denote the $\ell_q$ norm of a vector and $B_2^p$ to denote the unit Euclidean ball in $\mathbb{R}^p$. For a matrix $M$, $\|M\|_F$, $\|M\|_*$, and $\|M\|$ denote the Frobenius norm, nuclear norm, and spectral norm of $M$, respectively. When there is no confusion, we also write $\|M\|_F = \|M\|_{\ell_2}$ for a matrix $M$. For a vector $V \in \mathbb{R}^p$, $V^*$ denotes its transpose. The inner product on vectors is defined as usual: $\langle V_1, V_2 \rangle = V_1^* V_2$. For matrices, $\langle M_1, M_2 \rangle = \mathrm{Tr}(M_1^* M_2) = \mathrm{Vec}(M_1)^* \mathrm{Vec}(M_2)$, where $\mathrm{Vec}(M) \in \mathbb{R}^{pq}$ denotes the vectorized version of the matrix $M \in \mathbb{R}^{p \times q}$. $\mathcal{X}: \mathbb{R}^p \to \mathbb{R}^n$ denotes a linear operator from $\mathbb{R}^p$ to $\mathbb{R}^n$.
Following the notation above, $M^* \in \mathbb{R}^{q \times p}$ is the adjoint (transpose) matrix of $M$. Likewise, $\mathcal{X}^*: \mathbb{R}^n \to \mathbb{R}^p$ is the adjoint operator of $\mathcal{X}$, so that $\langle \mathcal{X}(V_1), V_2 \rangle = \langle V_1, \mathcal{X}^*(V_2) \rangle$.

Let $K$ be a convex compact set in a metric space with metric $d$. A set $S \subset K$ is an $\epsilon$-covering set if for every $x \in K$ there exists $y \in S$ with $d(x, y) < \epsilon$. A set $S \subset K$ is an $\epsilon$-packing set if $d(x, y) \ge \epsilon$ for all $x, y \in S$ with $x \ne y$. The $\epsilon$-entropy of $K$ with respect to $d$ is defined as follows. The $\epsilon$-packing entropy $\log \mathcal{M}(K, \epsilon, d)$ is the logarithm of the cardinality of the largest $\epsilon$-packing set. The $\epsilon$-covering entropy $\log \mathcal{N}(K, \epsilon, d)$ is the logarithm of the cardinality of the smallest $\epsilon$-covering set. A well-known result is

$$\mathcal{M}(K, 2\epsilon, d) \le \mathcal{N}(K, \epsilon, d) \le \mathcal{M}(K, \epsilon, d).$$

When $d$ is the usual Euclidean distance, we omit it and simply write $\mathcal{M}(K, \epsilon)$ and $\mathcal{N}(K, \epsilon)$.

Let $\{a_n\}$ and $\{b_n\}$ be two sequences of positive numbers. We write $a_n \gtrsim b_n$ if there exists a constant $c_0$ such that $a_n / b_n \ge c_0$ for all $n$. We write $a_n \lesssim b_n$ if there exists a constant $C_0$ such that $a_n / b_n \le C_0$ for all $n$. We write

$a_n \asymp b_n$ if $a_n \gtrsim b_n$ and $a_n \lesssim b_n$. Throughout the paper, $c, c', c_0, C_0$ denote constants that may vary from place to place.

2.1 Basic Convex Geometry

We consider the linear inverse model (1) in the high-dimensional setting, where the dimension $p$ can be much larger than the sample size $n$. The parameter of interest $M$ lies in a certain low complexity space. Examples include sparsity in noisy compressed sensing and low rank in trace regression and matrix completion. The linear operator $\mathcal{X}$ in model (1) can be viewed as a matrix $X \in \mathbb{R}^{n \times p}$. Without loss of generality, we assume $X$ is standardized to have columns of unit $\ell_2$ norm. The noise vector $Z \in \mathbb{R}^n$ is assumed to have noise level $\sigma/\sqrt{n}$ and covariance matrix $\frac{\sigma^2}{n} I_n$.

The notion of low complexity is based on a collection of basic atoms. We call this collection the atom set $\mathcal{A}$. It may be countable or uncountable, as illustrated in Figure 1. A parameter $M$ has complexity $k$ with respect to $\mathcal{A}$ if $M$ is a linear combination of at most $k$ atoms in $\mathcal{A}$. That is, there exists a decomposition

$$M = \sum_{a \in \mathcal{A}} c_a(M)\, a, \quad \text{where} \quad \sum_{a \in \mathcal{A}} \mathbb{1}\{c_a(M) \ne 0\} \le k.$$

[Figure 1: Atom set illustration. The red dots denote atoms. This particular example illustrates atoms that are basis vectors, as in sparse regression.]

[Figure 2: Atomic norm illustration. The red dashed line denotes the convex hull of the atom set. The blue dashed line denotes the scaled convex hull in which M lies.]

In convex geometry (Pisier, 1999), the Minkowski functional (gauge) of a symmetric convex body $K$

is defined as $\|x\|_K = \inf\{t > 0 : x \in tK\}$. Let $\mathcal{A}$ be a collection of atoms that is a compact subset of $\mathbb{R}^p$. We assume that the elements of $\mathcal{A}$ are extreme points of the convex hull $\mathrm{conv}(\mathcal{A})$, in the sense that $\sup\{\langle x, a \rangle : a \in \mathcal{A}\} = \sup\{\langle x, a \rangle : a \in \mathrm{conv}(\mathcal{A})\}$ for any $x \in \mathbb{R}^p$. For any $x \in \mathbb{R}^p$, the atomic norm $\|x\|_{\mathcal{A}}$ is the gauge of $\mathrm{conv}(\mathcal{A})$ (see Figure 2):

$$\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\, \mathrm{conv}(\mathcal{A})\}.$$

As noted in Chandrasekaran et al. (2012), the atomic norm can also be written as

$$\|x\|_{\mathcal{A}} = \inf\Big\{ \sum_{a \in \mathcal{A}} c_a : x = \sum_{a \in \mathcal{A}} c_a\, a,\ c_a \ge 0 \Big\}. \qquad (5)$$

Since the atoms in $\mathcal{A}$ are the extreme points of $\mathrm{conv}(\mathcal{A})$, the dual norm of this atomic norm is

$$\|x\|_{\mathcal{A}}^* = \sup\{\langle x, a \rangle : a \in \mathcal{A}\} = \sup\{\langle x, a \rangle : \|a\|_{\mathcal{A}} \le 1\}. \qquad (6)$$

The norm and its dual satisfy the following symmetric (Cauchy-Schwarz type) relation:

$$\langle x, y \rangle \le \|x\|_{\mathcal{A}}\, \|y\|_{\mathcal{A}}^*. \qquad (7)$$

Clearly, the unit ball of the atomic norm $\|\cdot\|_{\mathcal{A}}$ is the convex hull of the atom set $\mathcal{A}$. The tangent cone at $x$ with respect to the scaled unit ball $\|x\|_{\mathcal{A}}\, \mathrm{conv}(\mathcal{A})$ is defined as (see Figures 3 and 4)

$$T_{\mathcal{A}}(x) = \mathrm{cone}\{h : \|x + h\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}}\}. \qquad (8)$$

Also known as a recession cone, $T_{\mathcal{A}}(x)$ is the collection of directions along which the atomic norm does not increase. This tangent cone determines the geometry of the neighborhood around the true parameter $M$. Its complexity therefore affects the difficulty of the recovery problem. The cone is unbounded, so we analyze its complexity through its intersection with the unit ball, $B_2^p \cap T_{\mathcal{A}}(M)$. Figure 3 gives an intuitive illustration. The red shaded area is the scaled atomic norm ball, and $M$ is the true parameter. The black arrow denotes one vector inside the tangent cone, and the region enclosed by the blue dashed lines is $T_{\mathcal{A}}(M)$.

To better illustrate the general model and the notion of low complexity, we now look at the atom set, atomic norm and tangent cone geometry in a few examples.
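Definitions (5)-(8) can be checked numerically for the sparse-regression atom set $\mathcal{A} = \{\pm e_i\}$ of Example 1 below. The Python sketch evaluates the dual norm (6) directly as a maximum over atoms, compares it with the $\ell_\infty$ norm, and checks relation (7). It also tests membership in the tangent cone (8). For that it uses a standard fact about the $\ell_1$ norm: with $S$ the support of $M$, $h \in T_{\mathcal{A}}(M)$ exactly when $\sum_{i \in S} \mathrm{sign}(M_i) h_i + \sum_{i \notin S} |h_i| \le 0$. This holds because $\|M + th\|_{\ell_1}$ is convex and piecewise linear in $t$. The criterion and the function names are illustrative additions, not part of the paper.

```python
import random

def l1_atoms(p):
    """Atom set {+e_i, -e_i} in R^p."""
    atoms = []
    for i in range(p):
        for s in (1.0, -1.0):
            a = [0.0] * p
            a[i] = s
            atoms.append(a)
    return atoms

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def atomic_norm_l1(x):
    """(5) for {±e_i}: the cheapest decomposition uses weight |x_i| on sign(x_i) e_i."""
    return sum(abs(t) for t in x)

def dual_norm(x, atoms):
    """(6): supremum of <x, a> over the atoms."""
    return max(dot(x, a) for a in atoms)

def in_l1_tangent_cone(M, h, tol=1e-12):
    """Membership in T_A(M) from (8) via the one-sided derivative of ||M + t h||_1 at t = 0."""
    d = 0.0
    for m, g in zip(M, h):
        d += g if m > 0 else (-g if m < 0 else abs(g))
    return d <= tol

def decreases_along(M, h, t):
    """Direct check of the set in (8) at a single step size t > 0."""
    return atomic_norm_l1([m + t * g for m, g in zip(M, h)]) <= atomic_norm_l1(M)

rng = random.Random(0)
p = 5
atoms = l1_atoms(p)
x = [rng.gauss(0, 1) for _ in range(p)]
y = [rng.gauss(0, 1) for _ in range(p)]
print(dual_norm(x, atoms) == max(abs(t) for t in x))                   # dual of l1 is l_inf
print(dot(x, y) <= atomic_norm_l1(x) * dual_norm(y, atoms) + 1e-12)    # relation (7)

M = [1.0, 0.0, 0.0]
print(in_l1_tangent_cone(M, [-1.0, 0.5, 0.4]), decreases_along(M, [-1.0, 0.5, 0.4], 0.1))
print(in_l1_tangent_cone(M, [-1.0, 0.6, 0.6]), decreases_along(M, [-1.0, 0.6, 0.6], 0.1))
```

The last two lines show the derivative test and the direct check of (8) agreeing on one direction inside the cone and one outside it.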
Example 1 For sparse signal recovery in high-dimensional linear regression, the atom set consists of the unit basis vectors $\{\pm e_i\}$. The atomic norm is the vector $\ell_1$ norm, and its dual norm is the vector $\ell_\infty$ norm. The convex hull $\mathrm{conv}(\mathcal{A})$ is called the cross-polytope. Figure 4 illustrates the tangent cone of the 3D $\ell_1$ norm ball in three different cases, $T_{\mathcal{A}}(M_i)$, $1 \le i \le 3$. The angle, or complexity, of the local tangent cone determines the difficulty of recovery. Most previous work showed that the algebraic characterization of the parameter space (sparsity) drives the global rate. We argue that the geometric characterization through the local tangent cone provides an intuitive and refined local approach to high-dimensional linear inverse problems.

[Figure 3: Tangent cone general illustration in 2D. The red shaded area is the scaled convex hull of the atom set. The blue dashed lines form the tangent cone at M. The black arrow denotes a possible direction inside the cone.]

[Figure 4: Tangent cone illustration in 3D for sparse regression. The tangent cones at three possible locations $M_i$, $1 \le i \le 3$, differ, and the cones become more complex as $i$ increases.]

Example 2 In trace regression and matrix completion, the goal is to recover low-rank matrices. In such settings, the atom set consists of the rank-one matrices (matrix manifold) $\mathcal{A} = \{uv^* : \|u\|_{\ell_2} = 1, \|v\|_{\ell_2} = 1\}$. The atomic norm is the nuclear norm, and the dual norm is the spectral norm. The convex hull $\mathrm{conv}(\mathcal{A})$ is called the nuclear norm ball of matrices. The position of the true parameter on the scaled nuclear norm ball determines the geometry of the local tangent cone, and thus affects the estimation difficulty.

Example 3 In integer programming, one wants to recover sign vectors whose entries take the values $\pm 1$. The atom set is the set of all sign vectors (cardinality $2^p$), and the convex hull $\mathrm{conv}(\mathcal{A})$ is the hypercube. In this case, the tangent cones at all parameters have the same structure.

Example 4 In orthogonal matrix recovery, the matrix of interest is constrained to be orthogonal. In this

case, the atom set is the set of all orthogonal matrices, and the convex hull $\mathrm{conv}(\mathcal{A})$ is the spectral norm ball. As in sign vector recovery, the local tangent cones at all orthogonal matrices share similar geometric properties.

2.2 Gaussian Width, Sudakov Estimate, and Other Geometric Quantities

Our theoretical analysis relies on several key geometric quantities. We first introduce two complexity measures, the Gaussian width and the Sudakov estimate.

Definition 1 (Gaussian Width) For a compact set $K \subset \mathbb{R}^p$, the Gaussian width is defined as

$$w(K) := \mathbb{E}_g\Big[\sup_{v \in K} \langle g, v \rangle\Big], \qquad (9)$$

where $g \sim N(0, I_p)$ is a standard multivariate Gaussian vector.

The Gaussian width quantifies the probability that a randomly oriented subspace misses a convex subset. It was introduced in Gordon's analysis (Gordon, 1988). It was recently shown to play a crucial role in linear inverse problems in various noiseless or deterministic noise settings; see, for example, Amelunxen et al. (2013). Explicit upper bounds on the Gaussian width of different convex sets are given in Chandrasekaran et al. (2012) and Amelunxen et al. (2013). The bounds include the following four cases:

- If $M \in \mathbb{R}^p$ is an $s$-sparse vector, then $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{s \log(p/s)}$.
- If $M \in \mathbb{R}^{p \times q}$ is a rank-$r$ matrix, then $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{r(p + q - r)}$.
- If $M$ is a sign vector in $\mathbb{R}^p$, then $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{p}$.
- If $M$ is an orthogonal matrix in $\mathbb{R}^{m \times m}$, then $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{m(m-1)}$.

See the propositions in Section 3.4 of Chandrasekaran et al. (2012) for detailed calculations. The Gaussian width of the local tangent cone serves as the complexity measure in the upper bound analysis of Sections 3 and 4.

Definition 2 (Sudakov Minoration Estimate) The Sudakov estimate of a compact set $K \subset \mathbb{R}^p$ is defined as

$$e(K) := \sup_{\epsilon > 0}\, \epsilon \sqrt{\log \mathcal{N}(K, \epsilon)}, \qquad (10)$$

where $\mathcal{N}(K, \epsilon)$ denotes the $\epsilon$-covering number of the set $K$ with respect to the Euclidean norm. The Sudakov estimate is widely known in the literature to capture the complexity of a general function class (Yang and Barron, 1999).
The Sudakov estimate balances the covering radius $\epsilon$ against the cardinality of the covering set at that scale. It selects the radius $\epsilon$ that maximizes $\epsilon \sqrt{\log \mathcal{N}(B_2^p \cap T_{\mathcal{A}}(M), \epsilon)}$, and this maximum determines the complexity of the set $B_2^p \cap T_{\mathcal{A}}(M)$. The Sudakov estimate of the local tangent cone is useful as a complexity measure in the minimax lower bound analysis.
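Both quantities can be approximated for a finite point cloud $K$. The Python sketch below is an illustration under our own simplifications. It estimates $w(K)$ in (9) by Monte Carlo averaging of $\max_{v \in K} \langle g, v \rangle$. It estimates $e(K)$ in (10) over a grid of radii using a greedy $\epsilon$-net. Every kept point is at distance at least $\epsilon$ from earlier kept points, so the net is an $\epsilon$-packing. Every point lies within distance less than $\epsilon$ of a kept point, so the net is also an $\epsilon$-covering. Its size is therefore sandwiched between $\mathcal{N}(K, \epsilon)$ and $\mathcal{M}(K, \epsilon)$. Applying this to $B_2^p \cap T_{\mathcal{A}}(M)$ would first require a finite discretization of that set, which is not shown.

```python
import math
import random

def gaussian_width_mc(points, num_samples, rng):
    """Monte Carlo estimate of w(K) = E sup_{v in K} <g, v> for a finite set K."""
    p = len(points[0])
    total = 0.0
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        total += max(sum(gi * vi for gi, vi in zip(g, v)) for v in points)
    return total / num_samples

def greedy_net_size(points, eps):
    """Size of a greedy eps-net, which is both an eps-packing and an eps-covering,
    so N(K, eps) <= size <= M(K, eps)."""
    centers = []
    for v in points:
        if all(math.dist(v, c) >= eps for c in centers):
            centers.append(v)
    return len(centers)

def sudakov_estimate(points, eps_grid):
    """Maximum over the grid of eps * sqrt(log(net size)), a proxy for e(K) in (10)."""
    return max(eps * math.sqrt(math.log(greedy_net_size(points, eps))) for eps in eps_grid)

# Example: K = {±e_i} in R^50, the l1 atom set (the extreme points of the cross-polytope).
p = 50
atoms = []
for i in range(p):
    for s in (1.0, -1.0):
        e = [0.0] * p
        e[i] = s
        atoms.append(e)

w_hat = gaussian_width_mc(atoms, num_samples=500, rng=random.Random(0))
e_hat = sudakov_estimate(atoms, [0.25 * k for k in range(1, 9)])
print(round(w_hat, 2), round(math.sqrt(2 * math.log(2 * p)), 2), round(e_hat, 2))
```

For this set, $w(K) = \mathbb{E}\max_i |g_i|$, which stays below the classical bound $\sqrt{2 \log(2p)}$ printed alongside it. Lemma 1 below relates the two estimates only up to an unspecified universal constant $c$.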

[Figure 5: Gaussian width.] [Figure 6: Sudakov estimate.]

The following Sudakov minoration and Dudley entropy integral (Dudley, 1967; Ledoux and Talagrand, 1991) relate the two geometric quantities, the Gaussian width $w(\cdot)$ and the Sudakov estimate $e(\cdot)$.

Lemma 1 (Sudakov Minoration and Dudley Entropy Integral) For any compact subset $K \subset \mathbb{R}^p$, there exists a universal constant $c > 0$ such that

$$c \cdot e(K) \le w(K) \le 24 \int_0^\infty \sqrt{\log \mathcal{N}(K, \epsilon)}\, d\epsilon. \qquad (11)$$

Another complexity measure in the literature, the volume ratio, has also been used to characterize minimax lower bounds (Ma and Wu, 2013). The volume ratio has been studied in Pisier (1999) and Vershynin (2011). For a convex set $K \subset \mathbb{R}^p$, the present paper uses the following definition.

Definition 3 (Volume Ratio) The volume ratio is defined as

$$v(K) := \sqrt{p} \left( \frac{\mathrm{vol}(K)}{\mathrm{vol}(B_2^p)} \right)^{1/p}. \qquad (12)$$

Urysohn's inequality, proved through the Brunn-Minkowski theorem, links the Gaussian width $w(\cdot)$ with the volume ratio $v(\cdot)$.

Lemma 2 (Urysohn's Inequality) Let $K$ be a compact subset of $\mathbb{R}^p$. Then $v(K) \lesssim w(K)$. Equality in Urysohn's inequality holds if and only if $K$ is the $\ell_2$ ball $B_2^p$.

The recovery difficulty of the linear inverse problem also depends on other geometric quantities defined on the local tangent cone $T_{\mathcal{A}}(M)$. These are the local isometry constants $\phi_{\mathcal{A}}(M, \mathcal{X})$ and $\psi_{\mathcal{A}}(M, \mathcal{X})$ and the

local asphericity ratio $\gamma_{\mathcal{A}}(M)$. The local isometry constants are defined on the local tangent cone at the true parameter $M$ as

$$\phi_{\mathcal{A}}(M, \mathcal{X}) := \inf\left\{ \frac{\|\mathcal{X}(h)\|_{\ell_2}}{\|h\|_{\ell_2}} : h \in T_{\mathcal{A}}(M),\ h \ne 0 \right\}, \qquad (13)$$

$$\psi_{\mathcal{A}}(M, \mathcal{X}) := \sup\left\{ \frac{\|\mathcal{X}(h)\|_{\ell_2}}{\|h\|_{\ell_2}} : h \in T_{\mathcal{A}}(M),\ h \ne 0 \right\}. \qquad (14)$$

The local isometry constants measure how well the linear operator preserves the $\ell_2$ norm within the local tangent cone. Intuitively, recovery is harder when $\psi$ is larger or $\phi$ is smaller. We will see later that, under the Gaussian ensemble design, the Gaussian width determines the local isometry constants. The local asphericity ratio is defined as

$$\gamma_{\mathcal{A}}(M) := \sup\left\{ \frac{\|h\|_{\mathcal{A}}}{\|h\|_{\ell_2}} : h \in T_{\mathcal{A}}(M),\ h \ne 0 \right\}. \qquad (15)$$

It measures how large the atomic norm can be relative to the $\ell_2$ norm within the local tangent cone.

2.3 Point Estimation via Convex Relaxation

We now return to the linear inverse model (1) in the high-dimensional setting. Suppose we observe $(\mathcal{X}, Y)$ as in (1), where the parameter of interest $M$ has low complexity with respect to a given atom set $\mathcal{A}$. The low complexity of $M$ introduces a non-convex constraint, which causes serious computational difficulties if the problem is solved directly. Convex relaxation is an effective and natural approach in such a setting. To estimate $M$, we propose a generic convex constrained minimization procedure induced by the atomic norm and the corresponding dual norm:

$$\hat{M} = \arg\min_{M} \left\{ \|M\|_{\mathcal{A}} : \|\mathcal{X}^*(Y - \mathcal{X}(M))\|_{\mathcal{A}}^* \le \lambda \right\}. \qquad (16)$$

Here $\lambda$ is a tuning parameter (localization radius) that depends on the sample size, the noise level, and the geometry of the atom set $\mathcal{A}$. Display (20) gives an explicit formula for $\lambda$ in the case of Gaussian noise. Intuitively, the atomic norm minimization (16) is a convex relaxation of the low complexity structure. The parameter $\lambda$ specifies the localization scale for the given noise distribution. This generic convex program exploits duality and recovers the low complexity structure adaptively. The Dantzig selector for high-dimensional sparse regression (Candès and Tao, 2007) is a particular example of (16). So is constrained nuclear norm minimization for trace regression (Candès and Plan, 2011).
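Consider the $\ell_1$ atom set and an $s$-sparse $M$ with support $S$. Any $h \in T_{\mathcal{A}}(M)$ satisfies $\|h_{S^c}\|_{\ell_1} \le \|h_S\|_{\ell_1}$. Hence $\|h\|_{\ell_1} \le 2\|h_S\|_{\ell_1} \le 2\sqrt{s}\,\|h\|_{\ell_2}$, so $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$ in (15). The Python sketch below samples directions on the boundary of this cone and reports the largest observed ratio $\|h\|_{\ell_1}/\|h\|_{\ell_2}$. This is a Monte Carlo lower estimate of the supremum in (15), so it should never exceed $2\sqrt{s}$. The sampling scheme and the function names are illustrative assumptions.

```python
import math
import random

def cone_gap(M, h):
    """One-sided derivative of ||M + t h||_1 at t = 0; h is in T_A(M) iff this is <= 0."""
    return sum(math.copysign(1.0, m) * g if m != 0 else abs(g) for m, g in zip(M, h))

def sample_cone_direction(M, rng):
    """Random h with cone_gap(M, h) = 0 (up to rounding), i.e. on the boundary of T_A(M)."""
    support = [i for i, m in enumerate(M) if m != 0]
    off = [i for i, m in enumerate(M) if m == 0]
    h = [0.0] * len(M)
    for i in support:
        h[i] = rng.gauss(0.0, 1.0)
    budget = -sum(math.copysign(1.0, M[i]) * h[i] for i in support)
    if budget < 0:  # flip h_S so that the l1 budget left for the off-support part is >= 0
        for i in support:
            h[i] = -h[i]
        budget = -budget
    if off:
        raw = [rng.gauss(0.0, 1.0) for _ in off]
        total = sum(abs(r) for r in raw)
        for i, r in zip(off, raw):
            h[i] = budget * r / total  # the off-support l1 norm equals the budget
    return h

def asphericity_lower_estimate(M, num_samples, rng):
    """Largest observed ||h||_1 / ||h||_2 over sampled cone directions."""
    best = 0.0
    for _ in range(num_samples):
        h = sample_cone_direction(M, rng)
        norm2 = math.sqrt(sum(t * t for t in h))
        if norm2 > 0:
            best = max(best, sum(abs(t) for t in h) / norm2)
    return best

M = [1.0, -2.0, 0.5] + [0.0] * 17        # s = 3 nonzeros in p = 20
gamma_hat = asphericity_lower_estimate(M, 2000, random.Random(0))
print(gamma_hat, 2 * math.sqrt(3))        # observed ratio vs. the bound 2 * sqrt(s)
```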
The properties of the estimator $\hat{M}$ will be investigated in Sections 3 and 4.

2.4 Statistical Inference via Feasibility of Convex Program

In the high-dimensional setting, p-values and confidence intervals are important inferential questions beyond point estimation. In this section we show how to perform statistical inference for the

linear inverse model (1). Let $M \in \mathbb{R}^p$ be the vectorized parameter of interest, and let $\{e_i, 1 \le i \le p\}$ be the corresponding basis vectors. Consider the following convex feasibility problem for a matrix $\Omega \in \mathbb{R}^{p \times p}$, where each row $\Omega_i$ satisfies

$$\|\mathcal{X}^* \mathcal{X}\, \Omega_i - e_i\|_{\mathcal{A}}^* \le \eta, \quad 1 \le i \le p. \qquad (17)$$

Here $\eta$ is a tuning parameter that depends on the sample size and the geometry of the atom set $\mathcal{A}$. One can also solve a stronger version of this convex program for $\eta \in \mathbb{R}$ and $\Omega \in \mathbb{R}^{p \times p}$ simultaneously:

$$(\Omega, \eta) = \arg\min_{\Omega, \eta} \left\{ \eta : \|\mathcal{X}^* \mathcal{X}\, \Omega_i - e_i\|_{\mathcal{A}}^* \le \eta,\ 1 \le i \le p \right\}. \qquad (18)$$

The de-biased estimator for inference on the parameter $M$ combines the constrained minimization estimator $\hat{M}$ in (16) with the feasible matrix $\Omega$ in (18):

$$\tilde{M} := \hat{M} + \Omega\, \mathcal{X}^*(Y - \mathcal{X}(\hat{M})). \qquad (19)$$

We will establish asymptotic normality for finite linear contrasts $\langle v, M \rangle$ with $v \in \mathbb{R}^p$, $\|v\|_{\ell_2} = 1$ and $\|v\|_{\ell_0} \le k$, where $k$ does not grow with $n$ and $p$. Based on this result, we construct confidence intervals and hypothesis tests. In high-dimensional linear regression, de-biased estimators have been investigated in Bühlmann (2013); Zhang and Zhang (2014); van de Geer et al. (2014); Javanmard and Montanari (2014). The convex feasibility program proposed here can be viewed as a unified treatment for general linear inverse models. We will show that, under some conditions on the sample size and the local tangent cone, asymptotic confidence intervals and hypothesis tests for finite linear contrasts $\langle v, M \rangle$ are valid. These contrasts include the individual coordinates of $M$ as a special case.

3 Local Geometric Theory: Gaussian Setting

In this section we establish a general theory of geometric inference for the linear inverse problem under the Gaussian setting. In this setting, the noise vector $Z$ is Gaussian and the linear operator $\mathcal{X}$ is a Gaussian ensemble design in the following sense.

Definition 4 (Gaussian Ensemble Design) Let $X \in \mathbb{R}^{n \times p}$ also denote the matrix form of the linear operator $\mathcal{X}: \mathbb{R}^p \to \mathbb{R}^n$. $X$ is a Gaussian ensemble if its entries are i.i.d. Gaussian random variables with mean 0 and variance $1/n$.

Our analysis differs substantially from the case-by-case global analyses of the Dantzig selector, the Lasso and nuclear norm minimization.
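The de-biasing step (19) can be sanity-checked in the low-dimensional regime $n > p$ with the $\ell_1$ atoms. There the dual norm in (17) is the $\ell_\infty$ norm, and $\Omega = (X^* X)^{-1}$ is feasible with $\eta = 0$. In that case $\tilde{M} = \hat{M} + (X^*X)^{-1}X^*(Y - X\hat{M}) = (X^*X)^{-1}X^*Y$, the least-squares estimator, whatever the initial $\hat{M}$. Equivalently, the bias term $(I - \Omega X^* X)(\hat{M} - M)$ vanishes. The pure-Python sketch below checks this numerically. It only illustrates (17)-(19): in the high-dimensional case $X^*X$ is singular, and (18) has to be solved as a convex program, which is not shown. All names are hypothetical.

```python
import math
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (A is assumed invertible)."""
    k = len(A)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(A)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def feasibility_gap(X, Omega):
    """max_i ||X^T X Omega_i - e_i||_inf, i.e. constraint (17) with l1 atoms."""
    G = matmul(transpose(X), X)
    p = len(G)
    gap = 0.0
    for i in range(p):
        col = [sum(G[r][j] * Omega[i][j] for j in range(p)) for r in range(p)]
        gap = max(gap, max(abs(col[r] - (1.0 if r == i else 0.0)) for r in range(p)))
    return gap

def debias(X, Y, M_hat, Omega):
    """De-biased estimator (19): M_hat + Omega X^T (Y - X M_hat)."""
    resid = [y - sum(x * m for x, m in zip(row, M_hat)) for row, y in zip(X, Y)]
    score = [sum(row[j] * r for row, r in zip(X, resid)) for j in range(len(M_hat))]
    return [m + sum(o * s for o, s in zip(Omega[j], score)) for j, m in enumerate(M_hat)]

rng = random.Random(0)
n, p = 30, 3
X = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(p)] for _ in range(n)]
M_true = [1.0, -0.5, 0.0]
Y = [sum(x * m for x, m in zip(row, M_true)) + rng.gauss(0.0, 0.1) for row in X]
Omega = inverse(matmul(transpose(X), X))
a = debias(X, Y, [0.0, 0.0, 0.0], Omega)
b = debias(X, Y, [5.0, 5.0, 5.0], Omega)
print(feasibility_gap(X, Omega), max(abs(s - t) for s, t in zip(a, b)))
```

Both printed numbers are at rounding level: $\eta = 0$ is attained, and the initial estimate has no effect on $\tilde{M}$.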
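We show a stronger result, one that adapts to the geometry of the local tangent cone. All the analyses in our theory are non-asymptotic, and the constants are explicit. Another advantage is that the local analysis yields robustness for a given parameter (one with near, but not exactly, low complexity). This is because the convergence rate is captured by the geometry of the local tangent cone associated with a given $M$. Later, in Section 4, we will show how to extend the theory to a more general setting.

Without loss of generality, we assume in our analysis that the atom set $\mathcal{A}$ is scaled so that $\sup_{v\in\mathcal{A}}\|v\|_{\ell_2} = 1$; that is, the atom set $\mathcal{A}$ is embedded into the unit Euclidean ball.

3.1 Local Geometric Upper Bound

For the upper bound analysis, we need to choose a suitable localization radius $\lambda$ (in the convex program (16)) to guarantee that the true parameter $M$ lies in the feasible set with high probability. Under the Gaussian noise assumption, the tuning parameter is chosen as
$$\lambda_{\mathcal{A}}(\mathcal{X},\sigma,n) = \frac{\sigma}{\sqrt{n}}\Big\{w(\mathcal{X}\mathcal{A}) + \delta\sup_{v\in\mathcal{A}}\|\mathcal{X}v\|_{\ell_2}\Big\} \asymp \frac{\sigma}{\sqrt{n}}\,w(\mathcal{X}\mathcal{A}), \tag{20}$$
where $\mathcal{X}\mathcal{A}$ is the image of the atom set under the linear operator $\mathcal{X}$. The constant $\delta > 0$ can be chosen according to the probability of success we would like to attain ($\delta$ is commonly chosen of order $\sqrt{\log p}$). Note that $\lambda_{\mathcal{A}}(\mathcal{X},\sigma,n)$ is a global parameter: it depends on the linear operator $\mathcal{X}$ and the atom set $\mathcal{A}$ but, importantly, not on the complexity of $M$. The following theorem geometrizes the local rate of convergence in the Gaussian case.

Theorem 1 (Gaussian Ensemble: Convergence Rate) Suppose we observe $(\mathcal{X}, Y)$ as in (1) with the Gaussian ensemble design and $Z \sim N(0, \frac{\sigma^2}{n}I_n)$. Let $\hat M$ be the solution of (16) with $\lambda$ chosen as in (20), and let $0 < c < 1$ be a constant. For any $\delta > 0$, if
$$n \ge \frac{4[w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c},$$
then with probability at least $1 - 3\exp(-\delta^2/2)$,
$$\|\hat M - M\|_{\ell_2} \lesssim \frac{2\sigma}{(1-c)^2}\,\frac{\gamma_{\mathcal{A}}(M)\,w(\mathcal{X}\mathcal{A})}{\sqrt{n}}, \quad \|\hat M - M\|_{\mathcal{A}} \lesssim \frac{2\sigma}{(1-c)^2}\,\frac{\gamma_{\mathcal{A}}^2(M)\,w(\mathcal{X}\mathcal{A})}{\sqrt{n}}, \quad \|\mathcal{X}(\hat M - M)\|_{\ell_2} \lesssim \frac{2\sigma}{1-c}\,\frac{\gamma_{\mathcal{A}}(M)\,w(\mathcal{X}\mathcal{A})}{\sqrt{n}}.$$

Theorem 1 bounds the estimation error under both the $\ell_2$ norm loss and the atomic norm loss, as well as the in-sample prediction error. The upper bounds are determined by the geometric quantities $w(\mathcal{X}\mathcal{A})$, $\gamma_{\mathcal{A}}(M)$ and $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$. Take, for example, the estimation error under the $\ell_2$ loss. Given any $\epsilon > 0$, the smallest sample size that ensures the recovery error $\|\hat M - M\|_{\ell_2} \le \epsilon$ with probability at least $1 - 3\exp(-\delta^2/2)$ is
$$n \ge \max\left\{\frac{4\sigma^2}{(1-c)^4}\,\frac{\gamma_{\mathcal{A}}^2(M)\,w^2(\mathcal{X}\mathcal{A})}{\epsilon^2},\ \frac{4w^2(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))}{c^2}\right\}.$$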
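The tuning parameter (20) and the sample-size display above are plain arithmetic once the geometric quantities are known. The sketch below evaluates them and is only a calculator: the caller must supply the Gaussian widths and $\gamma_{\mathcal{A}}(M)$. The function names are hypothetical. The plug-in values in the usage lines are order-of-magnitude stand-ins with all unit constants set to 1; they are assumptions for illustration, not results from the paper.

```python
import math


def tuning_lambda(sigma, n, w_XA, delta, sup_norm_Xv):
    """Tuning parameter of eq. (20):
    lambda = sigma / sqrt(n) * (w(XA) + delta * sup_{v in A} ||X v||_2).
    w_XA and sup_norm_Xv must be supplied by the caller (e.g. by Monte Carlo)."""
    return sigma / math.sqrt(n) * (w_XA + delta * sup_norm_Xv)


def min_sample_size(sigma, gamma, w_XA, w_T, eps, c):
    """Sample size from the display after Theorem 1 guaranteeing ||M_hat - M||_2 <= eps.

    gamma : asphericity ratio gamma_A(M)
    w_XA  : Gaussian width w(XA)
    w_T   : Gaussian width w(B_2^p ∩ T_A(M))
    c     : the constant 0 < c < 1 of Theorem 1
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    accuracy_term = 4.0 * sigma**2 * gamma**2 * w_XA**2 / ((1.0 - c) ** 4 * eps**2)
    isometry_term = 4.0 * w_T**2 / c**2
    return math.ceil(max(accuracy_term, isometry_term))


# Hypothetical order-of-magnitude plug-ins for s-sparse regression (unit constants):
# gamma ~ 2 sqrt(s), w(XA) ~ sqrt(log p), w(B ∩ T) ~ sqrt(s log p).
s, p = 10, 1000
n_needed = min_sample_size(sigma=1.0, gamma=2 * math.sqrt(s),
                           w_XA=math.sqrt(math.log(p)),
                           w_T=math.sqrt(s * math.log(p)), eps=0.5, c=0.5)
```

With these plug-ins the accuracy term dominates, so the required $n$ is driven by $\gamma_{\mathcal{A}}^2(M)w^2(\mathcal{X}\mathcal{A})/\epsilon^2$ rather than by the isometry requirement.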

That is, the minimum sample size for guaranteed statistical accuracy is driven by two geometric terms, $w(\mathcal{X}\mathcal{A})\gamma_{\mathcal{A}}(M)$ and $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$. We will see in Section 3.4 that these two rates match in a range of specific high-dimensional estimation problems. Similar calculations apply to the other two loss functions. It should be noted that Theorem 1 provides a local analysis of the performance of the estimator for a given $M$. This is quite different from the usual global analysis over a large parameter space.

The proof of Theorem 1 (and of Theorem 4 in Section 4) relies on two key lemmas. The first concerns the choice of the tuning parameter $\lambda$ in the Gaussian case.

Lemma 3 (Choice of Tuning Parameter) Consider the linear inverse model (1) with $Z \sim N(0, \frac{\sigma^2}{n}I_n)$. For any $\delta > 0$, with probability at least $1 - \exp(-\delta^2/2)$,
$$\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \frac{\sigma}{\sqrt{n}}\Big\{w(\mathcal{X}\mathcal{A}) + \delta\sup_{v\in\mathcal{A}}\|\mathcal{X}v\|_{\ell_2}\Big\}. \tag{21}$$
This lemma is proved in Section 6. The particular value of $\lambda_{\mathcal{A}}(\mathcal{X},\sigma,n)$ for a range of examples will be calculated in Section 3.4.

The next lemma addresses the local behavior of the linear operator $\mathcal{X}$ around the true parameter $M$ under the Gaussian ensemble design. We call a linear operator locally near-isometric if its local isometry constants are uniformly bounded. The following lemma shows that, in the widely used Gaussian ensemble case, the local isometry constants are guaranteed to be bounded provided the sample size is at least of order $[w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))]^2$. Hence the difficulty of the problem is captured by the Gaussian width.

Lemma 4 (Local Isometry Bound for Gaussian Ensemble) Assume the linear operator $\mathcal{X}$ is the Gaussian ensemble design, and let $0 < c < 1$ be a constant. For any $\delta > 0$, if
$$n \ge \frac{4[w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c},$$
then with probability at least $1 - 2\exp(-\delta^2/2)$ the local isometry constants are close to 1:
$$\phi_{\mathcal{A}}(M,\mathcal{X}) \ge 1 - c \quad\text{and}\quad \psi_{\mathcal{A}}(M,\mathcal{X}) \le 1 + c.$$
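Lemma 3 can be checked numerically in the simplest case. The sketch below uses the $\ell_1$ atom set $\mathcal{A} = \{\pm e_i\}$, for which the dual atomic norm is the $\ell_\infty$ norm and $w(\mathcal{X}\mathcal{A}) = \mathbb{E}\|\mathcal{X}^\top g\|_{\ell_\infty}$. It assumes a fixed Gaussian ensemble draw and estimates the width by Monte Carlo rather than exactly. It then counts how often the noise violates (21). The function name and the sizes $n, p$ are hypothetical choices for illustration.

```python
import numpy as np


def lemma3_check(n=200, p=500, sigma=1.0, delta=2.0,
                 n_width=2000, n_noise=2000, seed=0):
    """Monte Carlo illustration of Lemma 3 for the l1 atom set A = {±e_i}."""
    rng = np.random.default_rng(seed)
    # Gaussian ensemble design (Definition 4): i.i.d. N(0, 1/n) entries.
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    # For A = {±e_i}: sup_{a in A} <g, X a> = max_i |(X^T g)_i|, so
    # w(XA) = E ||X^T g||_inf, estimated here by averaging over fresh g ~ N(0, I_n).
    G = rng.standard_normal((n_width, n))
    w_XA = np.abs(G @ X).max(axis=1).mean()
    # sup_{v in A} ||X v||_2 is the largest column norm of X.
    sup_Xv = np.linalg.norm(X, axis=0).max()
    bound = sigma / np.sqrt(n) * (w_XA + delta * sup_Xv)  # right-hand side of (21)
    # Noise Z ~ N(0, sigma^2/n I_n); the dual norm for A = {±e_i} is the l_inf norm.
    Z = rng.normal(0.0, sigma / np.sqrt(n), size=(n_noise, n))
    dual_norms = np.abs(Z @ X).max(axis=1)
    violation_rate = float(np.mean(dual_norms > bound))
    return float(w_XA), float(bound), violation_rate


w_XA, bound, rate = lemma3_check()
```

Lemma 3 predicts a violation rate of at most $\exp(-\delta^2/2) \approx 0.135$ for $\delta = 2$. In practice the empirical rate is typically far smaller, since the Lipschitz concentration bound behind the lemma is conservative.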
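3.2 Local Geometric Inference: Confidence Intervals and Hypothesis Testing

For statistical inference on the general linear inverse model, we would like to choose the smallest $\eta$ in (17) such that, under the Gaussian ensemble design, the feasible set of (17) is non-empty with high probability. The following theorem establishes geometric inference for Model (1).

Theorem 2 (Geometric Inference) Suppose we observe $(\mathcal{X}, Y)$ as in (1) with the Gaussian ensemble design and $Z \sim N(0, \frac{\sigma^2}{n}I_n)$. Let $\hat M \in \mathbb{R}^p$ and $\Omega \in \mathbb{R}^{p\times p}$ be the solutions of (16) and (17), and let $\tilde M \in \mathbb{R}^p$ be the de-biased estimator as in (19). Assume $p \ge n \gtrsim w^2(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$. If the tuning parameters $\lambda, \eta$ are chosen with
$$\lambda \asymp \frac{\sigma}{\sqrt{n}}\,w(\mathcal{X}\mathcal{A}), \qquad \eta \asymp \frac{1}{\sqrt{n}}\,w(\mathcal{X}\mathcal{A}),$$
then the convex programs (16) and (17) have non-empty feasible sets (for (17), in $\Omega$) with high probability. The decomposition
$$\sqrt{n}(\tilde M - M) = \Delta + \sigma\,\Omega\mathcal{X}^* W \tag{22}$$
holds, where $W \sim N(0, I_n)$ is the standard Gaussian vector, so that $\Omega\mathcal{X}^*W \sim N(0, \Omega\mathcal{X}^*\mathcal{X}\Omega^\top)$, and $\Delta \in \mathbb{R}^p$ satisfies
$$\|\Delta\|_{\ell_\infty} \lesssim \sqrt{n}\,\gamma_{\mathcal{A}}^2(M)\,\lambda\eta \asymp \frac{\sigma\,\gamma_{\mathcal{A}}^2(M)\,w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}}.$$
Suppose
$$\lim_{n,p\to\infty}\frac{\gamma_{\mathcal{A}}^2(M)\,w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$
Then for any $v \in \mathbb{R}^p$ with $\|v\|_{\ell_2} = 1$ and $\|v\|_{\ell_0} \le k$, $k$ finite, the functional $\langle v, \tilde M\rangle$ is asymptotically normal:
$$\frac{\sqrt{n}\big(\langle v, \tilde M\rangle - \langle v, M\rangle\big)}{\sigma\sqrt{v^\top[\Omega\mathcal{X}^*\mathcal{X}\Omega^\top]v}} \xrightarrow{\ n,p\to\infty\ } N(0,1). \tag{23}$$

It follows from Theorem 2 (taking $v = e_i$) that a valid asymptotic $(1-\alpha)$-level confidence interval for $M_i$, $1 \le i \le p$, is
$$\left[\tilde M_i + \Phi^{-1}\Big(\frac{\alpha}{2}\Big)\sqrt{\frac{[\Omega\mathcal{X}^*\mathcal{X}\Omega^\top]_{ii}}{n}}\,\sigma,\ \ \tilde M_i + \Phi^{-1}\Big(1-\frac{\alpha}{2}\Big)\sqrt{\frac{[\Omega\mathcal{X}^*\mathcal{X}\Omega^\top]_{ii}}{n}}\,\sigma\right]. \tag{24}$$

Suppose we are interested in a low-dimensional linear contrast $\langle v, M\rangle = v_0$ with $\|v\|_{\ell_2} = 1$, $\|v\|_{\ell_0} = k$ and $k$ fixed. Consider the hypothesis testing problem
$$H_0: \sum_{i=1}^p v_iM_i = v_0 \quad\text{vs.}\quad H_a: \sum_{i=1}^p v_iM_i \ne v_0.$$
The test statistic is
$$\frac{\sqrt{n}\big(\langle v, \tilde M\rangle - v_0\big)}{\sigma\big(v^\top[\Omega\mathcal{X}^*\mathcal{X}\Omega^\top]v\big)^{1/2}},$$
and under the null it follows an asymptotic standard normal distribution as $n \to \infty$.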
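Once $\tilde M$ and the matrix $\Omega\mathcal{X}^*\mathcal{X}\Omega^\top$ are available, the interval (24) and the test statistic above are direct formulas. The stdlib-only sketch below implements both. It assumes the caller has already solved (16) and (17); the function names are hypothetical. It uses $\Phi^{-1}(1-\alpha/2) = -\Phi^{-1}(\alpha/2)$ to write the interval symmetrically.

```python
from math import sqrt
from statistics import NormalDist


def coordinate_ci(m_tilde_i, var_ii, sigma, n, alpha=0.05):
    """Asymptotic (1 - alpha) interval (24) for M_i.

    var_ii is the diagonal entry [Omega X* X Omega^T]_{ii}."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # equals -Phi^{-1}(alpha / 2)
    half_width = z * sqrt(var_ii / n) * sigma
    return m_tilde_i - half_width, m_tilde_i + half_width


def contrast_statistic(v, m_tilde, cov, v0, sigma, n):
    """Test statistic for H0: <v, M> = v0, with cov = Omega X* X Omega^T (p x p nested list)."""
    p = len(v)
    estimate = sum(v[i] * m_tilde[i] for i in range(p))
    quad_form = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return sqrt(n) * (estimate - v0) / (sigma * sqrt(quad_form))


lo, hi = coordinate_ci(m_tilde_i=1.0, var_ii=1.0, sigma=1.0, n=100, alpha=0.05)
t = contrast_statistic([1.0, 0.0], [0.5, 2.0], [[1.0, 0.0], [0.0, 1.0]], 0.4, 1.0, 100)
```

In this toy call the interval is centered at 1.0 with half-width about $1.96 \times 0.1$, and the statistic equals $\sqrt{100}\,(0.5 - 0.4) = 1$.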

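Similarly, the p-value is of the form
$$2 - 2\Phi\left(\left|\frac{\sqrt{n}\big(\langle v, \tilde M\rangle - v_0\big)}{\sigma\big(v^\top[\Omega\mathcal{X}^*\mathcal{X}\Omega^\top]v\big)^{1/2}}\right|\right), \quad n \to \infty.$$
Note that the asymptotic normality holds for any finite linear contrast. Moreover, the asymptotic variance nearly achieves the Fisher information lower bound, since $\Omega$ is an estimate of the inverse of $\mathcal{X}^*\mathcal{X}$. For fixed-dimension inference, the Fisher information lower bound is asymptotically optimal.

Remark 1 Note that the condition for estimation consistency of the parameter $M$ under the $\ell_2$ norm is
$$\lim_{n,p\to\infty}\frac{\gamma_{\mathcal{A}}(M)\,w(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$
In contrast, valid confidence intervals require the stronger condition
$$\lim_{n,p\to\infty}\frac{\gamma_{\mathcal{A}}^2(M)\,w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$
When $n > p$ under the Gaussian ensemble design, $\mathcal{X}^*\mathcal{X}$ is non-singular with high probability. With the choice $\Omega = (\mathcal{X}^*\mathcal{X})^{-1}$ and $\eta = 0$, the relation
$$\sqrt{n}(\tilde M_i - M_i) \sim N\big(0, \sigma^2[(\mathcal{X}^*\mathcal{X})^{-1}]_{ii}\big)$$
holds non-asymptotically for any $i \in [p]$.

3.3 Minimax Lower Bound for Local Tangent Cone

As seen in Sections 3.1 and 3.2, the local tangent cone plays an important role in the upper bound analysis. In this section, we restrict the parameter space to the local tangent cone and examine how the geometry of the cone affects the minimax lower bound.

Theorem 3 (Lower Bound Based on Local Tangent Cone) Suppose we observe $(\mathcal{X}, Y)$ as in (1) with the Gaussian ensemble design and $Z \sim N(0, \frac{\sigma^2}{n}I_n)$. Let $M$ be the true parameter of interest and let $0 < c < 1$ be a constant. For any $\delta > 0$, if
$$n \ge \frac{4[w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c},$$
then with probability at least $1 - 2\exp(-\delta^2/2)$,
$$\inf_{\hat M}\sup_{M'\in T_{\mathcal{A}}(M)}\mathbb{E}_{\cdot\mid\mathcal{X}}\|\hat M - M'\|_{\ell_2}^2 \ge \frac{c_0\sigma^2}{(1+c)^2}\left(\frac{e(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))}{\sqrt{n}}\right)^2$$
for some universal constant $c_0 > 0$. Here $\mathbb{E}_{\cdot\mid\mathcal{X}}$ denotes the conditional expectation given the design matrix $\mathcal{X}$, and the probability statement is with respect to the distribution of $\mathcal{X}$ under the Gaussian ensemble design.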
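The $n > p$ case of Remark 1 makes the de-biasing step (19) easy to see concretely. With $\Omega = (\mathcal{X}^*\mathcal{X})^{-1}$, the residual $\mathcal{X}^*\mathcal{X}\Omega_{i\cdot}^\top - e_i$ vanishes, so $\eta = 0$. The de-biased estimator then collapses to ordinary least squares whatever pilot $\hat M$ is used. The numpy sketch below checks this on simulated data. The sizes, seed and function name are illustrative assumptions, and the residual is measured in the $\ell_\infty$ norm (the dual norm for the $\ell_1$ atom set).

```python
import numpy as np


def debias(X, Y, M_hat, Omega):
    """De-biased estimator (19): M_tilde = M_hat + Omega X^T (Y - X M_hat)."""
    return M_hat + Omega @ (X.T @ (Y - X @ M_hat))


rng = np.random.default_rng(1)
n, p, sigma = 400, 20, 1.0
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))       # Gaussian ensemble, n > p
M = rng.standard_normal(p)
Y = X @ M + rng.normal(0.0, sigma / np.sqrt(n), size=n)  # Z ~ N(0, sigma^2/n I_n)

Omega = np.linalg.inv(X.T @ X)
# Column i of X^T X Omega^T - I is X*X Omega_i^T - e_i, the residual constrained in (17).
eta = np.abs(X.T @ X @ Omega.T - np.eye(p)).max()        # zero up to rounding

M_tilde_zero = debias(X, Y, np.zeros(p), Omega)             # pilot M_hat = 0
M_tilde_rand = debias(X, Y, rng.standard_normal(p), Omega)  # arbitrary pilot M_hat
M_ols = np.linalg.lstsq(X, Y, rcond=None)[0]

# Plug-in standard errors sigma * sqrt([Omega X* X Omega^T]_ii / n) entering (24).
std_err = sigma * np.sqrt(np.diag(Omega @ X.T @ X @ Omega.T) / n)
```

Because $\tilde M$ equals the least-squares fit here, $\sqrt{n}(\tilde M_i - M_i)$ is exactly Gaussian. That is the non-asymptotic statement in Remark 1. In the $p \ge n$ regime, $\Omega$ must instead come from the feasibility program (17) or (18).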

In the Gaussian setting, when $n \gtrsim w^2(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$, we have the following observations. From Theorem 1, the local upper bound is essentially determined by $\gamma_{\mathcal{A}}^2(M)w^2(\mathcal{X}\mathcal{A})/n$. As we will show in Section 3.4 for many examples, this is of the rate $w^2(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))/n$. The general relationship between these two quantities is given in Lemma 5 below.

Lemma 5 For any atom set $\mathcal{A}$, we have the relation
$$\gamma_{\mathcal{A}}(M)\,w(\mathcal{A}) \ge w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M)),$$
where $w(\cdot)$ is the Gaussian width and $\gamma_{\mathcal{A}}(M)$ is defined in (15).

Lemma 5 is proved in Appendix A. From Theorem 3, the minimax lower bound for estimation over the local tangent cone is determined by the Sudakov estimate $e^2(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$. An interesting question is how the two terms $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$ and $e(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$ are related to each other. It follows directly from Lemma 1 that there exists a universal constant $c > 0$ such that
$$c\,e(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M)) \le w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M)) \le 24\int_0^\infty\sqrt{\log N(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M), \epsilon)}\,d\epsilon.$$
Thus, under the Gaussian setting, geometric complexity measures govern the difficulty of the estimation problem in terms of both the upper bound and the lower bound. They do so through two closely related quantities, the Gaussian width and the Sudakov estimate.

3.4 Universality of the Geometric Approach

In this section we apply the general theory under the Gaussian setting to some of the actively studied high-dimensional problems mentioned in Section 1, to illustrate the wide applicability of the theory. Detailed proofs are deferred to Appendix B.

High Dimensional Linear Regression

We begin by considering the high-dimensional linear regression model (2) under the assumption that the true parameter $M \in \mathbb{R}^p$ is sparse, say $\|M\|_{\ell_0} = s$. Our general theory, applied to $\ell_1$ minimization, recovers the optimality results for the Dantzig selector and the Lasso. In this case, it can be shown that $\gamma_{\mathcal{A}}(M)w(\mathcal{A})$ and $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$ are of the same rate $\sqrt{s\log p}$; see Section B for the detailed calculations. The asphericity ratio $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$ reflects the sparsity of $M$ through the local tangent cone, and the Gaussian width satisfies $w(\mathcal{X}\mathcal{A}) \asymp \sqrt{\log p}$.
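To see Lemma 5 and the $\sqrt{s\log p}$ rate in the sparse case, note the following. For the $\ell_1$ atom set $\mathcal{A} = \{\pm e_i\}$, the width $w(\mathcal{A}) = \mathbb{E}\|g\|_{\ell_\infty}$ can be estimated by Monte Carlo. Multiplying it by the bound $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$ gives, via Lemma 5, an upper bound on $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$. The numpy sketch below computes this product and compares it with $\sqrt{s\log p}$. The function names and the values of $s, p$ are hypothetical.

```python
import numpy as np


def width_l1_atoms(p, trials=2000, seed=0):
    """Monte Carlo estimate of w(A) = E sup_{a in A} <g, a> for A = {±e_1, ..., ±e_p}.

    The supremum over signed basis vectors is max_i |g_i| = ||g||_inf."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, p))
    return float(np.abs(g).max(axis=1).mean())


def lemma5_bound(s, p, **kwargs):
    """gamma_A(M) * w(A), with gamma_A(M) replaced by its upper bound 2 sqrt(s).

    By Lemma 5 this upper-bounds w(B_2^p ∩ T_A(M)) for an s-sparse M."""
    return 2.0 * np.sqrt(s) * width_l1_atoms(p, **kwargs)


s, p = 10, 1000
ratio = lemma5_bound(s, p) / np.sqrt(s * np.log(p))  # stays O(1) if the rate is sqrt(s log p)
```

The estimated $w(\mathcal{A})$ stays below the classical bound $\sqrt{2\log(2p)}$ on the expected maximum of $2p$ standard Gaussians. The ratio printed at the end is a modest constant, consistent with the two quantities sharing the rate $\sqrt{s\log p}$.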
The following corollary, proved in Section B, follows from the geometric analysis of the high-dimensional regression model.

Corollary 1 Consider the high-dimensional linear regression model (2). Assume that $\mathcal{X} \in \mathbb{R}^{n\times p}$ is the Gaussian ensemble design and that the parameter of interest $M \in \mathbb{R}^p$ has sparsity $s$. Let $\hat M$ be the solution to the constrained $\ell_1$ minimization (16) with $\lambda = C_1\sigma\sqrt{\frac{\log p}{n}}$. If $n \ge C_2 s\log p$, then
$$\|\hat M - M\|_{\ell_2} \le C_3\sigma\sqrt{\frac{s\log p}{n}}, \quad \|\hat M - M\|_{\ell_1} \le C_3\sigma s\sqrt{\frac{\log p}{n}}, \quad \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le C_3\sigma\sqrt{\frac{s\log p}{n}}$$
with high probability, where $C_i > 0$, $1 \le i \le 3$, are universal constants.

For $\ell_2$ norm consistency of the estimator of $M$, we require $\lim_{n,p\to\infty}\frac{s\log p}{n} = 0$. For a valid inferential guarantee, however, Theorem 2 requires the stronger condition $\lim_{n,p\to\infty}\frac{s\log p}{\sqrt{n}} = 0$, under which the de-biased Dantzig-selector-type estimator $\tilde M$ is asymptotically normal. Under this condition, the confidence intervals given in (24) have asymptotic coverage probability $1-\alpha$, and their expected length is at the parametric rate $1/\sqrt{n}$. Furthermore, the confidence intervals do not depend on the specific value of $s$. These properties are similar to those of the confidence intervals constructed in Zhang and Zhang (2014); van de Geer et al. (2014); Javanmard and Montanari (2014).

Low Rank Matrix Recovery

We now consider the recovery of low-rank matrices under the trace regression model (3). The geometric theory leads to the optimal recovery results for nuclear norm minimization and penalized trace regression in the existing literature. Assume the true parameter $M \in \mathbb{R}^{p\times q}$ is of low rank, in the sense that $\mathrm{rank}(M) = r$. Let us examine the behavior of $\phi_{\mathcal{A}}(M,\mathcal{X})$, $\gamma_{\mathcal{A}}(M)$ and $\lambda_{\mathcal{A}}(\mathcal{X},\sigma,n)$. Detailed calculations given in Section B show that, in this case, $\gamma_{\mathcal{A}}(M)w(\mathcal{A})$ and $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$ are of the same order $\sqrt{r(p+q)}$. The asphericity ratio $\gamma_{\mathcal{A}}(M) \le 2\sqrt{2r}$ characterizes the low rank structure, and the Gaussian width satisfies $w(\mathcal{X}\mathcal{A}) \asymp \sqrt{p+q}$. We have the following corollary for low rank matrix recovery.

Corollary 2 Consider the trace regression model (3). Assume that $\mathcal{X} \in \mathbb{R}^{n\times pq}$ is the Gaussian ensemble design and that the true parameter $M \in \mathbb{R}^{p\times q}$ is of rank $r$. Let $\hat M$ be the solution to the constrained nuclear norm minimization (16) with $\lambda = C_1\sigma\sqrt{\frac{p+q}{n}}$. If $n \ge C_2 r(p+q)$, then, with high probability,
$$\|\hat M - M\|_F \le C_3\sigma\sqrt{\frac{r(p+q)}{n}}, \quad \|\hat M - M\|_* \le C_3\sigma r\sqrt{\frac{p+q}{n}}, \quad \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le C_3\sigma\sqrt{\frac{r(p+q)}{n}},$$
where $C_i > 0$, $1 \le i \le 3$, are universal constants.

For point estimation consistency of $M$ under the Frobenius norm loss, the asymptotic condition is $\lim_{n,p,q\to\infty}\frac{r(p+q)}{n} = 0$. For statistical inference, Theorem 2 requires $\lim_{n,p,q\to\infty}\frac{r(p+q)}{\sqrt{n}} = 0$. For $r = 1$ this essentially means $n \gg pq$, i.e., the sample size must exceed the dimension. This phenomenon arises when the Gaussian width complexity of the rank-1 matrices is large, i.e., when the atom set is too rich. We remark that, in practice, the convex program (18) can still be used to construct confidence intervals and perform hypothesis tests. However, it is harder to give a sharp theoretical bound on the approximation error $\eta$ in (18) for arbitrary $r, p, q$.

Sign Vector Recovery

We turn to the sign vector recovery model (4), where the parameter of interest $M \in \{+1,-1\}^p$ is a sign vector. The convex hull of the atom set (the sign vectors) is the $\ell_\infty$ norm ball, and the corresponding $\ell_\infty$ norm minimization program is
$$\hat M = \mathop{\arg\min}_M\Big\{\|M\|_{\ell_\infty} : \|\mathcal{X}^*(Y - \mathcal{X}(M))\|_{\ell_1} \le \lambda\Big\}. \tag{25}$$
Applying the general theory to $\ell_\infty$ norm minimization yields rates of convergence for sign vector recovery. The calculations given in Section B show that the asphericity ratio satisfies $\gamma_{\mathcal{A}}(M) \le 1$ and the Gaussian width satisfies $w(\mathcal{X}\mathcal{A}) \asymp \sqrt{p}$. Furthermore, $\gamma_{\mathcal{A}}(M)w(\mathcal{A})$ and $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$ are of the same order $\sqrt{p}$. Applying the geometric theory to sign vector recovery leads to the following result.

Corollary 3 Consider the model (4), where the true parameter $M \in \{+1,-1\}^p$ is a sign vector. Assume that $\mathcal{X} \in \mathbb{R}^{n\times p}$ is the Gaussian ensemble design. Let $\hat M$ be the solution to the convex program (16) with $\lambda = C_1\sigma\sqrt{\frac{p}{n}}$. If $n \ge C_2 p$, then, with high probability,
$$\|\hat M - M\|_{\ell_2},\ \|\hat M - M\|_{\ell_\infty},\ \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le C\sigma\sqrt{\frac{p}{n}},$$
where $C > 0$ is a universal constant.

Orthogonal Matrix Recovery

We now treat orthogonal matrix recovery using spectral norm minimization; see Example 4 in Section 2.1 for details. The spectral norm minimization program is
$$\hat M = \mathop{\arg\min}_M\Big\{\|M\| : \|\mathcal{X}^*(Y - \mathcal{X}(M))\|_* \le \lambda\Big\}. \tag{26}$$
Consider the same model as in trace regression, but now the parameter of interest $M \in \mathbb{R}^{m\times m}$ is an orthogonal matrix.
Calculations in Section B show that $\gamma_{\mathcal{A}}(M)w(\mathcal{A})$ and $w(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))$ are of the same rate, $\sqrt{m^2} = m$.
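The atom-set widths behind the sign-vector and orthogonal-matrix examples have simple closed forms once the atoms are scaled to unit $\ell_2$ norm, as assumed in Section 3. For sign vectors $s/\sqrt{p}$, the supremum of $\langle g, s\rangle/\sqrt{p}$ equals $\|g\|_{\ell_1}/\sqrt{p}$. For orthogonal matrices $O/\sqrt{m}$, the supremum of $\langle G, O\rangle$ is the nuclear norm $\|G\|_*$, attained at $O = UV^\top$ from the SVD of $G$. The numpy sketch below estimates $w(\mathcal{A})$ (not $w(\mathcal{X}\mathcal{A})$) for both atom sets. Its purpose is to show the growth like $\sqrt{p}$ and like $m$; the function names and sizes are hypothetical.

```python
import numpy as np


def width_sign_atoms(p, trials=2000, seed=0):
    """w(A) for A = {s / sqrt(p) : s in {±1}^p} (atoms scaled to unit l2 norm).

    sup_s <g, s> / sqrt(p) = ||g||_1 / sqrt(p), attained at s = sign(g)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, p))
    return float(np.abs(g).sum(axis=1).mean() / np.sqrt(p))


def width_orthogonal_atoms(m, trials=200, seed=0):
    """w(A) for A = {O / sqrt(m) : O orthogonal} (unit Frobenius norm).

    sup_O <G, O> = ||G||_* (sum of singular values), attained at O = U V^T."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        G = rng.standard_normal((m, m))
        total += np.linalg.svd(G, compute_uv=False).sum()
    return total / trials / np.sqrt(m)


p, m = 400, 40
w_sign = width_sign_atoms(p)        # expectation is exactly sqrt(2/pi) * sqrt(p)
w_orth = width_orthogonal_atoms(m)  # grows linearly in m (about 0.85 m for large m)
```

The $0.85\,m$ figure comes from the quarter-circle law for singular values of a square Gaussian matrix. Treat it as an approximate large-$m$ constant, not an exact value.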

Applying the geometric analysis to orthogonal matrix recovery via constrained spectral norm minimization yields the following.

Corollary 4 Consider the orthogonal matrix recovery model (3). Assume that $\mathcal{X} \in \mathbb{R}^{n\times m^2}$ is the Gaussian ensemble design and that the true parameter $M \in \mathbb{R}^{m\times m}$ is an orthogonal matrix. Let $\hat M$ be the solution to the program (16) with $\lambda = C_1\sigma\sqrt{\frac{m^2}{n}}$. If $n \ge C_2m^2$, then, with high probability,
$$\|\hat M - M\|_{\ell_2},\ \|\hat M - M\|,\ \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le C\sigma\sqrt{\frac{m^2}{n}},$$
where $C > 0$ is a universal constant.

Other Examples

Other examples that can be formalized under the framework of the linear inverse model include permutation matrix recovery (Jagabathula and Shah, 2011), sparse plus low rank matrix recovery (Candès et al., 2011) and matrix completion (Candès and Recht, 2009).
- For permutation matrix recovery, the convex relaxation of the set of permutation matrices is the set of doubly stochastic matrices.
- For sparse plus low rank recovery, the atomic norm of the sparse plus low rank atom set is the infimal convolution of the $\ell_1$ norm and the nuclear norm.
- For matrix completion, the design matrix can be viewed as a diagonal matrix whose diagonal elements are independent Bernoulli random variables.

See Section 5 for a discussion of further examples.

4 Local Geometric Theory: General Setting

In the last section we developed a local geometric theory for the linear inverse model in the Gaussian setting. The Gaussian assumption on the design and the noise lets us carry out concrete, more specific calculations, as in the examples of Section 3.4. The distributional assumption is, however, not essential. In this section we extend the theory to the general setting.

4.1 General Local Upper Bound

We consider a fixed design matrix $\mathcal{X}$. In the case of random design, the results we establish are conditional on the design. We condition on the event that the noise is controlled, $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \lambda$. Lemma 3 in Section 3.1 shows how to choose $\lambda$ so that this event occurs with overwhelming probability under Gaussian noise.

Theorem 4 (Geometrizing Local Convergence) Suppose we observe $(\mathcal{X}, Y)$ as in (1).
Condition on the event that the noise vector $Z$ satisfies, for some given choice of localization radius $\lambda$,
$$\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \lambda.$$
Let $\hat M$ be the solution to the convex program (16) with tuning parameter $\lambda$. Then the geometric quantities defined on the local tangent cone capture the local convergence rate of $\hat M$:
$$\|\hat M - M\|_{\ell_2} \le 2\frac{\gamma_{\mathcal{A}}(M)}{\phi_{\mathcal{A}}^2(M,\mathcal{X})}\lambda, \quad \|\hat M - M\|_{\mathcal{A}} \le 2\frac{\gamma_{\mathcal{A}}^2(M)}{\phi_{\mathcal{A}}^2(M,\mathcal{X})}\lambda, \quad \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le 2\frac{\gamma_{\mathcal{A}}(M)}{\phi_{\mathcal{A}}(M,\mathcal{X})}\lambda,$$
where the local asphericity ratio $\gamma_{\mathcal{A}}(M)$ is defined in (15) and the local lower isometry constant $\phi_{\mathcal{A}}(M,\mathcal{X})$ is defined in (13).

Remark 2 This theorem decomposes the estimation and prediction errors into three geometric components.
- The tuning parameter $\lambda$ can be regarded as a localization radius around the true parameter. It quantifies the uncertainty in estimation for a given sample size, and it is a global parameter that does not depend on the local geometry.
- The other two terms, $\gamma_{\mathcal{A}}(M)$ and $\phi_{\mathcal{A}}(M,\mathcal{X})$, depend on the geometry of the local tangent cone. For example, when $\mathcal{X}$ is the Gaussian ensemble design, Lemma 4 shows that $\phi_{\mathcal{A}}(M,\mathcal{X})$ is bounded below by a constant under certain conditions. The bounds $1 - c \le \phi_{\mathcal{A}}(M,\mathcal{X}) \le \psi_{\mathcal{A}}(M,\mathcal{X}) \le 1 + c$ hold for many different random design matrices $\mathcal{X}$, and Section 3.4 illustrates how these terms behave in several settings.

Another observation worth noting is that Theorem 4 holds deterministically under the conditions on $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^*$ and $\phi_{\mathcal{A}}(M,\mathcal{X})$. It requires no distributional assumptions on the noise, nor does it impose further conditions on the design matrix. Theorem 1 can be viewed as a special case in which the local isometry constant $\phi_{\mathcal{A}}(M,\mathcal{X})$ and the local radius $\lambda$ are calculated explicitly under the Gaussian assumption.

4.2 General Geometric Inference

Geometric inference can also be extended to other fixed designs and noise distributions. We modify the convex feasibility program (17) into the following stronger form:
$$(\Omega, \eta) = \mathop{\arg\min}_{\Omega,\eta}\Big\{\eta : \|\mathcal{X}^*\mathcal{X}\,\Omega_{i\cdot}^\top - e_i\|_{\mathcal{A}}^* \le \eta,\ 1 \le i \le p\Big\}. \tag{27}$$
Then the following theorem holds; its proof is analogous to that of Theorem 2.

Theorem 5 (Geometric Inference) Suppose we observe $(\mathcal{X}, Y)$ as in (1). Condition on the event that the noise vector $Z$ satisfies, for some given choice of localization radius $\lambda$, $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \lambda$.
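Let $\hat M$ be the solution to the convex program (16) with tuning parameter $\lambda$. Denote by $\Omega$ and $\eta$ the optimal solution to the convex program (27), and by $\tilde M$ the de-biased estimator. The decomposition
$$\sqrt{n}(\tilde M - M) = \Delta + \sigma\,\Omega\mathcal{X}^*W \tag{28}$$
holds, where $W \sim N(0, I_n)$ is the standard Gaussian vector, $\Omega\mathcal{X}^*W \sim N(0, \Omega\mathcal{X}^*\mathcal{X}\Omega^\top)$, and $\Delta \in \mathbb{R}^p$ satisfies
$$\|\Delta\|_{\ell_\infty} \le 2\sqrt{n}\,\frac{\gamma_{\mathcal{A}}^2(M)}{\phi_{\mathcal{A}}^2(M,\mathcal{X})}\,\lambda\eta.$$

4.3 General Local Minimax Lower Bound

The lower bound given in the Gaussian case can also be extended to the general setting, in which the class of noise distributions contains the Gaussian distributions. We aim to geometrize the intrinsic difficulty of the estimation problem in a unified manner. We first present a general result for a convex cone $T$ in the parameter space. It illustrates how the Sudakov estimate, the volume ratio and the design matrix affect the minimax lower bound.

Theorem 6 (Minimax Lower Bound via Sudakov Estimate and Volume Ratio) Let $T \subset \mathbb{R}^p$ be a compact convex cone. The minimax lower bound for the linear inverse model (1), restricted to the cone $T$, is
$$\inf_{\hat M}\sup_{M\in T}\mathbb{E}_{\cdot\mid\mathcal{X}}\|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0\sigma^2}{\psi^2}\left(\frac{e(\mathcal{B}_2^p\cap T)}{\sqrt{n}} \vee \frac{v(\mathcal{B}_2^p\cap T)}{\sqrt{n}}\right)^2,$$
where $\hat M$ is any measurable estimator, $\psi = \sup_{v\in\mathcal{B}_2^p\cap T}\|\mathcal{X}(v)\|_{\ell_2}$, and $c_0$ is a universal constant. Here $\mathbb{E}_{\cdot\mid\mathcal{X}}$ denotes expectation conditional on the design matrix $\mathcal{X}$. The functions $e(\cdot)$ and $v(\cdot)$ denote the Sudakov estimate (see (10)) and the volume ratio (see (12)). Applying the theorem to the local tangent cone yields the following corollary.

Corollary 5 (Lower Bound Based on Local Tangent Cone) Assume $T_{\mathcal{A}}(M)$ is the local tangent cone of interest. For any measurable estimator $\hat M$ and for parameters $M' \in T_{\mathcal{A}}(M)$, we have the minimax lower bound
$$\inf_{\hat M}\sup_{M'\in T_{\mathcal{A}}(M)}\mathbb{E}_{\cdot\mid\mathcal{X}}\|\hat M - M'\|_{\ell_2}^2 \ge \frac{c_0\sigma^2}{\psi_{\mathcal{A}}^2(M,\mathcal{X})}\left(\frac{e(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))}{\sqrt{n}} \vee \frac{v(\mathcal{B}_2^p\cap T_{\mathcal{A}}(M))}{\sqrt{n}}\right)^2,$$
where $\psi_{\mathcal{A}}(M,\mathcal{X})$ is defined in (13). Here $\mathbb{E}_{\cdot\mid\mathcal{X}}$ denotes expectation conditional on the design matrix $\mathcal{X}$.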
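As a sanity check on the Sudakov term in Theorem 6, consider the simplest cone: a $k$-dimensional linear subspace $T$. In that case $\mathcal{B}_2^p\cap T$ is isometric to the unit ball $\mathcal{B}_2^k$. The sketch below assumes that the Sudakov estimate (10) has the standard form $e(K) = \sup_{\epsilon>0}\epsilon\sqrt{\log N(K,\epsilon)}$. It combines this with the volumetric covering bound $N(\mathcal{B}_2^k,\epsilon) \ge \epsilon^{-k}$ and maximizes over a grid of $\epsilon$ values. The analytic maximizer $\epsilon = e^{-1/2}$ gives $e(\mathcal{B}_2^k) \ge \sqrt{k/(2e)}$. The unspecified universal constant $c_0$ is set to 1 purely as a placeholder, and the function names are hypothetical.

```python
import math


def sudakov_lower_bound_subspace(k, grid=100_000):
    """Lower bound on e(B_2^k) = sup_eps eps * sqrt(log N(B_2^k, eps)) (assumed form of (10)).

    Uses the volumetric covering bound N(B_2^k, eps) >= eps^(-k) for 0 < eps < 1
    and maximizes eps * sqrt(k * log(1 / eps)) over a uniform grid of eps values."""
    best = 0.0
    for j in range(1, grid):
        eps = j / grid
        best = max(best, eps * math.sqrt(k * math.log(1.0 / eps)))
    return best


def theorem6_sudakov_term(k, n, sigma=1.0, psi=1.0, c0=1.0):
    """Sudakov part of Theorem 6: c0 * sigma^2 / psi^2 * (e / sqrt(n))^2.

    c0 stands for the unspecified universal constant; 1.0 is only a placeholder."""
    e = sudakov_lower_bound_subspace(k)
    return c0 * sigma**2 / psi**2 * (e / math.sqrt(n)) ** 2


k, n = 50, 1000
e_lb = sudakov_lower_bound_subspace(k)  # close to sqrt(k / (2e))
rate = theorem6_sudakov_term(k, n)      # of order sigma^2 k / n
```

For a $k$-dimensional subspace the Sudakov term therefore yields a lower bound of order $\sigma^2 k/(\psi^2 n)$, the familiar parametric rate for a $k$-dimensional problem.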
