arxiv: v3 [math.st] 16 Jun 2015


Geometric Inference for General High-Dimensional Linear Inverse Problems

T. Tony Cai, Tengyuan Liang and Alexander Rakhlin

Department of Statistics, The Wharton School, University of Pennsylvania

Abstract

This paper presents a unified geometric framework for the statistical analysis of a general ill-posed linear inverse model which includes as special cases noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation, and noisy matrix completion. We propose computationally feasible convex programs for statistical inference including estimation, confidence intervals and hypothesis testing. A theoretical framework is developed to characterize the local estimation rate of convergence and to provide statistical inference guarantees. Our results are built based on the local conic geometry and duality. The difficulty of statistical inference is captured by the geometric characterization of the local tangent cone through the Gaussian width and Sudakov minoration estimate.

1 Introduction

Driven by a wide range of applications, high-dimensional linear inverse problems such as noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation, and noisy matrix completion have drawn significant recent interest in several fields, including statistics, applied mathematics, computer science, and electrical engineering. These problems are often studied in a case-by-case fashion and the focus so far is mainly on estimation. Although similarities in the technical analyses have been suggested heuristically, a general unified theory for statistical inference including estimation, confidence intervals and hypothesis testing is still yet to be developed.

The research of Tony Cai was supported in part by NSF Grants DMS and DMS , and NIH Grant R01 CA. Tengyuan Liang acknowledges the support of the Winkelman Fellowship. Alexander Rakhlin gratefully acknowledges the support of NSF under grant CAREER DMS

In this paper, we consider a general linear inverse model

$$Y = \mathcal{X}(M) + Z \tag{1}$$

where $M \in \mathbb{R}^p$ is the vectorized version of the parameter of interest, $\mathcal{X}: \mathbb{R}^p \to \mathbb{R}^n$ is a linear operator, and $Z \in \mathbb{R}^n$ is a noise vector. We observe $(\mathcal{X}, Y)$ and wish to recover the unknown parameter $M$. A particular focus is on the high-dimensional setting where the ambient dimension $p$ of the parameter $M$ is much larger than the sample size $n$, i.e., the dimension of $Y$. In such a setting, the parameter of interest $M$ is commonly assumed to have, with respect to a given atom set $\mathcal{A}$, a certain low complexity structure which captures the true dimension of the statistical estimation problem. A number of high-dimensional inference problems actively studied in the recent literature can be seen as special cases of this general linear inverse model.

High Dimension Linear Regression/Noisy Compressed Sensing. In high-dimensional linear regression, one observes $(X, Y)$ with

$$Y = XM + Z, \tag{2}$$

where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ with $p \gg n$, $M \in \mathbb{R}^p$ is a sparse signal, and $Z \in \mathbb{R}^n$ is a noise vector. The goal is to recover the unknown sparse signal of interest $M \in \mathbb{R}^p$ based on the observation $(X, Y)$ through an efficient algorithm. Many estimation methods, including $\ell_1$-regularized procedures such as the Lasso and Dantzig Selector, have been developed and analyzed. See, for example, Tibshirani (1996); Candès and Tao (2007); Bickel et al. (2009); Bühlmann and van de Geer (2011) and the references therein. Confidence intervals and hypothesis testing for high-dimensional linear regression have also been actively studied in the last few years. A common approach is to first construct a de-biased Lasso or de-biased scaled-Lasso estimator and then make inference based on the asymptotic normality of low-dimensional functionals of the de-biased estimator. See, for example, Bühlmann (2013); Zhang and Zhang (2014); van de Geer et al. (2014); Javanmard and Montanari (2014).

Trace Regression. Accurate recovery of a low-rank matrix based on a small number of linear measurements has a wide range of applications and has drawn much recent attention in several fields. See, for example, Recht et al.
(2010); Koltchinskii (2011); Rohde et al. (2011); Koltchinskii et al. (2011); Candès and Plan (2011). In trace regression, one observes $(X_i, Y_i)$, $i = 1, \ldots, n$, with

$$Y_i = \mathrm{Tr}(X_i^T M) + Z_i, \tag{3}$$

where $Y_i \in \mathbb{R}$, $X_i \in \mathbb{R}^{p_1 \times p_2}$ are measurement matrices, and $Z_i$ are noise. The goal is to recover the unknown matrix $M \in \mathbb{R}^{p_1 \times p_2}$, which is assumed to be of low rank. Here the dimension of the parameter $M$ is $p = p_1 p_2$. A number of constrained and penalized nuclear norm minimization methods have been introduced and studied in both the noiseless and noisy settings. See the aforementioned references for further details.

Sign Vector Recovery. The setting of sign vector recovery is similar to the one for high-dimensional regression except that the signal of interest is a sign vector. More specifically, in sign vector recovery, one observes $(X, Y)$ with

$$Y = XM + Z \tag{4}$$

where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $M \in \{+1, -1\}^p$ is a sign vector, and $Z \in \mathbb{R}^n$ is a noise vector. The goal is to recover the unknown sign signal of interest $M$. Exhaustive search over the parameter set is computationally prohibitive. The noiseless case of (4), known as the generalized multi-knapsack problem (Khuri et al., 1994; Mangasarian and Recht, 2011), can be solved through an integer program, which is known to be computationally difficult even for checking the uniqueness of the solution; see Prokopyev et al. (2005) and Valiant and Vazirani (1986).

Orthogonal Matrix Recovery. In some applications the matrix of interest in trace regression is known to be an orthogonal/rotation matrix (Ten Berge, 1977; Gower and Dijksterhuis, 2004). More specifically, in orthogonal matrix recovery, we observe $(X_i, Y_i)$, $i = 1, \ldots, n$, as in the trace regression model (3), where $X_i \in \mathbb{R}^{m \times m}$ are measurement matrices and $M \in \mathbb{R}^{m \times m}$ is an orthogonal matrix. The goal is to recover the unknown $M$ using an efficient algorithm. Computational difficulties arise because of the non-convex constraint. See Chandrasekaran et al. (2012).

Matrix Completion. Matrix completion aims to recover a low-rank matrix based on observations of a subset of entries. It can be viewed as a special case of the trace regression model (3) with measurement matrices of the form $e_{i_k} e_{j_k}^T$ for $k = 1, \ldots, n$, where $e_i$ is the $i$th standard basis vector, and $i_1, \ldots, i_n$ and $j_1, \ldots, j_n$ are randomly drawn with replacement from $\{1, \ldots, p_1\}$ and $\{1, \ldots, p_2\}$, respectively. That is, the individual entries of the matrix $M$ are observed at randomly selected positions. The goal is to recover the low-rank matrix $M$ based on the partial observations $Y$. See Candès and Recht (2009); Recht (2011) for matrix recovery in the noiseless case and Candès and Plan (2010); Chatterjee (2012); Cai and Zhou (2013) for the noisy case.
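The reduction of matrix completion to trace regression can be checked numerically. The following is a minimal sketch, not part of the paper: the helper names (`trace_inner`, `basis_measurement`, `observe_entries`) are hypothetical, indices are 0-based, and the per-observation noise standard deviation `sigma` is an illustrative assumption rather than the paper's noise scaling.

```python
import random

def trace_inner(X, M):
    """Tr(X^T M), i.e. the entrywise inner product of two equally sized matrices."""
    return sum(x * m for row_x, row_m in zip(X, M) for x, m in zip(row_x, row_m))

def basis_measurement(i, j, p1, p2):
    """Measurement matrix e_i e_j^T: a single 1 at position (i, j), zeros elsewhere."""
    return [[1.0 if (a == i and b == j) else 0.0 for b in range(p2)] for a in range(p1)]

def observe_entries(M, n, sigma, rng):
    """Draw n positions with replacement and return (i, j, y) with y = Tr(X^T M) + noise."""
    p1, p2 = len(M), len(M[0])
    observations = []
    for _ in range(n):
        i, j = rng.randrange(p1), rng.randrange(p2)
        X = basis_measurement(i, j, p1, p2)
        y = trace_inner(X, M) + rng.gauss(0.0, sigma)
        observations.append((i, j, y))
    return observations

# A rank-one 3 x 4 matrix M = u v^T as the hypothetical parameter of interest.
u = [1.0, -2.0, 0.5]
v = [2.0, 0.0, 1.0, -1.0]
M = [[ui * vj for vj in v] for ui in u]

rng = random.Random(0)
noiseless = observe_entries(M, n=5, sigma=0.0, rng=rng)
```

With `sigma=0.0`, every observation `y` coincides with the entry `M[i][j]`, which is the sense in which model (3) with these measurement matrices observes individual entries.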
Other high-dimensional inference problems that are closely connected to the structured linear inverse model (1) include high-dimensional covariance matrix estimation where the covariance matrix of interest is banded/sparse/spiked (Karoui, 2008; Cai et al., 2010, 2013, 2014), sparse and low rank decomposition in robust principal component analysis (Candès et al., 2011), and sparse noise and sparse parameter in the demixing problem (Amelunxen et al., 2013), to name a few. We will discuss the connections in detail later in the paper.

There are several fundamental questions for this general class of high-dimensional linear inverse problems.

Statistical Questions: How well can the parameter $M$ be estimated? What is the intrinsic difficulty of the estimation problem? How can one provide inference guarantees for $M$, i.e., confidence intervals and hypothesis testing, in general?

Computational Questions: Are there computationally efficient (polynomial time complexity) algorithms that are also sharp in terms of statistical estimation and inference?

1.1 High-Dimensional Linear Inverse Problems

Linear inverse problems have been well studied in the classical setting where the parameter of interest lies in a convex set. See, for example, Tikhonov and Arsenin (1977), O'Sullivan (1986), and Johnstone and Silverman (1990). In particular, for estimation of a linear functional over a convex parameter space, Donoho (1994) developed an elegant geometric characterization of the minimax theory in terms of the modulus of continuity. However, the theory relies critically on the convexity assumption of the parameter space. As shown in Cai and Low (2004a,b), the behavior of the functional estimation and confidence interval problems is significantly different even when the parameter space is the union of two convex sets. For the high-dimensional linear inverse problems considered in the present paper, the parameter space is highly non-convex and the theory and techniques developed in the classical setting are not readily applicable.

For high-dimensional linear inverse problems such as those mentioned earlier, the parameter space has low complexity and exhaustive search often leads to the optimal solution in terms of statistical accuracy. However, it is computationally prohibitive and requires prior knowledge of the true low complexity. In recent years, relaxing the problem to a convex program such as $\ell_1$ or nuclear norm minimization and then solving it with optimization techniques has proven to be a powerful approach in individual cases.

Unified approaches to signal recovery recently appeared both in the applied mathematics literature (Chandrasekaran et al., 2012; Amelunxen et al., 2013; Oymak et al., 2013) and in the statistics literature (Negahban et al., 2012). Oymak et al. (2013) studied the generalized LASSO problem through conic geometry with a simple bound in terms of the $\ell_2$ norm of the noise vector.
Chandrasekaran et al. (2012) introduced the notion of atomic norm to define a low complexity structure and showed that Gaussian width captures the minimum sample size required to ensure recovery. Amelunxen et al. (2013) studied the phase transition for the convex algorithms for a wide range of problems. These papers suggested that the geometry of the local tangent cone determines the minimum number of samples to ensure successful recovery in the noiseless or deterministic noise settings. Negahban et al. (2012) studied regularized $M$-estimation with a decomposable norm penalty in the additive Gaussian noise setting.

Another line of research is focused on a detailed analysis of Empirical Risk Minimization (ERM) (Lecué and Mendelson, 2013). Here, the objective function is the excess risk for the squared error loss. The excess risk is shown to have the rate of $n^{-1/2}$ or $n^{-1}$, in terms of the sample size $n$. The analysis is based on the empirical processes indexed by general subgaussian functional classes, with a proper localization radius around the best parameter. In addition to convexity, the ERM requires prior knowledge of the size of the bounded parameter set of interest. This knowledge is not needed for

the algorithm we propose in the present paper.

Compared to estimation, there is a paucity of methods and theoretical results for confidence intervals and hypothesis testing for these linear inverse models. Specifically for high-dimensional linear regression, confidence intervals and significance testing have drawn increasing recent attention. Bühlmann (2013) studied a bias correction method based on ridge estimation, while Zhang and Zhang (2014) proposed bias correction via a score vector using the scaled Lasso as the initial estimator. van de Geer et al. (2014) and Javanmard and Montanari (2014) focused on de-sparsifying the Lasso by constructing a near inverse of the Gram matrix; the former uses node-wise Lasso while the latter uses an $\ell_\infty$ constrained quadratic program, with similar theoretical guarantees. To the best of our knowledge, inference procedures for other high-dimensional linear inverse models are yet to be developed.

1.2 Geometric Characterization of Linear Inverse Problems

Under the linear inverse model (1), the parameter $M$ is assumed to have a certain low complexity structure with respect to a given atom set in a high-dimensional Euclidean space, which introduces a non-convex constraint. The non-convex constraint poses difficulty for the inverse problem. However, proper convex relaxation based on the general atom structure provides a computationally feasible solution. Our goal is to recover and make inference on the parameter $M$ based on the observation $(\mathcal{X}, Y)$ efficiently. This problem can also be framed in the language of geometric functional analysis (Ledoux and Talagrand, 1991; Vershynin, 2011).

For point estimation, we are interested in how the local convex geometry around the true parameter affects the estimation procedure and the intrinsic estimation difficulty, in terms of the local upper bound and the local minimax lower bound respectively. Note that the local tangent cone plays a key role in our analysis. For statistical inference, we develop general procedures induced by the convex geometry, which answer inferential questions such as confidence intervals and hypothesis testing efficiently.
We are also interested in the sample size condition induced by the local convex geometry for valid inference guarantees.

Complexity measures such as Gaussian width and Rademacher complexity are well studied in empirical process theory (Ledoux and Talagrand, 1991; Talagrand, 1996), and are known to capture the difficulty of the estimation problem. Covering/packing entropy and volume ratio (Yang and Barron, 1999; Vershynin, 2011; Ma and Wu, 2013) are also widely used in geometric functional analysis to measure complexity. In this paper, we show how these geometric quantities affect the computationally efficient estimation/inference procedure, as well as the intrinsic difficulty of the estimation/inference problem.

Our main result can be summarized as follows. We propose unified convex algorithms for estimation and inference, and then analyze the theoretical properties of these algorithms. On the local tangent cone $T_{\mathcal{A}}(M)$ (the formal definition is given in (8), and $B_2^p$ below denotes the Euclidean ball in $\mathbb{R}^p$), geometric quantities such as the Gaussian width $w(B_2^p \cap T_{\mathcal{A}}(M))$, Sudakov minoration estimate $e(B_2^p \cap T_{\mathcal{A}}(M))$, and volume ratio $v(B_2^p \cap T_{\mathcal{A}}(M))$ (defined in Section 2.2) capture the rate of convergence of the linear

inverse problem. In terms of the upper bound, with overwhelming probability, if $n \gtrsim w^2(B_2^p \cap T_{\mathcal{A}}(M))$, the estimation error under the $\ell_2$ norm for our algorithm is of the rate

$$\sigma \, \frac{\gamma_{\mathcal{A}}(M) \, w(\mathcal{X}\mathcal{A})}{\sqrt{n}}$$

where $\gamma_{\mathcal{A}}(M)$ is the local asphericity ratio defined in (15). The minimax lower bound for estimation under the $\ell_2$ norm over the local tangent cone $T_{\mathcal{A}}(M)$ is of the order

$$\frac{\sigma}{\sqrt{n}} \left[ e(B_2^p \cap T_{\mathcal{A}}(M)) \vee v(B_2^p \cap T_{\mathcal{A}}(M)) \right].$$

For statistical inference, we establish valid asymptotic normality for any low-dimensional linear functional of the parameter $M$ under the condition

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}^2(M) \, w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0,$$

which can be compared to the condition for point estimation consistency

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}(M) \, w(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$

We remark on the critical difference between the sufficient conditions for valid inference and for estimation consistency: a more stringent condition on the sample size is required for inference than for estimation. Intuitively, statistical inference is purely geometrized by the Gaussian width and the Sudakov minoration estimate.

1.3 Our Contributions

The main contributions of the present paper are two-fold.

Unified convex algorithms for estimation and inference. We propose a general computationally feasible convex program that provides a near optimal rate of convergence simultaneously for a collection of high-dimensional linear inverse problems. We also provide a general convex feasibility program that leads to inference guarantees for any finite linear contrast, such as confidence intervals and hypothesis testing.

Local geometric theory: Upper and lower bounds, confidence intervals and hypothesis testing. A unified theoretical framework is provided for analyzing high-dimensional linear inverse problems based on the local conic geometry and duality. The point estimation and statistical inference are adaptive in the sense that the difficulty (rate of convergence, conditions on sample size, etc.) automatically adapts to the low complexity structure of the true parameter. Both the inference guarantee and estimation consistency are closely related and rely on conditions induced by the local conic geometry.
It is shown that the minimax lower bound for estimation over the local tangent cone is captured by the Sudakov minoration estimate or volume ratio. The results geometrize statistical inference for general linear inverse problems with low complexity structure.
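To make the geometric quantities of Section 1.2 concrete, the following back-of-the-envelope calculation specializes them to an $s$-sparse vector with atom set $\{\pm e_i\}$ and a design matrix with unit-norm columns. It is a sketch based on standard bounds with unoptimized constants, offered as an illustration rather than as a result stated in this paper; the support set $S$ of $M$ is notation introduced here.

```latex
% Any h in the l1 tangent cone at an s-sparse M with support S satisfies
% \|h_{S^c}\|_{\ell_1} \le \|h_S\|_{\ell_1}, hence
\|h\|_{\ell_1} \le 2\|h_S\|_{\ell_1} \le 2\sqrt{s}\,\|h\|_{\ell_2}
\quad\Longrightarrow\quad \gamma_{\mathcal{A}}(M) \le 2\sqrt{s}.

% With unit-norm columns X_i, the image of the atoms is \{\pm X_i\}, and the
% maximum of 2p standard Gaussian variables gives
w(\mathcal{X}\mathcal{A}) = \mathbb{E}\max_{1 \le i \le p} |\langle g, X_i\rangle|
\le \sqrt{2\log(2p)}.

% Estimation rate and inference condition then read
\sigma\,\frac{\gamma_{\mathcal{A}}(M)\,w(\mathcal{X}\mathcal{A})}{\sqrt{n}}
\le 2\sigma\sqrt{\frac{2\,s\log(2p)}{n}},
\qquad
\frac{\gamma_{\mathcal{A}}^2(M)\,w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}}
\le \frac{8\,s\log(2p)}{\sqrt{n}}.
```

Under these assumptions, estimation consistency needs $s \log p / n \to 0$, whereas the inference condition needs the stronger $s \log p / \sqrt{n} \to 0$.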

1.4 Organization of the Paper

The rest of the paper is structured as follows. In Section 2, after notation, definitions, and basic convex geometry are reviewed, we formally present convex programs for recovering the parameter $M$, and for providing inference guarantees for $M$, based on the observation $(\mathcal{X}, Y)$. The properties of the proposed procedures are then studied in Section 3. Under the Gaussian setting, a geometric theory is developed in terms of the local upper bound, the minimax lower bound as well as the confidence intervals and hypothesis testing. Applications to particular high-dimensional estimation problems are also included at the end of this section. Section 4 extends the geometric theory beyond the Gaussian setting. Relations between the upper and lower bounds are discussed. Further discussions appear in Section 5, and the proofs of the main results are given in Section 6 and Appendix A and B.

2 Preliminaries and Algorithms

We review in this section notation and definitions that will be used in the rest of the paper. In particular, we introduce basics of convex geometry, including important geometric quantities that will be shown to be instrumental in characterizing the difficulty of statistical estimation and inference in later sections. We then collect some known results on the complexity measures, Gaussian width, Sudakov estimate and volume ratio, that will be used repeatedly later. Finally, we will formally introduce our general estimation and inference programs based on the convex geometry and duality.

In this paper, we use $\|\cdot\|_{\ell_q}$ to denote the $\ell_q$ norm of a vector and use $B_2^p$ to denote the unit Euclidean ball in $\mathbb{R}^p$. For a matrix $M$, denote by $\|M\|_F$, $\|M\|_*$, and $\|M\|$ the Frobenius norm, nuclear norm, and spectral norm of $M$ respectively. When there is no confusion, we also denote $\|M\|_F = \|M\|_{\ell_2}$ for a matrix $M$. For a vector $V \in \mathbb{R}^p$, denote its transpose by $V^*$. The inner product on vectors is defined as usual, $\langle V_1, V_2 \rangle = V_1^* V_2$. For matrices, $\langle M_1, M_2 \rangle = \mathrm{Tr}(M_1^* M_2) = \mathrm{Vec}(M_1)^* \mathrm{Vec}(M_2)$, where $\mathrm{Vec}(M) \in \mathbb{R}^{pq}$ denotes the vectorized version of the matrix $M \in \mathbb{R}^{p \times q}$. $\mathcal{X}: \mathbb{R}^p \to \mathbb{R}^n$ denotes a linear operator from $\mathbb{R}^p$ to $\mathbb{R}^n$.
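Following the notation above, $M^* \in \mathbb{R}^{q \times p}$ is the adjoint (transpose) matrix of $M$ and $\mathcal{X}^*: \mathbb{R}^n \to \mathbb{R}^p$ is the adjoint operator of $\mathcal{X}$ such that $\langle \mathcal{X}(V_1), V_2 \rangle = \langle V_1, \mathcal{X}^*(V_2) \rangle$.

For a convex compact set $K$ in a metric space with the metric $d$, we say that $S \subseteq K$ is an $\epsilon$-covering set if for all $x \in K$ there exists $y \in S$ such that $d(x, y) < \epsilon$. We say that $S \subseteq K$ is an $\epsilon$-packing set if for all $x, y \in S$ with $x \neq y$, $d(x, y) \geq \epsilon$. The $\epsilon$-entropy for a convex compact set $K$ with respect to the metric $d$ is denoted in the following way: the $\epsilon$-packing entropy $\log \mathcal{M}(K, \epsilon, d)$ is the logarithm of the cardinality of the largest $\epsilon$-packing set, and the $\epsilon$-covering entropy $\log \mathcal{N}(K, \epsilon, d)$ is the logarithm of the cardinality of the smallest $\epsilon$-covering set with respect to the metric $d$. A well known result is $\mathcal{M}(K, 2\epsilon, d) \leq \mathcal{N}(K, \epsilon, d) \leq \mathcal{M}(K, \epsilon, d)$. When the metric $d$ is the usual Euclidean distance, we will omit $d$ in $\mathcal{M}(K, \epsilon, d)$ and $\mathcal{N}(K, \epsilon, d)$ and simply write $\mathcal{M}(K, \epsilon)$ and $\mathcal{N}(K, \epsilon)$.

For two sequences of positive numbers $\{a_n\}$ and $\{b_n\}$, we denote $a_n \gtrsim b_n$ if there exists a constant $c_0$ such that $a_n / b_n \geq c_0$ for all $n$, and $a_n \lesssim b_n$ if there exists a constant $C_0$ such that $a_n / b_n \leq C_0$ for all $n$. We write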

$a_n \asymp b_n$ if $a_n \gtrsim b_n$ and $a_n \lesssim b_n$. Throughout the paper, $c, C, c_0, C_0$ denote constants that may vary from place to place.

2.1 Basic Convex Geometry

We consider the linear inverse model (1) in the high-dimensional setting where the dimension $p$ can possibly be much larger than the sample size $n$ and the parameter of interest $M$ lies in a certain low complexity space. Examples include sparsity in noisy compressed sensing and low rank in trace regression and matrix completion. The linear operator $\mathcal{X}$ in the model (1) can be viewed as a matrix $X \in \mathbb{R}^{n \times p}$. Without loss of generality, we assume $X$ is standardized to have unit column $\ell_2$ norm. The noise vector $Z \in \mathbb{R}^n$ is assumed to have the noise level $\sigma / \sqrt{n}$ and the covariance matrix $\frac{\sigma^2}{n} I_n$.

The notion of low complexity is based on a collection of basic atoms. We denote the collection of these basic atoms as an atom set $\mathcal{A}$, either countable or uncountable, as illustrated in Figure 1. A parameter $M$ is of complexity $k$ in terms of the atoms in $\mathcal{A}$ if $M$ can be expressed as a linear combination of at most $k$ atoms in $\mathcal{A}$, i.e., there exists a decomposition

$$M = \sum_{a \in \mathcal{A}} c_a(M) \cdot a, \quad \text{where} \quad \sum_{a \in \mathcal{A}} 1_{\{c_a(M) \neq 0\}} \leq k.$$

Figure 1: Atom set illustration. The red dots denote atoms. This particular example illustrates the atoms being basis vectors for sparse regression.

Figure 2: Atomic norm illustration. The red dashed line denotes the convex hull of the atom set. The blue dashed line denotes the scaled convex hull in which $M$ lies.

In convex geometry (Pisier, 1999), the Minkowski functional (gauge) of a symmetric convex body $K$

is defined as

$$\|x\|_K = \inf\{t > 0 : x \in tK\}.$$

Let $\mathcal{A}$ be a collection of atoms that is a compact subset of $\mathbb{R}^p$. We assume that the elements of $\mathcal{A}$ are extreme points of the convex hull $\mathrm{conv}(\mathcal{A})$ (in the sense that for any $x \in \mathbb{R}^p$, $\sup\{\langle x, a \rangle : a \in \mathcal{A}\} = \sup\{\langle x, a \rangle : a \in \mathrm{conv}(\mathcal{A})\}$). The atomic norm $\|x\|_{\mathcal{A}}$ for any $x \in \mathbb{R}^p$ is defined as the gauge of $\mathrm{conv}(\mathcal{A})$ (see Figure 2):

$$\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t \, \mathrm{conv}(\mathcal{A})\}.$$

As noted in Chandrasekaran et al. (2012), the atomic norm can also be written as

$$\|x\|_{\mathcal{A}} = \inf\left\{ \sum_{a \in \mathcal{A}} c_a : x = \sum_{a \in \mathcal{A}} c_a \cdot a, \; c_a \geq 0 \right\}. \tag{5}$$

The dual norm of this atomic norm is defined in the following way (since the atoms in $\mathcal{A}$ are the extreme points of $\mathrm{conv}(\mathcal{A})$):

$$\|x\|_{\mathcal{A}}^* = \sup\{\langle x, a \rangle : a \in \mathcal{A}\} = \sup\{\langle x, a \rangle : \|a\|_{\mathcal{A}} \leq 1\}. \tag{6}$$

We have the following ("Cauchy-Schwarz") symmetric relation for the norm and its dual:

$$\langle x, y \rangle \leq \|x\|_{\mathcal{A}} \, \|y\|_{\mathcal{A}}^*. \tag{7}$$

It is clear that the unit ball with respect to the atomic norm $\|\cdot\|_{\mathcal{A}}$ is the convex hull of the set of atoms $\mathcal{A}$. The tangent cone at $x$ with respect to the scaled unit ball $\|x\|_{\mathcal{A}} \, \mathrm{conv}(\mathcal{A})$ is defined to be (see Figures 3 and 4)

$$T_{\mathcal{A}}(x) = \mathrm{cone}\{h : \|x + h\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}}\}. \tag{8}$$

Also known as a recession cone, $T_{\mathcal{A}}(x)$ is the collection of directions in which the atomic norm becomes smaller. This tangent cone determines the geometric property of the neighborhood around the true parameter $M$, and thus the complexity of this cone will affect the difficulty of the recovery problem. The cone is unbounded, but we can look at the cone intersected with the unit ball, $B_2^p \cap T_{\mathcal{A}}(M)$, in analyzing the complexity of the cone. Figure 3 provides an intuitive illustration where the red shaded area is the scaled atomic norm ball, $M$ is the true parameter, the black arrow denotes one vector inside the tangent cone, and the region enclosed by the blue dashed lines is $T_{\mathcal{A}}(M)$.

In order to better illustrate the general model and the notion of low complexity, it is helpful to look at the atom set, atomic norm and tangent cone geometry in a few examples.
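Definitions (5) to (8) can be made concrete for the sparse atom set $\{\pm e_i\}$, where the atomic norm is the $\ell_1$ norm and the dual norm is the $\ell_\infty$ norm. The sketch below uses hypothetical helper names. The tangent cone test relies on a standard fact assumed here rather than taken from the text: because the $\ell_1$ norm is piecewise linear, $h \in T_{\mathcal{A}}(x)$ exactly when the one-sided directional derivative of the norm at $x$ in direction $h$ is non-positive.

```python
def atomic_norm_sparse(x):
    """Atomic norm for the atom set {+e_i, -e_i}: the l1 norm."""
    return sum(abs(t) for t in x)

def dual_norm_sparse(x):
    """Dual norm sup_{a in A} <x, a>, computed by enumerating the 2p atoms."""
    p = len(x)
    atoms = []
    for i in range(p):
        for sign in (1.0, -1.0):
            atoms.append([sign if k == i else 0.0 for k in range(p)])
    return max(sum(xk * ak for xk, ak in zip(x, a)) for a in atoms)

def in_tangent_cone_sparse(x, h, tol=1e-12):
    """Check h in T_A(x) for the l1 norm via the directional derivative
    sum_{i in supp(x)} sign(x_i) * h_i + sum_{i not in supp(x)} |h_i| <= 0."""
    derivative = 0.0
    for xi, hi in zip(x, h):
        if xi > 0:
            derivative += hi
        elif xi < 0:
            derivative -= hi
        else:
            derivative += abs(hi)
    return derivative <= tol

x = [3.0, 0.0, -1.0]            # a 2-sparse point
y = [0.5, -2.0, 1.5]
h_in = [-1.0, 0.5, 0.2]         # shrinks the l1 norm of x for small steps
h_out = [0.0, 1.0, 0.0]         # moves off the support, so the norm grows

inner = sum(a * b for a, b in zip(x, y))
bound = atomic_norm_sparse(x) * dual_norm_sparse(y)   # right-hand side of (7)
```

Here `inner` does not exceed `bound`, as relation (7) requires, and stepping from `x` along `h_in` lowers the $\ell_1$ norm while stepping along `h_out` raises it.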
Example 1. For sparse signal recovery in high-dimensional linear regression, the atom set consists of the unit basis vectors $\{\pm e_i\}$, the atomic norm is the vector $\ell_1$ norm, and its dual norm is the vector $\ell_\infty$ norm. The convex hull $\mathrm{conv}(\mathcal{A})$ is called the cross-polytope. Figure 4 illustrates this tangent cone for the 3D $\ell_1$ norm ball in 3 different cases $T_{\mathcal{A}}(M_i)$, $1 \leq i \leq 3$. The angle or complexity of the local tangent cone determines the difficulty of recovery. Most of the previous work showed that the algebraic characterization (sparsity) of the parameter space drives the global rate, and we argue that the geometric characterization through the local tangent cone provides an intuitive and refined local approach to high-dimensional linear inverse problems.

Figure 3: Tangent cone general illustration 2D. The red shaded area is the scaled convex hull of the atom set. The blue dashed line forms the tangent cone at $M$. The black arrow denotes the possible directions inside the cone.

Figure 4: Tangent cone illustration 3D for sparse regression. For three possible locations $M_i$, $1 \leq i \leq 3$, the tangent cones are different, with cones becoming more complex as $i$ increases.

Example 2. In trace regression and matrix completion, the goal is to recover low rank matrices. In such settings, the atom set consists of the rank one matrices (matrix manifold) $\mathcal{A} = \{uv^* : \|u\|_{\ell_2} = 1, \|v\|_{\ell_2} = 1\}$, the atomic norm is the nuclear norm, and the dual norm is the spectral norm. The convex hull $\mathrm{conv}(\mathcal{A})$ is called the nuclear norm ball of matrices. The position of the true parameter on the scaled nuclear norm ball determines the geometry of the local tangent cone, thus affecting the estimation difficulty.

Example 3. In integer programming, one would like to recover sign vectors whose entries take on values $\pm 1$. The atom set is all sign vectors (cardinality $2^p$) and the convex hull $\mathrm{conv}(\mathcal{A})$ is the hypercube. Tangent cones for each parameter have the same structure in this case.

Example 4. In orthogonal matrix recovery, the matrix of interest is constrained to be orthogonal. In this

case, the atom set is all orthogonal matrices and the convex hull $\mathrm{conv}(\mathcal{A})$ is the spectral norm ball. Similar to sign vector recovery, the local tangent cones for each orthogonal matrix share similar geometric properties.

2.2 Gaussian Width, Sudakov Estimate, and Other Geometric Quantities

Our theoretical analysis relies on several key geometric quantities. We first introduce two complexity measures, the Gaussian width and the Sudakov estimate.

Definition 1 (Gaussian Width). For a compact set $K \subset \mathbb{R}^p$, the Gaussian width is defined as

$$w(K) := \mathbb{E}_g \left[ \sup_{v \in K} \langle g, v \rangle \right], \tag{9}$$

where $g \sim N(0, I_p)$ is the standard multivariate Gaussian vector.

Gaussian width quantifies the probability that a randomly oriented subspace misses a convex subset. It was introduced in Gordon's analysis (Gordon, 1988), and was shown recently to play a crucial role in linear inverse problems in various noiseless or deterministic noise settings; see, for example, Amelunxen et al. (2013). Explicit upper bounds on the Gaussian width for different convex sets have been given in Chandrasekaran et al. (2012); Amelunxen et al. (2013). For example, if $M \in \mathbb{R}^p$ is an $s$-sparse vector, $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{s \log (p/s)}$. When $M \in \mathbb{R}^{p \times q}$ is a rank-$r$ matrix, $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{r(p + q - r)}$. For a sign vector in $\mathbb{R}^p$, $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{p}$, while for an orthogonal matrix in $\mathbb{R}^{m \times m}$, $w(B_2^p \cap T_{\mathcal{A}}(M)) \lesssim \sqrt{m(m-1)}$. See the propositions in Section 3.4 of Chandrasekaran et al. (2012) for detailed calculations. The Gaussian width as a complexity measure of the local tangent cone will be used in the upper bound analysis in Sections 3 and 4.

Definition 2 (Sudakov Minoration Estimate). The Sudakov estimate of a compact set $K \subset \mathbb{R}^p$ is defined as

$$e(K) := \sup_{\epsilon} \, \epsilon \sqrt{\log \mathcal{N}(K, \epsilon)}, \tag{10}$$

where $\mathcal{N}(K, \epsilon)$ denotes the $\epsilon$-covering number of the set $K$ with respect to the Euclidean norm.

The Sudakov estimate has been widely known in the literature to capture the complexity of a general functional class (Yang and Barron, 1999).
By balancing the cardinality of the covering set at scale $\epsilon$ against the covering radius $\epsilon$, the Sudakov estimate defines the best radius $\epsilon$ that maximizes $\epsilon \sqrt{\log \mathcal{N}(B_2^p \cap T_{\mathcal{A}}(M), \epsilon)}$, and thus determines the complexity of the cone $T_{\mathcal{A}}(M)$. The Sudakov estimate as a complexity measure of the local tangent cone is useful for the minimax lower bound analysis.
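Definition 1 lends itself to a Monte Carlo check whenever the supremum over $K$ has a closed form. The sketch below is an illustration only: it uses two full convex sets (the unit Euclidean ball and the cross-polytope) rather than the tangent cone intersections analyzed in the paper, and the function names, seed, and number of draws are arbitrary choices made here.

```python
import math
import random

def gaussian_width_mc(support_fn, p, n_draws, rng):
    """Monte Carlo estimate of w(K) = E sup_{v in K} <g, v> with g ~ N(0, I_p).

    support_fn(g) must return sup_{v in K} <g, v> for the set K of interest.
    """
    total = 0.0
    for _ in range(n_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        total += support_fn(g)
    return total / n_draws

def support_l2_ball(g):
    """Supremum of <g, v> over the unit Euclidean ball: the l2 norm of g."""
    return math.sqrt(sum(t * t for t in g))

def support_cross_polytope(g):
    """Supremum of <g, v> over conv{+e_i, -e_i}: the largest |g_i|."""
    return max(abs(t) for t in g)

rng = random.Random(1)
p = 50
w_ball = gaussian_width_mc(support_l2_ball, p, 2000, rng)
w_cross = gaussian_width_mc(support_cross_polytope, p, 2000, rng)
```

The estimate `w_ball` should sit close to `math.sqrt(p)`, while `w_cross` should stay below `math.sqrt(2 * math.log(2 * p))`, which shows how much smaller the width of the cross-polytope is than that of the ball in the same dimension.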

Figure 5: Gaussian width, $\sup_{v \in B \cap T_{\mathcal{A}}(M)} \langle g, v \rangle$.

Figure 6: Sudakov estimate, $\sup_{\epsilon > 0} \epsilon \sqrt{\log \mathcal{N}(B \cap T_{\mathcal{A}}(M), \epsilon)}$.

The following Sudakov minoration and Dudley entropy integral (Dudley, 1967; Ledoux and Talagrand, 1991) show how the Gaussian width $w(\cdot)$ and the Sudakov estimate $e(\cdot)$, both geometric quantities, are related to each other.

Lemma 1 (Sudakov Minoration and Dudley Entropy Integral). For any compact subset $K \subseteq \mathbb{R}^p$, there exists a universal constant $c > 0$ such that

$$c \cdot e(K) \leq w(K) \leq 24 \int_0^\infty \sqrt{\log \mathcal{N}(K, \epsilon)} \, d\epsilon. \tag{11}$$

In the literature, another complexity measure, the volume ratio, has also been used to characterize the minimax lower bounds (Ma and Wu, 2013). The volume ratio has been studied in Pisier (1999) and Vershynin (2011). For a convex set $K \subset \mathbb{R}^p$, the volume ratio used in the present paper is defined as follows.

Definition 3 (Volume Ratio). The volume ratio is defined as

$$v(K) := \sqrt{p} \left( \frac{\mathrm{vol}(K)}{\mathrm{vol}(B_2^p)} \right)^{1/p}. \tag{12}$$

The following Urysohn's inequality, which is proved through the Brunn-Minkowski Theorem, links the Gaussian width $w(\cdot)$ with the volume ratio $v(\cdot)$.

Lemma 2 (Urysohn's Inequality). Let $K$ be a compact subset of $\mathbb{R}^p$. Then $v(K) \leq w(K)$, with equality achieved if and only if $K$ is the $\ell_2$ ball $B_2^p$.

The recovery difficulty of the linear inverse problem also depends on other geometric quantities defined on the local tangent cone $T_{\mathcal{A}}(M)$: the local isometry constants $\phi_{\mathcal{A}}(M, \mathcal{X})$ and $\psi_{\mathcal{A}}(M, \mathcal{X})$ and the

local asphericity ratio $\gamma_{\mathcal{A}}(M)$. The local isometry constants are defined for the local tangent cone at the true parameter $M$ as

$$\phi_{\mathcal{A}}(M, \mathcal{X}) := \inf\left\{ \frac{\|\mathcal{X}(h)\|_{\ell_2}}{\|h\|_{\ell_2}} : h \in T_{\mathcal{A}}(M), \, h \neq 0 \right\}, \tag{13}$$

$$\psi_{\mathcal{A}}(M, \mathcal{X}) := \sup\left\{ \frac{\|\mathcal{X}(h)\|_{\ell_2}}{\|h\|_{\ell_2}} : h \in T_{\mathcal{A}}(M), \, h \neq 0 \right\}. \tag{14}$$

The local isometry constants measure how well the linear operator preserves the $\ell_2$ norm within the local tangent cone. Intuitively, the larger $\psi$ is or the smaller $\phi$ is, the harder the recovery is. We will see later that the local isometry constants are determined by the Gaussian width under the Gaussian ensemble design.

The local asphericity ratio is defined as

$$\gamma_{\mathcal{A}}(M) := \sup\left\{ \frac{\|h\|_{\mathcal{A}}}{\|h\|_{\ell_2}} : h \in T_{\mathcal{A}}(M), \, h \neq 0 \right\}, \tag{15}$$

which measures how extreme the atomic norm is relative to the $\ell_2$ norm within the local tangent cone.

2.3 Point Estimation via Convex Relaxation

We now return to the linear inverse model (1) in the high-dimensional setting. Suppose we observe $(\mathcal{X}, Y)$ as in (1), where the parameter of interest $M$ is assumed to have low complexity with respect to a given atom set $\mathcal{A}$. The low complexity of $M$ introduces a non-convex constraint, which leads to serious computational difficulties if solved directly. Convex relaxation is an effective and natural approach in such a setting. We propose a generic convex constrained minimization procedure induced by the atomic norm and the corresponding dual norm to estimate $M$:

$$\hat{M} = \arg\min_{M} \left\{ \|M\|_{\mathcal{A}} : \|\mathcal{X}^*(Y - \mathcal{X}(M))\|_{\mathcal{A}}^* \leq \lambda \right\}, \tag{16}$$

where $\lambda$ is a tuning parameter (localization radius) that depends on the sample size, noise level, and geometry of the atom set $\mathcal{A}$. An explicit formula for $\lambda$ is given in (20) in the case of Gaussian noise. Intuitively, the atomic norm minimization (16) is a convex relaxation of the low complexity structure and $\lambda$ specifies the localization scale given the noise distribution. This generic convex program utilizes the duality and recovers the low complexity structure adaptively. The Dantzig selector for high-dimensional sparse regression (Candès and Tao, 2007) and the constrained nuclear norm minimization (Candès and Plan, 2011) for trace regression are particular examples of (16).
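The two special cases just named can be written out by substituting the atomic norm and dual norm pairs of Examples 1 and 2 into (16). The display below is a sketch of that substitution and only specializes notation already introduced; the tuning parameter $\lambda$ is left generic.

```latex
% Example 1 (sparse regression): atoms \{\pm e_i\}, atomic norm \ell_1, dual norm \ell_\infty.
% Program (16) becomes the Dantzig selector:
\hat{M} = \arg\min_{M \in \mathbb{R}^p}
  \left\{ \|M\|_{\ell_1} : \|X^{*}(Y - XM)\|_{\ell_\infty} \le \lambda \right\}

% Example 2 (trace regression): rank-one atoms, atomic norm = nuclear norm \|\cdot\|_{*},
% dual norm = spectral norm \|\cdot\|. Program (16) becomes constrained nuclear norm minimization:
\hat{M} = \arg\min_{M \in \mathbb{R}^{p_1 \times p_2}}
  \left\{ \|M\|_{*} : \|\mathcal{X}^{*}(Y - \mathcal{X}(M))\| \le \lambda \right\}
```

In both cases the objective is a norm and the constraint is a sublevel set of a norm of an affine function of $M$, which is why the program is convex.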
The properties of the estimator $\hat{M}$ will be investigated in Sections 3 and 4.

2.4 Statistical Inference via Feasibility of Convex Program

In the high-dimensional setting, p-values as well as confidence intervals are important inferential questions beyond point estimation. In this section we will show how to perform statistical inference for the

linear inverse model (1). Let $M \in \mathbb{R}^p$ be the vectorized parameter of interest, and let $\{e_i, 1 \leq i \leq p\}$ be the corresponding basis vectors. Consider the following convex feasibility problem for a matrix $\Omega \in \mathbb{R}^{p \times p}$, where each row $\Omega_{i\cdot}$ satisfies

$$\|\mathcal{X}^* \mathcal{X} \Omega_{i\cdot}^* - e_i\|_{\mathcal{A}}^* \leq \eta, \quad \forall \, 1 \leq i \leq p, \tag{17}$$

where $\eta$ is a tuning parameter that depends on the sample size and the geometry of the atom set $\mathcal{A}$. One can also solve a stronger version of the above convex program for $\eta \in \mathbb{R}$, $\Omega \in \mathbb{R}^{p \times p}$ simultaneously:

$$(\Omega, \eta_n) = \arg\min_{\Omega, \eta} \left\{ \eta : \|\mathcal{X}^* \mathcal{X} \Omega_{i\cdot}^* - e_i\|_{\mathcal{A}}^* \leq \eta, \; \forall \, 1 \leq i \leq p \right\}. \tag{18}$$

Built upon the constrained minimization estimator $\hat{M}$ in (16) and the feasible matrix $\Omega$ in (18), the de-biased estimator for inference on the parameter $M$ is defined as

$$\tilde{M} := \hat{M} + \Omega \mathcal{X}^*(Y - \mathcal{X}(\hat{M})). \tag{19}$$

We will establish asymptotic normality for the finite linear contrast $\langle v, \tilde{M} \rangle$, where $v \in \mathbb{R}^p$, $\|v\|_{\ell_2} = 1$, $\|v\|_{\ell_0} \leq k$, and $k$ does not grow with $n, p$, and construct confidence intervals and hypothesis tests based on the asymptotic normality result. In the case of high-dimensional linear regression, de-biased estimators have been investigated in Bühlmann (2013); Zhang and Zhang (2014); van de Geer et al. (2014); Javanmard and Montanari (2014). The convex feasibility program we propose here can be viewed as a unified treatment for general linear inverse models. We will show that under some conditions on the sample size and the local tangent cone, asymptotic confidence intervals and hypothesis tests are valid for the finite linear contrast $\langle v, M \rangle$, which includes as a special case the individual coordinates of $M$.

3 Local Geometric Theory: Gaussian Setting

We establish in this section a general theory of geometric inference for the linear inverse problem under the Gaussian setting, where the noise vector $Z$ is Gaussian and the linear operator $\mathcal{X}$ is the Gaussian ensemble design in the following sense.

Definition 4 (Gaussian Ensemble Design). Let $X \in \mathbb{R}^{n \times p}$ denote the matrix form of the linear operator $\mathcal{X}: \mathbb{R}^p \to \mathbb{R}^n$. $\mathcal{X}$ is a Gaussian ensemble if each element of $X$ is an i.i.d. Gaussian random variable with mean 0 and variance $1/n$.

Our analysis is quite different from the case by case global analysis of the Dantzig selector, Lasso and nuclear norm minimization.
We show a stronger result which adapts to the local tangent cone geometry. All the analyses in our theory are non-asymptotic, and the constants are explicit. Another advantage is that the local analysis yields robustness for a given parameter (with near but not exact low complexity), as the convergence rate is captured by the geometry of the associated local tangent cone at a given $M$. Later

15 i Sectio 4 we will show how to exted the theory to a more geeral settig. Without loss of geerality, we assume i our aalysis that the atom set A is scaled so that sup v A v l2 = 1. That is, the atom set A is embedded ito the uit Euclidea ball. 3.1 Local Geometric Upper Boud For the upper boud aalysis, we eed to choose a suitable localizatio radius λ (i the covex program (16)) to guaratee that the true parameter M is i the feasible set with high probability. The tuig parameter, uder the Gaussia oise assumptio, is chose as λ A (X,σ,) = σ } {w(x A ) + δ sup X v l2 σ w(x A ) (20) v A where X A is the image of the atom set uder the liear operator X, ad δ > 0 ca be chose arbitrarily accordig to the probability of success we would like to attai (δ is commoly chose at order log p). λa (X,σ,) is a global parameter that depeds o the liear operator X ad the atom set A, but, importatly, ot o the complexity of M. The followig theorem geometrizes the local rate of covergece i the Gaussia case. Theorem 1 (Gaussia Esemble: Covergece Rate) Suppose we observe (X, Y ) as i (1) with the Gaussia esemble desig ad Z N (0, σ2 I ). Let ˆM be the solutio of (16) with λ chose as i (20). Let 0 < c < 1 be a costat. For ay δ > 0, if the with probability at least 1 3exp( δ 2 /2), 4[w(B p 2 T A (M)) + δ] 2 c 2 1 c, ˆM M l2 2σ (1 c) 2 γa (M)w(X A ), ˆM M A 2σ (1 c) 2 γ2 A (M)w(X A ), X ( ˆM M) l2 2σ (1 c) γa (M)w(X A ). Theorem 1 gives bouds for the estimatio error uder both the l 2 orm loss ad the atomic orm loss as well as for the i sample predictio error. The upper bouds are determied by the geometric quatities w(x A ),γ A (M) ad w(b p 2 T A (M)). Take for example the estimatio error uder the l 2 loss. Give ay ɛ > 0, the smallest sample size to esure the recovery error ˆM M l2 ɛ with probability at least 1 3exp( δ 2 /2) is { 4σ 2 max (1 c) 4 γ2 A (M)w 2 (X A ) ɛ 2, 4w 2 p (B2 T } A (M)) c 2. 15

That is, the minimum sample size for guaranteed statistical accuracy is driven by two geometric terms, $w(\mathcal{X}\mathcal{A})\,\gamma_{\mathcal{A}}(M)$ and $w(B_2^p \cap T_{\mathcal{A}}(M))$. We will see in Section 3.4 that these two rates match in a range of specific high-dimensional estimation problems. For the other two loss functions, similar calculations apply. It should be noted that Theorem 1 provides a local analysis of the performance of the estimator for a given $M$, which is quite different from a usual global analysis over a large parameter space. The proof of Theorem 1 (and Theorem 4 in Section 4) relies on the following two key lemmas. The first concerns the choice of the tuning parameter $\lambda$ in the Gaussian case.

Lemma 3 (Choice of Tuning Parameter) Consider the linear inverse model (1) with $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. For any $\delta > 0$, with probability at least $1 - \exp(-\delta^2/2)$,

$$\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \frac{\sigma}{\sqrt{n}} \left\{ w(\mathcal{X}\mathcal{A}) + \delta \cdot \sup_{v \in \mathcal{A}} \|\mathcal{X}v\|_{\ell_2} \right\}. \tag{21}$$

This lemma is proved in Section 6. The particular value of $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$ for a range of examples will be calculated in Section 3.4. The next lemma addresses the local behavior of the linear operator $\mathcal{X}$ around the true parameter $M$ under the Gaussian ensemble design. We call a linear operator locally near-isometric if the local isometry constants are uniformly bounded. The following lemma tells us that in the most widely used Gaussian ensemble case, the local isometry constants are guaranteed to be bounded, provided the sample size $n$ is at least of order $[w(B_2^p \cap T_{\mathcal{A}}(M))]^2$. Hence, the difficulty of the problem is captured by the Gaussian width.

Lemma 4 (Local Isometry Bound for Gaussian Ensemble) Assume the linear operator $\mathcal{X}$ is the Gaussian ensemble design. Let $0 < c < 1$ be a constant. For any $\delta > 0$, if

$$n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c},$$

then with probability at least $1 - 2\exp(-\delta^2/2)$, the local isometry constants are around 1, with

$$\phi_{\mathcal{A}}(M, \mathcal{X}) \ge 1 - c \quad \text{and} \quad \psi_{\mathcal{A}}(M, \mathcal{X}) \le 1 + c.$$
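The choice of tuning parameter in (20) and the probability statement of Lemma 3 can be checked numerically. The following is a minimal sketch, not part of the paper, for the special case where the atom set is $\{\pm e_j\}$ (the $\ell_1$ setting), so that the dual atomic norm is the $\ell_\infty$ norm. It assumes entries of the design with variance $1/n$ and noise with variance $\sigma^2/n$, and it estimates $w(\mathcal{X}\mathcal{A})$ by Monte Carlo; all function names are hypothetical.

```python
import math
import random


def gaussian_ensemble(n, p, rng):
    """n x p design with i.i.d. N(0, 1/n) entries, stored as a list of rows."""
    sd = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, sd) for _ in range(p)] for _ in range(n)]


def max_abs_inner(X, g):
    """max_j |<X e_j, g>|: the dual (l_infinity) norm of X^T g for atoms {+-e_j}."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * g[i] for i in range(n))) for j in range(p))


def width_of_image(X, rng, draws=200):
    """Monte Carlo estimate of the Gaussian width w(XA) = E max_j |<X e_j, g>|."""
    n = len(X)
    total = 0.0
    for _ in range(draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max_abs_inner(X, g)
    return total / draws


def tuning_parameter(X, sigma, delta, rng, draws=200):
    """lambda = sigma / sqrt(n) * (w(XA) + delta * max_j ||X e_j||_2), as in (20)."""
    n, p = len(X), len(X[0])
    max_col_norm = max(
        math.sqrt(sum(X[i][j] ** 2 for i in range(n))) for j in range(p)
    )
    return sigma / math.sqrt(n) * (width_of_image(X, rng, draws) + delta * max_col_norm)


def noise_control_frequency(X, sigma, lam, rng, trials=200):
    """Fraction of noise draws Z ~ N(0, sigma^2 / n I) with ||X^T Z||_inf <= lam."""
    n = len(X)
    sd = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        z = [rng.gauss(0.0, sd) for _ in range(n)]
        if max_abs_inner(X, z) <= lam:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    X = gaussian_ensemble(40, 60, rng)
    lam = tuning_parameter(X, sigma=1.0, delta=2.0, rng=rng)
    print(lam, noise_control_frequency(X, 1.0, lam, rng))
```

Lemma 3 only guarantees a frequency of at least $1 - \exp(-\delta^2/2)$; the observed frequency is typically higher, because the concentration bound behind (21) is not tight.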
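3.2 Local Geometric Inference: Confidence Intervals and Hypothesis Testing

For statistical inference on the general linear inverse model, we would like to choose the smallest $\eta$ in (17) to ensure that, under the Gaussian ensemble design, the feasibility set for (17) is non-empty with high probability. The following theorem establishes geometric inference for Model (1).

Theorem 2 (Geometric Inference) Suppose we observe $(\mathcal{X}, Y)$ as in (1) with the Gaussian ensemble design and $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. Let $\hat{M} \in \mathbb{R}^p$ and $\Omega \in \mathbb{R}^{p \times p}$ be the solutions of (16) and (17), and let $\tilde{M} \in \mathbb{R}^p$ be the de-biased estimator as in (19). Assume $p \ge n \gtrsim w^2(B_2^p \cap T_{\mathcal{A}}(M))$. If the tuning parameters $\lambda, \eta$ are chosen with

$$\lambda \asymp \frac{\sigma}{\sqrt{n}}\, w(\mathcal{X}\mathcal{A}), \qquad \eta \asymp \frac{1}{\sqrt{n}}\, w(\mathcal{X}\mathcal{A}),$$

then the convex programs (16) and (17) have non-empty feasibility sets with high probability. The following decomposition

$$\tilde{M} - M = \Delta + \frac{\sigma}{\sqrt{n}}\, \Omega \mathcal{X}^* W \tag{22}$$

holds, where $W \sim N(0, I_n)$ is the standard Gaussian vector with $\Omega \mathcal{X}^* W \sim N(0, \Omega \mathcal{X}^*\mathcal{X} \Omega^*)$, and $\Delta \in \mathbb{R}^p$ satisfies

$$\|\Delta\|_{\ell_\infty} \lesssim \gamma_{\mathcal{A}}^2(M) \cdot \lambda \eta \asymp \sigma\, \frac{\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})}{n}.$$

Suppose

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0;$$

then for any $v \in \mathbb{R}^p$ with $\|v\|_{\ell_2} = 1$, $\|v\|_{\ell_0} \le k$ and $k$ finite, we have asymptotic normality for the functional $\langle v, \tilde{M} \rangle$:

$$\frac{\sqrt{n}\left( \langle v, \tilde{M} \rangle - \langle v, M \rangle \right)}{\sigma \sqrt{v^* [\Omega \mathcal{X}^*\mathcal{X} \Omega^*] v}} \to N(0, 1) \quad \text{as } n, p \to \infty. \tag{23}$$

It follows from Theorem 2 that a valid asymptotic $(1-\alpha)$-level confidence interval for $M_i$, $1 \le i \le p$ (when $v$ is taken as $e_i$ in Theorem 2) is

$$\left[ \tilde{M}_i + \Phi^{-1}\!\left(\frac{\alpha}{2}\right) \sigma \sqrt{\frac{[\Omega \mathcal{X}^*\mathcal{X} \Omega^*]_{ii}}{n}}, \ \ \tilde{M}_i + \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) \sigma \sqrt{\frac{[\Omega \mathcal{X}^*\mathcal{X} \Omega^*]_{ii}}{n}} \right]. \tag{24}$$

If we are interested in a low-dimensional linear contrast $\langle v, M \rangle = v_0$, with $\|v\|_{\ell_2} = 1$ and $\|v\|_{\ell_0} = k$ for fixed $k$, consider the hypothesis testing problem

$$H_0 : \sum_{i=1}^p v_i M_i = v_0 \quad \text{vs.} \quad H_\alpha : \sum_{i=1}^p v_i M_i \ne v_0.$$

The test statistic is

$$\frac{\sqrt{n}\left( \langle v, \tilde{M} \rangle - v_0 \right)}{\sigma \left( v^* [\Omega \mathcal{X}^*\mathcal{X} \Omega^*] v \right)^{1/2}},$$

and under the null it follows an asymptotic standard normal distribution as $n \to \infty$.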
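As a small illustration of how the interval (24) and the test statistic above are evaluated once $\tilde{M}$ and $\Omega \mathcal{X}^*\mathcal{X} \Omega^*$ are available, consider the following sketch. It assumes that $\sigma$ is known and that the relevant diagonal entry or quadratic form has already been computed from the convex programs; the function names and arguments are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist


def confidence_interval(m_tilde_i, var_ii, sigma, n, alpha=0.05):
    """Interval (24) for coordinate i.

    var_ii stands for the diagonal entry [Omega X^T X Omega^T]_{ii};
    the half-width is Phi^{-1}(1 - alpha/2) * sigma * sqrt(var_ii / n).
    """
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma * math.sqrt(var_ii / n)
    return m_tilde_i - half_width, m_tilde_i + half_width


def contrast_z_statistic(contrast_estimate, v0, quad_form, sigma, n):
    """sqrt(n) * (<v, M_tilde> - v0) / (sigma * sqrt(v^T [Omega X^T X Omega^T] v))."""
    return math.sqrt(n) * (contrast_estimate - v0) / (sigma * math.sqrt(quad_form))


if __name__ == "__main__":
    print(confidence_interval(1.0, var_ii=1.0, sigma=1.0, n=100))
    print(contrast_z_statistic(1.2, v0=1.0, quad_form=1.0, sigma=1.0, n=100))
```

The lower endpoint uses $\Phi^{-1}(\alpha/2) = -\Phi^{-1}(1 - \alpha/2)$, so the interval is symmetric around $\tilde{M}_i$.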

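Similarly, the p-value is of the form

$$2 - 2\Phi\left( \left| \frac{\sqrt{n}\left( \langle v, \tilde{M} \rangle - v_0 \right)}{\sigma \left( v^* [\Omega \mathcal{X}^*\mathcal{X} \Omega^*] v \right)^{1/2}} \right| \right)$$

as $n \to \infty$. Note that the asymptotic normality holds for any finite linear contrast, and the asymptotic variance nearly achieves the Fisher information lower bound, as $\Omega$ is an estimate of the inverse of $\mathcal{X}^*\mathcal{X}$. For fixed-dimension inference, the Fisher information lower bound is asymptotically optimal.

Remark 1 Note that the condition for estimation consistency of the parameter $M$ under the $\ell_2$ norm is

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}(M)\, w(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$

In contrast, valid confidence intervals require the stronger condition

$$\lim_{n, p \to \infty} \frac{\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})}{\sqrt{n}} = 0.$$

In the case when $n > p$ and the design is the Gaussian ensemble, $\mathcal{X}^*\mathcal{X}$ is non-singular with high probability. With the choice $\Omega = (\mathcal{X}^*\mathcal{X})^{-1}$ and $\eta = 0$, for any $i \in [p]$, the relation

$$\sqrt{n}\,( \tilde{M}_i - M_i ) \sim N\left(0, \sigma^2 [(\mathcal{X}^*\mathcal{X})^{-1}]_{ii}\right)$$

holds non-asymptotically.

3.3 Minimax Lower Bound for Local Tangent Cone

As seen in Sections 3.1 and 3.2, the local tangent cone plays an important role in the upper bound analysis. In this section, we restrict the parameter space to the local tangent cone and examine how the geometry of the cone affects the minimax lower bound.

Theorem 3 (Lower Bound Based on Local Tangent Cone) Suppose we observe $(\mathcal{X}, Y)$ as in (1) with the Gaussian ensemble design and $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. Let $M$ be the true parameter of interest. Let $0 < c < 1$ be a constant. For any $\delta > 0$, if

$$n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c},$$

then with probability at least $1 - 2\exp(-\delta^2/2)$,

$$\inf_{\hat{M}} \sup_{M' \in T_{\mathcal{A}}(M)} \mathbb{E}_{\cdot | \mathcal{X}} \|\hat{M} - M'\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{(1+c)^2} \cdot \left( \frac{e(B_2^p \cap T_{\mathcal{A}}(M))}{\sqrt{n}} \right)^2$$

for some universal constant $c_0 > 0$. Here $\mathbb{E}_{\cdot | \mathcal{X}}$ stands for the conditional expectation given the design matrix $\mathcal{X}$, and the probability statement is with respect to the distribution of $\mathcal{X}$ under the Gaussian ensemble design.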

In the Gaussian setting, when $n \gtrsim w^2(B_2^p \cap T_{\mathcal{A}}(M))$, we have the following observations. From Theorem 1, the local upper bound is basically determined by $\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})$, which is of the rate $w^2(B_2^p \cap T_{\mathcal{A}}(M))$ in many examples, as we will show in Section 3.4. The general relationship between these two quantities is given in Lemma 5 below.

Lemma 5 For any atom set $\mathcal{A}$, we have the relation

$$\gamma_{\mathcal{A}}(M)\, w(\mathcal{A}) \ge w(B_2^p \cap T_{\mathcal{A}}(M)),$$

where $w(\cdot)$ is the Gaussian width and $\gamma_{\mathcal{A}}(M)$ is defined in (15).

Lemma 5 is proved in Appendix A. From Theorem 3, the minimax lower bound for estimation over the local tangent cone is determined by the Sudakov estimate $e^2(B_2^p \cap T_{\mathcal{A}}(M))$. An interesting question is: how are the two terms $w(B_2^p \cap T_{\mathcal{A}}(M))$ and $e(B_2^p \cap T_{\mathcal{A}}(M))$ related to each other? It follows directly from Lemma 1 that there exists a universal constant $c > 0$ such that

$$c \cdot e(B_2^p \cap T_{\mathcal{A}}(M)) \le w(B_2^p \cap T_{\mathcal{A}}(M)) \le 24 \int_0^\infty \sqrt{\log N(B_2^p \cap T_{\mathcal{A}}(M), \epsilon)}\, d\epsilon.$$

Thus we have shown that under the Gaussian setting, in terms of both the upper bound and the lower bound, geometric complexity measures govern the difficulty of the estimation problem through two closely related quantities: the Gaussian width and the Sudakov estimate.

3.4 Universality of the Geometric Approach

In this section we apply the general theory under the Gaussian setting to some of the actively studied high-dimensional problems mentioned in Section 1 to illustrate the wide applicability of the theory. The detailed proofs are deferred to Appendix B.

3.4.1 High Dimensional Linear Regression

We begin by considering the high-dimensional linear regression model (2) under the assumption that the true parameter $M \in \mathbb{R}^p$ is sparse, say $\|M\|_{\ell_0} = s$. Our general theory applied to $\ell_1$ minimization recovers the optimality results for the Dantzig selector and Lasso. In this case, it can be shown that $\gamma_{\mathcal{A}}(M)\, w(\mathcal{A})$ and $w(B_2^p \cap T_{\mathcal{A}}(M))$ are of the same rate $\sqrt{s \log p}$. See Section B for the detailed calculations. The asphericity ratio $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$ reflects the sparsity of $M$ through the local tangent cone, and the Gaussian width satisfies $w(\mathcal{X}\mathcal{A}) \asymp \sqrt{\log p}$.
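The $\sqrt{\log p}$ order of the Gaussian width can be seen directly on the atom set itself. The sketch below is an illustration rather than the calculation of Appendix B: it takes the atoms to be $\{\pm e_j\}$, estimates $w(\mathcal{A}) = \mathbb{E}\max_j |g_j|$ by Monte Carlo, and combines it with the bound $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$ to give a quantity of order $\sqrt{s \log p}$. Function names are hypothetical.

```python
import math
import random


def width_l1_atoms(p, rng, draws=1000):
    """Monte Carlo estimate of w(A) = E max_j |g_j| for the atom set {+-e_j}."""
    total = 0.0
    for _ in range(draws):
        total += max(abs(rng.gauss(0.0, 1.0)) for _ in range(p))
    return total / draws


def upper_complexity(s, p, rng, draws=1000):
    """Upper bound 2 * sqrt(s) * w(A) on gamma_A(M) * w(A) for an s-sparse M."""
    return 2.0 * math.sqrt(s) * width_l1_atoms(p, rng, draws)


if __name__ == "__main__":
    rng = random.Random(1)
    p, s = 500, 10
    print(width_l1_atoms(p, rng), math.sqrt(2.0 * math.log(2 * p)))
    print(upper_complexity(s, p, rng), math.sqrt(s * math.log(p)))
```

The estimated width sits below the standard maximal-inequality bound $\sqrt{2\log(2p)}$ and above $\sqrt{\log p}$ for moderate $p$, which is consistent with the stated rate.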
The following corollary, proved in Section B, follows from the geometric analysis of the high-dimensional regression model.

Corollary 1 Consider the high-dimensional linear regression model (2). Assume that $X \in \mathbb{R}^{n \times p}$ is the Gaussian ensemble design and the parameter of interest $M \in \mathbb{R}^p$ is of sparsity $s$. Let $\hat{M}$ be the solution to the constrained $\ell_1$ minimization (16) with $\lambda = C_1 \sigma \sqrt{\frac{\log p}{n}}$. If $n \ge C_2 s \log p$, then

$$\|\hat{M} - M\|_{\ell_2} \le C_3 \sigma \sqrt{\frac{s \log p}{n}}, \quad \|\hat{M} - M\|_{\ell_1} \le C_3 \sigma s \sqrt{\frac{\log p}{n}}, \quad \|\mathcal{X}(\hat{M} - M)\|_{\ell_2} \le C_3 \sigma \sqrt{\frac{s \log p}{n}}$$

with high probability, where $C_i > 0$, $1 \le i \le 3$, are some universal constants.

For $\ell_2$ norm consistency of the estimation of $M$, we require $\lim_{n,p \to \infty} \frac{\sqrt{s \log p}}{\sqrt{n}} = 0$. However, for a valid inferential guarantee, the de-biased Dantzig selector type estimator $\tilde{M}$ satisfies asymptotic normality under the condition $\lim_{n,p \to \infty} \frac{s \log p}{\sqrt{n}} = 0$ through Theorem 2. Under this condition, the confidence intervals given in (24) have asymptotic coverage probability $(1 - \alpha)$ and their expected length is at the parametric rate $\frac{1}{\sqrt{n}}$. Furthermore, the confidence intervals do not depend on the specific value of $s$. These properties are similar to those of the confidence intervals constructed in Zhang and Zhang (2014); van de Geer et al. (2014); Javanmard and Montanari (2014).

3.4.2 Low Rank Matrix Recovery

We now consider the recovery of low-rank matrices under the trace regression model (3). The geometric theory leads to the optimal recovery results for nuclear norm minimization and penalized trace regression in the existing literature. Assume the true parameter $M \in \mathbb{R}^{p \times q}$ is of low rank in the sense that $\mathrm{rank}(M) = r$. Let us examine the behavior of $\phi_{\mathcal{A}}(M, \mathcal{X})$, $\gamma_{\mathcal{A}}(M)$, and $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$. Detailed calculations given in Section B show that in this case $\gamma_{\mathcal{A}}(M)\, w(\mathcal{A})$ and $w(B_2^p \cap T_{\mathcal{A}}(M))$ are of the same order $\sqrt{r(p+q)}$. The asphericity ratio $\gamma_{\mathcal{A}}(M) \le 2\sqrt{2r}$ characterizes the low rank structure, and the Gaussian width satisfies $w(\mathcal{X}\mathcal{A}) \asymp \sqrt{p+q}$. We have the following corollary for low rank matrix recovery.

Corollary 2 Consider the trace regression model (3). Assume that $X \in \mathbb{R}^{n \times pq}$ is the Gaussian ensemble design and the true parameter $M \in \mathbb{R}^{p \times q}$ is of rank $r$. Let $\hat{M}$ be the solution to the constrained nuclear norm minimization (16) with $\lambda = C_1 \sigma \sqrt{\frac{p+q}{n}}$. If $n \ge C_2 r (p+q)$, then, with high probability,

$$\|\hat{M} - M\|_F \le C_3 \sigma \sqrt{\frac{r(p+q)}{n}}, \quad \|\hat{M} - M\|_* \le C_3 \sigma r \sqrt{\frac{p+q}{n}}, \quad \|\mathcal{X}(\hat{M} - M)\|_{\ell_2} \le C_3 \sigma \sqrt{\frac{r(p+q)}{n}},$$

where $C_i > 0$, $1 \le i \le 3$, are some universal constants.

For point estimation consistency of $M$ under the Frobenius norm loss, the asymptotic condition is $\lim_{n,p,q \to \infty} \frac{\sqrt{r(p+q)}}{\sqrt{n}} = 0$. For statistical inference, Theorem 2 requires $\lim_{n,p,q \to \infty} \frac{r(p+q)}{\sqrt{n}} = 0$, which is essentially $n \gg pq$ (the sample size is larger than the dimension) for $r = 1$. This phenomenon happens when the Gaussian width complexity of the rank-1 matrices is large, i.e., the atom set is too rich. We would like to remark that in practice, the convex program (18) can still be used for constructing confidence intervals and performing hypothesis testing. However, it is harder to provide a sharp theoretical bound on the approximation error $\eta$ in (18) for any given $r, p, q$.

3.4.3 Sign Vector Recovery

We turn to the sign vector recovery model (4), where the parameter of interest $M \in \{+1, -1\}^p$ is a sign vector. The convex hull of the atom set (sign vectors) is the $\ell_\infty$ norm ball, and the corresponding $\ell_\infty$ norm minimization program is

$$\hat{M} = \arg\min_M \left\{ \|M\|_{\ell_\infty} : \|\mathcal{X}^*(Y - \mathcal{X}(M))\|_{\ell_1} \le \lambda \right\}. \tag{25}$$

Applying the general theory to the $\ell_\infty$ norm minimization leads to the rates of convergence for sign vector recovery. The calculations given in Section B show that the asphericity ratio $\gamma_{\mathcal{A}}(M) \le 1$ and the Gaussian width $w(\mathcal{X}\mathcal{A}) \asymp \sqrt{p}$. Furthermore, $\gamma_{\mathcal{A}}(M)\, w(\mathcal{A})$ and $w(B_2^p \cap T_{\mathcal{A}}(M))$ are of the same order $\sqrt{p}$. Applying the geometric theory to sign vector recovery leads to the following result.

Corollary 3 Consider the model (4) where the true parameter $M \in \{+1, -1\}^p$ is a sign vector. Assume that $X \in \mathbb{R}^{n \times p}$ is the Gaussian ensemble design. Let $\hat{M}$ be the solution to the convex program (16) with $\lambda = C_1 \sigma \sqrt{\frac{p}{n}}$. If $n \ge C_2 p$, then, with high probability,

$$\|\hat{M} - M\|_{\ell_2}, \ \|\hat{M} - M\|_{\ell_\infty}, \ \|\mathcal{X}(\hat{M} - M)\|_{\ell_2} \le C \sigma \sqrt{\frac{p}{n}},$$

where $C > 0$ is some universal constant.

3.4.4 Orthogonal Matrix Recovery

We now treat orthogonal matrix recovery using spectral norm minimization. Please see Example 4 in Section 2.1 for details. The spectral norm minimization program is

$$\hat{M} = \arg\min_M \left\{ \|M\| : \|\mathcal{X}^*(Y - \mathcal{X}(M))\|_* \le \lambda \right\}. \tag{26}$$

Consider the same model as in trace regression, but with the parameter of interest $M \in \mathbb{R}^{m \times m}$ being an orthogonal matrix.
Calculations in Section B show that $\gamma_{\mathcal{A}}(M)\, w(\mathcal{A})$ and $w(B_2^p \cap T_{\mathcal{A}}(M))$ are of the same rate $\sqrt{m^2}$.

Applying the geometric analysis to orthogonal matrix recovery using the constrained spectral norm minimization yields the following.

Corollary 4 Consider the orthogonal matrix recovery model (3). Assume that $X \in \mathbb{R}^{n \times m^2}$ is the Gaussian ensemble matrix and the true parameter $M \in \mathbb{R}^{m \times m}$ is an orthogonal matrix. Let $\hat{M}$ be the solution to the program (16) with $\lambda = C_1 \sigma \sqrt{\frac{m^2}{n}}$. If $n \ge C_2 m^2$, then, with high probability,

$$\|\hat{M} - M\|_{\ell_2}, \ \|\hat{M} - M\|, \ \|\mathcal{X}(\hat{M} - M)\|_{\ell_2} \le C \sigma \sqrt{\frac{m^2}{n}},$$

where $C > 0$ is some universal constant.

3.4.5 Other examples

Other examples that can be formalized under the framework of the linear inverse model include permutation matrix recovery (Jagabathula and Shah, 2011), sparse plus low rank matrix recovery (Candès et al., 2011) and matrix completion (Candès and Recht, 2009). The convex relaxation of the set of permutation matrices is the set of doubly stochastic matrices; the atomic norm corresponding to the sparse plus low rank atom set is the infimal convolution of the $\ell_1$ norm and the nuclear norm; for matrix completion, the design matrix can be viewed as a diagonal matrix whose diagonal elements are independent Bernoulli random variables. See Section 5 for a discussion of further examples.

4 Local Geometric Theory: General Setting

We have developed in the last section a local geometric theory for the linear inverse model in the Gaussian setting. The Gaussian assumption on the design and noise enables us to carry out concrete and more specific calculations, as seen in the examples given in Section 3.4, but the distributional assumption is not essential. In this section we extend this theory to the general setting.

4.1 General Local Upper Bound

We shall consider a fixed design matrix $\mathcal{X}$. In the case of random design, the results we establish are conditional on the design. We condition on the event that the noise is controlled, $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \lambda$. We have seen in Lemma 3 of Section 3.1 how to choose $\lambda$ to make this happen with overwhelming probability under Gaussian noise.

Theorem 4 (Geometrizing Local Convergence) Suppose we observe $(\mathcal{X}, Y)$ as in (1). Condition on the event that the noise vector $Z$ satisfies, for some given choice of localization radius $\lambda$, $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \lambda$. Let $\hat{M}$ be the solution to the convex program (16) with $\lambda$ being the tuning parameter. Then the geometric quantities defined on the local tangent cone capture the local convergence rate for $\hat{M}$:

$$\|\hat{M} - M\|_{\ell_2} \le 2\, \frac{\gamma_{\mathcal{A}}(M)}{\phi_{\mathcal{A}}^2(M, \mathcal{X})}\, \lambda, \quad \|\hat{M} - M\|_{\mathcal{A}} \le 2\, \frac{\gamma_{\mathcal{A}}^2(M)}{\phi_{\mathcal{A}}^2(M, \mathcal{X})}\, \lambda, \quad \|\mathcal{X}(\hat{M} - M)\|_{\ell_2} \le 2\, \frac{\gamma_{\mathcal{A}}(M)}{\phi_{\mathcal{A}}(M, \mathcal{X})}\, \lambda,$$

with the local asphericity ratio $\gamma_{\mathcal{A}}(M)$ defined in (15) and the local lower isometry constant $\phi_{\mathcal{A}}(M, \mathcal{X})$ defined in (13).

Remark 2 This theorem decomposes the estimation and prediction errors into three geometric components. The tuning parameter $\lambda$ can be regarded as a localization radius around the true parameter: it quantifies the uncertainty in estimation for a given sample size. It is a global parameter which does not depend on the local geometry. The other two geometric terms depend on the local tangent cone geometry. For example, when $\mathcal{X}$ is the Gaussian ensemble design, the local lower isometry constant $\phi_{\mathcal{A}}(M, \mathcal{X})$ is lower bounded by a constant under certain conditions, as shown in Lemma 4. The bounds $1 - c \le \phi_{\mathcal{A}}(M, \mathcal{X}) \le \psi_{\mathcal{A}}(M, \mathcal{X}) \le 1 + c$ hold for many different random design matrices $\mathcal{X}$. As we have seen, Section 3.4 illustrates how these geometric terms behave in several settings. Another observation worth noting is that Theorem 4 holds deterministically under the conditions on $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^*$ and $\phi_{\mathcal{A}}(M, \mathcal{X})$. It does not require distributional assumptions on the noise, nor does it impose conditions on the design matrix. Theorem 1 can be viewed as a special case where the local isometry constant $\phi_{\mathcal{A}}(M, \mathcal{X})$ and the local radius $\lambda$ are calculated explicitly under the Gaussian assumption.

4.2 General Geometric Inference

Geometric inference can also be extended to other fixed designs and noise distributions. We can modify the convex feasibility program (17) into the following stronger form:

$$(\Omega, \eta) = \arg\min_{\Omega, \eta} \left\{ \eta : \|\mathcal{X}^*\mathcal{X}\,\Omega_{i\cdot}^* - e_i\|_{\mathcal{A}}^* \le \eta, \ 1 \le i \le p \right\}. \tag{27}$$

Then the following theorem holds (the proof is analogous to that of Theorem 2).

Theorem 5 (Geometric Inference) Suppose we observe $(\mathcal{X}, Y)$ as in (1). Condition on the event that the noise vector $Z$ satisfies, for some given choice of localization radius $\lambda$, $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \lambda$.
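Let $\hat{M}$ be the solution to the convex program (16) with $\lambda$ being the tuning parameter. Denote by $\Omega$ and $\eta$ the optimal solution to the convex program (27), and by $\tilde{M}$ the de-biased estimator. The following decomposition

$$\tilde{M} - M = \Delta + \frac{\sigma}{\sqrt{n}}\, \Omega \mathcal{X}^* W \tag{28}$$

holds, where $W \sim N(0, I_n)$ is the standard Gaussian vector, $\Omega \mathcal{X}^* W \sim N(0, \Omega \mathcal{X}^*\mathcal{X} \Omega^*)$, and $\Delta \in \mathbb{R}^p$ satisfies

$$\|\Delta\|_{\ell_\infty} \le 2\, \frac{\gamma_{\mathcal{A}}^2(M)}{\phi_{\mathcal{A}}(M, \mathcal{X})}\, \lambda \eta.$$

4.3 General Local Minimax Lower Bound

The lower bound given in the Gaussian case can also be extended to the general setting where the class of noise distributions contains the Gaussian distributions. We aim to geometrize the intrinsic difficulty of the estimation problem in a unified manner. We first present a general result for a convex cone $T$ in the parameter space, which illustrates how the Sudakov estimate, the volume ratio and the design matrix affect the minimax lower bound.

Theorem 6 (Minimax Lower Bound via Sudakov Estimate and Volume Ratio) Let $T \subset \mathbb{R}^p$ be a compact convex cone. The minimax lower bound for the linear inverse model (1), if restricted to the cone $T$, is

$$\inf_{\hat{M}} \sup_{M \in T} \mathbb{E}_{\cdot | \mathcal{X}} \|\hat{M} - M\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{\psi^2} \cdot \left( \frac{e(B_2^p \cap T)}{\sqrt{n}} \vee \frac{v(B_2^p \cap T)}{\sqrt{n}} \right)^2,$$

where $\hat{M}$ is any measurable estimator, $\psi = \sup_{v \in B_2^p \cap T} \|\mathcal{X}(v)\|_{\ell_2}$ and $c_0$ is a universal constant. Here the notation $\mathbb{E}_{\cdot | \mathcal{X}}$ means taking expectation conditional on the design matrix $\mathcal{X}$, and $e(\cdot)$ and $v(\cdot)$ denote the Sudakov estimate (see (10)) and the volume ratio (see (12)). Applying the theorem to the local tangent cone yields the following corollary.

Corollary 5 (Lower Bound Based on Local Tangent Cone) Assume $T_{\mathcal{A}}(M)$ is the local tangent cone of interest. For any measurable estimator $\hat{M}$ and for parameters $M' \in T_{\mathcal{A}}(M)$, we have the following minimax lower bound:

$$\inf_{\hat{M}} \sup_{M' \in T_{\mathcal{A}}(M)} \mathbb{E}_{\cdot | \mathcal{X}} \|\hat{M} - M'\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{\psi_{\mathcal{A}}^2(M, \mathcal{X})} \cdot \left( \frac{e(B_2^p \cap T_{\mathcal{A}}(M))}{\sqrt{n}} \vee \frac{v(B_2^p \cap T_{\mathcal{A}}(M))}{\sqrt{n}} \right)^2,$$

where $\psi_{\mathcal{A}}(M, \mathcal{X})$ is defined in (13). Here the notation $\mathbb{E}_{\cdot | \mathcal{X}}$ means taking expectation conditional on the design matrix $\mathcal{X}$.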
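Returning to the upper bounds of Section 4, the deterministic bounds of Theorem 4 and the remainder bound of Theorem 5 are simple functions of $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, \mathcal{X})$, $\lambda$ and $\eta$. The sketch below only evaluates those expressions for user-supplied values; it does not compute the geometric quantities themselves, and the function names are hypothetical. Plugging in $\phi = 1 - c$ and $\lambda = \sigma\, w(\mathcal{X}\mathcal{A})/\sqrt{n}$ gives expressions of the form appearing in Theorem 1.

```python
import math


def local_error_bounds(gamma, phi, lam):
    """Right-hand sides of the three bounds in Theorem 4."""
    return {
        "l2": 2.0 * gamma / phi ** 2 * lam,           # ||M_hat - M||_2
        "atomic": 2.0 * gamma ** 2 / phi ** 2 * lam,  # ||M_hat - M||_A
        "prediction": 2.0 * gamma / phi * lam,        # ||X(M_hat - M)||_2
    }


def debias_remainder_bound(gamma, phi, lam, eta):
    """Right-hand side of the bound on the remainder Delta in Theorem 5."""
    return 2.0 * gamma ** 2 / phi * lam * eta


if __name__ == "__main__":
    # Illustrative values only: sparse regression with gamma = 2 sqrt(s),
    # phi = 1 - c, and lambda = sigma * sqrt(log p / n).
    s, p, n, sigma, c = 5, 1000, 2000, 1.0, 0.5
    lam = sigma * math.sqrt(math.log(p) / n)
    print(local_error_bounds(2.0 * math.sqrt(s), 1.0 - c, lam))
```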


More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers A uified framework for high-dimesioal aalysis of M-estimators with decomposable regularizers Sahad Negahba 1 Pradeep Ravikumar 2 Marti J. Waiwright 1,3 Bi Yu 1,3 Departmet of EECS 1 Departmet of CS 2 Departmet

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

Section 1.1. Calculus: Areas And Tangents. Difference Equations to Differential Equations

Section 1.1. Calculus: Areas And Tangents. Difference Equations to Differential Equations Differece Equatios to Differetial Equatios Sectio. Calculus: Areas Ad Tagets The study of calculus begis with questios about chage. What happes to the velocity of a swigig pedulum as its positio chages?

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

arxiv: v1 [math.pr] 13 Oct 2011

arxiv: v1 [math.pr] 13 Oct 2011 A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

The Borel hierarchy classifies subsets of the reals by their topological complexity. Another approach is to classify them by size.

The Borel hierarchy classifies subsets of the reals by their topological complexity. Another approach is to classify them by size. Lecture 7: Measure ad Category The Borel hierarchy classifies subsets of the reals by their topological complexity. Aother approach is to classify them by size. Filters ad Ideals The most commo measure

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 3 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe

More information

Empirical Processes: Glivenko Cantelli Theorems

Empirical Processes: Glivenko Cantelli Theorems Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3

More information

Section 14. Simple linear regression.

Section 14. Simple linear regression. Sectio 14 Simple liear regressio. Let us look at the cigarette dataset from [1] (available to dowload from joural s website) ad []. The cigarette dataset cotais measuremets of tar, icotie, weight ad carbo

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Lecture 3 : Random variables and their distributions

Lecture 3 : Random variables and their distributions Lecture 3 : Radom variables ad their distributios 3.1 Radom variables Let (Ω, F) ad (S, S) be two measurable spaces. A map X : Ω S is measurable or a radom variable (deoted r.v.) if X 1 (A) {ω : X(ω) A}

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Lecture 8: October 20, Applications of SVD: least squares approximation

Lecture 8: October 20, Applications of SVD: least squares approximation Mathematical Toolkit Autum 2016 Lecturer: Madhur Tulsiai Lecture 8: October 20, 2016 1 Applicatios of SVD: least squares approximatio We discuss aother applicatio of sigular value decompositio (SVD) of

More information

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio

More information

Minimal surface area position of a convex body is not always an M-position

Minimal surface area position of a convex body is not always an M-position Miimal surface area positio of a covex body is ot always a M-positio Christos Saroglou Abstract Milma proved that there exists a absolute costat C > 0 such that, for every covex body i R there exists a

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

Rank tests and regression rank scores tests in measurement error models

Rank tests and regression rank scores tests in measurement error models Rak tests ad regressio rak scores tests i measuremet error models J. Jurečková ad A.K.Md.E. Saleh Charles Uiversity i Prague ad Carleto Uiversity i Ottawa Abstract The rak ad regressio rak score tests

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

CHAPTER 10 INFINITE SEQUENCES AND SERIES

CHAPTER 10 INFINITE SEQUENCES AND SERIES CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece

More information

Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness Lower bouds o miimax rates for oparametric regressio with additive sparsity ad smoothess Garvesh Raskutti 1, Marti J. Waiwright 1,2, Bi Yu 1,2 1 UC Berkeley Departmet of Statistics 2 UC Berkeley Departmet

More information

Minimax rates of estimation for high-dimensional linear regression over l q -balls

Minimax rates of estimation for high-dimensional linear regression over l q -balls Miimax rates of estimatio for high-dimesioal liear regressio over l q -balls Garvesh Raskutti Marti J. Waiwright, garveshr@stat.berkeley.edu waiwrig@stat.berkeley.edu Bi Yu, biyu@stat.berkeley.edu arxiv:090.04v

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Lecture 20. Brief Review of Gram-Schmidt and Gauss s Algorithm

Lecture 20. Brief Review of Gram-Schmidt and Gauss s Algorithm 8.409 A Algorithmist s Toolkit Nov. 9, 2009 Lecturer: Joatha Keler Lecture 20 Brief Review of Gram-Schmidt ad Gauss s Algorithm Our mai task of this lecture is to show a polyomial time algorithm which

More information

Application to Random Graphs

Application to Random Graphs A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities O Equivalece of Martigale Tail Bouds ad Determiistic Regret Iequalities Sasha Rakhli Departmet of Statistics, The Wharto School Uiversity of Pesylvaia Dec 16, 2015 Joit work with K. Sridhara arxiv:1510.03925

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Fall 2013 MTH431/531 Real analysis Section Notes

Fall 2013 MTH431/531 Real analysis Section Notes Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

A gentle introduction to Measure Theory

A gentle introduction to Measure Theory A getle itroductio to Measure Theory Gaurav Chadalia Departmet of Computer ciece ad Egieerig UNY - Uiversity at Buffalo, Buffalo, NY gsc4@buffalo.edu March 12, 2007 Abstract This ote itroduces the basic

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information