Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

Garvesh Raskutti¹, Martin J. Wainwright¹˒², Bin Yu¹˒²
¹ UC Berkeley Department of Statistics
² UC Berkeley Department of Electrical Engineering and Computer Science

Abstract

We study minimax rates for estimating high-dimensional nonparametric regression models with sparse additive structure and smoothness constraints. More precisely, our goal is to estimate a function f*: R^p → R that has an additive decomposition of the form f*(X_1,...,X_p) = Σ_{j∈S} h*_j(X_j), where each component function h*_j lies in some class H of "smooth" functions, and S ⊆ {1,...,p} is an unknown subset with cardinality s = |S|. Given n i.i.d. observations of f*(X) corrupted with additive white Gaussian noise, where the covariate vectors (X_1, X_2, X_3, ..., X_p) are drawn with i.i.d. components from some distribution P, we determine lower bounds on the minimax rate for estimating the regression function with respect to squared-L²(P) error. Our main result is a lower bound on the minimax rate that scales as max( s log(p/s)/n, s ε_n²(H) ). The first term reflects the sample size required for performing subset selection, and is independent of the function class H. The second term, s ε_n²(H), is an s-dimensional estimation term corresponding to the sample size required for estimating a sum of s univariate functions, each chosen from the function class H. It depends linearly on the sparsity index s but is independent of the global dimension p. As a special case, if H corresponds to functions that are m-times differentiable (an m-th-order Sobolev space), then the s-dimensional estimation term takes the form s ε_n²(H) ≍ s n^{−2m/(2m+1)}. Either of the two terms may be dominant in different regimes, depending on the relation between the sparsity and smoothness of the additive decomposition.

1 Introduction

Many problems in modern science and engineering involve high-dimensional data, by which we mean that the ambient dimension p in which the data lies is of the same order as, or larger than, the sample size n.
A simple example is parametric linear regression under high-dimensional scaling, in which the goal is to estimate a regression vector β* ∈ R^p based on n samples. In the absence of additional structure, it is impossible to obtain consistent estimators unless the ratio p/n converges to zero, which precludes the regime p ≫ n. In many applications, it is natural to impose sparsity conditions, such as requiring that β* have at most s non-zero parameters for some s ≪ p. The method of ℓ1-regularized least squares, also known as the Lasso algorithm [14], has been shown to have a number of attractive theoretical properties for such high-dimensional sparse models (e.g., [1, 19, 10]).

Of course, the assumption of a parametric linear model may be too restrictive for some applications. Accordingly, a natural extension is the non-parametric regression model y = f(x_1,...,x_p) + w, where w ∼ N(0, σ²) is additive observation noise. Unfortunately, this general non-parametric model is known to suffer severely from the curse of dimensionality, in that for most natural function classes, the sample size required to achieve a given estimation accuracy grows exponentially in the dimension. This challenge motivates the use of additive non-parametric models (see the book [6] and references therein), in which the function f is decomposed additively as a sum f(x_1, x_2, ..., x_p) = Σ_{j=1}^p h_j(x_j) of univariate functions h_j. A natural sub-class of these

models are the sparse additive models, studied by Ravikumar et al. [12], in which

f(x_1, x_2, ..., x_p) = Σ_{j∈S} h_j(x_j),    (1)

where S ⊆ {1, 2, ..., p} is some unknown subset of cardinality |S| = s.

A line of past work has proposed and analyzed computationally efficient algorithms for estimating regression functions of this form. Just as ℓ1-based relaxations such as the Lasso have desirable properties for sparse parametric models, similar ℓ1-based approaches have proven to be successful here. Ravikumar et al. [12] propose a back-fitting algorithm to recover the component functions h_j, and prove consistency both in subset recovery and in empirical L²(P_n) norm. Meier et al. [9] propose a method that involves a sparsity-smoothness penalty term, and also demonstrate consistency in L²(P) norm. In the special case that H is a reproducing kernel Hilbert space (RKHS), Koltchinskii and Yuan [7] analyze a least-squares estimator based on imposing an ℓ1/ℓ_H penalty. The analysis in these papers demonstrates that, under certain conditions on the covariates, such regularized procedures can yield estimators that are consistent in the L²(P) norm even when p ≫ n.

Of complementary interest to the rates achievable by practical methods are the fundamental limits of estimating sparse additive models, meaning lower bounds that apply to any algorithm. Although such lower bounds are well-known under classical scaling (where p remains fixed independent of n), to the best of our knowledge, lower bounds on minimax rates for sparse additive models have not been determined. In this paper, our main result is to establish a lower bound on the minimax rate in L²(P) norm that scales as max( s log(p/s)/n, s ε_n²(H) ). The first term, s log(p/s)/n, is a subset selection term, independent of the univariate function space H in which the additive components lie, that reflects the difficulty of finding the subset S.
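As a concrete illustration of the sampling model in equations (1)-(2), the following sketch draws synthetic data from a sparse additive model. The sine-shaped component functions and all parameter values are hypothetical stand-ins, since the paper only requires that each h_j lie in a smooth univariate class H.

```python
import numpy as np

def sample_sparse_additive(n, p, s, sigma=1.0, rng=None):
    """Draw n observations y = sum_{j in S} h_j(x_j) + w with |S| = s.

    The component functions h_j used here (scaled sines) are illustrative
    placeholders; the covariates are i.i.d. uniform and the noise is
    additive Gaussian, as in the random-design model of the paper.
    """
    rng = np.random.default_rng(rng)
    X = rng.uniform(0.0, 1.0, size=(n, p))      # i.i.d. covariates X_j ~ P
    S = rng.choice(p, size=s, replace=False)    # unknown active subset S
    f_star = sum(np.sin(2 * np.pi * X[:, j]) for j in S)
    y = f_star + sigma * rng.normal(size=n)     # white Gaussian noise
    return X, y, S

X, y, S = sample_sparse_additive(n=200, p=1000, s=5)
```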
The second term, s ε_n²(H), is an s-dimensional estimation term, which depends on the low dimension s but not on the ambient dimension p, and reflects the difficulty of estimating the sum of s univariate functions, each drawn from the function class H. Either the subset selection term or the s-dimensional estimation term dominates, depending on the relative sizes of n, p, and s, as well as on H. Importantly, our analysis applies both in the low-dimensional setting (n ≥ p) and in the high-dimensional setting (p > n), provided that n, p and s tend to infinity. Our analysis is based on information-theoretic techniques centered around the use of metric entropy, mutual information and Fano's inequality to obtain lower bounds. Such techniques are standard in the analysis of non-parametric procedures under classical scaling [5, 2, 17], and have also been used more recently to develop lower bounds for high-dimensional inference problems [16, 11].

The remainder of the paper is organized as follows. In the next section, we formulate the problem, including the appropriate preliminary concepts, notation and assumptions. In Section 3, we state the main results and provide some comparisons to the rates achieved by existing algorithms. In Section 4, we provide an overview of the proof. We discuss and summarize the main consequences in Section 5.

2 Background and problem formulation

In this paper, we consider a non-parametric regression model with random design, meaning that we make n observations of the form

y^(i) = f*(X^(i)) + w^(i),  for i = 1, 2, ..., n.    (2)

Here the random vectors X^(i) ∈ R^p are the covariates, with elements X^(i)_j drawn i.i.d. from some underlying distribution P. We assume that the noise variables w^(i) ∼ N(0, σ²) are drawn independently, and independently of all the X^(i). Given a base class H of univariate functions with norm ‖·‖_H, consider the class of functions f: R^p → R that have an additive decomposition:

F := { f: R^p → R | f(x_1, x_2, ..., x_p) = Σ_{j=1}^p h_j(x_j), and ‖h_j‖_H ≤ 1 for j = 1, ..., p }.
Given some integer s ∈ {1,...,p}, we define the function class F_0(s), which is a union of (p choose s) s-dimensional subspaces of F, given by

F_0(s) := { f ∈ F | Σ_{j=1}^p I(h_j ≢ 0) ≤ s }.    (3)

The minimax rate of estimation over F_0(s) is defined by the quantity min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)}, where the expectation is taken over the noise w and the randomness in the sampling, and f̂ ranges over all (measurable)

functions of the observations {(y^(i), X^(i))}_{i=1}^n. The goal of this paper is to determine lower bounds on this minimax rate.

2.1 Inner products and norms

Given univariate functions h_j, h̃_j ∈ H, we define the usual L²(P) inner product

⟨h_j, h̃_j⟩_{L²(P)} := ∫_R h_j(x) h̃_j(x) dP(x).

(With a slight abuse of notation, we use P to refer to the measure over R^p as well as to the induced marginal measure in each direction, defined over R.) Without loss of generality (re-centering the functions as needed), we may assume that

E[h_j(X)] = ∫_R h_j(x) dP(x) = 0  for all h_j ∈ H.

As a consequence, we have E[f(X_1,...,X_p)] = 0 for all functions f ∈ F_0(s). Given our assumption that the covariate vector X = (X_1,...,X_p) has independent components, the L²(P) inner product on F has the additive decomposition ⟨f, f̃⟩_{L²(P)} = Σ_{j=1}^p ⟨h_j, h̃_j⟩_{L²(P)}. (Note that if independence were not assumed, the L²(P) inner product over F would involve cross-terms.)

2.2 Kullback-Leibler divergence

Since we use information-theoretic techniques, we measure the distance between distributions with the Kullback-Leibler (KL) divergence. For a given pair of functions f and f̃, consider the n-dimensional vectors

f(X) = ( f(X^(1)), f(X^(2)), ..., f(X^(n)) )ᵀ  and  f̃(X) = ( f̃(X^(1)), f̃(X^(2)), ..., f̃(X^(n)) )ᵀ.

Since Y | f(X) ∼ N(f(X), σ² I_n) and Y | f̃(X) ∼ N(f̃(X), σ² I_n), we have

D( Y | f(X) ‖ Y | f̃(X) ) = (1/(2σ²)) ‖f(X) − f̃(X)‖²_2.    (4)

We also use the notation D(f ‖ f̃) to mean the average KL divergence between the distributions of Y induced by the functions f and f̃, respectively. Therefore we have the relation

D(f ‖ f̃) = E_X[ D( Y | f(X) ‖ Y | f̃(X) ) ] = (n/(2σ²)) ‖f − f̃‖²_{L²(P)}.    (5)

This relation between the average KL divergence and the squared L²(P) distance plays an important role in our proof.

2.3 Metric entropy for function classes

In this section, we define the notion of metric entropy, which provides a way to measure the relative sizes of different function classes with respect to some metric ρ. More specifically, central to our results is the metric entropy of F_0(s) with respect to the L²(P) norm.

Definition 1 (Covering and packing numbers).
Consider a metric space consisting of a set S and a metric ρ: S × S → R₊.

(a) An ε-covering of S in the metric ρ is a collection {f¹, ..., f^N} ⊆ S such that for all f ∈ S, there exists some i ∈ {1,...,N} with ρ(f, f^i) ≤ ε. The ε-covering number N_ρ(ε) is the cardinality of the smallest ε-covering.

(b) An ε-packing of S in the metric ρ is a collection {f¹, ..., f^M} ⊆ S such that ρ(f^i, f^j) ≥ ε for all i ≠ j. The ε-packing number M_ρ(ε) is the cardinality of the largest ε-packing.

The covering and packing entropies (denoted by log N_ρ(ε) and log M_ρ(ε), respectively) are simply the logarithms of the covering and packing numbers. It can be shown that for any convex set, the quantities log N_ρ(ε) and log M_ρ(ε) are of the same order (within constant factors independent of ε).
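Definition 1 can be made concrete on a finite point set. The greedy sketch below (illustrative only, not part of the paper's constructions) builds a maximal ε-packing under the Euclidean metric; a maximal packing is automatically an ε-covering, which gives one direction of the covering/packing equivalence noted above.

```python
import numpy as np

def greedy_packing(points, eps):
    """Greedily build an eps-packing (Definition 1(b)) of a finite point set
    under the Euclidean metric: keep a point only if it lies at distance
    >= eps from every point kept so far.  By maximality, every discarded
    point is within eps of some kept point, so the result is also an
    eps-covering (Definition 1(a))."""
    kept = []
    for x in points:
        if all(np.linalg.norm(x - k) >= eps for k in kept):
            kept.append(x)
    return kept

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(500, 2))
pack = greedy_packing(cloud, eps=0.5)
```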

In this paper, we are interested in packing (and covering) subsets of the function class F_0(s) in the L²(P) metric, and so drop the subscript ρ from here onwards. En route to characterizing the metric entropy of F_0(s), we need to understand the metric entropy of the unit ball of our univariate function class H, namely the set B_H(1) := {h ∈ H | ‖h‖_H ≤ 1}. The metric entropy (both covering and packing) is known for many classes of functions. We provide some concrete examples here:

(i) Consider the class H = {h_β: R → R | h_β(x) = βx} of all univariate linear functions, with the norm ‖h_β‖_H = |β|. Then it is known [15] that the metric entropy of B_H(1) scales as log M(ε; H) ≍ log(1/ε).

(ii) Consider the class H = {h: [0,1] → [0,1] | |h(x) − h(y)| ≤ |x − y|} of all 1-Lipschitz functions on [0,1], with the norm ‖h‖_H = sup_{x∈[0,1]} |h(x)|. In this case, it is known [15] that the metric entropy scales as log M(ε; H) ≍ 1/ε. Compared to the previous example of linear models, note that the metric entropy grows much faster as ε → 0, indicating that the class of Lipschitz functions is much richer.

(iii) Consider the Sobolev spaces W^m for m ≥ 1, consisting of all functions that have m derivatives, with the m-th derivative bounded in L²(P) norm. In this case, it is known that log M(ε; H) ≍ ε^{−1/m} (e.g., [3]). Clearly, increasing the smoothness constraint m leads to smaller classes. Such Sobolev spaces are a particular class of functions whose packing/covering entropy grows at a rate polynomial in 1/ε.

In our analysis, we require that the metric entropy of B_H(1) satisfy the following technical condition:

Assumption 1. Using log M(ε; H) to denote the packing entropy of the unit ball B_H(1) in the L²(P) norm, assume that there exists some α ∈ (0,1) such that

lim_{ε→0} log M(αε; H) / log M(ε; H) > 1.

This condition ensures that log M(cε)/log M(ε) can be made arbitrarily small or large, uniformly over small ε, by changing c, so that a bound due to Yang and Barron [17] can be applied.
It is satisfied by most non-parametric classes, including (for instance) the Lipschitz and Sobolev classes defined in Examples (ii) and (iii) above. It may fail to hold for certain parametric classes, such as the set of linear functions considered in Example (i); however, an alternative technique can be used to derive bounds in the parametric case (see Corollary 2).

3 Main result and some consequences

In this section, we state our main result and then develop some of its consequences. We begin with a theorem covering the function class F_0(s) in which the univariate function class H has metric entropy satisfying Assumption 1. We then state a corollary for the special case of univariate classes H with metric entropy growing polynomially in 1/ε, and another corollary for the special case of sparse linear regression.

Consider the observation model (2), where the covariate vectors have i.i.d. elements X_j ∼ P, and the regression function satisfies f* ∈ F_0(s). Suppose that the univariate function class H underlying F_0(s) satisfies Assumption 1. Under these conditions, we have the following result:

Theorem 1. Given n i.i.d. samples from the sparse additive model (2), the minimax risk in squared L²(P) norm is lower bounded as

min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)} ≳ max( s log(p/s)/n, s ε_n²(H) ),    (6)

where, for a fixed constant c, the quantity ε_n(H) = ε_n > 0 is the largest positive number satisfying the inequality

n ε_n²/(2σ²) ≤ log M(c ε_n; H).    (7)

For the case where H has an entropy growing polynomially as ε → 0, say log M(ε; H) = Θ(ε^{−1/m}) for some m > 1/2, we can compute the rate for the s-dimensional estimation term explicitly.
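The critical inequality (7) can be solved numerically for a toy polynomial entropy. The sketch below (with σ = c = 1, an assumption made purely for illustration) finds the critical radius by bisection and checks that ε_n² scales as n^{−2m/(2m+1)}, as claimed for Sobolev-type classes.

```python
def critical_radius(n, m, sigma=1.0, c=1.0):
    """Solve n*eps^2 = 2*sigma^2 * log_M(c*eps) by bisection, where the
    packing entropy takes the toy polynomial form log_M(eps) = eps**(-1/m),
    mimicking condition (7) for an m-smooth Sobolev-type class."""
    def gap(e):
        # increasing in e: left side grows, the entropy term shrinks
        return n * e ** 2 - 2.0 * sigma ** 2 * (c * e) ** (-1.0 / m)
    lo, hi = 1e-12, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

m = 2
eps2_small = critical_radius(10_000, m) ** 2
eps2_large = critical_radius(100_000, m) ** 2
ratio = eps2_small / eps2_large   # ≈ 10**(2m/(2m+1)) = 10**0.8 ≈ 6.31
```

The tenfold increase in n shrinks ε_n² by the predicted factor 10^{2m/(2m+1)}, matching the n^{−2m/(2m+1)} rate.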

Corollary 1. For the sparse additive model (2) with a univariate function space H such that log M(ε; H) = Θ(ε^{−1/m}), we have

min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)} ≥ max( σ² s log(p/s)/(32 n), C s (σ²/n)^{2m/(2m+1)} ),    (8)

for some constant C > 0.

3.1 Some consequences

In this section, we discuss some consequences of our results.

Effect of smoothness: Focusing on Corollary 1, for spaces with m bounded derivatives (i.e., functions in the Sobolev space W^m), the univariate minimax rate is n^{−2m/(2m+1)} (for details, see e.g. Stone [13]). Clearly, faster rates are obtained for larger smoothness indices m, and as m → ∞, the rate approaches the parametric rate 1/n. Since we are estimating over an s-dimensional space (under the assumption of independence), we are effectively estimating s univariate functions, each lying within the function space H. Therefore the univariate rate is multiplied by s.

Smoothness versus sparsity: It is worth noting that, depending on the relative scalings of s, n and p and on the metric entropy of H, either the subset selection term or the s-dimensional estimation term may dominate the lower bound. In general, if log(p/s) = o(n ε_n²(H)), the s-dimensional estimation term dominates, and vice versa (at the boundary, either term determines the minimax rate). In the case of a univariate function class H with polynomial entropy as in Corollary 1, it can be seen that for n = o((log(p/s))^{2m+1}), the subset selection term dominates, while for n = Ω((log(p/s))^{2m+1}), the s-dimensional estimation term dominates.

Rates for linear models: Using an alternative proof technique (not the one used in this paper), it is possible [11] to derive the exact minimax rate for estimation in the sparse linear regression model, in which we observe

y^(i) = Σ_{j∈S} β*_j X^(i)_j + w^(i),  for i = 1, 2, ..., n.    (9)

Note that this is a special case of the general model (2) in which H corresponds to the class of univariate linear functions (see Example (i)).

Corollary 2. For the sparse linear regression model (9), the minimax rate scales as max( s log(p/s)/n, s/n ).
In this case, we see clearly that the subset selection term dominates whenever p ≫ s, meaning that the subset selection problem is always harder (in a statistical sense) than the s-dimensional estimation problem. As shown by Bickel et al. [1], the rate achieved by ℓ1-regularized methods is s log(p)/n under suitable conditions on the covariates X.

Upper bounds: To show that the lower bounds are tight, matching upper bounds need to be derived. Upper bounds (matching up to constant factors) can be derived via a classical information-theoretic approach (e.g., [5, 2]), which involves constructing an estimator based on a covering set and bounding the covering entropy of F_0(s). While this approach does not lead to an implementable algorithm, it is a simple theoretical device to demonstrate that the lower bounds are tight. We turn our focus to implementable algorithms in the next point.

Comparison to existing bounds: We now provide a brief comparison of the minimax lower bounds with upper bounds on rates achieved by existing implementable algorithms from past work [12, 7, 9]. Ravikumar et al. [12] propose a back-fitting algorithm to minimize the least-squares objective with a sparsity constraint on the function f. The rates derived by Koltchinskii and Yuan [7] do not match the lower bounds derived in Theorem 1. Further, it is difficult to directly compare the rates in Ravikumar et al. [12] and Meier et al. [9] with our minimax lower bounds, since their analysis does not explicitly track the sparsity index s. We are currently in the process of conducting a thorough comparison with the above-mentioned ℓ1-based methods.

4 Proof outline

In this section, we provide an outline of the proof of Theorem 1; due to space constraints, we defer some of the technical details to the full-length version. The proof is based on a combination of information-theoretic
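To see which regime applies for a particular (n, p, s), one can evaluate the two terms of the lower bound directly. The sketch below drops constants, and the parameter values are arbitrary illustrations.

```python
import math

def dominant_term(n, p, s, m):
    """Evaluate both terms of the lower bound max(s*log(p/s)/n, s*eps_n^2),
    with eps_n^2 = n**(-2m/(2m+1)) as in Corollary 1 (constants dropped),
    and report which one is the bottleneck."""
    subset = s * math.log(p / s) / n
    estimation = s * n ** (-2.0 * m / (2 * m + 1))
    return ("subset" if subset > estimation else "estimation"), subset, estimation

# With m = 2 and p/s fixed, the crossover occurs around n ~ (log(p/s))**(2m+1):
small_n = dominant_term(n=50, p=10**6, s=10, m=2)      # subset selection dominates
large_n = dominant_term(n=10**7, p=10**6, s=10, m=2)   # s-dim. estimation dominates
```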

techniques and the concepts of packing and covering entropy, as defined previously in Section 2.3. First, we provide a high-level overview of the proof. The basic idea is to carefully choose two subsets T_1 and T_2 of the function class F_0(s), and to lower bound the minimax rates over these two subsets. In Section 4.1, an application of the generalized Fano method (a technique based on Fano's inequality) to the set T_1 defined in equation (10) yields a lower bound on the subset selection term. In Section 4.2, we apply an alternative method for obtaining lower bounds over a second set T_2, defined in equation (11), that captures the difficulty of estimating the sum of s univariate functions. The second technique also exploits Fano's inequality, but uses a more refined upper bound on the mutual information developed by Yang and Barron [17].

Before proceeding, we first note that for any T ⊆ F_0(s), we have

min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)} ≥ min_{f̂} max_{f* ∈ T} E‖f̂ − f*‖²_{L²(P)}.

Moreover, for any subsets T_1, T_2 ⊆ F_0(s), we have

min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)} ≥ max( min_{f̂} max_{f* ∈ T_1} E‖f̂ − f*‖²_{L²(P)}, min_{f̂} max_{f* ∈ T_2} E‖f̂ − f*‖²_{L²(P)} ),

since the bound holds for each of the two terms. We apply this lower bound using the subsets T_1 and T_2 defined in equations (10) and (11).

4.1 Bounding the complexity of subset selection

For this part of the proof, we use the generalized Fano method [4], which we state below without proof. Given some parameter space, we let d be a metric on it.

Lemma 1 (Generalized Fano method). For a given integer r ≥ 2, consider a collection M_r = {P_1, ..., P_r} of r probability distributions such that d(θ(P_i), θ(P_j)) ≥ α_r for all i ≠ j, and such that the pairwise KL divergences satisfy D(P_i ‖ P_j) ≤ β_r for all i, j = 1, ..., r. Then the minimax risk over the family is lower bounded as

max_j E[ d(θ(P_j), θ̂) ] ≥ (α_r/2) ( 1 − (β_r + log 2)/log r ).

The proof of Lemma 1 involves applying Fano's inequality over the discrete set of parameters θ ∈ Θ indexed by the set of distributions M_r. We now construct the set T_1, which generates the set of probability distributions M_r. Let g be an arbitrary function in H such that

‖g‖_{L²(P)} = (σ/4) √( log(p/s)/n ).
The set T_1 is defined as

T_1 := { f | f(X_1, X_2, ..., X_p) = Σ_{j=1}^p c_j g(X_j),  c_j ∈ {−1, 0, 1},  ‖c‖_0 = s }.    (10)

T_1 may be viewed as a hypercube inside F_0(s), and it leads to the lower bound for the subset selection term. This hypercube construction is often used to prove lower bounds (see Yu [18]). Next, we require a further reduction of the set T_1 to a set A (defined in Lemma 2), to ensure that the elements of A are well-separated in L²(P) norm. The construction of A is as follows:

Lemma 2. There exists a subset A ⊆ T_1 such that:
(i) log |A| ≥ (1/2) s log(p/s),
(ii) ‖f − f̃‖²_{L²(P)} ≥ σ² s log(p/s)/(16 n) for all f, f̃ ∈ A, and
(iii) D(f ‖ f̃) ≤ (1/8) s log(p/s) for all f, f̃ ∈ A.

The proof uses a combinatorial argument to construct the set A; for an argument on how such a set is constructed, see Kühn [8]. For s log(p/s) ≥ 8 log 2, applying the generalized Fano method (Lemma 1) together with Lemma 2 yields the bound

min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)} ≥ min_{f̂} max_{f* ∈ A} E‖f̂ − f*‖²_{L²(P)} ≥ σ² s log(p/s)/(32 n).

This completes the proof for the subset selection term s log(p/s)/n in Theorem 1.
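The arithmetic behind combining Lemmas 1 and 2 can be checked numerically. The sketch below plugs the hypercube parameters from Lemma 2 into the generalized Fano bound; σ = 1 and the values of n, p, s are arbitrary illustrations.

```python
import math

def generalized_fano_bound(alpha_r, beta_r, r):
    """Lemma 1: for r hypotheses pairwise separated by alpha_r in the metric d,
    with pairwise KL divergences at most beta_r, the minimax risk is at least
    (alpha_r/2) * (1 - (beta_r + log 2)/log r)."""
    return 0.5 * alpha_r * (1.0 - (beta_r + math.log(2.0)) / math.log(r))

# Hypercube parameters from Lemma 2, with sigma = 1 and illustrative n, p, s:
n, p, s = 1000, 500, 10
log_r = 0.5 * s * math.log(p / s)            # log|A| >= (1/2) s log(p/s)
alpha = s * math.log(p / s) / (16.0 * n)     # separation in squared L2(P) norm
beta = 0.125 * s * math.log(p / s)           # pairwise KL upper bound
bound = generalized_fano_bound(alpha, beta, math.exp(log_r))
# bound is a positive constant multiple of s*log(p/s)/n, as in Section 4.1
```

Note that with these parameters the ratio (β_r + log 2)/log r is bounded away from 1, so the bound stays within a constant factor of the separation α_r.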

4.2 Bounding the complexity of s-dimensional estimation

Next, we derive a bound for the s-dimensional estimation term by determining a lower bound over T_2. Let S be an arbitrary subset of s integers from {1, 2, ..., p}, and define the set F_S as

T_2 := F_S := { f ∈ F | f(X) = Σ_{j∈S} h_j(X_j) }.    (11)

Clearly F_S ⊆ F_0(s), meaning that min_{f̂} max_{f* ∈ F_0(s)} E‖f̂ − f*‖²_{L²(P)} ≥ min_{f̂} max_{f* ∈ F_S} E‖f̂ − f*‖²_{L²(P)}.

We use a technique from Yang and Barron [17] to lower bound the minimax rate over F_S. The idea is to construct a maximal δ_n-packing set and a minimal ε_n-covering set for F_S, and then to apply Fano's inequality to a carefully chosen mixture distribution involving the covering and packing sets (see the full-length version for details). Following these steps yields the following result:

Lemma 3.

min_{f̂} max_{f* ∈ F_S} E‖f̂ − f*‖²_{L²(P)} ≥ (δ_n²/4) ( 1 − ( log N(ε_n; F_S) + n ε_n²/(2σ²) + log 2 ) / log M(δ_n; F_S) ).

We now have a bound involving the covering and packing entropies of the s-dimensional space F_S. The following lemma bounds log M(ε; F_S) and log N(ε; F_S) in terms of the univariate packing and covering entropies, respectively:

Lemma 4. Let H be a function space with a packing entropy log M(ε; H) that satisfies Assumption 1. Then we have the bounds

log M(ε; F_S) ≥ s log M(ε/√s; H),  and  log N(ε; F_S) ≤ s log N(ε/√s; H).

The proof involves constructing ε/√s-packing and covering sets in each of the s coordinates, and showing that these yield ε-packing and ε-covering sets of F_S, respectively. Combining Lemmas 3 and 4 leads to the inequality

min_{f̂} max_{f* ∈ F_S} E‖f̂ − f*‖²_{L²(P)} ≥ (δ_n²/4) ( 1 − ( s log N(ε_n/√s; H) + n ε_n²/(2σ²) + log 2 ) / ( s log M(δ_n/√s; H) ) ).    (12)

We now choose ε_n and δ_n to satisfy the following constraints:

n ε_n²/(2σ²) ≥ s log N(ε_n/√s; H),    (13a)
log M(δ_n/√s; H) ≥ 4 log N(ε_n/√s; H).    (13b)

Combining Assumption 1 with the well-known relations log M(2ε; H) ≤ log N(ε; H) ≤ log M(ε; H), we conclude that in order to satisfy inequalities (13a) and (13b), it suffices to choose ε_n = c δ_n for a constant c, and then to require that

s log M(c δ_n/√s; H) ≤ n δ_n²/(2σ²).

Furthermore, if we define δ̄_n := δ_n/√s, then this inequality can be re-expressed as

log M(c δ̄_n; H) ≤ n δ̄_n²/(2σ²).
For n ε_n²/(2σ²) ≥ 2 log 2, using inequalities (13a) and (13b) together with equation (12) yields the desired rate

min_{f̂} max_{f* ∈ F_S} E‖f̂ − f*‖²_{L²(P)} ≥ s δ̄_n²/16,

thereby completing the proof.

5 Discussion

In this paper, we have derived lower bounds on the minimax risk, in squared L²(P) error, for estimating sparse additive models based on the sum of univariate functions from a function class H. The rates show that the estimation problem effectively decomposes into a subset selection problem and an s-dimensional estimation

problem, and the harder of the two problems (in a statistical sense) determines the rate of convergence. More concretely, we demonstrated that the subset selection term scales as s log(p/s)/n, depending linearly on the number of components s and only logarithmically on the ambient dimension p. This subset selection term is independent of the univariate function space H. On the other hand, the s-dimensional estimation term depends on the richness of the univariate function class, as measured by its metric entropy; it scales linearly with s and is independent of p. Ongoing work suggests that our lower bounds are tight in many cases, meaning that the rates derived in Theorem 1 are minimax optimal for many function classes.

There are a number of ways in which this work can be extended. One implicit and strong assumption in our analysis was that the covariates X_j, j = 1, 2, ..., p, are independent. It would be interesting to investigate the case where the random variables are endowed with some correlation structure. One would expect the rates to change, particularly if many of the variables are collinear. It would also be interesting to develop a more complete understanding of whether computationally efficient algorithms [7, 12, 9] based on regularization achieve the lower bounds on the minimax rate derived in this paper.

References

[1] P. Bickel, Y. Ritov, and A. Tsybakov. Simultaneous analysis of the Lasso and Dantzig selector. Annals of Statistics, to appear.
[2] L. Birgé. Approximation dans les espaces métriques et théorie de l'estimation. Z. Wahrsch. verw. Gebiete, 65.
[3] M. S. Birman and M. Z. Solomjak. Piecewise-polynomial approximations of functions of the classes W^α_p. Math. USSR-Sbornik, 2(3).
[4] T. S. Han and S. Verdú. Generalizing the Fano inequality. IEEE Transactions on Information Theory, 40.
[5] R. Z. Has'minskii. A lower bound on the risks of nonparametric estimates of densities in the uniform metric. Theory Prob. Appl., 23.
[6] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, Boca Raton.
[7] V. Koltchinskii and M. Yuan.
Sparse recovery in large ensembles of kernel machines. In Proceedings of COLT.
[8] T. Kühn. A lower estimate for entropy numbers. Journal of Approximation Theory, 110.
[9] L. Meier, S. van de Geer, and P. Bühlmann. High-dimensional additive modeling. Annals of Statistics, to appear.
[10] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1).
[11] G. Raskutti, M. J. Wainwright, and B. Yu. Minimax rates of estimation for high-dimensional linear regression over ℓ_q-balls. Technical report, arXiv preprint, UC Berkeley, Department of Statistics.
[12] P. Ravikumar, H. Liu, J. Lafferty, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society, to appear.
[13] C. J. Stone. Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10.
[14] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1).
[15] S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press.
[16] M. J. Wainwright. Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Info. Theory. Presented at the International Symposium on Information Theory.
[17] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5).
[18] B. Yu. Assouad, Fano and Le Cam. In Research Papers in Probability and Statistics: Festschrift in Honor of Lucien Le Cam.
[19] C. H. Zhang and J. Huang. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Annals of Statistics, 36.


More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

Lecture 12: February 28

Lecture 12: February 28 10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Study the bias (due to the nite dimensional approximation) and variance of the estimators

Study the bias (due to the nite dimensional approximation) and variance of the estimators 2 Series Methods 2. Geeral Approach A model has parameters (; ) where is ite-dimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

A Risk Comparison of Ordinary Least Squares vs Ridge Regression Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer

More information

Minimax rates of estimation for high-dimensional linear regression over l q -balls

Minimax rates of estimation for high-dimensional linear regression over l q -balls TO APPEAR IN IEEE TRANS. OF INFORMATION THEORY Miimax rates of estimatio for high-dimesioal liear regressio over l -balls Garvesh Raskutti, Marti J. Waiwright, Seior Member, IEEE ad Bi Yu, Fellow, IEEE.

More information

Agnostic Learning and Concentration Inequalities

Agnostic Learning and Concentration Inequalities ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers A uified framework for high-dimesioal aalysis of M-estimators with decomposable regularizers Sahad Negahba Departmet of EECS UC Berkeley sahad @eecs.berkeley.edu Marti J. Waiwright Departmet of Statistics

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Entropy Rates and Asymptotic Equipartition

Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

Lecture #20. n ( x p i )1/p = max

Lecture #20. n ( x p i )1/p = max COMPSCI 632: Approximatio Algorithms November 8, 2017 Lecturer: Debmalya Paigrahi Lecture #20 Scribe: Yua Deg 1 Overview Today, we cotiue to discuss about metric embeddigs techique. Specifically, we apply

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

5.1 Review of Singular Value Decomposition (SVD)

5.1 Review of Singular Value Decomposition (SVD) MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

Lecture Notes for Analysis Class

Lecture Notes for Analysis Class Lecture Notes for Aalysis Class Topological Spaces A topology for a set X is a collectio T of subsets of X such that: (a) X ad the empty set are i T (b) Uios of elemets of T are i T (c) Fiite itersectios

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Lecture 9: Expanders Part 2, Extractors

Lecture 9: Expanders Part 2, Extractors Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming Joural of Machie Learig Research 3 (202) 389-427 Submitted 8/0; Revised 2/; Published 2/2 Miimax-Optimal Rates For Sparse Additive Models Over Kerel Classes Via Covex Programmig Garvesh Raskutti Marti

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric

More information

ON POINTWISE BINOMIAL APPROXIMATION

ON POINTWISE BINOMIAL APPROXIMATION Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

Chapter 6 Infinite Series

Chapter 6 Infinite Series Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Lecture 11: Decision Trees

Lecture 11: Decision Trees ECE9 Sprig 7 Statistical Learig Theory Istructor: R. Nowak Lecture : Decisio Trees Miimum Complexity Pealized Fuctio Recall the basic results of the last lectures: let X ad Y deote the iput ad output spaces

More information

Approximation by Superpositions of a Sigmoidal Function

Approximation by Superpositions of a Sigmoidal Function Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 22 (2003, No. 2, 463 470 Approximatio by Superpositios of a Sigmoidal Fuctio G. Lewicki ad G. Mario Abstract. We geeralize

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Lecture 24: Variable selection in linear models

Lecture 24: Variable selection in linear models Lecture 24: Variable selectio i liear models Cosider liear model X = Z β + ε, β R p ad Varε = σ 2 I. Like the LSE, the ridge regressio estimator does ot give 0 estimate to a compoet of β eve if that compoet

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Random assignment with integer costs

Random assignment with integer costs Radom assigmet with iteger costs Robert Parviaie Departmet of Mathematics, Uppsala Uiversity P.O. Box 480, SE-7506 Uppsala, Swede robert.parviaie@math.uu.se Jue 4, 200 Abstract The radom assigmet problem

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information

arxiv: v1 [math.st] 30 Nov 2017

arxiv: v1 [math.st] 30 Nov 2017 Phase Trasitios i Approximate Rakig Chao Gao arxiv:1711.11189v1 [math.st] 30 Nov 017 Uiversity of Chicago chaogao@galto.uchicago.edu December 1, 017 Abstract We study the problem of approximate rakig from

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

Estimation of the essential supremum of a regression function

Estimation of the essential supremum of a regression function Estimatio of the essetial supremum of a regressio fuctio Michael ohler, Adam rzyżak 2, ad Harro Walk 3 Fachbereich Mathematik, Techische Uiversität Darmstadt, Schlossgartestr. 7, 64289 Darmstadt, Germay,

More information

Modeling and Estimation of a Bivariate Pareto Distribution using the Principle of Maximum Entropy

Modeling and Estimation of a Bivariate Pareto Distribution using the Principle of Maximum Entropy Sri Laka Joural of Applied Statistics, Vol (5-3) Modelig ad Estimatio of a Bivariate Pareto Distributio usig the Priciple of Maximum Etropy Jagathath Krisha K.M. * Ecoomics Research Divisio, CSIR-Cetral

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information