Estimation of a regression function by maxima of minima of linear functions


Estimation of a regression function by maxima of minima of linear functions

Adil M. Bagirov 1, Conny Clausen 2 and Michael Kohler 3

1 School of Information Technology and Mathematical Sciences, University of Ballarat, PO Box 663, Ballarat, Victoria 3353, Australia, a.bagirov@ballarat.edu.au
2 Department of Mathematics, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany, clausen@math.uni-sb.de
3 Department of Mathematics, Technische Universität Darmstadt, Schloßgartenstr. 7, D-64289 Darmstadt, Germany, kohler@mathematik.tu-darmstadt.de

Abstract. Estimation of a regression function from independent and identically distributed random variables is considered. Estimates are defined by minimization of the empirical L2 risk over a class of functions, which are defined as maxima of minima of linear functions. Results concerning the rate of convergence of the estimates are derived. In particular it is shown that for smooth regression functions satisfying the assumption of single index models, the estimate is able to achieve (up to some logarithmic factor) the corresponding optimal one-dimensional rate of convergence. Hence under these assumptions the estimate is able to circumvent the so-called curse of dimensionality. The small sample behaviour of the estimates is illustrated by applying them to simulated data.

Key words and phrases: Adaptation, dimension reduction, nonparametric regression,

Running title: Estimation of a regression function

Please send correspondence and proofs to: Conny Clausen, Department of Mathematics, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany, clausen@math.uni-sb.de

rate of convergence, single index model, L2 error.

MSC 2000 subject classifications: Primary 62G08; secondary 62G05.

1 Introduction

1.1 Scope of this paper. This paper considers the problem of estimating a multivariate regression function given a sample of the underlying distribution. In applications usually no a priori information about the regression function is known, therefore it is necessary to apply nonparametric methods for this estimation problem. There are several established methods for nonparametric regression, including regression trees like CART (cf., Breiman et al. (1984)), adaptive spline fitting like MARS (cf., Friedman (1991)) and least squares neural network estimates (cf., e.g., Chapter 11 in Hastie, Tibshirani and Friedman (2001)). All these methods minimize a kind of least squares risk of the regression estimate, either heuristically over a fixed and very complex function space (as for neural networks) or over a stepwise defined data dependent space of piecewise constant functions or piecewise polynomials (as for CART or MARS). In this paper we consider a rather complex function space consisting of maxima of minima of linear functions, over which we minimize a least squares risk. Since each maximum of minima of linear functions is in fact a continuous piecewise linear function, we fit a linear spline function with free knots to the data. But in contrast to MARS, we do not need heuristics to choose these free knots; instead we use advanced methods of optimization theory for nonlinear and nonconvex functions to compute our estimate approximately in an application.

1.2 Regression estimation. In regression analysis an R^d × R-valued random vector (X, Y) with E{Y²} < ∞ is considered and the dependency of Y on the value of X is of interest. More precisely, the goal is to find a function f : R^d → R such that f(X) is a good approximation of Y. In the sequel we assume that the main

aim of the analysis is minimization of the mean squared prediction error or L2 risk

E{ |f(X) - Y|² }.

In this case the optimal function is the so-called regression function m : R^d → R, m(x) = E{Y | X = x}. Indeed, let f : R^d → R be an arbitrary measurable function and denote the distribution of X by µ. The well-known relation

E{ |f(X) - Y|² } = E{ |m(X) - Y|² } + ∫ |f(x) - m(x)|² µ(dx)

(cf., e.g., Györfi et al. (2002), eq. (1.1)) implies that the regression function is the optimal predictor in view of minimization of the L2 risk:

E{ |m(X) - Y|² } = min_{f : R^d → R} E{ |f(X) - Y|² }.

In addition, any function f is a good predictor in the sense that its L2 risk is close to the optimal value if and only if the so-called L2 error

∫ |f(x) - m(x)|² µ(dx)

is small. This motivates measuring the error caused by using a function f instead of the regression function by the L2 error. In applications, usually the distribution of (X, Y) (and hence also the regression function) is unknown. But often it is possible to observe a sample of the underlying distribution. This leads to the regression estimation problem. Here (X, Y), (X_1, Y_1), (X_2, Y_2), ... are independent and identically distributed random vectors. The set of data

D_n = { (X_1, Y_1), ..., (X_n, Y_n) }

is given, and the goal is to construct an estimate

m_n = m_n(·, D_n) : R^d → R

of the regression function such that the L2 error

∫ |m_n(x) - m(x)|² µ(dx)

is small. For a detailed introduction to nonparametric regression we refer the reader to the monograph Györfi et al. (2002).

1.3 Definition of the estimate. In the sequel we will use the principle of least squares to fit maxima of minima of linear functions to the data. More precisely, let K_n ∈ N and L_{1,n}, ..., L_{K_n,n} ∈ N be parameters of the estimate and set

F_n = { f : R^d → R : f(x) = max_{k=1,...,K_n} min_{l=1,...,L_{k,n}} ( a_{k,l} · x + b_{k,l} ) (x ∈ R^d) for some a_{k,l} ∈ R^d, b_{k,l} ∈ R },

where a_{k,l} · x = a_{k,l}^{(1)}·x^{(1)} + ... + a_{k,l}^{(d)}·x^{(d)} denotes the scalar product between a_{k,l} = (a_{k,l}^{(1)}, ..., a_{k,l}^{(d)})^T and x = (x^{(1)}, ..., x^{(d)})^T. For this class of functions the estimate m̃_n is defined by

m̃_n = arg min_{f ∈ F_n} (1/n) Σ_{i=1}^n |f(X_i) - Y_i|².  (2)

Here we assume that the minimum exists, however we do not require that it is unique. In Section 2 we will analyze the rate of convergence of a truncated version of this least squares estimate defined by

m_n = T_{β_n} m̃_n, where T_β(z) = β if z > β, T_β(z) = z if -β ≤ z ≤ β, T_β(z) = -β if z < -β,

for some β_n ∈ R_+.

1.4 Main results. Under a sub-Gaussian condition on the distribution of Y and for a boundedly supported distribution of X we show that the L2 error of the estimate achieves, for (p, C)-smooth regression functions with p ≤ 2 (where, roughly speaking, all partial derivatives of the regression function up to order p exist), the corresponding optimal rate of convergence

n^{-2p/(2p+d)}

up to some logarithmic factor. For single index models, where the regression function m satisfies in addition

m(x) = m̄(β^T x) (x ∈ R^d)

for some univariate function m̄ and some vector β ∈ R^d, we show furthermore that our estimate achieves (up to some logarithmic factor) the one-dimensional rate of convergence

n^{-2p/(2p+1)}.

Hence under these assumptions the estimate is able to circumvent the curse of dimensionality.

1.5 Discussion of related results. In multivariate nonparametric regression function estimation there is a gap between theory and practice: the established estimates like CART, MARS or least squares neural networks are based on several heuristics for computing the estimates, which makes it basically impossible to analyze their rate of convergence theoretically. However, if one defines them without these heuristics, their rate of convergence can be analyzed (this has been done for neural networks, e.g., in Barron (1993, 1994) and for CART in Kohler (1999)), but in this form the estimates cannot be computed in an application. For our estimate, a similar phenomenon occurs, since we need heuristics to compute it approximately in an application. The difference to the above established estimates is that we use heuristics from advanced optimization theory, in particular from the optimization theory of nonlinear and nonconvex functions (cf., e.g., Bagirov (1999, 2002) and Bagirov and Ugon (2006)), instead of complicated heuristics from statistics for stepwise computation (as for CART or MARS), or a simple gradient descent (as for least squares neural networks). It follows from Stone (1982) that the rates of convergence which we derive are optimal in some minimax sense up to a logarithmic factor. The idea of imposing additional restrictions on the structure of the regression function (like additivity or the assumption in the single index model) and of deriving under these assumptions better rates of convergence is due to Stone (1985, 1994).
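To make the gain from the single index assumption concrete, the following small sketch (ours, purely illustrative; the function names are made up) compares the two rates stated above, ignoring logarithmic factors.

```python
# Illustration (not from the paper): the full d-dimensional minimax rate
# n^(-2p/(2p+d)) versus the one-dimensional rate n^(-2p/(2p+1)) that is
# recovered under the single index assumption.

def rate_full(n, p, d):
    """Optimal L2-error rate for (p, C)-smooth regression in dimension d."""
    return n ** (-2.0 * p / (2.0 * p + d))

def rate_single_index(n, p):
    """One-dimensional rate achieved under the single index model."""
    return n ** (-2.0 * p / (2.0 * p + 1.0))

# For p = 2, d = 10 and n = 10**6 the single-index rate is several orders
# of magnitude smaller, which is the point of the result above.
```

For d = 1 the two expressions coincide; as d grows, `rate_full` degrades while `rate_single_index` does not, which is the curse of dimensionality in numbers.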

We use a theorem of Lee, Bartlett and Williamson (1996) to derive our rate of convergence results. This approach is described in detail in Section 11.3 of Györfi et al. (2002). Below we extend this approach to unbounded data which satisfies a sub-Gaussian condition by introducing new truncation arguments. In this way we are able to derive the results under similarly general assumptions on the distribution of Y as with alternative methods from empirical process theory, see, e.g., the monograph van de Geer (2000) or Kohler (2000, 2006). Maxima of minima of linear functions have been used in regression estimation already in Beliakov and Kohler. There least squares estimates are derived by minimizing the empirical L2 risk over classes of functions consisting of Lipschitz smooth functions where a bound on the Lipschitz constant is given. It is shown that the resulting estimate is in fact a maximum of minima of linear functions, where the number of minima occurring in the maximum is equal to the sample size. Additional restrictions (e.g., on the linear functions in the minima) ensure that there will be no overfitting. In contrast, the number of linear functions which we consider in this article is much smaller, and restrictions on these linear functions are therefore not necessary. This seems to be promising, because we do not fit too many parameters to the data. In Corollary 2 we show that even for large dimension of X the L2 error of our estimate converges to zero quickly if the regression function satisfies the structural assumption of single index models. Similar results are shown in Section 22.2 of Györfi et al. (2002). However, in contrast to the estimate defined there, our newly proposed estimate can be computed in an application, which we will demonstrate in Section 3. So the main result here is to derive this good rate of convergence for an estimate which can be computed in an application.

1.6 Notations. The sets of natural numbers, natural numbers including zero, real numbers and non-negative real numbers are denoted by N, N_0, R and R_+, respectively.
For vectors x ∈ R^d we denote by ||x|| the Euclidean norm of x and by x · y the scalar product between x and y. The least integer greater than or equal to a real number

x will be denoted by ⌈x⌉. log(x) denotes the natural logarithm of x > 0. For a function f : R^d → R,

||f||_∞ = sup_{x ∈ R^d} |f(x)|

denotes the supremum norm.

1.7 Outline of the paper. The main theoretical result is formulated in Section 2 and proven in Section 4. In Section 3 the estimate is illustrated by applying it to simulated data.

2 Analysis of the rate of convergence of the estimate

Our first theorem gives an upper bound for the expected L2 error of our estimate.

Theorem 1. Let K_n, L_{1,n}, ..., L_{K_n,n} ∈ N with K_n · max{L_{1,n}, ..., L_{K_n,n}} ≤ n², and set β_n = c_1 · log(n) for some constant c_1 > 0. Assume that the distribution of (X, Y) satisfies

E{ e^{c_2 · Y²} } < ∞  (3)

for some constant c_2 > 0 and that the regression function m is bounded in absolute value. Then for the estimate m_n defined as in Subsection 1.3,

E ∫ |m_n(x) - m(x)|² µ(dx) ≤ c_3 · (log n)³ · ( Σ_{k=1}^{K_n} L_{k,n} ) / n + 2 · E{ inf_{f ∈ F_n} (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² }  (4)

for some constant c_3 > 0, and hence also

E ∫ |m_n(x) - m(x)|² µ(dx) ≤ c_3 · (log n)³ · ( Σ_{k=1}^{K_n} L_{k,n} ) / n + 2 · inf_{f ∈ F_n} ∫ |f(x) - m(x)|² µ(dx),

where c_3 does not depend on n, β_n or the parameters of the estimate.
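As a concrete illustration of the objects appearing above (our own sketch, not the authors' Fortran implementation; the nested-list parameter layout is an assumption of the example), the max-min functions of Subsection 1.3, the truncation operator T_β and the empirical L2 risk from (2) can be written out as follows.

```python
# Sketch of the estimate's building blocks (illustration only).

def maxmin_eval(x, a, b):
    """Evaluate f(x) = max_k min_l (a[k][l] . x + b[k][l]),
    where a[k][l] is a list of d coefficients and b[k][l] a scalar."""
    return max(
        min(sum(al[j] * x[j] for j in range(len(x))) + bl
            for al, bl in zip(ak, bk))
        for ak, bk in zip(a, b)
    )

def truncate(z, beta):
    """The truncation operator T_beta from Subsection 1.3."""
    return max(-beta, min(beta, z))

def empirical_l2_risk(f, data):
    """(1/n) * sum_i |f(X_i) - Y_i|^2 for data = [(x_i, y_i), ...]."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

# With K = 2 maxima and one linear function in each minimum,
# f(x) = max(x, -x) = |x| on the real line:
a = [[[1.0]], [[-1.0]]]
b = [[0.0], [0.0]]
```

For example, `maxmin_eval([3.0], a, b)` evaluates the piecewise linear function |x| at 3.0, and truncating the result with beta = 2 caps it at 2.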

The condition (3) is a modified sub-Gaussian condition and it is in particular satisfied if P_{Y|X=x} is the normal distribution N(m(x), σ²) and the regression function m is bounded. This condition allows us to consider an unbounded conditional distribution of Y. Together with an approximation result this theorem implies the next corollary, which considers the rate of convergence of the estimate. Here it is necessary to impose smoothness conditions on the regression function.

Definition 1. Let p = k + β for some k ∈ N_0 and 0 < β ≤ 1, and let C > 0. A function m : [a, b]^d → R is called (p, C)-smooth if for every α = (α_1, ..., α_d), α_i ∈ N_0, Σ_{j=1}^d α_j = k, the partial derivative ∂^k m / (∂x^{(1)})^{α_1} ... (∂x^{(d)})^{α_d} exists and satisfies

| ∂^k m / (∂x^{(1)})^{α_1} ... (∂x^{(d)})^{α_d} (x) - ∂^k m / (∂x^{(1)})^{α_1} ... (∂x^{(d)})^{α_d} (z) | ≤ C · ||x - z||^β

for all x, z ∈ [a, b]^d.

Corollary 1. Assume that the distribution of (X, Y) satisfies X ∈ [a, b]^d a.s. for some a, b ∈ R, that the modified sub-Gaussian condition E{exp(c_2 · Y²)} < ∞ is fulfilled for some constant c_2 > 0, and that m is (p, C)-smooth for some 0 < p ≤ 2 and C > 1. Set β_n = c_1 · log(n) for some c_1 > 0,

K_n = ⌈ C^{2d/(2p+d)} · ( n / (log n)³ )^{d/(2p+d)} ⌉ and L_{k,n} = L_k = 2d + 1 (k = 1, ..., K_n).

Then we have for the estimate m_n defined as above

E ∫ |m_n(x) - m(x)|² µ(dx) ≤ const · C^{2d/(2p+d)} · d · ( (log n)³ / n )^{2p/(2p+d)}.

The above rate of convergence converges slowly to zero in case of large dimension d of the predictor variable X (so-called curse of dimensionality). Next we present a result which shows that under structural assumptions on the regression function

(more precisely, for single index models) our estimate is able to circumvent the curse of dimensionality.

Corollary 2. Assume that the distribution of (X, Y) satisfies X ∈ [a, b]^d a.s. for some a, b ∈ R and that the modified sub-Gaussian condition E{exp(c_2 · Y²)} < ∞ is fulfilled for some constant c_2 > 0. Furthermore assume that the regression function m satisfies

m(x) = m̄(α · x) (x ∈ R^d)

for some (p, C)-smooth function m̄ : R → R and some α ∈ R^d. Then for the estimate m_n as above and with the setting β_n = c_1 · log(n) for some c_1 > 0,

K_n = ⌈ C^{2/(2p+1)} · ( n / (log n)³ )^{1/(2p+1)} ⌉ and L_{k,n} = L_k = 3 (k = 1, ..., K_n),

we get

E ∫ |m_n(x) - m(x)|² µ(dx) ≤ const · C^{2/(2p+1)} · ( (log n)³ / n )^{2p/(2p+1)}.

Remark 1. It follows from Stone (1982) that under the conditions of Corollary 1 no estimate can achieve (in some minimax sense) a rate of convergence which converges faster to zero than n^{-2p/(2p+d)} (cf., e.g., Chapter 3 in Györfi et al. (2002)). Hence Corollary 1 implies that our estimate has an optimal rate of convergence up to the logarithmic factor.

Remark 2. In an application the smoothness of the regression function, measured by (p, C), is not known in advance and hence the parameters of the estimate have to be chosen data-dependent. This can be done, e.g., by splitting of the sample, where the estimate is computed for various values of the parameters on a learning sample (consisting, e.g., of the first half of the data points) and the parameters are chosen

such that the empirical L2 risk on a testing sample (consisting, e.g., of the second half of the data points) is minimized (cf., e.g., Chapter 7 in Györfi et al. (2002)). Theoretical results concerning splitting of the sample can be found in Hamers and Kohler (2003) and Chapter 7 in Györfi et al. (2002).

3 Application to simulated data

In our applications we choose the number of linear functions considered in the maxima and the minima in a data-dependent way by splitting of the sample. We split the sample of size n into a learning sample of size n_l < n and a testing sample of size n_t = n - n_l. We use the learning sample to define, for a fixed number K of linear functions, an estimate m_{n_l,K}, and compute the empirical L2 risk of this estimate on the testing sample. Since the testing sample is independent of the learning sample, this gives us an unbiased estimate of the L2 risk of m_{n_l,K}. Then we choose K by minimizing this estimate with respect to K. In the sequel we use n ∈ {500, 3000} and n_t = n_l = n/2.

To compute the estimate for given numbers of linear functions we have to minimize

(1/n) Σ_{i=1}^n ( max_{k=1,...,K} min_{l=1,...,L_k} ( a_{k,l} · x_i + b_{k,l} ) - y_i )²

for given fixed x_1, ..., x_n ∈ R^d, y_1, ..., y_n ∈ R with respect to a_{k,l} ∈ R^d, b_{k,l} ∈ R (k = 1, ..., K, l = 1, ..., L_k). Unfortunately, we cannot solve this minimization problem exactly in general. The reason is that the function to be minimized is nonsmooth and nonconvex. Depending on K and L_k it may have a large number of variables (more than a hundred even in the case of univariate data). The function has many local minima and their number increases drastically as the number of maxima and minima functions increases. Most of the local minimizers do not provide a good approximation to the data and

therefore one is interested in finding either a global minimizer or a minimizer which is near to a global one. Conventional methods of global optimization are not effective for minimizing such functions, since they are very time consuming and cannot solve this problem in a reasonable time. Furthermore, the function to be minimized is a very complicated nonsmooth function and the calculation of even only one subgradient of such a function is a difficult task. Therefore subgradient-based methods of nonsmooth optimization are not effective here. Even though we cannot solve this minimization problem exactly, we are able to compute the estimate approximately. For this we use the following properties of the function to be minimized: it is a semismooth function (cf., Mifflin (1977)); moreover it is a smooth composition of so-called quasidifferentiable functions (see Demyanov and Rubinov (1995) for the definition of quasidifferentiable functions). Therefore we can use the discrete gradient method from Bagirov (2002) to solve it. Furthermore, it is piecewise partially separable (see Bagirov and Ugon (2006) for the definition of such functions). We use the version of the discrete gradient method described in Bagirov and Ugon (2006) for minimizing piecewise partially separable functions to solve it. The discrete gradient method is a derivative-free method and it is especially effective for minimization of nonsmooth and nonconvex functions when the subgradient is not available or it is difficult to calculate the subgradient. A detailed description of the algorithm used to compute the estimate is given in Bagirov, Clausen and Kohler (2007). An implementation of the estimate in Fortran is available from the authors by request. In Bagirov, Clausen and Kohler (2007) the estimate is also compared to various other nonparametric regression estimates. In the sequel we will only illustrate it by applying it to a few simulated data sets. Here we define (X, Y) by

Y = m(X) + σ · ε,

where X is uniformly distributed on [-2, 2]^d, ε is standard normally distributed and independent of X, and σ ≥ 0. In Figures 1 to 4 we choose d = 1 and σ = 1, and use

four different univariate regression functions in order to define four different data sets of size n = 500. Each figure shows the true regression function together with its formula, a corresponding sample of size n = 500 and our estimate applied to this sample.

[Figure 1: Simulation with the first univariate regression function; m(x) = 2*x^3 - 4*x, n = 500, sigma = 1.]

Here the first two examples show how the max-min estimate looks for rather simple regression functions, while in the third and fourth example the regression function has some local irregularity. Here it can be seen that our newly proposed estimate is able to adapt locally to such irregularities in the regression function. Next we consider the case d = 2. In our fifth example we choose

m(x_1, x_2) = x_1 · sin(x_1²) - x_2 · sin(x_2²),

n = 5000 and σ = 0.2. Figure 5 shows the regression function and our estimate applied to a corresponding data set of sample size n = 5000.

[Figure 2: Simulation with the second univariate regression function; m(x) = 4*|x|*sin(x*pi/2), n = 500, sigma = 1.]

In our sixth example we choose

m(x_1, x_2) = x_1 · x_2²,

and again n = 5000 and σ = 0.2. Figure 6 shows the regression function and our estimate applied to a corresponding data set of sample size n = 5000. In our seventh and final example we choose

m(x_1, x_2) = 6 - 2 · min(3, 4 · x_1 · x_2),

and again n = 5000 and σ = 0.2. Figure 7 shows the regression function and our estimate applied to a corresponding data set of sample size n = 5000. From the last simulation we see again that our estimate is able to adapt to the local behaviour of the regression function.
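The simulation setup above and the data-driven choice of K by splitting of the sample can be sketched as follows (our illustration, not the authors' code); `fit_maxmin` is a hypothetical placeholder for the nonconvex least squares fit, which the paper carries out with the discrete gradient method.

```python
import random

def simulate(m, n, d, sigma, rng=random):
    """Draw a sample as in Section 3: X uniform on [-2, 2]^d,
    Y = m(X) + sigma * eps with standard normal eps independent of X."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-2.0, 2.0) for _ in range(d)]
        data.append((x, m(x) + sigma * rng.gauss(0.0, 1.0)))
    return data

def choose_K(data, candidates, fit_maxmin):
    """Split the sample in half, fit an estimate on the learning half for
    each candidate K, and pick the K minimizing the empirical L2 risk on
    the testing half. fit_maxmin(learn, K) must return a callable estimate."""
    n_l = len(data) // 2
    learn, test = data[:n_l], data[n_l:]
    best = None
    for K in candidates:
        f = fit_maxmin(learn, K)
        risk = sum((f(x) - y) ** 2 for x, y in test) / len(test)
        if best is None or risk < best[1]:
            best = (K, risk, f)
    return best[0], best[2]
```

Since the testing half is independent of the learning half, the risk computed inside the loop is an unbiased estimate of the L2 risk of each fitted candidate, which is exactly the selection rule described above.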

[Figure 3: Simulation with the third univariate regression function; m(x) = 8 if x > 0, m(x) = 0 if x <= 0, n = 500, sigma = 1.]

4 Proofs

In the proofs we need the notion of covering numbers.

Definition 2. Let x_1, ..., x_n ∈ R^d and set x_1^n = (x_1, ..., x_n). Let F be a set of functions f : R^d → R. An L_p-ε-cover of F on x_1^n is a finite set of functions f_1, ..., f_k : R^d → R with the property

min_{1 ≤ j ≤ k} ( (1/n) Σ_{i=1}^n |f(x_i) - f_j(x_i)|^p )^{1/p} < ε for all f ∈ F.  (5)

The L_p-ε-covering number N_p(ε, F, x_1^n) of F on x_1^n is the minimal size of an L_p-ε-cover of F on x_1^n. In case that there exists no finite L_p-ε-cover of F, the L_p-ε-covering number of F on x_1^n is defined by N_p(ε, F, x_1^n) = ∞. To get bounds for covering numbers of sets of maxima of minima of linear functions we first show the connection between the L_p-ε-covering numbers of sets

[Figure 4: Simulation with the fourth univariate regression function; m(x) = 8*x - 2*x^2/(0.5 + x^2), n = 500, sigma = 1.]

G_1, G_2, ..., G_l and the L_p-ε-covering number of their maximum

max{G_1, ..., G_l} = { f : R^d → R : f(x) = max{g_1(x), ..., g_l(x)} for some g_1 ∈ G_1, ..., g_l ∈ G_l }

(and of their minimum, defined analogously), respectively.

Lemma 1. Let G_1, G_2, ..., G_l be l sets of functions from R^d to R and let x_1^n = (x_1, ..., x_n) ∈ R^d × ... × R^d be fixed points in R^d. Then

N_p( ε, max{G_1, ..., G_l}, x_1^n ) ≤ Π_{i=1}^l N_p( ε / l^{1/p}, G_i, x_1^n )  (6)

and

N_p( ε, min{G_1, ..., G_l}, x_1^n ) ≤ Π_{i=1}^l N_p( ε / l^{1/p}, G_i, x_1^n ).  (7)

[Figure 5: The bivariate regression function together with our max-min estimate in the fifth example.]

Proof. Inequality (6) follows from

( (1/n) Σ_{k=1}^n | max_{i=1,...,l} g_i(x_k) - max_{i=1,...,l} g^i_{j_i}(x_k) |^p )^{1/p}
≤ ( (1/n) Σ_{k=1}^n max_{i=1,...,l} |g_i(x_k) - g^i_{j_i}(x_k)|^p )^{1/p}
≤ ( (1/n) Σ_{k=1}^n Σ_{i=1}^l |g_i(x_k) - g^i_{j_i}(x_k)|^p )^{1/p}
≤ l^{1/p} · max_{i=1,...,l} ( (1/n) Σ_{k=1}^n |g_i(x_k) - g^i_{j_i}(x_k)|^p )^{1/p}.

Inequality (7) follows directly from (6) with

min{G_1, ..., G_l} = - max{ -G_1, ..., -G_l }.

[Figure 6: The bivariate regression function together with our max-min estimate in the sixth example.]

In the next lemma we bound the L_p-ε-covering number of a truncated version of our class F_n of functions.

Lemma 2. Let x_1^n ∈ R^d × ... × R^d and set L := max{L_{1,n}, ..., L_{K_n,n}}. Then for 0 < ε < β_n/2,

N_1( ε, T_{β_n} F_n, x_1^n ) ≤ 3 · ( (6e·β_n/ε) · K_n · L )^{(2d+2) · Σ_{k=1}^{K_n} L_{k,n}}.

Proof. In the first step of the proof we show that we can involve the truncation operator into the class of functions, i.e., we show

[Figure 7: The bivariate regression function together with our max-min estimate in the seventh example.]

T_{β_n} F_n = { f : R^d → R : f(x) = max_{1≤k≤K_n} min_{1≤l≤L_{k,n}} T_{β_n}( a_{k,l} · x + b_{k,l} ) for some a_{k,l} ∈ R^d, b_{k,l} ∈ R }.  (8)

To begin with, we observe that by monotonicity of the mapping x → T_β(x) the equality

T_β( max_i z_i ) = max_i T_β(z_i)  (9)

holds for real numbers z_i ∈ R (i = 1, ..., n). With min_i z_i = - max_i (-z_i) and T_β(-z) = -T_β(z) we also get

T_β( min_i z_i ) = min_i T_β(z_i),

which implies (8). Set

G = { g : R^d → R : g(x) = a · x + b for some a ∈ R^d, b ∈ R }.

From Theorem 9.4, Theorem 9.5 and inequality (10.23) in Györfi et al. (2002) we get

N_1( ε, T_β G, x_1^n ) ≤ 3 · ( (4e·β/ε) · log(6e·β/ε) )^{2d+2}.

By applying Lemma 1 we get the desired result.

With this bound on the covering number of T_{β_n} F_n we can now start with the proof of Theorem 1.

Proof of Theorem 1. In the proof we use the following error decomposition:

∫ |m_n(x) - m(x)|² µ(dx)
= E{ |m_n(X) - Y|² | D_n } - E{ |m(X) - Y|² }
= [ E{ |m_n(X) - Y|² | D_n } - E{ |m_n(X) - T_{β_n}Y|² | D_n } - ( E{ |m(X) - Y|² } - E{ |m_{β_n}(X) - T_{β_n}Y|² } ) ]
+ [ E{ |m_n(X) - T_{β_n}Y|² | D_n } - E{ |m_{β_n}(X) - T_{β_n}Y|² } - ( (2/n) Σ_{i=1}^n |m_n(X_i) - T_{β_n}Y_i|² - (2/n) Σ_{i=1}^n |m_{β_n}(X_i) - T_{β_n}Y_i|² ) ]
+ [ (2/n) Σ_{i=1}^n |m_n(X_i) - T_{β_n}Y_i|² - (2/n) Σ_{i=1}^n |m_{β_n}(X_i) - T_{β_n}Y_i|² - ( (2/n) Σ_{i=1}^n |m_n(X_i) - Y_i|² - (2/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) ]
+ [ (2/n) Σ_{i=1}^n |m_n(X_i) - Y_i|² - (2/n) Σ_{i=1}^n |m(X_i) - Y_i|² ]
= Σ_{i=1}^4 T_{i,n},

where T_{β_n} Y is the truncated version of Y and m_{β_n} is the regression function of T_{β_n} Y, i.e.,

m_{β_n}(x) = E{ T_{β_n} Y | X = x }.

We start with bounding T_{1,n}. By using a² - b² = (a - b)·(a + b) we get

T_{1,n} = E{ |m_n(X) - Y|² - |m_n(X) - T_{β_n}Y|² | D_n } - ( E{ |m(X) - Y|² } - E{ |m_{β_n}(X) - T_{β_n}Y|² } )
= E{ (T_{β_n}Y - Y) · (2·m_n(X) - Y - T_{β_n}Y) | D_n } - E{ (m(X) - m_{β_n}(X) + T_{β_n}Y - Y) · (m(X) + m_{β_n}(X) - Y - T_{β_n}Y) }
= T_{5,n} + T_{6,n}.

With the Cauchy-Schwarz inequality and

I_{{|Y| > β_n}} ≤ exp((c_2/2)·Y²) / exp((c_2/2)·β_n²)  (10)

it follows

T_{5,n} ≤ √( E{ |T_{β_n}Y - Y|² } ) · √( E{ |2·m_n(X) - Y - T_{β_n}Y|² | D_n } )
≤ √( E{ Y² · I_{{|Y| > β_n}} } ) · √( E{ 2·|2·m_n(X) - T_{β_n}Y|² + 2·Y² | D_n } )
≤ √( E{ Y² · exp((c_2/2)·Y²) } / exp((c_2/2)·β_n²) ) · √( 2·(3·β_n)² + 2·E{Y²} ).

With x ≤ exp(x) for x ∈ R we get

Y² ≤ (2/c_2) · exp((c_2/2)·Y²)

and hence E{ Y² · exp((c_2/2)·Y²) } is bounded by

(2/c_2) · E{ exp((c_2/2)·Y²) · exp((c_2/2)·Y²) } = (2/c_2) · E{ exp(c_2·Y²) } =: c_4,

which is less than infinity by the assumptions of the theorem. Furthermore the third term is bounded by 18·β_n² + c_5, because

E{Y²} ≤ E{ (1/c_2) · exp(c_2·Y²) } ≤ c_5 < ∞,

which follows again as above. With the setting β_n = c_1·log(n) it follows for some constants c_6, c_7 > 0

T_{5,n} ≤ √( c_4 · exp(-c_6·(log n)²) ) · √( 18·(c_1·log n)² + c_5 ) ≤ c_7 · log(n)/n.

From the Cauchy-Schwarz inequality we get

T_{6,n} ≤ √( 2·E{ |m(X) - m_{β_n}(X)|² } + 2·E{ |T_{β_n}Y - Y|² } ) · √( E{ |m(X) + m_{β_n}(X) - Y - T_{β_n}Y|² } ),

where we can bound the second factor on the right-hand side of the above inequality in the same way we have bounded the second factor of T_{5,n}, because by assumption m is bounded and furthermore m_{β_n} is bounded by β_n. Thus we get for some constant c_8 > 0

E{ |m(X) + m_{β_n}(X) - Y - T_{β_n}Y|² } ≤ c_8 · (log n)².

Next we consider the first term. With Jensen's inequality it follows

E{ |m(X) - m_{β_n}(X)|² } = E{ |E{ Y - T_{β_n}Y | X }|² } ≤ E{ |Y - T_{β_n}Y|² }.

Hence we get

T_{6,n} ≤ √( 4·E{ |Y - T_{β_n}Y|² } ) · √( c_8 · (log n)² )

and therefore with the calculations from T_{5,n} it follows T_{6,n} ≤ c_9 · log(n)/n for some constant c_9 > 0. Altogether we get

T_{1,n} ≤ c_{10} · log(n)/n

for some constant c_{10} > 0. Next we consider T_{2,n}. Let t > 1/n be arbitrary. Then

P{ T_{2,n} > t }
≤ P{ ∃f ∈ T_{β_n}F_n :
E{ |f(X)/β_n - T_{β_n}Y/β_n|² } - E{ |m_{β_n}(X)/β_n - T_{β_n}Y/β_n|² }
- ( (1/n) Σ_{i=1}^n |f(X_i)/β_n - T_{β_n}Y_i/β_n|² - (1/n) Σ_{i=1}^n |m_{β_n}(X_i)/β_n - T_{β_n}Y_i/β_n|² )
> (1/2) · ( t/β_n² + E{ |f(X)/β_n - T_{β_n}Y/β_n|² } - E{ |m_{β_n}(X)/β_n - T_{β_n}Y/β_n|² } ) }.

Thus with Theorem 11.4 in Györfi et al. (2002) and

N_1( δ, { f/β_n : f ∈ F }, x_1^n ) = N_1( δ·β_n, F, x_1^n ),

we get for x_1^n = (x_1, ..., x_n) ∈ R^d × ... × R^d

P{ T_{2,n} > t } ≤ 14 · sup_{x_1^n} N_1( t/(80·β_n), T_{β_n}F_n, x_1^n ) · exp( - n·t / (5136·β_n²) ).

From Lemma 2 we know that, with L := max{L_{1,n}, ..., L_{K_n,n}}, for 1/n < t < 40·β_n

N_1( t/(80·β_n), T_{β_n}F_n, x_1^n ) ≤ 3 · ( (6e·β_n·80·β_n/t) · K_n · L )^{(2d+2)·Σ_{k=1}^{K_n} L_{k,n}} ≤ (c_{11}·n)^{4·(2d+2)·Σ_{k=1}^{K_n} L_{k,n}}

for some sufficiently large c_{11} > 0. This inequality also holds for t ≥ 40·β_n, since the right-hand side above does not depend on t and the covering number is decreasing in t. Using this we get for arbitrary ε_n ≥ 1/n

E{T_{2,n}} ≤ ε_n + ∫_{ε_n}^∞ P{T_{2,n} > t} dt ≤ ε_n + 14 · (c_{11}·n)^{4·(2d+2)·Σ_k L_{k,n}} · (5136·β_n²/n) · exp( - n·ε_n / (5136·β_n²) ),

and this expression is minimized for

ε_n = (5136·β_n²/n) · log( 14 · (c_{11}·n)^{4·(2d+2)·Σ_k L_{k,n}} ).

Altogether we get

E{T_{2,n}} ≤ c_{12} · (log n)³ · ( Σ_{k=1}^{K_n} L_{k,n} ) / n

for some sufficiently large constant c_{12} > 0, which does not depend on n, β_n or the parameters of the estimate. By bounding T_{3,n} similarly to T_{1,n} we get E{T_{3,n}} ≤ c_{13}·log(n)/n for some large enough constant c_{13} > 0 and hence we get overall

E{ T_{1,n} + T_{2,n} + T_{3,n} } ≤ c_{14} · (log n)³ · ( Σ_{k=1}^{K_n} L_{k,n} ) / n

for some sufficiently large constant c_{14} > 0. We finish the proof by bounding T_{4,n}. Let A_n be the event that there exists i ∈ {1, ..., n} such that |Y_i| > β_n, and let I_{A_n} be the indicator function of A_n. Then we get

E{T_{4,n}} ≤ 2·E{ ( (1/n) Σ_{i=1}^n |m_n(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) · I_{A_n} }
+ 2·E{ ( (1/n) Σ_{i=1}^n |m_n(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) · I_{A_n^c} }
= T_{7,n} + T_{8,n}.

With the Cauchy-Schwarz inequality we get for T_{7,n}

T_{7,n} ≤ 2 · √( E{ |m_n(X_1) - Y_1|⁴ } ) · √( P(A_n) )
≤ 2 · √( 8·E{ |m_n(X_1)|⁴ } + 8·E{ |Y_1|⁴ } ) · √( n · E{ exp(c_2·Y²) } / exp(c_2·β_n²) ),

where the last inequality follows from inequality (10). With x ≤ exp(x) for x ∈ R we get

E{Y⁴} = E{ Y²·Y² } ≤ E{ (2/c_2)·exp((c_2/2)·Y²) · (2/c_2)·exp((c_2/2)·Y²) } = (4/c_2²) · E{ exp(c_2·Y²) },

which is less than infinity by condition (3) of the theorem. Furthermore m_n is bounded by β_n and therefore the first factor is bounded by c_{15}·β_n² = c_{16}·(log n)² for some constant c_{16} > 0. For the second factor, by the assumptions of the theorem E{ exp(c_2·Y²) } is bounded by some constant c_{17} < ∞, and hence we get

√( n · E{ exp(c_2·Y²) } / exp(c_2·β_n²) ) ≤ √( c_{17} · n / exp(c_2·c_1²·(log n)²) ).

Since exp( -c·(log n)² ) = O( n^{-k} ) for every k ∈ N and every c > 0, we get altogether

T_{7,n} ≤ c_{18} · (log n)²/n² ≤ c_{19} · (log n)²/n.

With the definition of A_n^c and m̃_n defined as in (2) it follows

T_{8,n} = 2·E{ ( (1/n) Σ_{i=1}^n |m_n(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) · I_{A_n^c} }
≤ 2·E{ (1/n) Σ_{i=1}^n |m̃_n(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² }
≤ 2·E{ inf_{f ∈ F_n} (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² },

because |T_{β_n}(z) - y| ≤ |z - y| holds for |y| ≤ β_n. Hence

E{T_{4,n}} ≤ c_{19} · (log n)²/n + 2·E{ inf_{f ∈ F_n} (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² },

which completes the proof.

In the sequel we will bound

inf_{f ∈ F_n} (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|².

For this we will use the following lemma.

Lemma 3. Let K ∈ N and let Π be a partition of [a, b]^d consisting of K rectangles. Assume that f^{lin} : [a, b]^d → R is a piecewise polynomial of degree M = 1 in each coordinate with respect to Π and assume that f^{lin} is continuous. Furthermore let x_1, ..., x_n ∈ R^d be fixed points in R^d. Then there exist linear functions

f_{1,0}, ..., f_{1,2d}, ..., f_{K,0}, ..., f_{K,2d} : R^d → R

such that

f^{lin}(z) = max_{i=1,...,K} min_{k=0,...,2d} f_{i,k}(z) for all z ∈ {x_1, ..., x_n}.

Proof. Since f^{lin} is a piecewise polynomial of degree 1 it is of the shape

f^{lin}(z) = Σ_{i=1}^K f_i^{lin}(z) · I_{A_i}(z) = Σ_{i=1}^K ( Σ_{j=1}^d α_{i,j}·z^{(j)} + α_{i,0} ) · I_{A_i}(z)

for some constants α_{i,j} ∈ R (i = 1, ..., K, j = 0, ..., d), where Π = {A_1, ..., A_K} is a partition of [a, b]^d with A_i = I_i^1 × ... × I_i^d (i = 1, ..., K) for some univariate intervals I_i^j. We denote the left and the right endpoint of I_i^j by a_{i,j} and b_{i,j}, respectively, i.e.,

I_i^j = [a_{i,j}, b_{i,j}) or I_i^j = [a_{i,j}, b_{i,j}].

This choice is without restriction of any kind because f^{lin} is continuous. Now we choose for every i ∈ {1, ..., K}

f_{i,0}(x) = f_i^{lin}(x) = Σ_{j=1}^d α_{i,j}·x^{(j)} + α_{i,0}.

This implies that f_{i,0} and the given piecewise polynomial f^{lin} match on A_i for every i = 1, ..., K. Furthermore for i = 1, ..., K and j = 1, ..., d we define

f_{i,2j-1}(x) = f_i^{lin}(x) + ( x^{(j)} - a_{i,j} ) · β_{i,j},

where β_{i,j} ≥ 0 is such that

f_{i,2j-1}(z) ≤ f^{lin}(z) for all z = (z^{(1)}, ..., z^{(d)}) ∈ {x_1, ..., x_n} satisfying z^{(j)} < a_{i,j}

and

f_{i,2j-1}(z) ≥ f^{lin}(z) for all z = (z^{(1)}, ..., z^{(d)}) ∈ {x_1, ..., x_n} satisfying z^{(j)} > a_{i,j}.

The above conditions are satisfied if

β_{i,j} ≥ max_{k=1,...,n; x_k^{(j)} ≠ a_{i,j}} ( f^{lin}(x_k) - f_i^{lin}(x_k) ) / ( x_k^{(j)} - a_{i,j} ).

For z^{(j)} = a_{i,j} obviously f_{i,2j-1}(z) = f_i^{lin}(z). Analogously we choose

f_{i,2j}(x) = f_i^{lin}(x) - ( x^{(j)} - b_{i,j} ) · γ_{i,j},

where γ_{i,j} ≥ 0 is such that

f_{i,2j}(z) ≥ f^{lin}(z) for all z = (z^{(1)}, ..., z^{(d)}) ∈ {x_1, ..., x_n} satisfying z^{(j)} < b_{i,j}

and

f_{i,2j}(z) ≤ f^{lin}(z) for all z = (z^{(1)}, ..., z^{(d)}) ∈ {x_1, ..., x_n} satisfying z^{(j)} > b_{i,j}.

In this case the conditions from above are satisfied if

γ_{i,j} ≥ max_{k=1,...,n; x_k^{(j)} ≠ b_{i,j}} ( f_i^{lin}(x_k) - f^{lin}(x_k) ) / ( x_k^{(j)} - b_{i,j} ).

From this choice of the functions f_{i,k} (i = 1, ..., K, k = 0, ..., 2d) it results directly that

min_{k=0,...,2d} f_{i,k}(z) = f_i^{lin}(z) = f^{lin}(z) for z ∈ A_i ∩ {x_1, ..., x_n},
min_{k=0,...,2d} f_{i,k}(z) ≤ f^{lin}(z) for z ∈ {x_1, ..., x_n},

holds for all i = 1, ..., K, which implies the assertion.

Proof of Corollary 1. Lemma 3 yields

E{ 2 · inf_{f ∈ F_n} ( (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) }

≤ E{ 2 · inf_{f ∈ G_n} ( (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) }
≤ 2 · inf_{f ∈ G_n} ∫ |f(x) - m(x)|² µ(dx),

where G_n is the set of functions which contains all continuous piecewise polynomials of degree 1 with respect to an arbitrary partition Π consisting of K_n rectangles. Next we increase the right-hand side above by choosing Π such that it consists of equivolume cubes. Now we can apply approximation results from spline theory, see, e.g., Schumaker (1981), Theorem 12.8. From this, the (p, C)-smoothness of m and Theorem 1 we conclude for some sufficiently large constant c_{20} > 0

E ∫ |m_n(x) - m(x)|² µ(dx) ≤ c_3 · (log n)³ · K_n·(2d + 1)/n + c_{20} · C² · K_n^{-2p/d} ≤ c_{20} · C^{2d/(2p+d)} · ( (log n)³/n )^{2p/(2p+d)},

where the last inequality results from the choice of K_n.

Proof of Corollary 2. With the assumptions on the regression function m, the second term on the right-hand side of inequality (4) equals

E{ 2 · inf_{f ∈ F_n} ( (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m̄(α·X_i) - Y_i|² ) }

and with F̄_n := { f̄ : R → R : f̄(x) = max_{k=1,...,K_n} min_{l=1,...,L_k} ( ā_{k,l}·x + b̄_{k,l} ) for some ā_{k,l}, b̄_{k,l} ∈ R } this expected value is less than or equal to

E{ 2 · inf_{h ∈ F̄_n} ( (1/n) Σ_{i=1}^n |h(α·X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m̄(α·X_i) - Y_i|² ) },

because for every function h ∈ F̄_n and every vector α ∈ R^d the function

f(x) = h(α·x) (x ∈ R^d)

is contained in F_n. Together with Lemma 3 this yields

E{ 2 · inf_{f ∈ F_n} ( (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) }

≤ E{ 2 · inf_{h ∈ Ḡ_n} ( (1/n) Σ_{i=1}^n |h(α·X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m̄(α·X_i) - Y_i|² ) }
≤ 2 · inf_{h ∈ Ḡ_n} ∫ |h(α·x) - m̄(α·x)|² µ(dx)
≤ 2 · inf_{h ∈ Ḡ_n} max_{x ∈ [a,b]^d} |h(α·x) - m̄(α·x)|²
≤ 2 · inf_{h ∈ Ḡ_n} max_{x ∈ [â, b̂]} |h(x) - m̄(x)|²,

where Ḡ_n is the set of functions from R to R which contains all continuous piecewise polynomials of degree one with respect to a partition of [â, b̂] consisting of K_n intervals. Here [â, b̂] is chosen such that α·x ∈ [â, b̂] for x ∈ [a, b]^d. Hence again with the approximation result from spline theory we get as in the proof of Corollary 1 for some sufficiently large constant c_{21}

E{ 2 · inf_{f ∈ F_n} ( (1/n) Σ_{i=1}^n |f(X_i) - Y_i|² - (1/n) Σ_{i=1}^n |m(X_i) - Y_i|² ) } ≤ c_{21} · C² · K_n^{-2p}.

Summarizing the above results we get by Theorem 1

E ∫ |m_n(x) - m(x)|² µ(dx) ≤ c_3 · (log n)³ · ( Σ_{k=1}^{K_n} L_{k,n} )/n + c_{21} · C² · K_n^{-2p} ≤ c_{22} · C^{2/(2p+1)} · ( (log n)³/n )^{2p/(2p+1)}.

References

[1] Bagirov, A. M. (1999). Minimization methods for one class of nonsmooth functions and calculation of semi-equilibrium prices. In: A. Eberhard et al. (eds.), Progress in Optimization: Contributions from Australia, Kluwer Academic Publishers.

[2] Bagirov, A. M. (2002). A method for minimization of quasidifferentiable functions. Optimization Methods and Software 17.

[3] Bagirov, A. M., Clausen, C., and Kohler, M. (2007). An algorithm for the estimation of a regression function by continuous piecewise linear functions. Submitted for publication.

[4] Bagirov, A. M., and Ugon, J. (2006). Piecewise partially separable functions and a derivative-free method for large-scale nonsmooth optimization. Journal of Global Optimization 35.

[5] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 39.

[6] Barron, A. R. (1994). Approximation and estimation bounds for neural networks. Neural Networks 14.

[7] Beliakov, G., and Kohler, M. Estimation of regression functions by Lipschitz continuous functions. Submitted for publication.

[8] Breiman, L., Friedman, J. H., Olshen, R. H., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.

[9] Demyanov, V. F., and Rubinov, A. M. (1995). Constructive Nonsmooth Analysis. Peter Lang, Frankfurt am Main.

[10] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics 19, pp. 1-141.

[11] Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, Springer.

[12] Hamers, M., and Kohler, M. (2003). A bound on the expected maximal deviations of sample averages from their means. Statistics & Probability Letters 62.

[13] Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag, New York.

[14] Kohler, M. (1999). Nonparametric estimation of piecewise smooth regression functions. Statistics & Probability Letters 43.

[15] Kohler, M. (2000). Inequalities for uniform deviations of averages from expectations with applications to nonparametric regression. Journal of Statistical Planning and Inference 89.

[16] Kohler, M. (2006). Nonparametric regression with additional measurement errors in the dependent variable. Journal of Statistical Planning and Inference 136.

[17] Lee, W. S., Bartlett, P. L., and Williamson, R. C. (1996). Efficient agnostic learning of neural networks with bounded fan-in. IEEE Transactions on Information Theory 42.

[18] Mifflin, R. (1977). Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization 15.

[19] Schumaker, L. (1981). Spline Functions: Basic Theory. Wiley, New York.

[20] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics 10.

[21] Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics 13.

[22] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Annals of Statistics 22.

[23] van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge University Press.


More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information