ECE 901 Lecture 4: Estimation of Lipschitz smooth functions

R. Nowak, 5/7/29

Consider the following setting. Let $Y = f^*(X) + W$, where $X$ is a random variable (r.v.) on $\mathcal{X} = [0,1]$ and $W$ is a r.v. on $\mathcal{Y} = \mathbb{R}$, independent of $X$ and satisfying $\mathbb{E}[W] = 0$ and $\mathbb{E}[W^2] = \sigma^2 < \infty$. Finally, let $f^* : [0,1] \to \mathbb{R}$ be a function satisfying

$$|f^*(t) - f^*(s)| \le L\,|t - s|, \quad \forall\, t, s \in [0,1], \qquad (1)$$

where $L > 0$ is a constant. A function satisfying condition (1) is said to be Lipschitz on $[0,1]$. Such a function must be continuous, but it is not necessarily differentiable; for instance, $f^*(t) = |t - 1/2|$ satisfies (1) with $L = 1$ but has a kink at $t = 1/2$. An example of such a function is depicted in Figure 1(a).

[Figure 1: Example of a Lipschitz function, and our observation setting. (a) Random sampling of $f^*$; the points correspond to $(X_i, Y_i)$, $i = 1, \ldots, n$. (b) Deterministic sampling of $f^*$; the points correspond to $(i/n, Y_i)$, $i = 1, \ldots, n$.]

Note that, using the independence of $W$ and $X$,

$$\mathbb{E}[Y \mid X = x] = \mathbb{E}[f^*(X) + W \mid X = x] = \mathbb{E}[f^*(x) + W \mid X = x] = f^*(x) + \mathbb{E}[W] = f^*(x).$$

Consider our usual setup: estimate $f^*$ using $n$ training examples

$$\{X_i, Y_i\}_{i=1}^n \overset{\text{i.i.d.}}{\sim} P_{XY}, \qquad Y_i = f^*(X_i) + W_i, \quad i \in \{1, \ldots, n\}.$$
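The random-design model can be simulated in a few lines. This is a minimal NumPy sketch under illustrative assumptions: the target $f^*(t) = |t - 1/2|$ (Lipschitz with $L = 1$), Gaussian noise, and the helper names `f_star`, `lipschitz_estimate`, and `sample_random_design` are hypothetical choices, not taken from the notes. It also checks numerically that averaging many noisy observations at a fixed $x$ recovers $f^*(x)$, in line with $\mathbb{E}[Y \mid X = x] = f^*(x)$.

```python
import numpy as np


def f_star(t):
    # Hypothetical example target: Lipschitz with L = 1, continuous on [0, 1],
    # but not differentiable at t = 1/2.
    return np.abs(np.asarray(t, dtype=float) - 0.5)


def lipschitz_estimate(f, num_points=1001):
    # Largest slope |f(t) - f(s)| / |t - s| over adjacent points of a uniform grid.
    # This only gives a lower bound on the true Lipschitz constant of f.
    t = np.linspace(0.0, 1.0, num_points)
    return np.max(np.abs(np.diff(f(t))) / np.diff(t))


def sample_random_design(f, n, sigma, rng):
    # n i.i.d. pairs with X_i ~ Uniform[0, 1] and Y_i = f(X_i) + W_i.
    # Assumption: W_i ~ N(0, sigma^2); the notes only require E[W] = 0, E[W^2] = sigma^2.
    x = rng.uniform(0.0, 1.0, size=n)
    w = rng.normal(0.0, sigma, size=n)
    return x, f(x) + w


rng = np.random.default_rng(0)
x, y = sample_random_design(f_star, n=200, sigma=0.1, rng=rng)

# Monte Carlo check of E[Y | X = x0] = f*(x0): average many noisy observations at x0.
x0 = 0.3
y_at_x0 = f_star(x0) + rng.normal(0.0, 0.1, size=100_000)
print(lipschitz_estimate(f_star))   # approximately 1.0
print(y_at_x0.mean(), f_star(x0))   # both approximately 0.2
```

A grid-based slope only certifies a lower bound on $L$; condition (1) itself must be established analytically.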

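Here i.i.d. means independently and identically distributed. Figure 1(a) illustrates this setup.

For simplicity we will consider a slightly different setting. In many applications we can choose the sample locations in $\mathcal{X} = [0,1]$ as we like, not necessarily at random. For example, we can take $n$ samples uniformly spaced on $[0,1]$:

$$x_i = \frac{i}{n}, \quad i = 1, \ldots, n, \qquad Y_i = f^*(x_i) + W_i = f^*\!\left(\frac{i}{n}\right) + W_i.$$

We will proceed with this setup (as in Figure 1(b)) for the rest of the lecture. Our goal is to find $\hat f_n$ such that $\mathbb{E}[\|f^* - \hat f_n\|^2] \to 0$ as $n \to \infty$ (here $\|\cdot\|$ is the usual $L^2$-norm, i.e., $\|f^* - \hat f_n\|^2 = \int_0^1 |f^*(t) - \hat f_n(t)|^2\,dt$).

Let $\mathcal{F} = \{f : f \text{ is Lipschitz with constant } L\}$. The risk is defined as

$$R(f) = \|f^* - f\|^2 = \int_0^1 |f^*(t) - f(t)|^2\,dt.$$

The expected risk (recall that our estimator $\hat f_n$ is based on $\{x_i, Y_i\}$ and hence is a r.v.) is defined as

$$\mathbb{E}[R(\hat f_n)] = \mathbb{E}[\|f^* - \hat f_n\|^2].$$

Finally, the empirical risk is defined as

$$\hat R_n(f) = \frac{1}{n}\sum_{i=1}^n \left( f\!\left(\frac{i}{n}\right) - Y_i \right)^2.$$

For the estimation task we will use stair functions. Let $m \in \mathbb{N}$ and define the class of piecewise constant functions

$$\mathcal{F}_m = \left\{ f : f(t) = \sum_{j=1}^m c_j \mathbf{1}\left\{\frac{j-1}{m} \le t < \frac{j}{m}\right\},\ c_j \in \mathbb{R} \right\}.$$

$\mathcal{F}_m$ is the space of functions that are constant on the intervals $I_{j,m} \equiv \left[\frac{j-1}{m}, \frac{j}{m}\right)$, $j = 1, \ldots, m$. Clearly, if $m$ is large we can approximate almost any bounded function arbitrarily well, so it makes sense to use these classes to construct a set of sieves. Let $m_1 < m_2 < m_3 < \cdots$ be a sequence of integers satisfying $m_n \to \infty$ as $n \to \infty$; that is, for each value of $n$ there is an associated integer value $m_n$. Define the sieve $\mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3, \ldots$ by

$$\mathcal{F}_n = \left\{ f : f(t) = \sum_{j=1}^{m_n} c_j \mathbf{1}\{t \in I_{j,m_n}\},\ c_j \in \mathbb{R} \right\}.$$

From here on we will write $m$ instead of $m_n$ and $I_j$ instead of $I_{j,m_n}$, for notational ease. Define $\bar f \in \mathcal{F}_n$ to be an approximation of $f^*$; in particular,

$$\bar f(t) = \sum_{j=1}^m c_j \mathbf{1}\{t \in I_j\}, \qquad \text{where} \quad c_j = \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} f^*\!\left(\frac{i}{n}\right),$$

and $N_j = \{ i \in \{1, \ldots, n\} : i/n \in I_j \}$. Let $|N_j|$ be the number of elements of $N_j$, and assume $m$ is not too large relative to $n$, so that $|N_j| > 0$ for every $j$. In fact $|N_j| \approx n/m$, so as long as $m$ grows more slowly than $n$ we are fine.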
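The index sets $N_j$ and the coefficients $c_j$ of $\bar f$ can be computed directly from the design. Below is a minimal NumPy sketch, assuming the illustrative target $f^*(t) = |t - 1/2|$; the helper names are hypothetical. One deviation from the text is flagged as an assumption: with half-open intervals the sample $x_n = 1$ lies in no $I_j$, so the sketch assigns it to the last interval. The printed counts illustrate $|N_j| \approx n/m$.

```python
import numpy as np


def design(n):
    # Deterministic design x_i = i/n, i = 1, ..., n (also returns the integer indices i).
    i = np.arange(1, n + 1)
    return i, i / n


def cell_of_sample(i, n, m):
    # 0-based index (j - 1) of the interval I_j = [(j-1)/m, j/m) that contains i/n.
    # Exact integer arithmetic floor(i*m/n) avoids rounding errors at interval endpoints.
    # Assumption: x_n = 1 is placed in the last interval (I_m closed at 1).
    return np.minimum((i * m) // n, m - 1)


def cell_averages(values, i, n, m):
    # Average `values` over each index set N_j; with values = f*(i/n) this yields c_j.
    j = cell_of_sample(i, n, m)
    counts = np.bincount(j, minlength=m)                  # |N_j|
    sums = np.bincount(j, weights=values, minlength=m)
    if np.any(counts == 0):
        raise ValueError("empty N_j: m is too large relative to n")
    return sums / counts, counts


def f_star(t):
    # Hypothetical Lipschitz target with L = 1.
    return np.abs(np.asarray(t, dtype=float) - 0.5)


n, m = 100, 10
i, x = design(n)
c, counts = cell_averages(f_star(x), i, n, m)
print(counts)   # [ 9 10 10 10 10 10 10 10 10 11]
print(c[0])     # approximately 0.45, the mean of |i/100 - 1/2| for i = 1, ..., 9
```

Every $|N_j|$ stays within a couple of samples of $n/m = 10$; the first interval loses $i = 0$, which is not a design point.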

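Exercise 1. Upper bound the approximation error $\|f^* - \bar f\|^2$.

$$\begin{aligned}
\|f^* - \bar f\|^2 &= \int_0^1 |f^*(t) - \bar f(t)|^2\,dt = \sum_{j=1}^m \int_{I_j} |f^*(t) - \bar f(t)|^2\,dt = \sum_{j=1}^m \int_{I_j} |f^*(t) - c_j|^2\,dt \\
&= \sum_{j=1}^m \int_{I_j} \left| f^*(t) - \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} f^*\!\left(\tfrac{i}{n}\right) \right|^2 dt
= \sum_{j=1}^m \int_{I_j} \left| \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} \left( f^*(t) - f^*\!\left(\tfrac{i}{n}\right) \right) \right|^2 dt \\
&\le \sum_{j=1}^m \int_{I_j} \left( \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} \left| f^*(t) - f^*\!\left(\tfrac{i}{n}\right) \right| \right)^2 dt
\le \sum_{j=1}^m \int_{I_j} \left( \frac{L}{m} \right)^2 dt = \sum_{j=1}^m \frac{1}{m}\cdot\frac{L^2}{m^2} = \frac{L^2}{m^2}.
\end{aligned}$$

The last inequality uses (1): both $t$ and $i/n$ lie in $I_j$, an interval of length $1/m$, so $|f^*(t) - f^*(i/n)| \le L/m$.

The above implies that $\|f^* - \bar f\|^2 \to 0$ as $n \to \infty$, since $m = m_n \to \infty$ as $n \to \infty$. In words, with sufficiently large $m$ we can approximate $f^*$ to arbitrary accuracy using models in $\mathcal{F}_n$ (even though the functions we use to approximate $f^*$ are not Lipschitz!).

Of course, we cannot compute $\bar f$ without knowing $f^*$, so let's use the data to find a good model in $\mathcal{F}_n$. For any $f \in \mathcal{F}_n$, $f = \sum_{j=1}^m c_j \mathbf{1}\{t \in I_j\}$, we have

$$\hat R_n(f) = \frac{1}{n}\sum_{i=1}^n \left( \sum_{j=1}^m c_j \mathbf{1}\left\{\tfrac{i}{n} \in I_j\right\} - Y_i \right)^2 = \frac{1}{n}\sum_{j=1}^m \sum_{i:\, i/n \in I_j} (c_j - Y_i)^2.$$

Let $\hat f_n = \arg\min_{f \in \mathcal{F}_n} \hat R_n(f)$. Then

$$\hat f_n(t) = \sum_{j=1}^m \hat c_j \mathbf{1}\{t \in I_j\}, \qquad \text{where} \quad \hat c_j = \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} Y_i. \qquad (2)$$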
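Both the bound $L^2/m^2$ from Exercise 1 and the closed form (2) can be checked numerically. This NumPy sketch uses illustrative assumptions (the target $f^*(t) = |t - 1/2|$ with $L = 1$, Gaussian noise with $\sigma = 0.2$, hypothetical helper names, and $x_n = 1$ assigned to the last interval). It approximates $\|f^* - \bar f\|^2$ with a midpoint rule, then compares the per-interval averages of $Y_i$ with a generic least-squares fit over the indicator basis of $\mathcal{F}_m$.

```python
import numpy as np


def f_star(t):
    # Hypothetical Lipschitz target with L = 1.
    return np.abs(np.asarray(t, dtype=float) - 0.5)


def cells(i, n, m):
    # 0-based interval index of x_i = i/n; x_n = 1 goes to the last interval (assumption).
    return np.minimum((i * m) // n, m - 1)


def stair_fit(values, i, n, m):
    # Per-interval averages: c_j if values = f*(i/n), c-hat_j if values = Y_i.
    j = cells(i, n, m)
    return np.bincount(j, weights=values, minlength=m) / np.bincount(j, minlength=m)


def l2_sq_error(coeffs, f, grid_size=200_000):
    # ||f - sum_j c_j 1{I_j}||^2 on [0, 1], approximated by the midpoint rule.
    m = len(coeffs)
    t = (np.arange(grid_size) + 0.5) / grid_size
    stair = coeffs[np.minimum((t * m).astype(int), m - 1)]
    return np.mean((f(t) - stair) ** 2)


n, m, L, sigma = 1000, 10, 1.0, 0.2
i = np.arange(1, n + 1)
x = i / n

# Approximation error of f-bar versus the bound L^2 / m^2 from Exercise 1.
c_bar = stair_fit(f_star(x), i, n, m)
print(l2_sq_error(c_bar, f_star), L**2 / m**2)   # roughly 0.00083 and 0.01

# Check (2): least squares over the indicator basis equals the per-interval mean of Y.
rng = np.random.default_rng(1)
y = f_star(x) + sigma * rng.standard_normal(n)
design_matrix = np.zeros((n, m))
design_matrix[np.arange(n), cells(i, n, m)] = 1.0   # column j holds 1{x_i in I_j}
c_lstsq, *_ = np.linalg.lstsq(design_matrix, y, rcond=None)
print(np.allclose(c_lstsq, stair_fit(y, i, n, m)))  # True
```

For this target the error is close to $1/(12m^2)$, well under $L^2/m^2$, because $f^*$ is linear on each $I_j$; the bound in Exercise 1 is a worst case over all of $\mathcal{F}$.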

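Exercise 2. Show (2).

Note that $\mathbb{E}[\hat c_j] = c_j$, and therefore $\mathbb{E}[\hat f_n(t)] = \bar f(t)$. Let's now analyze the expected risk of $\hat f_n$:

$$\begin{aligned}
\mathbb{E}[\|f^* - \hat f_n\|^2] &= \mathbb{E}[\|f^* - \bar f + \bar f - \hat f_n\|^2] \\
&= \|f^* - \bar f\|^2 + \mathbb{E}[\|\bar f - \hat f_n\|^2] + 2\,\mathbb{E}[\langle f^* - \bar f, \bar f - \hat f_n \rangle] \\
&= \|f^* - \bar f\|^2 + \mathbb{E}[\|\bar f - \hat f_n\|^2] + 2\,\langle f^* - \bar f, \mathbb{E}[\bar f - \hat f_n] \rangle \\
&= \|f^* - \bar f\|^2 + \mathbb{E}[\|\bar f - \hat f_n\|^2], \qquad (3)
\end{aligned}$$

where the third step uses that $f^* - \bar f$ is deterministic, and the final step follows from the fact that $\mathbb{E}[\hat f_n(t)] = \bar f(t)$. A couple of important remarks pertaining to the right-hand side of equation (3):

The first term, $\|f^* - \bar f\|^2$, corresponds to the approximation error and indicates how well we can approximate the function $f^*$ with a function from $\mathcal{F}_n$. Clearly, the larger the class $\mathcal{F}_n$ is, the smaller we can make this term. Since $\mathbb{E}[\hat f_n] = \bar f$, this term is precisely the (integrated) squared bias of the estimator $\hat f_n$.

The second term, $\mathbb{E}[\|\bar f - \hat f_n\|^2]$, is the estimation error, i.e., the variance of our estimator. We will see that the estimation error is small if the class of possible estimators $\mathcal{F}_n$ is also small.

The behavior of the first term in (3) was already studied in Exercise 1. Consider the other term:

$$\begin{aligned}
\mathbb{E}[\|\bar f - \hat f_n\|^2] &= \mathbb{E}\left[ \int_0^1 |\bar f(t) - \hat f_n(t)|^2\,dt \right]
= \mathbb{E}\left[ \int_0^1 \sum_{j=1}^m (c_j - \hat c_j)^2\, \mathbf{1}\{t \in I_j\}\,dt \right]
= \sum_{j=1}^m \int_{I_j} \mathbb{E}[(c_j - \hat c_j)^2]\,dt \\
&= \frac{1}{m}\sum_{j=1}^m \mathbb{E}[(c_j - \hat c_j)^2]
= \frac{1}{m}\sum_{j=1}^m \mathbb{E}\left[ \left( \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} \left( f^*\!\left(\tfrac{i}{n}\right) - Y_i \right) \right)^2 \right]
= \frac{1}{m}\sum_{j=1}^m \mathbb{E}\left[ \left( \frac{1}{|N_j|}\sum_{i:\, i/n \in I_j} W_i \right)^2 \right] \\
&= \frac{1}{m}\sum_{j=1}^m \frac{\sigma^2}{|N_j|}
\le (1 + \epsilon)\,\frac{\sigma^2 m}{n},
\end{aligned}$$

for any $\epsilon > 0$, provided $n/m$ is large enough. Here we used that the noise variables $W_i$ are independent with mean zero and variance $\sigma^2$, and that $|N_j| \approx n/m$ for every $j$. Combining all the facts derived, we have

$$\mathbb{E}[\|f^* - \hat f_n\|^2] \le \frac{L^2}{m^2} + (1 + \epsilon)\,\frac{\sigma^2 m}{n} = O\!\left( \max\left\{ \frac{1}{m^2}, \frac{m}{n} \right\} \right). \qquad (4)$$

The notation $x_n = O(y_n)$ (which reads "$x_n$ is big-O of $y_n$", or "$x_n$ is of the order of $y_n$ as $n$ goes to infinity") means that $x_n \le C y_n$ for all sufficiently large $n$, where $C$ is a positive constant and $y_n$ is a non-negative sequence.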
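The variance calculation can be sanity-checked by simulation. Because $\hat c_j - c_j$ is the average of the noise $W_i$ over $N_j$, the sketch below simulates only the noise; Gaussian $W_i$, the parameter values, the helper names, and assigning $x_n = 1$ to the last interval are assumptions of the example. It compares a Monte Carlo estimate of $\mathbb{E}[\|\bar f - \hat f_n\|^2]$ with the exact value $\frac{1}{m}\sum_j \sigma^2/|N_j|$ and with $\sigma^2 m/n$.

```python
import numpy as np


def cells(i, n, m):
    # 0-based interval index of x_i = i/n; x_n = 1 goes to the last interval (assumption).
    return np.minimum((i * m) // n, m - 1)


def estimation_error_mc(n, m, sigma, trials, rng):
    # Monte Carlo estimate of E||f-bar - f-hat_n||^2 = (1/m) sum_j E[(c_j - c-hat_j)^2].
    # c-hat_j - c_j is the mean of W_i over N_j, so f* itself never enters.
    i = np.arange(1, n + 1)
    j = cells(i, n, m)
    counts = np.bincount(j, minlength=m)          # |N_j|
    total = 0.0
    for _ in range(trials):
        w = sigma * rng.standard_normal(n)        # Gaussian noise (assumption)
        noise_means = np.bincount(j, weights=w, minlength=m) / counts
        total += np.mean(noise_means ** 2)        # (1/m) sum_j (c-hat_j - c_j)^2
    exact = sigma**2 * np.mean(1.0 / counts)      # (1/m) sum_j sigma^2 / |N_j|
    return total / trials, exact


rng = np.random.default_rng(2)
n, m, sigma = 1000, 20, 1.0
mc, exact = estimation_error_mc(n, m, sigma, trials=2000, rng=rng)
print(mc, exact, sigma**2 * m / n)   # all three approximately 0.02
```

Doubling $m$ with $n$ fixed roughly doubles this term, which is the $O(m/n)$ behaviour in (4).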

What is the best choice of $m$? If $m$ is small, then the approximation error (i.e., $O(1/m^2)$) is going to be large, but the estimation error (i.e., $O(m/n)$) is going to be small, and vice versa. These two conflicting goals create a tradeoff that directs our choice of $m$ (as a function of $n$). Figure 2 depicts this tradeoff. In Figure 2(a) we consider a large value of $m$, and we see that the approximation of $f^*$ by a function in the class $\mathcal{F}_n$ can be very accurate (that is, our estimate will have a small bias), but when we use the measured data our estimate looks very bad (high variance). On the other hand, as illustrated in Figure 2(b), using a very small $m$ allows our estimator to get very close to the best approximating function in the class $\mathcal{F}_n$, so we have a low-variance estimator, but the bias of our estimator (i.e., the difference between $\bar f$ and $f^*$) is quite considerable.

[Figure 2: Approximation and estimation of $f^*$ (in blue) for a fixed sample size $n$. The function $\bar f$ is depicted in green and the function $\hat f_n$ in red. In (a) $m$ is large; in (b) $m$ is small.]

We need to balance the two terms on the right-hand side of (4) in order to maximize the rate of decay (with $n$) of the expected risk. This implies that $\frac{1}{m^2} = \frac{m}{n}$, therefore $m_n = n^{1/3}$, and the mean squared error (MSE) is

$$\mathbb{E}[\|f^* - \hat f_n\|^2] = O(n^{-2/3}).$$

So the sieve $\mathcal{F}_1, \mathcal{F}_2, \ldots$ with $m_n = n^{1/3}$ (rounded to an integer) produces an $\mathcal{F}$-consistent estimator for $f^* \in \mathcal{F}$. It is interesting to note that the rate of decay of the MSE we obtain with this strategy cannot be further improved by using more sophisticated estimation techniques (that is, $n^{-2/3}$ is the minimax MSE rate for this problem). Also, rather surprisingly, we are considering classes of models $\mathcal{F}_n$ that are actually not Lipschitz; therefore our estimator of $f^*$ is not a Lipschitz function, unlike $f^*$ itself.
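The tradeoff in (4) can be explored numerically. The sketch below minimizes the leading-order bound $L^2/m^2 + \sigma^2 m/n$ (taking $\epsilon = 0$, with illustrative values $L = \sigma = 1$; the function names are hypothetical) over integer $m$. Setting the derivative to zero gives $m = (2L^2 n/\sigma^2)^{1/3}$, which differs from the balancing choice $m_n = n^{1/3}$ only by a constant factor, so both yield the $n^{-2/3}$ rate.

```python
import numpy as np


def risk_bound(m, n, L=1.0, sigma=1.0):
    # Leading-order right-hand side of (4): approximation L^2/m^2 plus estimation sigma^2 m/n.
    return L**2 / m**2 + sigma**2 * m / n


def best_m(n, L=1.0, sigma=1.0):
    # Integer m in {1, ..., n} that minimises the bound.
    m = np.arange(1, n + 1)
    return int(m[np.argmin(risk_bound(m, n, L, sigma))])


for n in (1000, 8000, 64000):
    m_opt = best_m(n)
    m_cont = (2 * n) ** (1 / 3)   # stationary point of the bound when L = sigma = 1
    print(n, m_opt, round(m_cont, 2), risk_bound(m_opt, n))
```

Each eightfold increase in $n$ roughly doubles the best $m$ and divides the bound by about 4, consistent with $m \propto n^{1/3}$ and an MSE of order $n^{-2/3}$.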