Week 6


Bethanie Harrell

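Lecture 1: Kernel regression and minimax rates

Model: Observe
$$Y_i = f(X_i) + \xi_i, \qquad i = 1, 2, \ldots, n,$$
where the $\xi_i$ are iid with $E\xi_i = 0$. We often assume that the $X_i$ are iid or that $X_i = i/n$.

Nadaraya-Watson estimator

Let $w_i(x) = K\left(\frac{X_i - x}{h}\right)$. The Nadaraya-Watson estimator is
$$\hat f^{NW}(x) = \begin{cases} \dfrac{\sum_{i=1}^n Y_i w_i(x)}{\sum_{i=1}^n w_i(x)}, & \sum_{i=1}^n w_i(x) \neq 0, \\ 0, & \text{otherwise.} \end{cases}$$

Intuition: Let $p(x, y)$ be the joint density of $(X, Y)$ and $p(x)$ be the marginal density of $X$. Then
$$f(x) = E(Y \mid X = x) = \frac{\int y\, p(x, y)\, dy}{p(x)} \approx \frac{\int y\, \hat p(x, y)\, dy}{\hat p(x)}$$
$$= \frac{\frac{1}{nh^2} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \int y\, K\left(\frac{Y_i - y}{h}\right) dy}{\frac{1}{nh} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)}$$
$$= \frac{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \left[ Y_i \int \frac{1}{h} K\left(\frac{Y_i - y}{h}\right) dy - \int (Y_i - y)\, \frac{1}{h} K\left(\frac{Y_i - y}{h}\right) dy \right]}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)}$$
$$= \frac{\sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right)}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)},$$
where the last identity follows from the fact that $\int K(y)\, dy = 1$ and $\int y K(y)\, dy = 0$.

Remark: The Nadaraya-Watson estimator is a solution of the following minimization problem:
$$\hat f^{NW}(x) = \arg\min_{\theta} \sum_{i=1}^n (Y_i - \theta)^2 w_i(x).$$
Starting from this equation we may introduce the local polynomial method.

Rate of convergence

Assume that
$$\mathcal F = \left\{ f : \sup_x \left| f^{(m)}(x) \right| \le M \right\}, \qquad 0 < \epsilon \le p(x) \le M, \qquad E\xi_i^2 < \infty.$$
We may show
$$\inf_{\hat f} \sup_{f \in \mathcal F} E\left( \hat f(x) - f(x) \right)^2 \asymp C n^{-2m/(2m+1)}.$$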
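The Nadaraya-Watson estimator defined above is just a kernel-weighted average, which a few lines of code make concrete. The following is a minimal sketch, not part of the original notes: the Gaussian kernel, the function names, the toy regression function $\sin(2\pi x)$ and the bandwidth $h = 0.05$ are all illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: integrates to 1 and has mean 0 (assumed kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Weighted average of ys with weights w_i(x) = K((X_i - x) / h)."""
    weights = [kernel((xi - x) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # convention from the definition when all weights vanish
    return sum(w * y for w, y in zip(weights, ys)) / total


# Noise-free toy data on the fixed design X_i = i/n (illustrative choice).
n = 200
xs = [i / n for i in range(1, n + 1)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
estimate = nadaraya_watson(0.25, xs, ys, h=0.05)
```

Because the estimator is a weighted average, it reproduces constant data exactly, and it slightly flattens the peak of the sine curve; that flattening is the smoothing bias discussed under the rate of convergence.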

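Upper bound: To simplify our calculations we may assume that the $X_i$ are iid $U(0, 1)$. Let
$$\hat f^{NW}(x) = \frac{1}{nh} \sum_{i=1}^n Y_i K\left(\frac{X_i - x}{h}\right).$$

Bias:
$$\left| E \hat f^{NW}(x) - f(x) \right| = \left| \int \frac{1}{h} f(t) K\left(\frac{t - x}{h}\right) dt - f(x) \right| \le C h^m.$$

Variance:
$$\mathrm{Var}\left( \hat f^{NW}(x) \right) = \frac{1}{nh^2} \mathrm{Var}\left( Y_i K\left(\frac{X_i - x}{h}\right) \right) \le \frac{1}{nh^2} E\left[ Y_i^2 K^2\left(\frac{X_i - x}{h}\right) \right] = \frac{1}{nh^2} E\left[ \left( f(X_i) + \xi_i \right)^2 K^2\left(\frac{X_i - x}{h}\right) \right] \le \frac{C}{nh}.$$

Choosing $h$ of order $n^{-1/(2m+1)}$ balances the squared bias $h^{2m}$ against the variance $1/(nh)$. Then
$$\sup_{f \in \mathcal F} E\left( \hat f^{NW}(x) - f(x) \right)^2 \le C n^{-2m/(2m+1)}.$$

Questions:
What about the lower bound? Similar to the density estimation case.
Adaptive estimation?
Estimation under the sup-norm?
Estimation at a point? Adaptive estimation?
Very similar results to density estimation!

Asymptotic equivalence

General theory: Le Cam (1986). Too hard to read!?

Density estimation. $E_n$: $Y_1, Y_2, \ldots, Y_n$ iid with density $f(y)$, $y \in [0, 1]$.
Poisson. $F_n$: $Y_1, Y_2, \ldots, Y_N$ iid with density $f(y)$, $N \sim \mathrm{Poisson}(n)$, $y \in [0, 1]$.
Gaussian white noise. $G_n$: $dY(t) = f(t)\, dt + \epsilon\, dW(t)$, $\epsilon = 1/\sqrt{n}$.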

Gaussian regression. $H_n$: $y_i = f(i/n) + z_i$, $z_i \sim N(0, 1)$.
Spectral density estimation. $I_n$: $Y_1, Y_2, \ldots, Y_n$, a stationary centered Gaussian sequence with spectral density $f$.
And more models: exponential family regression, general location models.

1. Brown and Low (1996): $G_n$ and $H_n$ are asymptotically equivalent under the assumption $f \in$ Hölder$(\alpha, M)$ with $\alpha > 1/2$.
2. Nussbaum (1996), Brown, Carter, Low and Zhang (2004): $E_n$ and $H_n$ (replacing $f$ by $\sqrt{f}$, and $\epsilon$ by $\epsilon/2$) are asymptotically equivalent under the assumption $f \in$ Hölder$(\alpha, M)$ with $\alpha > 1/2$ and $f$ bounded away from 0.
3. Low and Zhou (2005): $E_n$ and $F_n$ are asymptotically equivalent under the assumption that $f$ lies in a compact subset of $C([0, 1]^d)$.
4. Golubev, Nussbaum and Zhou (2009): $I_n$ and $H_n$ (replacing $f$ by $\log f$) are asymptotically equivalent under the assumption $f \in$ Hölder$(\alpha, M)$ with $\alpha > 1/2$ and $f$ bounded away from 0.
5. Grama and Nussbaum (1998, 2002): more general models.

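Lecture 2: Gaussianization

Density estimation and Gaussian regression

Variance stabilizing transformations

See Hoyle (1973) for a review of the extensive literature. See also Efron (1982) and Bar-Lev and Enis (1990). For Poisson distributions, Bartlett (1936) was the first to propose the root transform $\sqrt{X}$ in a homoscedastic linear model, where $X \sim \mathrm{Poisson}(\lambda)$. Anscombe (1948) proposed improving the variance stabilizing properties by instead using $\sqrt{X + 3/8}$. The constant $3/8$ is chosen to optimally stabilize the variance using the Taylor expansion. Anscombe's variance stabilizing transformation has also been briefly discussed in Donoho (1993) for density estimation.

In the context of nonparametric density estimation, mean matching is more important than variance stabilization. A mean-matching root transform is needed for minimizing the bias as well as stabilizing the variance. The goal of mean matching is to choose a constant $c$ so that the mean of $\sqrt{X + c}$ is closest to $\sqrt{\lambda}$. The following lemma gives the expansions of the mean and variance of root transforms of the form $\sqrt{X + c}$, where $c$ is a constant. It can be seen easily that $c = 1/4$ is the optimal choice for minimizing the bias $E(\sqrt{X + c}) - \sqrt{\lambda}$ in the first order.

Lemma 1. Let $X \sim \mathrm{Poisson}(\lambda)$ with $\lambda > 0$ and let $c \ge 0$ be a constant. Then
$$E\left( \sqrt{X + c} \right) = \lambda^{1/2} + \frac{4c - 1}{8} \lambda^{-1/2} - \frac{16c^2 - 24c + 7}{128} \lambda^{-3/2} + O\left( \lambda^{-5/2} \right),$$
$$\mathrm{Var}\left( \sqrt{X + c} \right) = \frac{1}{4} + \frac{3 - 8c}{32} \lambda^{-1} + \frac{32c^2 - 52c + 17}{128} \lambda^{-2} + O\left( \lambda^{-3} \right).$$
In particular, for $c = 1/4$,
$$E\left( \sqrt{X + \tfrac{1}{4}} \right) = \lambda^{1/2} - \frac{1}{64} \lambda^{-3/2} + O\left( \lambda^{-5/2} \right),$$
$$\mathrm{Var}\left( \sqrt{X + \tfrac{1}{4}} \right) = \frac{1}{4} + \frac{1}{32} \lambda^{-1} + \frac{3}{64} \lambda^{-2} + O\left( \lambda^{-3} \right).$$

Density estimation through regression

Density estimation. $E_n$: $X_1, X_2, \ldots, X_n$ iid with density $f(x)$, $x \in [0, 1]$.
Poisson (very close to density estimation). $F_n$: $X_1, X_2, \ldots, X_N$ iid with density $f(x)$, $N \sim \mathrm{Poisson}(n)$, $x \in [0, 1]$.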
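The claim in Lemma 1 that $c = 1/4$ removes the first-order bias of the root transform can be checked numerically. The sketch below is an illustration rather than part of the notes: it sums the Poisson probability mass function directly, and the function name, the truncation rule for the sum and the choice $\lambda = 50$ are assumptions made for the example.

```python
import math


def poisson_root_moments(lam, c, tail=12.0):
    """Mean and variance of sqrt(X + c) for X ~ Poisson(lam), by summing the pmf.

    The sum is truncated far in the upper tail (an assumed, generous cutoff).
    """
    upper = int(lam + tail * math.sqrt(lam) + 20)
    mean = 0.0
    second = 0.0
    for k in range(upper + 1):
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        p = math.exp(log_pmf)
        mean += p * math.sqrt(k + c)
        second += p * (k + c)  # E[(sqrt(X + c))^2] = E[X + c]
    return mean, second - mean * mean


lam = 50.0
# Bias E sqrt(X + c) - sqrt(lam) for Bartlett (c = 0), mean matching (c = 1/4)
# and Anscombe (c = 3/8).
bias = {c: poisson_root_moments(lam, c)[0] - math.sqrt(lam) for c in (0.0, 0.25, 0.375)}
var_quarter = poisson_root_moments(lam, 0.25)[1]
```

In this check the bias for $c = 1/4$ is much smaller in absolute value than for $c = 0$ or $c = 3/8$, while the variance stays close to $1/4$, which is the trade-off the mean-matching transform accepts.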

The first step of the procedure is binning. Let $T$ be some positive integer (the choice of $T$ will be discussed later). Divide $\{X_i\}$ into $T$ equal length subintervals between 0 and 1. Let $Q_1, \ldots, Q_T$ be the number of observations in each of the subintervals. The $Q_i$'s jointly have a multinomial distribution. Note that if the sample size is Poissonized, that is, it is not fixed but a Poisson random variable with mean $n$ and independent of the $X_i$'s, then the counts $\{Q_i : i = 1, \ldots, T\}$ are independent Poisson random variables with
$$Q_i \sim \mathrm{Poisson}(m p_i), \qquad p_i = T \int_{(i-1)/T}^{i/T} f(x)\, dx, \qquad m = n/T.$$

We then apply the mean-matching root transform discussed above. Set
$$Y_i = \sqrt{Q_i + \tfrac{1}{4}}, \qquad i = 1, \ldots, T,$$
where $Q_i = \mathrm{Card}(\{k : X_k \in I_i\})$ counts the observations in the $i$-th subinterval $I_i$, and treat $Y = (Y_1, Y_2, \ldots, Y_T)$ as the new equi-spaced sample for a nonparametric regression problem.

Through binning and the root transform, the density estimation problem has now been transferred to an equi-spaced, nearly constant variance nonparametric regression problem. Any good nonparametric regression procedure, such as a kernel, spline or wavelet procedure, can be applied to yield an estimator $\widehat{\sqrt{f}}$ of $\sqrt{f}$. The final density estimator can be obtained by normalizing the square of $\widehat{\sqrt{f}}$. Algorithmically, the root-unroot density estimation procedure can be summarized as follows.

1. Binning: Divide $\{X_i\}$ into $T$ equal length intervals between 0 and 1. Let $Q_1, Q_2, \ldots, Q_T$ be the number of observations in each of the intervals.
2. Root transform: Let $Y_i = \sqrt{Q_i + 1/4}$, $i = 1, \ldots, T$, and treat $Y = (Y_1, Y_2, \ldots, Y_T)$ as the new equi-spaced sample for a nonparametric regression problem.
3. Nonparametric regression: Apply your favorite nonparametric regression procedure to the binned and root transformed data $Y$ to obtain an estimate $\widehat{\sqrt{f}}$ of $\sqrt{f}$.
4. Unroot: The density function $f$ is estimated by $\hat f = \left( \widehat{\sqrt{f}} \right)^2$.
5. Normalization: The estimator $\hat f$ given in Step 4 may not integrate to 1. Set
$$\tilde f(t) = \hat f(t) \Big/ \int_0^1 \hat f(t)\, dt$$
and use $\tilde f$ as the final estimator.

Theoretical property
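The five-step root-unroot procedure summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the literature: the regression step uses a plain Gaussian-kernel smoother over the bin midpoints, and the function name, the bandwidth, the number of bins and the Beta(2, 2) test density are all choices made for the example.

```python
import math
import random


def root_unroot_density(sample, num_bins, bandwidth):
    """Binning, root transform, kernel smoothing, unroot, normalization on [0, 1]."""
    # Step 1 (binning): counts Q_1, ..., Q_T over T equal-length intervals.
    counts = [0] * num_bins
    for x in sample:
        idx = min(int(x * num_bins), num_bins - 1)  # put x == 1.0 in the last bin
        counts[idx] += 1
    # Step 2 (root transform): Y_i = sqrt(Q_i + 1/4).
    roots = [math.sqrt(q + 0.25) for q in counts]
    mids = [(j + 0.5) / num_bins for j in range(num_bins)]

    # Step 3 (regression): assumed Gaussian-kernel smoother at the bin midpoints.
    def smooth(t):
        w = [math.exp(-0.5 * ((s - t) / bandwidth) ** 2) for s in mids]
        return sum(wi * yi for wi, yi in zip(w, roots)) / sum(w)

    fitted = [smooth(t) for t in mids]
    # Step 4 (unroot): square the fitted values.
    squared = [v * v for v in fitted]
    # Step 5 (normalization): make the piecewise-constant estimate integrate to 1.
    integral = sum(squared) / num_bins
    return mids, [v / integral for v in squared]


random.seed(0)
# Beta(2, 2) has density 6x(1 - x) on [0, 1], with value 1.5 at x = 0.5.
sample = [random.betavariate(2, 2) for _ in range(4000)]
mids, density = root_unroot_density(sample, num_bins=40, bandwidth=0.05)
```

The normalization step also absorbs the scale factor $m = n/T$ carried by the root-transformed counts, which is why the sketch never divides by $m$ explicitly.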

We shall use the quantile coupling inequality of Komlós, Major and Tusnády (1975) to approximate the binned and root transformed data by independent normal variables. The following lemma is a direct consequence of the results given in Komlós, Major and Tusnády (1975) and Zhou (2006).

Lemma 2. Let $\lambda > 0$ and let $X \sim \mathrm{Poisson}(\lambda)$. There exist a standard normal random variable $Z \sim N(0, 1)$ and constants $c_1, c_2, c_3 > 0$ not depending on $\lambda$ such that whenever the event $A = \{ |X - \lambda| \le c_1 \lambda \}$ occurs,
$$\left| X - \lambda - \sqrt{\lambda}\, Z \right| < c_2 Z^2 + c_3.$$

Proposition 3. Let $m = n/T$. We can write $Y_i$ as
$$Y_i = \sqrt{m p_i} + \epsilon_i + \tfrac{1}{2} Z_i + \xi_i, \qquad i = 1, 2, \ldots, T,$$
where the $Z_i$ are iid $N(0, 1)$, the $\epsilon_i$ are constants satisfying $|\epsilon_i| \le \frac{1}{64} (m p_i)^{-3/2} (1 + o(1))$ and consequently, for some constant $C > 0$,
$$\frac{1}{n} \sum_{i=1}^T \epsilon_i^2 \le C m^{-4},$$
and the $\xi_i$ are independent and stochastically small random variables satisfying
$$E |\xi_i|^l \le C_l (m p_i)^{-l/2} \quad \text{and} \quad P(|\xi_i| > a) \le C_l (a^2 m p_i)^{-l/2},$$
where $l > 0$, $a > 0$ and $C_l > 0$ is a constant depending on $l$ only.

Gaussian regression and exponential family regression

Mean-matching variance stabilizing transformation

Let $X_1, X_2, \ldots, X_m$ be a random sample from a distribution in a natural exponential family with probability density/mass function
$$q(x \mid \eta) = e^{\eta x - \psi(\eta)} h(x).$$
Here $\eta$ is called the natural parameter. The mean and variance are, respectively,
$$\mu(\eta) = \psi'(\eta) \quad \text{and} \quad \sigma^2(\eta) = \psi''(\eta).$$
A special subclass of interest is the one with a quadratic variance function (QVF),
$$\sigma^2 \equiv V(\mu) = a_0 + a_1 \mu + a_2 \mu^2.$$
In this case we shall write $X_i \sim NQ(\mu)$. The NEF-QVF families consist of six distributions, three continuous (normal, gamma and NEF-GHS) and three discrete (binomial, negative binomial and Poisson). See, e.g., Morris (1982) and Brown (1986).

Set $X = \sum_{i=1}^m X_i$. According to the Central Limit Theorem,
$$\sqrt{m} \left( X/m - \mu(\eta) \right) \xrightarrow{L} N\left( 0, V(\mu(\eta)) \right), \quad \text{as } m \to \infty.$$
A variance stabilizing transformation (VST) is a function $G : \mathbb{R} \to \mathbb{R}$ such that
$$G'(\mu) = V^{-1/2}(\mu).$$
The standard delta method then yields
$$\sqrt{m} \left\{ G(X/m) - G(\mu(\eta)) \right\} \xrightarrow{L} N(0, 1).$$
It is known that the variance stabilizing properties can often be further improved by using a transformation of the form
$$H_m(X) = G\left( \frac{X + a}{m + b} \right)$$
with a suitable choice of constants $a$ and $b$. See, e.g., Anscombe (1948). In this paper we shall use the VST as a tool for nonparametric regression in exponential families. For this purpose, it is more important to optimally match the means than to optimally stabilize the variance. That is, we wish to choose the constants $a$ and $b$ such that $E\{H_m(X)\}$ optimally matches $G(\mu(\eta))$.

To derive the optimal choice of $a$ and $b$, we need the following expansions for the mean and variance of the transformed variable $H_m(X)$.

Lemma 4. Let $\Theta_0$ be a compact subset of the natural parameter space $\Theta$. Assume that $\eta \in \Theta_0$ and that the variance $\sigma^2(\eta)$ is positive on $\Theta_0$. Then for constants $a$ and $b$,
$$E\{H_m(X)\} - G(\mu(\eta)) = \frac{1}{\sigma(\eta)} \left( a - b\mu(\eta) - \frac{\mu''(\eta)}{4\mu'(\eta)} \right) m^{-1} + O(m^{-2})$$
and
$$\mathrm{Var}\{H_m(X)\} = \frac{1}{m} + O(m^{-2}).$$
Moreover, there exist constants $a$ and $b$ such that
$$E\left\{ G\left( \frac{X + a}{m + b} \right) \right\} - G(\mu(\eta)) = O(m^{-2})$$
if and only if the exponential family has a quadratic variance function.

Note that
$$a - b\mu(\eta) - \frac{\mu''(\eta)}{4\mu'(\eta)} = 0$$
if and only if
$$\sigma^2(\eta) = \mu'(\eta) = a_0 + 4a\mu(\eta) - 2b\mu^2(\eta)$$
for some constant $a_0$. Hence the solution of the differential equation is exactly the subclass of natural exponential families with a quadratic variance function (QVF).

The following are the specific expressions of the mean-matching VST $H_m$ for the five distributions (other than normal) in the NEF-QVF families.

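Poisson: $a = 1/4$, $b = 0$, and $H_m(X) = 2\sqrt{(X + \tfrac{1}{4})/m}$.

Binomial$(r, p)$: $a = 1/4$, $b = \frac{1}{2r}$, and $H_m(X) = 2\sqrt{r} \arcsin\left( \sqrt{\dfrac{X + 1/4}{rm + 1/2}} \right)$.

Negative Binomial$(r, p)$: $a = 1/4$, $b = -\frac{1}{2r}$, and
$$H_m(X) = 2\sqrt{r} \ln\left( \sqrt{\frac{X + 1/4}{mr - 1/2}} + \sqrt{1 + \frac{X + 1/4}{mr - 1/2}} \right).$$

Gamma$(r, \lambda)$ (with $r$ known): $a = 0$, $b = -\frac{1}{2r}$, and $H_m(X) = \sqrt{r} \ln\left( \dfrac{X}{rm - 1/2} \right)$.

NEF-GHS$(r, \lambda)$ (with $r$ known): $a = 0$, $b = -\frac{1}{2r}$, and
$$H_m(X) = \sqrt{r} \ln\left( \frac{X}{rm - 1/2} + \sqrt{1 + \frac{X^2}{(mr - 1/2)^2}} \right).$$

Exponential family regression through Gaussian regression

Suppose we observe
$$Y_i \overset{ind}{\sim} NQ(f(t_i)), \qquad i = 1, 2, \ldots, n, \qquad t_i = \frac{i}{n},$$
and wish to estimate the mean function $f(t)$. Similar to density estimation we have:

1. Binning: Divide $\{Y_i\}$ into $T$ equal length intervals between 0 and 1. Let $Q_1, Q_2, \ldots, Q_T$ be the sum of the observations in each of the intervals.
2. VST: Let $Y_j^* = \sqrt{m}\, H_m(Q_j)$, $j = 1, \ldots, T$, and treat $Y^* = (Y_1^*, Y_2^*, \ldots, Y_T^*)$ as the new equi-spaced sample for a nonparametric Gaussian regression problem.
3. Gaussian regression: Apply your favorite nonparametric regression procedure to the binned and transformed data $Y^*$ to obtain an estimate $\widehat{G(f)}$ of $G(f)$.
4. Inverse VST: Estimate the mean function $f$ by $\hat f = G^{-1}\left( \widehat{G(f)} \right)$. We define $G^{-1}(a) = 0$ when $a < 0$ in the case of the Negative Binomial and NEF-GHS distributions.

Gaussian regression and robust estimation

Consider the nonparametric regression model
$$Y_i = f\left( \frac{i}{n} \right) + \xi_i, \qquad i = 1, \ldots, n,$$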

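where the errors $\xi_i$ are independent and identically distributed with some density $h$.

Quantile coupling for median

Let $X_1, \ldots, X_n$ be iid random variables with density function $h$. Denote the sample median by $X_{med}$.

Assumption (A): $\int_{-\infty}^0 h(x)\, dx = \frac{1}{2}$, $h(0) > 0$, and $h(x)$ is Lipschitz at $x = 0$.

Here the Lipschitz condition at 0 means that there is a constant $C > 0$ such that $|h(x) - h(0)| \le C|x|$ in an open neighborhood of 0.

Theorem 5. Let $Z$ be a standard normal random variable and let $X_1, \ldots, X_n$ be iid with density function $h$, where $n = 2k + 1$ for some integer $k \ge 1$. Let Assumption (A) hold. Then for every $n$ there is a mapping $\tilde X_{med}(Z) : \mathbb{R} \to \mathbb{R}$ such that $\mathcal{L}(\tilde X_{med}(Z)) = \mathcal{L}(X_{med})$ and
$$\left| \sqrt{4n}\, h(0)\, \tilde X_{med} - Z \right| \le \frac{C}{\sqrt{n}} + \frac{C}{\sqrt{n}} \left| \sqrt{4n}\, h(0)\, \tilde X_{med} \right|^2, \quad \text{when } |\tilde X_{med}| \le \varepsilon,$$
where $C, \varepsilon > 0$ depend on $h$ but not on $n$.

Corollary 6. Under the assumption of Theorem 5, the mapping $\tilde X_{med}(Z)$ in Theorem 5 satisfies
$$\left| \sqrt{4n}\, h(0)\, \tilde X_{med} - Z \right| \le \frac{C}{\sqrt{n}} \left( 1 + |Z|^2 \right), \quad \text{when } |Z| \le \varepsilon \sqrt{n},$$
where $C, \varepsilon > 0$ do not depend on $n$.

Remark 7. When $n = 2k$ is even, the sample median $X_{med}$ is usually taken to be $(X_{(k)} + X_{(k+1)})/2$. Similar quantile coupling inequalities can be obtained. For each $i$, let $X_{i,med}$ be the median of the original sample with $X_i$ removed. Then $X_{med} = \frac{1}{n} \sum_{i=1}^n X_{i,med}$. Let $G$ be the distribution of the median of $n - 1$ iid observations with density $h$ and define $(Z_i)_{i \le n} \sim \mathcal{L}\left( \Phi^{-1} G(X_{i,med}),\ i \le n \right)$. Let $\tilde X_{i,med} = G^{-1} \Phi(Z_i)$. Then $\mathcal{L}(\tilde X_{i,med},\ i \le n) = \mathcal{L}(X_{i,med},\ i \le n)$. Now a direct application of Theorem 5 gives
$$\left| \sqrt{4n}\, h(0)\, \tilde X_{med} - Z \right| \le \frac{C}{\sqrt{n}} \left( 1 + 4n\, h^2(0) \left( |\tilde X_{(k)}|^2 + |\tilde X_{(k+1)}|^2 \right) \right), \quad \text{when } |\tilde X_{(k)}| + |\tilde X_{(k+1)}| \le \varepsilon,$$
and $Z = \frac{1}{n} \sum_{i=1}^n Z_i$.

The coupling result given in Theorem 5 in fact holds uniformly over a rich collection of distributions. For $0 < \epsilon_1 < 1$ and $\epsilon_2 > 0$ define
$$\mathcal{H}_{\epsilon_1, \epsilon_2} = \left\{ h : \int_{-\infty}^0 h(x)\, dx = \frac{1}{2},\ \epsilon_1 \le h(0) \le \frac{1}{\epsilon_1},\ |h(x) - h(0)| \le \frac{|x|}{\epsilon_1} \text{ for all } |x| < \epsilon_2 \right\}.$$
It can be shown that Theorem 5 holds uniformly for the whole family $\mathcal{H}_{\epsilon_1, \epsilon_2}$.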

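Theorem 8. Let $X_1, \ldots, X_n$ be iid with density $h \in \mathcal{H}_{\epsilon_1, \epsilon_2}$. For every $n = 2k + 1$ with integer $k \ge 1$, there is a mapping $\tilde X_{med}(Z) : \mathbb{R} \to \mathbb{R}$ such that $\mathcal{L}(\tilde X_{med}(Z)) = \mathcal{L}(X_{med})$ and, for two constants $C_{\epsilon_1, \epsilon_2}, \varepsilon_{\epsilon_1, \epsilon_2} > 0$ depending only on $\epsilon_1$ and $\epsilon_2$,
$$\left| \sqrt{4n}\, h(0)\, \tilde X_{med} - Z \right| \le \frac{C_{\epsilon_1, \epsilon_2}}{\sqrt{n}} + \frac{C_{\epsilon_1, \epsilon_2}}{\sqrt{n}} \left| \sqrt{4n}\, h(0)\, \tilde X_{med} \right|^2, \quad \text{when } |\tilde X_{med}| \le \varepsilon_{\epsilon_1, \epsilon_2},$$
uniformly over all $h \in \mathcal{H}_{\epsilon_1, \epsilon_2}$.

Remark 9. The quantile coupling inequalities in Corollary 6 and Remark 7 also hold uniformly over $\mathcal{H}_{\epsilon_1, \epsilon_2}$ by replacing $C$ and $\varepsilon$ there with two constants depending on $\epsilon_1$ and $\epsilon_2$.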
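The median coupling results suggest that $\sqrt{4n}\, h(0)\, X_{med}$ behaves like a standard normal variable even when the errors have no finite mean. A small simulation, offered only as an illustrative sketch, makes this visible for standard Cauchy errors, which satisfy Assumption (A) with $h(0) = 1/\pi$. The function name, the sample size $n = 101$, the number of replications and the seed are assumptions chosen for the example.

```python
import math
import random
import statistics


def standardized_medians(n, reps, rng):
    """Draw reps values of sqrt(4n) * h(0) * X_med for standard Cauchy samples of size n."""
    h0 = 1.0 / math.pi  # standard Cauchy density at its median 0
    scale = math.sqrt(4.0 * n) * h0
    values = []
    for _ in range(reps):
        # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy.
        sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        values.append(scale * statistics.median(sample))
    return values


rng = random.Random(1)
values = standardized_medians(n=101, reps=2000, rng=rng)
mean_z = statistics.mean(values)
var_z = statistics.pvariance(values)
```

The simulated mean and variance should sit near 0 and 1, in line with the normal approximation for the sample median. The simulation only looks at the marginal distribution; it does not construct the coupling mapping $\tilde X_{med}(Z)$ itself.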
