Integral approximation by kernel smoothing

Size: px

Start display at page:

Download "Integral approximation by kernel smoothing"

Melanie Douglas
5 years ago
Views:

1 Integral approximation by kernel smoothing François Portier Université catholique de Louvain - ISBA August, In collaboration with Bernard Delyon

2 Topic of the talk: Given ϕ : R d R, estimation of I(ϕ) = ϕ(x)dx Monte-Carlo (X 1,..., X n) i.i.d. with law f n 1/2 ( n 1 n i=1 ) ϕ(x i) f (X I(ϕ) = O P (1) i) Importance sampling Optimal sampler f, (X 1,..., X n) i.i.d. with law f ( ) n n 1/2 n 1 ϕ(x i) f (X I(ϕ) = o P (1) i) i=1 Adaptive to ϕ Evans and Schwartz (2000, book), Zhang (1996, JASA) 2 / 27

3 Main result (X 1,..., X n) i.i.d. with law f, f is a kernel estimator of f n 1/2 ( n 1 n i=1 ) ϕ(x i) f I(ϕ) = o P (1) (Xi) ϕ is only known at the points X i Adaptive to the design Purpose Rates of convergence, asymptotic behaviour Regularity of f and ϕ with respect to the dimension, the bandwidth In practice: kernel, bandwidth... Application to regression modelling 3 / 27

4 Rates of convergence Asymptotic behaviour Simulations Conclusion (Application to regression modelling) 4 / 27

5 Definition of the estimators K a d-dimensional kernel n f (i) (x) = (nh d ) 1 K(h 1 (x X j)) j i v (i) (x) = ((n 1)(n 2)) 1 n (h d K(h 1 (x X j)) f (i) (x)) 2 j i 2 estimators of I(ϕ) Î(ϕ) = n 1 Î c(ϕ) = n 1 n i=1 n i=1 ϕ(x i) f (i) (X i) ϕ(x i) f (i) (X i) ( 1 v ) (i) (X i) f (i) (X i) 2 5 / 27

6 Assumptions Nikol ski class H s, s = k + α, k N, 0 < α 1 (ϕ (l) (x + u) ϕ (l) (x)) 2 dx C u 2α l = (l 1,... l d ), l i k ( ψ is α-hölder inside Q s = min(1/2, α)) Tsybakov (2009, book) (A1) ϕ H s on R d and has compact support Q (A2) The r-th order derivatives of f are bounded (A3) For every x Q, f (x) b > 0 (A4) K symmetric with order r and K(x) C 1 exp( C 2 x ) 6 / 27

7 Theorem Assume (A1-A4), we have n 1/2 (Î(ϕ) I(ϕ) ) = O P ( h s + n 1/2 h r + n 1/2 h d) (1) if the O P n + 0 Remarks Curse of dimensionality: r > d For r, s large, h opt n 1 r+d, f is undersmooth because h opt < n 1 2r+d Regularity of ϕ is not crucial the rate = n r d 2(r+d) Stone (1980, AoS) Trimming method? Härdle and Stocker (1989, JASA) 7 / 27

8 Theorem Assume (A1-A4), we have ) ( n (Îc(ϕ) 1/2 I(ϕ) = O P h s + n 1/2 h r + n 1/2 h d/2 + n 1 h 3d/2) if the O P n + 0 Remarks Curse of dimensionality : r > 3d/4 instead of O P ( h s + n 1/2 h r + n 1/2 h d) For r, s large, h opt n 1 r+d/2, the optimal rate = n r d/2 2(r+d/2) Leave-one out better than the classical 8 / 27

9 Rates of convergence Asymptotic behaviour Simulations Conclusion (Application to regression modelling) 9 / 27

10 Î(ϕ) I(ϕ) = Bn + M n + neglectable Î c(ϕ) I(ϕ) = B n + M n + U n + neglectable with B n and Bn non-random, M n martingale, U n U-stat If ϕ is very smooth: M n = o P (U n) If ϕ is not regular: U n = o P (M n) Hall (1984, JMVA), Hall and Heyde (1980, book) 10 / 27

11 Regular case Theorem Under (A1) to (A4), if nh 2d +, nh r+d/2 0 and nh 2s+d 0, nh d/2 (Î c(ϕ) I(ϕ)) is asymptotically normally distributed with zero-mean and variance given by ( 2 (K(u + v) K(v))K(u)du) dv ϕ(x) 2 f (x) 2 dx 11 / 27

12 A non smooth example (B1) For some s > 1/2 the function ϕ belongs to H s on Q and is bounded, with compact support Q. (B2) The set Q is compact with C 2 boundary. L Q (x) = min( z, u(x), z, u(x) ) +K(z)K(z )dzdz u(x) the normal outer vector of Q at the point x Theorem Under the assumptions (A2) to (A4), (B1) and (B2), if nh (3d+1)/2 0 and nh 2r 1 0 (nh 1 ) 1/2 (Î c(ϕ) I(ϕ)) is asymptotically normally distributed with zero-mean and variance given by L Q (x)ϕ(x) 2 dh p 1 (x), Q where H p 1 stands for the p 1 dimensional Hausdorff measure. 12 / 27

13 Rates of convergence Asymptotic behaviour Simulations Conclusion (Application to regression modelling) 13 / 27

14 In practice Sample number = 20, h=n^1/3, Epanechnikov MC Boundary pb No boundary pb 14 / 27

15 In practice Sample number = 50, h=n^1/3, Epanechnikov MC Boundary pb No boundary pb 15 / 27

16 In practice Sample number = 100, h=n^1/3, Epanechnikov MC Boundary pb No boundary pb 16 / 27

17 In practice Sample number = 200, h=n^1/3, Epanechnikov MC Boundary pb No boundary pb 17 / 27

18 In practice Sample number = 500, h=n^1/3, Epanechnikov MC Boundary pb No boundary pb 18 / 27

19 Bandwidth choice Plug-in, e.g. Härdle, Marron and Tsybakov (1992, JASA) Simulation-validation ϕ(x) = n 1 n i=1 ( ) ϕ(x i) x Xi f h d 0 K, (Xi) h 0 ϕ looks like ϕ (convolution estimator) I( ϕ) is known ĥ = argmin h Î c( ϕ) I( ϕ) 19 / 27

20 Kernel K(x) (d + 2 (d + 3) x )1 x <1 Design Model 1 X i N ( 1 2, 1 4 Id) Model 2 X i U([0, 1] d ) d ϕ(x) = 2 sin(x k ) xk 1. k=1 20 / 27

21 Model 1 Gaussian design in dimension 1 Gaussian design in dimension 4 Error ^C Ι ^ Ι ^MC Ι Error ^C Ι ^ Ι ^MC Ι Sample size Sample size Figure : 100 estimates Î c(ϕ), Î(ϕ) and Monte-Carlo method noted Î MC 21 / 27

22 Model 2 Uniform design in dimension 1 Uniform design in dimension 4 Error ^C Ι ^ Ι ^MC Ι Error ^C Ι ^ Ι ^MC Ι Sample size Sample size Figure : 100 estimates Î c(ϕ), Î(ϕ) and Monte-Carlo method noted Î MC 22 / 27

23 Rates of convergence Asymptotic behaviour Simulations Conclusion (Application to regression modelling) 23 / 27

24 Regression model Y i = g(x i) + σ(x i)e i (X i) random i.i.d. with density f (X i) (e i) The functions g and σ are unknown Let Q R d bounded and L 2(Q) = {ψ : Q ψ(x)2 dx < + } Purpose Estimate c =< g, ψ >= g(x)ψ(x)dx Q (nonrandom design case treated by Donoho) 24 / 27

25 Plug-in estimates Plug-in of g is difficult Let ĝ such that a n(ĝ(x) g(x)) d Gaussian variable (e.g. NW, NN...) a n = o( n), but not tight, then n(< ĝ, ψ > < g, ψ >) = n < ĝ g, ψ > d Gaussian variable is difficult to handle. Plug-in of f may be better c =< g, ψ >= E [ ] Y ψ(x) f (X) ĉ = n 1 n i=1 Y iψ(x i) f (Xi) 25 / 27

26 Assumptions (A2) The r-th order derivatives of f are bounded (A3) For every x Q, f (x) b > 0 (A4) K symmetric with order r and K(x) C 1 exp( C 2 x ) 26 / 27

27 Assumptions (A2) The r-th order derivatives of f are bounded (A3) For every x Q, f (x) b > 0 (A4) K symmetric with order r and K(x) C 1 exp( C 2 x ) (A5) ψ is Hölder on its support Q R d nonempty bounded and convex (A6) g is Hölder on Q and σ is bounded (A7) n 1/2 h r n + 0 and n 1/2 h d n / 27

28 Theorem Assume (A1-A7) we have n 1/2 (ĉ c) d N (0, v) where v is the variance of the random variable Y 1 g(x 1 ) f (X 1 ) ψ(x 1) Remarks Rates in root n The variance is smaller than when f = f is known Trimming method? (Härdle and Stoker (1989, JASA)) 27 / 27

Acceleration of some empirical means. Application to semiparametric regression

Acceleration of some empirical means. Application to semiparametric regression François Portier Université catholique de Louvain - ISBA November, 8 2013 In collaboration with Bernard Delyon Regression