Acceleration of some empirical means. Application to semiparametric regression

François Portier, Université catholique de Louvain - ISBA. November 8, 2013. In collaboration with Bernard Delyon.
Regression model

$$Y_i = g(X_i) + \sigma(X_i)\,e_i$$

- $(X_i)$ random i.i.d. with density $f$
- $(e_i)$ the error terms
- the functions $g$ and $\sigma$ are unknown
- let $Q \subset \mathbb{R}^d$ be bounded and $L^2(Q) = \{\psi : \int_Q \psi(x)^2\,dx < +\infty\}$

Purpose: estimate $c = \langle g, \psi\rangle = \int_Q g(x)\psi(x)\,dx$ (the nonrandom design case is treated by Donoho).
Plug-in estimates

Plug-in of $g$ is difficult. Let $\hat g$ be such that $a_n(\hat g(x) - g(x)) \to_d$ Gaussian variable (e.g. NW, NN, ...), with $a_n = o(\sqrt n)$ but not tight; then
$$\sqrt n\,(\langle \hat g, \psi\rangle - \langle g, \psi\rangle) = \sqrt n\,\langle \hat g - g, \psi\rangle \to_d \text{Gaussian variable}$$
is difficult to show.

Plug-in of $f$ may be better:
$$c = \langle g, \psi\rangle = E\left[\frac{Y\,\psi(X)}{f(X)}\right], \qquad \hat c = n^{-1}\sum_{i=1}^n \frac{Y_i\,\psi(X_i)}{\hat f(X_i)}$$
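A minimal numerical sketch of the plug-in-of-$f$ estimator $\hat c$ (one-dimensional toy setup; the regression function, noise level, test function $\psi$ and bandwidth are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(u):
    # standard Gaussian kernel
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def loo_kde(X, h):
    # leave-one-out estimate hat f^(i)(X_i) = (n h)^{-1} sum_{j != i} K((X_j - X_i)/h)
    n = len(X)
    Kmat = gauss_kernel((X[:, None] - X[None, :]) / h)
    np.fill_diagonal(Kmat, 0.0)          # drop the j = i term
    return Kmat.sum(axis=1) / (n * h)

# toy model: Y = g(X) + sigma * e, X ~ N(0,1)
g = np.cos
psi = lambda x: np.where(np.abs(x) <= 1.0, (1.0 - x**2) ** 2, 0.0)  # support Q = [-1, 1]

n = 2000
X = rng.standard_normal(n)
Y = g(X) + 0.2 * rng.standard_normal(n)

# target c = <g, psi> = int_Q g(x) psi(x) dx (reference value by quadrature)
xs = np.linspace(-1.0, 1.0, 4001)
c = np.sum(g(xs) * psi(xs)) * (xs[1] - xs[0])

# plug-in of f: hat c = n^{-1} sum_i Y_i psi(X_i) / hat f^(i)(X_i)
h = n ** (-1.0 / 3.0)
c_hat = np.mean(Y * psi(X) / loo_kde(X, h))
```

The leave-one-out form of $\hat f$ anticipates the estimator used throughout the talk; with $n = 2000$ the sketch recovers $c$ to within a few percent.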
Fields of application

Semiparametric estimation
- Dimension reduction: $g(x) = g_0(\beta^T x)$; ADE: $\langle \nabla g, \psi\rangle = -\langle g, \nabla\psi\rangle \propto \beta$ (Vial, Härdle)
- Estimation of a location parameter in a regression (Vimond, Bercu)

Curve estimation
- Orthogonal series or wavelets, kernel smoothing (book of Härdle)
- $\sum_{k=1}^K \langle g, \psi_k\rangle\,\psi_k(y) \to_{K\to+\infty} g(y)$ (approximation theory)
- $\langle g, K_h(\cdot - y)\rangle \to_{h\to 0} g(y)$ (regularization theory)
Outline
1. Integral approximation by kernel smoothing
2. Asymptotic normality of $\hat c$
3. Application to dimension reduction
4. Proof, generalization, concluding remarks
Context

Approximate the quantity $\int_Q \varphi(x)\,dx$:
- $\varphi$ is known at the points $X_i$
- $(X_i)$ is random (if the $X_i$ are regular: quasi Monte Carlo)

Classical Monte Carlo procedure:
$$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{f(X_i)} - \int_Q \varphi(x)\,dx\right) \to_d \text{Gaussian variable}$$

Kernel smoothing:
$$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{\hat f^{(i)}(X_i)} - \int_Q \varphi(x)\,dx\right) \to_P 0$$
where $\hat f^{(i)}(x) = (nh^d)^{-1}\sum_{j\neq i} K(h^{-1}(X_j - x))$.
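The two procedures above can be compared in a minimal sketch (one-dimensional toy setup with a Gaussian kernel; the integrand, sample size and bandwidth are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(u):
    # standard Gaussian kernel
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def loo_kde(X, h):
    # leave-one-out kernel density estimates hat f^(i)(X_i), d = 1:
    # hat f^(i)(x) = (n h)^{-1} sum_{j != i} K((X_j - x) / h)
    n = len(X)
    Kmat = gauss_kernel((X[:, None] - X[None, :]) / h)
    np.fill_diagonal(Kmat, 0.0)          # drop the j = i term
    return Kmat.sum(axis=1) / (n * h)

# phi has compact support Q = [-1, 1], and f (standard normal) is bounded
# away from zero on Q, in the spirit of assumptions (A1) and (A3)
phi = lambda x: np.where(np.abs(x) <= 1.0, (1.0 - x**2) ** 2, 0.0)
true_integral = 16.0 / 15.0              # int_{-1}^{1} (1 - x^2)^2 dx

n = 2000
X = rng.standard_normal(n)
h = n ** (-1.0 / 3.0)

mc = np.mean(phi(X) / gauss_kernel(X))   # classical Monte Carlo (f known)
ks = np.mean(phi(X) / loo_kde(X, h))     # kernel smoothing (f estimated)
```

Both estimates converge to the integral; the point of the theorem that follows is that the kernel-smoothed error is $o_P(n^{-1/2})$, i.e. asymptotically faster than Monte Carlo.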
Assumptions

Nikol'skii class $H_s$, $s = k + \alpha$, $k \in \mathbb{N}$, $0 < \alpha \le 1$:
$$\int \left(\varphi^{(l)}(x+u) - \varphi^{(l)}(x)\right)^2 dx \le C\,\|u\|^{2\alpha}, \qquad l = (l_1, \ldots, l_d),\ |l| = k$$
(if $\psi$ is $\alpha$-Hölder inside $Q$, then $s = \min(1/2, \alpha)$)

- (A1) $\varphi \in H_s$ on $\mathbb{R}^d$ and has compact support $Q$
- (A2) the $r$-th order derivatives of $f$ are bounded
- (A3) for every $x \in Q$, $f(x) \ge b > 0$
- (A4) $K$ symmetric with order $r$ and $K(x) \le C_1 \exp(-C_2\|x\|)$
Theorem
Assume (A1)-(A4); we have
$$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{\hat f^{(i)}(X_i)} - \int_Q \varphi(x)\,dx\right) = O_P\!\left(h^s + n^{1/2}h^r + n^{-1/2}h^{-d}\right) \quad (1)$$
which goes to $0$ in probability if the $O_P$ term does as $n \to +\infty$.

Remarks
- Curse of dimensionality: $r > d$
- For $r, s$ large, $h_{opt} \approx n^{-1/(r+d)}$; $f$ is undersmoothed because $h_{opt} < n^{-1/(2r+d)}$
- Regularity of $\varphi$ is not crucial
- Trimming method?
- The rate $n^{-\frac{r-d}{2(r+d)}}$ (Stone)
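A short derivation of the remarks above, under the reading that for large $r, s$ the $h^s$ term in (1) is negligible and the two $n$-dependent terms are balanced:

```latex
n^{1/2}h^{r} \;=\; n^{-1/2}h^{-d}
\;\Longleftrightarrow\; n\,h^{r+d} = 1
\;\Longleftrightarrow\; h_{\mathrm{opt}} = n^{-\frac{1}{r+d}},
\qquad
n^{1/2}h_{\mathrm{opt}}^{\,r} \;=\; n^{\frac{1}{2}-\frac{r}{r+d}} \;=\; n^{-\frac{r-d}{2(r+d)}} .
```

The exponent is negative, i.e. the bound vanishes, exactly when $r > d$, which is the curse-of-dimensionality condition stated above.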
Bandwidth choice
- Plug-in (Härdle, Hart, Marron and Tsybakov)
- Cross-validation
Theorem
Let
$$\hat v^{(i)}(x) = ((n-1)(n-2))^{-1}\sum_{j\neq i}\left(h^{-d}K(h^{-1}(x - X_j)) - \hat f^{(i)}(x)\right)^2 .$$
Assume (A1)-(A4); we have
$$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{\hat f^{(i)}(X_i)}\left(1 - \frac{\hat v^{(i)}(X_i)}{\hat f^{(i)}(X_i)^2}\right) - \int_Q \varphi(x)\,dx\right) = O_P\!\left(h^s + n^{1/2}h^r + n^{-1/2}h^{-d/2} + n^{-1}h^{-3d/2}\right)$$
which goes to $0$ in probability if the $O_P$ term does as $n \to +\infty$.

Remarks
- Curse of dimensionality: $r > 3d/4$
- $O_P(h^s + n^{1/2}h^r + n^{-1/2}h^{-d/2} + n^{-1}h^{-3d/2})$ instead of $O_P(h^s + n^{1/2}h^r + n^{-1/2}h^{-d})$
- For $r, s$ large, $h_{opt} \approx n^{-1/(r+d/2)}$ and the optimal rate is $n^{-\frac{r - d/2}{2(r + d/2)}}$
- Leave-one-out is better than the classical estimator
In practice

Figure: boxplots of the integral-approximation error for classical MC, kernel smoothing with the boundary problem, and kernel smoothing without the boundary problem, for sample sizes 20, 50, 100, 200 and 500 ($h = n^{-1/3}$, Epanechnikov kernel).
Asymptotic normality of $\hat c$
Assumptions
- (A2) the $r$-th order derivatives of $f$ are bounded
- (A3) for every $x \in Q$, $f(x) \ge b > 0$
- (A4) $K$ symmetric with order $r$ and $K(x) \le C_1\exp(-C_2\|x\|)$
- (A5) $\psi$ is Hölder on its support $Q \subset \mathbb{R}^d$, nonempty, bounded and convex
- (A6) $g$ is Hölder on $Q$ and $\sigma$ is bounded
- (A7) $n^{1/2}h^r \to_{n\to+\infty} 0$ and $n^{1/2}h^d \to_{n\to+\infty} +\infty$
Theorem
Assume (A2)-(A7); we have $n^{1/2}(\hat c - c) \to_d N(0, v)$, where $v$ is the variance of the random variable
$$\frac{Y_1 - g(X_1)}{f(X_1)}\,\psi(X_1) .$$

Remarks
- Rates in root $n$
- The variance is smaller than when $f$ is known
- Trimming method? (Härdle and Stoker (1989))
Application to dimension reduction
Single index model

$g(x) = g_0(\beta^T x)$, $\mathrm{vect}(\beta) = E$, $x \in \mathbb{R}^d$

- Estimation of $g$ and then $E[\nabla g(X)]$ (Hristache, Juditsky and Spokoiny (2001), $\sqrt n$-consistent when $\dim(E) \le 4$)
- Estimation of $f$: $E[\nabla g(X)] = -E\left[Y\,\frac{\nabla f(X)}{f(X)}\right]$ (Härdle and Stoker (1989), $\sqrt n$-consistent when $\dim(E) = 1$)

Idea:
$$\langle \nabla_x\psi(\cdot, t), g\rangle = -\langle \psi(\cdot, t), \nabla g\rangle = \beta_t \in E$$
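The identity behind this idea is one integration by parts (assuming $\psi(\cdot,t)$ vanishes on the boundary of $Q$, so no boundary term appears), combined with $\nabla g(x) = g_0'(\beta^T x)\,\beta$:

```latex
\beta_t
\;=\; \langle \nabla_x \psi(\cdot,t),\, g\rangle
\;=\; \int_Q \nabla_x \psi(x,t)\, g(x)\,dx
\;=\; -\int_Q \psi(x,t)\, \nabla g(x)\,dx
\;=\; -\Big(\int_Q \psi(x,t)\, g_0'(\beta^T x)\,dx\Big)\beta \;\in\; E .
```

Thus every $\beta_t$ lies in the index space $E$, and varying $t$ sweeps out directions whose span recovers $E$.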
Results

$$\beta_t = \int \nabla_x\psi(x, t)\, g(x)\,dx, \qquad \hat\beta_t = n^{-1}\sum_{i=1}^n Y_i\,\frac{\nabla_x\psi(X_i, t)}{\hat f(X_i)}$$

Theorem
Under (A3)-(A7) we have $\sqrt n\,(\hat\beta_t - \beta_t) \to_d$ Gaussian, for each $t$.
Corollaries

Corollary
Under (A3)-(A7), and some regularity conditions on $\nabla_x\psi$, we have $\sqrt n\,(\hat\beta_t - \beta_t) \to_d$ Gaussian process.

Corollary
Under (A3)-(A7), and some regularity conditions on $\nabla_x\psi$, we have
$$\sqrt n\left(\int \hat\beta_t\hat\beta_t^T\,dt - \int \beta_t\beta_t^T\,dt\right) \to_d \text{Gaussian variable}$$
Implementation

$$\beta_t = \int \nabla_x\psi(x, t)\, g(x)\,dx, \qquad \hat\beta_t = n^{-1}\sum_{i=1}^n Y_i\,\frac{\nabla_x\psi(X_i, t)}{\hat f(X_i)}$$

1. Compute $\int \hat\beta_t\hat\beta_t^T\,dt$
2. $\hat\beta$ = eigenvectors associated to the $d$ largest eigenvalues

- Radial kernel with order $r$
- Bandwidth $h = 2n^{-1/(r+p)}$
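The two steps above can be sketched as follows (toy single-index data in dimension 2; the test functions $\psi(x,t)$, the grid of $t$ values and the use of the true density $f$ instead of a leave-one-out kernel estimate are simplifying assumptions for illustration, not the slides' choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy single-index data: Y = g0(beta^T X) + noise, true index space E = span(beta)
p, n = 2, 4000
beta = np.array([1.0, 0.0])
X = rng.standard_normal((n, p))
Y = np.cos(X @ beta) + 0.1 * rng.standard_normal(n)

# hypothetical test functions psi(x, t) = exp(-|x - t|^2 / 2), whose
# x-gradient is grad_x psi(x, t) = -(x - t) psi(x, t)
def grad_psi(X, t):
    psi = np.exp(-0.5 * np.sum((X - t) ** 2, axis=1))
    return -(X - t) * psi[:, None]

# density of X (standard normal) taken as known to keep the sketch short
f = np.exp(-0.5 * np.sum(X**2, axis=1)) / (2.0 * np.pi) ** (p / 2)

# step 1: hat beta_t = n^{-1} sum_i Y_i grad_x psi(X_i, t) / f(X_i), over a grid of t,
# then discretize int hat beta_t hat beta_t^T dt as a sum of outer products
ts = 0.5 * rng.standard_normal((50, p))
B = np.stack([np.mean(Y[:, None] * grad_psi(X, t) / f[:, None], axis=0) for t in ts])
M = B.T @ B

# step 2: eigenvectors of the largest eigenvalues (here dim(E) = 1, so the top one)
eigvals, eigvecs = np.linalg.eigh(M)    # eigh sorts eigenvalues in ascending order
beta_hat = eigvecs[:, -1]
```

Up to sign, `beta_hat` aligns closely with the true index direction `beta` in this setup.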
Model II:
$$Y = \cos\left(\frac{\pi}{2}(X^{(1)} - \mu)\right) + 0.4e, \qquad X \sim N(0, I) \in \mathbb{R}^6, \quad e \sim N(0, 1), \quad \mu \in \mathbb{R} .$$

Figure: Boxplot over 100 samples of the error of SIR, SAVE and our method (SP), $n = 150$, $\mu$ varying from 1 down to 0.1.
Adaptive method

Estimation of $f$: curse of dimensionality. One has
$$A\,E\!\left[\frac{Y_1\,(\nabla\psi)(AX)}{f_{AX}(AX)}\right] = -E\!\left[\frac{\nabla g(X)\,\psi(AX)}{f_{AX}(AX)}\right],$$
for every $A$ with $\mathrm{vect}(A) \supset E$, in particular for $A_0 = \beta\beta^T$.

Procedure: $\hat A_\epsilon = \hat\beta\hat\beta^T + \epsilon I$, $\epsilon \to 0$.
Simulations

Model III:
$$Y = X^{(1)} + 0.4e, \qquad X \sim N(0, I) \in \mathbb{R}^6, \quad e \sim N(0, 1) .$$

Figure: Boxplots over 100 samples of the error for SIR and the adaptive method (SPadap), $n = 150$; number of iterations 1, 5, 10, 20, final (23).
Proof, generalization, concluding remarks
Proof and generalization

Proof
- Taylor development
- U-statistics
- Kernel regularization

Generalization
- Estimation of functionals $\int \varphi(x)\,\eta(f(x))\,dx$