Acceleration of some empirical means. Application to semiparametric regression


Acceleration of some empirical means. Application to semiparametric regression
François Portier, Université catholique de Louvain - ISBA
November 8, 2013. In collaboration with Bernard Delyon.

Regression model

$Y_i = g(X_i) + \sigma(X_i)\,e_i$

$(X_i)$ random, i.i.d. with density $f$, independent of the errors $(e_i)$. The functions $g$ and $\sigma$ are unknown. Let $Q \subset \mathbb{R}^d$ be bounded and $L^2(Q) = \{\psi : \int_Q \psi(x)^2\,dx < +\infty\}$.

Purpose: estimate $c = \langle g, \psi \rangle = \int_Q g(x)\psi(x)\,dx$ (the nonrandom design case is treated by Donoho).

Plug-in estimates

Plug-in of $g$ is difficult. Take $\hat g$ such that $a_n(\hat g(x) - g(x)) \to_d$ Gaussian variable (e.g. Nadaraya-Watson, nearest neighbors...), with $a_n = o(\sqrt n)$ but not tight; then showing
$\sqrt n\,(\langle \hat g, \psi \rangle - \langle g, \psi \rangle) = \sqrt n\,\langle \hat g - g, \psi \rangle \to_d$ Gaussian variable
is difficult.

Plug-in of $f$ may be better:
$c = \langle g, \psi \rangle = E\left[\frac{Y\,\psi(X)}{f(X)}\right]$, estimated by $\hat c = n^{-1}\sum_{i=1}^n \frac{Y_i\,\psi(X_i)}{\hat f(X_i)}$.
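A minimal numerical sketch of this plug-in of $f$ (not from the talk: the Gaussian kernel, the bandwidth and the toy $g$, $\psi$ are illustrative assumptions, and $\psi$ is not compactly supported here, for simplicity):

import numpy as np

def kde_loo(X, h):
    """Leave-one-out Gaussian kernel density estimates at the sample points."""
    n, d = X.shape
    diffs = (X[:, None, :] - X[None, :, :]) / h
    K = np.exp(-0.5 * (diffs ** 2).sum(axis=2)) / (2 * np.pi) ** (d / 2)
    np.fill_diagonal(K, 0.0)                       # drop the i = j term
    return K.sum(axis=1) / ((n - 1) * h ** d)

rng = np.random.default_rng(0)
n, d = 500, 1
X = rng.normal(size=(n, d))                        # design density f, unknown to the estimator
Y = np.cos(X[:, 0]) + 0.4 * rng.normal(size=n)     # toy model: g(x) = cos(x), sigma = 0.4
psi = np.exp(-0.5 * X[:, 0] ** 2)                  # toy test function psi(x) = exp(-x^2/2)

c_hat = np.mean(Y * psi / kde_loo(X, h=n ** (-1 / 3)))
# target: c = integral of cos(x) exp(-x^2/2) dx = sqrt(2*pi) * exp(-1/2), about 1.520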

Fields of application

Semiparametric estimation:
- Dimension reduction, $g(x) = g_0(\beta^T x)$; ADE (average derivative estimation): $\langle \nabla g, \psi \rangle = -\langle g, \nabla\psi \rangle \propto \beta$ (Vial, Härdle)
- Estimation of a location parameter in a regression (Vimond, Bercu)

Curve estimation:
- Orthogonal series or wavelets: $\sum_{k=1}^K \langle g, \psi_k \rangle\,\psi_k(y) \to_{K\to+\infty} g(y)$ (approximation theory)
- Kernel smoothing (book of Härdle): $\langle g, K_h(\cdot - y) \rangle \to_{h\to 0} g(y)$ (regularity theory)

Outline
1. Integral approximation by kernel smoothing
2. Asymptotic normality of $\hat c$
3. Application to dimension reduction
4. Proof, generalization, concluding remarks

Context

Approximate the quantity $\int_Q \varphi(x)\,dx$:
- $\varphi$ is known at the points $X_i$
- $(X_i)$ is random (if the $X_i$ are regular: quasi Monte Carlo)

Classical Monte Carlo procedure:
$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{f(X_i)} - \int_Q \varphi(x)\,dx\right) \to_d$ Gaussian variable

Kernel smoothing:
$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{\hat f^{(i)}(X_i)} - \int_Q \varphi(x)\,dx\right) \to_P 0$, where $\hat f^{(i)}(x) = (nh^d)^{-1}\sum_{j\neq i} K(h^{-1}(X_j - x))$.
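A sketch contrasting the two displays (with illustrative assumptions: $Q = [-1,1]$, $\varphi(x) = \cos^2(\pi x/2)$ on $Q$, standard Gaussian design, Gaussian kernel; the exact integral is 1):

import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=n)
phi = np.where(np.abs(X) <= 1.0, np.cos(np.pi * X / 2.0) ** 2, 0.0)   # phi supported on Q = [-1, 1]

# Classical Monte Carlo: needs the true density f, fluctuates at the n^{-1/2} rate.
f_true = np.exp(-0.5 * X ** 2) / np.sqrt(2.0 * np.pi)
mc = np.mean(phi / f_true)

# Kernel smoothing: replace f by its leave-one-out estimate f^(i)
# (normalizing by n - 1 rather than n is immaterial asymptotically).
h = n ** (-1.0 / 3.0)
D = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2) / np.sqrt(2.0 * np.pi)
np.fill_diagonal(D, 0.0)
f_loo = D.sum(axis=1) / ((n - 1) * h)
ks = np.mean(phi / f_loo)

print(mc - 1.0, ks - 1.0)   # errors around the exact value 1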

Assumptions

Nikol'ski class $H^s$, $s = k + \alpha$, $k \in \mathbb N$, $0 < \alpha \le 1$:
$\int \big(\varphi^{(l)}(x+u) - \varphi^{(l)}(x)\big)^2\,dx \le C\,\|u\|^{2\alpha}$ for every $l = (l_1,\ldots,l_d)$ with $\sum_i l_i = k$
(if $\psi$ is $\alpha$-Hölder inside $Q$: $s = \min(1/2, \alpha)$)

(A1) $\varphi \in H^s$ on $\mathbb R^d$ and has compact support $Q$
(A2) The $r$-th order derivatives of $f$ are bounded
(A3) For every $x \in Q$, $f(x) \ge b > 0$
(A4) $K$ symmetric with order $r$ and $K(x) \le C_1\exp(-C_2\|x\|)$

Theorem. Assume (A1)-(A4); then
$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{\hat f^{(i)}(X_i)} - \int_Q \varphi(x)\,dx\right) = O_P\!\left(h^s + n^{1/2}h^r + n^{-1/2}h^{-d}\right) \quad (1)$
provided this $O_P$ bound tends to 0 as $n \to +\infty$.

Remarks:
- Curse of dimensionality: $r > d$
- For $r, s$ large, $h_{\mathrm{opt}} \simeq n^{-1/(r+d)}$ (worked out below); $f$ is undersmoothed because $h_{\mathrm{opt}} < n^{-1/(2r+d)}$
- Regularity of $\varphi$ is not crucial
- Trimming method?
- The rate is $n^{-\frac{r-d}{2(r+d)}}$ (Stone)
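A worked version of the bandwidth remark (the balancing step is mine, not on the slide): equating the two $n$-dependent terms in (1),
$n^{1/2}h^r \asymp n^{-1/2}h^{-d} \iff h \asymp n^{-1/(r+d)}$, whence $n^{1/2}h^r \asymp n^{-\frac{r-d}{2(r+d)}}$,
which tends to 0 exactly when $r > d$. Since $n^{-1/(r+d)} < n^{-1/(2r+d)}$, this $h_{\mathrm{opt}}$ sits below the bandwidth that is optimal for estimating $f$ itself, hence the undersmoothing.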

Bandwidth choice
- Plug-in (Härdle, Hart, Marron and Tsybakov)
- Cross validation

Theorem. Let
$\hat v^{(i)}(x) = \big((n-1)(n-2)\big)^{-1}\sum_{j\neq i}\big(h^{-d}K(h^{-1}(x - X_j)) - \hat f^{(i)}(x)\big)^2.$
Assume (A1)-(A4); then
$\sqrt n\left(n^{-1}\sum_{i=1}^n \frac{\varphi(X_i)}{\hat f^{(i)}(X_i)}\left(1 - \frac{\hat v^{(i)}(X_i)}{\hat f^{(i)}(X_i)^2}\right) - \int_Q \varphi(x)\,dx\right) = O_P\!\left(h^s + n^{1/2}h^r + n^{-1/2}h^{-d/2} + n^{-1}h^{-3d/2}\right)$
instead of $O_P\!\left(h^s + n^{1/2}h^r + n^{-1/2}h^{-d}\right)$, provided this $O_P$ bound tends to 0 as $n \to +\infty$.

Remarks:
- Curse of dimensionality: $r > 3d/4$
- For $r, s$ large, $h_{\mathrm{opt}} \simeq n^{-1/(r+d/2)}$ and the optimal rate is $n^{-\frac{r-d/2}{2(r+d/2)}}$
- Leave-one-out better than the classical version (a numerical sketch follows)
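A sketch of the corrected estimator (continuing the earlier snippet, with $d = 1$; reading the correction factor as $1 - \hat v^{(i)}/(\hat f^{(i)})^2$ is my interpretation of the slide):

import numpy as np

def corrected_integral(phi, X, h):
    """Leave-one-out kernel estimate of the integral of phi, with the variance correction (d = 1)."""
    n = X.shape[0]
    D = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2) / (np.sqrt(2 * np.pi) * h)
    np.fill_diagonal(D, 0.0)                  # D[i, j] = h^{-1} K(h^{-1}(X_i - X_j)), zeroed at j = i
    f_loo = D.sum(axis=1) / (n - 1)
    # v^(i)(X_i): sum over j != i of (h^{-1} K(.) - f^(i)(X_i))^2;
    # subtracting f_loo**2 removes the spurious j = i term of the full-row sum.
    v_loo = (((D - f_loo[:, None]) ** 2).sum(axis=1) - f_loo ** 2) / ((n - 1) * (n - 2))
    return np.mean(phi / f_loo * (1.0 - v_loo / f_loo ** 2))

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=n)
phi = np.where(np.abs(X) <= 1.0, np.cos(np.pi * X / 2.0) ** 2, 0.0)
print(corrected_integral(phi, X, h=n ** (-1.0 / 4.0)) - 1.0)   # error around the exact value 1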

In practice

[Figures: boxplots of the approximation error for the classical Monte Carlo procedure (MC) and the kernel-smoothing procedure, with and without a boundary problem, for sample sizes n = 20, 50, 100, 200, 500; bandwidth $h = n^{-1/3}$, Epanechnikov kernel.]

Outline
1. Integral approximation by kernel smoothing
2. Asymptotic normality of $\hat c$
3. Application to dimension reduction
4. Proof, generalization, concluding remarks

Assumptions

(A2) The $r$-th order derivatives of $f$ are bounded
(A3) For every $x \in Q$, $f(x) \ge b > 0$
(A4) $K$ symmetric with order $r$ and $K(x) \le C_1\exp(-C_2\|x\|)$
(A5) $\psi$ is Hölder on its support $Q \subset \mathbb R^d$, nonempty, bounded and convex
(A6) $g$ is Hölder on $Q$ and $\sigma$ is bounded
(A7) $n^{1/2}h^r \to 0$ and $n^{1/2}h^d \to +\infty$ as $n \to +\infty$

Theorem. Assume (A2)-(A7); then $n^{1/2}(\hat c - c) \to_d N(0, v)$, where $v$ is the variance of the random variable
$\frac{Y_1 - g(X_1)}{f(X_1)}\,\psi(X_1).$

Remarks:
- Rates in root $n$
- The variance is smaller than when $f$ is known (worked out below)
- Trimming method? (Härdle and Stoker (1989))
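Spelling out $v$ under the regression model (a worked step, assuming $E[e_i \mid X_i] = 0$ and $E[e_i^2 \mid X_i] = 1$):
$v = \mathrm{Var}\!\left(\frac{Y_1 - g(X_1)}{f(X_1)}\,\psi(X_1)\right) = E\!\left[\frac{\sigma^2(X_1)\,\psi^2(X_1)}{f^2(X_1)}\right] = \int_Q \frac{\sigma^2(x)\,\psi^2(x)}{f(x)}\,dx.$
When $f$ is known, the empirical mean $n^{-1}\sum_i Y_i\psi(X_i)/f(X_i)$ has asymptotic variance $\mathrm{Var}(Y_1\psi(X_1)/f(X_1))$, which carries an extra term in $g$; hence the second remark.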

Outline
1. Integral approximation by kernel smoothing
2. Asymptotic normality of $\hat c$
3. Application to dimension reduction
4. Proof, generalization, concluding remarks

Single index model

$g(x) = g_0(\beta^T x)$, $\mathrm{vect}(\beta) = E$, so that $\nabla g(x) \in E$ for every $x \in \mathbb R^d$.

- Estimation of $g$ and then of $E[\nabla g(X)]$ (Hristache, Juditsky and Spokoiny (2001); $\sqrt n$-consistent when $\dim(E) \le 4$)
- Estimation of $f$: $E[\nabla g(X)] = -E\!\left[Y\,\frac{\nabla f(X)}{f(X)}\right]$ (Härdle and Stoker (1989); $\sqrt n$-consistent when $\dim(E) = 1$)

Idea: $\langle \nabla_x\psi(\cdot, t), g \rangle = -\langle \psi(\cdot, t), \nabla g \rangle = \beta_t \in E$ (justified below).
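Why the idea works (a worked step, assuming $\psi(\cdot,t)$ vanishes on the boundary of its support so integration by parts leaves no boundary term):
$\beta_t := \int \nabla_x\psi(x,t)\,g(x)\,dx = -\int \psi(x,t)\,\nabla g(x)\,dx = -\left(\int \psi(x,t)\,g_0'(\beta^T x)\,dx\right)\beta,$
a scalar multiple of $\beta$; collecting $\beta_t$ over a family of $t$'s then spans $E$.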

Results

$\beta_t = \int \nabla_x\psi(x,t)\,g(x)\,dx$, estimated by $\hat\beta_t = n^{-1}\sum_{i=1}^n \frac{Y_i\,\nabla_x\psi(X_i,t)}{\hat f(X_i)}$.

Theorem. Under (A3)-(A7), $\sqrt n(\hat\beta_t - \beta_t) \to_d$ Gaussian, for each $t$.

Corollaries

Corollary. Under (A3)-(A7), and some regularity conditions on $\nabla_x\psi$, $\sqrt n(\hat\beta_t - \beta_t)$ converges to a Gaussian process.

Corollary. Under (A3)-(A7), and some regularity conditions on $\nabla_x\psi$,
$\sqrt n\left(\int \hat\beta_t\hat\beta_t^T\,dt - \int \beta_t\beta_t^T\,dt\right) \to_d$ Gaussian variable.

Implementation

$\beta_t = \int \nabla_x\psi(x,t)\,g(x)\,dx$, $\hat\beta_t = n^{-1}\sum_{i=1}^n \frac{Y_i\,\nabla_x\psi(X_i,t)}{\hat f(X_i)}$

1. Compute $\int \hat\beta_t\hat\beta_t^T\,dt$
2. $\hat\beta$ = the eigenvectors associated with the $d$ largest eigenvalues (sketched below)

- Radial kernel with order $r$
- Bandwidth $h = 2n^{-1/(r+p)}$
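A schematic version of the two steps (the test functions $\psi(x,t) = \exp(-\|x-t\|^2/2)$, the Monte Carlo grid of $t$'s and the plug-in density f_hat are illustrative assumptions, not the talk's exact choices):

import numpy as np

def sp_directions(X, Y, f_hat, T, n_dir):
    """Accumulate beta_t beta_t^T over a grid of t's, then keep the leading eigenvectors."""
    n, p = X.shape
    M = np.zeros((p, p))
    for t in T:                                # Monte Carlo version of the integral of beta_t beta_t^T dt
        w = np.exp(-0.5 * ((X - t) ** 2).sum(axis=1))
        grad_psi = -(X - t) * w[:, None]       # grad_x psi(x, t) for the Gaussian psi above
        beta_t = grad_psi.T @ (Y / f_hat) / n  # n^{-1} sum_i Y_i grad psi(X_i, t) / f_hat(X_i)
        M += np.outer(beta_t, beta_t)
    M /= len(T)
    eigval, eigvec = np.linalg.eigh(M)         # eigenvalues in ascending order
    return eigvec[:, np.argsort(eigval)[::-1][:n_dir]]

Here f_hat can be the leave-one-out kernel density estimate of the earlier snippets, T e.g. a sample drawn from the design distribution, and n_dir the number of directions sought.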

Model II: $Y = \cos\!\left(\frac{\pi}{2}(X^{(1)} - \mu)\right) + 0.4\,e$, with $X \sim N(0, I)$ in $\mathbb R^6$, $e \sim N(0,1)$, $\mu \in \mathbb R$.

[Figure: boxplots over 100 samples of the error of SIR, SAVE and our method (SP); n = 150, µ varies from 1 down to 0.1.]

Adaptive method

Estimation of $f$: curse of dimensionality. One has
$E\!\left[\frac{Y_1\,(\nabla\psi)(AX_1)}{f_{AX}(AX_1)}\right] = E\!\left[\frac{g(X_1)\,(\nabla\psi)(AX_1)}{f_{AX}(AX_1)}\right] \in AE,$
for every $A$ with $\mathrm{vect}(A) \supset E$, in particular for $A_0 = \beta\beta^T$.

Procedure: $\hat A_\epsilon = \hat\beta\hat\beta^T + \epsilon I$, $\epsilon \to 0$ (one reading of the iteration is sketched below).
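One way to read the procedure as a loop (a loose sketch under my reading of the slide; estimate_directions stands for the previous snippet combined with a density estimate computed on the projected data):

import numpy as np

def adaptive_sp(X, Y, estimate_directions, n_iter=5, eps=0.5):
    """Alternate between projecting the design with A_eps and re-estimating the directions."""
    p = X.shape[1]
    B = estimate_directions(X, Y)              # first pass on the raw, full-dimensional design
    for _ in range(n_iter):
        A = B @ B.T + eps * np.eye(p)          # A_eps = beta beta^T + eps * I
        B = estimate_directions(X @ A.T, Y)    # density now estimated on the (almost) low-dimensional A X
        eps /= 2.0                             # send eps to 0 along the iterations
    return B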

Simulations

Model III: $Y = X^{(1)} + 0.4\,e$, with $X \sim N(0, I)$ in $\mathbb R^6$, $e \sim N(0,1)$.

[Figure: boxplots over 100 samples of the error for SIR and the adaptive method (SPadap); n = 150; number of iterations 1, 5, 10, 20, final (23).]

Outline
1. Integral approximation by kernel smoothing
2. Asymptotic normality of $\hat c$
3. Application to dimension reduction
4. Proof, generalization, concluding remarks

Proof and generalization

Proof:
- Taylor development
- U-statistics
- Kernel regularization

Generalization:
- Estimation of the functionals $\int \varphi(x)\,\eta(f(x))\,dx$