Nonparametric Inference via Bootstrapping the Debiased Estimator

Nonparametric Inference via Bootstrapping the Debiased Estimator
Yen-Chi Chen, Department of Statistics, University of Washington
ICSA-Canada Chapter Symposium 2017

Problem Setup

Let $X_1, \dots, X_n$ be an IID random sample from an unknown distribution with density function $p$. For simplicity, we assume $p$ is supported on $[0, 1]^d$.

Goal: given a level $\alpha$, we want to find $L_\alpha(x), U_\alpha(x)$ using the random sample such that
$$P\left(L_\alpha(x) \le p(x) \le U_\alpha(x)\ \ \forall x \in [0, 1]^d\right) \ge 1 - \alpha + o(1).$$
Namely, $[L_\alpha(x), U_\alpha(x)]$ forms an asymptotic simultaneous confidence band of $p(x)$.

Simple Approach: using the KDE

A classical approach is to construct $L_\alpha(x), U_\alpha(x)$ using the kernel density estimator (KDE). Let
$$\widehat{p}_h(x) = \frac{1}{nh^d} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right)$$
be the KDE, where $h > 0$ is the smoothing bandwidth and $K(x)$ is a smooth function such as a Gaussian. We pick $t_\alpha$ such that
$$L_\alpha(x) = \widehat{p}_h(x) - t_\alpha, \quad U_\alpha(x) = \widehat{p}_h(x) + t_\alpha.$$
As long as we choose $t_\alpha$ wisely, the resulting confidence band is asymptotically valid.
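
To make the estimator concrete, here is a minimal sketch of the KDE in one dimension with a Gaussian kernel; the simulated sample, grid, and bandwidth value are illustrative choices, not part of the talk.

```python
import numpy as np

def kde(x_grid, data, h):
    """Gaussian-kernel KDE on a grid (d = 1):
    p_hat(x) = (1 / (n h)) * sum_i K((X_i - x) / h)."""
    u = (data[None, :] - x_grid[:, None]) / h        # shape (n_grid, n)
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # standard Gaussian kernel
    return K.mean(axis=1) / h

# Illustrative usage with simulated data.
rng = np.random.default_rng(0)
data = rng.normal(size=500)
x_grid = np.linspace(-4, 4, 201)
p_hat = kde(x_grid, data, h=0.3)
```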

Simple Approach: the $L_\infty$ Error

How do we choose $t_\alpha$ to obtain a valid confidence band? A simple idea: invert the $L_\infty$ error.

Let $F_n(t)$ be the CDF of $\|\widehat{p}_h - p\|_\infty = \sup_x |\widehat{p}_h(x) - p(x)|$. Then the value $t_\alpha = F_n^{-1}(1 - \alpha)$ has a nice property:
$$P\left(\|\widehat{p}_h - p\|_\infty \le t_\alpha\right) = 1 - \alpha.$$
This implies
$$P\left(|\widehat{p}_h(x) - p(x)| \le t_\alpha\ \ \forall x \in [0, 1]^d\right) = 1 - \alpha.$$
Thus,
$$L_\alpha(x) = \widehat{p}_h(x) - t_\alpha, \quad U_\alpha(x) = \widehat{p}_h(x) + t_\alpha$$
leads to a simultaneous confidence band.

Simple Approach: the Bootstrap - 1

The previous method is great: it works even in finite samples. However, it has a critical problem: we do not know the distribution $F_n$, so we cannot compute the quantile.

A simple solution: use the bootstrap (we will use the empirical bootstrap).

Simple Approach: the Bootstrap - 2

Let $X^*_1, \dots, X^*_n$ be a bootstrap sample. We first compute the bootstrap KDE:
$$\widehat{p}^*_h(x) = \frac{1}{nh^d} \sum_{i=1}^{n} K\left(\frac{X^*_i - x}{h}\right).$$
Then we compute the bootstrap $L_\infty$ error $W^* = \|\widehat{p}^*_h - \widehat{p}_h\|_\infty$.

After repeating the bootstrap procedure $B$ times, we obtain realizations $W^*_1, \dots, W^*_B$. Compute the empirical CDF
$$\widehat{F}_n(t) = \frac{1}{B} \sum_{l=1}^{B} I(W^*_l \le t).$$
Finally, we use $\widehat{t}_\alpha = \widehat{F}_n^{-1}(1 - \alpha)$ and construct the confidence band as
$$\widehat{L}_\alpha(x) = \widehat{p}_h(x) - \widehat{t}_\alpha, \quad \widehat{U}_\alpha(x) = \widehat{p}_h(x) + \widehat{t}_\alpha.$$
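
Below is a minimal sketch of this empirical-bootstrap band, approximating the supremum over $x$ by a maximum over a grid; the grid, the number of bootstrap replicates, and the bandwidth are illustrative assumptions.

```python
import numpy as np

def kde(x_grid, data, h):
    """Gaussian-kernel KDE on a grid (d = 1)."""
    u = (data[None, :] - x_grid[:, None]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).mean(axis=1) / h

def bootstrap_band(data, x_grid, h, alpha=0.05, B=500, seed=None):
    """Empirical bootstrap band: resample the data, recompute the KDE, and
    take the (1 - alpha) quantile of the sup-norm differences as t_alpha."""
    rng = np.random.default_rng(seed)
    p_hat = kde(x_grid, data, h)
    W = np.empty(B)
    for b in range(B):
        boot = rng.choice(data, size=data.size, replace=True)  # bootstrap sample
        W[b] = np.max(np.abs(kde(x_grid, boot, h) - p_hat))    # sup-norm on the grid
    t_alpha = np.quantile(W, 1 - alpha)                        # hat{F}_n^{-1}(1 - alpha)
    return p_hat - t_alpha, p_hat + t_alpha                    # (L_hat, U_hat)

# Illustrative usage.
rng = np.random.default_rng(1)
data = rng.normal(size=500)
x_grid = np.linspace(-4, 4, 201)
L_hat, U_hat = bootstrap_band(data, x_grid, h=0.3)
```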

Simple Approach: the Bootstrap - 3

Does the bootstrap approach work? It depends. The bootstrap works if $\|\widehat{p}^*_h - \widehat{p}_h\|_\infty \approx \|\widehat{p}_h - p\|_\infty$ in the sense that
$$\sup_t \left| P\left(\|\widehat{p}^*_h - \widehat{p}_h\|_\infty < t\right) - P\left(\|\widehat{p}_h - p\|_\infty < t\right) \right| = o(1).$$
However, the above approximation holds if we undersmooth the data (Neumann and Polzehl 1998, Chernozhukov et al. 2014). Namely, we choose the smoothing bandwidth $h = o\left(n^{-\frac{1}{4+d}}\right)$.

Simple Approach: the Bootstrap - 4

Why do we need to undersmooth the data? The $L_\infty$ error has a bias-variance tradeoff:
$$\|\widehat{p}_h - p\|_\infty = \underbrace{O(h^2)}_{\text{bias}} + \underbrace{O_P\left(\sqrt{\frac{\log n}{nh^d}}\right)}_{\text{stochastic error}}.$$
The bootstrap $L_\infty$ error is capable of capturing the stochastic part of the error. However, it does not capture the bias. Undersmoothing guarantees that the bias is of a smaller order, so we can ignore it.

Problem of Undersmoothing

$$\|\widehat{p}_h - p\|_\infty = \underbrace{O(h^2)}_{\text{bias}} + \underbrace{O_P\left(\sqrt{\frac{\log n}{nh^d}}\right)}_{\text{stochastic error}}.$$
Undersmoothing has a problem: we do not attain the optimal convergence rate. The optimal rate occurs when we balance the bias and the stochastic error: $h = h_{\mathrm{opt}} \asymp n^{-\frac{1}{d+4}}$ (ignoring the $\log n$ factor).

A remedy to this problem: choose $h$ optimally but correct the bias (the debiased method).

The Debiased Method - 1

The idea of the debiased method is based on the fact that the leading term of the $O(h^2)$ bias is
$$\frac{h^2}{2} C_K \nabla^2 p(x),$$
where $C_K$ is a known constant depending on the kernel function and $\nabla^2$ is the Laplacian operator.

We can estimate $\nabla^2 p$ by applying the Laplacian operator to the KDE $\widehat{p}_h$. However, such an estimator is inconsistent when we choose $h_{\mathrm{opt}} \asymp n^{-\frac{1}{d+4}}$ because
$$\nabla^2 \widehat{p}_h(x) - \nabla^2 p(x) = O(h^2) + O_P\left(\sqrt{\frac{1}{nh^{d+4}}}\right).$$
The choice $h = h_{\mathrm{opt}} \asymp n^{-\frac{1}{d+4}}$ implies $\nabla^2 \widehat{p}_h(x) - \nabla^2 p(x) = o(1) + O_P(1)$.

The Debiased Method - 2

To handle this situation, people have suggested using two KDEs: one for estimating the density and the other for estimating the bias. However, we actually need only ONE KDE: we propose using the same KDE $\widehat{p}_h(x)$ to debias the estimator.¹ Namely, we propose to use
$$\widetilde{p}_h(x) = \widehat{p}_h(x) - \frac{h^2}{2} C_K \nabla^2 \widehat{p}_h(x)$$
with $h = h_{\mathrm{opt}} \asymp n^{-\frac{1}{d+4}}$. The estimator $\widetilde{p}_h(x)$ is called the debiased estimator.

¹ This idea has been used in Calonico et al. (2015) for a pointwise confidence interval.
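
A minimal sketch of the debiased estimator in one dimension with the standard Gaussian kernel, for which $C_K = \int u^2 K(u)\,du = 1$ and $K''(u) = (u^2 - 1)K(u)$; these closed forms and the bandwidth rule below are illustrative assumptions rather than material from the slides.

```python
import numpy as np

def debiased_kde(x_grid, data, h):
    """Debiased KDE (d = 1, standard Gaussian kernel, so C_K = 1):
    p_tilde(x) = p_hat(x) - (h^2 / 2) * C_K * p_hat''(x), where
    p_hat''(x) = (1 / (n h^3)) * sum_i K''((X_i - x) / h) and
    K''(u) = (u^2 - 1) K(u)."""
    u = (data[None, :] - x_grid[:, None]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    p_hat = K.mean(axis=1) / h                        # density estimate
    p_hat_dd = ((u**2 - 1) * K).mean(axis=1) / h**3   # estimated second derivative
    return p_hat - 0.5 * h**2 * p_hat_dd              # bias correction with C_K = 1

# Illustrative usage at the optimal-rate bandwidth h ~ n^{-1/(d+4)} (d = 1).
rng = np.random.default_rng(2)
data = rng.normal(size=2000)
h = data.std() * data.size ** (-1 / 5)
p_tilde = debiased_kde(np.linspace(-4, 4, 201), data, h)
```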

The Debiased Method + Bootstrap

To construct a confidence band, we use the bootstrap again, but this time we compute the bootstrap debiased estimator
$$\widetilde{p}^*_h(x) = \widehat{p}^*_h(x) - \frac{h^2}{2} C_K \nabla^2 \widehat{p}^*_h(x)$$
and evaluate $\|\widetilde{p}^*_h - \widetilde{p}_h\|_\infty$. After repeating the bootstrap procedure many times, we compute the EDF $\widetilde{F}_n$ of the realizations of $\|\widetilde{p}^*_h - \widetilde{p}_h\|_\infty$ and obtain the quantile $\widetilde{t}_\alpha = \widetilde{F}_n^{-1}(1 - \alpha)$. The confidence band is
$$\widetilde{L}_\alpha(x) = \widetilde{p}_h(x) - \widetilde{t}_\alpha, \quad \widetilde{U}_\alpha(x) = \widetilde{p}_h(x) + \widetilde{t}_\alpha.$$
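
Putting the two pieces together, here is a sketch of the full procedure: the same empirical bootstrap as before, now applied to the debiased estimator; the one-dimensional Gaussian-kernel debiased KDE (with $C_K = 1$) and all tuning constants are illustrative assumptions.

```python
import numpy as np

def debiased_kde(x_grid, data, h):
    """1-D Gaussian-kernel debiased KDE: p_hat - (h^2 / 2) * p_hat'' (C_K = 1)."""
    u = (data[None, :] - x_grid[:, None]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h - 0.5 * h**2 * ((u**2 - 1) * K).mean(axis=1) / h**3

def debiased_bootstrap_band(data, x_grid, h, alpha=0.05, B=500, seed=None):
    """Simultaneous band from bootstrapping the debiased estimator: quantile of
    sup_x |p_tilde*(x) - p_tilde(x)| over B empirical-bootstrap samples."""
    rng = np.random.default_rng(seed)
    p_tilde = debiased_kde(x_grid, data, h)
    W = np.empty(B)
    for b in range(B):
        boot = rng.choice(data, size=data.size, replace=True)
        W[b] = np.max(np.abs(debiased_kde(x_grid, boot, h) - p_tilde))
    t_alpha = np.quantile(W, 1 - alpha)
    return p_tilde - t_alpha, p_tilde + t_alpha       # (L_tilde, U_tilde)

# Illustrative usage: the band at the optimal-rate bandwidth h ~ n^{-1/(d+4)}.
rng = np.random.default_rng(3)
data = rng.normal(size=2000)
x_grid = np.linspace(-4, 4, 201)
h = data.std() * data.size ** (-1 / 5)
L_tilde, U_tilde = debiased_bootstrap_band(data, x_grid, h)
```

Note the design difference from the undersmoothed band: here $h$ is of the optimal order, and it is the debiasing step, not a smaller bandwidth, that handles the bias.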

Theory of the Debiased Method

Theorem (Chen 2017). Assume $p$ belongs to a $\beta$-Hölder class with $\beta > 2$ and the kernel function satisfies smoothness conditions. When $h \asymp n^{-\frac{1}{d+4}}$,
$$P\left(\widetilde{L}_\alpha(x) \le p(x) \le \widetilde{U}_\alpha(x)\ \ \forall x \in [0, 1]^d\right) = 1 - \alpha + o(1).$$
Namely, the debiased estimator leads to an asymptotic simultaneous confidence band under the choice $h \asymp n^{-\frac{1}{d+4}}$.

Why Does the Debiased Method Work? - 1

Why does the debiased method work? Didn't we have an inconsistent bias estimator? We indeed do not have a consistent bias estimator, but this is fine!

Recall that when $h \asymp n^{-\frac{1}{d+4}}$,
$$\nabla^2 \widehat{p}_h(x) - \nabla^2 p(x) = \underbrace{o(1)}_{\text{bias}} + \underbrace{O_P(1)}_{\text{stochastic variation}}.$$
Thus, our debiased estimator has three sources of error:
$$\widetilde{p}_h(x) - p(x) = \widehat{p}_h(x) - \frac{h^2}{2} C_K \nabla^2 \widehat{p}_h(x) - p(x) = \underbrace{\frac{h^2}{2} C_K \nabla^2 p(x) + o(h^2)}_{\text{bias}} + O_P\left(\sqrt{\frac{1}{nh^d}}\right) - \frac{h^2}{2} C_K \nabla^2 \widehat{p}_h(x).$$

Why Does the Debiased Method Work? - 2

Continuing the expansion (with $h \asymp n^{-\frac{1}{d+4}}$):
$$\begin{aligned}
\widetilde{p}_h(x) - p(x) &= \underbrace{\frac{h^2}{2} C_K \nabla^2 p(x) + o(h^2)}_{\text{bias}} + O_P\left(\sqrt{\frac{1}{nh^d}}\right) - \frac{h^2}{2} C_K \nabla^2 \widehat{p}_h(x) \\
&= o(h^2) + O_P\left(\sqrt{\frac{1}{nh^d}}\right) + \frac{h^2}{2} C_K \underbrace{\left(\nabla^2 p(x) - \nabla^2 \widehat{p}_h(x)\right)}_{= o(1) + O_P(1)} \\
&= o(h^2) + O_P\left(\sqrt{\frac{1}{nh^d}}\right) + o(h^2) + O_P(h^2) \\
&= o(h^2) + O_P\left(\sqrt{\frac{1}{nh^d}}\right) + O_P(h^2).
\end{aligned}$$
The two $O_P$ terms are both stochastic variation: $O_P\left(\sqrt{\frac{1}{nh^d}}\right)$ comes from estimating the density, and $O_P(h^2)$ comes from estimating the bias.

Why Does the Debiased Method Work? - 3

When $h \asymp n^{-\frac{1}{d+4}}$, the error
$$\widetilde{p}_h(x) - p(x) = o(h^2) + O_P\left(\sqrt{\frac{1}{nh^d}}\right) + O_P(h^2) = O_P\left(n^{-\frac{2}{d+4}}\right)$$
is dominated by the stochastic variation. As a result, the bootstrap can capture the errors, leading to an asymptotically valid confidence band.

Why Does the Debiased Method Work? - 4

Actually, after closely inspecting the debiased estimator, you can find that
$$\widetilde{p}_h(x) = \widehat{p}_h(x) - \frac{h^2}{2} C_K \nabla^2 \widehat{p}_h(x) = \frac{1}{nh^d} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right) - \frac{h^2}{2} C_K \frac{1}{nh^{d+2}} \sum_{i=1}^{n} \nabla^2 K\left(\frac{X_i - x}{h}\right) = \frac{1}{nh^d} \sum_{i=1}^{n} M\left(\frac{X_i - x}{h}\right),$$
where $M(x) = K(x) - \frac{C_K}{2} \nabla^2 K(x)$. Namely, the debiased estimator is a KDE with kernel function $M(x)$!

Why Does the Debiased Method Work? - 5

The kernel function $M(x) = K(x) - \frac{C_K}{2} \nabla^2 K(x)$ is actually a higher-order kernel. You can show that if the kernel function $K(x)$ is a $\gamma$-th order kernel, then the corresponding $M(x)$ is a $(\gamma + 2)$-th order kernel (Calonico et al. 2015, Scott 2015).

Because the debiased estimator $\widetilde{p}_h(x)$ uses a higher-order kernel, the bias is pushed to the next order, leaving the stochastic variation to dominate the error.
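
As a quick numerical check of this claim (my own illustration, not from the slides): for the standard Gaussian kernel in one dimension, $C_K = 1$ and $K''(u) = (u^2 - 1)K(u)$, so $M(u) = K(u)(3 - u^2)/2$. The sketch below verifies that $M$ integrates to one, has a vanishing second moment, and has a nonzero fourth moment, i.e. it behaves as a fourth-order kernel when $K$ is second order.

```python
import numpy as np

# Standard Gaussian kernel K and the induced debiasing kernel
# M(u) = K(u) - (C_K / 2) * K''(u), with C_K = 1 and K''(u) = (u^2 - 1) K(u).
u = np.linspace(-10, 10, 200001)
du = u[1] - u[0]
K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
M = K * (3 - u**2) / 2

def moment(f, k):
    """Numerical integral of u^k * f(u) on the grid (simple Riemann sum)."""
    return np.sum(u**k * f) * du

print(moment(M, 0))  # ~  1 : M integrates to one
print(moment(M, 2))  # ~  0 : second moment vanishes (for K it equals 1)
print(moment(M, 4))  # ~ -3 : nonzero fourth moment, so M is a fourth-order kernel
```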

Simulation

[Figure: left panel "Density Estimation, 95% CI" compares the original KDE and the debiased KDE with their 95% confidence bands over X; right panel "Confidence Sets, Gaussian, N=2000" plots the bootstrap coverage of both bands against the nominal coverage.]

Conclusion

We illustrate a bootstrap approach for constructing a simultaneous confidence band via a debiased KDE. This approach allows us to choose the smoothing bandwidth optimally and still yields an asymptotically valid confidence band. A similar idea can also be applied to regression problems and local polynomial estimators.

More details can be found in: Chen, Yen-Chi. "Nonparametric Inference via Bootstrapping the Debiased Estimator." arXiv preprint arXiv:1702.07027 (2017).

Thank you!

References

1. Y.-C. Chen. Nonparametric Inference via Bootstrapping the Debiased Estimator. arXiv preprint arXiv:1702.07027, 2017.
2. Y.-C. Chen. A Tutorial on Kernel Density Estimation and Recent Advances. arXiv preprint arXiv:1704.03924, 2017.
3. S. Calonico, M. D. Cattaneo, and M. H. Farrell. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. arXiv preprint arXiv:1508.02973, 2015.
4. V. Chernozhukov, D. Chetverikov, and K. Kato. Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics, 42(5):1787-1818, 2014.
5. M. H. Neumann and J. Polzehl. Simultaneous bootstrap confidence bands in nonparametric regression. Journal of Nonparametric Statistics, 9(4):307-333, 1998.
6. D. W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 2015.