arxiv: v3 [math.st] 24 Feb 2017

UNIFORM CONFIDENCE BANDS FOR NONPARAMETRIC ERRORS-IN-VARIABLES REGRESSION

KENGO KATO AND YUYA SASAKI

arXiv:1702.03377v3 [math.ST] 24 Feb 2017

Abstract. This paper develops a method to construct uniform confidence bands for a nonparametric regression function where a predictor variable is subject to a measurement error. We allow for the distribution of the measurement error to be unknown, but assume that there is an independent sample from the measurement error distribution. The sample from the measurement error distribution need not be independent from the sample on response and predictor variables. The availability of a sample from the measurement error distribution is satisfied if, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed, are available. The proposed confidence band builds on the deconvolution kernel estimation and a novel application of the multiplier (or wild) bootstrap method. We establish asymptotic validity of the proposed confidence band under ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric errors-in-variables regression. We also propose a novel data-driven method to choose a bandwidth, and conduct simulation studies to verify the finite sample performance of the proposed confidence band. Applying our method to a combination of two empirical data sets, we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI. Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

Date: First arXiv version: February 11, 2017. This version: February 27, 2017. K. Kato is supported by Grant-in-Aid for Scientific Research (C) (15K03392) from the JSPS. We would like to thank Tatsushi Oka and Holger Dette for useful comments and discussions.

1. Introduction

Consider the nonparametric errors-in-variables (EIV) regression model

$$ Y = g(X) + U, \quad E[U \mid X, \varepsilon] = 0, \qquad W = X + \varepsilon \ \ \text{(classical measurement error)}, \tag{1.1} $$

where each of $Y$, $X$, $U$, $W$, and $\varepsilon$ is a univariate random variable, and $\varepsilon$ is independent from $X$. We observe $(Y, W)$, but observe neither $X$ nor $\varepsilon$. Furthermore, we assume that the distribution of $\varepsilon$ is unknown. The variable $X$ is a latent predictor variable, while $\varepsilon$ is a measurement error. Of interest are estimation of and inference on the regression function $g(x) = E[Y \mid X = x]$. In particular, we are interested in constructing uniform confidence bands for $g$.

Confidence bands provide a simple graphical description of the extent to which a nonparametric estimator varies at design points, thereby quantifying uncertainties of the nonparametric estimator. However, construction of confidence bands tends to be challenging, especially for complex nonparametric models.[1] Indeed, despite the rich literature on consistent estimation of nonparametric EIV regression, the literature on pointwise or uniform confidence bands for nonparametric EIV regression is limited (see below for a literature review), likely because of its complexity. Even pointwise inference on $g$ under the assumption that the measurement error distribution is known is considered by experts to be difficult.[2] This is because, as discussed in Delaigle et al. (2015): 1) the asymptotic variance of the deconvolution kernel estimator of $g$ is non-trivial to estimate and so inference based on limiting distributions is difficult to implement; and 2) it is not straightforward to devise a way to implement bootstrap for inference on $g$ due to the unobservability of $X$ in data. With all these challenges recognized in the literature, the present paper attempts to solve an even more challenging problem of constructing uniform confidence bands for the regression function $g$ without assuming that the measurement error distribution is known.

To deal with unknown measurement error distribution, we assume that, in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ from the distribution of $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution where $m = m_n \to \infty$ as $n \to \infty$. (The auxiliary sample $\{\eta_1, \dots, \eta_m\}$ need not be independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$.) For example, in natural science, measurement errors are often due to measuring devices; in such cases, one can obtain preliminary calibration measures in the absence of signal, which produce a sample from the measurement error distribution; see the introduction of Comte and Lacour (2011), for example.
Other real data scenarios of such additional data availability that are plausible in economics, social sciences, and biomedical sciences include: the case where validation data is available for data combination; and the case where repeated measurements (panel data) on $X$ with errors one of which is symmetrically distributed are available. These patterns of data requirements are often considered in the existing literature with measurement errors that we review below.

Under this setup, we develop a method to construct confidence bands for the regression function $g$. Our method builds on the deconvolution kernel estimation (Fan and Truong, 1993), and a novel application of the multiplier (or wild) bootstrap method. Our construction of the multiplier process differs from the standard approach in the error-free case (cf. Neumann and Polzehl, 1998), and is tailored to EIV regression; see Remark 2.1 ahead. Building on non-trivial applications of the probabilistic techniques developed in Chernozhukov et al. (2014a,b, 2016), we establish asymptotic validity of the proposed confidence band, i.e., the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and EIV regression, we focus for a technical reason on the case where the measurement error density is ordinary smooth, i.e., the characteristic function of the measurement error distribution decays at most polynomially fast in the tail (cf. Fan, 1991a; Fan and Truong, 1993).

In addition to these contributions, we also propose a novel data-driven method to choose a bandwidth. In the theoretical study, we require the bandwidth to be chosen in such a way that it undersmoothes the deconvolution kernel estimate, so that the bias is negligible relative to the variance part. Existing data-driven methods for bandwidth selection typically aim at choosing a bandwidth minimizing the MISE, thereby yielding a non-undersmoothing bandwidth (cf. Delaigle and Hall, 2008). We propose an alternative method for bandwidth selection that aims at yielding an undersmoothing bandwidth.

We conduct simulation studies to verify the finite sample performance of the proposed confidence band. The simulation studies show that the proposed confidence band, combined with the proposed bandwidth selection rule, works well. Applying our method to a combination of the two data sets, the National Health and Nutrition Examination Survey (NHANES) and the Panel Survey of Income Dynamics (PSID), we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI.

[1] We refer to Wasserman (2006) and Giné and Nickl (2016) as general references on confidence bands in nonparametric statistical models.

[2] Delaigle et al. (2015), who study pointwise confidence bands for nonparametric EIV regression under the assumption that the measurement error distribution is known, state that "despite their practical importance, to our knowledge confidence bands in nonparametric EIV regression have largely been ignored so far. We show that the problem is particularly complex, much more so than in the standard error-free setting." (Delaigle et al., 2015, p.149)
Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

In order to locate the present paper in the context of the relevant literature, it is useful to first review measurement error models and deconvolution. We refer to books by Fuller (1987), Carroll et al. (2006), Meister (2009) and Horowitz (2009, Chapter 5) and surveys by Chen et al. (2011) and Schennach (2016) for general references. The genesis of this literature features the deconvolution kernel density estimation with known error distributions (Carroll and Hall, 1988; Stefanski and Carroll, 1990; Fan, 1991a,b), followed by that with unknown error distributions (Diggle and Hall, 1993; Horowitz and Markatou, 1996; Neumann, 1997; Efromovich, 1997; Li and Vuong, 1998; Delaigle et al., 2008; Johannes, 2009; Comte and Lacour, 2011). Diggle and Hall (1993); Neumann (1997); Efromovich (1997); Johannes (2009); Comte and Lacour (2011) assume the availability of a sample from the measurement error distribution, while Horowitz and Markatou (1996); Delaigle et al. (2008) assume repeated measurements (panel data) with symmetrically and identically distributed errors. For repeated measurements (panel data) without symmetry of error distributions, Li and Vuong (1998) propose an alternative density estimator based on Kotlarski's lemma (cf. Kotlarski, 1967; Rao, 1992) that does not require known error distribution; see also Bonhomme and Robin (2010) and Comte and Kappus (2015) for further developments. Methods to construct confidence bands in deconvolution are developed by Bissantz et al. (2007); Bissantz and Holzmann (2008); van Es and Gugushvili (2008); Lounici and Nickl (2011); Schmidt-Hieber et al. (2013) for the case of known error distribution, and more recently by Kato and Sasaki (2016) for the case of unknown error distribution.

Similarly to the density estimation, the literature on nonparametric EIV regression estimation often takes the deconvolution kernel approach. Fan and Truong (1993) propose to substitute the deconvolution kernel in the Nadaraya-Watson estimator; also see Fan and Masry (1992) for pointwise asymptotic normality, Delaigle and Meister (2007) for extensions to heteroscedastic measurement errors, Delaigle et al. (2009) for local polynomial extensions, and Delaigle et al. (2015) for pointwise inference. These papers focus on the case of known error distribution. Delaigle et al. (2008) estimate the error characteristic function using repeated measurements on $X$ with symmetrically and identically distributed errors, and substitute the estimated error characteristic function into the deconvolution kernel. Schennach (2004) also works with cases with repeated measurements but without assuming symmetry of error distributions, and proposes an alternative approach to estimate the regression function based on Kotlarski's lemma. See also Carroll et al. (1999); Schennach et al. (2012); Schennach and Hu (2013); Hu and Sasaki (2015). Our method of inference is based on the deconvolution kernel estimation.
We mainly focus on (i) the case where a sample drawn from the error distribution is available; (ii) the case where validation data is available for data combination; and (iii) the case where repeated measurements with errors one of which is symmetrically distributed are available. For (ii) data combination with validation data, our model shares similarities, albeit under different assumptions, with that of the nonparametric instrumental variables (NPIV) regression, for which Horowitz and Lee (2012), Chen and Christensen (2015) and Babii (2016) develop methods to construct confidence bands as we do for nonparametric EIV regression.

We note the following two references as particularly relevant benchmarks for identifying our contributions. One reference is Schennach (2004), which derives pointwise asymptotic normality for a nonparametric EIV regression estimator different from ours, under unknown error distribution. To this existing result, our contributions are four-fold. First, we provide a method of uniform inference as opposed to a pointwise one. Second, we propose a method of bandwidth selection for valid inference. Third, while the existing result left aside the issue of variance estimation and thus is not readily applicable in practice, we provide a bootstrap method for ease of practical implementation. Fourth, we devise lower-level assumptions which are easier to verify with concrete examples of distribution and conditional moment functions. The other reference is Delaigle et al. (2015), which suggests a method of pointwise inference via bootstrap for nonparametric EIV regression with known error distribution. To this existing result, our contributions are three-fold. First, our method allows for unknown error distribution. Second, we provide a method of uniform inference as opposed to a pointwise one. Third, we provide formal theories to support the asymptotic validity of our bootstrap method. Delaigle et al. (2015) mention how to modify their methodology to the case where the measurement error distribution is unknown, and to construction of uniform confidence bands. However, their theoretical results do not formally cover those cases.

Finally, Birke et al. (2010) and Proksch et al. (2015) obtain confidence bands for inverse regression with fixed equidistant designs (the fixed equidistant design assumption is substantial in their setups and analyses); the inverse regression is related to but different from our EIV regression (1.1), and our setup does not allow fixed equidistant designs because of measurement errors. The methodologies and the proof strategies are also different; for example, both of those papers rely on Gumbel approximations for validity of the confidence bands, which we do not. Importantly, to the best of our knowledge, none of the existing results covers uniform confidence bands for EIV regression (1.1), even under the simpler setting that the measurement error distribution is known. The present paper fills this important void.

The rest of the paper is organized as follows. In Section 2, we informally present our methodology to construct uniform confidence bands for $g$. In Section 3, we present asymptotic validity of the proposed confidence band under suitable regularity conditions. In Section 4, we propose a practical method to choose the bandwidth. In Section 5, we conduct simulation studies to verify the finite sample performance of the proposed confidence band. In Section 6, we apply the proposed method to a combination of two empirical data sets.
In Section 7, we discuss extensions of our results to specification testing of the conditional mean function, cases with additional regressors without measurement errors, and construction of confidence bands for the conditional distribution function. Section 8 concludes. All the proofs are deferred to the Appendix.

1.1. Notations. For a non-empty set $T$ and a (complex-valued) function $f$ on $T$, we use the notation $\|f\|_T = \sup_{t \in T} |f(t)|$. Let $\ell^\infty(T)$ denote the Banach space of all bounded real-valued functions on $T$ with norm $\|\cdot\|_T$. The Fourier transform of an integrable function $f$ on $\mathbb{R}$ is defined by

$$ \varphi_f(t) = \int_{\mathbb{R}} e^{itx} f(x)\,dx, \quad t \in \mathbb{R}, $$

where $i = \sqrt{-1}$ denotes the imaginary unit throughout the paper. We refer to Folland (1999) as a basic reference on Fourier analysis. For any positive sequences $a_n$ and $b_n$, we write $a_n \sim b_n$ if $a_n/b_n$ is bounded and bounded away from zero. For any $a, b \in \mathbb{R}$, let $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For $a \in \mathbb{R}$ and $b > 0$, we use the shorthand notation $[a \pm b] = [a - b, a + b]$. Let $\stackrel{d}{=}$ denote the equality in distribution.

2. Methodology

In this section, we informally present our methodology to construct confidence bands for $g$. The formal analysis of our confidence bands will be carried out in the next section. We will also discuss some examples of situations where an auxiliary sample from the measurement error distribution is available.

2.1. Deconvolution kernel estimation. We first introduce a deconvolution kernel method to estimate $f_X$ and $g$ under the assumption that the distribution of $\varepsilon$ is known. Let $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ be an independent sample from the distribution of $(Y, W)$. In this paper, we assume that the densities of $X$ and $\varepsilon$ exist and are denoted by $f_X$ and $f_\varepsilon$, respectively. Let $\varphi_W$, $\varphi_X$, and $\varphi_\varepsilon$ denote the characteristic functions of $W$, $X$, and $\varepsilon$, respectively. By the independence between $X$ and $\varepsilon$, the density of $W$ exists and is given by the convolution of the densities of $X$ and $\varepsilon$, namely,

$$ f_W(w) = (f_X * f_\varepsilon)(w) = \int_{\mathbb{R}} f_X(w - x) f_\varepsilon(x)\,dx, \quad w \in \mathbb{R}, $$

where $*$ denotes the convolution. This in turn implies that the characteristic function of $W$ is identical to the product of those of $X$ and $\varepsilon$, namely,

$$ \varphi_W(t) = \varphi_X(t)\varphi_\varepsilon(t), \quad t \in \mathbb{R}. $$

Provided that $\varphi_\varepsilon$ is non-vanishing on $\mathbb{R}$ and $\varphi_X$ is integrable on $\mathbb{R}$ with respect to the Lebesgue measure (we hereafter omit "with respect to the Lebesgue measure"), the Fourier inversion formula yields that

$$ f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t)\,dt = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_W(t)}{\varphi_\varepsilon(t)}\,dt, \quad x \in \mathbb{R}. \tag{2.1} $$

The expression (2.1) leads to a method to estimate $f_X$. However, simply replacing $\varphi_W$ by the empirical characteristic function of $W$, namely,

$$ \hat{\varphi}_W(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itW_j}, \quad t \in \mathbb{R}, $$

does not work. Specifically, the function $t \mapsto e^{-itx}\hat{\varphi}_W(t)/\varphi_\varepsilon(t)$ is not integrable on $\mathbb{R}$, because $\varphi_\varepsilon(t) \to 0$ as $|t| \to \infty$ by the Riemann-Lebesgue lemma while $\hat{\varphi}_W$ is the characteristic function of a discrete distribution (i.e., the empirical distribution) and so $\limsup_{|t| \to \infty} |\hat{\varphi}_W(t)| = 1$ (cf. Sato, 1999, Proposition 27.28). A standard approach to dealing with this problem is to use a kernel function to restrict the integral region in (2.1) to a compact interval.

Let $K : \mathbb{R} \to \mathbb{R}$ be a kernel function such that $K$ is integrable on $\mathbb{R}$, $\int_{\mathbb{R}} K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$ (i.e., $\varphi_K(t) = 0$ for all $|t| > 1$). When $f_\varepsilon$ is known, the deconvolution kernel density estimator of $f_X$ is given by

$$ \hat{f}_X^{*}(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\hat{\varphi}_W(t)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt, $$

where $h_n > 0$ denotes the bandwidth. This estimator was first considered by Carroll and Hall (1988) and Stefanski and Carroll (1990). Rates of convergence and pointwise asymptotic normality of $\hat{f}_X^{*}$ are studied in Fan (1991a,b), among others. Alternatively, by a change of variables, we may rewrite $\hat{f}_X^{*}$ as

$$ \hat{f}_X^{*}(x) = \frac{1}{nh_n}\sum_{j=1}^{n} K_n((x - W_j)/h_n), \tag{2.2} $$

where the function $K_n$, called the deconvolution kernel, is defined by

$$ K_n(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\varphi_\varepsilon(t/h_n)}\,dt. $$

Note that $K_n$ is real-valued since

$$ \overline{K_n(x)} = \frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\frac{\overline{\varphi_K(t)}}{\overline{\varphi_\varepsilon(t/h_n)}}\,dt = \frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\frac{\varphi_K(-t)}{\varphi_\varepsilon(-t/h_n)}\,dt = K_n(x), $$

where $\overline{z}$ denotes the complex conjugate of a complex number $z$. The second expression (2.2) resembles a standard kernel density estimator without measurement errors.

Analogously, Fan and Truong (1993) propose to estimate the regression function $g(x)$ by $\hat{g}^{*}(x) = \hat{\mu}^{*}(x)/\hat{f}_X^{*}(x)$, where

$$ \hat{\mu}^{*}(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\left(\frac{1}{n}\sum_{j=1}^{n} Y_j e^{itW_j}\right)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt = \frac{1}{nh_n}\sum_{j=1}^{n} Y_j K_n((x - W_j)/h_n). $$

To understand the rationale behind this estimator, observe that

$$ E[Ye^{itW}] = E[\{g(X) + U\}e^{it(X+\varepsilon)}] = E[g(X)e^{it(X+\varepsilon)}] = E[g(X)e^{itX}]\varphi_\varepsilon(t), $$

and $E[g(X)e^{itX}]$ is the Fourier transform of $gf_X$, i.e., $E[g(X)e^{itX}] = \varphi_{gf_X}(t)$. Hence $\varphi_{gf_X}(t) = E[Ye^{itW}]/\varphi_\varepsilon(t)$, and provided that $\varphi_{gf_X}$ is integrable on $\mathbb{R}$, the Fourier inversion formula yields that

$$ g(x)f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{E[Ye^{itW}]}{\varphi_\varepsilon(t)}\,dt. \tag{2.3} $$

It is worth pointing out that estimation of $f_X$ and $gf_X$ corresponds to solving certain Fredholm integral equations of the first kind, and therefore estimation of $f_X$ and $gf_X$ (or $g$) is a statistical ill-posed inverse problem. In fact, $f_X$ and $gf_X$ satisfy $f_X * f_\varepsilon = f_W$ and $(gf_X) * f_\varepsilon = E[Y \mid W = \cdot\,]f_W$; these are Fredholm integral equations of the first kind where the right hand side functions are directly estimable.[3] Rates of convergence and pointwise asymptotic normality of $\hat{g}^{*}$ are studied by Fan and Truong (1993); Fan and Masry (1992), among others.

The discussion so far has presumed that the distribution of $\varepsilon$ is known. However, in many applications, the distribution of $\varepsilon$ is unknown, and hence the estimators $\hat{f}_X^{*}$ and $\hat{g}^{*}$ are infeasible.

[3] See, for example, Chen (2007), Carrasco et al. (2007), Cavalier (2008), and Horowitz (2009) for overview of statistical ill-posed inverse problems.
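As a rough numerical illustration of (2.2) and of the Fan and Truong (1993) regression estimator in the known-error case, the following sketch evaluates the deconvolution kernel by a trapezoid rule. It is not the authors' implementation. The kernel with Fourier transform (1 - t^2)^3 on [-1, 1], the Laplace error with known scale `b`, the integration grid size, and all function names are assumptions made for this example only.

```python
import numpy as np

def phi_K(t):
    """Fourier transform of the kernel: (1 - t^2)^3 on [-1, 1] (an assumed choice)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def phi_laplace(t, b=0.2):
    """Characteristic function of a centered Laplace error with scale b (assumed known)."""
    return 1.0 / (1.0 + (b * np.asarray(t, dtype=float)) ** 2)

def deconv_kernel(u, h, phi_eps, n_grid=201):
    """K_n(u) = (1/2pi) * int_{-1}^{1} e^{-itu} phi_K(t) / phi_eps(t/h) dt (trapezoid rule)."""
    u = np.asarray(u, dtype=float)
    t = np.linspace(-1.0, 1.0, n_grid)
    ratio = phi_K(t) / phi_eps(t / h)      # complex-valued if the error law is asymmetric
    w = np.full(n_grid, t[1] - t[0])
    w[[0, -1]] *= 0.5                      # trapezoid weights
    tu = np.multiply.outer(u, t)           # shape: u.shape + (n_grid,)
    # Re[e^{-itu}(a + ib)] = a cos(tu) + b sin(tu); K_n is real, so this is the integral.
    re, im = np.real(ratio), np.imag(ratio)
    return (np.cos(tu) @ (w * re) + np.sin(tu) @ (w * im)) / (2.0 * np.pi)

def deconv_regression(x, Y, W, h, phi_eps):
    """Return (g_star, f_star) on the grid x: the ratio estimator and the density estimate (2.2)."""
    x, Y, W = (np.asarray(a, dtype=float) for a in (x, Y, W))
    K = deconv_kernel((x[:, None] - W[None, :]) / h, h, phi_eps)   # shape (len(x), n)
    f_star = K.mean(axis=1) / h            # (1/(n h)) sum_j K_n((x - W_j)/h)
    mu_star = (K * Y).mean(axis=1) / h     # (1/(n h)) sum_j Y_j K_n((x - W_j)/h)
    return mu_star / f_star, f_star
```

A call such as `deconv_regression(np.linspace(-1, 1, 9), Y, W, 0.15, phi_laplace)` returns the two estimates on a grid. The ratio is only meaningful where the density estimate is positive, which is one reason the paper restricts attention to an interval on which $f_X$ is bounded away from zero.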

In the present paper, we assume that there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the distribution of $\varepsilon$:

$$ \eta_1, \dots, \eta_m \sim f_\varepsilon \ \text{i.i.d.}, $$

where $m = m_n \to \infty$ as $n \to \infty$. We do not assume that $\eta_1, \dots, \eta_m$ are independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$. In Section 2.3, we will discuss examples where such observations from the measurement error distribution are available. Given $\{\eta_1, \dots, \eta_m\}$, we may estimate $\varphi_\varepsilon$ by the empirical characteristic function, namely,

$$ \hat{\varphi}_\varepsilon(t) = \frac{1}{m}\sum_{j=1}^{m} e^{it\eta_j}, $$

and estimate the deconvolution kernel $K_n$ by the plug-in method:

$$ \hat{K}_n(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\hat{\varphi}_\varepsilon(t/h_n)}\,dt. $$

Note that under the regularity conditions stated below, $\inf_{|t| \le h_n^{-1}} |\hat{\varphi}_\varepsilon(t)| > 0$ with probability approaching one, so that $\hat{K}_n$ is well-defined with probability approaching one. Note also that $\hat{K}_n$ is real-valued. Now, we estimate $g(x)$ by $\hat{g}(x) = \hat{\mu}(x)/\hat{f}_X(x)$, where

$$ \hat{\mu}(x) = \frac{1}{nh_n}\sum_{j=1}^{n} Y_j \hat{K}_n((x - W_j)/h_n) \quad \text{and} \quad \hat{f}_X(x) = \frac{1}{nh_n}\sum_{j=1}^{n} \hat{K}_n((x - W_j)/h_n). $$

Density estimators of the form $\hat{f}_X$ are studied in Diggle and Hall (1993), Neumann (1997), and Efromovich (1997), among others, and nonparametric regression estimators of the form $\hat{g}$ are studied in Delaigle et al. (2008), among others.

2.2. Construction of confidence bands. We now describe our method to construct confidence bands for $g$ based on the estimator $\hat{g}$. Under the regularity conditions stated below, we will show that $\hat{g}(x) - g(x)$ can be approximated by

$$ \frac{1}{f_X(x)\,nh_n}\sum_{j=1}^{n}\left[\{Y_j - g(x)\}K_n((x - W_j)/h_n) - A_n(x)\right] $$

uniformly in $x \in I$, where $I$ is a compact interval in $\mathbb{R}$ on which $f_X$ is bounded away from zero, and $A_n(x) = E[\{Y - g(x)\}K_n((x - W)/h_n)]$. Let

$$ s_n^2(x) = \mathrm{Var}\left(\{Y - g(x)\}K_n((x - W)/h_n)\right), $$

and consider the process

$$ Z_n(x) = \frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^{n}\left[\{Y_j - g(x)\}K_n((x - W_j)/h_n) - A_n(x)\right], \quad x \in I, $$

where $s_n(x) = \sqrt{s_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I} s_n(x) > 0$ for sufficiently large $n$, so that $Z_n$ is well-defined. Furthermore, we will show that there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, and such that as $n \to \infty$,

$$ \sup_{z \in \mathbb{R}}\left|P\{\|Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\right| \to 0. $$

Recall that $\|Z_n\|_I = \sup_{x \in I}|Z_n(x)|$. This in turn yields that

$$ \sup_{z \in \mathbb{R}}\left|P\{\|\hat{Z}_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\right| \to 0, $$

where $\{\hat{Z}_n(x) : x \in I\}$ is a process defined by

$$ \hat{Z}_n(x) = \frac{f_X(x)\sqrt{n}\,h_n(\hat{g}(x) - g(x))}{s_n(x)}, \quad x \in I. \tag{2.4} $$

Therefore, if we denote by

$$ c_n^G(1 - \tau) = (1 - \tau)\text{-quantile of } \|Z_n^G\|_I $$

for $\tau \in (0, 1)$, then a band of the form

$$ \hat{C}_{1-\tau}(x) = \left[\hat{g}(x) \pm \frac{s_n(x)}{f_X(x)\sqrt{n}\,h_n}\,c_n^G(1 - \tau)\right], \quad x \in I $$

will contain $g(x)$, $x \in I$ with probability at least $1 - \tau + o(1)$ as $n \to \infty$. In fact, it holds that

$$ P\{g(x) \in \hat{C}_{1-\tau}(x) \ \forall x \in I\} = P\{\|\hat{Z}_n\|_I \le c_n^G(1 - \tau)\} = P\{\|Z_n^G\|_I \le c_n^G(1 - \tau)\} + o(1) \ge 1 - \tau + o(1). $$

In practice, $f_X(x)$, $s_n^2(x)$, and $c_n^G(1 - \tau)$ are all unknown, and we have to estimate them. We estimate $f_X(x)$ and $s_n^2(x)$ by $\hat{f}_X(x)$ and

$$ \hat{s}_n^2(x) = \frac{1}{n}\sum_{j=1}^{n}\{Y_j - \hat{g}(x)\}^2\hat{K}_n^2((x - W_j)/h_n), $$

respectively. Note that $(E[A_n(x)])^2$ is negligible relative to $s_n^2(x)$ so that we have ignored $(E[A_n(x)])^2$ in estimation of $s_n^2(x)$. Note also that $\sum_{j=1}^{n}(Y_j - \hat{g}(x))\hat{K}_n((x - W_j)/h_n) = 0$.

Next, we estimate the quantile $c_n^G(1 - \tau)$ by the Gaussian multiplier bootstrap. Generate $\xi_1, \dots, \xi_n \sim N(0, 1)$ i.i.d., independently of the data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$, and consider the multiplier process

$$ \hat{Z}_n^{\xi}(x) = \frac{1}{\hat{s}_n(x)\sqrt{n}}\sum_{j=1}^{n}\xi_j\{Y_j - \hat{g}(x)\}\hat{K}_n((x - W_j)/h_n), \quad x \in I, \tag{2.5} $$

where $\hat{s}_n(x) = \sqrt{\hat{s}_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I}\hat{s}_n(x) > 0$ with probability approaching one. Conditionally on the data $\mathcal{D}_n$, $\hat{Z}_n^{\xi}$ is a Gaussian process with mean zero and covariance function (presumably) close to that of $Z_n^G$. Indeed, for $f_{n,x}(y, w) = \{y - g(x)\}K_n((x - w)/h_n)/s_n(x)$ and $\hat{f}_{n,x}(y, w) = \{y - \hat{g}(x)\}\hat{K}_n((x - w)/h_n)/\hat{s}_n(x)$, the covariance function of $\hat{Z}_n^{\xi}$ conditionally on $\mathcal{D}_n$ is

$$ E[\hat{Z}_n^{\xi}(x)\hat{Z}_n^{\xi}(x') \mid \mathcal{D}_n] = \frac{1}{n}\sum_{j=1}^{n}\hat{f}_{n,x}(Y_j, W_j)\hat{f}_{n,x'}(Y_j, W_j) $$

for $x, x' \in I$, which estimates the covariance function of $Z_n^G$ given by

$$ E[Z_n^G(x)Z_n^G(x')] = E[f_{n,x}(Y, W)f_{n,x'}(Y, W)] - E[f_{n,x}(Y, W)]E[f_{n,x'}(Y, W)] $$

for $x, x' \in I$. Hence, we estimate $c_n^G(1 - \tau)$ by

$$ \hat{c}_n(1 - \tau) = \text{conditional } (1 - \tau)\text{-quantile of } \|\hat{Z}_n^{\xi}\|_I \text{ given } \mathcal{D}_n, $$

which can be computed via simulations. Now, the resulting confidence band is defined by

$$ \hat{C}_{1-\tau}(x) = \left[\hat{g}(x) \pm \frac{\hat{s}_n(x)}{\hat{f}_X(x)\sqrt{n}\,h_n}\,\hat{c}_n(1 - \tau)\right], \quad x \in I. \tag{2.6} $$

Note that, except for the choice of the bandwidth, this confidence band is completely data-driven. We will discuss practical choice of the bandwidth in Section 4.

Remark 2.1 (Novelty of our construction of the multiplier process). In the error-free case, namely when we can observe $(Y_1, X_1), \dots, (Y_n, X_n)$, the deviation of a standard kernel regression estimator $\check{g}$ with kernel $K$ from the true regression function $g$ is uniformly approximated as $\{f_X(x)nh_n\}^{-1}\sum_{j=1}^{n}U_jK((x - X_j)/h_n)$ under suitable regularity conditions. So, to construct confidence bands for $g$ via the multiplier bootstrap method, one would construct a multiplier stochastic process of the form

$$ x \mapsto \frac{1}{\sigma_n(x)\sqrt{n}}\sum_{j=1}^{n}\xi_jU_jK((x - X_j)/h_n) \quad \text{with} \quad \sigma_n(x) = \sqrt{\mathrm{Var}(UK((x - X)/h_n))}, \tag{2.7} $$

and then compute the conditional $(1 - \tau)$-quantile of the supremum in absolute value of the multiplier process. In practice, we replace $U_j$ and $\sigma_n(x)$ by suitable estimators; for example, a natural estimator of $U_j$ would be $\hat{U}_j = Y_j - \check{g}(X_j)$. See, for example, Neumann and Polzehl (1998); see also Section 4.3 in Chernozhukov et al. (2013) for applications of the multiplier bootstrap method to a different but related problem of inference in intersection bound models using kernel methods.
In the measurement error case, given a cosmetic similarity between the deconvolution kernel estimation and the error-free kernel estimation, one might be tempted to modify the multiplier process (2.7) by just replacing $\sigma_n(x)$ and $K((x - X_j)/h_n)$ with $\sqrt{\mathrm{Var}(UK_n((x - W)/h_n))}$ and $K_n((x - W_j)/h_n)$, respectively, but this will not result in a valid confidence band even if $U_1, \dots, U_n$ were assumed to be known. The reason is that, in contrast to the error-free case, approximation to $\hat{g}(x) - g(x)$ by $\{f_X(x)nh_n\}^{-1}\sum_{j=1}^{n}U_jK_n((x - W_j)/h_n)$ is incorrect, which highlights one distinctive feature of nonparametric EIV regression. Hence, in the present paper, we develop a novel construction of the multiplier process (2.5) tailored to nonparametric EIV regression.

2.3. Examples. In this section, we present a couple of examples where an auxiliary sample from the measurement error distribution is available.

Example 2.1 (Repeated measurements or panel data, Carroll et al. (2006), p.298). Suppose that we observe repeated measurements or panel data on $X$ with measurement errors:

$$ W^{(1)} = X + \varepsilon^{(1)}, \quad W^{(2)} = X + \varepsilon^{(2)}, $$

where $X$ and $(\varepsilon^{(1)}, \varepsilon^{(2)})$ are independent, and the conditional distribution of $\varepsilon^{(2)}$ given $\varepsilon^{(1)}$ is symmetric. The distribution of $\varepsilon^{(1)}$ need not be symmetric (in particular, the distributions of $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ may be different), and independence between $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ is not necessary. If we define $W = (W^{(1)} + W^{(2)})/2$, $\varepsilon = (\varepsilon^{(1)} + \varepsilon^{(2)})/2$, and $\eta = (W^{(1)} - W^{(2)})/2 = (\varepsilon^{(1)} - \varepsilon^{(2)})/2$, then we have that

$$ W = X + \varepsilon, \quad \varepsilon \stackrel{d}{=} \eta, $$

where $\eta$ is observable.

For this panel data setup, Schennach (2004) proposes an alternative estimator of $g$ based on Kotlarski's lemma which does not require the symmetry assumption. The form of Schennach's estimator is more complex than ours, and to the best of our knowledge, there is no existing result on asymptotically valid uniform confidence bands for Schennach's estimator. It is worth noting that while Schennach's approach can drop the symmetry assumption, it requires another technical assumption that the characteristic function $\varphi_X(t) = E[e^{itX}]$ of $X$ does not vanish on the entire real line. Both Schennach (2004) and we (and in fact most papers on deconvolution and EIV regression) assume that the characteristic functions of the error variables do not vanish on $\mathbb{R}$, but our approach does allow $\varphi_X$ to take zeros.
The assumption that $\varphi_X$ does not vanish on $\mathbb{R}$ is not innocuous; it is non-trivial to find densities that are compactly supported and have non-vanishing characteristic functions (though these properties are not mutually exclusive; see, e.g., Schennach (2016), Footnote 4), and the assumption excludes densities convolved with distributions whose characteristic functions take zeros, and so on.$^{4}$ So, we believe that Schennach's approach and ours are complementary to each other.

$^{4}$ For example, convolutions of $k$ uniform densities on $[a, b]$ are piecewise polynomials with degrees $k - 1$, and convex combinations of such piecewise polynomials form a rich family of densities, but their characteristic functions take zeros.
$^{5}$ We thank Tatsushi Oka for pointing out this example.

Example 2.2 (Data combination$^{5}$). Suppose that we have access to data on $(Y, W)$ and $(W, X)$, separately, but do not have access to data on $(Y, X)$. This case is often faced by empirical

12 K. KATO AND Y. SASAKI
researchers, and various techniques are proposed to combine the two separate samples; see a survey by Ridder and Moffitt (2007). To fix ideas, consider the demand model $Y = g(X) + U$, where $Y$ denotes the quantity purchased of a product and $X$ denotes the logarithm of its price. Marketing scientists and economists often use Nielsen Homescan data for quantities and prices to analyze this demand model, but the home-scanned prices in this data are subject to imputation errors $\varepsilon = W - X$. To overcome this issue, Einav et al. (2010) collect data on $(W, X)$ from a large grocery retailer by matching transaction prices $X$ that were recorded by the retailer (at the store) to the prices $W$ recorded by the Homescan panelists. Together with Nielsen Homescan data on $(Y, W)$, Einav et al. suggest combining the two separate data sets to analyze the demand model. Specifically, we can construct a sample $\{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ from the two separate data on $(Y, W)$ and $(W, X)$.

In the literature, validation data are used as a way to relax the classical measurement error assumption that $X$ and $\varepsilon$ are independent; see, for example, Chen et al. (2005). While they allow for non-classical measurement errors, Chen et al. (2005) focus on the case where the parameter of interest is finite dimensional. It is worth noting that, when validation data on $(X, W)$ are available, the problem of estimation of $g$ can be considered as a nonparametric instrumental variable (NPIV) problem treating $X$ as an endogenous variable and $W$ as an instrumental variable (see, for example, Newey and Powell, 2003; Hall and Horowitz, 2005; Blundell et al., 2007; Chen and Reiss, 2011; Horowitz, 2011, for NPIV models). In fact, observe that $E[Y \mid W] = E[g(X) \mid W]$. For NPIV models, Horowitz and Lee (2012) and the more recent paper by Chen and Christensen (2015) develop methods to construct confidence bands for the structural function using series methods, although these papers do not formally consider cases where samples on $(Y, W)$ and $(X, W)$ are different.$^{6}$

$^{6}$ Babii (2016) also develops methods to construct confidence bands for Tikhonov regularized estimators in NPIV models, but his confidence bands are asymptotically conservative in the sense that the coverage probabilities are in general strictly larger than the nominal level even asymptotically.

However, we would like to point out that there are differences in underlying assumptions between series estimation of NPIV models and deconvolution kernel estimation in EIV regression. For example, in series estimation of NPIV models, it is often assumed that the distribution of $W$ is compactly supported and the density of $W$ is bounded away from zero on its support (cf. Blundell et al., 2007; Chen and Christensen, 2015). On the other hand, in EIV regression, it is commonly assumed that the characteristic function of the measurement error $\varepsilon$ is non-vanishing on $\mathbb{R}$ (which leads to identification of the function $g$ via (2.3)), and in many cases the measurement error $\varepsilon$ then has unbounded support, which in turn implies that $W$ has unbounded support. Further, while both NPIV and EIV regressions are statistical ill-posed inverse problems, the ways in which the ill-posedness is defined are different; in series estimation of NPIV models, the ill-posedness is defined for given basis functions, while in EIV regression, the ill-posedness is defined via how
fast the characteristic function of the measurement error distribution decays. Hence we believe that our inference results cover different situations than those developed in the NPIV literature.

3. Main results

In this section, we study asymptotic validity of the proposed confidence band (2.6). To this end, we make the following assumption. For any given constants $\beta, B > 0$, let $\Sigma(\beta, B)$ denote a class of functions defined by
\[ \Sigma(\beta, B) = \Big\{ f : \mathbb{R} \to \mathbb{R} : f \text{ is } k\text{-times differentiable}, \ |f^{(k)}(x) - f^{(k)}(y)| \le B |x - y|^{\beta - k}, \ \forall x, y \in \mathbb{R} \Big\}, \]
where $k$ is the integer such that $k < \beta \le k + 1$, and $f^{(k)}$ denotes the $k$-th derivative of $f$ ($f^{(0)} = f$). Let $I$ be a compact interval in $\mathbb{R}$.

Assumption 3.1. We assume the following conditions.
(i) $E[Y^4] < \infty$, the function $w \mapsto E[Y^2 \mid W = w] f_W(w)$ is bounded and continuous, and for each $\ell = 1, 2$, the function $w \mapsto E[|Y|^{2+\ell} \mid W = w] f_W(w)$ is bounded.
(ii) The functions $\varphi_X(t) = E[e^{itX}]$ and $\psi_X(t) = E[g(X) e^{itX}]$ for $t \in \mathbb{R}$ are integrable on $\mathbb{R}$.
(iii) The measurement error $\varepsilon$ has finite mean, $E[|\varepsilon|] < \infty$, and its characteristic function, $\varphi_\varepsilon(t) = E[e^{it\varepsilon}]$, $t \in \mathbb{R}$, does not vanish on $\mathbb{R}$. Furthermore, there exist constants $C_1 > 1$ and $\alpha > 0$ such that
\[ C_1^{-1} |t|^{-\alpha} \le |\varphi_\varepsilon(t)| \le C_1 |t|^{-\alpha}, \quad |\varphi_\varepsilon'(t)| \le C_1 |t|^{-\alpha - 1}, \quad \forall |t| \ge 1. \]
(iv) The functions $f_X$ and $g f_X$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Let $K$ be a real-valued integrable function (kernel) on $\mathbb{R}$, not necessarily non-negative, such that $\int_{\mathbb{R}} K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$. Furthermore, $\varphi_K$ is $(k + 3)$-times continuously differentiable with $\varphi_K^{(\ell)}(0) = 0$ for $\ell = 1, \dots, k$.
(vi) For all $x \in I$, $f_X(x) > 0$ and $E[\{Y - g(x)\}^2 \mid W = x] f_W(x) > 0$.
(vii)
\[ \frac{(\log(1/h_n))^2}{(n \wedge m) h_n^{2\alpha + 2}} \to 0, \quad \frac{n h_n \log(1/h_n)}{m} \to 0, \quad \text{and} \quad h_n^{\alpha + \beta} \sqrt{n h_n \log(1/h_n)} \to 0. \tag{3.1} \]

Condition (i) is a moment condition on $Y$, which we believe is not restrictive.
Note that, for each $\ell = 0, 1, 2$, if $E[|Y|^{2+\ell} \mid X, \varepsilon] = E[|Y|^{2+\ell} \mid X]$, then by comparing the Fourier transforms of both sides, we arrive at the identity $E[|Y|^{2+\ell} \mid W = w] f_W(w) = ((\Upsilon_\ell f_X) * f_\varepsilon)(w)$, where $\Upsilon_\ell(x) = E[|Y|^{2+\ell} \mid X = x]$, and the right-hand side is bounded and continuous if $\Upsilon_\ell f_X$ is bounded (which allows $\Upsilon_\ell$ to be unbounded globally). For Condition (ii), we first note that $\psi_X$ is the Fourier transform of $g f_X$ (which is integrable by $E[|Y|] < \infty$). Condition (ii) implies that $f_X$
and $g f_X$ are (continuous and) bounded, which in turn implies that $f_W(w) = \int_{\mathbb{R}} f_X(w - x) f_\varepsilon(x)\,dx$ is bounded and continuous. Condition (ii) is satisfied if, for example, $f_X$ and $g f_X$ are twice continuously differentiable with integrable derivatives up to the second order; in fact, under such conditions, $\varphi_X(t) = o(|t|^{-2})$ and $\psi_X(t) = o(|t|^{-2})$ as $|t| \to \infty$. However, differentiability of $f_X$ and $g f_X$ is not strictly necessary for Condition (ii) to hold; for example, a Laplace density is not differentiable but its Fourier transform is integrable.

Condition (iii) is concerned with the characteristic function of the measurement error. Note that finiteness of the first moment of $\varepsilon$ ensures that $\varphi_\varepsilon$ is continuously differentiable. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), we assume that the measurement error density is ordinary smooth, namely, $\varphi_\varepsilon(t)$ decays at most polynomially fast as $|t| \to \infty$ (cf. Fan, 1991a). Informally, the smoother $f_\varepsilon$ is, the faster $\varphi_\varepsilon(t)$ decays as $|t| \to \infty$, so Condition (iii) restricts smoothness of $f_\varepsilon$. Laplace and Gamma distributions, together with their convolutions, (suitable) mixtures, and symmetrizations$^{7}$, are typical examples of distributions satisfying Condition (iii), but normal and Cauchy distributions do not satisfy Condition (iii). Normal and Cauchy densities are examples of super-smooth densities, i.e., their characteristic functions decay exponentially fast as $|t| \to \infty$.$^{8}$

Condition (iv) is concerned with smoothness of the functions $f_X$ and $g$. Condition (v) is about a kernel function. By changes of variables, Condition (v) ensures that $\int_{\mathbb{R}} |x|^{k+1} |K(x)|\,dx < \infty$ and $\int_{\mathbb{R}} x^\ell K(x)\,dx = i^{-\ell} \varphi_K^{(\ell)}(0) = 0$ for $\ell = 1, \dots, k$, that is, $K$ is a $(k + 1)$-th order kernel (but we allow for the possibility that $\int_{\mathbb{R}} x^{k+1} K(x)\,dx = 0$).$^{9}$ Condition (vi) ensures that $\inf_{x \in I} f_X(x) > 0$ (since $f_X$ is continuous) and $\inf_{x \in I} E[\{Y - g(x)\}^2 \mid W = x] f_W(x) > 0$ (see the proof of Lemma A.4-(ii)). Note that since $g f_X$ is bounded, we have that $\|g\|_I \le \|g f_X\|_I / \inf_{x \in I} f_X(x) < \infty$.
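To make the distinction drawn in Condition (iii) concrete, the sketch below evaluates $|t|^{\alpha} |\varphi_\varepsilon(t)|$ with $\alpha = 2$ for a Laplace error with scale $b$, whose characteristic function is $1/(1 + b^2 t^2)$, and for a standard normal error, whose characteristic function is $e^{-t^2/2}$. The function names, the scale $b = 2^{-1/2}$, and the grid are illustrative choices, not part of the paper.

```python
import numpy as np

def laplace_cf(t, b):
    """Characteristic function of a Laplace(0, b) variable: 1 / (1 + b^2 t^2)."""
    return 1.0 / (1.0 + (b * t) ** 2)

def normal_cf(t):
    """Characteristic function of a standard normal variable: exp(-t^2 / 2)."""
    return np.exp(-0.5 * t ** 2)

t = np.linspace(1.0, 50.0, 500)      # Condition (iii) concerns |t| >= 1
alpha = 2.0
laplace_ratio = t ** alpha * laplace_cf(t, b=2 ** -0.5)
normal_ratio = t ** alpha * normal_cf(t)

# Laplace: the ratio stays inside a fixed band, as the two-sided bound requires.
# Normal: the ratio collapses to zero, so no polynomial lower bound can hold.
print(laplace_ratio.min(), laplace_ratio.max())
print(normal_ratio[-1])
```

For the Laplace case the ratio equals $2t^2/(2 + t^2)$, which lies in $[2/3, 2)$ for $|t| \ge 1$, so the two-sided bound holds with, e.g., $C_1 = 2$.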
$^{7}$ Recall that if a random variable $\eta$ has characteristic function $\varphi_\eta$, then $\eta - \eta'$ for an independent copy $\eta'$ of $\eta$ has characteristic function $|\varphi_\eta|^2$.
$^{8}$ Convolutions of ordinary smooth and super-smooth densities are super-smooth, but mixtures of ordinary smooth and super-smooth densities are ordinary smooth.
$^{9}$ In the simulation studies, we will use a flat-top kernel (McMurry and Politis, 2004), which is an infinite-order kernel.

It is worth mentioning that under these conditions, we have that $s_n^2(x) = \operatorname{Var}(\{Y - g(x)\} K_n((x - W)/h_n)) \sim h_n^{-2\alpha + 1}$ uniformly in $x \in I$ (see Lemma A.4), and the right-hand side is larger by a factor $h_n^{-2\alpha}$ than the corresponding term in the error-free case (recall that in standard kernel regression without measurement errors, the variance of $U K((x - X)/h_n)$ is $\sim h_n$). This results in slower rates of convergence of kernel regression estimators in the presence of measurement errors than those in the error-free case, and the value of $\alpha$ is a key parameter that controls the difficulty of estimating $g$,
namely, the larger the value of $\alpha$ is, the more difficult estimation of $g$ will be. In other words, the value of $\alpha$ quantifies the degree of ill-posedness of estimation of $g$.

Condition (vii) restricts the bandwidth $h_n$ and the sample size $m$ from the measurement error distribution. The second condition in (3.1) allows $m$ to be of smaller order than $n$, which in particular covers the panel data setup discussed in Example 2.1. The last condition in (3.1) means that we are choosing undersmoothing bandwidths, that is, choosing bandwidths that are of smaller order than optimal rates for estimation of $g$. Inspection of the proof of Theorem 3.1 shows that without the last condition in (3.1), we have that
\[ \|\hat g - g\|_I = O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\} + O(h_n^{\beta}), \]
where the $O(h_n^{\beta})$ term comes from the deterministic bias. So, choosing $h_n \sim (n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ optimizes the rate on the right-hand side, and the resulting rate of convergence of $\|\hat g - g\|_I$ is $O_P\{(n/\log n)^{-\beta/(2\alpha + 2\beta + 1)}\}$. The last condition in (3.1) requires choosing $h_n$ of smaller order than $(n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ (by log factors), so that the variance term dominates the bias term. We will later discuss the problem of bias after presenting the theorems (see Remark 3.3). For Condition (vii) to be non-void, we require $\beta > 1/2$.

We first state a theorem that establishes that, under Assumption 3.1, the distribution of $\|\hat Z_n\|_I = \sup_{x \in I} |\hat Z_n(x)|$, where $\{\hat Z_n(x) : x \in I\}$ is defined in (2.4), can be approximated by that of the supremum of a certain Gaussian process, which is a building block for proving validity of the proposed confidence band. Recall that a Gaussian process $\{Z(x) : x \in I\}$ indexed by $I$ is a tight random variable in $\ell^\infty(I)$ if and only if $I$ is totally bounded for the intrinsic pseudo-metric $\rho_2(x, y) = \sqrt{E[\{Z(x) - Z(y)\}^2]}$ for $x, y \in I$, and $Z$ has sample paths almost surely uniformly $\rho_2$-continuous; see van der Vaart and Wellner (1996, p.41).

Theorem 3.1 (Gaussian approximation).
Under Assumption 3.1, for each sufficiently large $n$, there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, and such that as $n \to \infty$,
\[ \sup_{z \in \mathbb{R}} \big| P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\} \big| \to 0. \tag{3.2} \]

Theorem 3.1 derives an intermediate Gaussian approximation to the process $\hat Z_n$, in the sense that the approximating Gaussian process $Z_n^G$ depends on the sample size $n$. It could be possible to further show that, if $I$ is not a singleton, under additional conditions, for some sequences $a_n > 0$ and $b_n$, $a_n(\|Z_n^G\|_I - b_n)$ converges in distribution to a Gumbel distribution. However, while it is mathematically intriguing, we avoid using the Gumbel approximation, since 1) the Gumbel approximation is slow and the coverage error of the resulting confidence band is of order $1/\log n$ (see Hall, 1991), and 2) deriving the Gumbel approximation would require additional restrictive conditions on the measurement error distribution. For example, in a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) derive a
Gumbel approximation to the supremum deviation of the deconvolution kernel density estimator, thereby establishing a Smirnov-Bickel-Rosenblatt type theorem (Smirnov, 1950; Bickel and Rosenblatt, 1973) for the deconvolution kernel density estimator. But to do so, they require more restrictive conditions on the measurement error distribution than those in the present paper (see their Assumption 2).

The following theorem shows asymptotic validity of the proposed confidence band.

Theorem 3.2 (Validity of multiplier bootstrap confidence band). Under Assumption 3.1, as $n \to \infty$,
\[ \sup_{z \in \mathbb{R}} \big| P\{\|\hat Z_n^{\xi}\|_I \le z \mid \mathcal{D}_n\} - P\{\|Z_n^G\|_I \le z\} \big| \stackrel{P}{\to} 0, \tag{3.3} \]
where $Z_n^G$ is a Gaussian random variable in $\ell^\infty(I)$ given in Theorem 3.1. Therefore, for the confidence band $\hat C_{1-\tau}$ defined in (2.6), we have as $n \to \infty$,
\[ P\{ g(x) \in \hat C_{1-\tau}(x) \ \forall x \in I \} = 1 - \tau + o(1). \tag{3.4} \]
Finally, the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\}$.

Remark 3.1. Inspection of the proof shows that the result (3.4) holds even when $\tau = \tau_n \to 0$ as $n \to \infty$. Furthermore, the supremum width of the band is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n) \vee \log(1/\tau_n)}\}$.

Remark 3.2. If we take $h_n = v_n (n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ for $v_n \sim (\log n)^{-1}$, then the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{(n/\log n)^{-\beta/(2\alpha + 2\beta + 1)} (\log n)^{\alpha + 1/2}\}$.

Remark 3.3 (Bias). For any nonparametric inference problem, how to deal with the deterministic bias is a delicate and difficult problem. See Section 5.7 in Wasserman (2006) for related discussions. In the present paper, we employ undersmoothing bandwidths so that the bias is negligible relative to the variance part. An alternative approach is to estimate the bias at each point, and construct a bias-corrected confidence band. See, for example, Eubank and Speckman (1993) and Xia (1998) for the error-free case.$^{10}$ However, in EIV regression, estimation of the bias is not quite attractive for a couple of reasons. First, the bias consists of higher order derivatives of $g$ and $f_X$, and estimation of these higher order derivatives is difficult, especially in the EIV case.
This is because estimation of $g$ and $f_X$ is an ill-posed inverse problem and rates of convergence of the derivative estimators of $g$ and $f_X$ are even slower than those in the error-free case.

$^{10}$ More recent discussions regarding the problem of bias in nonparametric inference problems include Hall and Horowitz (2013), Chernozhukov et al. (2014b), Armstrong and Kolesár (2014), Calonico et al. (2015), and Schennach (2015). These papers do not cover EIV regression.

Second, one of the popular kernels used in EIV regression and deconvolution is a flat-top kernel (McMurry and Politis, 2004), which is an infinite-order kernel, and if we use a flat-top kernel, then the bias
is not calculated in a closed form.$^{11}$ See Remark 1 in Bissantz et al. (2007) for a related issue in the deconvolution case.

Remark 3.4 (Super-smooth case). In the present paper, we focus on the case where the measurement error density is ordinary smooth, similarly to Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and nonparametric EIV regression. If the measurement error density is super-smooth, i.e., its characteristic function decays exponentially fast as $|t| \to \infty$, then 1) in view of the pointwise asymptotic normality result in Fan and Masry (1992), the asymptotic behavior of the variance function $s_n^2(x)$ is much more complex; 2) minimax rates of convergence for estimation of $g$ under the sup-norm loss are logarithmically slow (i.e., of the form $(\log n)^{-c}$ for some constant $c > 0$), even when the measurement error distribution is assumed to be known (Fan and Truong, 1993). These difficulties prevent us from directly extending our analysis to the super-smooth case. Hence the super-smooth case is left for future research.

The proofs of Theorems 3.1 and 3.2 build on non-trivial applications of the intermediate Gaussian and multiplier bootstrap approximation theorems developed in Chernozhukov et al. (2014a,b, 2016). However, we stress that Theorems 3.1 and 3.2 do not follow directly from the general theorems in Chernozhukov et al. (2014a,b, 2016) and require substantial work. This is because 1) first of all, how to devise a multiplier bootstrap in EIV regression is not apparent, and as discussed in Remark 2.1 our construction of the multiplier process appears to be novel; 2) the population deconvolution kernel $K_n$ is implicitly defined via the Fourier inversion and is substantially different from standard kernels in the error-free case; and 3) the deconvolution kernel $K_n$ is in fact unknown and estimated, so that its estimation error has to be taken into account.

An alternative standard technique to derive Gaussian approximations similar to (3.2) is to apply the Komlós-Major-Tusnády (KMT) strong approximation (Komlós et al., 1975).
In a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) (and Schmidt-Hieber et al. (2013)) use the KMT approximation to derive Gaussian approximations to the deconvolution kernel density estimator. However, the KMT approximation is tailored to empirical processes indexed by univariate functions and hence is not applicable to our problem. Alternatively, we can use Rio's coupling (see Rio, 1994), but to apply Rio's coupling, we would have to assume (at least) that $Y$ is bounded (rather than having a finite fourth moment) and $K_n$ has total variation of order $h_n^{-\alpha}$ (which requires additional conditions on the measurement error distribution). By employing the techniques developed in Chernozhukov et al. (2014a,b, 2016), we are able to avoid such restrictive conditions.

$^{11}$ For example, Schennach (2004) and Bissantz et al. (2007) use flat-top kernels in their simulation studies.

4. Bandwidth selection

The theory developed in the previous section prescribes admissible rates for the bandwidth $h_n$ that require undersmoothing. The literature provides data-driven approaches to bandwidth selection, which typically aim at minimizing the MISE (cf. Delaigle and Hall, 2008). These data-driven approaches tend to yield non-undersmoothing rates for the bandwidth, and are contrary to our requirements. In this light, we propose here a novel alternative approach to bandwidth selection. To emphasize the dependence on an arbitrary candidate bandwidth $h > 0$, write
\[ s_n^2(x; h) = \operatorname{Var}(\{Y - g(x)\} K_n((x - W)/h; h)), \]
\[ A_n(x; h) = E[\{Y - g(x)\} K_n((x - W)/h; h)], \quad \text{and} \]
\[ K_n(x; h) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\varphi_\varepsilon(t/h)}\,dt. \]
Note that $A_n(x) = A_n(x; h_n)$, $s_n^2(x) = s_n^2(x; h_n)$, and $K_n(x) = K_n(x; h_n)$. An optimal choice of $h$ (ignoring the log factor) balances the uniform squared bias $\|A_n^2(\cdot; h)\|_I$ and the uniform variance $\|s_n^2(\cdot; h)/n\|_I$, i.e., it solves
\[ \frac{\partial}{\partial h} \|A_n^2(\cdot; h)\|_I = -\frac{\partial}{\partial h} \|s_n^2(\cdot; h)/n\|_I. \]
A natural way of undersmoothing is to choose the smallest $h > 0$ such that
\[ c_n \frac{\partial}{\partial h} \|A_n^2(\cdot; h)\|_I \ge -\frac{\partial}{\partial h} \|s_n^2(\cdot; h)/n\|_I \]
for some $c_n > 1$ where $c_n$ is increasing in $n$. We will try alternative sequences $\{c_n\}_{n=1}^{\infty}$ in the subsequent simulation studies to recommend practical choices.

In practice, we do not know $g$ or the distribution of $(Y, X)$. For the function $g$ to be used for bandwidth selection, we use a polynomial regression $\tilde g$ under EIV, e.g., $\tilde g(x) = \tilde g_0 + \tilde g_1 x$ where
\[ \begin{pmatrix} \tilde g_0 \\ \tilde g_1 \end{pmatrix} = \begin{pmatrix} 1 & n^{-1} \sum_{j=1}^{n} W_j \\ n^{-1} \sum_{j=1}^{n} W_j & n^{-1} \sum_{j=1}^{n} W_j^2 - m^{-1} \sum_{j=1}^{m} \eta_j^2 \end{pmatrix}^{-1} \begin{pmatrix} n^{-1} \sum_{j=1}^{n} Y_j \\ n^{-1} \sum_{j=1}^{n} W_j Y_j \end{pmatrix}. \]
A polynomial of degree three will be employed throughout in the simulation studies. We make a grid $0 < h_{n,1} < \dots < h_{n,J}$ of candidate bandwidths, and then choose $h_{n,j}$ with the smallest $j \in \{2, \dots, J\}$ such that
\[ c_n \big( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \big) \ge -\big( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \big), \]
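where
\[ \hat s_n^2(x; h) = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - \tilde g(x)\}^2 \hat K_n^2((x - W_j)/h; h) - \hat A_n^2(x; h), \]
\[ \hat A_n(x; h) = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - \tilde g(x)\} \hat K_n((x - W_j)/h; h), \quad \text{and} \]
\[ \hat K_n(x; h) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h)}\,dt. \]
Because we use the finite sample estimates, neither $\big( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \big)$ nor $-\big( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \big)$ needs to be monotone in the index $j$ in general. As such, we monotonize these differences of the estimates in the following manner. Let $\hat A_{n,j} = \big( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \big)$ and $\hat s^2_{n,j} = -\big( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \big)$. The monotonization algorithm executes the following assignments in the increasing order of $j$:
\[ \hat A_{n,j+1} := \begin{cases} \hat A_{n,j} & \text{if } \hat A_{n,j} > \hat A_{n,j+1} \\ \hat A_{n,j+1} & \text{if } \hat A_{n,j} \le \hat A_{n,j+1} \end{cases} \quad \text{and} \quad \hat s^2_{n,j+1} := \begin{cases} \hat s^2_{n,j} & \text{if } \hat s^2_{n,j} < \hat s^2_{n,j+1} \\ \hat s^2_{n,j+1} & \text{if } \hat s^2_{n,j} \ge \hat s^2_{n,j+1}. \end{cases} \]

Remark 4.1. The above guide to bandwidth selection applies to the case of $\alpha > 1/2$. We could accommodate the case of $\alpha \le 1/2$ if we modify this method by replacing $s_n^2(x; h)$, $A_n(x; h)$, $\hat s_n^2(x; h)$ and $\hat A_n(x; h)$ by $s_n^2(x; h)/h^2$, $A_n(x; h)/h$, $\hat s_n^2(x; h)/h^2$ and $\hat A_n(x; h)/h$, respectively. We implemented simulation studies under both of these two alternative methods of bandwidth selection, and found that the method described above shows superior performance in terms of the distance between nominal and simulated coverage probabilities for the data generating models that we consider. Therefore, we only suggest the method which we describe above, and present simulation studies below only for this version of the bandwidth selection rule.

5. Simulation studies

5.1. Simulation Framework. We consider two data generating models, reflecting two common patterns of data availability. For the first model, the data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^{n}$ is constructed by

Model 1:
\[ \begin{aligned} Y_j &= g(X_j) + U_j, & X_j &\sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1), \\ W_j &= X_j + \varepsilon_j, & \varepsilon_j &\stackrel{d}{=} \eta_j \sim \mathrm{Laplace}(0, 2^{-1/2}), \end{aligned} \]
for $j = 1, \dots, n$, where the primitive latent variables, $X_j$, $U_j$, $\varepsilon_j$, and $\eta_j$, are mutually independent. The characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/2)^{-1}$, which is non-vanishing on $\mathbb{R}$ and ordinary smooth of order $\alpha = 2$. The signal-to-noise ratio is $\sqrt{\operatorname{Var}(X)}/\sqrt{\operatorname{Var}(\varepsilon)} = \sigma_X$.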
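The following Python sketch draws data from Model 1 and computes the moment-corrected linear pilot fit $(\tilde g_0, \tilde g_1)$ of Section 4, with an ordinary least squares fit of $Y$ on $W$ for comparison. It is only an illustration under stated assumptions: the function names, the seed, the sample sizes, and the choice $g(x) = x$ are hypothetical, and the paper itself uses a degree-three polynomial pilot in its simulations.

```python
import numpy as np

def simulate_model1(n, m, sigma_x, g, rng):
    """Draw (Y, W) of size n and an auxiliary error sample eta of size m from Model 1."""
    b = 2 ** -0.5                                  # Laplace scale, so Var(eps) = 2 b^2 = 1
    x = rng.normal(0.0, sigma_x, size=n)
    u = rng.normal(0.0, 1.0, size=n)
    eps = rng.laplace(0.0, b, size=n)
    eta = rng.laplace(0.0, b, size=m)              # independent draws with the law of eps
    return g(x) + u, x + eps, eta

def linear_pilot(y, w, eta):
    """Moment-corrected linear fit: subtracts mean(eta^2) from the second moment of W."""
    w_bar = w.mean()
    moment_matrix = np.array([[1.0, w_bar],
                              [w_bar, np.mean(w ** 2) - np.mean(eta ** 2)]])
    rhs = np.array([y.mean(), np.mean(w * y)])
    return np.linalg.solve(moment_matrix, rhs)     # (intercept, slope)

rng = np.random.default_rng(12345)
y, w, eta = simulate_model1(n=20000, m=20000, sigma_x=2.0, g=lambda x: x, rng=rng)
g0, g1 = linear_pilot(y, w, eta)
naive_slope = np.polyfit(w, y, 1)[0]               # least squares of Y on W, no correction
```

With $\sigma_X = 2$ the uncorrected slope is attenuated toward $\operatorname{Var}(X)/\operatorname{Var}(W) = 4/5$, while the corrected slope `g1` should be close to one in large samples.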

For the second model, we consider the following repeated measurement or panel data setup.

Model 2:
\[ \begin{aligned} Y_j &= g(X_j) + U_j, & X_j &\sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1), \\ W_j^{(1)} &= X_j + \varepsilon_j^{(1)}, & \varepsilon_j^{(1)} &\sim \mathrm{Laplace}(0, 2^{-1}), \\ W_j^{(2)} &= X_j + \varepsilon_j^{(2)}, & \varepsilon_j^{(2)} &\sim \mathrm{Laplace}(0, 2^{-1}), \end{aligned} \]
for $j = 1, \dots, n$, where the primitive latent variables, $X_j$, $U_j$, $\varepsilon_j^{(1)}$, and $\varepsilon_j^{(2)}$, are mutually independent. We observe $\{(Y_j, W_j^{(1)}, W_j^{(2)})\}_{j=1}^{n}$. By defining $W_j := (W_j^{(1)} + W_j^{(2)})/2$ and $\eta_j := (W_j^{(1)} - W_j^{(2)})/2$, we obtain the generated data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^{n}$ such that $W_j = X_j + \varepsilon_j$ with $\varepsilon_j = (\varepsilon_j^{(1)} + \varepsilon_j^{(2)})/2 \stackrel{d}{=} \eta_j$. For Model 2, the characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/16)^{-2}$, which is non-vanishing on $\mathbb{R}$ and ordinary smooth with order $\alpha = 4$. The signal-to-noise ratio is given by $\sqrt{\operatorname{Var}(X)}/\sqrt{\operatorname{Var}(\varepsilon)} = 2\sigma_X$.

Simulations are run across five different specifications of $g$, and alternative values of the signal-to-noise ratio $\sigma_X \in \{2, 4\}$. The five specifications of $g$ are $g(x) = x$, $g(x) = x^2$, $g(x) = x^3$, $g(x) = \sin(x)$, and $g(x) = \cos(x)$. We use Monte Carlo simulations to evaluate the coverage probabilities of our confidence bands for $g$ on the interval $I = [-\sigma_X, \sigma_X]$. We use the kernel function $K$ defined by its Fourier transform $\varphi_K$ given by
\[ \varphi_K(t) = \begin{cases} 1 & \text{if } |t| \le c \\ \exp\left\{ \dfrac{-b \exp(-b/(|t| - c)^2)}{(|t| - 1)^2} \right\} & \text{if } c < |t| < 1 \\ 0 & \text{if } 1 \le |t| \end{cases} \]
where $b = 1$ and $c = 0.05$ (cf. McMurry and Politis, 2004; Bissantz et al., 2007). The function $\varphi_K$ is infinitely differentiable with support $[-1, 1]$, and its inverse Fourier transform $K$ is real-valued and integrable with $\int_{\mathbb{R}} K(x)\,dx = 1$. We follow the bandwidth selection rule discussed in Section 4. In this simulation study, we try alternative sequences $\{c_n\}_{n=1}^{\infty}$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$.

5.2. Simulation Results. Tables 1, 2, 3, 4, and 5 show simulation results for $g(x) = x$, $x^2$, $x^3$, $\sin(x)$, and $\cos(x)$, respectively. Each table contains results for each of Model 1 and Model 2, for each of the three sample sizes $n = 250$, $500$, and $1000$, and for each of $\sigma_X = 2.0$ and $4.0$ that controls the signal-to-noise ratio.
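The Fourier transform $\varphi_K$ of the flat-top kernel specified in Section 5.1 can be sketched in Python as follows. The function name `phi_K` and the evaluation points are illustrative; the defaults $b = 1$ and $c = 0.05$ are the values stated in the text.

```python
import numpy as np

def phi_K(t, b=1.0, c=0.05):
    """Fourier transform of the flat-top kernel: 1 on [-c, c], 0 outside (-1, 1)."""
    a = np.abs(np.atleast_1d(np.asarray(t, dtype=float)))
    out = np.zeros_like(a)
    out[a <= c] = 1.0
    mid = (a > c) & (a < 1.0)                      # smooth transition region
    am = a[mid]
    out[mid] = np.exp(-b * np.exp(-b / (am - c) ** 2) / (am - 1.0) ** 2)
    return out

vals = phi_K(np.array([0.0, 0.05, 0.5, 1.0, -2.0]))
grid = np.linspace(0.0, 1.0, 201)
profile = phi_K(grid)                              # values on [0, 1]
```

The function equals one on $[-c, c]$, vanishes for $|t| \ge 1$, and is non-increasing in $|t|$ in between, so the deconvolution integral only involves $\varphi_\varepsilon(t/h)$ for $|t| < 1$.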
Simulated coverage probabilities are reported for each of the three nominal coverage probabilities, 0.800, 0.900, and 0.950. In all the cases, simulated coverage probabilities are reasonably close to the designed nominal coverage probabilities for large sample sizes. In particular, the results for polynomial specifications exhibit a very high coverage accuracy. The high performance for the polynomial specifications may well be attributed to our method of bandwidth selection, which relies on a preliminary polynomial regression under EIV. However, it is notable that the coverage accuracy is reasonably high even for non-polynomial periodic functions like $g(x) = \sin(x)$ and $g(x) = \cos(x)$.
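For reference, a simulated uniform coverage probability of the kind reported above can be computed as the fraction of Monte Carlo replications in which the band contains the true function at every grid point of $I$. The sketch below assumes the band limits have already been computed; the function name `uniform_coverage` and the toy arrays are hypothetical and are not the paper's simulation output.

```python
import numpy as np

def uniform_coverage(lower, upper, truth):
    """Fraction of replications whose band contains `truth` at every grid point."""
    lower, upper, truth = map(np.asarray, (lower, upper, truth))
    inside = (lower <= truth) & (truth <= upper)   # shape (replications, grid points)
    return inside.all(axis=1).mean()

# Toy input: 3 replications, 4 grid points; the second band misses the truth at one point.
truth = np.array([0.0, 1.0, 2.0, 3.0])
lower = truth - np.array([[1.0], [1.0], [0.5]])
upper = truth + np.array([[1.0], [1.0], [0.5]])
upper[1, 2] = 1.5                                  # this band excludes truth[2] = 2.0
coverage = uniform_coverage(lower, upper, truth)
```

A single excursion anywhere on the grid counts as a miss for that replication, which is what distinguishes uniform coverage from pointwise coverage.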

There seems to be no systematic pattern as to which of the alternative sequences $\{c_n\}_{n=1}^{\infty}$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$ tends to yield better coverage results. As such, we recommend the intermediate choice $c_n = (n/100)^{0.3}$ as a practical guideline.

6. Real data analysis

According to the Centers for Disease Control and Prevention (CDC) of the US Department of Health and Human Services, more than one-third (36.5%) of US adults had obesity (defined by body mass index or BMI > 30) in the period between 2011 and 2014 (Ogden et al., 2015). The estimated annual medical cost of obesity in the United States was 147 billion 2008 U.S. dollars, with the medical costs for people who are obese being \$1,429 higher than those of normal weight (Finkelstein et al., 2009). While there is an extensive body of literature on cost estimation of obesity, it is a limitation that commonly used data sets contain only self-reported body measures, and hence the values of BMI generated from them are prone to biases (Bound et al., 2001). More recently, Cawley and Meyerhoefer (2012) use the instrumental variable approach to address this issue in cost estimation of obesity. In this section, we employ our data combination approach to treat the self-reporting errors, and draw confidence bands for nonparametric regressions of medical costs on BMI. We focus on costs measured by medical expenditures. With this said, we note that there are also indirect costs of obesity which we do not account for, e.g., the costs of obesity are known to be passed on to obese workers with employer-sponsored health insurance in the form of lower cash wages and labor market discrimination against obese job seekers by insurance-providing employers (Bhattacharya and Bundorf, 2009); see also Cawley (2004).

Details of the two data sets which we combine are as follows. The National Health and Nutrition Examination Survey (NHANES) of CDC contains data of survey responses, medical examination results, and laboratory test results. The survey responses include demographic characteristics, such as gender and age.
In addition to the demographic characteristics, the survey responses also contain self-reported body measures and self-reported health conditions. Among the self-reported body measures are height in inches and weight in pounds. These two variables allow us to construct the BMI in lbs/in$^2$ as a generated variable. We convert this unit into the metric unit (kg/m$^2$). The NHANES also contains medical examination results, including clinically measured BMI in kg/m$^2$. We treat the BMI constructed from the self-reported body measures as $W_j$, and the clinically measured BMI as $X_j$. From the NHANES as a validation data set of size $m$, we can compute $\eta_j = W_j - X_j$ for each $j = 1, \dots, m$.

The Panel Study of Income Dynamics (PSID) is a longitudinal panel survey of American families conducted by the Survey Research Center at the University of Michigan. This data set contains a long list of variables including demographic characteristics, socio-economic attributes, expenses, and health conditions, among others. In particular, the PSID contains self-reported body measures of the household head, including height in inches and weight in pounds. These
two variables allow us to construct the body mass index (BMI) in lbs/in$^2$ as a generated variable. Again, we convert this unit into the metric unit (kg/m$^2$). The PSID also contains medical and prescription expenses. We treat the BMI constructed from the self-reported body measures as $W_j$, and the medical and prescription expenses as $Y_j$. We note that the information contained in the PSID is mostly at the household level, as opposed to the individual level, and thus $Y_j$ indicates the total medical and prescription expenses of household $j$. To focus on the individual medical and prescription expenses rather than household expenses, we only consider the subsample of the households of single men with no dependent family, for which the total medical and prescription expenses of the household equal the individual medical and prescription expenses of the household head. Hence, the reported regression results concern these selected subpopulations.

Combining the NHANES of size $m$ and the PSID of size $n$, we obtain the generated data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ to which we can apply our method in order to draw confidence bands for the regression function $g$ of the model $Y = g(X) + U$ with $E[U \mid X, \varepsilon] = 0$. We set $I = [15, 35]$ as the interval on which we draw confidence bands. This interval $I$ has 25 (the WHO cut-off point for overweight) as the midpoint, and is contained in the convex hull of the empirical support of $W$. The kernel function and the bandwidth rule carry over from our simulation studies. The sequence $\{c_n\}_{n=1}^{\infty}$ used for bandwidth choice is defined by $c_n = (n/100)^{0.3}$ following the recommendation which we made from our simulation results. To account for the different medical conditions across ages, we categorize the sample into the following subsamples: (a) male individuals aged 20-34, (b) male individuals aged 35-49, (c) male individuals aged 50-64, and (d) male individuals aged 65 or above.
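The unit conversion and the validation differences $\eta_j = W_j - X_j$ described above amount to the following small computation. This is a sketch only: the function names and the numerical inputs are made up for illustration and are not taken from the NHANES or the PSID.

```python
LB_TO_KG = 0.45359237      # exact definition of the avoirdupois pound in kilograms
IN_TO_M = 0.0254           # exact definition of the inch in metres

def bmi_from_imperial(weight_lb, height_in):
    """BMI in kg/m^2 from weight in pounds and height in inches."""
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2

def validation_errors(self_reported_bmi, clinical_bmi):
    """eta_j = W_j - X_j for each record of a validation sample."""
    return [w - x for w, x in zip(self_reported_bmi, clinical_bmi)]

w = bmi_from_imperial(180.0, 70.0)            # made-up self-reported weight and height
eta = validation_errors([w], [26.5])          # 26.5 is a made-up clinical BMI
```

Equivalently, a BMI expressed in lbs/in$^2$ is multiplied by approximately 703.07 to obtain kg/m$^2$.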
Note that this stratification takes into account the fact that 64 and 65 mark the cutoff of Medicare eligibility, and hence that group (d) faces different expenditure schedules and different economic incentives of health care utilization from groups (a)-(c); see Card et al. (2008). After deleting observations with missing fields from the NHANES 2009-2010, we obtain the following sample sizes of these four subsamples: (a) $m = 407$, (b) $m = 435$, (c) $m = 407$, and (d) $m = 431$. After deleting observations with missing fields from the PSID 2009 for total medical expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 413$, (b) $n = 181$, (c) $n = 180$, and (d) $n = 64$. Similarly, after deleting observations with missing fields from the PSID 2009 for prescription expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 528$, (b) $n = 243$, (c) $n = 247$, and (d) $n = 106$. Note that we use similar survey periods around 2009 for both the NHANES and PSID to remove potential time effects.

Figure 1 displays estimates and confidence bands for total medical expenses in 2009 US dollars as the dependent variable. Figure 2 similarly displays estimates and confidence bands for prescription expenses in 2009 US dollars as the dependent variable. In both figures, the estimates
are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands. The four parts of the figure represent (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. We see that the levels of both total medical expenses and prescription expenses tend to increase in age, as expected. For the groups (a)-(b) of young men, both total medical expenses and prescription expenses exhibit little partial correlation with BMI. For the group (c) of middle-aged men, on the other hand, the relations turn into positive ones. For the group (d) of senior men, total medical expenses and BMI continue to have a positive relationship, but prescription expenses exhibit little partial correlation with BMI.

If we look at the 90% confidence band for the group (c) of men aged from 50 to 64, annual average total medical expenses are approximately \$5,399-\$17,015 if BMI = 20, approximately \$7,316-\$18,119 if BMI = 25, and approximately \$7,868-\$21,934 if BMI = 30. Likewise, annual average prescription expenses are approximately \$283-\$636 if BMI = 20, approximately \$372-\$761 if BMI = 25, and approximately \$429-\$951 if BMI = 30. These concrete numbers illustrate that confidence bands are useful for making interval predictions of incurred average costs, and this convenient feature adds practical value to the existing methods, which only allow for reporting estimates with unknown extents of uncertainty.

7. Extensions

7.1. Application to specification testing. The results of the present paper can be used for specification testing of the regression function $g$. Specification testing in EIV models is important since nonparametric estimation of a regression function has slow rates of convergence, even slower than standard error-free nonparametric regression, while correct specification of a parametric model enables us to estimate the regression function with faster rates, often of order $1/\sqrt{n}$.
Suppose that we want to test whether the regression function $g$ belongs to a parametric class $\{g_\theta : \theta \in \Theta\}$ where $\Theta$ is a subset of a metric space (in most cases a Euclidean space). Popular specifications of $g$ include linear and polynomial functions. In cases where $g$ is linear or polynomial, it is possible to estimate the coefficients with $\sqrt{n}$-rate under suitable regularity conditions (Fuller, 1987; Chan and Mak, 1985; Hausman et al., 1991; Cheng and Schneeweiss, 1998). Suppose now that $g = g_\theta$ for some $\theta \in \Theta$ and $\theta$ can be estimated by $\hat\theta$ with a sufficiently fast rate, i.e., $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha} \{n h_n \log(1/h_n)\}^{-1/2}\}$, and that Assumption 3.1 is satisfied with $g = g_\theta$. Then it is not difficult to see from the proof of Theorem 3.2 that
\[ \begin{aligned} \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_{\hat\theta}(x))}{\hat s_n(x)} &= \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_\theta(x))}{\hat s_n(x)} + \frac{\hat f_X(x) \sqrt{n} h_n (g_\theta(x) - g_{\hat\theta}(x))}{\hat s_n(x)} \\ &= \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_\theta(x))}{\hat s_n(x)} + o_P\{(\log(1/h_n))^{-1/2}\}, \end{aligned} \]
uniformly in $x \in I$, so that
\[ P\{ g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x) \text{ for some } x \in I \} \to \tau. \]
K. KATO AND Y. SASAKI

Therefore, the test that rejects the hypothesis that $g = g_\theta$ for some $\theta \in \Theta$ if $g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x)$ for some $x \in I$ is asymptotically of level $\tau$. We summarize the above discussion as a corollary.

Corollary 7.1. Suppose that $g = g_{\theta_0}$ for some $\theta_0 \in \Theta$ where $\Theta$ is a subset of a metric space, and that Assumption 3.1 is satisfied with $g = g_{\theta_0}$. Let $\hat\theta$ be any estimator of $\theta_0$ such that $\|g_{\hat\theta} - g_{\theta_0}\|_I = o_P\{h_n^{-\alpha}\{n h_n \log(1/h_n)\}^{-1/2}\}$; then $P\{g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x) \text{ for some } x \in I\} \to \tau$.

Remark 7.1 (Literature on specification testing in EIV regression). The literature on specification testing for EIV regression is large. See Zhu et al. (2003), Zhu and Cui (2005), Hall and Ma (2007), Song (2008), Otsu and Taylor (2016), and references therein. However, none of those papers considers $L^\infty$-based specification tests.

7.2. Additional regressors without measurement errors. In practical applications, we may have additional regressors $Z$, possibly vector valued, without measurement errors. Suppose that we are interested in estimation of and making inference on $g(x, z) = E[Y \mid X = x, Z = z]$. We assume that $E[Y - g(X, Z) \mid X, Z, \varepsilon] = 0$, and $\varepsilon$ is independent from $X$ conditionally on $Z$. In principle, the analysis can be reduced to the case where there are no additional regressors by conditioning on $Z = z$. If $Z$ is discretely distributed with finitely many mass points, then $g(x, z)$, where $z$ is a mass point, can be estimated by using only observations $j$ for which $Z_j = z$. If $Z$ is continuously distributed, then $g(x, z)$ can be estimated by using observations $j$ for which $Z_j$ is close to $z$, which can be implemented by using kernel weights. However, the detailed analysis of this case is not presented here for brevity.

7.3. Confidence bands for conditional distribution functions. The techniques used to derive confidence bands for the conditional mean in EIV regression can be extended to the conditional distribution function.
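For the specification test of Section 7.1, the decision rule in Corollary 7.1 amounts to checking whether the fitted parametric curve leaves the confidence band anywhere on $I$. The following is only a minimal sketch of that rule on a finite grid, assuming the band endpoints have already been computed. The function name `specification_test` and all numbers are hypothetical illustrations and do not come from the paper.

```python
import numpy as np

def specification_test(g_param, lower, upper):
    """Return True (reject) if the parametric fit exits the band at any grid point.

    g_param, lower, upper are arrays evaluated on the same grid of x values.
    The grid is a finite approximation of the interval I (an assumption).
    """
    g_param = np.asarray(g_param, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    outside = (g_param < lower) | (g_param > upper)
    return bool(outside.any())

# Hypothetical band centred on x^2 with constant half-width 0.5.
x_grid = np.linspace(-2.0, 2.0, 5)
band_center = x_grid ** 2
lower = band_center - 0.5
upper = band_center + 0.5

linear_fit = np.ones_like(x_grid)   # hypothetical fitted linear model
quadratic_fit = x_grid ** 2         # hypothetical fitted quadratic model

reject_linear = specification_test(linear_fit, lower, upper)
reject_quadratic = specification_test(quadratic_fit, lower, upper)
```

In this toy setup the constant fit leaves the band at the endpoints of the grid, so it is rejected, while the quadratic fit stays inside the band everywhere.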
Suppose now that we are interested in constructing confidence bands for the conditional distribution function $g(y, x) = P(Y \le y \mid X = x)$ on a compact rectangle $J \times I$ where $J$ and $I$ are compact intervals, and where we do not observe $X$ but instead observe $W = X + \varepsilon$ with $\varepsilon$ (measurement error) being independent of $(Y, X)$. As before, we assume that in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ on $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution. Since $g(y, x) = E[1(Y \le y) \mid X = x]$ where $1(\cdot)$ denotes the indicator function, we may estimate $g(y, x)$ by $\hat g(y, x) = \hat\mu(y, x)/\hat f_X(x)$, where

$$
\hat\mu(y, x) = \frac{1}{n h_n} \sum_{j=1}^{n} 1(Y_j \le y)\, \hat K_n((x - W_j)/h_n).
$$

To construct a confidence band for $g(y, x)$, we apply the methodology developed in Section 2 with $Y_j$ replaced by $1(Y_j \le y)$ for each $y$. Let

$$
\hat s_n^2(y, x) = \frac{1}{n} \sum_{j=1}^{n} \{1(Y_j \le y) - \hat g(y, x)\}^2 \hat K_n^2((x - W_j)/h_n),
$$

and generate independent standard normal random variables $\xi_1, \dots, \xi_n$ independent of the data $\mathcal{D}_n$. Consider the multiplier stochastic process

$$
\hat Z_n^{\xi}(y, x) = \frac{1}{\hat s_n(y, x)\sqrt{n}} \sum_{j=1}^{n} \xi_j \{1(Y_j \le y) - \hat g(y, x)\}\, \hat K_n((x - W_j)/h_n),
$$

and for $\tau \in (0, 1)$, let

$$
\hat c_n(1-\tau) = \text{conditional } (1-\tau)\text{-quantile of } \|\hat Z_n^{\xi}\|_{J \times I} \text{ given } \mathcal{D}_n.
$$

Then the resulting confidence band for $g(y, x)$ on $J \times I$ is given by

$$
\hat C_{1-\tau}(y, x) = \left[ \hat g(y, x) \pm \frac{\hat s_n(y, x)}{\hat f_X(x)\sqrt{n}\, h_n}\, \hat c_n(1-\tau) \right], \quad (y, x) \in J \times I.
$$

We make the following assumption, which is analogous to Assumption 3.1.

Assumption 7.1. Let $I, J$ be compact intervals in $\mathbb{R}$.
(i) The function $(y, w) \mapsto P(Y \le y \mid W = w) f_W(w)$ is continuous in $w$ uniformly in $y \in J$.
(ii) The characteristic function of $X$, $\varphi_X(t) = E[e^{itX}]$, $t \in \mathbb{R}$, is integrable on $\mathbb{R}$. Furthermore, $\sup_{y \in J} \int_{\mathbb{R}} |E[g(y, X)e^{itX}]|\,dt < \infty$.
(iii) Condition (iii) in Assumption 3.1.
(iv) The functions $f_X$ and $g(y, \cdot) f_X(\cdot)$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$ for all $y \in J$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Condition (v) in Assumption 3.1.
(vi) For all $x \in I$, $f_X(x) > 0$, and $\inf_{(y,x) \in J \times I} E[\{1(Y \le y) - g(y, x)\}^2 \mid W = x] f_W(x) > 0$.
(vii) Condition (vii) in Assumption 3.1.

Theorem 7.1. Under Assumption 7.1, as $n \to \infty$,

$$
P\{g(y, x) \in \hat C_{1-\tau}(y, x) \ \forall (y, x) \in J \times I\} \to 1 - \tau.
$$

Furthermore, the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Remark 7.2. To the best of our knowledge, Theorem 7.1 is also a new result.

8. Conclusion

In this paper, we develop a method to construct uniform confidence bands for the nonparametric EIV regression function $g$. We consider the practically relevant case where the distribution of the measurement error is unknown. We assume that there is an independent sample from the measurement error distribution, where the sample from the measurement error distribution need not be independent from the sample on response and predictor variables. Such a sample from the measurement error distribution is available if there is, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed.
We establish asymptotic validity of the proposed confidence band for ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric EIV regression. We also propose a practical method to choose an undersmoothing bandwidth for valid inference. Simulation studies verify the finite sample performance of the proposed confidence band. Finally, we discuss extensions of our results to specification testing, cases with additional regressors without measurement errors, and confidence bands for conditional distribution functions.
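As a rough illustration of the band construction summarized above, the multiplier-bootstrap step can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the summands $\{Y_j - \hat g(x)\}\hat K_n((x - W_j)/h_n)$ are taken as a precomputed array (the deconvolution kernel itself is not implemented here), the supremum over $I$ is approximated by a maximum over a finite grid, and the demo inputs are synthetic placeholders rather than data from the paper.

```python
import numpy as np

def multiplier_critical_value(summands, s_hat, tau, n_draws=2000, seed=0):
    """Approximate the conditional (1 - tau)-quantile of sup_x |Z^xi(x)|.

    summands : (n, G) array, entry (j, g) = {Y_j - ghat(x_g)} * Khat((x_g - W_j)/h)
    s_hat    : (G,) array of estimated standard deviations at the grid points
    """
    rng = np.random.default_rng(seed)
    n = summands.shape[0]
    xi = rng.standard_normal((n_draws, n))        # Gaussian multipliers
    z = xi @ summands / (np.sqrt(n) * s_hat)      # (n_draws, G) multiplier process
    sup_abs = np.abs(z).max(axis=1)               # sup over the grid for each draw
    return float(np.quantile(sup_abs, 1.0 - tau))

def confidence_band(g_hat, f_hat, s_hat, h, n, crit):
    """Band ghat(x) +/- s_hat(x) * crit / (f_hat(x) * sqrt(n) * h)."""
    half_width = s_hat * crit / (f_hat * np.sqrt(n) * h)
    return g_hat - half_width, g_hat + half_width

# Synthetic placeholder inputs (not from the paper).
rng = np.random.default_rng(1)
n, G, h = 200, 25, 0.5
summands = rng.standard_normal((n, G))
s_hat = np.sqrt((summands ** 2).mean(axis=0))
g_hat = np.zeros(G)
f_hat = np.full(G, 0.3)

crit_80 = multiplier_critical_value(summands, s_hat, tau=0.20)
crit_95 = multiplier_critical_value(summands, s_hat, tau=0.05)
lower, upper = confidence_band(g_hat, f_hat, s_hat, h, n, crit_95)
```

Because both calls reuse the same seed, the 95% critical value is computed from the same simulated suprema as the 80% one and is therefore at least as large, which gives the nested bands shown in the figures.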

Appendix A. Proofs

A.1. Technical tools. In this section, we collect technical tools that will be used in the proofs of Theorems 3.1 and 3.2. The proofs rely on modern empirical process theory. For a probability measure $Q$ on a measurable space $(S, \mathcal{S})$ and a class of measurable functions $\mathcal{F}$ on $S$ such that $\mathcal{F} \subset L^2(Q)$, let $N(\mathcal{F}, \|\cdot\|_{Q,2}, \delta)$ denote the $\delta$-covering number for $\mathcal{F}$ with respect to the $L^2(Q)$-seminorm $\|\cdot\|_{Q,2}$. The class $\mathcal{F}$ is said to be pointwise measurable if there exists a countable subclass $\mathcal{G} \subset \mathcal{F}$ such that for every $f \in \mathcal{F}$ there exists a sequence $g_m \in \mathcal{G}$ with $g_m \to f$ pointwise. A function $F : S \to [0, \infty)$ is said to be an envelope for $\mathcal{F}$ if $F(x) \ge \sup_{f \in \mathcal{F}} |f(x)|$ for all $x \in S$. See Section 2.1 in van der Vaart and Wellner (1996) for details.

Lemma A.1 (A useful maximal inequality). Let $X, X_1, \dots, X_n$ be i.i.d. random variables taking values in a measurable space $(S, \mathcal{S})$, and let $\mathcal{F}$ be a pointwise measurable class of (measurable) real-valued functions on $S$ with measurable envelope $F$. Suppose that there exist constants $A \ge e$ and $V \ge 1$ such that

$$
\sup_Q N(\mathcal{F}, \|\cdot\|_{Q,2}, \delta\|F\|_{Q,2}) \le (A/\delta)^V, \quad 0 < \delta \le 1,
$$

where $\sup_Q$ is taken over all finitely discrete distributions on $S$. Furthermore, suppose that $0 < E[F^2(X)] < \infty$, and let $\sigma^2 > 0$ be any positive constant such that $\sup_{f \in \mathcal{F}} E[f^2(X)] \le \sigma^2 \le E[F^2(X)]$. Define $B = \sqrt{E[\max_{1 \le j \le n} F^2(X_j)]}$. Then

$$
E\left[\left\| \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \{f(X_j) - E[f(X)]\} \right\|_{\mathcal{F}}\right]
\le C\left[ \sqrt{V\sigma^2 \log\left(\frac{A\sqrt{E[F^2(X)]}}{\sigma}\right)} + \frac{VB}{\sqrt{n}} \log\left(\frac{A\sqrt{E[F^2(X)]}}{\sigma}\right) \right],
$$

where $C > 0$ is a universal constant.

Proof. See Corollary 5.1 in Chernozhukov et al. (2014a).

Lemma A.2 (An auxiliary maximal inequality). Let $\zeta_1, \dots, \zeta_n$ be random variables such that $E[|\zeta_j|^r] < \infty$ for all $j = 1, \dots, n$ for some $r \ge 1$. Then

$$
E\left[\max_{1 \le j \le n} |\zeta_j|\right] \le n^{1/r} \max_{1 \le j \le n} (E[|\zeta_j|^r])^{1/r}.
$$

Proof. This inequality is well known, and follows from Jensen's inequality. Indeed,

$$
E\left[\max_{1 \le j \le n} |\zeta_j|\right] \le \left(E\left[\max_{1 \le j \le n} |\zeta_j|^r\right]\right)^{1/r} \le \left(\sum_{j=1}^{n} E[|\zeta_j|^r]\right)^{1/r} \le n^{1/r} \max_{1 \le j \le n} (E[|\zeta_j|^r])^{1/r}.
$$

The following anti-concentration inequality for the supremum of a Gaussian process will play a crucial role in the proofs of Theorems 3.1 and 3.2.

Lemma A.3 (Anti-concentration for the supremum of a Gaussian process). Let $T$ be a nonempty set, and let $X = (X_t : t \in T)$ be a tight Gaussian random variable in $\ell^\infty(T)$ with mean zero and $E[X_t^2] = 1$ for all $t \in T$. Then for any $h > 0$,

$$
\sup_{x \in \mathbb{R}} P\{|\,\|X\|_T - x\,| \le h\} \le 4h(1 + E[\|X\|_T]).
$$

Proof. See Corollary 2.1 in Chernozhukov et al. (2014b); see also Theorem 3 in Chernozhukov et al. (2015).

A.2. Proof of Theorem 3.1. In what follows, we always assume Assumption 3.1. Before proving Theorem 3.1, we first prove some preliminary lemmas. Recall that $A_n(x) = E[\{Y - g(x)\}K_n((x - W)/h_n)]$ and $s_n^2(x) = \mathrm{Var}(\{Y - g(x)\}K_n((x - W)/h_n))$. Observe that $\|K_n\|_{\mathbb{R}} = O(h_n^{-\alpha})$ under our assumption. In what follows, the notation $\lesssim$ signifies that the left hand side is bounded by the right hand side up to a positive constant independent of $n$ and $x$.

Lemma A.4. The following bounds hold:
(i) $\|A_n\|_I = O(h_n^{\beta+1})$.
(ii) For sufficiently large $n$, $\inf_{x \in I} s_n^2(x) \gtrsim h_n^{-2\alpha+1}$.
(iii) For $l = 0, 1, 2$, we have $\sup_{x \in \mathbb{R}} E[|Y K_n((x - W)/h_n)|^{2+l}] = O(h_n^{-(2+l)\alpha+1})$.

Proof. (i). Since $E[Y e^{itW}] = E[\{g(X) + U\}e^{it(X+\varepsilon)}] = \psi_X(t)\varphi_\varepsilon(t)$, we have that

$$
E[Y K_n((x - W)/h_n)] = \frac{h_n}{2\pi} \int_{\mathbb{R}} e^{-itx} E[Y e^{itW}] \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)}\,dt = \frac{h_n}{2\pi} \int_{\mathbb{R}} e^{-itx} \psi_X(t)\varphi_K(t h_n)\,dt.
$$

Since $\psi_X(\cdot)$ and $\varphi_K(\cdot\, h_n)$ are the Fourier transforms of $g f_X$ and $h_n^{-1}K(\cdot/h_n)$, respectively, the Fourier inversion formula yields that

$$
\frac{h_n}{2\pi} \int_{\mathbb{R}} e^{-itx} \psi_X(t)\varphi_K(t h_n)\,dt = h_n \left( (g f_X) * (h_n^{-1}K(\cdot/h_n)) \right)(x) = \int_{\mathbb{R}} g(w) f_X(w) K((x - w)/h_n)\,dw.
$$

Note that the far left and right hand sides are continuous in $x$, and so the equality holds for all $x \in \mathbb{R}$. Likewise, we have $E[K_n((x - W)/h_n)] = \int_{\mathbb{R}} f_X(w) K((x - w)/h_n)\,dw$ for all $x \in \mathbb{R}$, so that

$$
A_n(x) = \int_{\mathbb{R}} \{g(w) - g(x)\} K((x - w)/h_n) f_X(w)\,dw = h_n \int_{\mathbb{R}} \{g(x - h_n w) - g(x)\} f_X(x - h_n w) K(w)\,dw.
$$

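By the Taylor expansion, for any $x, w \in \mathbb{R}$,

$$
\{g(x - h_n w) - g(x)\} f_X(x - h_n w) = \sum_{j=1}^{k-1} \frac{(g f_X)^{(j)}(x) - g(x) f_X^{(j)}(x)}{j!} (-h_n w)^j + \frac{(g f_X)^{(k)}(x - \theta h_n w) - g(x) f_X^{(k)}(x - \theta h_n w)}{k!} (-h_n w)^k,
$$

for some $\theta \in [0, 1]$. Since $\int_{\mathbb{R}} w^j K(w)\,dw = 0$ for $j = 1, \dots, k$ and $f_X, g f_X \in \Sigma(\beta, B)$, we have

$$
\begin{aligned}
&\left| \int_{\mathbb{R}} \{g(x - h_n w) - g(x)\} f_X(x - h_n w) K(w)\,dw \right| \\
&\quad = \left| \int_{\mathbb{R}} \left[ \{g(x - h_n w) - g(x)\} f_X(x - h_n w) - \sum_{j=1}^{k} \frac{(g f_X)^{(j)}(x) - g(x) f_X^{(j)}(x)}{j!} (-h_n w)^j \right] K(w)\,dw \right| \\
&\quad \le \frac{(1 + \|g\|_I) B h_n^{\beta}}{k!} \int_{\mathbb{R}} |w|^{\beta} |K(w)|\,dw.
\end{aligned}
$$

This shows that $\|A_n\|_I = O(h_n^{\beta+1})$.

(ii). Since $A_n(x) = E[\{Y - g(x)\}K_n((x - W)/h_n)] = O(h_n^{\beta+1})$ uniformly in $x \in I$, it suffices to show that

$$
\inf_{x \in I} E[\{Y - g(x)\}^2 K_n^2((x - W)/h_n)] \gtrsim (1 - o(1)) h_n^{-2\alpha+1}.
$$

Observe that $E[Y \mid W = w] f_W(w) = ((g f_X) * f_\varepsilon)(w)$ (compare the Fourier transforms of both sides), and define

$$
V(x, w) = E[\{Y - g(x)\}^2 \mid W = w] f_W(w) = (E[Y^2 \mid W = w] + g^2(x)) f_W(w) - 2g(x)((g f_X) * f_\varepsilon)(w).
$$

The function $(g f_X) * f_\varepsilon$ is bounded and continuous by boundedness of $g f_X$. Since $E[Y^2 \mid W = \cdot\,]$, $f_W$, and $(g f_X) * f_\varepsilon$ are bounded and continuous on $\mathbb{R}$, and $g$ is bounded and continuous on $I$, we have that the function $(x, w) \mapsto V(x, w)$ is bounded and continuous on $I \times \mathbb{R}$. In particular, since $V(x, x) > 0$ for all $x \in I$ under our assumption, we have that $\inf_{x \in I} V(x, x) > 0$. Now, observe that

$$
E[\{Y - g(x)\}^2 K_n^2((x - W)/h_n)] = \int_{\mathbb{R}} V(x, w) K_n^2((x - w)/h_n)\,dw = h_n \int_{\mathbb{R}} V(x, x - h_n w) K_n^2(w)\,dw.
$$

Furthermore, we have that

$$
\int_{\mathbb{R}} K_n^2(w)\,dw = \frac{1}{2\pi} \int_{\mathbb{R}} \frac{|\varphi_K(t)|^2}{|\varphi_\varepsilon(t/h_n)|^2}\,dt \gtrsim h_n^{-2\alpha}
$$

by Plancherel's theorem. Hence, it suffices to show that

$$
\sup_{x \in I} \left| h_n^{2\alpha} \int_{\mathbb{R}} \{V(x, x - h_n w) - V(x, x)\} K_n^2(w)\,dw \right| \to 0. \tag{A.1}
$$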

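From the proof of Lemma 3 in Kato and Sasaki (2016), we have that $h_n^{2\alpha} K_n^2(x) \lesssim \min\{1, x^{-2}\}$. By the definition of $V(x, w)$, for any $\rho > 0$, there exists sufficiently small $\delta > 0$ such that $|V(x, x + w) - V(x, x)| \le \rho$ for all $x \in I$ whenever $|w| \le \delta$. Therefore,

$$
\sup_{x \in I} \int_{\mathbb{R}} |V(x, x - h_n w) - V(x, x)|\, h_n^{2\alpha} K_n^2(w)\,dw \lesssim \rho \int_{|w| \le \delta/h_n} \min\{1, w^{-2}\}\,dw + 2\|V\|_{I \times \mathbb{R}} \int_{|w| > \delta/h_n} w^{-2}\,dw \lesssim \rho + o(1).
$$

(iii). Pick any $l = 0, 1, 2$. Since $\|K_n\|_{\mathbb{R}} \lesssim h_n^{-\alpha}$, we have that

$$
\begin{aligned}
E[|Y K_n((x - W)/h_n)|^{2+l}] &= h_n \int_{\mathbb{R}} E[|Y|^{2+l} \mid W = x - h_n w]\, |K_n(w)|^{2+l} f_W(x - h_n w)\,dw \\
&\lesssim h_n^{-l\alpha+1} \int_{\mathbb{R}} V_l(x - h_n w) K_n^2(w)\,dw \le h_n^{-l\alpha+1} \|V_l\|_{\mathbb{R}} \int_{\mathbb{R}} K_n^2(w)\,dw \lesssim h_n^{-(2+l)\alpha+1},
\end{aligned}
$$

where $V_l(w) = E[|Y|^{2+l} \mid W = w] f_W(w)$. This completes the proof.

Lemma A.5. $\|\hat\varphi_\varepsilon - \varphi_\varepsilon\|_{[-h_n^{-1}, h_n^{-1}]} = O_P\{m^{-1/2} \log(1/h_n)\}$.

Proof. See Lemma 4 in Kato and Sasaki (2016); see also Theorem 4.1 in Neumann and Reiß (2009).

Consider the following classes of functions

$$
\begin{aligned}
\mathcal{F}_n^{(1)} &= \{(y, w) \mapsto y K_n((x - w)/h_n) : x \in \mathbb{R}\}, \\
\mathcal{F}_n^{(2)} &= \left\{ (y, w) \mapsto \frac{1}{s_n(x)} \{y - g(x)\} K_n((x - w)/h_n) : x \in I \right\}, \\
\mathcal{F}_n^{(3)} &= \{(y, w) \mapsto \{y - g(x)\} K_n^2((x - w)/h_n) : x \in I\}, \\
\mathcal{F}_n^{(4)} &= \left\{ (y, w) \mapsto \frac{1}{s_n^2(x)} \{y - g(x)\}^2 K_n^2((x - w)/h_n) : x \in I \right\}.
\end{aligned} \tag{A.2}
$$

In view of the fact that $\|K_n\|_{\mathbb{R}} \lesssim h_n^{-\alpha}$ and $\inf_{x \in I} s_n(x) \gtrsim h_n^{-\alpha+1/2}$, choose constants $D_1, D_2 > 0$ (independent of $n$) such that $\|K_n\|_{\mathbb{R}} \le D_1 h_n^{-\alpha}$ and $\|1/s_n\|_I \le D_2 h_n^{\alpha-1/2}$. Let

$$
F_n^{(1)}(y, w) = D_1 |y| h_n^{-\alpha}, \quad F_n^{(2)}(y, w) = D_1 D_2 (|y| + \|g\|_I)/\sqrt{h_n}, \quad F_n^{(3)}(y, w) = D_1^2 (|y| + \|g\|_I) h_n^{-2\alpha}, \quad F_n^{(4)}(y, w) = \{F_n^{(2)}(y, w)\}^2.
$$

Note that $F_n^{(l)}$ is an envelope function for $\mathcal{F}_n^{(l)}$ for each $l = 1, \dots, 4$.

Lemma A.6. There exist constants $A, v \ge e$ independent of $n$ such that

$$
\sup_Q N(\mathcal{F}_n^{(l)}, \|\cdot\|_{Q,2}, \delta\|F_n^{(l)}\|_{Q,2}) \le (A/\delta)^v, \quad 0 < \delta \le 1, \tag{A.3}
$$

for all $l = 1, \dots, 4$, where $\sup_Q$ is taken over all finitely discrete distributions on $\mathbb{R}^2$.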

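Proof. Consider the following classes of functions

$$
\mathcal{K}_n = \{w \mapsto K_n((x - w)/h_n) : x \in \mathbb{R}\}, \quad \mathcal{K}_n^2 = \{f^2 : f \in \mathcal{K}_n\}.
$$

Lemma 1 in Kato and Sasaki (2016) and Corollary A.1 in Chernozhukov et al. (2014a) yield that there exist constants $A_1, v_1 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{K}_n, \|\cdot\|_{Q,2}, D_1 h_n^{-\alpha}\delta) \le (A_1/\delta)^{v_1}$ and $\sup_Q N(\mathcal{K}_n^2, \|\cdot\|_{Q,2}, D_1^2 h_n^{-2\alpha}\delta) \le (A_1/\delta)^{v_1}$ for all $0 < \delta \le 1$. In what follows, we only prove (A.3) for $l = 2$; the proofs for the other cases are completely analogous given the above bounds on the covering numbers for $\mathcal{K}_n$ and $\mathcal{K}_n^2$.

Let $\mathcal{H}_n = \{y \mapsto \{y - g(x)\}/s_n(x) : x \in I\}$, and observe that, since $\|1/s_n\|_I \le D_2 h_n^{\alpha-1/2}$, there exist constants $A_2, v_2 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{H}_n, \|\cdot\|_{Q,2}, \delta\|H_n\|_{Q,2}) \le (A_2/\delta)^{v_2}$ for all $0 < \delta \le 1$, where $H_n(y) = D_2(|y| + \|g\|_I) h_n^{\alpha-1/2}$ is an envelope function for $\mathcal{H}_n$. This can be verified by a direct calculation, or by observing that $\mathcal{H}_n$ $(\subset \{y \mapsto ay + b : a > 0, b \in \mathbb{R}\})$ is a VC subgraph class with VC index at most 4 (cf. van der Vaart and Wellner, 1996, Lemma 2.6.15), and applying Theorem 2.6.7 in van der Vaart and Wellner (1996). Let $\mathcal{H}_n\mathcal{K}_n := \{(y, w) \mapsto f_1(y) f_2(w) : f_1 \in \mathcal{H}_n, f_2 \in \mathcal{K}_n\} \supset \mathcal{F}_n^{(2)}$, and note that $H_n(y) D_1 h_n^{-\alpha} = F_n^{(2)}(y, w)$. From Corollary A.1 in Chernozhukov et al. (2014a), there exist constants $A_3, v_3 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{H}_n\mathcal{K}_n, \|\cdot\|_{Q,2}, \delta\|F_n^{(2)}\|_{Q,2}) \le (A_3/\delta)^{v_3}$ for all $0 < \delta \le 1$. Now, the desired result follows from the observation that $N(\mathcal{F}_n^{(2)}, \|\cdot\|_{Q,2}, 2\delta) \le N(\mathcal{H}_n\mathcal{K}_n, \|\cdot\|_{Q,2}, \delta)$ for all $\delta > 0$.

Lemma A.7. We have $\|\hat f_X^*(\cdot) - E[\hat f_X^*(\cdot)]\|_{\mathbb{R}} = O_P\{h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ and $\|E[\hat f_X^*(\cdot)] - f_X(\cdot)\|_{\mathbb{R}} = O(h_n^{\beta}) = o\{h_n^{-\alpha}(n h_n \log(1/h_n))^{-1/2}\}$. Furthermore, $\|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_{\mathbb{R}} = O_P\{h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Proof. The first two results are implicit in the proofs of Corollaries 1 and 2 in Kato and Sasaki (2016). To prove the last result, we shall apply Lemma A.1 to the class of functions $\mathcal{F}_n^{(1)}$. From Lemma A.4-(iii), we have that $\sup_{x \in \mathbb{R}} E[Y^2 K_n^2((x - W)/h_n)] = O(h_n^{-2\alpha+1})$. In view of the covering number bound for $\mathcal{F}_n^{(1)}$ given in Lemma A.6, we may apply Lemma A.1 to $\mathcal{F}_n^{(1)}$ to conclude that

$$
(\sqrt{n}\,h_n)\,E[\|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}] = E\left[\left\| \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \{f(Y_j, W_j) - E[f(Y, W)]\} \right\|_{\mathcal{F}_n^{(1)}}\right] \lesssim h_n^{-\alpha}\sqrt{h_n \log(1/h_n)} + \frac{h_n^{-\alpha}}{\sqrt{n}} \sqrt{E\left[\max_{1 \le j \le n} Y_j^2\right]}\, \log(1/h_n).
$$

From Lemma A.2, we have $E[\max_{1 \le j \le n} Y_j^2] = O(n^{1/2})$, so that we have

$$
(\sqrt{n}\,h_n)\,E[\|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}] \lesssim h_n^{-\alpha}\sqrt{h_n \log(1/h_n)} + h_n^{-\alpha} n^{-1/4} \log(1/h_n) \lesssim h_n^{-\alpha}\sqrt{h_n \log(1/h_n)},
$$

where the second inequality follows from the first condition in (3.1). This completes the proof.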

32 K. KATO AND Y. SASAKI We are ow i positio to prove Theorem 3.1. Proof of Theorem 3.1. We divide the proof ito two steps. Step 1. Let r = h α {h log(1/h )} 1/2. We first prove that ĝ(x) g(x) = 1 1 f X (x) h uiformly i x I. [{Y j g(x)}k ((x W j )/h ) A (x)] + o P (r ) To this ed, we shall show that µ µ = o P (r ). First, observe from Lemma A.5 that if t h 1 ϕ ε (t) if t h 1 ϕ ε (t) O P {m 1/2 log(1/h )} (1 o P (1))h α. Let ψ Y W (t) = E[Y e itw ] = E[{g(X)+U}e it(x+ε) ] = ψ X (t)ϕ ε (t), ad let ψ Y W (t) = 1 Y je itw j. Decompose µ(x) µ (x) as µ(x) µ (x) = 1 e itx ψy W (t) ϕ K(th ) dt 1 e itx ψy W (t) ϕ K(th ) dt 2π ϕ ε (t) 2π ϕ ε (t) = 1 e itx ϕ K (th ) ψ Y W (t) ϕ ε (t) 2π {ψ X 0} ψ Y W (t) ϕ ε (t) ψ X(t)dt + 1 e itx ϕ K(th ) ψ Y W (t) ϕ ε(t) 2π {ψ X =0} ϕ ε (t) ϕ ε (t) dt 1 e itx ϕ K (th ) ψ Y W (t) 2π {ψ X 0} ψ Y W (t) ψ X(t)dt 1 e itx ϕ K(th ) ψ Y W (t)dt 2π {ψ X =0} ϕ ε (t) = 1 { } {ϕε } ψy e itx W (t) ϕ K (th ) 2π {ψ X 0} ψ Y W (t) 1 (t) ϕ ε (t) 1 ψ X (t)dt + 1 e itx ϕ { } K(th ) ϕε (t) ψ Y W (t) 2π {ψ X =0} ϕ ε (t) ϕ ε (t) 1 dt + 1 { } e itx ϕε (t) ϕ K (th ) 2π ϕ ε (t) 1 ψ X (t)dt. Hece the Cauchy-Schwarz iequality yields that µ(x) µ (x) 2 ψ Y W (t) 2 { h 1 } {ψ X 0} [ h 1,h 1 ] ψ Y W (t) 1 ψ X (t) 2 dt ϕ ε (t) h 1 ϕ ε (t) 1 2 dt { } { h 1 } + h 2α ψ Y W (t) 2 dt ϕ ε (t) {ψ X =0} [ h 1,h 1 ] h 1 ϕ ε (t) 1 2 dt h 1 + ϕ ε (t) h 1 ϕ ε (t) 1 2 ψ X (t) dt. (A.4) We shall boud each term o the right had side. Observe that h 1 h 1 ϕ ε (t) ϕ ε (t) 1 2 h 1 dt O P (h 2α ) ϕ ε (t) ϕ ε (t) 2 dt h 1

33 ad the itegral o the right had side is O P {(mh ) 1 } sice h 1 h 1 h 1 E[ ϕ ε (t) ϕ ε (t) 2 ]dt m 1 h 1 dt = 2(mh ) 1. Likewise, usig the fact that ψ X is itegrable, we have that the last term o the right had side of (A.4) is O P (h 2α m 1 ). For ay t with ψ X (t) 0, we have E[ ψ Y W (t)/ψ Y W (t) 1 2 ] E[Y 2 ]/{ ψ Y W (t) 2 }, so that E {ψ X 0} [ h 1,h 1 ] ψ Y W (t) ψ Y W (t) 1 2 ψ X (t) 2 dt h 1 1 h 1 Fially, for ay t with ψ X (t) = 0, we have ψ Y W (t) = 0, so that [ ] E ψ Y W (t) 2 dt (h ) 1. {ψ X =0} [ h 1,h 1 ] 1 dt h 2α ϕ ε (t) 2 (h ) 1. Therefore, we have µ µ 2 = O P(h 4α 2 1 m 1 + h 2α m 1 ) = o P (r). 2 From Step 2 i the proof of Theorem 1 of Kato ad Sasaki (2016), it follows that f X f X = o P (r ), which i particular implies that f X f X I f X f X I + f X f X I = o P (1) so that 1/ f X I = O P (1). Furthermore, µ I E[ µ ( )] I + µ ( ) E[ µ ( )] I ψ X(t) dt + o P (1) = O P (1). Therefore, Now, observe that ĝ ĝ I 1/ f X I µ µ I + µ I 1/ f X 1/ f X I o P (r ) + O P (1) f X f X I = o P (r ). ĝ (x) g(x) = 1 f X (x) 1 h {Y j g(x)}k ((x W j )/h ). Sice A I = O(h β+1 ) = o(h r ), we have ĝ (x) g(x) = 1 1 f X (x) [{Y j g(x)}k ((x W j )/h ) A (x)] + o P (r ) h uiformly i x I. Sice uiformly i x I, ad 1 h [{Y j g(x)}k ((x W j )/h ) A (x)] = µ (x) E[ µ (x)] g(x){ f X(x) E[ f X(x)]} = O P {h α (h ) 1/2 log(1/h )} 1/ f X 1/f X I O P (1) f X f X I = O P {h α (h ) 1/2 log(1/h )},

34 K. KATO AND Y. SASAKI we coclude that ĝ (x) g(x) = 1 1 f X (x) h [{Y j g(x)}k ((x W j )/h ) A (x)] + o P (r ) uiformly i x I. This leads to the desired result of Step 1. Furthermore, the derivatio so far yields that ĝ g I = O P {h α (h ) 1/2 log(1/h )}. Step 2. By Step 1 together with the fact that if x I s (x) h α+1/2, we have Ẑ (x) = f X(x) h (ĝ(x) g(x)) s (x) 1 = s (x) [{Y j g(x)}k ((x W j )/h ) A (x)] + o P {(log(1/h )) 1/2 } = Z (x) + o P {(log(1/h )) 1/2 } uiformly i x I. ecall the class of fuctios F (2) process idexed by F (2) : ν (f) = 1 defied i (A.2), ad cosider the empirical {f(y j, W j ) E[f(Y, W )]}, f F (2). We apply Theorem 2.1 i Cherozhukov et al. (2016) to approximate ν (2) F = Z I by the supremum of a Gaussia process. To this ed, we shall verify the coditios i Cherozhukov et al. (2016). First, from the coverig umber boud for F (2) give i Lemma A.6 ad fiiteess of the secod momet of F (2) (Y, W ), there exists a tight Gaussia radom variable G i l (F (2) ) with mea zero ad the same covariace fuctio as {ν (f) : f F (2) }. Exted ν liearly to F (2) ( F (2) ) = {f, f : f F (2) }, ad observe that ν (2) F = sup (2) f F ( F (2) ) ν (f). Note that from Theorem 3.7.28 i Gié ad Nickl (2016), G exteds to the liear hull of F (2) i such a way that G has liear sample paths, so that G (2) F = sup (2) f F ( F (2) ) G (f), ad i additio G has uiformly cotiuous paths o the symmetric covex hull of F (2). It is ot difficult to verify that the coverig umber of F (2) ( F (2) ) is at most twice that of F (2). I particular, {G (f) : f F (2) ( F (2) )} is a tight Gaussia radom variable i l (F (2) ( F (2) )) with mea zero ad the same covariace fuctio as {ν (f) : f F (2) ( F (2) )}. Next, observe that E[ Y K ((x W )/h ) 2+l ] h (2+l)α+1 for l = 0, 1, 2 from Lemma A.4-(iii), so that sup f F (2) E[ F (2) (Y, W ) 4 ] h 2 E[ f(y, W ) 2+l ] h l/2 (E[Y 4 ] + g 4 I et al. (2016) to F (2) ( F (2) for l = 0, 1, 2. Furthermore, observe that ) h 2. 
Therefore, applyig Theorem 2.1 i Cherozhukov ) with B(f) 0, q = 4, A 1, v 1, b h 1/2, σ 1 ad γ 1/ log, yields that there exists a radom variable V havig the same distributio as

35 G F (2) such that { } (log ) 5/4 ν (2) F V = OP 1/4 h 1/2 + log (h ) 1/6 Now, for f,x (y, w) = {y g(x)}k ((x w)/h )/s (x), defie Z G (x) = G (f,x ), x I, = o P {(log(1/h )) 1/2 }. ad observe that Z G is a tight Gaussia radom variable i l (I) with mea zero ad the same covariace fuctio as Z such that Z G I has the same distributio as V. Sice Ẑ I V Ẑ I Z I + Z I V = o P {(log(1/h )) 1/2 }, there exists a sequece 0 such that P{ Ẑ I V > (log(1/h )) 1/2 } (which follows from the fact that the Ky Fa metric metrizes covergece i probability; see Theorem 9.2.2 i Dudley (2002)). Observe that for ay z, P{ Ẑ I z} P{V z + (log(1/h )) 1/2 } + P{ Ẑ I V > (log(1/h )) 1/2 } = P{ Z G I z + (log(1/h )) 1/2 } +. The ati-cocetratio iequality for the supremum of a Gaussia process (Lemma A.3) the yields that P{ Z G I z + (log(1/h )) 1/2 } P{ Z G I z} + 4 (log(1/h )) 1/2 {1 + E[ Z G I ]}. From the coverig umber boud for F (2) give i Lemma A.6, together with the facts that E[F (2) (Y, W ) 2 ] h 1 ad Var(f,x (Y, W )) = 1 for all x I, Dudley s etropy itegral boud (cf. va der Vaart ad Weller, 1996, Corollary 2.2.8) yields that which implies that E[ Z G I ] = E[ G (2) F ] 1 0 1 + log(1/(δ h ))dδ log(1/h ), P{ Z G I z + (log(1/h )) 1/2 } P{ Z G I z} + o(1) uiformly i z. Likewise, we have P{ Ẑ I z} P{ Z G I z} o(1) uiformly i z. This completes the proof. A.3. Proof of Theorem 3.2. We first prove the followig techical lemma. Lemma A.8. ŝ 2 ( )/s 2 ( ) 1 I = o P {(log(1/h )) 1 }. Proof. Observe that {Y j ĝ(x)} 2 K2 ((x W j )/h ) = {Y j g(x)} 2 K 2 ((x W j )/h ) + {g(x) ĝ(x)} 2 K 2 ((x W j )/h ) + 2{g(x) ĝ(x)}{y j g(x)}k 2 ((x W j )/h ) + {Y j ĝ(x)} 2 { K 2 ((x W j )/h ) K 2 ((x W j )/h )},

36 K. KATO AND Y. SASAKI so that 1 {Y j ĝ( )} 2 K2 (( W j )/h ) 1 {Y j g( )} 2 K 2 (( W j )/h ) I O P (h 2α ) ĝ g 2 I + 2 ĝ g I 1 {Y j g( )}K 2 (( W j )/h ) + 2 (Y 2 j + ĝ 2 I) K 2 K 2. (A.5) I From Step 1 i the proof of Theorem 3.1, ĝ g I = O P {h α (h ) 1/2 log(1/h )}, so that the first term o the right had side of (A.5) is O P {h 4α (h ) 1 log(1/h )}. Sice K K 1 ϕ ε (t/h ) 1 ϕ ε (t/h ) ϕ K(t) dt O P (h 2α ) ϕ ε (t/h ) ϕ ε (t/h ) ϕ K (t) dt we have that = O P (h 2α m 1/2 ), K 2 K 2 K K K + K = O P (h 3α m 1/2 ), which implies that the last term o the right had side o (A.5) is O P (h 3α m 1/2 ). To boud the secod term, observe first that, sice E[Y W = w]f W (w) = ((gf X ) f ε ) (w) is bouded (i absolute value) by gf X, Hece, 1 E[{Y g( )}K(( 2 W )/h )] I h ( gf X + g I f W ) K 2 (w)dw h 2α+1. {Y j g( )}K(( 2 W j )/h ) E[{Y g( )}K(( 2 W )/h )] }{{} I I + 1 =O(h 2α+1 ) {Y j g( )}K(( 2 W j )/h ) E[{Y g( )}K(( 2 W )/h )]. I The secod term o the right had side is idetical to 1 {f(y j, W j ) E[f(Y, W )]} F (3).

I view of the coverig umber boud for F (3) give i Lemma A.6, together with Theorem 2.14.1 i va der Vaart ad Weller (1996), the expectatio of the last term is 1/2 E[{F (3) (Y, W )]} 2 ] h 2α 1/2. Therefore, the right had side o (A.5) is { O P h 4α which is o P {h 2α+1 (h ) 1 log(1/h ) + h α (h ) 1/2 log(1/h )(h 2α+1 1 s 2 (x) = (log(1/h )) 1 }. Hece, sice if x I s 2 (x) h 2α+1 1 s 2 (x) {Y j ĝ(x)} 2 K2 ((x W j )/h ) 37 + h 2α 1/2 ) + h 3α m 1/2},, we have {Y j g(x)} 2 K((x 2 W j )/h ) + o P {(log(1/h )) 1 } uiformly i x I. Sice A 2 ( )/s 2 ( ) I = O(h 2α+2β+1 ), it remais to prove that 1 [ 1 s 2 {Y j g( )} 2 K(( 2 W j )/h ) E ( ) s 2 ( ) {Y g( )}2 K(( 2 W ))/h )] I = 1 {f(y j, W j ) E[f(Y, W )]} F (4) is o P {(log(1/h )) 1 }. I view of the coverig umber boud for F (4) give i Lemma A.6, together with Theorem 2.14.1 i va der Vaart ad Weller (1996), the expectatio of the last term is This completes the proof. 1/2 E[{F (4) (Y, W )]} 2 ] h 1 1/2 = o{(log(1/h )) 1 }. Proof of Theorem 3.2. We divide the proof ito several steps. Step 1. Defie Z ξ (x) = 1 s (x) ξ j [{Y j g(x)}k ((x W j )/h ) 1 ] j =1 {Y j g(x)}k ((x W j )/h ) for x I. We first prove that sup P{ Z ξ I z D } P{ Z G I z} P 0. z To this ed, we shall apply Theorem 2.2 i Cherozhukov et al. (2016) to F (2) ( F (2) ). Let ν(f) ξ = 1 ξ j {f(y j, W j ) 1 j =1 f(y j, W (2) j )}, f F.

38 K. KATO AND Y. SASAKI The applyig Theorem 2.2 i Cherozhukov et al. (2016) to F (2) ( F (2) ) with B(f) 0, q = 4, A 1, v 1, b h 1/2, σ 1 ad γ 1/ log, yields that there exists a radom variable V ξ of which the coditioal distributio give D is idetical to the distributio of G (2) F (= Z G I ), ad such that ν ξ F (2) { V ξ (log ) 9/4 = O P 1/4 h 1/2 + } (log )2 (h ) 1/4 = o P {(log(1/h )) 1/2 }, which shows that there exists a sequece 0 such that { } ν ξ P (2) F V ξ > (log(1/h )) 1/2 P D 0. Sice ν ξ F (2) = Z ξ I, we have P{ Z ξ I z D } P{V ξ z + (log(1/h )) 1/2 D } + o P (1) = P{ Z G I z + (log(1/h )) 1/2 } + o P (1) uiformly i z, ad the ati-cocetratio iequality for the supremum of a Gaussia process (Lemma A.3) yields that P{ Z G I z + (log(1/h )) 1/2 } P{ Z G I z} + o(1) uiformly i z. Likewise, we have P{ Z ξ I z D } P{ Z G I z} o P (1) uiformly i z. Step 2. I view of the proof of Step 1, i order to prove the result (3.3), it is eough to prove that Ẑξ Z ξ I = o P {(log(1/h )) 1/2 }. To this ed, defie Z ξ (x) = for x I, ad we first prove that 1 s (x) ξ j {Y j ĝ(x)} K ((x W j )/h ) Z ξ Z ξ I = o P {(log(1/h )) 1/2 }. (A.6) We begi with otig that 1 {Y j g(x)}k ((x W j )/h ) = h { µ (x) E[ µ (x)]} h g(x){ f X(x) E[ f X(x)]} + A (x) = O P {h α+1 (h ) 1/2 log(1/h )} uiformly i x I, so that it suffices to verify that 1 s ( ) ξ j {Y j ĝ( )} K (( W j )/h ) ξ j {Y j g( )} K (( W j )/h ) I

is o P {(log(1/h )) 1/2 }. Sice 1/s I h α 1/2, the last term is h α 1/2 { 1/2 ξ j Y j { K (( W j )/h ) K (( W j )/h )} + ĝ g I ξ j K (( W j )/h ) + g I ξ j { K (( W j )/h ) K ((x W j )/h )} I =: h α 1/2 1/2 {I + II + III }. Step 2 i the proof of Theorem 2 i Kato ad Sasaki (2016) shows that h α 1/2 1/2 III = o P {(log(1/h )) 1/2 }. For the secod term II, observe that II ĝ g I ξ j e itw j/h ϕ K (t) ϕ ε (t/h ) dt 1 O P (h α ) ĝ g I ξ j e ity j/h 1 dt = O P {h 2α 1/2 log(1/h )}, so that h α 1/2 1/2 II = O P {h α 1 1/2 log(1/h )} = o P {(log(1/h )) 1/2 }. For the first term I, observe that I ξ j Y j e itw j/h 1 ϕ ε (t/h ) 1 ϕ ε (t/h ) ϕ K(t) dt 1 1 ξ j Y j e itw j/h = O P ( 1/2 h 2α m 1/2 ), 2 1 I 1/2 { 1 dt 1 ϕ ε (t/h ) 1 ϕ ε (t/h ) so that h α 1/2 1/2 I = o P {(log(1/h )) 1 }. Hece we have proved (A.6). Note that the result of Step 1 ad the fact that E[ Z G I ] = O( log(1/h )) imply that 2 dt } 1/2 Z ξ I = O P ( log(1/h )), which i tur implies that Z ξ I = O P ( log(1/h )). Hece which leads to (3.3). Ẑξ Z ξ I s ( )/ŝ ( ) 1 I Z ξ I = o P {(log(1/h )) 1/2 }, Step 3. We shall prove the last two assertios of the theorem. Observe that f X (x) { } h (ĝ(x) g(x)) Ẑ(x) = { ŝ (x) f h (ĝ(x) g(x)) s (x) X (x) f X (x)} + ŝ }{{} (x) ŝ (x) 1 Ẑ (x), =:Ẑ (x) 39 I }

40 K. KATO AND Y. SASAKI ad the right had side is o P {(log(1/h )) 1/2 } uiformly i x I. To see this, sice ĝ g I = O P {h α (h ) 1/2 log(1/h )} ad f X f X I = O P {h α (h ) 1/2 log(1/h )} (which follows from Corollary 1 i Kato ad Sasaki (2016)), the right had side o the above displayed equatio is O P {h α (h ) 1/2 log(1/h )} O P ( log(1/h )) + o P {(log(1/h )) 1 } O P ( log(1/h )) = o P {(log(1/h )) 1/2 } uiformly i x I. Now, Theorem 3.1 ad the ati-cocetratio iequality for the supremum of a Gaussia process (Lemma A.3) yield that sup P{ Ẑ I z} P{ Z G I z} 0. z We are to show that P{ Ẑ I ĉ (1 τ)} 1 τ. From the result (3.3), there exists a sequece 0 such that with probability greater tha 1, sup P{ Ẑξ I z D } P{ Z G I z}, z (A.7) ad let E be the evet that (A.7) holds. Takig 0 more slowly if ecessary, we have that sup z P{ Ẑ I z} P{ Z G I z}. ecall that c G (1 τ) is the (1 τ)-quatile of Z G I, ad observe that o the evet E, P{ Ẑξ I c G (1 τ + )} P{ Z G I c G (1 τ + )} = 1 τ, where the last equality holds sice the distributio fuctio of Z G I is cotiuous (which follows from Lemma A.3). Hece o the evet E, it holds that ĉ (1 τ) c G (1 τ + ), so that P{ Ẑ I ĉ (1 τ)} P{ Ẑ I c G (1 τ + )} + P{ Z G I c G (1 τ + )} + 2 = 1 τ + 3. Likewise, we have P{ Ẑ I ĉ (1 τ)} 1 τ 3, which shows that P{ Ẑ I ĉ (1 τ)} 1 τ ad thus (3.4) holds. Fially, the Borell-Sudakov-Tsirelso iequality (va der Vaart ad Weller, 1996, Lemma A.2.2) yields that c G (1 τ + ) E[ Z G I ] + 2 log(1/(τ )) log(1/h ), which implies that ĉ (1 τ) = O P ( log(1/h )). Furthermore, sup x I ŝ (x) ŝ (x) sup s (x) sup x I x I s (x) = O P(h α+1/2 ). Therefore, the supremum width of the bad Ĉ1 τ is 2 sup x I This completes the proof. ŝ (x) h ĉ (1 τ) = O P {h α (h ) 1/2 log(1/h ) }.

A.4. Proof of Theorem 7.1. The proof is completely analogous to those of Theorems 3.1 and 3.2, given the facts that $g(y, x) = E[1(Y \le y) \mid X = x]$ and the function class $\{1(\cdot \le y) : y \in J\}$ is a VC class. Hence we omit the details for brevity.
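The reduction used in this proof, and in Section 7.3, replaces the response $Y_j$ by the indicator $1(Y_j \le y)$ for each $y \in J$. A minimal sketch of that transformation on a finite grid of thresholds is given below. The function name `indicator_responses`, the grid, and the toy data are hypothetical and only illustrate the shape of the inputs that the conditional-mean machinery would then receive.

```python
import numpy as np

def indicator_responses(y_obs, y_grid):
    """Return an (n, len(y_grid)) array whose (j, k) entry is 1(Y_j <= y_k)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_grid = np.asarray(y_grid, dtype=float)
    return (y_obs[:, None] <= y_grid[None, :]).astype(float)

# Toy data (hypothetical): three responses and two thresholds.
y_obs = np.array([0.5, 1.5, 2.5])
y_grid = np.array([1.0, 2.0])
ind = indicator_responses(y_obs, y_grid)

# Column means give the empirical (unconditional) distribution function on the grid.
ecdf_on_grid = ind.mean(axis=0)
```

Each column of `ind` plays the role of the response vector in the conditional-mean procedure for one fixed threshold $y$.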

Appendix B. Tables for Section 5

Regression: $g(x) = x$. Columns give the sequence $\{c_n\}_{n=1}^{\infty}$ used for bandwidth selection.

| Model | Nominal Probability | Sample Size (n) | σ_X = 2, (n/100)^0.1 | σ_X = 2, (n/100)^0.3 | σ_X = 2, (n/100)^0.5 | σ_X = 4, (n/100)^0.1 | σ_X = 4, (n/100)^0.3 | σ_X = 4, (n/100)^0.5 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.800 | 250 | 0.770 | 0.763 | 0.754 | 0.813 | 0.795 | 0.803 |
| 1 | 0.800 | 500 | 0.781 | 0.792 | 0.761 | 0.798 | 0.812 | 0.797 |
| 1 | 0.800 | 1,000 | 0.784 | 0.788 | 0.779 | 0.800 | 0.789 | 0.795 |
| 1 | 0.900 | 250 | 0.845 | 0.836 | 0.819 | 0.908 | 0.889 | 0.900 |
| 1 | 0.900 | 500 | 0.883 | 0.879 | 0.851 | 0.900 | 0.897 | 0.895 |
| 1 | 0.900 | 1,000 | 0.891 | 0.880 | 0.879 | 0.900 | 0.897 | 0.889 |
| 1 | 0.950 | 250 | 0.886 | 0.865 | 0.852 | 0.955 | 0.934 | 0.948 |
| 1 | 0.950 | 500 | 0.926 | 0.912 | 0.889 | 0.950 | 0.952 | 0.947 |
| 1 | 0.950 | 1,000 | 0.933 | 0.917 | 0.925 | 0.954 | 0.948 | 0.945 |
| 2 | 0.800 | 250 | 0.794 | 0.804 | 0.800 | 0.812 | 0.821 | 0.800 |
| 2 | 0.800 | 500 | 0.794 | 0.783 | 0.790 | 0.793 | 0.782 | 0.789 |
| 2 | 0.800 | 1,000 | 0.810 | 0.792 | 0.791 | 0.761 | 0.759 | 0.756 |
| 2 | 0.900 | 250 | 0.894 | 0.896 | 0.899 | 0.911 | 0.916 | 0.912 |
| 2 | 0.900 | 500 | 0.902 | 0.890 | 0.893 | 0.900 | 0.893 | 0.893 |
| 2 | 0.900 | 1,000 | 0.906 | 0.887 | 0.893 | 0.875 | 0.874 | 0.863 |
| 2 | 0.950 | 250 | 0.944 | 0.940 | 0.944 | 0.958 | 0.963 | 0.957 |
| 2 | 0.950 | 500 | 0.950 | 0.942 | 0.947 | 0.953 | 0.945 | 0.947 |
| 2 | 0.950 | 1,000 | 0.957 | 0.939 | 0.947 | 0.936 | 0.927 | 0.922 |

Table 1. Simulated uniform coverage probabilities of $g(x) = x$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.
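When reading the simulated coverage probabilities, it can help to keep the Monte Carlo sampling error in mind. The short sketch below computes the binomial standard error $\sqrt{p(1-p)/R}$ of a coverage frequency based on $R = 2{,}000$ iterations, evaluated at the nominal levels. This calculation is an added illustration under the assumption of independent iterations and is not part of the paper; the function name is hypothetical.

```python
import math

def mc_standard_error(p, n_reps):
    """Binomial standard error of a simulated coverage frequency."""
    return math.sqrt(p * (1.0 - p) / n_reps)

n_reps = 2000  # number of Monte Carlo iterations reported in the table captions
standard_errors = {
    nominal: mc_standard_error(nominal, n_reps)
    for nominal in (0.80, 0.90, 0.95)
}

for nominal, se in standard_errors.items():
    print(f"nominal {nominal:.2f}: Monte Carlo standard error {se:.4f}")
```

The standard errors are roughly 0.009, 0.007, and 0.005 at the 80%, 90%, and 95% levels, so differences from the nominal level of one percentage point or less are within about two standard errors.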

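Regression: $g(x) = x^2$. Columns give the sequence $\{c_n\}_{n=1}^{\infty}$ used for bandwidth selection.

| Model | Nominal Probability | Sample Size (n) | σ_X = 2, (n/100)^0.1 | σ_X = 2, (n/100)^0.3 | σ_X = 2, (n/100)^0.5 | σ_X = 4, (n/100)^0.1 | σ_X = 4, (n/100)^0.3 | σ_X = 4, (n/100)^0.5 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.800 | 250 | 0.744 | 0.748 | 0.760 | 0.787 | 0.786 | 0.766 |
| 1 | 0.800 | 500 | 0.751 | 0.784 | 0.771 | 0.769 | 0.773 | 0.753 |
| 1 | 0.800 | 1,000 | 0.760 | 0.776 | 0.767 | 0.730 | 0.746 | 0.738 |
| 1 | 0.900 | 250 | 0.819 | 0.825 | 0.830 | 0.896 | 0.882 | 0.880 |
| 1 | 0.900 | 500 | 0.848 | 0.866 | 0.861 | 0.878 | 0.884 | 0.869 |
| 1 | 0.900 | 1,000 | 0.866 | 0.873 | 0.879 | 0.852 | 0.857 | 0.856 |
| 1 | 0.950 | 250 | 0.856 | 0.854 | 0.864 | 0.943 | 0.939 | 0.932 |
| 1 | 0.950 | 500 | 0.903 | 0.907 | 0.903 | 0.937 | 0.935 | 0.933 |
| 1 | 0.950 | 1,000 | 0.919 | 0.924 | 0.925 | 0.916 | 0.929 | 0.922 |
| 2 | 0.800 | 250 | 0.779 | 0.760 | 0.786 | 0.784 | 0.775 | 0.789 |
| 2 | 0.800 | 500 | 0.736 | 0.748 | 0.745 | 0.768 | 0.762 | 0.784 |
| 2 | 0.800 | 1,000 | 0.714 | 0.693 | 0.727 | 0.722 | 0.707 | 0.729 |
| 2 | 0.900 | 250 | 0.874 | 0.878 | 0.888 | 0.885 | 0.886 | 0.889 |
| 2 | 0.900 | 500 | 0.865 | 0.868 | 0.859 | 0.885 | 0.880 | 0.893 |
| 2 | 0.900 | 1,000 | 0.841 | 0.824 | 0.846 | 0.853 | 0.835 | 0.856 |
| 2 | 0.950 | 250 | 0.927 | 0.933 | 0.939 | 0.945 | 0.939 | 0.941 |
| 2 | 0.950 | 500 | 0.927 | 0.931 | 0.920 | 0.940 | 0.939 | 0.941 |
| 2 | 0.950 | 1,000 | 0.908 | 0.898 | 0.915 | 0.924 | 0.909 | 0.925 |

Table 2. Simulated uniform coverage probabilities of $g(x) = x^2$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.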

Regression: $g(x) = x^3$. Columns give the sequence $\{c_n\}_{n=1}^{\infty}$ used for bandwidth selection.

| Model | Nominal Probability | Sample Size (n) | σ_X = 2, (n/100)^0.1 | σ_X = 2, (n/100)^0.3 | σ_X = 2, (n/100)^0.5 | σ_X = 4, (n/100)^0.1 | σ_X = 4, (n/100)^0.3 | σ_X = 4, (n/100)^0.5 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.800 | 250 | 0.818 | 0.800 | 0.787 | 0.841 | 0.853 | 0.857 |
| 1 | 0.800 | 500 | 0.823 | 0.832 | 0.830 | 0.842 | 0.832 | 0.841 |
| 1 | 0.800 | 1,000 | 0.835 | 0.851 | 0.832 | 0.815 | 0.799 | 0.818 |
| 1 | 0.900 | 250 | 0.872 | 0.847 | 0.837 | 0.924 | 0.929 | 0.935 |
| 1 | 0.900 | 500 | 0.896 | 0.898 | 0.888 | 0.911 | 0.924 | 0.928 |
| 1 | 0.900 | 1,000 | 0.925 | 0.920 | 0.911 | 0.909 | 0.898 | 0.903 |
| 1 | 0.950 | 250 | 0.884 | 0.863 | 0.857 | 0.936 | 0.962 | 0.966 |
| 1 | 0.950 | 500 | 0.921 | 0.919 | 0.904 | 0.956 | 0.963 | 0.965 |
| 1 | 0.950 | 1,000 | 0.956 | 0.951 | 0.946 | 0.955 | 0.944 | 0.953 |
| 2 | 0.800 | 250 | 0.823 | 0.821 | 0.820 | 0.815 | 0.828 | 0.829 |
| 2 | 0.800 | 500 | 0.792 | 0.804 | 0.812 | 0.788 | 0.796 | 0.792 |
| 2 | 0.800 | 1,000 | 0.772 | 0.778 | 0.789 | 0.752 | 0.748 | 0.742 |
| 2 | 0.900 | 250 | 0.922 | 0.911 | 0.917 | 0.919 | 0.925 | 0.922 |
| 2 | 0.900 | 500 | 0.895 | 0.909 | 0.908 | 0.897 | 0.892 | 0.892 |
| 2 | 0.900 | 1,000 | 0.875 | 0.885 | 0.883 | 0.860 | 0.850 | 0.849 |
| 2 | 0.950 | 250 | 0.963 | 0.955 | 0.957 | 0.959 | 0.968 | 0.964 |
| 2 | 0.950 | 500 | 0.949 | 0.956 | 0.948 | 0.953 | 0.939 | 0.939 |
| 2 | 0.950 | 1,000 | 0.936 | 0.939 | 0.931 | 0.910 | 0.909 | 0.910 |

Table 3. Simulated uniform coverage probabilities of $g(x) = x^3$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

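Regression: $g(x) = \sin(x)$. Columns give the sequence $\{c_n\}_{n=1}^{\infty}$ used for bandwidth selection.

| Model | Nominal Probability | Sample Size (n) | σ_X = 2, (n/100)^0.1 | σ_X = 2, (n/100)^0.3 | σ_X = 2, (n/100)^0.5 | σ_X = 4, (n/100)^0.1 | σ_X = 4, (n/100)^0.3 | σ_X = 4, (n/100)^0.5 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.800 | 250 | 0.726 | 0.726 | 0.723 | 0.729 | 0.732 | 0.728 |
| 1 | 0.800 | 500 | 0.782 | 0.769 | 0.768 | 0.768 | 0.773 | 0.772 |
| 1 | 0.800 | 1,000 | 0.785 | 0.797 | 0.798 | 0.765 | 0.763 | 0.782 |
| 1 | 0.900 | 250 | 0.793 | 0.796 | 0.787 | 0.812 | 0.826 | 0.823 |
| 1 | 0.900 | 500 | 0.861 | 0.849 | 0.840 | 0.857 | 0.863 | 0.858 |
| 1 | 0.900 | 1,000 | 0.865 | 0.883 | 0.878 | 0.858 | 0.866 | 0.877 |
| 1 | 0.950 | 250 | 0.829 | 0.831 | 0.821 | 0.864 | 0.870 | 0.872 |
| 1 | 0.950 | 500 | 0.895 | 0.883 | 0.875 | 0.912 | 0.910 | 0.906 |
| 1 | 0.950 | 1,000 | 0.904 | 0.920 | 0.916 | 0.911 | 0.913 | 0.927 |
| 2 | 0.800 | 250 | 0.760 | 0.769 | 0.780 | 0.765 | 0.765 | 0.769 |
| 2 | 0.800 | 500 | 0.804 | 0.809 | 0.799 | 0.743 | 0.744 | 0.748 |
| 2 | 0.800 | 1,000 | 0.806 | 0.789 | 0.797 | 0.681 | 0.687 | 0.702 |
| 2 | 0.900 | 250 | 0.866 | 0.860 | 0.868 | 0.855 | 0.861 | 0.864 |
| 2 | 0.900 | 500 | 0.894 | 0.904 | 0.897 | 0.861 | 0.850 | 0.851 |
| 2 | 0.900 | 1,000 | 0.912 | 0.890 | 0.897 | 0.802 | 0.809 | 0.811 |
| 2 | 0.950 | 250 | 0.912 | 0.911 | 0.921 | 0.906 | 0.916 | 0.916 |
| 2 | 0.950 | 500 | 0.938 | 0.946 | 0.944 | 0.919 | 0.913 | 0.906 |
| 2 | 0.950 | 1,000 | 0.959 | 0.952 | 0.950 | 0.882 | 0.882 | 0.887 |

Table 4. Simulated uniform coverage probabilities of $g(x) = \sin(x)$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.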

Regression: g(x) = cos(x)

                                σ_X = 2                                  σ_X = 4
       Nominal      Sample      {c_n}_{n=1}^∞                            {c_n}_{n=1}^∞
Model  Probability  Size (n)    (n/100)^0.1  (n/100)^0.3  (n/100)^0.5    (n/100)^0.1  (n/100)^0.3  (n/100)^0.5
1      0.800        250         0.739        0.722        0.741          0.736        0.742        0.726
                    500         0.786        0.791        0.806          0.774        0.760        0.767
                    1,000       0.802        0.809        0.808          0.770        0.792        0.797
1      0.900        250         0.805        0.790        0.794          0.823        0.835        0.812
                    500         0.858        0.860        0.869          0.869        0.866        0.863
                    1,000       0.888        0.898        0.885          0.869        0.879        0.888
1      0.950        250         0.841        0.824        0.819          0.871        0.879        0.862
                    500         0.894        0.898        0.897          0.914        0.909        0.910
                    1,000       0.936        0.934        0.925          0.923        0.923        0.933
2      0.800        250         0.805        0.793        0.820          0.755        0.751        0.775
                    500         0.822        0.828        0.824          0.768        0.781        0.770
                    1,000       0.828        0.807        0.832          0.696        0.731        0.711
2      0.900        250         0.892        0.888        0.911          0.860        0.854        0.868
                    500         0.914        0.917        0.911          0.875        0.873        0.873
                    1,000       0.918        0.907        0.922          0.820        0.839        0.837
2      0.950        250         0.937        0.936        0.953          0.906        0.907        0.913
                    500         0.958        0.959        0.960          0.927        0.925        0.926
                    1,000       0.958        0.962        0.960          0.899        0.911        0.913

Table 5. Simulated uniform coverage probabilities of g(x) = cos(x) by estimated confidence bands in I = [-σ_X, σ_X] under normally distributed X with σ_X ∈ {2, 4} and Laplace distributed ε. Alternative sequences {c_n} are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Appendix C. Figures for Section 6

Panels: (a) Men Aged from 20 to 34; (b) Men Aged from 35 to 49; (c) Men Aged from 50 to 64; (d) Men Aged 65 or Above.

Figure 1. Estimates and confidence bands for the nonparametric regression of medical expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m^2. The vertical axes measure the medical expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands.

Panels: (a) Men Aged from 20 to 34; (b) Men Aged from 35 to 49; (c) Men Aged from 50 to 64; (d) Men Aged 65 or Above.

Figure 2. Estimates and confidence bands for the nonparametric regression of prescription expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m^2. The vertical axes measure the prescription expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands.