arXiv: v3 [math.ST] 24 Feb 2017


UNIFORM CONFIDENCE BANDS FOR NONPARAMETRIC ERRORS-IN-VARIABLES REGRESSION

KENGO KATO AND YUYA SASAKI

Abstract. This paper develops a method to construct uniform confidence bands for a nonparametric regression function where a predictor variable is subject to a measurement error. We allow for the distribution of the measurement error to be unknown, but assume that there is an independent sample from the measurement error distribution. The sample from the measurement error distribution need not be independent from the sample on response and predictor variables. The availability of a sample from the measurement error distribution is satisfied if, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed, are available. The proposed confidence band builds on the deconvolution kernel estimation and a novel application of the multiplier (or wild) bootstrap method. We establish asymptotic validity of the proposed confidence band under ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric errors-in-variables regression. We also propose a novel data-driven method to choose a bandwidth, and conduct simulation studies to verify the finite sample performance of the proposed confidence band. Applying our method to a combination of two empirical data sets, we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI. Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

Date: First arXiv version: February 11; this version: February 27. K. Kato is supported by Grant-in-Aid for Scientific Research (C) (15K03392) from the JSPS. We would like to thank Tatsushi Oka and Holger Dette for useful comments and discussions.

1. Introduction

Consider the nonparametric errors-in-variables (EIV) regression model with classical measurement error

$$Y = g(X) + U, \quad E[U \mid X, \varepsilon] = 0, \qquad W = X + \varepsilon, \tag{1.1}$$

where each of $Y$, $X$, $U$, $W$, and $\varepsilon$ is a univariate random variable, and $\varepsilon$ is independent from $X$. We observe $(Y, W)$, but observe neither $X$ nor $\varepsilon$. Furthermore, we assume that the distribution of $\varepsilon$ is unknown. The variable $X$ is a latent predictor variable, while $\varepsilon$ is a measurement error. Of interest are estimation of and inference on the regression function $g(x) = E[Y \mid X = x]$. In particular, we are interested in constructing uniform confidence bands for $g$. Confidence bands provide a simple graphical description of the extent to which a nonparametric estimator varies at design points, thereby quantifying uncertainties of the nonparametric estimator. However, construction of confidence bands tends to be challenging, especially for complex nonparametric models.^1 Indeed, despite the rich literature on consistent estimation of nonparametric EIV regression, the literature on pointwise or uniform confidence bands for nonparametric EIV regression is limited (see below for a literature review), likely because of its complexity. Even pointwise inference on $g$ under the assumption that the measurement error distribution is known is considered by experts to be difficult.^2 This is because, as discussed in Delaigle et al. (2015): 1) the asymptotic variance of the deconvolution kernel estimator of $g$ is non-trivial to estimate and so inference based on limiting distributions is difficult to implement; and 2) it is not straightforward to devise a way to implement bootstrap for inference on $g$ due to the unobservability of $X$ in data. With all these challenges recognized in the literature, the present paper attempts to solve an even more challenging problem of constructing uniform confidence bands for the regression function $g$ without assuming that the measurement error distribution is known.

To deal with the unknown measurement error distribution, we assume that, in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ from the distribution of $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution, where $m = m_n \to \infty$ as $n \to \infty$. (The auxiliary sample $\{\eta_1, \dots, \eta_m\}$ need not be independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$.) For example, in natural science, measurement errors are often due to measuring devices; in such cases, one can obtain preliminary calibration measures in the absence of signal, which produce a sample from the measurement error distribution; see the introduction of Comte and Lacour (2011), for example.
^1 We refer to Wasserman (2006) and Giné and Nickl (2016) as general references on confidence bands in nonparametric statistical models.

^2 Delaigle et al. (2015), who study pointwise confidence bands for nonparametric EIV regression under the assumption that the measurement error distribution is known, state that "despite their practical importance, to our knowledge confidence bands in nonparametric EIV regression have largely been ignored so far. We show that the problem is particularly complex, much more so than in the standard error-free setting." (Delaigle et al., 2015, p.149)

Other real data scenarios of such additional data availability that are plausible in economics, social sciences, and biomedical sciences include: the case where validation data is available for data combination; and the case where repeated measurements (panel data) on $X$ with errors, one of which is symmetrically distributed, are available. These patterns of data requirements are often considered in the existing literature with measurement errors that we review below.

Under this setup, we develop a method to construct confidence bands for the regression function $g$. Our method builds on the deconvolution kernel estimation (Fan and Truong, 1993), and a novel application of the multiplier (or wild) bootstrap method. Our construction of the multiplier process differs from the standard approach in the error-free case (cf. Neumann and Polzehl, 1998), and is tailored to EIV regression; see Remark 2.1 ahead. Building on non-trivial applications of the probabilistic techniques developed in Chernozhukov et al. (2014a,b, 2016), we establish asymptotic validity of the proposed confidence band, i.e., the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and EIV regression, we focus for a technical reason on the case where the measurement error density is ordinary smooth, i.e., the characteristic function of the measurement error distribution decays at most polynomially fast in the tail (cf. Fan, 1991a; Fan and Truong, 1993).

In addition to these contributions, we also propose a novel data-driven method to choose a bandwidth. In the theoretical study, we require the bandwidth to be chosen so that it undersmoothes the deconvolution kernel estimate, making the bias negligible relative to the variance part. Existing data-driven methods for bandwidth selection typically aim at choosing a bandwidth minimizing the MISE, thereby yielding a non-undersmoothing bandwidth (cf. Delaigle and Hall, 2008). We propose an alternative method for bandwidth selection that aims at yielding an undersmoothing bandwidth.

We conduct simulation studies to verify the finite sample performance of the proposed confidence band. The simulation studies show that the proposed confidence band, combined with the proposed bandwidth selection rule, works well. Applying our method to a combination of the two data sets, the National Health and Nutrition Examination Survey (NHANES) and the Panel Survey of Income Dynamics (PSID), we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI.
Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

In order to locate the present paper in the context of the relevant literature, it is useful to first review measurement error models and deconvolution. We refer to books by Fuller (1987), Carroll et al. (2006), Meister (2009) and Horowitz (2009, Chapter 5) and surveys by Chen et al. (2011) and Schennach (2016) for general references. The genesis of this literature features the deconvolution kernel density estimation with known error distributions (Carroll and Hall, 1988; Stefanski and Carroll, 1990; Fan, 1991a,b), followed by that with unknown error distributions (Diggle and Hall, 1993; Horowitz and Markatou, 1996; Neumann, 1997; Efromovich, 1997; Li and Vuong, 1998; Delaigle et al., 2008; Johannes, 2009; Comte and Lacour, 2011). Diggle and Hall (1993); Neumann (1997); Efromovich (1997); Johannes (2009); Comte and Lacour (2011) assume the availability of a sample from the measurement error distribution, while Horowitz and Markatou (1996); Delaigle et al. (2008) assume repeated measurements (panel data) with symmetrically and identically distributed errors. For repeated measurements (panel data) without symmetry of error distributions, Li and Vuong (1998) propose an alternative density estimator based on Kotlarski's lemma (cf. Kotlarski, 1967; Rao, 1992) that does not require a known error distribution; see also Bonhomme and Robin (2010) and Comte and Kappus (2015) for further developments. Methods to construct confidence bands in deconvolution are developed by Bissantz et al. (2007); Bissantz and Holzmann (2008); van Es and Gugushvili (2008); Lounici and Nickl (2011); Schmidt-Hieber et al. (2013) for the case of known error distribution, and more recently by Kato and Sasaki (2016) for the case of unknown error distribution.

Similarly to the density estimation, the literature on nonparametric EIV regression estimation often takes the deconvolution kernel approach. Fan and Truong (1993) propose to substitute the deconvolution kernel in the Nadaraya-Watson estimator; also see Fan and Masry (1992) for pointwise asymptotic normality, Delaigle and Meister (2007) for extensions to heteroscedastic measurement errors, Delaigle et al. (2009) for local polynomial extensions, and Delaigle et al. (2015) for pointwise inference. These papers focus on the case of known error distribution. Delaigle et al. (2008) estimate the error characteristic function using repeated measurements on $X$ with symmetrically and identically distributed errors, and substitute the estimated error characteristic function into the deconvolution kernel. Schennach (2004) also works with cases with repeated measurements but without assuming symmetry of error distributions, and proposes an alternative approach to estimate the regression function based on Kotlarski's lemma. See also Carroll et al. (1999); Schennach et al. (2012); Schennach and Hu (2013); Hu and Sasaki (2015). Our method of inference is based on the deconvolution kernel estimation.
We mainly focus on (i) the case where a sample drawn from the error distribution is available; (ii) the case where validation data is available for data combination; and (iii) the case where repeated measurements with errors, one of which is symmetrically distributed, are available. For (ii) data combination with validation data, our model shares similarities, albeit under different assumptions, with the nonparametric instrumental variables (NPIV) regression, for which Horowitz and Lee (2012), Chen and Christensen (2015) and Babii (2016) develop methods to construct confidence bands as we do for nonparametric EIV regression.

We note the following two references as particularly relevant benchmarks for identifying our contributions. One reference is Schennach (2004), which derives pointwise asymptotic normality for a nonparametric EIV regression estimator different from ours, under unknown error distribution. Relative to this existing result, our contributions are four-fold. First, we provide a method of uniform inference as opposed to a pointwise one. Second, we propose a method of bandwidth selection for valid inference. Third, while the existing result left aside the issue of variance estimation and thus is not readily applicable in practice, we provide a bootstrap method for ease of practical implementation. Fourth, we devise lower-level assumptions which are easier to verify with concrete examples of distribution and conditional moment functions. The other reference is Delaigle et al. (2015), which suggests a method of pointwise inference via bootstrap for nonparametric EIV regression with known error distribution. Relative to this existing result, our contributions are three-fold. First, our method allows for unknown error distribution. Second, we provide a method of uniform inference as opposed to a pointwise one. Third, we provide formal theories to support the asymptotic validity of our bootstrap method. Delaigle et al. (2015) mention how to modify their methodology to the case where the measurement error distribution is unknown, and to construction of uniform confidence bands. However, their theoretical results do not formally cover those cases.

Finally, Birke et al. (2010) and Proksch et al. (2015) obtain confidence bands for inverse regression with fixed equidistant designs (the fixed equidistant design assumption is substantial in their setups and analyses); the inverse regression is related to but different from our EIV regression (1.1), and our setup does not allow fixed equidistant designs because of measurement errors. The methodologies and the proof strategies are also different; for example, both of those papers rely on Gumbel approximations for validity of the confidence bands, which we do not. Importantly, to the best of our knowledge, none of the existing results covers uniform confidence bands for EIV regression (1.1), even under the simpler setting that the measurement error distribution is known. The present paper fills this important void.

The rest of the paper is organized as follows. In Section 2, we informally present our methodology to construct uniform confidence bands for $g$. In Section 3, we present asymptotic validity of the proposed confidence band under suitable regularity conditions. In Section 4, we propose a practical method to choose the bandwidth. In Section 5, we conduct simulation studies to verify the finite sample performance of the proposed confidence band. In Section 6, we apply the proposed method to a combination of two empirical data sets.
In Section 7, we discuss extensions of our results to specification testing of the conditional mean function, cases with additional regressors without measurement errors, and construction of confidence bands for the conditional distribution function. Section 8 concludes. All the proofs are deferred to the Appendix.

Notations. For a non-empty set $T$ and a (complex-valued) function $f$ on $T$, we use the notation $\|f\|_T = \sup_{t \in T} |f(t)|$. Let $\ell^\infty(T)$ denote the Banach space of all bounded real-valued functions on $T$ with norm $\|\cdot\|_T$. The Fourier transform of an integrable function $f$ on $\mathbb{R}$ is defined by

$$\varphi_f(t) = \int_{\mathbb{R}} e^{itx} f(x)\,dx, \quad t \in \mathbb{R},$$

where $i = \sqrt{-1}$ denotes the imaginary unit throughout the paper. We refer to Folland (1999) as a basic reference on Fourier analysis. For any positive sequences $a_n$ and $b_n$, we write $a_n \sim b_n$ if $a_n/b_n$ is bounded and bounded away from zero. For any $a, b \in \mathbb{R}$, let $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For $a \in \mathbb{R}$ and $b > 0$, we use the shorthand notation $[a \pm b] = [a - b, a + b]$. Let $\stackrel{d}{=}$ denote equality in distribution.

2. Methodology

In this section, we informally present our methodology to construct confidence bands for $g$. The formal analysis of our confidence bands will be carried out in the next section. We will also discuss some examples of situations where an auxiliary sample from the measurement error distribution is available.

2.1. Deconvolution kernel estimation. We first introduce a deconvolution kernel method to estimate $f_X$ and $g$ under the assumption that the distribution of $\varepsilon$ is known. Let $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ be an independent sample from the distribution of $(Y, W)$. In this paper, we assume that the densities of $X$ and $\varepsilon$ exist and are denoted by $f_X$ and $f_\varepsilon$, respectively. Let $\varphi_W$, $\varphi_X$, and $\varphi_\varepsilon$ denote the characteristic functions of $W$, $X$, and $\varepsilon$, respectively. By the independence between $X$ and $\varepsilon$, the density of $W$ exists and is given by the convolution of the densities of $X$ and $\varepsilon$, namely,

$$f_W(w) = (f_X * f_\varepsilon)(w) = \int_{\mathbb{R}} f_X(w - x) f_\varepsilon(x)\,dx, \quad w \in \mathbb{R},$$

where $*$ denotes the convolution. This in turn implies that the characteristic function of $W$ is identical to the product of those of $X$ and $\varepsilon$, namely,

$$\varphi_W(t) = \varphi_X(t)\varphi_\varepsilon(t), \quad t \in \mathbb{R}.$$

Provided that $\varphi_\varepsilon$ is non-vanishing on $\mathbb{R}$ and $\varphi_X$ is integrable on $\mathbb{R}$ with respect to the Lebesgue measure (we hereafter omit "with respect to the Lebesgue measure"), the Fourier inversion formula yields that

$$f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t)\,dt = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_W(t)}{\varphi_\varepsilon(t)}\,dt, \quad x \in \mathbb{R}. \tag{2.1}$$

The expression (2.1) leads to a method to estimate $f_X$. However, simply replacing $\varphi_W$ by the empirical characteristic function of $W$, namely,

$$\hat\varphi_W(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itW_j}, \quad t \in \mathbb{R},$$

does not work. Specifically, the function $t \mapsto e^{-itx}\hat\varphi_W(t)/\varphi_\varepsilon(t)$ is not integrable on $\mathbb{R}$, because $\varphi_\varepsilon(t) \to 0$ as $|t| \to \infty$ by the Riemann-Lebesgue lemma, while $\hat\varphi_W$ is the characteristic function of a discrete distribution (i.e., the empirical distribution) and so $\limsup_{|t| \to \infty} |\hat\varphi_W(t)| = 1$ (cf. Sato, 1999, Proposition 27.28). A standard approach to dealing with this problem is to use a kernel function to restrict the integral region in (2.1) to a compact interval.
Let $K : \mathbb{R} \to \mathbb{R}$ be a kernel function such that $K$ is integrable on $\mathbb{R}$, $\int_{\mathbb{R}} K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$ (i.e., $\varphi_K(t) = 0$ for all $|t| > 1$). When $f_\varepsilon$ is known, the deconvolution kernel density estimator of $f_X$ is given by

$$\hat f_X^*(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\hat\varphi_W(t)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt.$$

This estimator was first considered by Carroll and Hall (1988) and Stefanski and Carroll (1990). Rates of convergence and pointwise asymptotic normality of $\hat f_X^*$ are studied in Fan (1991a,b), among others. Alternatively, by a change of variables, we may rewrite $\hat f_X^*$ as

$$\hat f_X^*(x) = \frac{1}{nh_n}\sum_{j=1}^{n} K_n((x - W_j)/h_n), \tag{2.2}$$

where the function $K_n$, called the deconvolution kernel, is defined by

$$K_n(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\varphi_\varepsilon(t/h_n)}\,dt.$$

Note that $K_n$ is real-valued since

$$\overline{K_n(x)} = \frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\frac{\overline{\varphi_K(t)}}{\overline{\varphi_\varepsilon(t/h_n)}}\,dt = \frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\frac{\varphi_K(-t)}{\varphi_\varepsilon(-t/h_n)}\,dt = K_n(x),$$

where $\overline{z}$ denotes the complex conjugate of a complex number $z$. The second expression (2.2) resembles a standard kernel density estimator without measurement errors.

Analogously, Fan and Truong (1993) propose to estimate the regression function $g(x)$ by $\hat g^*(x) = \hat\mu^*(x)/\hat f_X^*(x)$, where

$$\hat\mu^*(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\left(\frac{1}{n}\sum_{j=1}^{n} Y_j e^{itW_j}\right)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt = \frac{1}{nh_n}\sum_{j=1}^{n} Y_j K_n((x - W_j)/h_n).$$

To understand the rationale behind this estimator, observe that

$$E[Y e^{itW}] = E[\{g(X) + U\}e^{it(X+\varepsilon)}] = E[g(X)e^{it(X+\varepsilon)}] = E[g(X)e^{itX}]\varphi_\varepsilon(t),$$

and $E[g(X)e^{itX}]$ is the Fourier transform of $gf_X$, i.e., $E[g(X)e^{itX}] = \varphi_{gf_X}(t)$. Hence $\varphi_{gf_X}(t) = E[Y e^{itW}]/\varphi_\varepsilon(t)$, and provided that $\varphi_{gf_X}$ is integrable on $\mathbb{R}$, the Fourier inversion formula yields that

$$g(x)f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{E[Y e^{itW}]}{\varphi_\varepsilon(t)}\,dt. \tag{2.3}$$

It is worth pointing out that estimation of $f_X$ and $gf_X$ corresponds to solving certain Fredholm integral equations of the first kind, and therefore estimation of $f_X$ and $gf_X$ (or $g$) is a statistical ill-posed inverse problem. In fact, $f_X$ and $gf_X$ satisfy $f_X * f_\varepsilon = f_W$ and $(gf_X) * f_\varepsilon = E[Y \mid W = \cdot\,]f_W$; these are Fredholm integral equations of the first kind where the right hand side functions are directly estimable.^3 Rates of convergence and pointwise asymptotic normality of $\hat g^*$ are studied by Fan and Truong (1993); Fan and Masry (1992), among others.

The discussion so far has presumed that the distribution of $\varepsilon$ is known.
However, in many applications, the distribution of $\varepsilon$ is unknown, and hence the estimators $\hat f_X^*$ and $\hat g^*$ are infeasible.

^3 See, for example, Chen (2007), Carrasco et al. (2007), Cavalier (2008), and Horowitz (2009) for an overview of statistical ill-posed inverse problems.
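To make the known-error estimators of (2.2) concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes, purely for illustration, a Laplace measurement error with known scale `b` (so that $1/\varphi_\varepsilon(t) = 1 + b^2t^2$) and the kernel with $\varphi_K(t) = (1 - t^2)^3$ on $[-1, 1]$; the function names `deconv_kernel` and `eiv_regression_known_error`, the grid size, and the simulated design are all hypothetical choices.

```python
import numpy as np

def deconv_kernel(u, h, b, n_grid=201):
    """Deconvolution kernel K_n(u) for Laplace(0, b) errors (assumed known).

    Assumes phi_K(t) = (1 - t^2)^3 on [-1, 1] and 1/phi_eps(t) = 1 + b^2 t^2.
    The ratio phi_K(t)/phi_eps(t/h) is real and even in t, so
    K_n(u) = (1/pi) * int_0^1 cos(t u) phi_K(t) / phi_eps(t/h) dt.
    """
    u = np.asarray(u, dtype=float)
    t = np.linspace(0.0, 1.0, n_grid)
    weight = (1.0 - t**2) ** 3 * (1.0 + (b * t / h) ** 2)
    integrand = np.cos(np.multiply.outer(u, t)) * weight
    dt = t[1] - t[0]
    # Trapezoidal rule along the last (t) axis.
    integral = dt * (integrand.sum(axis=-1)
                     - 0.5 * (integrand[..., 0] + integrand[..., -1]))
    return integral / np.pi

def eiv_regression_known_error(x, Y, W, h, b):
    """Return (g_star, f_star): the ratio estimator and the density estimator (2.2)."""
    x = np.asarray(x, dtype=float)
    K = deconv_kernel((x[:, None] - W[None, :]) / h, h, b)  # shape (len(x), n)
    f_star = K.mean(axis=1) / h          # (1/(n h)) sum_j K_n((x - W_j)/h)
    mu_star = (K * Y).mean(axis=1) / h   # (1/(n h)) sum_j Y_j K_n((x - W_j)/h)
    return mu_star / f_star, f_star

# Illustrative simulated data (hypothetical design, not from the paper).
rng = np.random.default_rng(0)
n, b = 500, 0.2
X = rng.normal(size=n)
W = X + rng.laplace(scale=b, size=n)      # contaminated predictor
Y = np.sin(X) + 0.1 * rng.normal(size=n)  # response
x_grid = np.linspace(-1.0, 1.0, 5)
g_star, f_star = eiv_regression_known_error(x_grid, Y, W, h=0.5, b=b)
```

The bandwidth `h=0.5` is arbitrary here; the kernel integrates to one because $\varphi_K(0)/\varphi_\varepsilon(0) = 1$.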

In the present paper, we assume that there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the distribution of $\varepsilon$:

$$\eta_1, \dots, \eta_m \sim f_\varepsilon \ \text{i.i.d.},$$

where $m = m_n \to \infty$ as $n \to \infty$. We do not assume that $\eta_1, \dots, \eta_m$ are independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$. In Section 2.3, we will discuss examples where such observations from the measurement error distribution are available. Given $\{\eta_1, \dots, \eta_m\}$, we may estimate $\varphi_\varepsilon$ by the empirical characteristic function, namely,

$$\hat\varphi_\varepsilon(t) = \frac{1}{m}\sum_{j=1}^{m} e^{it\eta_j},$$

and estimate the deconvolution kernel $K_n$ by the plug-in method:

$$\hat K_n(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h_n)}\,dt.$$

Note that under the regularity conditions stated below, $\inf_{|t| \le h_n^{-1}} |\hat\varphi_\varepsilon(t)| > 0$ with probability approaching one, so that $\hat K_n$ is well-defined with probability approaching one. Note also that $\hat K_n$ is real-valued. Now, we estimate $g(x)$ by $\hat g(x) = \hat\mu(x)/\hat f_X(x)$, where

$$\hat\mu(x) = \frac{1}{nh_n}\sum_{j=1}^{n} Y_j \hat K_n((x - W_j)/h_n) \quad \text{and} \quad \hat f_X(x) = \frac{1}{nh_n}\sum_{j=1}^{n} \hat K_n((x - W_j)/h_n).$$

Density estimators of the form $\hat f_X$ are studied in Diggle and Hall (1993), Neumann (1997), and Efromovich (1997), among others, and nonparametric regression estimators of the form $\hat g$ are studied in Delaigle et al. (2008), among others.

2.2. Construction of confidence bands. We now describe our method to construct confidence bands for $g$ based on the estimator $\hat g$. Under the regularity conditions stated below, we will show that $\hat g(x) - g(x)$ can be approximated by

$$\frac{1}{f_X(x)nh_n}\sum_{j=1}^{n}\left[\{Y_j - g(x)\}K_n((x - W_j)/h_n) - A_n(x)\right]$$

uniformly in $x \in I$, where $I$ is a compact interval in $\mathbb{R}$ on which $f_X$ is bounded away from zero, and $A_n(x) = E[\{Y - g(x)\}K_n((x - W)/h_n)]$. Let

$$s_n^2(x) = \mathrm{Var}\left(\{Y - g(x)\}K_n((x - W)/h_n)\right),$$

and consider the process

$$Z_n(x) = \frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^{n}\left[\{Y_j - g(x)\}K_n((x - W_j)/h_n) - A_n(x)\right], \quad x \in I,$$

where $s_n(x) = \sqrt{s_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I} s_n(x) > 0$ for sufficiently large $n$, so that $Z_n$ is well-defined. Furthermore, we will show that there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, and such that as $n \to \infty$,

$$\sup_{z \in \mathbb{R}}\left|P\{\|Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\right| \to 0.$$

Recall that $\|Z_n\|_I = \sup_{x \in I}|Z_n(x)|$. This in turn yields that

$$\sup_{z \in \mathbb{R}}\left|P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\right| \to 0,$$

where $\{\hat Z_n(x) : x \in I\}$ is a process defined by

$$\hat Z_n(x) = \frac{f_X(x)\sqrt{n}\,h_n(\hat g(x) - g(x))}{s_n(x)}, \quad x \in I. \tag{2.4}$$

Therefore, if we denote by

$$c_n^G(1 - \tau) = (1 - \tau)\text{-quantile of } \|Z_n^G\|_I$$

for $\tau \in (0, 1)$, then a band of the form

$$\hat C_{1-\tau}^*(x) = \left[\hat g(x) \pm \frac{s_n(x)}{f_X(x)\sqrt{n}\,h_n}c_n^G(1 - \tau)\right], \quad x \in I,$$

will contain $g(x)$, $x \in I$, with probability at least $1 - \tau + o(1)$ as $n \to \infty$. In fact, it holds that

$$P\{g(x) \in \hat C_{1-\tau}^*(x)\ \forall x \in I\} = P\{\|\hat Z_n\|_I \le c_n^G(1 - \tau)\} = P\{\|Z_n^G\|_I \le c_n^G(1 - \tau)\} + o(1) \ge 1 - \tau + o(1).$$

In practice, $f_X(x)$, $s_n^2(x)$, and $c_n^G(1 - \tau)$ are all unknown, and we have to estimate them. We estimate $f_X(x)$ and $s_n^2(x)$ by $\hat f_X(x)$ and

$$\hat s_n^2(x) = \frac{1}{n}\sum_{j=1}^{n}\{Y_j - \hat g(x)\}^2\hat K_n^2((x - W_j)/h_n),$$

respectively. Note that $(A_n(x))^2$ is negligible relative to $s_n^2(x)$, so that we have ignored $(A_n(x))^2$ in estimation of $s_n^2(x)$. Note also that $\sum_{j=1}^{n}(Y_j - \hat g(x))\hat K_n((x - W_j)/h_n) = 0$.

Next, we estimate the quantile $c_n^G(1 - \tau)$ by the Gaussian multiplier bootstrap. Generate $\xi_1, \dots, \xi_n \sim N(0, 1)$ i.i.d., independently of the data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$, and consider the multiplier process

$$\hat Z_n^\xi(x) = \frac{1}{\hat s_n(x)\sqrt{n}}\sum_{j=1}^{n}\xi_j\{Y_j - \hat g(x)\}\hat K_n((x - W_j)/h_n), \tag{2.5}$$

where $\hat s_n(x) = \sqrt{\hat s_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I}\hat s_n(x) > 0$ with probability approaching one. Conditionally on the data $\mathcal{D}_n$, $\hat Z_n^\xi$ is a Gaussian process with mean zero and covariance function (presumably) close to that of $Z_n^G$. Indeed, for $f_{n,x}(y, w) = \{y - g(x)\}K_n((x - w)/h_n)/s_n(x)$ and $\hat f_{n,x}(y, w) = \{y - \hat g(x)\}\hat K_n((x - w)/h_n)/\hat s_n(x)$, the covariance function of $\hat Z_n^\xi$ conditionally on $\mathcal{D}_n$ is

$$E[\hat Z_n^\xi(x)\hat Z_n^\xi(x') \mid \mathcal{D}_n] = \frac{1}{n}\sum_{j=1}^{n}\hat f_{n,x}(Y_j, W_j)\hat f_{n,x'}(Y_j, W_j)$$

for $x, x' \in I$, which estimates the covariance function of $Z_n^G$ given by

$$E[Z_n^G(x)Z_n^G(x')] = E[f_{n,x}(Y, W)f_{n,x'}(Y, W)] - E[f_{n,x}(Y, W)]E[f_{n,x'}(Y, W)]$$

for $x, x' \in I$. Hence, we estimate $c_n^G(1 - \tau)$ by

$$\hat c_n(1 - \tau) = \text{conditional } (1 - \tau)\text{-quantile of } \|\hat Z_n^\xi\|_I \text{ given } \mathcal{D}_n,$$

which can be computed via simulations. Now, the resulting confidence band is defined by

$$\hat C_{1-\tau}(x) = \left[\hat g(x) \pm \frac{\hat s_n(x)}{\hat f_X(x)\sqrt{n}\,h_n}\hat c_n(1 - \tau)\right], \quad x \in I. \tag{2.6}$$

Note that, except for the choice of the bandwidth, this confidence band is completely data-driven. We will discuss practical choice of the bandwidth in Section 4.

Remark 2.1 (Novelty of our construction of the multiplier process). In the error-free case, namely when we can observe $(Y_1, X_1), \dots, (Y_n, X_n)$, the deviation of a standard kernel regression estimator $\check g$ with kernel $K$ from the true regression function $g$ is uniformly approximated as $\{f_X(x)nh_n\}^{-1}\sum_{j=1}^{n}U_jK((x - X_j)/h_n)$ under suitable regularity conditions. So, to construct confidence bands for $g$ via the multiplier bootstrap method, one would construct a multiplier stochastic process of the form

$$x \mapsto \frac{1}{\sigma_n(x)\sqrt{n}}\sum_{j=1}^{n}\xi_jU_jK((x - X_j)/h_n) \quad \text{with } \sigma_n^2(x) = \mathrm{Var}(UK((x - X)/h_n)), \tag{2.7}$$

and then compute the conditional $(1 - \tau)$-quantile of the supremum in absolute value of the multiplier process. In practice, we replace $U_j$ and $\sigma_n(x)$ by suitable estimators; for example, a natural estimator of $U_j$ would be $\hat U_j = Y_j - \check g(X_j)$. See, for example, Neumann and Polzehl (1998); see also Section 4.3 in Chernozhukov et al. (2013) for applications of the multiplier bootstrap method to a different but related problem of inference in intersection bound models using kernel methods.
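For concreteness, the feasible band (2.6), with the plug-in kernel $\hat K_n$ and the multiplier process (2.5), can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the kernel with $\varphi_K(t) = (1 - t^2)^3$, the quadrature grid, the evaluation of the supremum over a finite grid of points in $I$, and the names `estimated_deconv_kernel` and `confidence_band` are all hypothetical choices, and the sketch presumes $\hat f_X > 0$ and $\hat\varphi_\varepsilon \neq 0$ on the relevant ranges.

```python
import numpy as np

def estimated_deconv_kernel(u, h, eta, n_grid=201):
    """Plug-in kernel: (1/2pi) int_{-1}^{1} e^{-itu} phi_K(t) / phi_eps_hat(t/h) dt."""
    t = np.linspace(-1.0, 1.0, n_grid)
    phi_K = (1.0 - t**2) ** 3                      # assumed kernel Fourier transform
    # Empirical characteristic function of the auxiliary sample at t/h.
    phi_eps_hat = np.exp(1j * np.outer(t / h, eta)).mean(axis=1)
    ratio = phi_K / phi_eps_hat
    integrand = np.exp(-1j * np.multiply.outer(np.asarray(u, float), t)) * ratio
    dt = t[1] - t[0]
    integral = dt * (integrand.sum(axis=-1)
                     - 0.5 * (integrand[..., 0] + integrand[..., -1]))
    # The imaginary part cancels by conjugate symmetry (up to rounding).
    return integral.real / (2.0 * np.pi)

def confidence_band(x, Y, W, eta, h, tau=0.05, n_boot=500, seed=None):
    """Return (lower, upper, g_hat) on the grid x, following (2.5)-(2.6)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    K = estimated_deconv_kernel((x[:, None] - W[None, :]) / h, h, eta)  # (p, n)
    f_hat = K.mean(axis=1) / h
    g_hat = (K * Y).mean(axis=1) / h / f_hat
    resid_K = (Y[None, :] - g_hat[:, None]) * K    # {Y_j - g_hat(x)} K_hat(.)
    s_hat = np.sqrt((resid_K**2).mean(axis=1))
    xi = rng.standard_normal((n_boot, n))          # Gaussian multipliers
    Z = xi @ resid_K.T / (np.sqrt(n) * s_hat)      # multiplier process, (n_boot, p)
    c_hat = np.quantile(np.abs(Z).max(axis=1), 1.0 - tau)
    half = s_hat * c_hat / (f_hat * np.sqrt(n) * h)
    return g_hat - half, g_hat + half, g_hat

# Illustrative simulated data (hypothetical design, not from the paper).
rng = np.random.default_rng(1)
n = m = 400
X = rng.normal(size=n)
W = X + rng.laplace(scale=0.2, size=n)
Y = np.sin(X) + 0.1 * rng.normal(size=n)
eta = rng.laplace(scale=0.2, size=m)               # auxiliary error sample
x_grid = np.linspace(-0.5, 0.5, 5)
lower, upper, g_hat = confidence_band(x_grid, Y, W, eta, h=0.5, n_boot=200, seed=0)
```

The bandwidth is fixed by hand here; a data-driven undersmoothing choice is the subject of Section 4 and is not attempted in this sketch.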
In the measurement error case, given a cosmetic similarity between the deconvolution kernel estimation and the error-free kernel estimation, one might be tempted to modify the multiplier process (2.7) by just replacing $\sigma_n^2(x)$ and $K((x - X_j)/h_n)$ with $\mathrm{Var}(UK_n((x - W)/h_n))$ and $K_n((x - W_j)/h_n)$, respectively, but this will not result in a valid confidence band even if $U_1, \dots, U_n$ were assumed to be known. The reason is that, in contrast to the error-free case, approximation to $\hat g(x) - g(x)$ by $\{f_X(x)nh_n\}^{-1}\sum_{j=1}^{n}U_jK_n((x - W_j)/h_n)$ is incorrect, which highlights one distinctive feature of nonparametric EIV regression. Hence, in the present paper, we develop a novel construction of the multiplier process (2.5) tailored to nonparametric EIV regression.

2.3. Examples. In this section, we present a couple of examples where an auxiliary sample from the measurement error distribution is available.

Example 2.1 (Repeated measurements or panel data, Carroll et al. (2006), p.298). Suppose that we observe repeated measurements or panel data on $X$ with measurement errors:

$$W^{(1)} = X + \varepsilon^{(1)}, \quad W^{(2)} = X + \varepsilon^{(2)},$$

where $X$ and $(\varepsilon^{(1)}, \varepsilon^{(2)})$ are independent, and the conditional distribution of $\varepsilon^{(2)}$ given $\varepsilon^{(1)}$ is symmetric. The distribution of $\varepsilon^{(1)}$ need not be symmetric (in particular, the distributions of $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ may be different), and independence between $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ is not necessary. If we define $W = (W^{(1)} + W^{(2)})/2$, $\varepsilon = (\varepsilon^{(1)} + \varepsilon^{(2)})/2$, and $\eta = (W^{(1)} - W^{(2)})/2 = (\varepsilon^{(1)} - \varepsilon^{(2)})/2$, then we have that

$$W = X + \varepsilon, \quad \varepsilon \stackrel{d}{=} \eta,$$

where $\eta$ is observable.

For this panel data setup, Schennach (2004) proposes an alternative estimator of $g$ based on Kotlarski's lemma which does not require the symmetry assumption. The form of Schennach's estimator is more complex than ours, and to the best of our knowledge, there is no existing result on asymptotically valid uniform confidence bands for Schennach's estimator. It is worth noting that while Schennach's approach can drop the symmetry assumption, it requires another technical assumption that the characteristic function $\varphi_X(t) = E[e^{itX}]$ of $X$ does not vanish on the entire real line. Both Schennach (2004) and we (and in fact most papers on deconvolution and EIV regression) assume that the characteristic functions of the error variables do not vanish on $\mathbb{R}$, but our approach does allow $\varphi_X$ to take zeros.
The assumption that $\varphi_X$ does not vanish on $\mathbb{R}$ is not innocuous; it is non-trivial to find densities that are compactly supported and have non-vanishing characteristic functions (though these properties are not mutually exclusive; see, e.g., Schennach (2016), Footnote 4), and the assumption excludes densities convolved with distributions whose characteristic functions take zeros, and so on.^4 So, we believe that Schennach's approach and ours are complementary to each other.

^4 For example, convolutions of $k$ uniform densities on $[a, b]$ are piecewise polynomials with degrees $k - 1$, and convex combinations of such piecewise polynomials form a rich family of densities, but their characteristic functions take zeros.

Example 2.2 (Data combination^5). Suppose that we have access to data on $(Y, W)$ and $(W, X)$, separately, but do not have access to data on $(Y, X)$. This case is often faced by empirical researchers, and various techniques are proposed to combine the two separate samples; see a survey by Ridder and Moffitt (2007). To fix ideas, consider the demand model $Y = g(X) + U$, where $Y$ denotes the quantity purchased of a product and $X$ denotes the logarithm of its price. Marketing scientists and economists often use Nielsen Homescan data for quantities and prices to analyze this demand model, but the home-scanned prices in this data are subject to imputation errors $\varepsilon = W - X$. To overcome this issue, Einav et al. (2010) collect data on $(W, X)$ from a large grocery retailer by matching transaction prices $X$ that were recorded by the retailer (at the store) to the prices $W$ recorded by the Homescan panelists. Together with Nielsen Homescan data on $(Y, W)$, Einav et al. suggest combining the two separate data sets to analyze the demand model. Specifically, we can construct a sample $\{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ from the two separate data on $(Y, W)$ and $(W, X)$.

^5 We thank Tatsushi Oka for pointing out this example.

In the literature, validation data are used as a way to relax the classical measurement error assumption that $X$ and $\varepsilon$ are independent; see, for example, Chen et al. (2005). While they allow for non-classical measurement errors, Chen et al. (2005) focus on the case where the parameter of interest is finite dimensional.

It is worth noting that, when validation data on $(X, W)$ are available, the problem of estimation of $g$ can be considered as a nonparametric instrumental variable (NPIV) problem treating $X$ as an endogenous variable and $W$ as an instrumental variable (see, for example, Newey and Powell, 2003; Hall and Horowitz, 2005; Blundell et al., 2007; Chen and Reiss, 2011; Horowitz, 2011, for NPIV models). In fact, observe that $E[Y \mid W] = E[g(X) \mid W]$. For NPIV models, Horowitz and Lee (2012) and the more recent paper by Chen and Christensen (2015) develop methods to construct confidence bands for the structural function using series methods, although these papers do not formally consider cases where samples on $(Y, W)$ and $(X, W)$ are different.^6 However, we would like to point out that there are differences in underlying assumptions between series estimation of NPIV models and deconvolution kernel estimation in EIV regression. For example, in series estimation of NPIV models, it is often assumed that the distribution of $W$ is compactly supported and the density of $W$ is bounded away from zero on its support (cf. Blundell et al., 2007; Chen and Christensen, 2015). On the other hand, in EIV regression, it is commonly assumed that the characteristic function of the measurement error $\varepsilon$ is non-vanishing on $\mathbb{R}$ (which leads to identification of the function $g$ via (2.3)), and in many cases the measurement error $\varepsilon$ then has unbounded support, which in turn implies that $W$ has unbounded support. Further, while both NPIV and EIV regressions are statistical ill-posed inverse problems, the ways in which the ill-posedness is defined are different; in series estimation of NPIV models, the ill-posedness is defined for given basis functions, while in EIV regression, the ill-posedness is defined via how fast the characteristic function of the measurement error distribution decays. Hence we believe that our inference results cover different situations than those developed in the NPIV literature.

^6 Babii (2016) also develops methods to construct confidence bands for Tikhonov regularized estimators in NPIV models, but his confidence bands are asymptotically conservative in the sense that the coverage probabilities are in general strictly larger than the nominal level even asymptotically.

3. Main results

In this section, we study asymptotic validity of the proposed confidence band (2.6). To this end, we make the following assumption. For any given constants $\beta, B > 0$, let $\Sigma(\beta, B)$ denote a class of functions defined by

$$\Sigma(\beta, B) = \left\{f : \mathbb{R} \to \mathbb{R} : f \text{ is } k\text{-times differentiable},\ |f^{(k)}(x) - f^{(k)}(y)| \le B|x - y|^{\beta - k},\ \forall x, y \in \mathbb{R}\right\},$$

where $k$ is the integer such that $k < \beta \le k + 1$, and $f^{(k)}$ denotes the $k$-th derivative of $f$ ($f^{(0)} = f$). Let $I$ be a compact interval in $\mathbb{R}$.

Assumption 3.1. We assume the following conditions.

(i) $E[Y^4] < \infty$, the function $w \mapsto E[Y^2 \mid W = w]f_W(w)$ is bounded and continuous, and for each $\ell = 1, 2$, the function $w \mapsto E[|Y|^{2+\ell} \mid W = w]f_W(w)$ is bounded.

(ii) The functions $\varphi_X(t) = E[e^{itX}]$ and $\psi_X(t) = E[g(X)e^{itX}]$ for $t \in \mathbb{R}$ are integrable on $\mathbb{R}$.

(iii) The measurement error $\varepsilon$ has finite mean, $E[|\varepsilon|] < \infty$, and its characteristic function, $\varphi_\varepsilon(t) = E[e^{it\varepsilon}]$, $t \in \mathbb{R}$, does not vanish on $\mathbb{R}$. Furthermore, there exist constants $C_1 > 1$ and $\alpha > 0$ such that

$$C_1^{-1}|t|^{-\alpha} \le |\varphi_\varepsilon(t)| \le C_1|t|^{-\alpha}, \quad |\varphi_\varepsilon'(t)| \le C_1|t|^{-\alpha - 1}, \quad \forall |t| \ge 1.$$

(iv) The functions $f_X$ and $gf_X$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$. Let $k$ denote the integer such that $k < \beta \le k + 1$.

(v) Let $K$ be a real-valued integrable function (kernel) on $\mathbb{R}$, not necessarily non-negative, such that $\int_{\mathbb{R}}K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$. Furthermore, $\varphi_K$ is $(k + 3)$-times continuously differentiable with $\varphi_K^{(\ell)}(0) = 0$ for $\ell = 1, \dots, k$.

(vi) For all $x \in I$, $f_X(x) > 0$ and $E[\{Y - g(x)\}^2 \mid W = x]f_W(x) > 0$.

(vii) As $n \to \infty$, the bandwidth $h_n$ and the auxiliary sample size $m = m_n$ satisfy

$$\frac{(\log(1/h_n))^2}{(n \wedge m)h_n^{2\alpha + 2}} \to 0, \quad \frac{nh_n\log(1/h_n)}{m} \to 0, \quad \text{and} \quad h_n^{\alpha + \beta}\sqrt{nh_n\log(1/h_n)} \to 0. \tag{3.1}$$

Condition (i) is a moment condition on $Y$, which we believe is not restrictive.
Note that, for each $l = 0, 1, 2$, if $E[|Y|^{2+l} \mid X, \varepsilon] = E[|Y|^{2+l} \mid X]$, then by comparing the Fourier transforms of both sides, we arrive at the identity
\[
E[|Y|^{2+l} \mid W = w] f_W(w) = ((\Upsilon_l f_X) * f_\varepsilon)(w),
\]
where $\Upsilon_l(x) = E[|Y|^{2+l} \mid X = x]$, and the right hand side is bounded and continuous if $\Upsilon_l f_X$ is bounded (which allows $\Upsilon_l$ to be unbounded globally). For Condition (ii), we first note that $\psi_X$ is the Fourier transform of $g f_X$ (which is integrable by $E[|Y|] < \infty$). Condition (ii) implies that $f_X$

and $g f_X$ are (continuous and) bounded, which in turn implies that $f_W(w) = \int_{\mathbb{R}} f_X(w - x) f_\varepsilon(x)\,dx$ is bounded and continuous. Condition (ii) is satisfied if, for example, $f_X$ and $g f_X$ are twice continuously differentiable with integrable derivatives up to the second order; in fact, under such conditions, $\varphi_X(t) = o(|t|^{-2})$ and $\psi_X(t) = o(|t|^{-2})$ as $|t| \to \infty$. However, differentiability of $f_X$ and $g f_X$ is not strictly necessary for Condition (ii) to hold; for example, a Laplace density is not differentiable but its Fourier transform is integrable.

Condition (iii) is concerned with the characteristic function of the measurement error. Note that finiteness of the first moment of $\varepsilon$ ensures that $\varphi_\varepsilon$ is continuously differentiable. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), we assume that the measurement error density is ordinary smooth, namely, $|\varphi_\varepsilon(t)|$ decays at most polynomially fast as $|t| \to \infty$ (cf. Fan, 1991a). Informally, the smoother $f_\varepsilon$ is, the faster $|\varphi_\varepsilon(t)|$ decays as $|t| \to \infty$, so Condition (iii) restricts smoothness of $f_\varepsilon$. Laplace and Gamma distributions, together with their convolutions, (suitable) mixtures, and symmetrizations$^{7}$, are typical examples of distributions satisfying Condition (iii), but normal and Cauchy distributions do not satisfy Condition (iii). Normal and Cauchy densities are examples of super-smooth densities, i.e., their characteristic functions decay exponentially fast as $|t| \to \infty$.$^{8}$

Condition (iv) is concerned with smoothness of the functions $f_X$ and $g$. Condition (v) is about a kernel function. By changes of variables, Condition (iv) ensures that $\int_{\mathbb{R}} |x|^{k+1} |K(x)|\,dx < \infty$ and $\int_{\mathbb{R}} x^l K(x)\,dx = i^{-l} \varphi_K^{(l)}(0) = 0$ for $l = 1, \dots, k$, that is, $K$ is a $(k + 1)$-th order kernel (but we allow for the possibility that $\int_{\mathbb{R}} x^{k+1} K(x)\,dx = 0$).$^{9}$ Condition (vi) ensures that $\inf_{x \in I} f_X(x) > 0$ (since $f_X$ is continuous) and $\inf_{x \in I} E[\{Y - g(x)\}^2 \mid W = x] f_W(x) > 0$ (see the proof of Lemma A.4-(ii)). Note that since $g f_X$ is bounded, we have that $\|g\|_I \le \|g f_X\|_I / \inf_{x \in I} f_X(x) < \infty$.
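The distinction between ordinary smooth and super-smooth errors in Condition (iii) can be illustrated numerically. The following is a minimal sketch, not part of the paper: the helper names `laplace_cf` and `normal_cf` are hypothetical, and the constant $C_1 = 2$ is chosen only for this illustration. It evaluates $|t|^{\alpha} |\varphi_\varepsilon(t)|$ with $\alpha = 2$ for a centered Laplace law with scale $2^{-1/2}$ and for a standard normal law.

```python
import numpy as np

def laplace_cf(t, scale):
    """Characteristic function of a centered Laplace law: 1 / (1 + scale^2 t^2)."""
    t = np.asarray(t, dtype=float)
    return 1.0 / (1.0 + (scale * t) ** 2)

def normal_cf(t, sd):
    """Characteristic function of a centered normal law: exp(-sd^2 t^2 / 2)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * (sd * t) ** 2)

alpha = 2.0                       # polynomial decay order being checked
t = np.array([1.0, 10.0, 100.0, 1000.0])

# Laplace: |t|^alpha * |cf(t)| equals 2 t^2 / (2 + t^2), which stays within [2/3, 2).
laplace_ratio = t ** alpha * laplace_cf(t, 2.0 ** -0.5)

# Normal: the same product collapses to zero, so no lower bound C_1^{-1} can hold.
normal_ratio = t ** alpha * normal_cf(t, 1.0)

print(laplace_ratio)
print(normal_ratio)
```

The Laplace ratio stays between two positive constants, as the two-sided bound in Condition (iii) requires, whereas the normal ratio vanishes, which is why the normal law is excluded.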
7 Recall that if a random variable $\eta$ has characteristic function $\varphi_\eta$, then $\eta - \eta'$ for an independent copy $\eta'$ of $\eta$ has characteristic function $|\varphi_\eta|^2$.
8 Convolutions of ordinary smooth and super-smooth densities are super-smooth, but mixtures of ordinary smooth and super-smooth densities are ordinary smooth.
9 In the simulation studies, we will use a flat-top kernel (McMurry and Politis, 2004), which is an infinite order kernel.

It is worth mentioning that under these conditions, we have that $s_n^2(x) = \mathrm{Var}(\{Y - g(x)\} K_n((x - W)/h_n)) \sim h_n^{-2\alpha + 1}$ uniformly in $x \in I$ (see Lemma A.4), and the right hand side is larger by factor $h_n^{-2\alpha}$ than the corresponding term in the error-free case (recall that in standard kernel regression without measurement errors, the variance of $U K((x - X)/h_n)$ is $\sim h_n$). This results in slower rates of convergence of kernel regression estimators in presence of measurement errors than those in the error-free case, and the value of $\alpha$ is a key parameter that controls the difficulty of estimating $g$,

namely, the larger the value of $\alpha$ is, the more difficult estimation of $g$ will be. In other words, the value of $\alpha$ quantifies the degree of ill-posedness of estimation of $g$.

Condition (vii) restricts the bandwidth $h_n$ and the sample size $m$ from the measurement error distribution. The second condition in (3.1) allows $m$ to be of smaller order than $n$, which in particular covers the panel data setup discussed in Example 2.1. The last condition in (3.1) means that we are choosing undersmoothing bandwidths, that is, choosing bandwidths that are of smaller order than optimal rates for estimation of $g$. Inspection of the proof of Theorem 3.1 shows that without the last condition in (3.1), we have that
\[
\|\hat g - g\|_I = O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\} + O(h_n^{\beta}),
\]
where the $O(h_n^{\beta})$ term comes from the deterministic bias. So, choosing $h_n \sim (n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ optimizes the rate on the right hand side, and the resulting rate of convergence of $\|\hat g - g\|_I$ is $O_P\{(n/\log n)^{-\beta/(2\alpha + 2\beta + 1)}\}$. The last condition in (3.1) requires choosing $h_n$ of smaller order than $(n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ (by $\log n$ factors), so that the variance term dominates the bias term. We will later discuss the problem of bias after presenting the theorems (see Remark 3.3). For Condition (vii) to be non-void, we require $\beta > 1/2$.

We first state a theorem that establishes that, under Assumption 3.1, the distribution of $\|\hat Z_n\|_I = \sup_{x \in I} |\hat Z_n(x)|$, where $\{\hat Z_n(x) : x \in I\}$ is defined in (2.4), can be approximated by that of the supremum of a certain Gaussian process, which is a building block for proving validity of the proposed confidence band. Recall that a Gaussian process $\{Z(x) : x \in I\}$ indexed by $I$ is a tight random variable in $\ell^\infty(I)$ if and only if $I$ is totally bounded for the intrinsic pseudo-metric $\rho_2(x, y) = \sqrt{E[\{Z(x) - Z(y)\}^2]}$ for $x, y \in I$, and $Z$ has sample paths almost surely uniformly $\rho_2$-continuous; see van der Vaart and Wellner (1996, p.41).

Theorem 3.1 (Gaussian approximation).
Under Assumption 3.1, for each sufficiently large $n$, there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n^{*}$, and such that as $n \to \infty$,
\[
\sup_{z \in \mathbb{R}} \left| P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\} \right| \to 0. \tag{3.2}
\]

Theorem 3.1 derives an intermediate Gaussian approximation to the process $\hat Z_n$, in the sense that the approximating Gaussian process $Z_n^G$ depends on the sample size $n$. It could be possible to further show that, if $I$ is not a singleton, under additional conditions, for some sequences $a_n > 0$ and $b_n \in \mathbb{R}$, $a_n(\|Z_n^G\|_I - b_n)$ converges in distribution to a Gumbel distribution. However, while it is mathematically intriguing, we avoid using the Gumbel approximation, since 1) the Gumbel approximation is slow and the coverage error of the resulting confidence band is of order $1/\log n$ (see Hall, 1991), and 2) deriving the Gumbel approximation would require additional restrictive conditions on the measurement error distribution. For example, in a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) derive a

Gumbel approximation to the supremum deviation of the deconvolution kernel density estimator, thereby establishing a Smirnov-Bickel-Rosenblatt type theorem (Smirnov, 1950; Bickel and Rosenblatt, 1973) for the deconvolution kernel density estimator. But to do so, they require more restrictive conditions on the measurement error distribution than those in the present paper (see their Assumption 2).

The following theorem shows asymptotic validity of the proposed confidence band.

Theorem 3.2 (Validity of multiplier bootstrap confidence band). Under Assumption 3.1, as $n \to \infty$,
\[
\sup_{z \in \mathbb{R}} \left| P\{\|\hat Z_n^{\xi}\|_I \le z \mid \mathcal{D}_n\} - P\{\|Z_n^G\|_I \le z\} \right| \stackrel{P}{\to} 0, \tag{3.3}
\]
where $Z_n^G$ is a Gaussian random variable in $\ell^\infty(I)$ given in Theorem 3.1. Therefore, for the confidence band $\hat C_{1-\tau}$ defined in (2.6), we have as $n \to \infty$,
\[
P\{g(x) \in \hat C_{1-\tau}(x)\ \forall x \in I\} = 1 - \tau + o(1). \tag{3.4}
\]
Finally, the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\}$.

Remark 3.1. Inspection of the proof shows that the result (3.4) holds even when $\tau = \tau_n \downarrow 0$ as $n \to \infty$. Furthermore, the supremum width of the band is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n) \vee \log(1/\tau_n)}\}$.

Remark 3.2. If we take $h_n = v_n (n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ for $v_n \sim (\log n)^{-1}$, then the supremum width of the band $\hat C_{1-\tau}$ is of order $(n/\log n)^{-\beta/(2\alpha + 2\beta + 1)} (\log n)^{\alpha + 1/2}$.

Remark 3.3 (Bias). For any nonparametric inference problem, how to deal with the deterministic bias is a delicate and difficult problem. See Section 5.7 in Wasserman (2006) for related discussions. In the present paper, we employ undersmoothing bandwidths so that the bias is negligible relative to the variance part. An alternative approach is to estimate the bias at each point, and construct a bias-corrected confidence band. See, for example, Eubank and Speckman (1993) and Xia (1998) for the error-free case.$^{10}$ However, in EIV regression, estimation of the bias is not quite attractive for a couple of reasons. First, the bias consists of higher order derivatives of $g$ and $f_X$, and estimation of these higher order derivatives is difficult, especially in the EIV case.
10 More recent discussions regarding the problem of bias in nonparametric inference problems include Hall and Horowitz (2013), Chernozhukov et al. (2014b), Armstrong and Kolesár (2014), Calonico et al. (2015), and Schennach (2015). These papers do not cover EIV regression.

This is because estimation of $g$ and $f_X$ is an ill-posed inverse problem and rates of convergence of the derivative estimators of $g$ and $f_X$ are even slower than those in the error-free case. Second, one of the popular kernels used in EIV regression and deconvolution is a flat-top kernel (McMurry and Politis, 2004), which is an infinite order kernel, and if we use a flat-top kernel, then the bias

is not calculated in a closed form.$^{11}$ See Remark 1 in Bissantz et al. (2007) for a related issue in the deconvolution case.

Remark 3.4 (Super-smooth case). In the present paper, we focus on the case where the measurement error density is ordinary smooth, similarly to Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), who study inference in deconvolution and nonparametric EIV regression. If the measurement error density is super-smooth, i.e., its characteristic function decays exponentially fast as $|t| \to \infty$, then 1) in view of the pointwise asymptotic normality result in Fan and Masry (1992), the asymptotic behavior of the variance function $s_n^2(x)$ is much more complex; 2) minimax rates of convergence for estimation of $g$ under the sup-norm loss are logarithmically slow (i.e., of the form $(\log n)^{-c}$ for some constant $c > 0$), even when the measurement error distribution is assumed to be known (Fan and Truong, 1993). These difficulties prevent us from directly extending our analysis to the super-smooth case. Hence the super-smooth case is left for future research.

The proofs of Theorems 3.1 and 3.2 build on non-trivial applications of the intermediate Gaussian and multiplier bootstrap approximation theorems developed in Chernozhukov et al. (2014a,b, 2016). However, we stress that Theorems 3.1 and 3.2 do not follow directly from the general theorems in Chernozhukov et al. (2014a,b, 2016) and require substantial work. This is because 1) first of all, how to devise a multiplier bootstrap in EIV regression is not apparent, and as discussed in Remark 2.1 our construction of the multiplier process appears to be novel; 2) the population deconvolution kernel $K_n$ is implicitly defined via the Fourier inversion and substantially different from standard kernels in the error-free case; and 3) the deconvolution kernel $K_n$ is in fact unknown and estimated, so that its estimation error has to be taken into account. An alternative standard technique to derive Gaussian approximations similar to (3.2) is to apply the Komlós-Major-Tusnády (KMT) strong approximation (Komlós et al., 1975).
In a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) (and Schmidt-Hieber et al. (2013)) use the KMT approximation to derive Gaussian approximations to the deconvolution kernel density estimator. However, the KMT approximation is tailored to empirical processes indexed by univariate functions and hence is not applicable to our problem. Alternatively, we can use Rio's coupling (see Rio, 1994), but to apply Rio's coupling, we would have to assume (at least) that $Y$ is bounded (rather than finite fourth moment) and $K_n$ has total variation of order $h_n^{-\alpha}$ (which requires additional conditions on the measurement error distribution). By employing the techniques developed in Chernozhukov et al. (2014a,b, 2016), we are able to avoid such restrictive conditions.

11 For example, Schennach (2004) and Bissantz et al. (2007) use flat-top kernels in their simulation studies.
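The conditional quantile appearing in Theorem 3.2 is, in practice, approximated by simulating Gaussian multipliers. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `multiplier_sup_quantile` is hypothetical, the array `summands` stands in for the values $\{Y_j - \hat g(x)\} \hat K_n((x - W_j)/h_n)$ evaluated on a finite grid of points $x$, and `scale` stands in for $\hat s_n(x)$ on the same grid. Replacing the supremum over $I$ by a maximum over a grid is itself an approximation.

```python
import numpy as np

def multiplier_sup_quantile(summands, scale, level, n_boot=1000, seed=0):
    """Monte Carlo quantile of max_x |n^{-1/2} sum_j xi_j * summands[j, x] / scale[x]|.

    summands : (n, G) array, one row per observation, one column per grid point.
    scale    : (G,) array of positive normalizing constants.
    level    : quantile level in (0, 1), e.g. 0.95.
    """
    rng = np.random.default_rng(seed)
    n = summands.shape[0]
    standardized = summands / scale              # broadcast over grid points
    xi = rng.standard_normal((n_boot, n))        # multipliers, independent of the data
    draws = xi @ standardized / np.sqrt(n)       # (n_boot, G) multiplier process draws
    sup_stats = np.abs(draws).max(axis=1)        # sup over the grid, per bootstrap draw
    return np.quantile(sup_stats, level)

# Toy check with a single grid point: the multiplier statistic is then exactly
# standard normal given the data, so the 0.95 quantile should be near 1.96.
toy_rng = np.random.default_rng(1)
toy = toy_rng.standard_normal((200, 1))
toy_scale = np.sqrt((toy ** 2).mean(axis=0))
q95 = multiplier_sup_quantile(toy, toy_scale, 0.95, n_boot=20000)
print(q95)
```

With more than one grid point the quantile grows with the number of effectively independent points, which is the source of the $\sqrt{\log(1/h_n)}$ factor in the width of the band.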

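4. Bandwidth selection

The theory developed in the previous section prescribes admissible rates for the bandwidth $h_n$ that require undersmoothing. The literature provides data-driven approaches to bandwidth selection, which typically aim at minimizing the MISE (cf. Delaigle and Hall, 2008). These data-driven approaches tend to yield non-undersmoothing rates for the bandwidth, and are contrary to our requirements. In this light, we propose here a novel alternative approach to bandwidth selection. To emphasize the dependence on an arbitrary candidate bandwidth $h > 0$, write
\[
s_n^2(x; h) = \mathrm{Var}(\{Y - g(x)\} K_n((x - W)/h; h)), \quad A_n(x; h) = E[\{Y - g(x)\} K_n((x - W)/h; h)], \quad \text{and} \quad K_n(x; h) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\varphi_\varepsilon(t/h)}\,dt.
\]
Note that $A_n(x) = A_n(x; h_n)$, $s_n^2(x) = s_n^2(x; h_n)$, and $K_n(x) = K_n(x; h_n)$. An optimal choice $h_n^{*}$ (ignoring the log factor) balances the uniform squared bias $\|A_n^2(\cdot; h)\|_I$ and the uniform variance $\|s_n^2(\cdot; h)/n\|_I$, i.e.,
\[
\frac{\partial}{\partial h} \|A_n^2(\cdot; h)\|_I \Big|_{h = h_n^{*}} = -\frac{\partial}{\partial h} \|s_n^2(\cdot; h)/n\|_I \Big|_{h = h_n^{*}}.
\]
A natural way of undersmoothing is to choose the smallest $h > 0$ such that
\[
c_n \frac{\partial}{\partial h} \|A_n^2(\cdot; h)\|_I \ge -\frac{\partial}{\partial h} \|s_n^2(\cdot; h)/n\|_I
\]
for some $c_n > 1$ where $c_n$ is increasing in $n$. We will try alternative sequences $\{c_n\}_{n=1}^{\infty}$ in the subsequent simulation studies to recommend practical choices.

In practice, we do not know $g$ or the distribution of $(Y, X)$. For $g$ to be used for bandwidth selection, we use a polynomial regression $\tilde g$ under EIV, e.g., $\tilde g(x) = \tilde g_0 + \tilde g_1 x$ where
\[
\begin{pmatrix} \tilde g_0 \\ \tilde g_1 \end{pmatrix} = \begin{pmatrix} 1 & \frac{1}{n} \sum_{j=1}^{n} W_j \\ \frac{1}{n} \sum_{j=1}^{n} W_j & \frac{1}{n} \sum_{j=1}^{n} W_j^2 - \frac{1}{m} \sum_{j=1}^{m} \eta_j^2 \end{pmatrix}^{-1} \begin{pmatrix} \frac{1}{n} \sum_{j=1}^{n} Y_j \\ \frac{1}{n} \sum_{j=1}^{n} W_j Y_j \end{pmatrix}.
\]
A polynomial of degree three will be employed throughout in the simulation studies. We make a grid $0 < h_{n,1} < \cdots < h_{n,J}$ of candidate bandwidths, and then choose $h_{n,j}$ with the smallest $j \in \{2, \dots, J\}$ such that
\[
c_n \left( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \right) \ge -\left( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \right),
\]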

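where
\[
\hat s_n^2(x; h) = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - \tilde g(x)\}^2 \hat K_n^2((x - W_j)/h; h) - \hat A_n^2(x; h), \quad \hat A_n(x; h) = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - \tilde g(x)\} \hat K_n((x - W_j)/h; h), \quad \text{and} \quad \hat K_n(x; h) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h)}\,dt.
\]
Because we use the finite sample estimates, neither $\left( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \right)$ nor $-\left( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \right)$ needs to be monotone in the index $j$ in general. As such, we monotonize these differences of the estimates in the following manner. Let
\[
\Delta \hat A_{n,j} = \left( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \right) \quad \text{and} \quad \Delta \hat s_{n,j}^2 = -\left( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \right).
\]
The monotonization algorithm executes the following assignments in the increasing order of $j$:
\[
\Delta \hat A_{n,j+1} := \begin{cases} \Delta \hat A_{n,j} & \text{if } \Delta \hat A_{n,j} > \Delta \hat A_{n,j+1} \\ \Delta \hat A_{n,j+1} & \text{if } \Delta \hat A_{n,j} \le \Delta \hat A_{n,j+1} \end{cases} \quad \text{and} \quad \Delta \hat s_{n,j+1}^2 := \begin{cases} \Delta \hat s_{n,j}^2 & \text{if } \Delta \hat s_{n,j}^2 < \Delta \hat s_{n,j+1}^2 \\ \Delta \hat s_{n,j+1}^2 & \text{if } \Delta \hat s_{n,j}^2 \ge \Delta \hat s_{n,j+1}^2. \end{cases}
\]

Remark 4.1. The above guide to bandwidth selection applies to the case of $\alpha > 1/2$. We could accommodate the case of $\alpha \le 1/2$ if we modify this method by replacing $s_n^2(x; h)$, $A_n(x; h)$, $\hat s_n^2(x; h)$ and $\hat A_n(x; h)$ by $s_n^2(x; h)/h^2$, $A_n(x; h)/h$, $\hat s_n^2(x; h)/h^2$ and $\hat A_n(x; h)/h$, respectively. We implemented simulation studies under both of these two alternative methods of bandwidth selection, and found that the method described above shows superior performance in terms of the distance between nominal and simulated coverage probabilities for the data generating models that we consider. Therefore, we only suggest the method which we describe above, and present simulation studies below only for this version of the bandwidth selection rule.

5. Simulation studies

5.1. Simulation Framework. We consider two data generating models, reflecting two common patterns of data availability. For the first model, the data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^{n}$ is constructed by
\[
\text{Model 1:} \quad \begin{cases} Y_j = g(X_j) + U_j, & X_j \sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1), \\ W_j = X_j + \varepsilon_j, & \varepsilon_j \stackrel{d}{=} \eta_j \sim \mathrm{Laplace}(0, 2^{-1/2}), \end{cases}
\]
for $j = 1, \dots, n$, where the primitive latent variables, $X_j$, $U_j$, $\varepsilon_j$, and $\eta_j$ are mutually independent. The characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/2)^{-1}$, which is non-vanishing on $\mathbb{R}$ and ordinary smooth of order $\alpha = 2$. The signal-to-noise ratio is $\sqrt{\mathrm{Var}(X)/\mathrm{Var}(\varepsilon)} = \sigma_X$.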
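The monotonization step and the grid-based selection rule of Section 4 can be sketched as follows. This is an illustrative sketch only: the function names `monotonize` and `select_bandwidth` are hypothetical, the inputs are assumed to be precomputed sup-norms $\|\hat A_n^2(\cdot; h_{n,j})\|_I$ and $\|\hat s_n^2(\cdot; h_{n,j})/n\|_I$ on the grid, the variance differences are taken with a minus sign so that they are positive when the variance norm decreases in $h$, and the fallback to the largest grid point when no index qualifies is an assumption of this sketch rather than something stated in the text.

```python
def monotonize(delta_bias, delta_var):
    """Running maximum for bias increments, running minimum for variance decrements."""
    bias, var = list(delta_bias), list(delta_var)
    for j in range(len(bias) - 1):
        if bias[j] > bias[j + 1]:
            bias[j + 1] = bias[j]
        if var[j] < var[j + 1]:
            var[j + 1] = var[j]
    return bias, var

def select_bandwidth(grid, sq_bias_norm, var_norm, c_n):
    """Smallest grid bandwidth h_j (j >= 2) with c_n * bias increment >= variance decrement.

    grid         : increasing candidate bandwidths h_1 < ... < h_J.
    sq_bias_norm : sup-norm of the squared bias estimate at each grid point.
    var_norm     : sup-norm of the variance estimate divided by n at each grid point.
    """
    delta_bias = [sq_bias_norm[j] - sq_bias_norm[j - 1] for j in range(1, len(grid))]
    delta_var = [-(var_norm[j] - var_norm[j - 1]) for j in range(1, len(grid))]
    delta_bias, delta_var = monotonize(delta_bias, delta_var)
    for idx, (d_bias, d_var) in enumerate(zip(delta_bias, delta_var)):
        if c_n * d_bias >= d_var:
            return grid[idx + 1]
    return grid[-1]  # assumption of this sketch: fall back to the largest candidate

# Stylized inputs: squared bias grows like h^4, variance shrinks like 1/h.
grid = [0.1, 0.2, 0.3, 0.4, 0.5]
sq_bias = [h ** 4 for h in grid]
variance = [0.001 / h for h in grid]
print(select_bandwidth(grid, sq_bias, variance, c_n=1.0))
print(select_bandwidth(grid, sq_bias, variance, c_n=4.0))
```

In this stylized example a larger $c_n$ selects a smaller bandwidth, which is the intended undersmoothing effect.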

For the second model, we consider the following repeated measurement or panel data setup.
\[
\text{Model 2:} \quad \begin{cases} Y_j = g(X_j) + U_j, & X_j \sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1), \\ W_j^{(1)} = X_j + \varepsilon_j^{(1)}, & \varepsilon_j^{(1)} \sim \mathrm{Laplace}(0, 2^{-1}), \\ W_j^{(2)} = X_j + \varepsilon_j^{(2)}, & \varepsilon_j^{(2)} \sim \mathrm{Laplace}(0, 2^{-1}), \end{cases}
\]
for $j = 1, \dots, n$, where the primitive latent variables, $X_j$, $U_j$, $\varepsilon_j^{(1)}$, and $\varepsilon_j^{(2)}$ are mutually independent. We observe $\{(Y_j, W_j^{(1)}, W_j^{(2)})\}_{j=1}^{n}$. By defining $W_j := (W_j^{(1)} + W_j^{(2)})/2$ and $\eta_j := (W_j^{(1)} - W_j^{(2)})/2$, we obtain the generated data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^{n}$ such that $W_j = X_j + \varepsilon_j$ with $\varepsilon_j = (\varepsilon_j^{(1)} + \varepsilon_j^{(2)})/2 \stackrel{d}{=} \eta_j$. For Model 2, the characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/16)^{-2}$, which is non-vanishing on $\mathbb{R}$ and ordinary smooth with order $\alpha = 4$. The signal-to-noise ratio is given by $\sqrt{\mathrm{Var}(X)/\mathrm{Var}(\varepsilon)} = 2\sigma_X$.

Simulations are run across five different specifications of $g$, and alternative values of the signal-to-noise ratio $\sigma_X \in \{2, 4\}$. The five specifications of $g$ are $g(x) = x$, $g(x) = x^2$, $g(x) = x^3$, $g(x) = \sin(x)$, and $g(x) = \cos(x)$. We use Monte Carlo simulations to evaluate the coverage probabilities of our confidence bands for $g$ on the interval $I = [-\sigma_X, \sigma_X]$. We use the kernel function $K$ defined by its Fourier transform $\varphi_K$ given by
\[
\varphi_K(t) = \begin{cases} 1 & \text{if } |t| \le c \\ \exp\left\{ \dfrac{-b \exp(-b/(|t| - c)^2)}{(|t| - 1)^2} \right\} & \text{if } c < |t| < 1 \\ 0 & \text{if } 1 \le |t| \end{cases}
\]
where $b = 1$ and $c = 0.05$ (cf. McMurry and Politis, 2004; Bissantz et al., 2007). The function $\varphi_K$ is infinitely differentiable with support $[-1, 1]$, and its inverse Fourier transform $K$ is real-valued and integrable with $\int_{\mathbb{R}} K(x)\,dx = 1$. We follow the bandwidth selection rule discussed in Section 4. In this simulation study, we try alternative sequences $\{c_n\}_{n=1}^{\infty}$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$.

5.2. Simulation Results. Tables 1, 2, 3, 4, and 5 show simulation results for $g(x) = x$, $x^2$, $x^3$, $\sin(x)$, and $\cos(x)$, respectively. Each table contains results for each of Model 1 and Model 2, for each of the three sample sizes $n = 250$, $500$, and $1000$, and for each of $\sigma_X = 2.0$ and $4.0$ that controls the signal-to-noise ratio.
Simulated coverage probabilities are reported for each of the three nominal coverage probabilities, 0.800, 0.900, and 0.950. In all the cases, simulated coverage probabilities are reasonably close to the designed nominal coverage probabilities for large sample sizes. In particular, the results for polynomial specifications exhibit a very high coverage accuracy. The high performance for the polynomial specifications may well be attributed to our method of bandwidth selection, which relies on a preliminary polynomial regression under EIV. However, it is notable that the coverage accuracy is reasonably high even for non-polynomial periodic functions like $g(x) = \sin(x)$ and $g(x) = \cos(x)$.
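The preliminary regression under EIV mentioned above can be illustrated on data drawn from Model 1. The sketch below is a hedged illustration, not the authors' code: the function names `simulate_model1` and `eiv_linear_fit` are hypothetical, only the linear case $\tilde g(x) = \tilde g_0 + \tilde g_1 x$ of Section 4 is implemented (the paper uses degree three in its simulations), and the sample size and seed are arbitrary choices.

```python
import numpy as np

def simulate_model1(n, sigma_x, g, rng):
    """Draw (Y_j, W_j, eta_j), j = 1..n, following Model 1 of Section 5.1."""
    x = rng.normal(0.0, sigma_x, n)           # latent predictor
    u = rng.normal(0.0, 1.0, n)               # regression error
    eps = rng.laplace(0.0, 2.0 ** -0.5, n)    # measurement error, variance 1
    eta = rng.laplace(0.0, 2.0 ** -0.5, n)    # auxiliary sample from the error law
    return g(x) + u, x + eps, eta

def eiv_linear_fit(y, w, eta):
    """Moment-corrected linear fit: subtract the error second moment from mean(W^2)."""
    moment_matrix = np.array([
        [1.0, w.mean()],
        [w.mean(), np.mean(w ** 2) - np.mean(eta ** 2)],
    ])
    moment_vector = np.array([y.mean(), np.mean(w * y)])
    return np.linalg.solve(moment_matrix, moment_vector)

rng = np.random.default_rng(0)
y, w, eta = simulate_model1(20000, 2.0, lambda x: x, rng)
intercept, slope = eiv_linear_fit(y, w, eta)
naive_slope = np.cov(w, y)[0, 1] / np.var(w, ddof=1)  # ignores measurement error
print(intercept, slope, naive_slope)
```

With $g(x) = x$ and $\sigma_X = 2$, the naive least squares slope is attenuated toward $\mathrm{Var}(X)/\{\mathrm{Var}(X) + \mathrm{Var}(\varepsilon)\} = 0.8$, while the corrected slope should be close to one.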

There seems to be no systematic pattern as to which of the alternative sequences $\{c_n\}_{n=1}^{\infty}$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$ tends to yield better coverage results. As such, we recommend the intermediate choice $c_n = (n/100)^{0.3}$ as a practical guideline.

6. Real data analysis

According to the Centers for Disease Control and Prevention (CDC) of the US Department of Health and Human Services, more than one-third (36.5%) of US adults had obesity (defined by body mass index or BMI > 30) in the period between 2011 and 2014 (Ogden et al., 2015). The estimated annual medical cost of obesity in the United States was 147 billion in 2008 U.S. dollars, with the medical costs for people who are obese being $1,429 higher than those of normal weight (Finkelstein et al., 2009). While there is an extensive body of literature on cost estimation of obesity, it is a limitation that commonly used data sets contain only self-reported body measures, and hence the values of BMI generated from them are prone to biases (Bound, et al., 2001). More recently, Cawley and Meyerhoefer (2012) use the instrumental variable approach to address this issue in cost estimation of obesity. In this section, we employ our data combination approach to treat the self-reporting errors, and draw confidence bands for nonparametric regressions of medical costs on BMI. We focus on costs measured by medical expenditures. With this said, we note that there are also indirect costs of obesity which we do not account for, e.g., the costs of obesity are known to be passed on to obese workers with employer-sponsored health insurance in the form of lower cash wages and labor market discrimination against obese job seekers by insurance-providing employers (Bhattacharya and Bundorf, 2009); see also Cawley (2004).

Details of the two data sets which we combine are as follows. The National Health and Nutrition Examination Survey (NHANES) of CDC contains data of survey responses, medical examination results, and laboratory test results. The survey responses include demographic characteristics, such as gender and age.
In addition to the demographic characteristics, the survey responses also contain self-reported body measures and self-reported health conditions. Among the self-reported body measures are height in inches and weight in pounds. These two variables allow us to construct the BMI in lbs/in$^2$ as a generated variable. We convert this unit into the metric unit (kg/m$^2$). The NHANES also contains medical examination results, including clinically measured BMI in kg/m$^2$. We treat the BMI constructed from the self-reported body measures as $W_j$, and the clinically measured BMI as $X_j$. From the NHANES as a validation data set of size $m$, we can compute $\eta_j = W_j - X_j$ for each $j = 1, \dots, m$.

The Panel Study of Income Dynamics (PSID) is a longitudinal panel survey of American families conducted by the Survey Research Center at the University of Michigan. This data set contains a long list of variables including demographic characteristics, socio-economic attributes, expenses, and health conditions, among others. In particular, the PSID contains self-reported body measures of the household head, including height in inches and weight in pounds. These

two variables allow us to construct the body mass index (BMI) in lbs/in$^2$ as a generated variable. Again, we convert this unit into the metric unit (kg/m$^2$). The PSID also contains medical and prescription expenses. We treat the BMI constructed from the self-reported body measures as $W_j$, and the medical and prescription expenses as $Y_j$. We note that the information contained in the PSID is mostly at the household level, as opposed to the individual level, and thus $Y_j$ indicates the total medical and prescription expenses of household $j$. To focus on the individual medical and prescription expenses rather than household expenses, we only consider the subsample of the households of single men with no dependent family, for which the total medical and prescription expenses of the household equal the individual medical and prescription expenses of the household head. Hence, the reported regression results concern these selected subpopulations.

Combining the NHANES of size $m$ and the PSID of size $n$, we obtain the generated data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ to which we can apply our method in order to draw confidence bands for the regression function $g$ of the model $Y = g(X) + U$ with $E[U \mid X, \varepsilon] = 0$. We set $I = [15, 35]$ as the interval on which we draw confidence bands. This interval $I$ has 25 (the WHO cut-off point for overweight) as the midpoint, and is contained in the convex hull of the empirical support of $W$. The kernel function and the bandwidth rule carry over from our simulation studies. The sequence $\{c_n\}_{n=1}^{\infty}$ used for bandwidth choice is defined by $c_n = (n/100)^{0.3}$ following the recommendation which we made from our simulation results.

To account for the different medical conditions across ages, we categorize the sample into the following subsamples: (a) male individuals aged 20-34, (b) male individuals aged 35-49, (c) male individuals aged 50-64, and (d) male individuals aged 65 or above.
Note that this stratification takes into account the fact that ages 64 and 65 mark the cutoff of Medicare eligibility, and hence that group (d) faces different expenditure schedules and different economic incentives of health care utilization from groups (a)-(c); see Card et al. (2008). After deleting observations with missing fields from the NHANES, we obtain the following sample sizes of these four subsamples: (a) $m = 407$, (b) $m = 435$, (c) $m = 407$, and (d) $m = 431$. After deleting observations with missing fields from the PSID 2009 for total medical expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 413$, (b) $n = 181$, (c) $n = 180$, and (d) $n = 64$. Similarly, after deleting observations with missing fields from the PSID 2009 for prescription expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 528$, (b) $n = 243$, (c) $n = 247$, and (d) $n = 106$. Note that we use similar survey periods around 2009 for both the NHANES and PSID to remove potential time effects.

Figure 1 displays estimates and confidence bands for total medical expenses in 2009 US dollars as the dependent variable. Figure 2 similarly displays estimates and confidence bands for prescription expenses in 2009 US dollars as the dependent variable. In both figures, the estimates

are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands. The four parts of each figure represent (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. We see that the levels of both total medical expenses and prescription expenses tend to increase in age, as expected. For the groups (a)-(b) of young men, both total medical expenses and prescription expenses exhibit little partial correlation with BMI. For the group (c) of middle aged men, on the other hand, the relations turn into positive ones. For the group (d) of senior men, total medical expenses and BMI continue to have a positive relationship, but prescription expenses exhibit little partial correlation with BMI. If we look at the 90% confidence band for the group (c) of men aged from 50 to 64, annual average total medical expenses are approximately $5,399-$17,015 if BMI = 20, approximately $7,316-$18,119 if BMI = 25, and approximately $7,868-$21,934 if BMI = 30. Likewise, annual average prescription expenses are approximately $283-$636 if BMI = 20, approximately $372-$761 if BMI = 25, and approximately $429-$951 if BMI = 30. These concrete numbers illustrate that confidence bands are useful for making interval predictions of incurred average costs, and this convenient feature adds practical value to the existing methods, which only allow for reporting estimates with unknown extents of uncertainty.

7. Extensions

7.1. Application to specification testing. The results of the present paper can be used for specification testing of the regression function $g$. Specification testing in EIV models is important since nonparametric estimation of a regression function has slow rates of convergence, even slower than standard error-free nonparametric regression, while correct specification of a parametric model enables us to estimate the regression function with faster rates, often of order $1/\sqrt{n}$.
Suppose that we want to test whether the regression function $g$ belongs to a parametric class $\{g_\theta : \theta \in \Theta\}$ where $\Theta$ is a subset of a metric space (in most cases a Euclidean space). Popular specifications of $g$ include linear and polynomial functions. In cases where $g$ is linear or polynomial, it is possible to estimate the coefficients with $\sqrt{n}$-rate under suitable regularity conditions (Fuller, 1987; Chan and Mak, 1985; Hausman et al., 1991; Cheng and Schneeweiss, 1998). Suppose now that $g = g_\theta$ for some $\theta \in \Theta$ and $\theta$ can be estimated by $\hat\theta$ with a sufficiently fast rate, i.e., $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha} \{n h_n \log(1/h_n)\}^{-1/2}\}$, and that Assumption 3.1 is satisfied with $g = g_\theta$. Then it is not difficult to see from the proof of Theorem 3.2 that
\[
\frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_{\hat\theta}(x))}{\hat s_n(x)} = \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_\theta(x))}{\hat s_n(x)} + \frac{\hat f_X(x) \sqrt{n} h_n (g_\theta(x) - g_{\hat\theta}(x))}{\hat s_n(x)} = \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_\theta(x))}{\hat s_n(x)} + o_P\{(\log(1/h_n))^{-1/2}\},
\]
uniformly in $x \in I$, so that
\[
P\{g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x) \text{ for some } x \in I\} \to \tau.
\]

Therefore, the test that rejects the hypothesis that $g = g_\theta$ for some $\theta \in \Theta$ if $g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x)$ for some $x \in I$ is asymptotically of level $\tau$. We summarize the above discussion as a corollary.

Corollary 7.1. Suppose that $g = g_\theta$ for some $\theta \in \Theta$ where $\Theta$ is a subset of a metric space, and that Assumption 3.1 is satisfied with $g = g_\theta$. Let $\hat\theta$ be any estimator of $\theta$ such that $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha} \{n h_n \log(1/h_n)\}^{-1/2}\}$; then $P\{g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x) \text{ for some } x \in I\} \to \tau$.

Remark 7.1 (Literature on specification testing in EIV regression). The literature on specification testing for EIV regression is large. See Zhu et al. (2003), Zhu and Cui (2005), Hall and Ma (2007), Song (2008), Otsu and Taylor (2016), and references therein. However, none of those papers considers $L^\infty$-based specification tests.

7.2. Additional regressors without measurement errors. In practical applications, we may have additional regressors $Z$, possibly vector valued, without measurement errors. Suppose that we are interested in estimation and making inference on $g(x, z) = E[Y \mid X = x, Z = z]$. We assume that $E[Y - g(X, Z) \mid X, Z, \varepsilon] = 0$, and $\varepsilon$ is independent from $X$ conditionally on $Z$. In principle, the analysis can be reduced to the case where there are no additional regressors by conditioning on $Z = z$. If $Z$ is discretely distributed with finitely many mass points, then $g(x, z)$, where $z$ is a mass point, can be estimated by using only observations $j$ for which $Z_j = z$. If $Z$ is continuously distributed, then $g(x, z)$ can be estimated by using observations $j$ for which $Z_j$ is close to $z$, which can be implemented by using kernel weights. However, the detailed analysis of this case is not presented here for brevity.

7.3. Confidence bands for conditional distribution functions. The techniques used to derive confidence bands for the conditional mean in EIV regression can be extended to the conditional distribution function.
Suppose now that we are interested in constructing confidence bands for the conditional distribution function $g(y, x) = P(Y \le y \mid X = x)$ on a compact rectangle $J \times I$ where $J$ and $I$ are compact intervals, and where we do not observe $X$ but instead observe $W = X + \varepsilon$ with $\varepsilon$ (measurement error) being independent of $(Y, X)$. As before, we assume that in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ on $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution. Since $g(y, x) = E[1(Y \le y) \mid X = x]$ where $1(\cdot)$ denotes the indicator function, we may estimate $g(y, x)$ by $\hat g(y, x) = \hat\mu(y, x)/\hat f_X(x)$, where
\[
\hat\mu(y, x) = \frac{1}{n h_n} \sum_{j=1}^{n} 1(Y_j \le y) \hat K_n((x - W_j)/h_n).
\]
To construct a confidence band for $g(y, x)$, we apply the methodology developed in Section 2 with $Y_j$ replaced by $1(Y_j \le y)$ for each $y$. Let
\[
\hat s_n^2(y, x) = \frac{1}{n} \sum_{j=1}^{n} \{1(Y_j \le y) - \hat g(y, x)\}^2 \hat K_n^2((x - W_j)/h_n),
\]

and generate independent standard normal random variables $\xi_1, \dots, \xi_n$ independent of the data $\mathcal{D}_n$. Consider the multiplier stochastic process
\[
\hat Z_n^{\xi}(y, x) = \frac{1}{\hat s_n(y, x) \sqrt{n}} \sum_{j=1}^{n} \xi_j \{1(Y_j \le y) - \hat g(y, x)\} \hat K_n((x - W_j)/h_n),
\]
and for $\tau \in (0, 1)$, let
\[
\hat c_n(1 - \tau) = \text{conditional } (1 - \tau)\text{-quantile of } \|\hat Z_n^{\xi}\|_{J \times I} \text{ given } \mathcal{D}_n.
\]
Then the resulting confidence band for $g(y, x)$ on $J \times I$ is given by
\[
\hat C_{1-\tau}(y, x) = \left[ \hat g(y, x) \pm \frac{\hat s_n(y, x)}{\hat f_X(x) \sqrt{n} h_n} \hat c_n(1 - \tau) \right], \quad (y, x) \in J \times I.
\]
We make the following assumption, which is analogous to Assumption 3.1.

Assumption 7.1. Let $I$, $J$ be compact intervals in $\mathbb{R}$.
(i) The function $(y, w) \mapsto P(Y \le y \mid W = w) f_W(w)$ is continuous in $w$ uniformly in $y \in J$.
(ii) The characteristic function of $X$, $\varphi_X(t) = E[e^{itX}]$, $t \in \mathbb{R}$, is integrable on $\mathbb{R}$. Furthermore, $\sup_{y \in J} \int_{\mathbb{R}} |E[g(y, X) e^{itX}]|\,dt < \infty$.
(iii) Condition (iii) in Assumption 3.1.
(iv) The functions $f_X$ and $g(y, \cdot) f_X(\cdot)$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$ for all $y \in J$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Condition (v) in Assumption 3.1.
(vi) For all $x \in I$, $f_X(x) > 0$, and $\inf_{(y, x) \in J \times I} E[\{1(Y \le y) - g(y, x)\}^2 \mid W = x] f_W(x) > 0$.
(vii) Condition (vii) in Assumption 3.1.

Theorem 7.1. Under Assumption 7.1, as $n \to \infty$, $P\{g(y, x) \in \hat C_{1-\tau}(y, x)\ \forall (y, x) \in J \times I\} \to 1 - \tau$. Furthermore, the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\}$.

Remark 7.2. To the best of our knowledge, Theorem 7.1 is also a new result.

8. Conclusion

In this paper, we develop a method to construct uniform confidence bands for a nonparametric EIV regression function $g$. We consider the practically relevant case where the distribution of the measurement error is unknown. We assume that there is an independent sample from the measurement error distribution, where the sample from the measurement error distribution need not be independent from the sample on response and predictor variables. Such a sample from the measurement error distribution is available if there is, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed.
We establish asymptotic validity of the proposed confidence band for ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric EIV regression. We also propose a practical

method to choose an undersmoothing bandwidth for valid inference. Simulation studies verify the finite sample performance of the proposed confidence band. Finally, we discuss extensions of our results to specification testing, cases with additional regressors without measurement errors, and confidence bands for conditional distribution functions.

Appendix A. Proofs

A.1. Technical tools. In this section, we collect technical tools that will be used in the proofs of Theorems 3.1 and 3.2. The proofs rely on modern empirical process theory. For a probability measure $Q$ on a measurable space $(S,\mathcal{S})$ and a class of measurable functions $\mathcal{F}$ on $S$ such that $\mathcal{F} \subset L^2(Q)$, let $N(\mathcal{F}, \|\cdot\|_{Q,2}, \delta)$ denote the $\delta$-covering number for $\mathcal{F}$ with respect to the $L^2(Q)$-seminorm $\|\cdot\|_{Q,2}$. The class $\mathcal{F}$ is said to be pointwise measurable if there exists a countable subclass $\mathcal{G} \subset \mathcal{F}$ such that for every $f \in \mathcal{F}$ there exists a sequence $g_m \in \mathcal{G}$ with $g_m \to f$ pointwise. A function $F : S \to [0,\infty)$ is said to be an envelope for $\mathcal{F}$ if $F(x) \ge \sup_{f\in\mathcal{F}} |f(x)|$ for all $x \in S$. See Section 2.1 in van der Vaart and Wellner (1996) for details.

Lemma A.1 (A useful maximal inequality). Let $X, X_1,\dots,X_n$ be i.i.d. random variables taking values in a measurable space $(S,\mathcal{S})$, and let $\mathcal{F}$ be a pointwise measurable class of (measurable) real-valued functions on $S$ with measurable envelope $F$. Suppose that there exist constants $A \ge e$ and $V \ge 1$ such that
\[
\sup_Q N(\mathcal{F}, \|\cdot\|_{Q,2}, \delta\|F\|_{Q,2}) \le (A/\delta)^V, \quad 0 < \delta \le 1,
\]
where $\sup_Q$ is taken over all finitely discrete distributions on $S$. Furthermore, suppose that $0 < E[F^2(X)] < \infty$, and let $\sigma^2 > 0$ be any positive constant such that $\sup_{f\in\mathcal{F}} E[f^2(X)] \le \sigma^2 \le E[F^2(X)]$. Define $B = \sqrt{E[\max_{1\le j\le n} F^2(X_j)]}$. Then
\[
E\left[\left\| \frac{1}{\sqrt{n}}\sum_{j=1}^{n} \{f(X_j) - E[f(X)]\} \right\|_{\mathcal{F}}\right]
\le C\left[ \sqrt{V\sigma^2 \log\left(\frac{A\sqrt{E[F^2(X)]}}{\sigma}\right)} + \frac{VB}{\sqrt{n}} \log\left(\frac{A\sqrt{E[F^2(X)]}}{\sigma}\right) \right],
\]
where $C > 0$ is a universal constant.

Proof. See Corollary 5.1 in Chernozhukov et al. (2014a).

Lemma A.2 (An auxiliary maximal inequality). Let $\zeta_1,\dots,\zeta_n$ be random variables such that $E[|\zeta_j|^r] < \infty$ for all $j = 1,\dots,n$ for some $r \ge 1$. Then
\[
E\left[\max_{1\le j\le n} |\zeta_j|\right] \le n^{1/r} \max_{1\le j\le n} (E[|\zeta_j|^r])^{1/r}.
\]

Proof. This inequality is well known, and follows from Jensen's inequality. Indeed, $E[\max_{1\le j\le n}|\zeta_j|] \le (E[\max_{1\le j\le n}|\zeta_j|^r])^{1/r} \le (\sum_{j=1}^{n} E[|\zeta_j|^r])^{1/r} \le n^{1/r}\max_{1\le j\le n}(E[|\zeta_j|^r])^{1/r}$.

The following anti-concentration inequality for the supremum of a Gaussian process will play a crucial role in the proofs of Theorems 3.1 and 3.2.

Lemma A.3 (Anti-concentration for the supremum of a Gaussian process). Let $T$ be a nonempty set, and let $X = (X_t : t \in T)$ be a tight Gaussian random variable in $\ell^\infty(T)$ with mean zero and $E[X_t^2] = 1$ for all $t \in T$. Then for any $h > 0$,
\[
\sup_{x\in\mathbb{R}} P\{|\|X\|_T - x| \le h\} \le 4h(1 + E[\|X\|_T]).
\]

Proof. See Corollary 2.1 in Chernozhukov et al. (2014b); see also Theorem 3 in Chernozhukov et al. (2015).

A.2. Proof of Theorem 3.1. In what follows, we always assume Assumption 3.1. Before proving Theorem 3.1, we first prove some preliminary lemmas. Recall that $A_n(x) = E[\{Y - g(x)\}K_n((x-W)/h_n)]$ and $s_n^2(x) = \mathrm{Var}(\{Y-g(x)\}K_n((x-W)/h_n))$. Observe that $\|K_n\|_{\mathbb{R}} = O(h_n^{-\alpha})$ under our assumption. In what follows, the notation $\lesssim$ signifies that the left hand side is bounded by the right hand side up to a positive constant independent of $n$ and $x$.

Lemma A.4. The following bounds hold:
(i) $\|A_n\|_I = O(h_n^{\beta+1})$.
(ii) For sufficiently large $n$, $\inf_{x\in I} s_n^2(x) \gtrsim h_n^{-2\alpha+1}$.
(iii) For $\ell = 0,1,2$, we have $\sup_{x\in\mathbb{R}} E[|Y K_n((x-W)/h_n)|^{2+\ell}] = O(h_n^{-(2+\ell)\alpha+1})$.

Proof. (i). Since $E[Ye^{itW}] = E[\{g(X)+U\}e^{it(X+\varepsilon)}] = \psi_X(t)\varphi_\varepsilon(t)$, we have that
\[
E[YK_n((x-W)/h_n)] = \frac{h_n}{2\pi}\int_{\mathbb{R}} e^{-itx} E[Ye^{itW}] \frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt = \frac{h_n}{2\pi}\int_{\mathbb{R}} e^{-itx}\psi_X(t)\varphi_K(th_n)\,dt.
\]
Since $\psi_X(\cdot)$ and $\varphi_K(\cdot\, h_n)$ are the Fourier transforms of $gf_X$ and $h_n^{-1}K(\cdot/h_n)$, respectively, the Fourier inversion formula yields that
\[
\frac{h_n}{2\pi}\int_{\mathbb{R}} e^{-itx}\psi_X(t)\varphi_K(th_n)\,dt = h_n\left((gf_X) * (h_n^{-1}K(\cdot/h_n))\right)(x) = \int_{\mathbb{R}} g(w)f_X(w)K((x-w)/h_n)\,dw.
\]
Note that the far left and right hand sides are continuous in $x$, and so the equality holds for all $x \in \mathbb{R}$. Likewise, we have $E[K_n((x-W)/h_n)] = \int_{\mathbb{R}} f_X(w)K((x-w)/h_n)\,dw$ for all $x \in \mathbb{R}$, so that
\[
A_n(x) = \int_{\mathbb{R}} \{g(w) - g(x)\}K((x-w)/h_n)f_X(w)\,dw = h_n\int_{\mathbb{R}} \{g(x-h_nw) - g(x)\}f_X(x-h_nw)K(w)\,dw.
\]

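By the Taylor expansion, for any $x, w \in \mathbb{R}$,
\[
\{g(x-h_nw) - g(x)\}f_X(x-h_nw) = \sum_{j=1}^{k-1} \frac{(gf_X)^{(j)}(x) - g(x)f_X^{(j)}(x)}{j!}(-h_nw)^j + \frac{(gf_X)^{(k)}(x-\theta h_nw) - g(x)f_X^{(k)}(x-\theta h_nw)}{k!}(-h_nw)^k,
\]
for some $\theta \in [0,1]$. Since $\int_{\mathbb{R}} w^jK(w)\,dw = 0$ for $j = 1,\dots,k$ and $f_X, gf_X \in \Sigma(\beta,B)$, we have
\[
\left|\int_{\mathbb{R}} \{g(x-h_nw) - g(x)\}f_X(x-h_nw)K(w)\,dw\right|
= \left|\int_{\mathbb{R}} \left[\{g(x-h_nw) - g(x)\}f_X(x-h_nw) - \sum_{j=1}^{k}\frac{(gf_X)^{(j)}(x) - g(x)f_X^{(j)}(x)}{j!}(-h_nw)^j\right]K(w)\,dw\right|
\le \frac{(1+\|g\|_I)Bh_n^{\beta}}{k!}\int_{\mathbb{R}} |w|^{\beta}|K(w)|\,dw.
\]
This shows that $\|A_n\|_I = O(h_n^{\beta+1})$.

(ii). Since $A_n(x) = E[\{Y-g(x)\}K_n((x-W)/h_n)] = O(h_n^{\beta+1})$ uniformly in $x \in I$, it suffices to show that
\[
\inf_{x\in I} E[\{Y-g(x)\}^2K_n^2((x-W)/h_n)] \gtrsim (1-o(1))h_n^{-2\alpha+1}.
\]
Observe that $E[Y \mid W = w]f_W(w) = ((gf_X)*f_\varepsilon)(w)$ (compare the Fourier transforms of both sides), and define
\[
V(x,w) = E[\{Y-g(x)\}^2 \mid W = w]f_W(w) = (E[Y^2 \mid W = w] + g^2(x))f_W(w) - 2g(x)((gf_X)*f_\varepsilon)(w).
\]
The function $(gf_X)*f_\varepsilon$ is bounded and continuous by boundedness of $gf_X$. Since $E[Y^2 \mid W = \cdot\,]$, $f_W$, and $(gf_X)*f_\varepsilon$ are bounded and continuous on $\mathbb{R}$, and $g$ is bounded and continuous on $I$, we have that the function $(x,w) \mapsto V(x,w)$ is bounded and continuous on $I\times\mathbb{R}$. In particular, since $V(x,x) > 0$ for all $x \in I$ under our assumption, we have that $\inf_{x\in I}V(x,x) > 0$. Now, observe that
\[
E[\{Y-g(x)\}^2K_n^2((x-W)/h_n)] = \int_{\mathbb{R}} V(x,w)K_n^2((x-w)/h_n)\,dw = h_n\int_{\mathbb{R}} V(x,x-h_nw)K_n^2(w)\,dw.
\]
Furthermore, we have that
\[
\int_{\mathbb{R}} K_n^2(w)\,dw = \frac{1}{2\pi}\int_{\mathbb{R}} \frac{|\varphi_K(t)|^2}{|\varphi_\varepsilon(t/h_n)|^2}\,dt \gtrsim h_n^{-2\alpha}
\]
by Plancherel's theorem. Hence, it suffices to show that
\[
\sup_{x\in I}\left| h_n^{2\alpha}\int_{\mathbb{R}} \{V(x,x-h_nw) - V(x,x)\}K_n^2(w)\,dw\right| \to 0. \tag{A.1}
\]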

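From the proof of Lemma 3 in Kato and Sasaki (2016), we have that $h_n^{2\alpha}K_n^2(x) \lesssim \min\{1, x^{-2}\}$. By the definition of $V(x,w)$, for any $\rho > 0$, there exists sufficiently small $\delta > 0$ such that $|V(x,x+w) - V(x,x)| \le \rho$ for all $x \in I$ whenever $|w| \le \delta$. Therefore,
\[
\sup_{x\in I}\int_{\mathbb{R}} |V(x,x-h_nw) - V(x,x)|\,h_n^{2\alpha}K_n^2(w)\,dw \lesssim \rho\int_{|w|\le\delta/h_n}\min\{1,w^{-2}\}\,dw + 2\|V\|_{I\times\mathbb{R}}\int_{|w|>\delta/h_n} w^{-2}\,dw \lesssim \rho + o(1).
\]

(iii). Pick any $\ell = 0,1,2$. Since $\|K_n\|_{\mathbb{R}} \lesssim h_n^{-\alpha}$, we have that
\[
E[|YK_n((x-W)/h_n)|^{2+\ell}] = h_n\int_{\mathbb{R}} E[|Y|^{2+\ell} \mid W = x-h_nw]\,|K_n(w)|^{2+\ell}f_W(x-h_nw)\,dw
\lesssim h_n^{-\ell\alpha+1}\int_{\mathbb{R}} V_\ell(x-h_nw)K_n^2(w)\,dw \le h_n^{-\ell\alpha+1}\|V_\ell\|_{\mathbb{R}}\int_{\mathbb{R}} K_n^2(w)\,dw \lesssim h_n^{-(2+\ell)\alpha+1},
\]
where $V_\ell(w) = E[|Y|^{2+\ell} \mid W = w]f_W(w)$. This completes the proof.

Lemma A.5. $\|\hat\varphi_\varepsilon - \varphi_\varepsilon\|_{[-h_n^{-1},h_n^{-1}]} = O_P\{m^{-1/2}\sqrt{\log(1/h_n)}\}$.

Proof. See Lemma 4 in Kato and Sasaki (2016); see also Theorem 4.1 in Neumann and Reiß (2009).

Consider the following classes of functions:
\[
\begin{aligned}
\mathcal{F}_n^{(1)} &= \{(y,w) \mapsto yK_n((x-w)/h_n) : x \in \mathbb{R}\},\\
\mathcal{F}_n^{(2)} &= \left\{(y,w) \mapsto \frac{1}{s_n(x)}\{y-g(x)\}K_n((x-w)/h_n) : x \in I\right\},\\
\mathcal{F}_n^{(3)} &= \{(y,w) \mapsto \{y-g(x)\}K_n^2((x-w)/h_n) : x \in I\},\\
\mathcal{F}_n^{(4)} &= \left\{(y,w) \mapsto \frac{1}{s_n^2(x)}\{y-g(x)\}^2K_n^2((x-w)/h_n) : x \in I\right\}.
\end{aligned}
\tag{A.2}
\]
In view of the fact that $\|K_n\|_{\mathbb{R}} \lesssim h_n^{-\alpha}$ and $\inf_{x\in I}s_n(x) \gtrsim h_n^{-\alpha+1/2}$, choose constants $D_1, D_2 > 0$ (independent of $n$) such that $\|K_n\|_{\mathbb{R}} \le D_1h_n^{-\alpha}$ and $\|1/s_n\|_I \le D_2h_n^{\alpha-1/2}$. Let
\[
F_n^{(1)}(y,w) = D_1|y|h_n^{-\alpha}, \quad F_n^{(2)}(y,w) = D_1D_2(|y| + \|g\|_I)/\sqrt{h_n}, \quad F_n^{(3)}(y,w) = D_1^2(|y| + \|g\|_I)h_n^{-2\alpha}, \quad F_n^{(4)}(y,w) = \{F_n^{(2)}(y,w)\}^2.
\]
Note that $F_n^{(\ell)}$ is an envelope function for $\mathcal{F}_n^{(\ell)}$ for each $\ell = 1,\dots,4$.

Lemma A.6. There exist constants $A, v \ge e$ independent of $n$ such that
\[
\sup_Q N(\mathcal{F}_n^{(\ell)}, \|\cdot\|_{Q,2}, \delta\|F_n^{(\ell)}\|_{Q,2}) \le (A/\delta)^v, \quad 0 < \delta \le 1, \tag{A.3}
\]
for all $\ell = 1,\dots,4$, where $\sup_Q$ is taken over all finitely discrete distributions on $\mathbb{R}^2$.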

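Proof. Consider the following classes of functions:
\[
\mathcal{K}_n = \{w \mapsto K_n((x-w)/h_n) : x \in \mathbb{R}\}, \quad \mathcal{K}_n^2 = \{f^2 : f \in \mathcal{K}_n\}.
\]
Lemma 1 in Kato and Sasaki (2016) and Corollary A.1 in Chernozhukov et al. (2014a) yield that there exist constants $A_1, v_1 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{K}_n, \|\cdot\|_{Q,2}, D_1h_n^{-\alpha}\delta) \le (A_1/\delta)^{v_1}$ and $\sup_Q N(\mathcal{K}_n^2, \|\cdot\|_{Q,2}, D_1^2h_n^{-2\alpha}\delta) \le (A_1/\delta)^{v_1}$ for all $0 < \delta \le 1$. In what follows, we only prove (A.3) for $\ell = 2$; the proofs for the other cases are completely analogous given the above bounds on the covering numbers for $\mathcal{K}_n$ and $\mathcal{K}_n^2$.

Let $\mathcal{H}_n = \{y \mapsto \{y-g(x)\}/s_n(x) : x \in I\}$, and observe that, since $\|1/s_n\|_I \le D_2h_n^{\alpha-1/2}$, there exist constants $A_2, v_2 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{H}_n, \|\cdot\|_{Q,2}, \delta\|H_n\|_{Q,2}) \le (A_2/\delta)^{v_2}$ for all $0 < \delta \le 1$, where $H_n(y) = D_2(|y| + \|g\|_I)h_n^{\alpha-1/2}$ is an envelope function for $\mathcal{H}_n$. This can be verified by a direct calculation, or by observing that $\mathcal{H}_n$ $(\subset \{y \mapsto ay+b : a > 0, b \in \mathbb{R}\})$ is a VC subgraph class with VC index at most 4 (cf. van der Vaart and Wellner, 1996), and applying the covering number bound for VC subgraph classes in van der Vaart and Wellner (1996).

Let $\mathcal{H}_n\mathcal{K}_n := \{(y,w) \mapsto f_1(y)f_2(w) : f_1 \in \mathcal{H}_n, f_2 \in \mathcal{K}_n\} \supset \mathcal{F}_n^{(2)}$, and note that $H_n(y)D_1h_n^{-\alpha} = F_n^{(2)}(y,w)$. From Corollary A.1 in Chernozhukov et al. (2014a), there exist constants $A_3, v_3 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{H}_n\mathcal{K}_n, \|\cdot\|_{Q,2}, \delta\|F_n^{(2)}\|_{Q,2}) \le (A_3/\delta)^{v_3}$ for all $0 < \delta \le 1$. Now, the desired result follows from the observation that $N(\mathcal{F}_n^{(2)}, \|\cdot\|_{Q,2}, 2\delta) \le N(\mathcal{H}_n\mathcal{K}_n, \|\cdot\|_{Q,2}, \delta)$ for all $\delta > 0$.

Lemma A.7. We have $\|\hat f_X^*(\cdot) - E[\hat f_X^*(\cdot)]\|_{\mathbb{R}} = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ and $\|E[\hat f_X^*(\cdot)] - f_X(\cdot)\|_{\mathbb{R}} = O(h_n^{\beta}) = o\{h_n^{-\alpha}(nh_n\log(1/h_n))^{-1/2}\}$. Furthermore, $\|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_{\mathbb{R}} = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Proof. The first two results are implicit in the proofs of Corollaries 1 and 2 in Kato and Sasaki (2016). To prove the last result, we shall apply Lemma A.1 to the class of functions $\mathcal{F}_n^{(1)}$. From Lemma A.4-(iii), we have that $\sup_{x\in\mathbb{R}} E[Y^2K_n^2((x-W)/h_n)] = O(h_n^{-2\alpha+1})$.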
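In view of the covering number bound for $\mathcal{F}_n^{(1)}$ given in Lemma A.6, we may apply Lemma A.1 to $\mathcal{F}_n^{(1)}$ to conclude that
\[
(\sqrt{n}h_n)\,E[\|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}] = E\left[\left\|\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\{f(Y_j,W_j) - E[f(Y,W)]\}\right\|_{\mathcal{F}_n^{(1)}}\right]
\lesssim h_n^{-\alpha}\sqrt{h_n\log(1/h_n)} + \frac{h_n^{-\alpha}}{\sqrt{n}}\sqrt{E\left[\max_{1\le j\le n}Y_j^2\right]}\log(1/h_n).
\]
From Lemma A.2, we have $E[\max_{1\le j\le n}Y_j^2] = O(n^{1/2})$, so that
\[
(\sqrt{n}h_n)\,E[\|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}] \lesssim h_n^{-\alpha}\sqrt{h_n\log(1/h_n)} + h_n^{-\alpha}n^{-1/4}\log(1/h_n) \lesssim h_n^{-\alpha}\sqrt{h_n\log(1/h_n)},
\]
where the second inequality follows from the first condition in (3.1). This completes the proof.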
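For completeness, the order of $E[\max_{1\le j\le n}Y_j^2]$ used in the proof of Lemma A.7 can be spelled out as follows. This is a short worked step, assuming $E[Y^4] < \infty$ (the fourth moment that also appears in the bound on the envelope $F_n^{(2)}$ later in the appendix); it applies Lemma A.2 with $r = 2$ and $\zeta_j = Y_j^2$.

\[
E\left[\max_{1\le j\le n}Y_j^2\right] \le n^{1/2}\max_{1\le j\le n}\left(E[Y_j^4]\right)^{1/2} = n^{1/2}\left(E[Y^4]\right)^{1/2},
\qquad\text{hence}\qquad
\frac{h_n^{-\alpha}}{\sqrt{n}}\sqrt{E\left[\max_{1\le j\le n}Y_j^2\right]}\log(1/h_n) \lesssim h_n^{-\alpha}n^{-1/4}\log(1/h_n).
\]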

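We are now in position to prove Theorem 3.1.

Proof of Theorem 3.1. We divide the proof into two steps.

Step 1. Let $r_n = h_n^{-\alpha}\{nh_n\log(1/h_n)\}^{-1/2}$. We first prove that
\[
\hat g(x) - g(x) = \frac{1}{f_X(x)}\frac{1}{nh_n}\sum_{j=1}^{n}[\{Y_j - g(x)\}K_n((x-W_j)/h_n) - A_n(x)] + o_P(r_n)
\]
uniformly in $x \in I$.

To this end, we shall show that $\|\hat\mu - \hat\mu^*\|_{\mathbb{R}} = o_P(r_n)$. First, observe from Lemma A.5 that
\[
\inf_{|t|\le h_n^{-1}}|\hat\varphi_\varepsilon(t)| \ge \inf_{|t|\le h_n^{-1}}|\varphi_\varepsilon(t)| - O_P\{m^{-1/2}\sqrt{\log(1/h_n)}\} \gtrsim (1-o_P(1))h_n^{\alpha}.
\]
Let $\psi_{YW}(t) = E[Ye^{itW}] = E[\{g(X)+U\}e^{it(X+\varepsilon)}] = \psi_X(t)\varphi_\varepsilon(t)$, and let $\hat\psi_{YW}(t) = n^{-1}\sum_{j=1}^{n}Y_je^{itW_j}$. Decompose $\hat\mu(x) - \hat\mu^*(x)$ as
\[
\begin{aligned}
\hat\mu(x) - \hat\mu^*(x)
&= \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\hat\psi_{YW}(t)\frac{\varphi_K(th_n)}{\hat\varphi_\varepsilon(t)}\,dt - \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\hat\psi_{YW}(t)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt\\
&= \frac{1}{2\pi}\int_{\{\psi_X\ne 0\}} e^{-itx}\varphi_K(th_n)\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}\psi_X(t)\,dt
+ \frac{1}{2\pi}\int_{\{\psi_X = 0\}} e^{-itx}\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\hat\psi_{YW}(t)\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}\,dt\\
&\quad - \frac{1}{2\pi}\int_{\{\psi_X\ne 0\}} e^{-itx}\varphi_K(th_n)\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}\psi_X(t)\,dt
- \frac{1}{2\pi}\int_{\{\psi_X = 0\}} e^{-itx}\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\hat\psi_{YW}(t)\,dt\\
&= \frac{1}{2\pi}\int_{\{\psi_X\ne 0\}} e^{-itx}\varphi_K(th_n)\left\{\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} - 1\right\}\left\{\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right\}\psi_X(t)\,dt\\
&\quad + \frac{1}{2\pi}\int_{\{\psi_X = 0\}} e^{-itx}\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\hat\psi_{YW}(t)\left\{\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right\}dt
+ \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_K(th_n)\left\{\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right\}\psi_X(t)\,dt.
\end{aligned}
\]
Hence the Cauchy-Schwarz inequality yields that
\[
\begin{aligned}
\|\hat\mu - \hat\mu^*\|_{\mathbb{R}}^2
&\lesssim \left\{\int_{\{\psi_X\ne 0\}\cap[-h_n^{-1},h_n^{-1}]}\left|\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} - 1\right|^2|\psi_X(t)|^2\,dt\right\}\left\{\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right|^2dt\right\}\\
&\quad + h_n^{-2\alpha}\left\{\int_{\{\psi_X = 0\}\cap[-h_n^{-1},h_n^{-1}]}|\hat\psi_{YW}(t)|^2\,dt\right\}\left\{\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right|^2dt\right\}\\
&\quad + \int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right|^2|\psi_X(t)|\,dt.
\end{aligned}
\tag{A.4}
\]
We shall bound each term on the right hand side. Observe that
\[
\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1\right|^2dt \le O_P(h_n^{-2\alpha})\int_{-h_n^{-1}}^{h_n^{-1}}|\hat\varphi_\varepsilon(t) - \varphi_\varepsilon(t)|^2\,dt
\]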

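and the integral on the right hand side is $O_P\{(mh_n)^{-1}\}$ since
\[
\int_{-h_n^{-1}}^{h_n^{-1}} E[|\hat\varphi_\varepsilon(t) - \varphi_\varepsilon(t)|^2]\,dt \le m^{-1}\int_{-h_n^{-1}}^{h_n^{-1}}dt = 2(mh_n)^{-1}.
\]
Likewise, using the fact that $\psi_X$ is integrable, we have that the last term on the right hand side of (A.4) is $O_P(h_n^{-2\alpha}m^{-1})$. For any $t \in \mathbb{R}$ with $\psi_X(t) \ne 0$, we have $E[|\hat\psi_{YW}(t)/\psi_{YW}(t) - 1|^2] \le E[Y^2]/\{n|\psi_{YW}(t)|^2\}$, so that
\[
E\left[\int_{\{\psi_X\ne 0\}\cap[-h_n^{-1},h_n^{-1}]}\left|\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} - 1\right|^2|\psi_X(t)|^2\,dt\right] \lesssim \frac{1}{n}\int_{-h_n^{-1}}^{h_n^{-1}}\frac{1}{|\varphi_\varepsilon(t)|^2}\,dt \lesssim h_n^{-2\alpha}(nh_n)^{-1}.
\]
Finally, for any $t \in \mathbb{R}$ with $\psi_X(t) = 0$, we have $\psi_{YW}(t) = 0$, so that
\[
E\left[\int_{\{\psi_X = 0\}\cap[-h_n^{-1},h_n^{-1}]}|\hat\psi_{YW}(t)|^2\,dt\right] \lesssim (nh_n)^{-1}.
\]
Therefore, we have $\|\hat\mu - \hat\mu^*\|_{\mathbb{R}}^2 = O_P(h_n^{-4\alpha-2}n^{-1}m^{-1} + h_n^{-2\alpha}m^{-1}) = o_P(r_n^2)$.

From Step 2 in the proof of Theorem 1 of Kato and Sasaki (2016), it follows that $\|\hat f_X - \hat f_X^*\|_{\mathbb{R}} = o_P(r_n)$, which in particular implies that $\|\hat f_X - f_X\|_I \le \|\hat f_X - \hat f_X^*\|_I + \|\hat f_X^* - f_X\|_I = o_P(1)$, so that $\|1/\hat f_X\|_I = O_P(1)$. Furthermore,
\[
\|\hat\mu^*\|_I \le \|E[\hat\mu^*(\cdot)]\|_I + \|\hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)]\|_I \lesssim \int_{\mathbb{R}}|\psi_X(t)|\,dt + o_P(1) = O_P(1).
\]
Therefore,
\[
\|\hat g - \hat g^*\|_I \le \|1/\hat f_X\|_I\|\hat\mu - \hat\mu^*\|_I + \|\hat\mu^*\|_I\|1/\hat f_X - 1/\hat f_X^*\|_I \le o_P(r_n) + O_P(1)\|\hat f_X - \hat f_X^*\|_I = o_P(r_n).
\]
Now, observe that
\[
\hat g^*(x) - g(x) = \frac{1}{\hat f_X^*(x)}\frac{1}{nh_n}\sum_{j=1}^{n}\{Y_j - g(x)\}K_n((x-W_j)/h_n).
\]
Since $\|A_n\|_I = O(h_n^{\beta+1}) = o(h_nr_n)$, we have
\[
\hat g^*(x) - g(x) = \frac{1}{\hat f_X^*(x)}\frac{1}{nh_n}\sum_{j=1}^{n}[\{Y_j - g(x)\}K_n((x-W_j)/h_n) - A_n(x)] + o_P(r_n)
\]
uniformly in $x \in I$. Since
\[
\frac{1}{nh_n}\sum_{j=1}^{n}[\{Y_j - g(x)\}K_n((x-W_j)/h_n) - A_n(x)] = \hat\mu^*(x) - E[\hat\mu^*(x)] - g(x)\{\hat f_X^*(x) - E[\hat f_X^*(x)]\} = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}
\]
uniformly in $x \in I$, and
\[
\|1/\hat f_X^* - 1/f_X\|_I \le O_P(1)\|\hat f_X^* - f_X\|_I = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\},
\]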

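we conclude that
\[
\hat g^*(x) - g(x) = \frac{1}{f_X(x)}\frac{1}{nh_n}\sum_{j=1}^{n}[\{Y_j - g(x)\}K_n((x-W_j)/h_n) - A_n(x)] + o_P(r_n)
\]
uniformly in $x \in I$. This leads to the desired result of Step 1. Furthermore, the derivation so far yields that $\|\hat g - g\|_I = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Step 2. By Step 1 together with the fact that $\inf_{x\in I}s_n(x) \gtrsim h_n^{-\alpha+1/2}$, we have
\[
\hat Z_n^*(x) = \frac{f_X(x)\sqrt{n}h_n(\hat g(x) - g(x))}{s_n(x)} = \frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^{n}[\{Y_j - g(x)\}K_n((x-W_j)/h_n) - A_n(x)] + o_P\{(\log(1/h_n))^{-1/2}\} = Z_n^*(x) + o_P\{(\log(1/h_n))^{-1/2}\}
\]
uniformly in $x \in I$. Recall the class of functions $\mathcal{F}_n^{(2)}$ defined in (A.2), and consider the empirical process indexed by $\mathcal{F}_n^{(2)}$:
\[
\nu_n(f) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\{f(Y_j,W_j) - E[f(Y,W)]\}, \quad f \in \mathcal{F}_n^{(2)}.
\]
We apply Theorem 2.1 in Chernozhukov et al. (2016) to approximate $\|\nu_n\|_{\mathcal{F}_n^{(2)}} = \|Z_n^*\|_I$ by the supremum of a Gaussian process. To this end, we shall verify the conditions in Chernozhukov et al. (2016). First, from the covering number bound for $\mathcal{F}_n^{(2)}$ given in Lemma A.6 and finiteness of the second moment of $F_n^{(2)}(Y,W)$, there exists a tight Gaussian random variable $G_n$ in $\ell^\infty(\mathcal{F}_n^{(2)})$ with mean zero and the same covariance function as $\{\nu_n(f) : f \in \mathcal{F}_n^{(2)}\}$. Extend $\nu_n$ linearly to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)}) = \{f, -f : f \in \mathcal{F}_n^{(2)}\}$, and observe that $\|\nu_n\|_{\mathcal{F}_n^{(2)}} = \sup_{f\in\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})}\nu_n(f)$. Note that, from Giné and Nickl (2016), $G_n$ extends to the linear hull of $\mathcal{F}_n^{(2)}$ in such a way that $G_n$ has linear sample paths, so that $\|G_n\|_{\mathcal{F}_n^{(2)}} = \sup_{f\in\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})}G_n(f)$, and in addition $G_n$ has uniformly continuous paths on the symmetric convex hull of $\mathcal{F}_n^{(2)}$. It is not difficult to verify that the covering number of $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$ is at most twice that of $\mathcal{F}_n^{(2)}$. In particular, $\{G_n(f) : f \in \mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})\}$ is a tight Gaussian random variable in $\ell^\infty(\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)}))$ with mean zero and the same covariance function as $\{\nu_n(f) : f \in \mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})\}$.

Next, observe that $E[|YK_n((x-W)/h_n)|^{2+\ell}] \lesssim h_n^{-(2+\ell)\alpha+1}$ for $\ell = 0,1,2$ from Lemma A.4-(iii), so that
\[
\sup_{f\in\mathcal{F}_n^{(2)}}E[|f(Y,W)|^{2+\ell}] \lesssim h_n^{-\ell/2}
\]
for $\ell = 0,1,2$. Furthermore, observe that
\[
E[|F_n^{(2)}(Y,W)|^4] \lesssim h_n^{-2}(E[Y^4] + \|g\|_I^4) \lesssim h_n^{-2}.
\]
Therefore, applying Theorem 2.1 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$ with $B(f) \equiv 0$, $q = 4$, $A \lesssim 1$, $v \lesssim 1$, $b \lesssim h_n^{-1/2}$, $\sigma \sim 1$ and $\gamma \sim 1/\log n$, yields that there exists a random variable $V_n$ having the same distribution as $\|G_n\|_{\mathcal{F}_n^{(2)}}$ such that
\[
\left|\|\nu_n\|_{\mathcal{F}_n^{(2)}} - V_n\right| = O_P\left\{\frac{(\log n)^{5/4}}{n^{1/4}h_n^{1/2}} + \frac{\log n}{(nh_n)^{1/6}}\right\} = o_P\{(\log(1/h_n))^{-1/2}\}.
\]
Now, for $f_{n,x}(y,w) = \{y - g(x)\}K_n((x-w)/h_n)/s_n(x)$, define
\[
Z_n^G(x) = G_n(f_{n,x}), \quad x \in I,
\]
and observe that $Z_n^G$ is a tight Gaussian random variable in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n^*$ such that $\|Z_n^G\|_I$ has the same distribution as $V_n$. Since
\[
\left|\|\hat Z_n^*\|_I - V_n\right| \le \|\hat Z_n^* - Z_n^*\|_I + \left|\|Z_n^*\|_I - V_n\right| = o_P\{(\log(1/h_n))^{-1/2}\},
\]
there exists a sequence $\Delta_n \downarrow 0$ such that $P\{|\|\hat Z_n^*\|_I - V_n| > \Delta_n(\log(1/h_n))^{-1/2}\} \le \Delta_n$ (which follows from the fact that the Ky Fan metric metrizes convergence in probability; see Dudley (2002)). Observe that for any $z \in \mathbb{R}$,
\[
P\{\|\hat Z_n^*\|_I \le z\} \le P\{V_n \le z + \Delta_n(\log(1/h_n))^{-1/2}\} + P\{|\|\hat Z_n^*\|_I - V_n| > \Delta_n(\log(1/h_n))^{-1/2}\} \le P\{\|Z_n^G\|_I \le z + \Delta_n(\log(1/h_n))^{-1/2}\} + \Delta_n.
\]
The anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) then yields that
\[
P\{\|Z_n^G\|_I \le z + \Delta_n(\log(1/h_n))^{-1/2}\} \le P\{\|Z_n^G\|_I \le z\} + 4\Delta_n(\log(1/h_n))^{-1/2}\{1 + E[\|Z_n^G\|_I]\}.
\]
From the covering number bound for $\mathcal{F}_n^{(2)}$ given in Lemma A.6, together with the facts that $E[F_n^{(2)}(Y,W)^2] \lesssim h_n^{-1}$ and $\mathrm{Var}(f_{n,x}(Y,W)) = 1$ for all $x \in I$, Dudley's entropy integral bound (cf. van der Vaart and Wellner, 1996, Corollary 2.2.8) yields that
\[
E[\|Z_n^G\|_I] = E[\|G_n\|_{\mathcal{F}_n^{(2)}}] \lesssim \int_0^1\sqrt{\log(1/(\delta\sqrt{h_n}))}\,d\delta \lesssim \sqrt{\log(1/h_n)},
\]
which implies that
\[
P\{\|Z_n^G\|_I \le z + \Delta_n(\log(1/h_n))^{-1/2}\} \le P\{\|Z_n^G\|_I \le z\} + o(1)
\]
uniformly in $z \in \mathbb{R}$. Likewise, we have $P\{\|\hat Z_n^*\|_I \le z\} \ge P\{\|Z_n^G\|_I \le z\} - o(1)$ uniformly in $z \in \mathbb{R}$. This completes the proof.

A.3. Proof of Theorem 3.2. We first prove the following technical lemma.

Lemma A.8. $\|\hat s_n^2(\cdot)/s_n^2(\cdot) - 1\|_I = o_P\{(\log(1/h_n))^{-1}\}$.

Proof. Observe that
\[
\begin{aligned}
\{Y_j - \hat g(x)\}^2\hat K_n^2((x-W_j)/h_n)
&= \{Y_j - g(x)\}^2K_n^2((x-W_j)/h_n) + \{g(x) - \hat g(x)\}^2K_n^2((x-W_j)/h_n)\\
&\quad + 2\{g(x) - \hat g(x)\}\{Y_j - g(x)\}K_n^2((x-W_j)/h_n)\\
&\quad + \{Y_j - \hat g(x)\}^2\{\hat K_n^2((x-W_j)/h_n) - K_n^2((x-W_j)/h_n)\},
\end{aligned}
\]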

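so that
\[
\begin{aligned}
&\left\|\frac{1}{n}\sum_{j=1}^{n}\{Y_j - \hat g(\cdot)\}^2\hat K_n^2((\cdot - W_j)/h_n) - \frac{1}{n}\sum_{j=1}^{n}\{Y_j - g(\cdot)\}^2K_n^2((\cdot - W_j)/h_n)\right\|_I\\
&\quad\le O_P(h_n^{-2\alpha})\|\hat g - g\|_I^2 + 2\|\hat g - g\|_I\left\|\frac{1}{n}\sum_{j=1}^{n}\{Y_j - g(\cdot)\}K_n^2((\cdot - W_j)/h_n)\right\|_I + \frac{2}{n}\sum_{j=1}^{n}(Y_j^2 + \|\hat g\|_I^2)\|\hat K_n^2 - K_n^2\|_{\mathbb{R}}.
\end{aligned}
\tag{A.5}
\]
From Step 1 in the proof of Theorem 3.1, $\|\hat g - g\|_I = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$, so that the first term on the right hand side of (A.5) is $O_P\{h_n^{-4\alpha}(nh_n)^{-1}\log(1/h_n)\}$. Since
\[
\|\hat K_n - K_n\|_{\mathbb{R}} \le \int_{\mathbb{R}}\left|\frac{1}{\hat\varphi_\varepsilon(t/h_n)} - \frac{1}{\varphi_\varepsilon(t/h_n)}\right||\varphi_K(t)|\,dt \le O_P(h_n^{-2\alpha})\int_{\mathbb{R}}|\hat\varphi_\varepsilon(t/h_n) - \varphi_\varepsilon(t/h_n)||\varphi_K(t)|\,dt = O_P(h_n^{-2\alpha}m^{-1/2}),
\]
we have that
\[
\|\hat K_n^2 - K_n^2\|_{\mathbb{R}} \le \|\hat K_n - K_n\|_{\mathbb{R}}\|\hat K_n + K_n\|_{\mathbb{R}} = O_P(h_n^{-3\alpha}m^{-1/2}),
\]
which implies that the last term on the right hand side of (A.5) is $O_P(h_n^{-3\alpha}m^{-1/2})$. To bound the second term, observe first that, since $E[Y \mid W = w]f_W(w) = ((gf_X)*f_\varepsilon)(w)$ is bounded (in absolute value) by $\|gf_X\|_{\mathbb{R}}$,
\[
\|E[\{Y - g(\cdot)\}K_n^2((\cdot - W)/h_n)]\|_I \le h_n(\|gf_X\|_{\mathbb{R}} + \|g\|_I\|f_W\|_{\mathbb{R}})\int_{\mathbb{R}}K_n^2(w)\,dw \lesssim h_n^{-2\alpha+1}.
\]
Hence,
\[
\left\|\frac{1}{n}\sum_{j=1}^{n}\{Y_j - g(\cdot)\}K_n^2((\cdot - W_j)/h_n)\right\|_I \le \underbrace{\|E[\{Y - g(\cdot)\}K_n^2((\cdot - W)/h_n)]\|_I}_{=O(h_n^{-2\alpha+1})} + \left\|\frac{1}{n}\sum_{j=1}^{n}\{Y_j - g(\cdot)\}K_n^2((\cdot - W_j)/h_n) - E[\{Y - g(\cdot)\}K_n^2((\cdot - W)/h_n)]\right\|_I.
\]
The second term on the right hand side is identical to
\[
\left\|\frac{1}{n}\sum_{j=1}^{n}\{f(Y_j,W_j) - E[f(Y,W)]\}\right\|_{\mathcal{F}_n^{(3)}}.
\]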

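In view of the covering number bound for $\mathcal{F}_n^{(3)}$ given in Lemma A.6, together with the maximal inequality in van der Vaart and Wellner (1996), the expectation of the last term is
\[
\lesssim n^{-1/2}\sqrt{E[\{F_n^{(3)}(Y,W)\}^2]} \lesssim h_n^{-2\alpha}n^{-1/2}.
\]
Therefore, the right hand side of (A.5) is
\[
O_P\left\{h_n^{-4\alpha}(nh_n)^{-1}\log(1/h_n) + h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\,(h_n^{-2\alpha+1} + h_n^{-2\alpha}n^{-1/2}) + h_n^{-3\alpha}m^{-1/2}\right\},
\]
which is $o_P\{h_n^{-2\alpha+1}(\log(1/h_n))^{-1}\}$. Hence, since $\inf_{x\in I}s_n^2(x) \gtrsim h_n^{-2\alpha+1}$, we have
\[
\frac{1}{s_n^2(x)}\frac{1}{n}\sum_{j=1}^{n}\{Y_j - \hat g(x)\}^2\hat K_n^2((x-W_j)/h_n) = \frac{1}{s_n^2(x)}\frac{1}{n}\sum_{j=1}^{n}\{Y_j - g(x)\}^2K_n^2((x-W_j)/h_n) + o_P\{(\log(1/h_n))^{-1}\}
\]
uniformly in $x \in I$. Since $\|A_n^2(\cdot)/s_n^2(\cdot)\|_I = O(h_n^{2\alpha+2\beta+1})$, it remains to prove that
\[
\left\|\frac{1}{n}\sum_{j=1}^{n}\frac{1}{s_n^2(\cdot)}\{Y_j - g(\cdot)\}^2K_n^2((\cdot - W_j)/h_n) - E\left[\frac{1}{s_n^2(\cdot)}\{Y - g(\cdot)\}^2K_n^2((\cdot - W)/h_n)\right]\right\|_I = \left\|\frac{1}{n}\sum_{j=1}^{n}\{f(Y_j,W_j) - E[f(Y,W)]\}\right\|_{\mathcal{F}_n^{(4)}}
\]
is $o_P\{(\log(1/h_n))^{-1}\}$. In view of the covering number bound for $\mathcal{F}_n^{(4)}$ given in Lemma A.6, together with the maximal inequality in van der Vaart and Wellner (1996), the expectation of the last term is
\[
\lesssim n^{-1/2}\sqrt{E[\{F_n^{(4)}(Y,W)\}^2]} \lesssim h_n^{-1}n^{-1/2} = o\{(\log(1/h_n))^{-1}\}.
\]
This completes the proof.

Proof of Theorem 3.2. We divide the proof into several steps.

Step 1. Define
\[
Z_n^{\xi}(x) = \frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^{n}\xi_j\left[\{Y_j - g(x)\}K_n((x-W_j)/h_n) - \frac{1}{n}\sum_{j'=1}^{n}\{Y_{j'} - g(x)\}K_n((x-W_{j'})/h_n)\right]
\]
for $x \in I$. We first prove that
\[
\sup_{z\in\mathbb{R}}\left|P\{\|Z_n^{\xi}\|_I \le z \mid \mathcal{D}_n\} - P\{\|Z_n^G\|_I \le z\}\right| \stackrel{P}{\to} 0.
\]
To this end, we shall apply Theorem 2.2 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$. Let
\[
\nu_n^{\xi}(f) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\xi_j\left\{f(Y_j,W_j) - \frac{1}{n}\sum_{j'=1}^{n}f(Y_{j'},W_{j'})\right\}, \quad f \in \mathcal{F}_n^{(2)}.
\]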

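Then applying Theorem 2.2 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$ with $B(f) \equiv 0$, $q = 4$, $A \lesssim 1$, $v \lesssim 1$, $b \lesssim h_n^{-1/2}$, $\sigma \sim 1$ and $\gamma \sim 1/\log n$, yields that there exists a random variable $V_n^{\xi}$ of which the conditional distribution given $\mathcal{D}_n$ is identical to the distribution of $\|G_n\|_{\mathcal{F}_n^{(2)}}$ $(= \|Z_n^G\|_I)$, and such that
\[
\left|\|\nu_n^{\xi}\|_{\mathcal{F}_n^{(2)}} - V_n^{\xi}\right| = O_P\left\{\frac{(\log n)^{9/4}}{n^{1/4}h_n^{1/2}} + \frac{(\log n)^2}{(nh_n)^{1/4}}\right\} = o_P\{(\log(1/h_n))^{-1/2}\},
\]
which shows that there exists a sequence $\Delta_n \downarrow 0$ such that
\[
P\left\{\left|\|\nu_n^{\xi}\|_{\mathcal{F}_n^{(2)}} - V_n^{\xi}\right| > \Delta_n(\log(1/h_n))^{-1/2} \,\Big|\, \mathcal{D}_n\right\} \stackrel{P}{\to} 0.
\]
Since $\|\nu_n^{\xi}\|_{\mathcal{F}_n^{(2)}} = \|Z_n^{\xi}\|_I$, we have
\[
P\{\|Z_n^{\xi}\|_I \le z \mid \mathcal{D}_n\} \le P\{V_n^{\xi} \le z + \Delta_n(\log(1/h_n))^{-1/2} \mid \mathcal{D}_n\} + o_P(1) = P\{\|Z_n^G\|_I \le z + \Delta_n(\log(1/h_n))^{-1/2}\} + o_P(1)
\]
uniformly in $z \in \mathbb{R}$, and the anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) yields that
\[
P\{\|Z_n^G\|_I \le z + \Delta_n(\log(1/h_n))^{-1/2}\} \le P\{\|Z_n^G\|_I \le z\} + o(1)
\]
uniformly in $z \in \mathbb{R}$. Likewise, we have $P\{\|Z_n^{\xi}\|_I \le z \mid \mathcal{D}_n\} \ge P\{\|Z_n^G\|_I \le z\} - o_P(1)$ uniformly in $z \in \mathbb{R}$.

Step 2. In view of the proof of Step 1, in order to prove the result (3.3), it is enough to prove that $\|\hat Z_n^{\xi} - Z_n^{\xi}\|_I = o_P\{(\log(1/h_n))^{-1/2}\}$. To this end, define
\[
\tilde Z_n^{\xi}(x) = \frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^{n}\xi_j\{Y_j - \hat g(x)\}\hat K_n((x-W_j)/h_n)
\]
for $x \in I$, and we first prove that
\[
\|\tilde Z_n^{\xi} - Z_n^{\xi}\|_I = o_P\{(\log(1/h_n))^{-1/2}\}. \tag{A.6}
\]
We begin with noting that
\[
\frac{1}{n}\sum_{j=1}^{n}\{Y_j - g(x)\}K_n((x-W_j)/h_n) = h_n\{\hat\mu^*(x) - E[\hat\mu^*(x)]\} - h_ng(x)\{\hat f_X^*(x) - E[\hat f_X^*(x)]\} + A_n(x) = O_P\{h_n^{-\alpha+1}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}
\]
uniformly in $x \in I$, so that it suffices to verify that
\[
\left\|\frac{1}{s_n(\cdot)\sqrt{n}}\left[\sum_{j=1}^{n}\xi_j\{Y_j - \hat g(\cdot)\}\hat K_n((\cdot - W_j)/h_n) - \sum_{j=1}^{n}\xi_j\{Y_j - g(\cdot)\}K_n((\cdot - W_j)/h_n)\right]\right\|_I
\]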

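is $o_P\{(\log(1/h_n))^{-1/2}\}$. Since $\|1/s_n\|_I \lesssim h_n^{\alpha-1/2}$, the last term is
\[
\begin{aligned}
\lesssim h_n^{\alpha-1/2}n^{-1/2}\Bigg\{&\left\|\sum_{j=1}^{n}\xi_jY_j\{\hat K_n((\cdot - W_j)/h_n) - K_n((\cdot - W_j)/h_n)\}\right\|_I + \|\hat g - g\|_I\left\|\sum_{j=1}^{n}\xi_j\hat K_n((\cdot - W_j)/h_n)\right\|_I\\
&+ \|g\|_I\left\|\sum_{j=1}^{n}\xi_j\{\hat K_n((\cdot - W_j)/h_n) - K_n((\cdot - W_j)/h_n)\}\right\|_I\Bigg\}
=: h_n^{\alpha-1/2}n^{-1/2}\{I_n + II_n + III_n\}.
\end{aligned}
\]
Step 2 in the proof of Theorem 2 in Kato and Sasaki (2016) shows that $h_n^{\alpha-1/2}n^{-1/2}III_n = o_P\{(\log(1/h_n))^{-1/2}\}$. For the second term $II_n$, observe that
\[
II_n \le \|\hat g - g\|_I\int_{-1}^{1}\left|\sum_{j=1}^{n}\xi_je^{itW_j/h_n}\right|\frac{|\varphi_K(t)|}{|\hat\varphi_\varepsilon(t/h_n)|}\,dt \le O_P(h_n^{-\alpha})\|\hat g - g\|_I\int_{-1}^{1}\left|\sum_{j=1}^{n}\xi_je^{itW_j/h_n}\right|dt = O_P\{h_n^{-2\alpha-1/2}\sqrt{\log(1/h_n)}\},
\]
so that $h_n^{\alpha-1/2}n^{-1/2}II_n = O_P\{h_n^{-\alpha-1}n^{-1/2}\sqrt{\log(1/h_n)}\} = o_P\{(\log(1/h_n))^{-1/2}\}$. For the first term $I_n$, observe that
\[
I_n \le \int_{-1}^{1}\left|\sum_{j=1}^{n}\xi_jY_je^{itW_j/h_n}\right|\left|\frac{1}{\hat\varphi_\varepsilon(t/h_n)} - \frac{1}{\varphi_\varepsilon(t/h_n)}\right||\varphi_K(t)|\,dt
\le \left\{\int_{-1}^{1}\left|\sum_{j=1}^{n}\xi_jY_je^{itW_j/h_n}\right|^2dt\right\}^{1/2}\left\{\int_{-1}^{1}\left|\frac{1}{\hat\varphi_\varepsilon(t/h_n)} - \frac{1}{\varphi_\varepsilon(t/h_n)}\right|^2dt\right\}^{1/2} = O_P(n^{1/2}h_n^{-2\alpha}m^{-1/2}),
\]
so that $h_n^{\alpha-1/2}n^{-1/2}I_n = o_P\{(\log(1/h_n))^{-1}\}$. Hence we have proved (A.6). Note that the result of Step 1 and the fact that $E[\|Z_n^G\|_I] = O(\sqrt{\log(1/h_n)})$ imply that $\|Z_n^{\xi}\|_I = O_P(\sqrt{\log(1/h_n)})$, which in turn implies that $\|\tilde Z_n^{\xi}\|_I = O_P(\sqrt{\log(1/h_n)})$. Hence
\[
\|\hat Z_n^{\xi} - \tilde Z_n^{\xi}\|_I \le \|s_n(\cdot)/\hat s_n(\cdot) - 1\|_I\|\tilde Z_n^{\xi}\|_I = o_P\{(\log(1/h_n))^{-1/2}\},
\]
which leads to (3.3).

Step 3. We shall prove the last two assertions of the theorem. Observe that
\[
\underbrace{\frac{\hat f_X(x)\sqrt{n}h_n(\hat g(x) - g(x))}{\hat s_n(x)}}_{=:\hat Z_n(x)} - \hat Z_n^*(x) = \frac{\sqrt{n}h_n(\hat g(x) - g(x))}{\hat s_n(x)}\{\hat f_X(x) - f_X(x)\} + \left\{\frac{s_n(x)}{\hat s_n(x)} - 1\right\}\hat Z_n^*(x),
\]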

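and the right hand side is $o_P\{(\log(1/h_n))^{-1/2}\}$ uniformly in $x \in I$. To see this, since $\|\hat g - g\|_I = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ and $\|\hat f_X - f_X\|_I = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ (which follows from Corollary 1 in Kato and Sasaki (2016)), the right hand side of the above displayed equation is
\[
O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}\,O_P(\sqrt{\log(1/h_n)}) + o_P\{(\log(1/h_n))^{-1}\}\,O_P(\sqrt{\log(1/h_n)}) = o_P\{(\log(1/h_n))^{-1/2}\}
\]
uniformly in $x \in I$. Now, Theorem 3.1 and the anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) yield that
\[
\sup_{z\in\mathbb{R}}\left|P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\right| \to 0.
\]
We are to show that $P\{\|\hat Z_n\|_I \le \hat c_n(1-\tau)\} \to 1-\tau$. From the result (3.3), there exists a sequence $\Delta_n \downarrow 0$ such that with probability greater than $1 - \Delta_n$,
\[
\sup_{z\in\mathbb{R}}\left|P\{\|\hat Z_n^{\xi}\|_I \le z \mid \mathcal{D}_n\} - P\{\|Z_n^G\|_I \le z\}\right| \le \Delta_n, \tag{A.7}
\]
and let $\mathcal{E}_n$ be the event that (A.7) holds. Taking $\Delta_n \downarrow 0$ more slowly if necessary, we have that $\sup_{z\in\mathbb{R}}|P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}| \le \Delta_n$. Recall that $c_n^G(1-\tau)$ is the $(1-\tau)$-quantile of $\|Z_n^G\|_I$, and observe that on the event $\mathcal{E}_n$,
\[
P\{\|\hat Z_n^{\xi}\|_I \le c_n^G(1-\tau+\Delta_n) \mid \mathcal{D}_n\} \ge P\{\|Z_n^G\|_I \le c_n^G(1-\tau+\Delta_n)\} - \Delta_n = 1-\tau,
\]
where the last equality holds since the distribution function of $\|Z_n^G\|_I$ is continuous (which follows from Lemma A.3). Hence on the event $\mathcal{E}_n$, it holds that $\hat c_n(1-\tau) \le c_n^G(1-\tau+\Delta_n)$, so that
\[
P\{\|\hat Z_n\|_I \le \hat c_n(1-\tau)\} \le P\{\|\hat Z_n\|_I \le c_n^G(1-\tau+\Delta_n)\} + \Delta_n \le P\{\|Z_n^G\|_I \le c_n^G(1-\tau+\Delta_n)\} + 2\Delta_n = 1-\tau+3\Delta_n.
\]
Likewise, we have $P\{\|\hat Z_n\|_I \le \hat c_n(1-\tau)\} \ge 1-\tau-3\Delta_n$, which shows that $P\{\|\hat Z_n\|_I \le \hat c_n(1-\tau)\} \to 1-\tau$ and thus (3.4) holds.

Finally, the Borell-Sudakov-Tsirelson inequality (van der Vaart and Wellner, 1996, Lemma A.2.2) yields that
\[
c_n^G(1-\tau+\Delta_n) \le E[\|Z_n^G\|_I] + \sqrt{2\log(1/(\tau-\Delta_n))} \lesssim \sqrt{\log(1/h_n)},
\]
which implies that $\hat c_n(1-\tau) = O_P(\sqrt{\log(1/h_n)})$. Furthermore,
\[
\sup_{x\in I}\hat s_n(x) \le \sup_{x\in I}\frac{\hat s_n(x)}{s_n(x)}\,\sup_{x\in I}s_n(x) = O_P(h_n^{-\alpha+1/2}).
\]
Therefore, the supremum width of the band $\hat C_{1-\tau}$ is
\[
2\sup_{x\in I}\frac{\hat s_n(x)}{\hat f_X(x)\sqrt{n}h_n}\,\hat c_n(1-\tau) = O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}.
\]
This completes the proof.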
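The objects appearing in the proof of Theorem 3.2 can be illustrated with a small numerical sketch. The Python code below is not the authors' implementation; it assumes the user has already computed, on a finite grid of points in $I$, the estimates $\hat g$ and $\hat f_X$ and the matrix of terms $\{Y_j - \hat g(x)\}\hat K_n((x-W_j)/h_n)$. The function name `multiplier_band` and the argument `resid_kernel` are hypothetical. The supremum over $I$ is approximated by a maximum over the grid, and the conditional quantile $\hat c_n(1-\tau)$ is approximated by Monte Carlo over draws of the multipliers.

```python
import numpy as np


def multiplier_band(g_hat, f_hat, resid_kernel, h, tau=0.05, n_boot=1000, seed=0):
    """Sketch of a multiplier-bootstrap uniform band on a grid (illustrative only).

    g_hat, f_hat : arrays of shape (G,), estimates of g and f_X on the grid.
    resid_kernel : array of shape (n, G) whose (j, g) entry is
                   {Y_j - g_hat(x_g)} * K_hat((x_g - W_j) / h).
    h            : bandwidth.
    """
    rng = np.random.default_rng(seed)
    n, _ = resid_kernel.shape

    # s_hat^2(x) = (1/n) sum_j {Y_j - g_hat(x)}^2 K_hat^2((x - W_j)/h)
    s_hat = np.sqrt(np.mean(resid_kernel ** 2, axis=0))

    # Standard normal multipliers, drawn independently of the data.
    xi = rng.standard_normal((n_boot, n))

    # Each row is one draw of the multiplier process evaluated on the grid.
    z_xi = (xi @ resid_kernel) / (np.sqrt(n) * s_hat)

    # Grid approximation of the sup-norm, then its (1 - tau)-quantile.
    c_hat = np.quantile(np.abs(z_xi).max(axis=1), 1.0 - tau)

    # Band: g_hat(x) +/- s_hat(x) * c_hat / (f_hat(x) * sqrt(n) * h)
    half_width = s_hat * c_hat / (f_hat * np.sqrt(n) * h)
    return g_hat - half_width, g_hat + half_width, c_hat
```

In this sketch the critical value is common to all grid points, which is what makes the resulting band uniform rather than pointwise; a finer grid and more multiplier draws give a closer approximation to the quantities in the proof.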

A.4. Proof of Theorem 7.1. The proof is completely analogous to those of Theorems 3.1 and 3.2, given the facts that $g(y,x) = E[1(Y \le y) \mid X = x]$ and the function class $\{1(\cdot \le y) : y \in J\}$ is a VC class. Hence we omit the details for brevity.
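The proofs above repeatedly use the bound $\|K_n\|_{\mathbb{R}} = O(h_n^{-\alpha})$ for the deconvolution kernel. The following Python sketch checks this numerically in one illustrative case. It assumes a Laplace measurement error with scale `b`, whose characteristic function is $1/(1+b^2t^2)$ (so $\alpha = 2$), and a kernel with Fourier transform $\varphi_K(t) = (1-t^2)^3$ on $[-1,1]$; this choice of $\varphi_K$ and the function name `deconv_kernel_laplace` are assumptions made for illustration, not taken from the paper.

```python
import numpy as np


def deconv_kernel_laplace(x, h, b, n_grid=4001):
    """Deconvolution kernel K_n(x) = (1/2pi) * int_{-1}^{1} e^{-itx} phi_K(t) / phi_eps(t/h) dt.

    Assumptions (illustrative): phi_K(t) = (1 - t^2)^3 on [-1, 1] and a Laplace
    error with scale b, so that 1 / phi_eps(t/h) = 1 + (b * t / h)^2.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.linspace(0.0, 1.0, n_grid)

    # phi_K(t) / phi_eps(t/h); real, even and nonnegative in t.
    weight = (1.0 - t ** 2) ** 3 * (1.0 + (b * t / h) ** 2)

    # Evenness reduces the Fourier integral to (1/pi) * int_0^1 cos(t x) weight(t) dt.
    integrand = np.cos(np.outer(x, t)) * weight

    # Composite trapezoidal rule on the uniform grid.
    dt = t[1] - t[0]
    integral = dt * (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return integral / np.pi


# Because the weight is nonnegative, sup_x |K_n(x)| is attained at x = 0,
# and h^2 * K_n(0) stays bounded as h decreases (alpha = 2 for Laplace errors).
for h in (0.5, 0.1, 0.02):
    k0 = deconv_kernel_laplace(0.0, h=h, b=1.0)[0]
    print(h, h ** 2 * k0)
```

Under these assumptions $K_n(0) = \pi^{-1}\{16/35 + (b/h_n)^2\,16/315\}$ in closed form, so $h_n^2K_n(0)$ converges to $16b^2/(315\pi)$ as $h_n \to 0$, in line with the $O(h_n^{-\alpha})$ rate.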

Appendix B. Tables for Section 5

Each table is arranged by nominal probability and sample size ($n$), with columns for three alternative sequences $\{c_n\}_{n=1}^{\infty}$ (exponents 0.1, 0.3, and 0.5) under each of $\sigma_X = 2$ and $\sigma_X = 4$.

Table 1. Simulated uniform coverage probabilities of $g(x) = x$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Table 2. Simulated uniform coverage probabilities of $g(x) = x^2$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Table 3. Simulated uniform coverage probabilities of $g(x) = x^3$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Table 4. Simulated uniform coverage probabilities of $g(x) = \sin(x)$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Table 5. Simulated uniform coverage probabilities of $g(x) = \cos(x)$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Appendix C. Figures for Section 6

Figure 1. Estimates and confidence bands for the nonparametric regression of medical expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m$^2$. The vertical axes measure the medical expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands.

Figure 2. Estimates and confidence bands for the nonparametric regression of prescription expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m$^2$. The vertical axes measure the prescription expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands.


More information

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15 17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

LECTURE 8: ASYMPTOTICS I

LECTURE 8: ASYMPTOTICS I LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Detailed proofs of Propositions 3.1 and 3.2

Detailed proofs of Propositions 3.1 and 3.2 Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

Self-normalized deviation inequalities with application to t-statistic

Self-normalized deviation inequalities with application to t-statistic Self-ormalized deviatio iequalities with applicatio to t-statistic Xiequa Fa Ceter for Applied Mathematics, Tiaji Uiversity, 30007 Tiaji, Chia Abstract Let ξ i i 1 be a sequece of idepedet ad symmetric

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Math 2784 (or 2794W) University of Connecticut

Math 2784 (or 2794W) University of Connecticut ORDERS OF GROWTH PAT SMITH Math 2784 (or 2794W) Uiversity of Coecticut Date: Mar. 2, 22. ORDERS OF GROWTH. Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really

More information

Chapter 7 Isoperimetric problem

Chapter 7 Isoperimetric problem Chapter 7 Isoperimetric problem Recall that the isoperimetric problem (see the itroductio its coectio with ido s proble) is oe of the most classical problem of a shape optimizatio. It ca be formulated

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

Department of Mathematics

Department of Mathematics Departmet of Mathematics Ma 3/103 KC Border Itroductio to Probability ad Statistics Witer 2017 Lecture 19: Estimatio II Relevat textbook passages: Larse Marx [1]: Sectios 5.2 5.7 19.1 The method of momets

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Lesson 10: Limits and Continuity

Lesson 10: Limits and Continuity www.scimsacademy.com Lesso 10: Limits ad Cotiuity SCIMS Academy 1 Limit of a fuctio The cocept of limit of a fuctio is cetral to all other cocepts i calculus (like cotiuity, derivative, defiite itegrals

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

Web-based Supplementary Materials for A Modified Partial Likelihood Score Method for Cox Regression with Covariate Error Under the Internal

Web-based Supplementary Materials for A Modified Partial Likelihood Score Method for Cox Regression with Covariate Error Under the Internal Web-based Supplemetary Materials for A Modified Partial Likelihood Score Method for Cox Regressio with Covariate Error Uder the Iteral Validatio Desig by David M. Zucker, Xi Zhou, Xiaomei Liao, Yi Li,

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable. Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig

More information

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8)

Elements of Statistical Methods Lots of Data or Large Samples (Ch 8) Elemets of Statistical Methods Lots of Data or Large Samples (Ch 8) Fritz Scholz Sprig Quarter 2010 February 26, 2010 x ad X We itroduced the sample mea x as the average of the observed sample values x

More information

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation Metodološki zvezki, Vol. 13, No., 016, 117-130 Approximate Cofidece Iterval for the Reciprocal of a Normal Mea with a Kow Coefficiet of Variatio Wararit Paichkitkosolkul 1 Abstract A approximate cofidece

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain Assigmet 9 Exercise 5.5 Let X biomial, p, where p 0, 1 is ukow. Obtai cofidece itervals for p i two differet ways: a Sice X / p d N0, p1 p], the variace of the limitig distributio depeds oly o p. Use the

More information

GUIDELINES ON REPRESENTATIVE SAMPLING

GUIDELINES ON REPRESENTATIVE SAMPLING DRUGS WORKING GROUP VALIDATION OF THE GUIDELINES ON REPRESENTATIVE SAMPLING DOCUMENT TYPE : REF. CODE: ISSUE NO: ISSUE DATE: VALIDATION REPORT DWG-SGL-001 002 08 DECEMBER 2012 Ref code: DWG-SGL-001 Issue

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace

More information

Empirical Processes: Glivenko Cantelli Theorems

Empirical Processes: Glivenko Cantelli Theorems Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS 18th Feb, 016 Defiitio (Lipschitz fuctio). A fuctio f : R R is said to be Lipschitz if there exists a positive real umber c such that for ay x, y i the domai

More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia

More information

Confidence Interval for Standard Deviation of Normal Distribution with Known Coefficients of Variation

Confidence Interval for Standard Deviation of Normal Distribution with Known Coefficients of Variation Cofidece Iterval for tadard Deviatio of Normal Distributio with Kow Coefficiets of Variatio uparat Niwitpog Departmet of Applied tatistics, Faculty of Applied ciece Kig Mogkut s Uiversity of Techology

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS

EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS EFFECTIVE WLLN, SLLN, AND CLT IN STATISTICAL MODELS Ryszard Zieliński Ist Math Polish Acad Sc POBox 21, 00-956 Warszawa 10, Polad e-mail: rziel@impagovpl ABSTRACT Weak laws of large umbers (W LLN), strog

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

EE / EEE SAMPLE STUDY MATERIAL. GATE, IES & PSUs Signal System. Electrical Engineering. Postal Correspondence Course

EE / EEE SAMPLE STUDY MATERIAL. GATE, IES & PSUs Signal System. Electrical Engineering. Postal Correspondence Course Sigal-EE Postal Correspodece Course 1 SAMPLE STUDY MATERIAL Electrical Egieerig EE / EEE Postal Correspodece Course GATE, IES & PSUs Sigal System Sigal-EE Postal Correspodece Course CONTENTS 1. SIGNAL

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

2.1. Convergence in distribution and characteristic functions.

2.1. Convergence in distribution and characteristic functions. 3 Chapter 2. Cetral Limit Theorem. Cetral limit theorem, or DeMoivre-Laplace Theorem, which also implies the wea law of large umbers, is the most importat theorem i probability theory ad statistics. For

More information

Rank tests and regression rank scores tests in measurement error models

Rank tests and regression rank scores tests in measurement error models Rak tests ad regressio rak scores tests i measuremet error models J. Jurečková ad A.K.Md.E. Saleh Charles Uiversity i Prague ad Carleto Uiversity i Ottawa Abstract The rak ad regressio rak score tests

More information

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations

ECE-S352 Introduction to Digital Signal Processing Lecture 3A Direct Solution of Difference Equations ECE-S352 Itroductio to Digital Sigal Processig Lecture 3A Direct Solutio of Differece Equatios Discrete Time Systems Described by Differece Equatios Uit impulse (sample) respose h() of a DT system allows

More information