arxiv: v3 [math.st] 24 Feb 2017


UNIFORM CONFIDENCE BANDS FOR NONPARAMETRIC ERRORS-IN-VARIABLES REGRESSION

KENGO KATO AND YUYA SASAKI

Abstract. This paper develops a method to construct uniform confidence bands for a nonparametric regression function where a predictor variable is subject to a measurement error. We allow for the distribution of the measurement error to be unknown, but assume that there is an independent sample from the measurement error distribution. The sample from the measurement error distribution need not be independent from the sample on response and predictor variables. The availability of a sample from the measurement error distribution is satisfied if, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed, are available. The proposed confidence band builds on the deconvolution kernel estimation and a novel application of the multiplier (or wild) bootstrap method. We establish asymptotic validity of the proposed confidence band under ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric errors-in-variables regression. We also propose a novel data-driven method to choose a bandwidth, and conduct simulation studies to verify the finite sample performance of the proposed confidence band. Applying our method to a combination of two empirical data sets, we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI. Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

Date: First arXiv version: February 11. This version: February 27. K. Kato is supported by Grant-in-Aid for Scientific Research (C) (15K03392) from the JSPS. We would like to thank Tatsushi Oka and Holger Dette for useful comments and discussions.

1. Introduction

Consider the nonparametric errors-in-variables (EIV) regression model with classical measurement error
\[
Y = g(X) + U, \quad E[U \mid X, \varepsilon] = 0, \qquad W = X + \varepsilon, \tag{1.1}
\]
where each of $Y$, $X$, $U$, $W$, and $\varepsilon$ is a univariate random variable, and $\varepsilon$ is independent from $X$. We observe $(Y, W)$, but observe neither $X$ nor $\varepsilon$. Furthermore, we assume that the distribution of $\varepsilon$ is unknown. The variable $X$ is a latent predictor variable, while $\varepsilon$ is a measurement error. Of interest are estimation of and inference on the regression function $g(x) = E[Y \mid X = x]$. In
particular, we are interested in constructing uniform confidence bands for $g$. Confidence bands provide a simple graphical description of the extent to which a nonparametric estimator varies at design points, thereby quantifying uncertainties of the nonparametric estimator. However, construction of confidence bands tends to be challenging, especially for complex nonparametric models. [1] Indeed, despite the rich literature on consistent estimation of nonparametric EIV regression, the literature on pointwise or uniform confidence bands for nonparametric EIV regression is limited (see below for a literature review), likely because of its complexity. Even pointwise inference on $g$ under the assumption that the measurement error distribution is known is considered by experts to be difficult. [2] This is because, as discussed in Delaigle et al. (2015): 1) the asymptotic variance of the deconvolution kernel estimator of $g$ is non-trivial to estimate and so inference based on limiting distributions is difficult to implement; and 2) it is not straightforward to devise a way to implement bootstrap for inference on $g$ due to the unobservability of $X$ in data. With all these challenges recognized in the literature, the present paper attempts to solve an even more challenging problem of constructing uniform confidence bands for the regression function $g$ without assuming that the measurement error distribution is known.

To deal with the unknown measurement error distribution, we assume that, in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ from the distribution of $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution, where $m = m_n \to \infty$ as $n \to \infty$. (The auxiliary sample $\{\eta_1, \dots, \eta_m\}$ need not be independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$.) For example, in natural science, measurement errors are often due to measuring devices; in such cases, one can obtain preliminary calibration measures in the absence of signal, which produce a sample from the measurement error distribution; see the introduction of Comte and Lacour (2011), for example.
Other real-data scenarios of such additional data availability that are plausible in economics, social sciences, and biomedical sciences include: the case where validation data is available for data combination; and the case where repeated measurements (panel data) on $X$ with errors, one of which is symmetrically distributed, are available. These patterns of data requirements are often considered in the existing literature on measurement errors that we review below.

[1] We refer to Wasserman (2006) and Giné and Nickl (2016) as general references on confidence bands in nonparametric statistical models.

[2] Delaigle et al. (2015), who study pointwise confidence bands for nonparametric EIV regression under the assumption that the measurement error distribution is known, state that "despite their practical importance, to our knowledge confidence bands in nonparametric EIV regression have largely been ignored so far. We show that the problem is particularly complex, much more so than in the standard error-free setting." (Delaigle et al., 2015, p.149)

Under this setup, we develop a method to construct confidence bands for the regression function $g$. Our method builds on the deconvolution kernel estimation (Fan and Truong, 1993), and a
novel application of the multiplier (or wild) bootstrap method. Our construction of the multiplier process differs from the standard approach in the error-free case (cf. Neumann and Polzehl, 1998), and is tailored to EIV regression; see Remark 2.1 ahead. Building on non-trivial applications of the probabilistic techniques developed in Chernozhukov et al. (2014a,b, 2016), we establish asymptotic validity of the proposed confidence band, i.e., the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and EIV regression, we focus for a technical reason on the case where the measurement error density is ordinary smooth, i.e., the characteristic function of the measurement error distribution decays at most polynomially fast in the tail (cf. Fan, 1991a; Fan and Truong, 1993).

In addition to these contributions, we also propose a novel data-driven method to choose a bandwidth. In the theoretical study, we require the bandwidth to be chosen in such a way that it undersmoothes the deconvolution kernel estimate, so that the bias is negligible relative to the variance part. Existing data-driven methods for bandwidth selection typically aim at choosing a bandwidth minimizing the MISE, thereby yielding a non-undersmoothing bandwidth (cf. Delaigle and Hall, 2008). We propose an alternative method for bandwidth selection that aims at yielding an undersmoothing bandwidth. We conduct simulation studies to verify the finite sample performance of the proposed confidence band. The simulation studies show that the proposed confidence band, combined with the proposed bandwidth selection rule, works well. Applying our method to a combination of two data sets, the National Health and Nutrition Examination Survey (NHANES) and the Panel Survey of Income Dynamics (PSID), we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI.
Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

In order to locate the present paper in the context of the relevant literature, it is useful to first review measurement error models and deconvolution. We refer to books by Fuller (1987), Carroll et al. (2006), Meister (2009), and Horowitz (2009, Chapter 5) and surveys by Chen et al. (2011) and Schennach (2016) for general references. The genesis of this literature features the deconvolution kernel density estimation with known error distributions (Carroll and Hall, 1988; Stefanski and Carroll, 1990; Fan, 1991a,b), followed by that with unknown error distributions (Diggle and Hall, 1993; Horowitz and Markatou, 1996; Neumann, 1997; Efromovich, 1997; Li and Vuong, 1998; Delaigle et al., 2008; Johannes, 2009; Comte and Lacour, 2011). Diggle and Hall (1993), Neumann (1997), Efromovich (1997), Johannes (2009), and Comte and Lacour (2011) assume the availability of a sample from the measurement error distribution, while Horowitz and Markatou (1996) and Delaigle et al. (2008) assume repeated measurements (panel data) with symmetrically and
identically distributed errors. For repeated measurements (panel data) without symmetry of error distributions, Li and Vuong (1998) propose an alternative density estimator based on Kotlarski's lemma (cf. Kotlarski, 1967; Rao, 1992) that does not require a known error distribution; see also Bonhomme and Robin (2010) and Comte and Kappus (2015) for further developments. Methods to construct confidence bands in deconvolution are developed by Bissantz et al. (2007), Bissantz and Holzmann (2008), van Es and Gugushvili (2008), Lounici and Nickl (2011), and Schmidt-Hieber et al. (2013) for the case of known error distribution, and more recently by Kato and Sasaki (2016) for the case of unknown error distribution.

Similarly to the density estimation, the literature on nonparametric EIV regression estimation often takes the deconvolution kernel approach. Fan and Truong (1993) propose to substitute the deconvolution kernel in the Nadaraya-Watson estimator; see also Fan and Masry (1992) for pointwise asymptotic normality, Delaigle and Meister (2007) for extensions to heteroscedastic measurement errors, Delaigle et al. (2009) for local polynomial extensions, and Delaigle et al. (2015) for pointwise inference. These papers focus on the case of known error distribution. Delaigle et al. (2008) estimate the error characteristic function using repeated measurements on $X$ with symmetrically and identically distributed errors, and substitute the estimated error characteristic function into the deconvolution kernel. Schennach (2004) also works with cases with repeated measurements but without assuming symmetry of error distributions, and proposes an alternative approach to estimate the regression function based on Kotlarski's lemma. See also Carroll et al. (1999), Schennach et al. (2012), Schennach and Hu (2013), and Hu and Sasaki (2015).

Our method of inference is based on the deconvolution kernel estimation.
We mainly focus on (i) the case where a sample drawn from the error distribution is available; (ii) the case where validation data is available for data combination; and (iii) the case where repeated measurements with errors, one of which is symmetrically distributed, are available. For (ii), data combination with validation data, our model shares similarities with, albeit under different assumptions from, that of nonparametric instrumental variables (NPIV) regression, for which Horowitz and Lee (2012), Chen and Christensen (2015), and Babii (2016) develop methods to construct confidence bands as we do for nonparametric EIV regression.

We note the following two references as particularly relevant benchmarks for identifying our contributions. One reference is Schennach (2004), which derives pointwise asymptotic normality for a nonparametric EIV regression estimator different from ours, under unknown error distribution. Relative to this existing result, our contributions are four-fold. First, we provide a method of uniform inference as opposed to a pointwise one. Second, we propose a method of bandwidth selection for valid inference. Third, while the existing result leaves aside the issue of variance estimation and thus is not readily applicable in practice, we provide a bootstrap method for ease of practical implementation. Fourth, we devise lower-level assumptions which are easier to verify with concrete examples of distribution and conditional moment functions. The other reference
is Delaigle et al. (2015), which suggests a method of pointwise inference via bootstrap for nonparametric EIV regression with known error distribution. Relative to this existing result, our contributions are three-fold. First, our method allows for unknown error distribution. Second, we provide a method of uniform inference as opposed to a pointwise one. Third, we provide formal theories to support the asymptotic validity of our bootstrap method. Delaigle et al. (2015) mention how to modify their methodology to the case where the measurement error distribution is unknown, and to construction of uniform confidence bands. However, their theoretical results do not formally cover those cases.

Finally, Birke et al. (2010) and Proksch et al. (2015) obtain confidence bands for inverse regression with fixed equidistant designs (the fixed equidistant design assumption is substantial in their setups and analyses); the inverse regression is related to but different from our EIV regression (1.1), and our setup does not allow fixed equidistant designs because of measurement errors. The methodologies and the proof strategies are also different; for example, both of those papers rely on Gumbel approximations for validity of the confidence bands, which we do not. Importantly, to the best of our knowledge, none of the existing results covers uniform confidence bands for EIV regression (1.1), even under the simpler setting that the measurement error distribution is known. The present paper fills this important void.

The rest of the paper is organized as follows. In Section 2, we informally present our methodology to construct uniform confidence bands for $g$. In Section 3, we present asymptotic validity of the proposed confidence band under suitable regularity conditions. In Section 4, we propose a practical method to choose the bandwidth. In Section 5, we conduct simulation studies to verify the finite sample performance of the proposed confidence band. In Section 6, we apply the proposed method to a combination of two empirical data sets.
In Section 7, we discuss extensions of our results to specification testing of the conditional mean function, cases with additional regressors without measurement errors, and construction of confidence bands for the conditional distribution function. Section 8 concludes. All the proofs are deferred to the Appendix.

Notations. For a non-empty set $T$ and a (complex-valued) function $f$ on $T$, we use the notation $\|f\|_T = \sup_{t \in T} |f(t)|$. Let $\ell^\infty(T)$ denote the Banach space of all bounded real-valued functions on $T$ with norm $\|\cdot\|_T$. The Fourier transform of an integrable function $f$ on $\mathbb{R}$ is defined by
\[
\varphi_f(t) = \int_{\mathbb{R}} e^{itx} f(x)\,dx, \quad t \in \mathbb{R},
\]
where $i = \sqrt{-1}$ denotes the imaginary unit throughout the paper. We refer to Folland (1999) as a basic reference on Fourier analysis. For any positive sequences $a_n$ and $b_n$, we write $a_n \sim b_n$ if $a_n / b_n$ is bounded and bounded away from zero. For any $a, b \in \mathbb{R}$, let $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For $a \in \mathbb{R}$ and $b > 0$, we use the shorthand notation $[a \pm b] = [a - b, a + b]$. Let $\stackrel{d}{=}$ denote the equality in distribution.

2. Methodology

In this section, we informally present our methodology to construct confidence bands for $g$. The formal analysis of our confidence bands will be carried out in the next section. We will also discuss some examples of situations where an auxiliary sample from the measurement error distribution is available.

2.1. Deconvolution kernel estimation. We first introduce a deconvolution kernel method to estimate $f_X$ and $g$ under the assumption that the distribution of $\varepsilon$ is known. Let $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ be an independent sample from the distribution of $(Y, W)$. In this paper, we assume that the densities of $X$ and $\varepsilon$ exist and are denoted by $f_X$ and $f_\varepsilon$, respectively. Let $\varphi_W$, $\varphi_X$, and $\varphi_\varepsilon$ denote the characteristic functions of $W$, $X$, and $\varepsilon$, respectively. By the independence between $X$ and $\varepsilon$, the density of $W$ exists and is given by the convolution of the densities of $X$ and $\varepsilon$, namely,
\[
f_W(w) = (f_X * f_\varepsilon)(w) = \int_{\mathbb{R}} f_X(w - x) f_\varepsilon(x)\,dx, \quad w \in \mathbb{R},
\]
where $*$ denotes the convolution. This in turn implies that the characteristic function of $W$ is identical to the product of those of $X$ and $\varepsilon$, namely,
\[
\varphi_W(t) = \varphi_X(t)\varphi_\varepsilon(t), \quad t \in \mathbb{R}.
\]
Provided that $\varphi_\varepsilon$ is non-vanishing on $\mathbb{R}$ and $\varphi_X$ is integrable on $\mathbb{R}$ with respect to the Lebesgue measure (we hereafter omit "with respect to the Lebesgue measure"), the Fourier inversion formula yields that
\[
f_X(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi_X(t)\,dt = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_W(t)}{\varphi_\varepsilon(t)}\,dt, \quad x \in \mathbb{R}. \tag{2.1}
\]
The expression (2.1) leads to a method to estimate $f_X$. However, simply replacing $\varphi_W$ by the empirical characteristic function of $W$, namely,
\[
\hat{\varphi}_W(t) = \frac{1}{n} \sum_{j=1}^{n} e^{itW_j}, \quad t \in \mathbb{R},
\]
does not work. Specifically, the function $t \mapsto e^{-itx} \hat{\varphi}_W(t)/\varphi_\varepsilon(t)$ is not integrable on $\mathbb{R}$ because $|\varphi_\varepsilon(t)| \to 0$ as $|t| \to \infty$ by the Riemann-Lebesgue lemma, while $\hat{\varphi}_W$ is the characteristic function of a discrete distribution (i.e., the empirical distribution) and $\limsup_{|t| \to \infty} |\hat{\varphi}_W(t)| = 1$ (cf. Sato, 1999, Proposition 27.28). A standard approach to dealing with this problem is to use a kernel function to restrict the integral region in (2.1) to a compact interval.
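Let $K : \mathbb{R} \to \mathbb{R}$ be a kernel function such that $K$ is integrable on $\mathbb{R}$, $\int_{\mathbb{R}} K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$ (i.e., $\varphi_K(t) = 0$ for all $|t| > 1$). When $f_\varepsilon$ is known, the deconvolution kernel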
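density estimator of $f_X$ is given by
\[
\hat{f}_X^*(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \hat{\varphi}_W(t) \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)}\,dt,
\]
where $h_n$ denotes the bandwidth. This estimator was first considered by Carroll and Hall (1988) and Stefanski and Carroll (1990). Rates of convergence and pointwise asymptotic normality of $\hat{f}_X^*$ are studied in Fan (1991a,b), among others. Alternatively, by a change of variables, we may rewrite $\hat{f}_X^*$ as
\[
\hat{f}_X^*(x) = \frac{1}{n h_n} \sum_{j=1}^{n} K_n((x - W_j)/h_n), \tag{2.2}
\]
where the function $K_n$, called the deconvolution kernel, is defined by
\[
K_n(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\varphi_\varepsilon(t/h_n)}\,dt.
\]
Note that $K_n$ is real-valued since
\[
\overline{K_n(x)} = \frac{1}{2\pi} \int_{\mathbb{R}} e^{itx} \frac{\overline{\varphi_K(t)}}{\overline{\varphi_\varepsilon(t/h_n)}}\,dt = \frac{1}{2\pi} \int_{\mathbb{R}} e^{itx} \frac{\varphi_K(-t)}{\varphi_\varepsilon(-t/h_n)}\,dt = K_n(x),
\]
where $\overline{z}$ denotes the complex conjugate of a complex number $z$. The second expression (2.2) resembles a standard kernel density estimator without measurement errors.

Analogously, Fan and Truong (1993) propose to estimate the regression function $g(x)$ by $\hat{g}^*(x) = \hat{\mu}^*(x)/\hat{f}_X^*(x)$, where
\[
\hat{\mu}^*(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \left( \frac{1}{n} \sum_{j=1}^{n} Y_j e^{itW_j} \right) \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)}\,dt = \frac{1}{n h_n} \sum_{j=1}^{n} Y_j K_n((x - W_j)/h_n).
\]
To understand the rationale behind this estimator, observe that
\[
E[Y e^{itW}] = E[\{g(X) + U\} e^{it(X + \varepsilon)}] = E[g(X) e^{it(X + \varepsilon)}] = E[g(X) e^{itX}]\,\varphi_\varepsilon(t),
\]
and $E[g(X) e^{itX}]$ is the Fourier transform of $g f_X$, i.e., $E[g(X) e^{itX}] = \varphi_{g f_X}(t)$. Hence $\varphi_{g f_X}(t) = E[Y e^{itW}]/\varphi_\varepsilon(t)$, and provided that $\varphi_{g f_X}$ is integrable on $\mathbb{R}$, the Fourier inversion formula yields that
\[
g(x) f_X(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{E[Y e^{itW}]}{\varphi_\varepsilon(t)}\,dt. \tag{2.3}
\]
It is worth pointing out that estimation of $f_X$ and $g f_X$ corresponds to solving certain Fredholm integral equations of the first kind, and therefore estimation of $f_X$ and $g f_X$ (or $g$) is a statistical ill-posed inverse problem. In fact, $f_X$ and $g f_X$ satisfy $f_X * f_\varepsilon = f_W$ and $(g f_X) * f_\varepsilon = E[Y \mid W = \cdot\,] f_W$; these are Fredholm integral equations of the first kind where the right hand side functions are directly estimable. [3] Rates of convergence and pointwise asymptotic normality of $\hat{g}^*$ are studied by Fan and Truong (1993) and Fan and Masry (1992), among others.

The discussion so far has presumed that the distribution of $\varepsilon$ is known.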
However, in many applications, the distribution of $\varepsilon$ is unknown, and hence the estimators $\hat{f}_X^*$ and $\hat{g}^*$ are infeasible.

[3] See, for example, Chen (2007), Carrasco et al. (2007), Cavalier (2008), and Horowitz (2009) for overviews of statistical ill-posed inverse problems.
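The known-error estimators of this subsection can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the kernel with $\varphi_K(t) = (1 - t^2)^3$ on $[-1, 1]$, the Laplace error with scale `b`, the quadrature grid size `n_grid`, and all function names are assumptions made for the example.

```python
import numpy as np

def phi_K(t):
    """Fourier transform of the kernel K, supported on [-1, 1] (illustrative choice)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def phi_eps_laplace(t, b=0.3):
    """Characteristic function of a Laplace(0, b) error, assumed known here."""
    return 1.0 / (1.0 + (b * np.asarray(t, dtype=float)) ** 2)

def deconv_kernel(u, h, phi_eps, n_grid=401):
    """K_n(u) = (1/2pi) * int_{-1}^{1} exp(-i t u) phi_K(t) / phi_eps(t/h) dt."""
    t = np.linspace(-1.0, 1.0, n_grid)
    dt = t[1] - t[0]
    ratio = phi_K(t) / phi_eps(t / h)
    u = np.asarray(u, dtype=float)
    # phi_K vanishes at t = -1 and t = 1, so a plain Riemann sum
    # coincides with the trapezoid rule on this grid.
    integrand = np.exp(-1j * np.multiply.outer(u, t)) * ratio
    return (integrand.sum(axis=-1) * dt).real / (2.0 * np.pi)

def eiv_regression_known_error(x_grid, Y, W, h, phi_eps):
    """Return (g_star, f_star): g^* = mu^* / f_X^* and f_X^* evaluated on x_grid."""
    n = len(Y)
    u = (np.asarray(x_grid, dtype=float)[:, None] - W[None, :]) / h  # shape (G, n)
    Kn = deconv_kernel(u, h, phi_eps)
    f_star = Kn.sum(axis=1) / (n * h)                  # expression (2.2)
    mu_star = (Kn * Y[None, :]).sum(axis=1) / (n * h)  # numerator of g^*
    return mu_star / f_star, f_star
```

Because $\varphi_K(0)/\varphi_\varepsilon(0) = 1$, the deconvolution kernel computed this way should integrate to approximately one, which gives a quick sanity check on the quadrature.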

In the present paper, we assume that there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the distribution of $\varepsilon$:
\[
\eta_1, \dots, \eta_m \sim f_\varepsilon \ \text{i.i.d.},
\]
where $m = m_n \to \infty$ as $n \to \infty$. We do not assume that $\eta_1, \dots, \eta_m$ are independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$. In Section 2.3, we will discuss examples where such observations from the measurement error distribution are available. Given $\{\eta_1, \dots, \eta_m\}$, we may estimate $\varphi_\varepsilon$ by the empirical characteristic function, namely,
\[
\hat{\varphi}_\varepsilon(t) = \frac{1}{m} \sum_{j=1}^{m} e^{it\eta_j},
\]
and estimate the deconvolution kernel $K_n$ by the plug-in method:
\[
\hat{K}_n(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\hat{\varphi}_\varepsilon(t/h_n)}\,dt.
\]
Note that under the regularity conditions stated below, $\inf_{|t| \le h_n^{-1}} |\hat{\varphi}_\varepsilon(t)| > 0$ with probability approaching one, so that $\hat{K}_n$ is well-defined with probability approaching one. Note also that $\hat{K}_n$ is real-valued. Now, we estimate $g(x)$ by $\hat{g}(x) = \hat{\mu}(x)/\hat{f}_X(x)$, where
\[
\hat{\mu}(x) = \frac{1}{n h_n} \sum_{j=1}^{n} Y_j \hat{K}_n((x - W_j)/h_n) \quad \text{and} \quad \hat{f}_X(x) = \frac{1}{n h_n} \sum_{j=1}^{n} \hat{K}_n((x - W_j)/h_n).
\]
Density estimators of the form $\hat{f}_X$ are studied in Diggle and Hall (1993), Neumann (1997), and Efromovich (1997), among others, and nonparametric regression estimators of the form $\hat{g}$ are studied in Delaigle et al. (2008), among others.

2.2. Construction of confidence bands. We now describe our method to construct confidence bands for $g$ based on the estimator $\hat{g}$. Under the regularity conditions stated below, we will show that $\hat{g}(x) - g(x)$ can be approximated by
\[
\frac{1}{f_X(x)\, n h_n} \sum_{j=1}^{n} \left[ \{Y_j - g(x)\} K_n((x - W_j)/h_n) - A_n(x) \right]
\]
uniformly in $x \in I$, where $I$ is a compact interval in $\mathbb{R}$ on which $f_X$ is bounded away from zero, and $A_n(x) = E[\{Y - g(x)\} K_n((x - W)/h_n)]$. Let
\[
s_n^2(x) = \mathrm{Var}\left( \{Y - g(x)\} K_n((x - W)/h_n) \right),
\]
and consider the process
\[
Z_n(x) = \frac{1}{s_n(x)\sqrt{n}} \sum_{j=1}^{n} \left[ \{Y_j - g(x)\} K_n((x - W_j)/h_n) - A_n(x) \right], \quad x \in I,
\]
where $s_n(x) = \sqrt{s_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I} s_n(x) > 0$ for sufficiently large $n$, so that $Z_n$ is well-defined. Furthermore, we will show that there exists a
tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, and such that as $n \to \infty$,
\[
\sup_{z \in \mathbb{R}} \left| P\{\|Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\} \right| \to 0.
\]
Recall that $\|Z_n\|_I = \sup_{x \in I} |Z_n(x)|$. This in turn yields that
\[
\sup_{z \in \mathbb{R}} \left| P\{\|\hat{Z}_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\} \right| \to 0,
\]
where $\{\hat{Z}_n(x) : x \in I\}$ is a process defined by
\[
\hat{Z}_n(x) = \frac{f_X(x)\sqrt{n}\, h_n\, (\hat{g}(x) - g(x))}{s_n(x)}, \quad x \in I. \tag{2.4}
\]
Therefore, if we denote by
\[
c_n^G(1 - \tau) = (1 - \tau)\text{-quantile of } \|Z_n^G\|_I
\]
for $\tau \in (0, 1)$, then a band of the form
\[
\hat{\mathcal{C}}^*_{1-\tau}(x) = \left[ \hat{g}(x) \pm \frac{s_n(x)}{f_X(x)\sqrt{n}\, h_n}\, c_n^G(1 - \tau) \right], \quad x \in I,
\]
will contain $g(x)$, $x \in I$, with probability at least $1 - \tau + o(1)$ as $n \to \infty$. In fact, it holds that
\[
P\{ g(x) \in \hat{\mathcal{C}}^*_{1-\tau}(x) \ \forall x \in I \} = P\{ \|\hat{Z}_n\|_I \le c_n^G(1 - \tau) \} = P\{ \|Z_n^G\|_I \le c_n^G(1 - \tau) \} + o(1) \ge 1 - \tau + o(1).
\]
In practice, $f_X(x)$, $s_n^2(x)$, and $c_n^G(1 - \tau)$ are all unknown, and we have to estimate them. We estimate $f_X(x)$ and $s_n^2(x)$ by $\hat{f}_X(x)$ and
\[
\hat{s}_n^2(x) = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - \hat{g}(x)\}^2 \hat{K}_n^2((x - W_j)/h_n),
\]
respectively. Note that $(E[A_n(x)])^2$ is negligible relative to $s_n^2(x)$, so that we have ignored $(E[A_n(x)])^2$ in estimation of $s_n^2(x)$. Note also that $\sum_{j=1}^{n} (Y_j - \hat{g}(x)) \hat{K}_n((x - W_j)/h_n) = 0$.

Next, we estimate the quantile $c_n^G(1 - \tau)$ by the Gaussian multiplier bootstrap. Generate $\xi_1, \dots, \xi_n \sim N(0, 1)$ i.i.d., independently of the data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$, and consider the multiplier process
\[
\hat{Z}_n^\xi(x) = \frac{1}{\hat{s}_n(x)\sqrt{n}} \sum_{j=1}^{n} \xi_j \{Y_j - \hat{g}(x)\} \hat{K}_n((x - W_j)/h_n), \quad x \in I, \tag{2.5}
\]
where $\hat{s}_n(x) = \sqrt{\hat{s}_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I} \hat{s}_n(x) > 0$ with probability approaching one. Conditionally on the data $\mathcal{D}_n$, $\hat{Z}_n^\xi$ is a Gaussian process with mean zero and covariance function (presumably) close to that of $Z_n$. Indeed, for
\[
f_{n,x}(y, w) = \{y - g(x)\} K_n((x - w)/h_n)/s_n(x) \quad \text{and} \quad \hat{f}_{n,x}(y, w) = \{y - \hat{g}(x)\} \hat{K}_n((x - w)/h_n)/\hat{s}_n(x),
\]
the covariance function of $\hat{Z}_n^\xi$ conditionally on $\mathcal{D}_n$ is
\[
E[\hat{Z}_n^\xi(x) \hat{Z}_n^\xi(x') \mid \mathcal{D}_n] = \frac{1}{n} \sum_{j=1}^{n} \hat{f}_{n,x}(Y_j, W_j) \hat{f}_{n,x'}(Y_j, W_j)
\]
for $x, x' \in I$, which estimates the covariance function of $Z_n^G$ given by
\[
E[Z_n^G(x) Z_n^G(x')] = E[f_{n,x}(Y, W) f_{n,x'}(Y, W)] - E[f_{n,x}(Y, W)]\,E[f_{n,x'}(Y, W)]
\]
for $x, x' \in I$. Hence, we estimate $c_n^G(1 - \tau)$ by
\[
\hat{c}_n(1 - \tau) = \text{conditional } (1 - \tau)\text{-quantile of } \|\hat{Z}_n^\xi\|_I \text{ given } \mathcal{D}_n,
\]
which can be computed via simulations. Now, the resulting confidence band is defined by
\[
\hat{\mathcal{C}}_{1-\tau}(x) = \left[ \hat{g}(x) \pm \frac{\hat{s}_n(x)}{\hat{f}_X(x)\sqrt{n}\, h_n}\, \hat{c}_n(1 - \tau) \right], \quad x \in I. \tag{2.6}
\]
Note that, except for the choice of the bandwidth, this confidence band is completely data-driven. We will discuss practical choice of the bandwidth in Section 4.

Remark 2.1 (Novelty of our construction of the multiplier process). In the error-free case, namely when we can observe $(Y_1, X_1), \dots, (Y_n, X_n)$, the deviation of a standard kernel regression estimator $\check{g}$ with kernel $K$ from the true regression function $g$ is uniformly approximated as $\{f_X(x)\, n h_n\}^{-1} \sum_{j=1}^{n} U_j K((x - X_j)/h_n)$ under suitable regularity conditions. So, to construct confidence bands for $g$ via the multiplier bootstrap method, one would construct a multiplier stochastic process of the form
\[
x \mapsto \frac{1}{\sigma_n(x)\sqrt{n}} \sum_{j=1}^{n} \xi_j U_j K((x - X_j)/h_n) \quad \text{with} \quad \sigma_n(x) = \sqrt{\mathrm{Var}(U K((x - X)/h_n))}, \tag{2.7}
\]
and then compute the conditional $(1 - \tau)$-quantile of the supremum in absolute value of the multiplier process. In practice, we replace $U_j$ and $\sigma_n(x)$ by suitable estimators; for example, a natural estimator of $U_j$ would be $\hat{U}_j = Y_j - \check{g}(X_j)$. See, for example, Neumann and Polzehl (1998); see also Section 4.3 in Chernozhukov et al. (2013) for applications of the multiplier bootstrap method to a different but related problem of inference in intersection bound models using kernel methods.
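The feasible band (2.6), built from the plug-in kernel $\hat{K}_n$ and the multiplier process (2.5), can be sketched on a finite grid of evaluation points as follows. This is a hedged illustration, not the authors' code: the kernel choice $\varphi_K(t) = (1 - t^2)^3$, the replacement of the supremum over $I$ by a maximum over `x_grid`, the use of $|\hat{f}_X|$ in the half-width, the number of bootstrap draws `n_boot`, and all function names are assumptions of this example, and the bandwidth `h` is taken as given.

```python
import numpy as np

def phi_K(t):
    """Fourier transform of the kernel K, supported on [-1, 1] (illustrative choice)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def estimated_deconv_kernel(u, h, eta, n_grid=401):
    """Plug-in kernel: (1/2pi) * int exp(-i t u) phi_K(t) / phi_eps_hat(t/h) dt."""
    t = np.linspace(-1.0, 1.0, n_grid)
    dt = t[1] - t[0]
    # Empirical characteristic function of the auxiliary error sample at t/h.
    # Assumes it stays away from zero on |t| <= 1/h, as the text requires.
    phi_eps_hat = np.exp(1j * np.outer(t / h, eta)).mean(axis=1)
    ratio = phi_K(t) / phi_eps_hat
    integrand = np.exp(-1j * np.multiply.outer(np.asarray(u, dtype=float), t)) * ratio
    return (integrand.sum(axis=-1) * dt).real / (2.0 * np.pi)

def uniform_band(x_grid, Y, W, eta, h, tau=0.05, n_boot=1000, seed=0):
    """Return (g_hat, lower, upper, c_hat) for the band (2.6) on x_grid."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    u = (np.asarray(x_grid, dtype=float)[:, None] - W[None, :]) / h  # shape (G, n)
    K_hat = estimated_deconv_kernel(u, h, eta)
    f_hat = K_hat.sum(axis=1) / (n * h)
    g_hat = (K_hat * Y[None, :]).sum(axis=1) / (n * h) / f_hat
    resid = (Y[None, :] - g_hat[:, None]) * K_hat    # {Y_j - g_hat(x)} K_hat_n(...)
    s_hat = np.sqrt((resid ** 2).mean(axis=1))
    xi = rng.standard_normal((n_boot, n))            # Gaussian multipliers
    Z_xi = (xi @ resid.T) / (s_hat * np.sqrt(n))     # multiplier process (2.5)
    c_hat = np.quantile(np.abs(Z_xi).max(axis=1), 1.0 - tau)
    # abs() only guards against a negative density estimate at some grid point.
    half = s_hat * c_hat / (np.abs(f_hat) * np.sqrt(n) * h)
    return g_hat, g_hat - half, g_hat + half, c_hat
```

Since each coordinate of the multiplier process has conditional variance one by construction of $\hat{s}_n$, the simulated critical value should be no smaller than roughly the corresponding pointwise normal quantile, and it grows as the grid covers more of $I$.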
In the measurement error case, given a cosmetic similarity between the deconvolution kernel estimation and the error-free kernel estimation, one might be tempted to modify the multiplier process (2.7) by just replacing $\sigma_n(x)$ and $K((x - X_j)/h_n)$ with $\sqrt{\mathrm{Var}(U K_n((x - W)/h_n))}$ and $K_n((x - W_j)/h_n)$, respectively, but this will not result in a valid confidence band even if $U_1, \dots, U_n$ were assumed to be known. The reason is that, in contrast to the error-free case, approximation to $\hat{g}(x) - g(x)$ by $\{f_X(x)\, n h_n\}^{-1} \sum_{j=1}^{n} U_j K_n((x - W_j)/h_n)$ is incorrect, which highlights one
distinctive feature of nonparametric EIV regression. Hence, in the present paper, we develop a novel construction of the multiplier process (2.5) tailored to nonparametric EIV regression.

2.3. Examples. In this section, we present a couple of examples where an auxiliary sample from the measurement error distribution is available.

Example 2.1 (Repeated measurements or panel data, Carroll et al. (2006), p.298). Suppose that we observe repeated measurements or panel data on $X$ with measurement errors:
\[
W^{(1)} = X + \varepsilon^{(1)}, \qquad W^{(2)} = X + \varepsilon^{(2)},
\]
where $X$ and $(\varepsilon^{(1)}, \varepsilon^{(2)})$ are independent, and the conditional distribution of $\varepsilon^{(2)}$ given $\varepsilon^{(1)}$ is symmetric. The distribution of $\varepsilon^{(1)}$ need not be symmetric (in particular, the distributions of $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ may be different), and independence between $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ is not necessary. If we define $W = (W^{(1)} + W^{(2)})/2$, $\varepsilon = (\varepsilon^{(1)} + \varepsilon^{(2)})/2$, and $\eta = (W^{(1)} - W^{(2)})/2 = (\varepsilon^{(1)} - \varepsilon^{(2)})/2$, then we have that
\[
W = X + \varepsilon, \qquad \varepsilon \stackrel{d}{=} \eta,
\]
where $\eta$ is observable.

For this panel data setup, Schennach (2004) proposes an alternative estimator of $g$ based on Kotlarski's lemma which does not require the symmetry assumption. The form of Schennach's estimator is more complex than ours, and to the best of our knowledge, there is no existing result on asymptotically valid uniform confidence bands for Schennach's estimator. It is worth noting that while Schennach's approach can drop the symmetry assumption, it requires another technical assumption that the characteristic function $\varphi_X(t) = E[e^{itX}]$ of $X$ does not vanish on the entire real line. Both Schennach (2004) and we (and in fact most papers on deconvolution and EIV regression) assume that the characteristic functions of the error variables do not vanish on $\mathbb{R}$, but our approach does allow $\varphi_X$ to take zeros.
The assumption that $\varphi_X$ does not vanish on $\mathbb{R}$ is not innocuous; it is non-trivial to find densities that are compactly supported and have non-vanishing characteristic functions (though these properties are not mutually exclusive; see, e.g., Schennach (2016), Footnote 4), and the assumption excludes densities convolved with distributions whose characteristic functions take zeros, and so on. [4] So, we believe that Schennach's approach and ours are complementary to each other.

[4] For example, convolutions of $k$ uniform densities on $[a, b]$ are piecewise polynomials with degrees $k - 1$, and convex combinations of such piecewise polynomials form a rich family of densities, but their characteristic functions take zeros.

[5] We thank Tatsushi Oka for pointing out this example.

Example 2.2 (Data combination [5]). Suppose that we have access to data on $(Y, W)$ and $(W, X)$, separately, but do not have access to data on $(Y, X)$. This case is often faced by empirical
researchers, and various techniques are proposed to combine the two separate samples; see a survey by Ridder and Moffitt (2007). To fix ideas, consider the demand model $Y = g(X) + U$, where $Y$ denotes the quantity purchased of a product and $X$ denotes the logarithm of its price. Marketing scientists and economists often use Nielsen Homescan data for quantities and prices to analyze this demand model, but the home-scanned prices in this data are subject to imputation errors $\varepsilon = W - X$. To overcome this issue, Einav et al. (2010) collect data on $(W, X)$ from a large grocery retailer by matching transaction prices $X$ that were recorded by the retailer (at the store) to the prices $W$ recorded by the Homescan panelists. Together with Nielsen Homescan data on $(Y, W)$, Einav et al. suggest combining the two separate data sets to analyze the demand model. Specifically, we can construct a sample $\{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ from the two separate data sets on $(Y, W)$ and $(W, X)$.

In the literature, validation data are used as a way to relax the classical measurement error assumption that $X$ and $\varepsilon$ are independent; see, for example, Chen et al. (2005). While they allow for non-classical measurement errors, Chen et al. (2005) focus on the case where the parameter of interest is finite dimensional.

It is worth noting that, when validation data on $(X, W)$ are available, the problem of estimation of $g$ can be considered as a nonparametric instrumental variable (NPIV) problem treating $X$ as an endogenous variable and $W$ as an instrumental variable (see, for example, Newey and Powell, 2003; Hall and Horowitz, 2005; Blundell et al., 2007; Chen and Reiss, 2011; Horowitz, 2011, for NPIV models). In fact, observe that $E[Y \mid W] = E[g(X) \mid W]$. For NPIV models, Horowitz and Lee (2012) and the more recent paper by Chen and Christensen (2015) develop methods to construct confidence bands for the structural function using series methods, although these papers do not formally consider cases where samples on $(Y, W)$ and $(X, W)$ are different. [6]

[6] Babii (2016) also develops methods to construct confidence bands for Tikhonov regularized estimators in NPIV models, but his confidence bands are asymptotically conservative in the sense that the coverage probabilities are in general strictly larger than the nominal level even asymptotically.

However, we would like to point out that there are differences in underlying assumptions between series estimation of NPIV models and deconvolution kernel estimation in EIV regression. For example, in series estimation of NPIV models, it is often assumed that the distribution of $W$ is compactly supported and the density of $W$ is bounded away from zero on its support (cf. Blundell et al., 2007; Chen and Christensen, 2015). On the other hand, in EIV regression, it is commonly assumed that the characteristic function of the measurement error $\varepsilon$ is non-vanishing on $\mathbb{R}$ (which leads to identification of the function $g$ via (2.3)), and in many cases the measurement error $\varepsilon$ then has unbounded support, which in turn implies that $W$ has unbounded support. Further, while both NPIV and EIV regressions are statistical ill-posed inverse problems, the ways in which the ill-posedness is defined are different; in series estimation of NPIV models, the ill-posedness is defined for given basis functions, while in EIV regression, the ill-posedness is defined via how
fast the characteristic function of the measurement error distribution decays. Hence we believe that our inference results cover different situations than those developed in the NPIV literature.

3. Main results

In this section, we study asymptotic validity of the proposed confidence band (2.6). To this end, we make the following assumption. For any given constants $\beta, B > 0$, let $\Sigma(\beta, B)$ denote a class of functions defined by
\[
\Sigma(\beta, B) = \left\{ f : \mathbb{R} \to \mathbb{R} : f \text{ is } k\text{-times differentiable}, \ |f^{(k)}(x) - f^{(k)}(y)| \le B |x - y|^{\beta - k}, \ \forall x, y \in \mathbb{R} \right\},
\]
where $k$ is the integer such that $k < \beta \le k + 1$, and $f^{(k)}$ denotes the $k$-th derivative of $f$ ($f^{(0)} = f$). Let $I$ be a compact interval in $\mathbb{R}$.

Assumption 3.1. We assume the following conditions.
(i) $E[Y^4] < \infty$, the function $w \mapsto E[Y^2 \mid W = w] f_W(w)$ is bounded and continuous, and for each $l = 1, 2$, the function $w \mapsto E[|Y|^{2+l} \mid W = w] f_W(w)$ is bounded.
(ii) The functions $\varphi_X(t) = E[e^{itX}]$ and $\psi_X(t) = E[g(X) e^{itX}]$ for $t \in \mathbb{R}$ are integrable on $\mathbb{R}$.
(iii) The measurement error $\varepsilon$ has finite mean, $E[|\varepsilon|] < \infty$, and its characteristic function, $\varphi_\varepsilon(t) = E[e^{it\varepsilon}]$, $t \in \mathbb{R}$, does not vanish on $\mathbb{R}$. Furthermore, there exist constants $C_1 > 1$ and $\alpha > 0$ such that
\[
C_1^{-1} |t|^{-\alpha} \le |\varphi_\varepsilon(t)| \le C_1 |t|^{-\alpha}, \quad |\varphi_\varepsilon'(t)| \le C_1 |t|^{-\alpha - 1}, \quad \forall |t| \ge 1.
\]
(iv) The functions $f_X$ and $g f_X$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Let $K$ be a real-valued integrable function (kernel) on $\mathbb{R}$, not necessarily non-negative, such that $\int_{\mathbb{R}} K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$. Furthermore, $\varphi_K$ is $(k + 3)$-times continuously differentiable with $\varphi_K^{(l)}(0) = 0$ for $l = 1, \dots, k$.
(vi) For all $x \in I$, $f_X(x) > 0$ and $E[\{Y - g(x)\}^2 \mid W = x] f_W(x) > 0$.
(vii) As $n \to \infty$,
\[
\frac{(\log(1/h_n))^2}{(n \wedge m)\, h_n^{2\alpha + 2}} \to 0, \quad \frac{n h_n \log(1/h_n)}{m} \to 0, \quad \text{and} \quad h_n^{\alpha + \beta} \sqrt{n h_n \log(1/h_n)} \to 0. \tag{3.1}
\]

Condition (i) is a moment condition on $Y$, which we believe is not restrictive.
Note that, for each $l = 0, 1, 2$, if $E[|Y|^{2+l} \mid X, \varepsilon] = E[|Y|^{2+l} \mid X]$, then by comparing the Fourier transforms of both sides, we arrive at the identity $E[|Y|^{2+l} \mid W = w] f_W(w) = ((\Upsilon_l f_X) * f_\varepsilon)(w)$, where $\Upsilon_l(x) = E[|Y|^{2+l} \mid X = x]$, and the right-hand side is bounded and continuous if $\Upsilon_l f_X$ is bounded (which allows $\Upsilon_l$ to be unbounded globally).

For Condition (ii), we first note that $\psi_X$ is the Fourier transform of $g f_X$ (which is integrable by $E[|Y|] < \infty$). Condition (ii) implies that $f_X$ and $g f_X$ are (continuous and) bounded, which in turn implies that $f_W(w) = \int_{\mathbb{R}} f_X(w - x) f_\varepsilon(x)\,dx$ is bounded and continuous. Condition (ii) is satisfied if, for example, $f_X$ and $g f_X$ are twice continuously differentiable with integrable derivatives up to the second order; in fact, under such conditions, $\varphi_X(t) = o(|t|^{-2})$ and $\psi_X(t) = o(|t|^{-2})$ as $|t| \to \infty$. However, differentiability of $f_X$ and $g f_X$ is not strictly necessary for Condition (ii) to hold; for example, a Laplace density is not differentiable but its Fourier transform is integrable.

Condition (iii) is concerned with the characteristic function of the measurement error. Note that finiteness of the first moment of $\varepsilon$ ensures that $\varphi_\varepsilon$ is continuously differentiable. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), we assume that the measurement error density is ordinary smooth, namely, $|\varphi_\varepsilon(t)|$ decays at most polynomially fast as $|t| \to \infty$ (cf. Fan, 1991a). Informally, the smoother $f_\varepsilon$ is, the faster $|\varphi_\varepsilon(t)|$ decays as $|t| \to \infty$, so Condition (iii) restricts smoothness of $f_\varepsilon$. Laplace and Gamma distributions, together with their convolutions, (suitable) mixtures, and symmetrizations⁷, are typical examples of distributions satisfying Condition (iii), but normal and Cauchy distributions do not satisfy Condition (iii). Normal and Cauchy densities are examples of super-smooth densities, i.e., their characteristic functions decay exponentially fast as $|t| \to \infty$.⁸

Condition (iv) is concerned with smoothness of the functions $f_X$ and $g$. Condition (v) is about a kernel function. By changes of variables, Condition (v) ensures that $\int_{\mathbb{R}} |x|^{k+1} |K(x)|\,dx < \infty$ and $\int_{\mathbb{R}} x^l K(x)\,dx = i^{-l} \varphi_K^{(l)}(0) = 0$ for $l = 1, \dots, k$, that is, $K$ is a $(k + 1)$-th order kernel (but we allow for the possibility that $\int_{\mathbb{R}} x^{k+1} K(x)\,dx = 0$).⁹ Condition (vi) ensures that $\inf_{x \in I} f_X(x) > 0$ (since $f_X$ is continuous) and $\inf_{x \in I} E[\{Y - g(x)\}^2 \mid W = x] f_W(x) > 0$ (see the proof of Lemma A.4-(ii)). Note that since $g f_X$ is bounded, we have that $\|g\|_I \le \|g f_X\|_I / \inf_{x \in I} f_X(x) < \infty$.
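To illustrate the distinction drawn in Condition (iii), the following minimal sketch evaluates $|\varphi_\varepsilon(t)|\,|t|^{\alpha}$ on a grid for a Laplace and a normal characteristic function. The helper names (`laplace_cf`, `normal_cf`, `decay_ratio`), the grid, and the choice $\alpha = 2$ are illustrative assumptions rather than part of the paper; the Laplace scale $2^{-1/2}$ is chosen so that the error has unit variance.

```python
import numpy as np


def laplace_cf(t, scale):
    """Characteristic function of a centered Laplace law with scale `scale`."""
    return 1.0 / (1.0 + (scale * t) ** 2)


def normal_cf(t, sd):
    """Characteristic function of a centered normal law with standard deviation `sd`."""
    return np.exp(-0.5 * (sd * t) ** 2)


def decay_ratio(cf_values, t, alpha):
    """|phi(t)| * |t|**alpha; Condition (iii) asks this to stay within [1/C_1, C_1] for |t| >= 1."""
    return np.abs(cf_values) * np.abs(t) ** alpha


# Illustrative grid of frequencies with |t| >= 1 (an assumption of this sketch).
t = np.linspace(1.0, 200.0, 2000)
laplace_ratio = decay_ratio(laplace_cf(t, scale=2 ** -0.5), t, alpha=2.0)
normal_ratio = decay_ratio(normal_cf(t, sd=1.0), t, alpha=2.0)

print("Laplace ratio range:", laplace_ratio.min(), laplace_ratio.max())
print("Normal ratio at t = 200:", normal_ratio[-1])
```

On this grid the Laplace ratio stays between two positive constants, consistent with ordinary smoothness of order 2, whereas the normal ratio collapses to zero, which is the super-smooth behavior excluded by Condition (iii).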
It is worth mentioning that under these conditions, we have that $s_n^2(x) = \mathrm{Var}(\{Y - g(x)\} K_n((x - W)/h_n)) \sim h_n^{-2\alpha + 1}$ uniformly in $x \in I$ (see Lemma A.4), and the right-hand side is larger by factor $h_n^{-2\alpha}$ than the corresponding term in the error-free case (recall that in standard kernel regression without measurement errors, the variance of $U K((x - X)/h_n)$ is $\sim h_n$). This results in slower rates of convergence of kernel regression estimators in presence of measurement errors than those in the error-free case, and the value of $\alpha$ is a key parameter that controls the difficulty of estimating $g$, namely, the larger the value of $\alpha$ is, the more difficult estimation of $g$ will be. In other words, the value of $\alpha$ quantifies the degree of ill-posedness of estimation of $g$.

⁷ Recall that if a random variable $\eta$ has characteristic function $\varphi_\eta$, then $\eta - \eta'$ for an independent copy $\eta'$ of $\eta$ has characteristic function $|\varphi_\eta|^2$.
⁸ Convolutions of ordinary smooth and super-smooth densities are super-smooth, but mixtures of ordinary smooth and super-smooth densities are ordinary smooth.
⁹ In the simulation studies, we will use a flat-top kernel (McMurry and Politis, 2004), which is an infinite order kernel.

Condition (vii) restricts the bandwidth $h_n$ and the sample size $m$ from the measurement error distribution. The second condition in (3.1) allows $m$ to be of smaller order than $n$, which in particular covers the panel data setup discussed in Example 2.1. The last condition in (3.1) means that we are choosing undersmoothing bandwidths, that is, choosing bandwidths that are of smaller order than optimal rates for estimation of $g$. Inspection of the proof of Theorem 3.1 shows that without the last condition in (3.1), we have that
\[
\|\hat g - g\|_I = O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\} + O(h_n^{\beta}),
\]
where the $O(h_n^{\beta})$ term comes from the deterministic bias. So, choosing $h_n \sim (n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ optimizes the rate on the right-hand side, and the resulting rate of convergence of $\|\hat g - g\|_I$ is $O_P\{(n/\log n)^{-\beta/(2\alpha + 2\beta + 1)}\}$. The last condition in (3.1) requires to choose $h_n$ of smaller order than $(n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ (by log factors), so that the variance term dominates the bias term. We will later discuss the problem of bias after presenting the theorems (see Remark 3.3). For Condition (vii) to be non-void, we require $\beta > 1/2$.

We first state a theorem that establishes that, under Assumption 3.1, the distribution of $\|\hat Z_n\|_I = \sup_{x \in I} |\hat Z_n(x)|$, where $\{\hat Z_n(x) : x \in I\}$ is defined in (2.4), can be approximated by that of the supremum of a certain Gaussian process, which is a building block for proving validity of the proposed confidence band. Recall that a Gaussian process $\{Z(x) : x \in I\}$ indexed by $I$ is a tight random variable in $\ell^\infty(I)$ if and only if $I$ is totally bounded for the intrinsic pseudo-metric $\rho_2(x, y) = \sqrt{E[\{Z(x) - Z(y)\}^2]}$ for $x, y \in I$, and $Z$ has sample paths almost surely uniformly $\rho_2$-continuous; see van der Vaart and Wellner (1996, p.41).

Theorem 3.1 (Gaussian approximation).
Under Assumption 3.1, for each sufficiently large $n$, there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, and such that as $n \to \infty$,
\[
\sup_{z \in \mathbb{R}} \left| P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\} \right| \to 0. \tag{3.2}
\]

Theorem 3.1 derives an intermediate Gaussian approximation to the process $\hat Z_n$, in the sense that the approximating Gaussian process $Z_n^G$ depends on the sample size $n$. It could be possible to further show that, if $I$ is not a singleton, under additional conditions, for some sequences $a_n > 0$ and $b_n \in \mathbb{R}$, $a_n(\|Z_n^G\|_I - b_n)$ converges in distribution to a Gumbel distribution. However, while it is mathematically intriguing, we avoid using the Gumbel approximation, since 1) the Gumbel approximation is slow and the coverage error of the resulting confidence band is of order $1/\log n$ (see Hall, 1991), and 2) deriving the Gumbel approximation would require additional restrictive conditions on the measurement error distribution. For example, in a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) derive a Gumbel approximation to the supremum deviation of the deconvolution kernel density estimator, thereby establishing a Smirnov-Bickel-Rosenblatt type theorem (Smirnov, 1950; Bickel and Rosenblatt, 1973) for the deconvolution kernel density estimator. But to do so, they require more restrictive conditions on the measurement error distribution than those in the present paper (see their Assumption 2).

The following theorem shows asymptotic validity of the proposed confidence band.

Theorem 3.2 (Validity of multiplier bootstrap confidence band). Under Assumption 3.1, as $n \to \infty$,
\[
\sup_{z \in \mathbb{R}} \left| P\{\|\hat Z_n^\xi\|_I \le z \mid \mathcal{D}_n\} - P\{\|Z_n^G\|_I \le z\} \right| \stackrel{P}{\to} 0, \tag{3.3}
\]
where $Z_n^G$ is a Gaussian random variable in $\ell^\infty(I)$ given in Theorem 3.1. Therefore, for the confidence band $\hat{\mathcal{C}}_{1-\tau}$ defined in (2.6), we have as $n \to \infty$,
\[
P\{g(x) \in \hat{\mathcal{C}}_{1-\tau}(x)\ \forall x \in I\} = 1 - \tau + o(1). \tag{3.4}
\]
Finally, the supremum width of the band $\hat{\mathcal{C}}_{1-\tau}$ is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\}$.

Remark 3.1. Inspection of the proof shows that the result (3.4) holds even when $\tau = \tau_n \to 0$ as $n \to \infty$. Furthermore, the supremum width of the band is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n) \vee \log(1/\tau_n)}\}$.

Remark 3.2. If we take $h_n = v_n (n/\log n)^{-1/(2\alpha + 2\beta + 1)}$ for $v_n \sim (\log n)^{-1}$, then the supremum width of the band $\hat{\mathcal{C}}_{1-\tau}$ is of order $(n/\log n)^{-\beta/(2\alpha + 2\beta + 1)} (\log n)^{\alpha + 1/2}$.

Remark 3.3 (Bias). For any nonparametric inference problem, how to deal with the deterministic bias is a delicate and difficult problem. See Section 5.7 in Wasserman (2006) for related discussions. In the present paper, we employ undersmoothing bandwidths so that the bias is negligible relative to the variance part. An alternative approach is to estimate the bias at each point, and construct a bias-corrected confidence band. See, for example, Eubank and Speckman (1993) and Xia (1998) for the error-free case.¹⁰ However, in EIV regression, estimation of the bias is not quite attractive for a couple of reasons. First, the bias consists of higher order derivatives of $g$ and $f_X$, and estimation of these higher order derivatives is difficult, especially in the EIV case.
This is because estimation of $g$ and $f_X$ is an ill-posed inverse problem and rates of convergence of the derivative estimators of $g$ and $f_X$ are even slower than those in the error-free case. Second, one of the popular kernels used in EIV regression and deconvolution is a flat-top kernel (McMurry and Politis, 2004), which is an infinite order kernel, and if we use a flat-top kernel, then the bias is not calculated in a closed form.¹¹ See Remark 1 in Bissantz et al. (2007) for a related issue in the deconvolution case.

¹⁰ More recent discussions regarding the problem of bias in nonparametric inference problems include Hall and Horowitz (2013), Chernozhukov et al. (2014b), Armstrong and Kolesár (2014), Calonico et al. (2015), and Schennach (2015). These papers do not cover EIV regression.

Remark 3.4 (Super-smooth case). In the present paper, we focus on the case where the measurement error density is ordinary smooth, similarly to Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and nonparametric EIV regression. If the measurement error density is super-smooth, i.e., its characteristic function decays exponentially fast as $|t| \to \infty$, then 1) in view of the pointwise asymptotic normality result in Fan and Masry (1992), the asymptotic behavior of the variance function $s_n^2(x)$ is much more complex; 2) minimax rates of convergence for estimation of $g$ under the sup-norm loss are logarithmically slow (i.e., of the form $(\log n)^{-c}$ for some constant $c > 0$), even when the measurement error distribution is assumed to be known (Fan and Truong, 1993). These difficulties prevent us from directly extending our analysis to the super-smooth case. Hence the super-smooth case is left for future research.

The proofs of Theorems 3.1 and 3.2 build on non-trivial applications of the intermediate Gaussian and multiplier bootstrap approximation theorems developed in Chernozhukov et al. (2014a,b, 2016). However, we stress that Theorems 3.1 and 3.2 do not follow directly from the general theorems in Chernozhukov et al. (2014a,b, 2016) and require substantial work. This is because 1) first of all, how to devise a multiplier bootstrap in EIV regression is not apparent, and as discussed in Remark 2.1 our construction of the multiplier process appears to be novel; 2) the population deconvolution kernel $K_n$ is implicitly defined via the Fourier inversion and substantially different from standard kernels in the error-free case; and 3) the deconvolution kernel $K_n$ is in fact unknown and estimated, so that its estimation error has to be taken into account. An alternative standard technique to derive Gaussian approximations similar to (3.2) is to apply the Komlós-Major-Tusnády (KMT) strong approximation (Komlós et al., 1975).
In a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) (and Schmidt-Hieber et al. (2013)) use the KMT approximation to derive Gaussian approximations to the deconvolution kernel density estimator. However, the KMT approximation is tailored to empirical processes indexed by univariate functions and hence is not applicable to our problem. Alternatively, we can use Rio's coupling (see Rio, 1994), but to apply Rio's coupling, we would have to assume (at least) that $Y$ is bounded (rather than finite fourth moment) and $K_n$ has total variation of order $h_n^{-\alpha}$ (which requires additional conditions on the measurement error distribution). By employing the techniques developed in Chernozhukov et al. (2014a,b, 2016), we are able to avoid such restrictive conditions.

¹¹ For example, Schennach (2004) and Bissantz et al. (2007) use flat-top kernels in their simulation studies.
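As a worked illustration of the rate calculation behind the undersmoothing requirement in (3.1), the following sketch balances the stochastic and bias terms of $\|\hat g - g\|_I$. It treats $\log(1/h_n)$ as being of the same order as $\log n$, which is an assumption of this sketch (it holds for polynomially decaying bandwidths), and the numerical values $\alpha = \beta = 2$ are purely illustrative.

```latex
% Balance the stochastic term against the deterministic bias:
\[
h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)} \asymp h_n^{\beta}
\iff
h_n^{2\alpha+2\beta+1} \asymp \frac{\log(1/h_n)}{n}
\iff
h_n \asymp \Big(\frac{n}{\log n}\Big)^{-1/(2\alpha+2\beta+1)} .
\]
% Plugging this bandwidth into the bias term gives the rate:
\[
h_n^{\beta} \asymp \Big(\frac{n}{\log n}\Big)^{-\beta/(2\alpha+2\beta+1)} .
\]
% Illustrative values (assumed): alpha = 2, beta = 2 give
\[
h_n \asymp \Big(\frac{n}{\log n}\Big)^{-1/9},
\qquad
\|\hat g - g\|_I = O_P\Big\{\Big(\frac{n}{\log n}\Big)^{-2/9}\Big\}.
\]
```

An undersmoothing bandwidth is then any choice that is smaller than this balancing bandwidth by a slowly diverging factor, so that the bias term becomes negligible relative to the stochastic term.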

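4. Bandwidth selection

The theory developed in the previous section prescribes admissible rates for the bandwidth $h_n$ that require undersmoothing. The literature provides data-driven approaches to bandwidth selection, which typically aim at minimizing the MISE (cf. Delaigle and Hall, 2008). These data-driven approaches tend to yield non-undersmoothing rates for the bandwidth, and are contrary to our requirements. In this light, we propose here a novel alternative approach to the bandwidth selection. To emphasize the dependence on an arbitrary candidate bandwidth $h > 0$, write
\[
s_n^2(x; h) = \mathrm{Var}(\{Y - g(x)\} K_n((x - W)/h; h)),
\]
\[
A_n(x; h) = E[\{Y - g(x)\} K_n((x - W)/h; h)], \quad \text{and}
\]
\[
K_n(x; h) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\varphi_\varepsilon(t/h)}\,dt.
\]
Note that $A_n(x) = A_n(x; h_n)$, $s_n^2(x) = s_n^2(x; h_n)$, and $K_n(x) = K_n(x; h_n)$. An optimal choice $h_n^*$ (ignoring the log factor) balances the uniform squared bias $\|A_n^2(\cdot; h)\|_I$ and the uniform variance $\|s_n^2(\cdot; h)/n\|_I$, i.e.,
\[
\frac{\partial}{\partial h} \|A_n^2(\cdot; h)\|_I \Big|_{h = h_n^*} = -\frac{\partial}{\partial h} \|s_n^2(\cdot; h)/n\|_I \Big|_{h = h_n^*}.
\]
A natural way of undersmoothing is to choose the smallest $h > 0$ such that
\[
c_n \frac{\partial}{\partial h} \|A_n^2(\cdot; h)\|_I \ge -\frac{\partial}{\partial h} \|s_n^2(\cdot; h)/n\|_I
\]
for some $c_n > 1$ where $c_n$ is increasing in $n$. We will try alternative sequences $\{c_n\}_{n=1}^\infty$ in the subsequent simulation studies to recommend practical choices.

In practice, we do not know $g$ or the distribution of $(Y, X)$. For $g$ to be used for bandwidth selection, we use a polynomial regression $\tilde g$ under EIV, e.g., $\tilde g(x) = \tilde g_0 + \tilde g_1 x$ where
\[
\begin{pmatrix} \tilde g_0 \\ \tilde g_1 \end{pmatrix}
=
\begin{pmatrix}
1 & \frac{1}{n} \sum_{j=1}^n W_j \\
\frac{1}{n} \sum_{j=1}^n W_j & \frac{1}{n} \sum_{j=1}^n W_j^2 - \frac{1}{m} \sum_{j=1}^m \eta_j^2
\end{pmatrix}^{-1}
\begin{pmatrix}
\frac{1}{n} \sum_{j=1}^n Y_j \\
\frac{1}{n} \sum_{j=1}^n W_j Y_j
\end{pmatrix}.
\]
A polynomial of degree three will be employed throughout in the simulation studies. We make a grid $0 < h_{n,1} < \dots < h_{n,J}$ of candidate bandwidths, and then choose $h_{n,j}$ with the smallest $j \in \{2, \dots, J\}$ such that
\[
c_n \left( \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \right) \ge -\left( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \right),
\]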

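where
\[
\hat s_n^2(x; h) = \frac{1}{n} \sum_{j=1}^n \{Y_j - \tilde g(x)\}^2 \hat K_n^2((x - W_j)/h; h) - \hat A_n^2(x; h),
\]
\[
\hat A_n(x; h) = \frac{1}{n} \sum_{j=1}^n \{Y_j - \tilde g(x)\} \hat K_n((x - W_j)/h; h), \quad \text{and}
\]
\[
\hat K_n(x; h) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h)}\,dt.
\]
Because we use the finite sample estimates, neither $\|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I$ nor $-(\|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I)$ needs to be monotone in the index $j$ in general. As such, we monotonize these differences of the estimates in the following manner. Let
\[
\hat A_{n,j} = \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I
\quad \text{and} \quad
\hat s_{n,j}^2 = -\left( \|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I \right).
\]
The monotonization algorithm executes the following assignments in the increasing order of $j$:
\[
\hat A_{n,j+1} :=
\begin{cases}
\hat A_{n,j} & \text{if } \hat A_{n,j} > \hat A_{n,j+1} \\
\hat A_{n,j+1} & \text{if } \hat A_{n,j} \le \hat A_{n,j+1}
\end{cases}
\quad \text{and} \quad
\hat s_{n,j+1}^2 :=
\begin{cases}
\hat s_{n,j}^2 & \text{if } \hat s_{n,j}^2 < \hat s_{n,j+1}^2 \\
\hat s_{n,j+1}^2 & \text{if } \hat s_{n,j}^2 \ge \hat s_{n,j+1}^2.
\end{cases}
\]

Remark 4.1. The above guide to bandwidth selection applies to the case of $\alpha > 1/2$. We could accommodate the case of $\alpha \le 1/2$ if we modify this method by replacing $s_n^2(x; h)$, $A_n(x; h)$, $\hat s_n^2(x; h)$ and $\hat A_n(x; h)$ by $s_n^2(x; h)/h^2$, $A_n(x; h)/h$, $\hat s_n^2(x; h)/h^2$ and $\hat A_n(x; h)/h$, respectively. We implemented simulation studies under both of these two alternative methods of bandwidth selection, and found that the method described above shows superior performance in terms of the distance between nominal and simulated coverage probabilities for the data generating models that we consider. Therefore, we only suggest the method which we describe above, and present simulation studies below only for this version of the bandwidth selection rule.

5. Simulation studies

5.1. Simulation Framework. We consider two data generating models, reflecting two common patterns of data availability. For the first model, the data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^n$ is constructed by

Model 1:
\[
\begin{cases}
Y_j = g(X_j) + U_j, & X_j \sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1) \\
W_j = X_j + \varepsilon_j, & \varepsilon_j \stackrel{d}{=} \eta_j \sim \text{Laplace}(0, 2^{-1/2})
\end{cases}
\]
for $j = 1, \dots, n$, where the primitive latent variables, $X_j$, $U_j$, $\varepsilon_j$, and $\eta_j$ are mutually independent. The characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/2)^{-1}$, which is non-vanishing on $\mathbb{R}$ and ordinary smooth of order $\alpha = 2$. The signal-to-noise ratio is $\sqrt{\mathrm{Var}(X)}/\sqrt{\mathrm{Var}(\varepsilon)} = \sigma_X$.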
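The data generating process of Model 1 can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name `simulate_model1`, the seed, and the sample size used in the example call are assumptions, and the auxiliary sample is taken to have size $m = n$.

```python
import numpy as np


def simulate_model1(n, sigma_x, g, rng):
    """Draw (Y_j, W_j, eta_j), j = 1..n, from Model 1 with mutually independent primitives."""
    x = rng.normal(0.0, sigma_x, size=n)        # latent predictor X_j ~ N(0, sigma_x^2)
    u = rng.normal(0.0, 1.0, size=n)            # regression error U_j ~ N(0, 1)
    eps = rng.laplace(0.0, 2 ** -0.5, size=n)   # measurement error, Laplace scale 2^{-1/2} (unit variance)
    eta = rng.laplace(0.0, 2 ** -0.5, size=n)   # independent draw from the same error law
    y = g(x) + u
    w = x + eps                                  # only W_j, not X_j, is observed
    return y, w, eta


# Example call with an assumed seed and sample size.
rng = np.random.default_rng(0)
y, w, eta = simulate_model1(1000, sigma_x=2.0, g=np.sin, rng=rng)
```

With the Laplace scale $2^{-1/2}$ the measurement error has unit variance, so the ratio of standard deviations of $X$ and $\varepsilon$ equals $\sigma_X$.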

For the second model, we consider the following repeated measurement or panel data setup.

Model 2:
\[
\begin{cases}
Y_j = g(X_j) + U_j, & X_j \sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1) \\
W_j^{(1)} = X_j + \varepsilon_j^{(1)}, & \varepsilon_j^{(1)} \sim \text{Laplace}(0, 2^{-1}) \\
W_j^{(2)} = X_j + \varepsilon_j^{(2)}, & \varepsilon_j^{(2)} \sim \text{Laplace}(0, 2^{-1})
\end{cases}
\]
for $j = 1, \dots, n$, where the primitive latent variables, $X_j$, $U_j$, $\varepsilon_j^{(1)}$, and $\varepsilon_j^{(2)}$ are mutually independent. We observe $\{(Y_j, W_j^{(1)}, W_j^{(2)})\}_{j=1}^n$. By defining $W_j := (W_j^{(1)} + W_j^{(2)})/2$ and $\eta_j := (W_j^{(1)} - W_j^{(2)})/2$, we obtain the generated data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^n$ such that $W_j = X_j + \varepsilon_j$ with $\varepsilon_j = (\varepsilon_j^{(1)} + \varepsilon_j^{(2)})/2 \stackrel{d}{=} \eta_j$. For Model 2, the characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/16)^{-2}$, which is non-vanishing on $\mathbb{R}$ and ordinary smooth with order $\alpha = 4$. The signal-to-noise ratio is given by $\sqrt{\mathrm{Var}(X)}/\sqrt{\mathrm{Var}(\varepsilon)} = 2\sigma_X$.

Simulations are run across five different specifications of $g$, and alternative values of the signal-to-noise ratio $\sigma_X \in \{2, 4\}$. The five specifications of $g$ are $g(x) = x$, $g(x) = x^2$, $g(x) = x^3$, $g(x) = \sin(x)$, and $g(x) = \cos(x)$. We use Monte Carlo simulations to evaluate the coverage probabilities of our confidence bands for $g$ on the interval $I = [-\sigma_X, \sigma_X]$. We use the kernel function $K$ defined by its Fourier transform $\varphi_K$ given by
\[
\varphi_K(t) =
\begin{cases}
1 & \text{if } |t| \le c \\
\exp\left\{ \dfrac{-b \exp(-b/(|t| - c)^2)}{(|t| - 1)^2} \right\} & \text{if } c < |t| < 1 \\
0 & \text{if } 1 \le |t|
\end{cases}
\]
where $b = 1$ and $c = 0.05$ (cf. McMurry and Politis, 2004; Bissantz et al., 2007). The function $\varphi_K$ is infinitely differentiable with support $[-1, 1]$, and its inverse Fourier transform $K$ is real-valued and integrable with $\int_{\mathbb{R}} K(x)\,dx = 1$. We follow the bandwidth selection rule discussed in Section 4. In this simulation study, we try alternative sequences $\{c_n\}_{n=1}^\infty$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$.

5.2. Simulation Results. Tables 1, 2, 3, 4, and 5 show simulation results for $g(x) = x$, $x^2$, $x^3$, $\sin(x)$, and $\cos(x)$, respectively. Each table contains results for each of Model 1 and Model 2, for each of the three sample sizes $n = 250$, $500$, and $1000$, and for each of $\sigma_X = 2.0$ and $4.0$ that controls the signal-to-noise ratio.
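For readers who wish to reproduce the kernel used in Section 5.1, the following sketch evaluates the flat-top Fourier transform $\varphi_K$ with $b = 1$, $c = 0.05$ and approximates a deconvolution kernel $K_n(x; h) = (2\pi)^{-1} \int e^{-itx} \varphi_K(t)/\varphi_\varepsilon(t/h)\,dt$ by a trapezoid rule on $[-1, 1]$. The function names, the grid size, and the use of a known characteristic function `cf_eps` (instead of the estimated one used in the paper) are assumptions of this sketch.

```python
import numpy as np


def flat_top_ft(t, b=1.0, c=0.05):
    """Fourier transform phi_K of the flat-top kernel: 1 on [-c, c], 0 outside (-1, 1)."""
    a = np.abs(np.atleast_1d(np.asarray(t, dtype=float)))
    out = np.zeros_like(a)
    out[a <= c] = 1.0
    mid = (a > c) & (a < 1.0)
    am = a[mid]
    out[mid] = np.exp(-b * np.exp(-b / (am - c) ** 2) / (am - 1.0) ** 2)
    return out


def deconvolution_kernel(x, h, cf_eps, n_grid=2001):
    """Trapezoid-rule approximation of K_n(x; h) for a given error characteristic function."""
    t = np.linspace(-1.0, 1.0, n_grid)          # phi_K vanishes outside [-1, 1]
    dt = t[1] - t[0]
    weight = flat_top_ft(t) / cf_eps(t / h)     # phi_K(t) / phi_eps(t / h)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    vals = np.exp(-1j * np.outer(x, t)) * weight
    integral = dt * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))
    return integral.real / (2.0 * np.pi)        # K_n is real-valued, so keep the real part


# Model 1 error law: phi_eps(t) = (1 + t^2 / 2)^{-1}.
laplace_cf = lambda s: 1.0 / (1.0 + s ** 2 / 2.0)
k_vals = deconvolution_kernel(np.array([-1.0, 0.0, 1.0]), h=0.5, cf_eps=laplace_cf)
```

Dividing by $\varphi_\varepsilon(t/h)$ inflates the high-frequency part of $\varphi_K$, which is the source of the $h_n^{-\alpha}$ growth of the deconvolution kernel discussed in Section 3.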
Simulated coverage probabilities are reported for each of the three nominal coverage probabilities, 0.800, 0.900, and 0.950. In all the cases, simulated coverage probabilities are reasonably close to the designed nominal coverage probabilities for large sample sizes. In particular, the results for polynomial specifications exhibit a very high coverage accuracy. The high performance for the polynomial specifications may well be attributed to our method of bandwidth selection, which relies on a preliminary polynomial regression under EIV. However, it is notable that the coverage accuracy is reasonably high even for non-polynomial periodic functions like $g(x) = \sin(x)$ and $g(x) = \cos(x)$.
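The grid-based bandwidth rule of Section 4, including the monotonization step, can be sketched as follows. The sketch assumes that the sup-norms $\|\hat A_n^2(\cdot; h_{n,j})\|_I$ and $\|\hat s_n^2(\cdot; h_{n,j})/n\|_I$ have already been computed on the grid, and that the variance differences are taken with a minus sign so that both sides of the comparison are positive when $\alpha > 1/2$. The function names and the fallback to the largest grid point when no candidate qualifies are assumptions of this sketch, not part of the paper.

```python
import numpy as np


def c_n(n, power=0.3):
    """Undersmoothing constant c_n = (n / 100) ** power."""
    return (n / 100.0) ** power


def select_bandwidth(h_grid, sq_bias_norm, var_norm, c):
    """Smallest grid bandwidth at which c times the bias increment reaches the variance decrement.

    sq_bias_norm[j] and var_norm[j] are the sup-norms of the squared bias estimate and of
    the variance estimate divided by n, evaluated at h_grid[j] (increasing grid).
    """
    h_grid = np.asarray(h_grid, dtype=float)
    d_bias = np.diff(np.asarray(sq_bias_norm, dtype=float))   # bias increments between grid points
    d_var = -np.diff(np.asarray(var_norm, dtype=float))       # variance decrements (sign flipped)
    d_bias = np.maximum.accumulate(d_bias)                    # monotonize: running maximum
    d_var = np.minimum.accumulate(d_var)                      # monotonize: running minimum
    hits = np.nonzero(c * d_bias >= d_var)[0]
    if hits.size == 0:
        return h_grid[-1]                                     # assumed fallback: largest candidate
    return h_grid[hits[0] + 1]                                # difference i compares grid points i and i + 1


# Stylized inputs (assumed): squared bias grows like h^4, variance decays like h^{-3} / n.
h = np.linspace(0.1, 1.0, 10)
h_selected = select_bandwidth(h, h ** 4, h ** -3 / 1000.0, c=c_n(1000))
```

A larger constant $c$ makes the inequality hold earlier on the grid and therefore selects a smaller bandwidth, which is how the rule enforces undersmoothing.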

There seems to be no systematic pattern as to which of the alternative sequences $\{c_n\}_{n=1}^\infty$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$ tends to yield better coverage results. As such, we recommend the intermediate choice $c_n = (n/100)^{0.3}$ as a practical guideline.

6. Real data analysis

According to the Centers for Disease Control and Prevention (CDC) of the US Department of Health and Human Services, more than one-third (36.5%) of US adults have obesity (defined by body mass index or BMI > 30) in the period between 2011 and 2014 (Ogden et al., 2015). The estimated annual medical cost of obesity in the United States was 147 billion 2008 U.S. dollars, with the medical costs for people who are obese being $1,429 higher than those of normal weight (Finkelstein et al., 2009). While there is an extensive body of literature on cost estimation of obesity, it is a limitation that commonly used data sets contain only self-reported body measures, and hence the values of BMI generated from them are prone to biases (Bound et al., 2001). More recently, Cawley and Meyerhoefer (2012) use the instrumental variable approach to address this issue in cost estimation of obesity. In this section, we employ our data combination approach to treat the self-reporting errors, and draw confidence bands for nonparametric regressions of medical costs on BMI. We focus on costs measured by medical expenditures. With this said, we note that there are also indirect costs of obesity which we do not account for, e.g., the costs of obesity are known to be passed on to obese workers with employer-sponsored health insurance in the form of lower cash wages and labor market discrimination against obese job seekers by insurance-providing employers (Bhattacharya and Bundorf, 2009); see also Cawley (2004).

Details of the two data sets which we combine are as follows. The National Health and Nutrition Examination Survey (NHANES) of CDC contains data of survey responses, medical examination results, and laboratory test results. The survey responses include demographic characteristics, such as gender and age.
In addition to the demographic characteristics, the survey responses also contain self-reported body measures and self-reported health conditions. Among the self-reported body measures are height in inches and weight in pounds. These two variables allow us to construct the BMI in lbs/in² as a generated variable. We convert this unit into the metric unit (kg/m²). The NHANES also contains medical examination results, including clinically measured BMI in kg/m². We treat the BMI constructed from the self-reported body measures as $W_j$, and the clinically measured BMI as $X_j$. From the NHANES as a validation data set of size $m$, we can compute $\eta_j = W_j - X_j$ for each $j = 1, \dots, m$.

The Panel Study of Income Dynamics (PSID) is a longitudinal panel survey of American families conducted by the Survey Research Center at the University of Michigan. This data set contains a long list of variables including demographic characteristics, socio-economic attributes, expenses, and health conditions, among others. In particular, the PSID contains self-reported body measures of the household head, including height in inches and weight in pounds. These two variables allow us to construct the body mass index (BMI) in lbs/in² as a generated variable. Again, we convert this unit into the metric unit (kg/m²). The PSID also contains medical and prescription expenses. We treat the BMI constructed from the self-reported body measures as $W_j$, and the medical and prescription expenses as $Y_j$. We note that the information contained in the PSID is mostly at the household level, as opposed to the individual level, and thus $Y_j$ indicates the total medical and prescription expenses of household $j$. To focus on the individual medical and prescription expenses rather than household expenses, we only consider the subsample of the households of single men with no dependent family, for which the total medical and prescription expenses of the household equal the individual medical and prescription expenses of the household head. Hence, the reported regression results concern these selected subpopulations.

Combining the NHANES of size $m$ and the PSID of size $n$, we obtain the generated data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ to which we can apply our method in order to draw confidence bands for the regression function $g$ of the model $Y = g(X) + U$ with $E[U \mid X, \varepsilon] = 0$. We set $I = [15, 35]$ as the interval on which we draw confidence bands. This interval $I$ has 25 (the WHO cut-off point for overweight) as the midpoint, and is contained in the convex hull of the empirical support of $W$. The kernel function and the bandwidth rule carry over from our simulation studies. The sequence $\{c_n\}_{n=1}^\infty$ used for bandwidth choice is defined by $c_n = (n/100)^{0.3}$ following the recommendation which we made from our simulation results. To account for the different medical conditions across ages, we categorize the sample into the following subsamples: (a) male individuals aged 20-34, (b) male individuals aged 35-49, (c) male individuals aged 50-64, and (d) male individuals aged 65 or above.
Note that this stratification takes into account the fact that 64 and 65 make the cutoff of Medicare eligibility, and hence that group (d) faces different expenditure schedules and different economic incentives of health care utilization from groups (a)-(c); see Card et al. (2008). After deleting observations with missing fields from the NHANES, we obtain the following sample sizes of these four subsamples: (a) $m = 407$, (b) $m = 435$, (c) $m = 407$, and (d) $m = 431$. After deleting observations with missing fields from the PSID 2009 for total medical expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 413$, (b) $n = 181$, (c) $n = 180$, and (d) $n = 64$. Similarly, after deleting observations with missing fields from the PSID 2009 for prescription expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 528$, (b) $n = 243$, (c) $n = 247$, and (d) $n = 106$. Note that we use similar survey periods around 2009 for both the NHANES and PSID to remove potential time effects.

Figure 1 displays estimates and confidence bands for total medical expenses in 2009 US dollars as the dependent variable. Figure 2 similarly displays estimates and confidence bands for prescription expenses in 2009 US dollars as the dependent variable. In both figures, the estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands. The four parts of the figure represent (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. We see that the levels of both total medical expenses and prescription expenses tend to increase in age, as expected. For the groups (a)-(b) of young men, both total medical expenses and prescription expenses exhibit little partial correlation with BMI. For the group (c) of middle aged men, on the other hand, the relations turn into positive ones. For the group (d) of senior men, total medical expenses and BMI continue to have a positive relationship, but prescription expenses exhibit little partial correlation with BMI. If we look at the 90% confidence band for the group (c) of men aged from 50 to 64, annual average total medical expenses are approximately $5,399-$17,015 if BMI = 20, approximately $7,316-$18,119 if BMI = 25, and approximately $7,868-$21,934 if BMI = 30. Likewise, annual average prescription expenses are approximately $283-$636 if BMI = 20, approximately $372-$761 if BMI = 25, and approximately $429-$951 if BMI = 30. These concrete numbers illustrate that confidence bands are useful to make interval predictions of incurred average costs, and this convenient feature adds practical value to the existing methods, which only allow for reporting estimates with unknown extents of uncertainty.

7. Extensions

7.1. Application to specification testing. The results of the present paper can be used for specification testing of the regression function $g$. Specification testing in EIV models is important since nonparametric estimation of a regression function has slow rates of convergence, even slower than standard error-free nonparametric regression, while correct specification of a parametric model enables us to estimate the regression function with faster rates, often of order $1/\sqrt{n}$.
Suppose that we want to test whether the regression function $g$ belongs to a parametric class $\{g_\theta : \theta \in \Theta\}$ where $\Theta$ is a subset of a metric space (in most cases a Euclidean space). Popular specifications of $g$ include linear and polynomial functions. In cases where $g$ is linear or polynomial, it is possible to estimate the coefficients with $\sqrt{n}$-rate under suitable regularity conditions (Fuller, 1987; Chan and Mak, 1985; Hausman et al., 1991; Cheng and Schneeweiss, 1998). Suppose now that $g = g_\theta$ for some $\theta \in \Theta$ and $\theta$ can be estimated by $\hat\theta$ with a sufficiently fast rate, i.e., $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha} \{n h_n \log(1/h_n)\}^{-1/2}\}$, and that Assumption 3.1 is satisfied with $g = g_\theta$. Then it is not difficult to see from the proof of Theorem 3.2 that
\[
\begin{aligned}
\frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_{\hat\theta}(x))}{\hat s_n(x)}
&= \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_\theta(x))}{\hat s_n(x)} + \frac{\hat f_X(x) \sqrt{n} h_n (g_\theta(x) - g_{\hat\theta}(x))}{\hat s_n(x)} \\
&= \frac{\hat f_X(x) \sqrt{n} h_n (\hat g(x) - g_\theta(x))}{\hat s_n(x)} + o_P\{(\log(1/h_n))^{-1/2}\},
\end{aligned}
\]
uniformly in $x \in I$, so that
\[
P\{g_{\hat\theta}(x) \notin \hat{\mathcal{C}}_{1-\tau}(x) \text{ for some } x \in I\} \to \tau.
\]
Therefore, the test that rejects the hypothesis that $g = g_\theta$ for some $\theta \in \Theta$ if $g_{\hat\theta}(x) \notin \hat{\mathcal{C}}_{1-\tau}(x)$ for some $x \in I$ is asymptotically of level $\tau$. We summarize the above discussion as a corollary.

Corollary 7.1. Suppose that $g = g_\theta$ for some $\theta \in \Theta$ where $\Theta$ is a subset of a metric space, and that Assumption 3.1 is satisfied with $g = g_\theta$. Let $\hat\theta$ be any estimator of $\theta$ such that $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha} \{n h_n \log(1/h_n)\}^{-1/2}\}$; then $P\{g_{\hat\theta}(x) \notin \hat{\mathcal{C}}_{1-\tau}(x) \text{ for some } x \in I\} \to \tau$.

Remark 7.1 (Literature on specification testing in EIV regression). The literature on specification testing for EIV regression is large. See Zhu et al. (2003), Zhu and Cui (2005), Hall and Ma (2007), Song (2008), Otsu and Taylor (2016), and references therein. However, none of those papers considers $L^\infty$-based specification tests.

7.2. Additional regressors without measurement errors. In practical applications, we may have additional regressors $Z$, possibly vector valued, without measurement errors. Suppose that we are interested in estimation and making inference on $g(x, z) = E[Y \mid X = x, Z = z]$. We assume that $E[Y - g(X, Z) \mid X, Z, \varepsilon] = 0$, and $\varepsilon$ is independent from $X$ conditionally on $Z$. In principle, the analysis can be reduced to the case where there are no additional regressors by conditioning on $Z = z$. If $Z$ is discretely distributed with finitely many mass points, then $g(x, z)$, where $z$ is a mass point, can be estimated by using only observations $j$ for which $Z_j = z$. If $Z$ is continuously distributed, then $g(x, z)$ can be estimated by using observations $j$ for which $Z_j$ is close to $z$, which can be implemented by using kernel weights. However, the detailed analysis of this case is not presented here for brevity.

7.3. Confidence bands for conditional distribution functions. The techniques used to derive confidence bands for the conditional mean in EIV regression can be extended to the conditional distribution function.
Suppose now that we are interested in constructing confidence bands for the conditional distribution function $g(y, x) = P(Y \le y \mid X = x)$ on a compact rectangle $J \times I$ where $J$ and $I$ are compact intervals, and where we do not observe $X$ but instead observe $W = X + \varepsilon$ with $\varepsilon$ (measurement error) being independent of $(Y, X)$. As before, we assume that in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ on $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution. Since $g(y, x) = E[1(Y \le y) \mid X = x]$ where $1(\cdot)$ denotes the indicator function, we may estimate $g(y, x)$ by $\hat g(y, x) = \hat\mu(y, x)/\hat f_X(x)$, where
\[
\hat\mu(y, x) = \frac{1}{n h_n} \sum_{j=1}^n 1(Y_j \le y) \hat K_n((x - W_j)/h_n).
\]
To construct a confidence band for $g(y, x)$, we apply the methodology developed in Section 2 with $Y_j$ replaced by $1(Y_j \le y)$ for each $y$. Let
\[
\hat s_n^2(y, x) = \frac{1}{n} \sum_{j=1}^n \{1(Y_j \le y) - \hat g(y, x)\}^2 \hat K_n^2((x - W_j)/h_n),
\]
and generate independent standard normal random variables $\xi_1, \dots, \xi_n$ independent of the data $\mathcal{D}_n$. Consider the multiplier stochastic process
\[
\hat Z_n^\xi(y, x) = \frac{1}{\hat s_n(y, x) \sqrt{n}} \sum_{j=1}^n \xi_j \{1(Y_j \le y) - \hat g(y, x)\} \hat K_n((x - W_j)/h_n),
\]
and for $\tau \in (0, 1)$, let
\[
\hat c_n(1 - \tau) = \text{conditional } (1 - \tau)\text{-quantile of } \|\hat Z_n^\xi\|_{J \times I} \text{ given } \mathcal{D}_n.
\]
Then the resulting confidence band for $g(y, x)$ on $J \times I$ is given by
\[
\hat{\mathcal{C}}_{1-\tau}(y, x) = \left[ \hat g(y, x) \pm \frac{\hat s_n(y, x)}{\hat f_X(x) \sqrt{n} h_n} \hat c_n(1 - \tau) \right], \quad (y, x) \in J \times I.
\]
We make the following assumption, which is analogous to Assumption 3.1.

Assumption 7.1. Let $I, J$ be compact intervals in $\mathbb{R}$.
(i) The function $(y, w) \mapsto P(Y \le y \mid W = w) f_W(w)$ is continuous in $w$ uniformly in $y \in J$.
(ii) The characteristic function of $X$, $\varphi_X(t) = E[e^{itX}]$, $t \in \mathbb{R}$, is integrable on $\mathbb{R}$. Furthermore, $\sup_{y \in J} \int_{\mathbb{R}} |E[g(y, X) e^{itX}]|\,dt < \infty$.
(iii) Condition (iii) in Assumption 3.1.
(iv) The functions $f_X$ and $g(y, \cdot) f_X(\cdot)$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$ for all $y \in J$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Condition (v) in Assumption 3.1.
(vi) For all $x \in I$, $f_X(x) > 0$, and $\inf_{(y, x) \in J \times I} E[\{1(Y \le y) - g(y, x)\}^2 \mid W = x] f_W(x) > 0$.
(vii) Condition (vii) in Assumption 3.1.

Theorem 7.1. Under Assumption 7.1, as $n \to \infty$, $P\{g(y, x) \in \hat{\mathcal{C}}_{1-\tau}(y, x)\ \forall (y, x) \in J \times I\} \to 1 - \tau$. Furthermore, the supremum width of the band $\hat{\mathcal{C}}_{1-\tau}$ is $O_P\{h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\}$.

Remark 7.2. To the best of our knowledge, Theorem 7.1 is also a new result.

8. Conclusion

In this paper, we develop a method to construct uniform confidence bands for the nonparametric EIV regression function $g$. We consider the practically relevant case where the distribution of the measurement error is unknown. We assume that there is an independent sample from the measurement error distribution, where the sample from the measurement error distribution need not be independent from the sample on response and predictor variables. Such a sample from the measurement error distribution is available if there is, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed.
We establish asymptotic validity of the proposed confidence band for ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric EIV regression. We also propose a practical method to choose an undersmoothing bandwidth for valid inference. Simulation studies verify the finite sample performance of the proposed confidence band. Finally, we discuss extensions of our results to specification testing, cases with additional regressors without measurement errors, and confidence bands for conditional distribution functions.

Appendix A. Proofs

A.1. Technical tools. In this section, we collect technical tools that will be used in the proofs of Theorems 3.1 and 3.2. The proofs rely on modern empirical process theory. For a probability measure $Q$ on a measurable space $(S, \mathcal{S})$ and a class of measurable functions $\mathcal{F}$ on $S$ such that $\mathcal{F} \subset L^2(Q)$, let $N(\mathcal{F}, \|\cdot\|_{Q,2}, \delta)$ denote the $\delta$-covering number for $\mathcal{F}$ with respect to the $L^2(Q)$-seminorm $\|\cdot\|_{Q,2}$. The class $\mathcal{F}$ is said to be pointwise measurable if there exists a countable subclass $\mathcal{G} \subset \mathcal{F}$ such that for every $f \in \mathcal{F}$ there exists a sequence $g_m \in \mathcal{G}$ with $g_m \to f$ pointwise. A function $F : S \to [0, \infty)$ is said to be an envelope for $\mathcal{F}$ if $F(x) \ge \sup_{f \in \mathcal{F}} |f(x)|$ for all $x \in S$. See Section 2.1 in van der Vaart and Wellner (1996) for details.

Lemma A.1 (A useful maximal inequality). Let $X, X_1, \dots, X_n$ be i.i.d. random variables taking values in a measurable space $(S, \mathcal{S})$, and let $\mathcal{F}$ be a pointwise measurable class of (measurable) real-valued functions on $S$ with measurable envelope $F$. Suppose that there exist constants $A \ge e$ and $V \ge 1$ such that
\[
\sup_Q N(\mathcal{F}, \|\cdot\|_{Q,2}, \delta \|F\|_{Q,2}) \le (A/\delta)^V, \quad 0 < \delta \le 1,
\]
where $\sup_Q$ is taken over all finitely discrete distributions on $S$. Furthermore, suppose that $0 < E[F^2(X)] < \infty$, and let $\sigma^2 > 0$ be any positive constant such that $\sup_{f \in \mathcal{F}} E[f^2(X)] \le \sigma^2 \le E[F^2(X)]$. Define $B = \sqrt{E[\max_{1 \le j \le n} F^2(X_j)]}$. Then
\[
E\left[ \left\| \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \{ f(X_j) - E[f(X)] \} \right\|_{\mathcal{F}} \right] \le C \left[ \sqrt{ V \sigma^2 \log\left( \frac{A \sqrt{E[F^2(X)]}}{\sigma} \right) } + \frac{V B}{\sqrt{n}} \log\left( \frac{A \sqrt{E[F^2(X)]}}{\sigma} \right) \right],
\]
where $C > 0$ is a universal constant.

Proof. See Corollary 5.1 in Chernozhukov et al. (2014a).

Lemma A.2 (An auxiliary maximal inequality). Let $\zeta_1, \dots, \zeta_n$ be random variables such that $E[|\zeta_j|^r] < \infty$ for all $j = 1, \dots, n$ for some $r \ge 1$. Then
\[
E\left[ \max_{1 \le j \le n} |\zeta_j| \right] \le n^{1/r} \max_{1 \le j \le n} (E[|\zeta_j|^r])^{1/r}.
\]

Proof. This inequality is well known, and follows from Jensen's inequality. Indeed,
\[
E\left[\max_{1 \le j \le n} |\zeta_j|\right] \le \left( E\left[\max_{1 \le j \le n} |\zeta_j|^r\right] \right)^{1/r} \le \left( \sum_{j=1}^{n} E[|\zeta_j|^r] \right)^{1/r} \le n^{1/r} \max_{1 \le j \le n} (E[|\zeta_j|^r])^{1/r}.
\]

The following anti-concentration inequality for the supremum of a Gaussian process will play a crucial role in the proofs of Theorems 3.1 and 3.2.

Lemma A.3 (Anti-concentration for the supremum of a Gaussian process). Let $T$ be a nonempty set, and let $X = (X_t : t \in T)$ be a tight Gaussian random variable in $\ell^\infty(T)$ with mean zero and $E[X_t^2] = 1$ for all $t \in T$. Then for any $h > 0$,
\[
\sup_{x \in \mathbb{R}} P\{ | \|X\|_T - x | \le h \} \le 4h (1 + E[\|X\|_T]).
\]

Proof. See Corollary 2.1 in Chernozhukov et al. (2014b); see also Theorem 3 in Chernozhukov et al. (2015).

A.2. Proof of Theorem 3.1. In what follows, we always assume Assumption 3.1. Before proving Theorem 3.1, we first prove some preliminary lemmas. Recall that $A_n(x) = E[\{Y - g(x)\} K_n((x - W)/h_n)]$ and $s_n^2(x) = \mathrm{Var}(\{Y - g(x)\} K_n((x - W)/h_n))$. Observe that $\|K_n\|_{\mathbb{R}} = O(h_n^{-\alpha})$ under our assumption. In what follows, the notation $\lesssim$ signifies that the left hand side is bounded by the right hand side up to a positive constant independent of $n$ and $x$.

Lemma A.4. The following bounds hold:
(i) $\|A_n\|_I = O(h_n^{\beta+1})$.
(ii) For sufficiently large $n$, $\inf_{x \in I} s_n^2(x) \gtrsim h_n^{-2\alpha+1}$.
(iii) For $l = 0, 1, 2$, we have $\sup_{x \in \mathbb{R}} E[ | Y K_n((x - W)/h_n) |^{2+l} ] = O(h_n^{-(2+l)\alpha+1})$.

Proof. (i). Since $E[Y e^{itW}] = E[\{g(X) + U\} e^{it(X+\varepsilon)}] = \psi_X(t) \varphi_\varepsilon(t)$, we have that
\[
E[Y K_n((x - W)/h_n)] = \frac{h_n}{2\pi} \int_{\mathbb{R}} e^{-itx} E[Y e^{itW}] \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)}\, dt = \frac{h_n}{2\pi} \int_{\mathbb{R}} e^{-itx} \psi_X(t) \varphi_K(t h_n)\, dt.
\]
Since $\psi_X(\cdot)$ and $\varphi_K(\cdot\, h_n)$ are the Fourier transforms of $g f_X$ and $h_n^{-1} K(\cdot / h_n)$, respectively, the Fourier inversion formula yields that
\[
\frac{h_n}{2\pi} \int_{\mathbb{R}} e^{-itx} \psi_X(t) \varphi_K(t h_n)\, dt = h_n \left( (g f_X) * (h_n^{-1} K(\cdot / h_n)) \right)(x) = \int_{\mathbb{R}} g(w) f_X(w) K((x - w)/h_n)\, dw.
\]
Note that the far left and right hand sides are continuous in $x$, and so the equality holds for all $x \in \mathbb{R}$. Likewise, we have $E[K_n((x - W)/h_n)] = \int_{\mathbb{R}} f_X(w) K((x - w)/h_n)\, dw$ for all $x \in \mathbb{R}$, so that
\[
A_n(x) = \int_{\mathbb{R}} \{ g(w) - g(x) \} K((x - w)/h_n) f_X(w)\, dw = h_n \int_{\mathbb{R}} \{ g(x - h_n w) - g(x) \} f_X(x - h_n w) K(w)\, dw.
\]

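By the Taylor expansion, for any $x, w \in \mathbb{R}$,
\[
\{ g(x - h_n w) - g(x) \} f_X(x - h_n w) = \sum_{j=1}^{k-1} \frac{(g f_X)^{(j)}(x) - g(x) f_X^{(j)}(x)}{j!} (-h_n w)^j + \frac{(g f_X)^{(k)}(x - \theta h_n w) - g(x) f_X^{(k)}(x - \theta h_n w)}{k!} (-h_n w)^k,
\]
for some $\theta \in [0, 1]$. Since $\int_{\mathbb{R}} w^j K(w)\, dw = 0$ for $j = 1, \dots, k$ and $f_X, g f_X \in \Sigma(\beta, B)$, we have
\[
\begin{aligned}
&\left| \int_{\mathbb{R}} \{ g(x - h_n w) - g(x) \} f_X(x - h_n w) K(w)\, dw \right| \\
&\quad = \left| \int_{\mathbb{R}} \left[ \{ g(x - h_n w) - g(x) \} f_X(x - h_n w) - \sum_{j=1}^{k} \frac{(g f_X)^{(j)}(x) - g(x) f_X^{(j)}(x)}{j!} (-h_n w)^j \right] K(w)\, dw \right| \\
&\quad \le \frac{(1 + \|g\|_I) B h_n^{\beta}}{k!} \int_{\mathbb{R}} |w|^{\beta} |K(w)|\, dw.
\end{aligned}
\]
This shows that $\|A_n\|_I = O(h_n^{\beta+1})$.

(ii). Since $A_n(x) = E[\{Y - g(x)\} K_n((x - W)/h_n)] = O(h_n^{\beta+1})$ uniformly in $x \in I$, it suffices to show that
\[
\inf_{x \in I} E[\{Y - g(x)\}^2 K_n^2((x - W)/h_n)] \gtrsim (1 - o(1)) h_n^{-2\alpha+1}.
\]
Observe that $E[Y \mid W = w] f_W(w) = ((g f_X) * f_\varepsilon)(w)$ (compare the Fourier transforms of both sides), and define
\[
V(x, w) = E[\{Y - g(x)\}^2 \mid W = w] f_W(w) = (E[Y^2 \mid W = w] + g^2(x)) f_W(w) - 2 g(x) ((g f_X) * f_\varepsilon)(w).
\]
The function $(g f_X) * f_\varepsilon$ is bounded and continuous by boundedness of $g f_X$. Since $E[Y^2 \mid W = \cdot\,]$, $f_W$, and $(g f_X) * f_\varepsilon$ are bounded and continuous on $\mathbb{R}$, and $g$ is bounded and continuous on $I$, we have that the function $(x, w) \mapsto V(x, w)$ is bounded and continuous on $I \times \mathbb{R}$. In particular, since $V(x, x) > 0$ for all $x \in I$ under our assumption, we have that $\inf_{x \in I} V(x, x) > 0$. Now, observe that
\[
E[\{Y - g(x)\}^2 K_n^2((x - W)/h_n)] = \int_{\mathbb{R}} V(x, w) K_n^2((x - w)/h_n)\, dw = h_n \int_{\mathbb{R}} V(x, x - h_n w) K_n^2(w)\, dw.
\]
Furthermore, we have that
\[
\int_{\mathbb{R}} K_n^2(w)\, dw = \frac{1}{2\pi} \int_{\mathbb{R}} \frac{|\varphi_K(t)|^2}{|\varphi_\varepsilon(t/h_n)|^2}\, dt \gtrsim h_n^{-2\alpha}
\]
by Plancherel's theorem. Hence, it suffices to show that
\[
\sup_{x \in I} h_n^{2\alpha} \left| \int_{\mathbb{R}} \{ V(x, x - h_n w) - V(x, x) \} K_n^2(w)\, dw \right| \to 0. \tag{A.1}
\]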

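From the proof of Lemma 3 in Kato and Sasaki (2016), we have that $h_n^{2\alpha} K_n^2(x) \lesssim \min\{1, x^{-2}\}$. By the definition of $V(x, w)$, for any $\rho > 0$, there exists sufficiently small $\delta > 0$ such that $|V(x, x + w) - V(x, x)| \le \rho$ for all $x \in I$ whenever $|w| \le \delta$. Therefore,
\[
\sup_{x \in I} \int_{\mathbb{R}} |V(x, x - h_n w) - V(x, x)|\, h_n^{2\alpha} K_n^2(w)\, dw \lesssim \rho \int_{|w| \le \delta/h_n} \min\{1, w^{-2}\}\, dw + 2 \|V\|_{I \times \mathbb{R}} \int_{|w| > \delta/h_n} w^{-2}\, dw \lesssim \rho + o(1).
\]

(iii). Pick any $l = 0, 1, 2$. Since $\|K_n\|_{\mathbb{R}} \lesssim h_n^{-\alpha}$, we have that
\[
\begin{aligned}
E[ |Y K_n((x - W)/h_n)|^{2+l} ] &= h_n \int_{\mathbb{R}} E[ |Y|^{2+l} \mid W = x - h_n w ]\, |K_n(w)|^{2+l} f_W(x - h_n w)\, dw \\
&\lesssim h_n^{-l\alpha+1} \int_{\mathbb{R}} V_l(x - h_n w) K_n^2(w)\, dw \le h_n^{-l\alpha+1} \|V_l\|_{\mathbb{R}} \int_{\mathbb{R}} K_n^2(w)\, dw \lesssim h_n^{-(2+l)\alpha+1},
\end{aligned}
\]
where $V_l(w) = E[|Y|^{2+l} \mid W = w] f_W(w)$. This completes the proof.

Lemma A.5. $\| \hat\varphi_\varepsilon - \varphi_\varepsilon \|_{[-h_n^{-1}, h_n^{-1}]} = O_P\{ m^{-1/2} \log(1/h_n) \}$.

Proof. See Lemma 4 in Kato and Sasaki (2016); see also Theorem 4.1 in Neumann and Reiß (2009).

Consider the following classes of functions:
\[
\begin{aligned}
\mathcal{F}_n^{(1)} &= \{ (y, w) \mapsto y K_n((x - w)/h_n) : x \in \mathbb{R} \}, \\
\mathcal{F}_n^{(2)} &= \left\{ (y, w) \mapsto \frac{1}{s_n(x)} \{ y - g(x) \} K_n((x - w)/h_n) : x \in I \right\}, \\
\mathcal{F}_n^{(3)} &= \{ (y, w) \mapsto \{ y - g(x) \} K_n^2((x - w)/h_n) : x \in I \}, \\
\mathcal{F}_n^{(4)} &= \left\{ (y, w) \mapsto \frac{1}{s_n^2(x)} \{ y - g(x) \}^2 K_n^2((x - w)/h_n) : x \in I \right\}.
\end{aligned} \tag{A.2}
\]
In view of the fact that $\|K_n\|_{\mathbb{R}} \lesssim h_n^{-\alpha}$ and $\inf_{x \in I} s_n(x) \gtrsim h_n^{-\alpha+1/2}$, choose constants $D_1, D_2 > 0$ (independent of $n$) such that $\|K_n\|_{\mathbb{R}} \le D_1 h_n^{-\alpha}$ and $\|1/s_n\|_I \le D_2 h_n^{\alpha-1/2}$. Let
\[
F_n^{(1)}(y, w) = D_1 |y| h_n^{-\alpha}, \quad F_n^{(2)}(y, w) = D_1 D_2 (|y| + \|g\|_I)/\sqrt{h_n}, \quad F_n^{(3)}(y, w) = D_1^2 (|y| + \|g\|_I) h_n^{-2\alpha}, \quad F_n^{(4)}(y, w) = \{ F_n^{(2)}(y, w) \}^2.
\]
Note that $F_n^{(l)}$ is an envelope function for $\mathcal{F}_n^{(l)}$ for each $l = 1, \dots, 4$.

Lemma A.6. There exist constants $A, v \ge e$ independent of $n$ such that
\[
\sup_Q N(\mathcal{F}_n^{(l)}, \|\cdot\|_{Q,2}, \delta \|F_n^{(l)}\|_{Q,2}) \le (A/\delta)^v, \quad 0 < \delta \le 1, \tag{A.3}
\]
for all $l = 1, \dots, 4$, where $\sup_Q$ is taken over all finitely discrete distributions on $\mathbb{R}^2$.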

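Proof. Consider the following classes of functions:
\[
\mathcal{K}_n = \{ w \mapsto K_n((x - w)/h_n) : x \in \mathbb{R} \}, \quad \mathcal{K}_n^2 = \{ f^2 : f \in \mathcal{K}_n \}.
\]
Lemma 1 in Kato and Sasaki (2016) and Corollary A.1 in Chernozhukov et al. (2014a) yield that there exist constants $A_1, v_1 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{K}_n, \|\cdot\|_{Q,2}, D_1 h_n^{-\alpha} \delta) \le (A_1/\delta)^{v_1}$ and $\sup_Q N(\mathcal{K}_n^2, \|\cdot\|_{Q,2}, D_1^2 h_n^{-2\alpha} \delta) \le (A_1/\delta)^{v_1}$ for all $0 < \delta \le 1$. In what follows, we only prove (A.3) for $l = 2$; the proofs for the other cases are completely analogous given the above bounds on the covering numbers for $\mathcal{K}_n$ and $\mathcal{K}_n^2$.

Let $\mathcal{H}_n = \{ y \mapsto \{ y - g(x) \}/s_n(x) : x \in I \}$, and observe that, since $\|1/s_n\|_I \le D_2 h_n^{\alpha-1/2}$, there exist constants $A_2, v_2 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{H}_n, \|\cdot\|_{Q,2}, \delta \|H_n\|_{Q,2}) \le (A_2/\delta)^{v_2}$ for all $0 < \delta \le 1$, where $H_n(y) = D_2 (|y| + \|g\|_I) h_n^{\alpha-1/2}$ is an envelope function for $\mathcal{H}_n$. This can be verified by a direct calculation, or by observing that $\mathcal{H}_n$ $(\subset \{ y \mapsto ay + b : a > 0, b \in \mathbb{R} \})$ is a VC subgraph class with VC index at most 4 (cf. van der Vaart and Wellner, 1996), and applying the corresponding covering number bound in van der Vaart and Wellner (1996). Let
\[
\mathcal{H}_n \mathcal{K}_n := \{ (y, w) \mapsto f_1(y) f_2(w) : f_1 \in \mathcal{H}_n, f_2 \in \mathcal{K}_n \} \supset \mathcal{F}_n^{(2)},
\]
and note that $H_n(y) D_1 h_n^{-\alpha} = F_n^{(2)}(y, w)$. From Corollary A.1 in Chernozhukov et al. (2014a), there exist constants $A_3, v_3 \ge e$ independent of $n$ such that $\sup_Q N(\mathcal{H}_n \mathcal{K}_n, \|\cdot\|_{Q,2}, \delta \|F_n^{(2)}\|_{Q,2}) \le (A_3/\delta)^{v_3}$ for all $0 < \delta \le 1$. Now, the desired result follows from the observation that $N(\mathcal{F}_n^{(2)}, \|\cdot\|_{Q,2}, 2\delta) \le N(\mathcal{H}_n \mathcal{K}_n, \|\cdot\|_{Q,2}, \delta)$ for all $\delta > 0$.

Lemma A.7. We have $\| \hat f_X^*(\cdot) - E[\hat f_X^*(\cdot)] \|_{\mathbb{R}} = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}$ and $\| E[\hat f_X^*(\cdot)] - f_X(\cdot) \|_{\mathbb{R}} = O(h_n^{\beta}) = o\{ h_n^{-\alpha} (n h_n \log(1/h_n))^{-1/2} \}$. Furthermore, $\| \hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)] \|_{\mathbb{R}} = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}$.

Proof. The first two results are implicit in the proofs of Corollaries 1 and 2 in Kato and Sasaki (2016). To prove the last result, we shall apply Lemma A.1 to the class of functions $\mathcal{F}_n^{(1)}$. From Lemma A.4-(iii), we have that $\sup_{x \in \mathbb{R}} E[Y^2 K_n^2((x - W)/h_n)] = O(h_n^{-2\alpha+1})$. In view of the covering number bound for $\mathcal{F}_n^{(1)}$ given in Lemma A.6, we may apply Lemma A.1 to $\mathcal{F}_n^{(1)}$ to conclude that
\[
(n h_n) E[ \| \hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)] \|_{\mathbb{R}} ] = E\left[ \left\| \sum_{j=1}^{n} \{ f(Y_j, W_j) - E[f(Y, W)] \} \right\|_{\mathcal{F}_n^{(1)}} \right] \lesssim h_n^{-\alpha} \sqrt{n h_n \log(1/h_n)} + h_n^{-\alpha} \sqrt{ E\left[ \max_{1 \le j \le n} Y_j^2 \right] } \log(1/h_n).
\]
From Lemma A.2, we have $E[\max_{1 \le j \le n} Y_j^2] = O(n^{1/2})$, so that
\[
(n h_n) E[ \| \hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)] \|_{\mathbb{R}} ] \lesssim h_n^{-\alpha} \sqrt{n h_n \log(1/h_n)} + h_n^{-\alpha} n^{1/4} \log(1/h_n) \lesssim h_n^{-\alpha} \sqrt{n h_n \log(1/h_n)},
\]
where the second inequality follows from the first condition in (3.1). This completes the proof.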
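The moment bound used in the last step, $E[\max_{1 \le j \le n} Y_j^2] = O(n^{1/2})$, is Lemma A.2 applied with $\zeta_j = Y_j^2$ and $r = 2$. As a purely illustrative numerical check (not part of the proof), the following Python sketch compares a Monte Carlo estimate of the left hand side with the bound $n^{1/2} (E[Y^4])^{1/2}$. The standard normal distribution for $Y_j$ (so that $E[Y^4] = 3$), the sample size, and the function names are assumptions made only for this example.

```python
import numpy as np


def lemma_a2_bound(n, r, rth_moment):
    """Right hand side of Lemma A.2: n^(1/r) * (max_j E|zeta_j|^r)^(1/r)."""
    return n ** (1.0 / r) * rth_moment ** (1.0 / r)


def simulated_expected_max(n, reps, rng):
    """Monte Carlo estimate of E[max_j Y_j^2] for i.i.d. standard normal Y_j."""
    y = rng.standard_normal((reps, n))
    return float((y ** 2).max(axis=1).mean())


rng = np.random.default_rng(0)
n = 1000
lhs = simulated_expected_max(n, reps=2000, rng=rng)
# zeta_j = Y_j^2 and r = 2, so E|zeta_j|^r = E[Y^4] = 3 for a standard normal Y.
rhs = lemma_a2_bound(n, r=2, rth_moment=3.0)
print(f"estimated E[max Y_j^2] = {lhs:.2f}, Lemma A.2 bound = {rhs:.2f}")
```

The bound is far from tight for light-tailed variables; the proof only needs its polynomial rate in $n$.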

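We are now in position to prove Theorem 3.1.

Proof of Theorem 3.1. We divide the proof into two steps.

Step 1. Let $r_n = h_n^{-\alpha} \{ n h_n \log(1/h_n) \}^{-1/2}$. We first prove that
\[
\hat g(x) - g(x) = \frac{1}{f_X(x)} \frac{1}{n h_n} \sum_{j=1}^{n} [ \{ Y_j - g(x) \} K_n((x - W_j)/h_n) - A_n(x) ] + o_P(r_n)
\]
uniformly in $x \in I$.

To this end, we shall show that $\| \hat\mu - \hat\mu^* \|_{\mathbb{R}} = o_P(r_n)$. First, observe from Lemma A.5 that
\[
\inf_{|t| \le h_n^{-1}} |\hat\varphi_\varepsilon(t)| \ge \inf_{|t| \le h_n^{-1}} |\varphi_\varepsilon(t)| - O_P\{ m^{-1/2} \log(1/h_n) \} \gtrsim (1 - o_P(1)) h_n^{\alpha}.
\]
Let $\psi_{YW}(t) = E[Y e^{itW}] = E[\{g(X) + U\} e^{it(X+\varepsilon)}] = \psi_X(t) \varphi_\varepsilon(t)$, and let $\hat\psi_{YW}(t) = n^{-1} \sum_{j=1}^{n} Y_j e^{itW_j}$. Decompose $\hat\mu(x) - \hat\mu^*(x)$ as
\[
\begin{aligned}
\hat\mu(x) - \hat\mu^*(x) &= \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \hat\psi_{YW}(t) \frac{\varphi_K(t h_n)}{\hat\varphi_\varepsilon(t)}\, dt - \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \hat\psi_{YW}(t) \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)}\, dt \\
&= \frac{1}{2\pi} \int_{\{\psi_X \ne 0\}} e^{-itx} \varphi_K(t h_n) \frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} \psi_X(t)\, dt + \frac{1}{2\pi} \int_{\{\psi_X = 0\}} e^{-itx} \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)} \hat\psi_{YW}(t) \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}\, dt \\
&\quad - \frac{1}{2\pi} \int_{\{\psi_X \ne 0\}} e^{-itx} \varphi_K(t h_n) \frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} \psi_X(t)\, dt - \frac{1}{2\pi} \int_{\{\psi_X = 0\}} e^{-itx} \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)} \hat\psi_{YW}(t)\, dt \\
&= \frac{1}{2\pi} \int_{\{\psi_X \ne 0\}} e^{-itx} \varphi_K(t h_n) \left\{ \frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} - 1 \right\} \left\{ \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right\} \psi_X(t)\, dt \\
&\quad + \frac{1}{2\pi} \int_{\{\psi_X = 0\}} e^{-itx} \frac{\varphi_K(t h_n)}{\varphi_\varepsilon(t)} \hat\psi_{YW}(t) \left\{ \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right\} dt + \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi_K(t h_n) \left\{ \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right\} \psi_X(t)\, dt.
\end{aligned}
\]
Hence the Cauchy-Schwarz inequality yields that
\[
\begin{aligned}
\| \hat\mu - \hat\mu^* \|_{\mathbb{R}}^2 &\lesssim \left\{ \int_{\{\psi_X \ne 0\} \cap [-h_n^{-1}, h_n^{-1}]} \left| \frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} - 1 \right|^2 |\psi_X(t)|^2\, dt \right\} \left\{ \int_{-h_n^{-1}}^{h_n^{-1}} \left| \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right|^2 dt \right\} \\
&\quad + h_n^{-2\alpha} \left\{ \int_{\{\psi_X = 0\} \cap [-h_n^{-1}, h_n^{-1}]} |\hat\psi_{YW}(t)|^2\, dt \right\} \left\{ \int_{-h_n^{-1}}^{h_n^{-1}} \left| \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right|^2 dt \right\} \\
&\quad + \int_{-h_n^{-1}}^{h_n^{-1}} \left| \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right|^2 |\psi_X(t)|\, dt.
\end{aligned} \tag{A.4}
\]
We shall bound each term on the right hand side. Observe that
\[
\int_{-h_n^{-1}}^{h_n^{-1}} \left| \frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)} - 1 \right|^2 dt \le O_P(h_n^{-2\alpha}) \int_{-h_n^{-1}}^{h_n^{-1}} | \hat\varphi_\varepsilon(t) - \varphi_\varepsilon(t) |^2\, dt
\]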

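and the integral on the right hand side is $O_P\{ (m h_n)^{-1} \}$ since
\[
\int_{-h_n^{-1}}^{h_n^{-1}} E[ | \hat\varphi_\varepsilon(t) - \varphi_\varepsilon(t) |^2 ]\, dt \le m^{-1} \int_{-h_n^{-1}}^{h_n^{-1}} dt = 2 (m h_n)^{-1}.
\]
Likewise, using the fact that $\psi_X$ is integrable, we have that the last term on the right hand side of (A.4) is $O_P(h_n^{-2\alpha} m^{-1})$. For any $t \in \mathbb{R}$ with $\psi_X(t) \ne 0$, we have $E[ | \hat\psi_{YW}(t)/\psi_{YW}(t) - 1 |^2 ] \le E[Y^2] / \{ n |\psi_{YW}(t)|^2 \}$, so that
\[
E\left[ \int_{\{\psi_X \ne 0\} \cap [-h_n^{-1}, h_n^{-1}]} \left| \frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)} - 1 \right|^2 |\psi_X(t)|^2\, dt \right] \lesssim \frac{1}{n} \int_{-h_n^{-1}}^{h_n^{-1}} \frac{1}{|\varphi_\varepsilon(t)|^2}\, dt \lesssim h_n^{-2\alpha} (n h_n)^{-1}.
\]
Finally, for any $t \in \mathbb{R}$ with $\psi_X(t) = 0$, we have $\psi_{YW}(t) = 0$, so that
\[
E\left[ \int_{\{\psi_X = 0\} \cap [-h_n^{-1}, h_n^{-1}]} | \hat\psi_{YW}(t) |^2\, dt \right] \lesssim (n h_n)^{-1}.
\]
Therefore, we have $\| \hat\mu - \hat\mu^* \|_{\mathbb{R}}^2 = O_P( h_n^{-4\alpha-2} n^{-1} m^{-1} + h_n^{-2\alpha} m^{-1} ) = o_P(r_n^2)$.

From Step 2 in the proof of Theorem 1 of Kato and Sasaki (2016), it follows that $\| \hat f_X - \hat f_X^* \|_{\mathbb{R}} = o_P(r_n)$, which in particular implies that $\| \hat f_X - f_X \|_I \le \| \hat f_X - \hat f_X^* \|_I + \| \hat f_X^* - f_X \|_I = o_P(1)$, so that $\| 1/\hat f_X \|_I = O_P(1)$. Furthermore,
\[
\| \hat\mu^* \|_I \le \| E[\hat\mu^*(\cdot)] \|_I + \| \hat\mu^*(\cdot) - E[\hat\mu^*(\cdot)] \|_I \lesssim \int_{\mathbb{R}} |\psi_X(t)|\, dt + o_P(1) = O_P(1).
\]
Therefore,
\[
\| \hat g - \hat g^* \|_I \le \| 1/\hat f_X \|_I \| \hat\mu - \hat\mu^* \|_I + \| \hat\mu^* \|_I \| 1/\hat f_X - 1/\hat f_X^* \|_I \le o_P(r_n) + O_P(1) \| \hat f_X - \hat f_X^* \|_I = o_P(r_n).
\]
Now, observe that
\[
\hat g^*(x) - g(x) = \frac{1}{\hat f_X^*(x)} \frac{1}{n h_n} \sum_{j=1}^{n} \{ Y_j - g(x) \} K_n((x - W_j)/h_n).
\]
Since $\|A_n\|_I = O(h_n^{\beta+1}) = o(h_n r_n)$, we have
\[
\hat g^*(x) - g(x) = \frac{1}{\hat f_X^*(x)} \frac{1}{n h_n} \sum_{j=1}^{n} [ \{ Y_j - g(x) \} K_n((x - W_j)/h_n) - A_n(x) ] + o_P(r_n)
\]
uniformly in $x \in I$. Since
\[
\frac{1}{n h_n} \sum_{j=1}^{n} [ \{ Y_j - g(x) \} K_n((x - W_j)/h_n) - A_n(x) ] = \hat\mu^*(x) - E[\hat\mu^*(x)] - g(x) \{ \hat f_X^*(x) - E[\hat f_X^*(x)] \} = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}
\]
uniformly in $x \in I$, and
\[
\| 1/\hat f_X^* - 1/f_X \|_I \le O_P(1) \| \hat f_X^* - f_X \|_I = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \},
\]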

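we conclude that
\[
\hat g^*(x) - g(x) = \frac{1}{f_X(x)} \frac{1}{n h_n} \sum_{j=1}^{n} [ \{ Y_j - g(x) \} K_n((x - W_j)/h_n) - A_n(x) ] + o_P(r_n)
\]
uniformly in $x \in I$. This leads to the desired result of Step 1. Furthermore, the derivation so far yields that $\| \hat g - g \|_I = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}$.

Step 2. By Step 1 together with the fact that $\inf_{x \in I} s_n(x) \gtrsim h_n^{-\alpha+1/2}$, we have
\[
\begin{aligned}
\hat Z_n(x) = \frac{f_X(x) \sqrt{n}\, h_n (\hat g(x) - g(x))}{s_n(x)} &= \frac{1}{s_n(x) \sqrt{n}} \sum_{j=1}^{n} [ \{ Y_j - g(x) \} K_n((x - W_j)/h_n) - A_n(x) ] + o_P\{ (\log(1/h_n))^{-1/2} \} \\
&= Z_n(x) + o_P\{ (\log(1/h_n))^{-1/2} \}
\end{aligned}
\]
uniformly in $x \in I$. Recall the class of functions $\mathcal{F}_n^{(2)}$ defined in (A.2), and consider the empirical process indexed by $\mathcal{F}_n^{(2)}$:
\[
\nu_n(f) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \{ f(Y_j, W_j) - E[f(Y, W)] \}, \quad f \in \mathcal{F}_n^{(2)}.
\]
We apply Theorem 2.1 in Chernozhukov et al. (2016) to approximate $\| \nu_n \|_{\mathcal{F}_n^{(2)}} = \| Z_n \|_I$ by the supremum of a Gaussian process. To this end, we shall verify the conditions in Chernozhukov et al. (2016).

First, from the covering number bound for $\mathcal{F}_n^{(2)}$ given in Lemma A.6 and finiteness of the second moment of $F_n^{(2)}(Y, W)$, there exists a tight Gaussian random variable $G_n$ in $\ell^\infty(\mathcal{F}_n^{(2)})$ with mean zero and the same covariance function as $\{ \nu_n(f) : f \in \mathcal{F}_n^{(2)} \}$. Extend $\nu_n$ linearly to $\mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)}) = \{ f, -f : f \in \mathcal{F}_n^{(2)} \}$, and observe that $\| \nu_n \|_{\mathcal{F}_n^{(2)}} = \sup_{f \in \mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)})} \nu_n(f)$. Note that, from Giné and Nickl (2016), $G_n$ extends to the linear hull of $\mathcal{F}_n^{(2)}$ in such a way that $G_n$ has linear sample paths, so that $\| G_n \|_{\mathcal{F}_n^{(2)}} = \sup_{f \in \mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)})} G_n(f)$, and in addition $G_n$ has uniformly continuous paths on the symmetric convex hull of $\mathcal{F}_n^{(2)}$. It is not difficult to verify that the covering number of $\mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)})$ is at most twice that of $\mathcal{F}_n^{(2)}$. In particular, $\{ G_n(f) : f \in \mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)}) \}$ is a tight Gaussian random variable in $\ell^\infty(\mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)}))$ with mean zero and the same covariance function as $\{ \nu_n(f) : f \in \mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)}) \}$.

Next, observe that $E[ |Y K_n((x - W)/h_n)|^{2+l} ] \lesssim h_n^{-(2+l)\alpha+1}$ for $l = 0, 1, 2$ from Lemma A.4-(iii), so that
\[
\sup_{f \in \mathcal{F}_n^{(2)}} E[ |f(Y, W)|^{2+l} ] \lesssim h_n^{-l/2}
\]
for $l = 0, 1, 2$. Furthermore, observe that
\[
E[ |F_n^{(2)}(Y, W)|^4 ] \lesssim h_n^{-2} ( E[Y^4] + \|g\|_I^4 ) \lesssim h_n^{-2}.
\]
Therefore, applying Theorem 2.1 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)})$ with $B(f) \equiv 0$, $q = 4$, $A \lesssim 1$, $v \lesssim 1$, $b \lesssim h_n^{-1/2}$, $\sigma \equiv 1$ and $\gamma \equiv 1/\log n$, yields that there exists a random variable $V_n$ having the same distribution as $\| G_n \|_{\mathcal{F}_n^{(2)}}$ such that
\[
\left| \| \nu_n \|_{\mathcal{F}_n^{(2)}} - V_n \right| = O_P\left\{ \frac{(\log n)^{5/4}}{n^{1/4} h_n^{1/2}} + \frac{\log n}{(n h_n)^{1/6}} \right\} = o_P\{ (\log(1/h_n))^{-1/2} \}.
\]
Now, for $f_{n,x}(y, w) = \{ y - g(x) \} K_n((x - w)/h_n)/s_n(x)$, define
\[
Z_n^G(x) = G_n(f_{n,x}), \quad x \in I,
\]
and observe that $Z_n^G$ is a tight Gaussian random variable in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$ such that $\| Z_n^G \|_I$ has the same distribution as $V_n$. Since
\[
\left| \| \hat Z_n \|_I - V_n \right| \le \| \hat Z_n - Z_n \|_I + \left| \| Z_n \|_I - V_n \right| = o_P\{ (\log(1/h_n))^{-1/2} \},
\]
there exists a sequence $\Delta_n \downarrow 0$ such that $P\{ | \| \hat Z_n \|_I - V_n | > \Delta_n (\log(1/h_n))^{-1/2} \} \le \Delta_n$ (which follows from the fact that the Ky Fan metric metrizes convergence in probability; see Dudley (2002)). Observe that for any $z \in \mathbb{R}$,
\[
\begin{aligned}
P\{ \| \hat Z_n \|_I \le z \} &\le P\{ V_n \le z + \Delta_n (\log(1/h_n))^{-1/2} \} + P\{ | \| \hat Z_n \|_I - V_n | > \Delta_n (\log(1/h_n))^{-1/2} \} \\
&\le P\{ \| Z_n^G \|_I \le z + \Delta_n (\log(1/h_n))^{-1/2} \} + \Delta_n.
\end{aligned}
\]
The anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) then yields that
\[
P\{ \| Z_n^G \|_I \le z + \Delta_n (\log(1/h_n))^{-1/2} \} \le P\{ \| Z_n^G \|_I \le z \} + 4 \Delta_n (\log(1/h_n))^{-1/2} \{ 1 + E[ \| Z_n^G \|_I ] \}.
\]
From the covering number bound for $\mathcal{F}_n^{(2)}$ given in Lemma A.6, together with the facts that $E[F_n^{(2)}(Y, W)^2] \lesssim h_n^{-1}$ and $\mathrm{Var}(f_{n,x}(Y, W)) = 1$ for all $x \in I$, Dudley's entropy integral bound (cf. van der Vaart and Wellner, 1996, Corollary 2.2.8) yields that
\[
E[ \| Z_n^G \|_I ] = E[ \| G_n \|_{\mathcal{F}_n^{(2)}} ] \lesssim \int_0^1 \sqrt{ \log(1/(\delta \sqrt{h_n})) }\, d\delta \lesssim \sqrt{\log(1/h_n)},
\]
which implies that
\[
P\{ \| Z_n^G \|_I \le z + \Delta_n (\log(1/h_n))^{-1/2} \} \le P\{ \| Z_n^G \|_I \le z \} + o(1)
\]
uniformly in $z \in \mathbb{R}$. Likewise, we have $P\{ \| \hat Z_n \|_I \le z \} \ge P\{ \| Z_n^G \|_I \le z \} - o(1)$ uniformly in $z \in \mathbb{R}$. This completes the proof.

A.3. Proof of Theorem 3.2. We first prove the following technical lemma.

Lemma A.8. $\| \hat s_n^2(\cdot)/s_n^2(\cdot) - 1 \|_I = o_P\{ (\log(1/h_n))^{-1} \}$.

Proof. Observe that
\[
\begin{aligned}
\{ Y_j - \hat g(x) \}^2 \hat K_n^2((x - W_j)/h_n) &= \{ Y_j - g(x) \}^2 K_n^2((x - W_j)/h_n) + \{ g(x) - \hat g(x) \}^2 K_n^2((x - W_j)/h_n) \\
&\quad + 2 \{ g(x) - \hat g(x) \} \{ Y_j - g(x) \} K_n^2((x - W_j)/h_n) \\
&\quad + \{ Y_j - \hat g(x) \}^2 \{ \hat K_n^2((x - W_j)/h_n) - K_n^2((x - W_j)/h_n) \},
\end{aligned}
\]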

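so that
\[
\begin{aligned}
&\left\| \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - \hat g(\cdot) \}^2 \hat K_n^2((\cdot - W_j)/h_n) - \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(\cdot) \}^2 K_n^2((\cdot - W_j)/h_n) \right\|_I \\
&\quad \le O_P(h_n^{-2\alpha}) \| \hat g - g \|_I^2 + 2 \| \hat g - g \|_I \left\| \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(\cdot) \} K_n^2((\cdot - W_j)/h_n) \right\|_I + \frac{2}{n} \sum_{j=1}^{n} ( Y_j^2 + \| \hat g \|_I^2 ) \| \hat K_n^2 - K_n^2 \|_{\mathbb{R}}.
\end{aligned} \tag{A.5}
\]
From Step 1 in the proof of Theorem 3.1, $\| \hat g - g \|_I = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}$, so that the first term on the right hand side of (A.5) is $O_P\{ h_n^{-4\alpha} (n h_n)^{-1} \log(1/h_n) \}$. Since
\[
\| \hat K_n - K_n \|_{\mathbb{R}} \lesssim \int_{\mathbb{R}} \left| \frac{1}{\hat\varphi_\varepsilon(t/h_n)} - \frac{1}{\varphi_\varepsilon(t/h_n)} \right| |\varphi_K(t)|\, dt \le O_P(h_n^{-2\alpha}) \int_{\mathbb{R}} | \hat\varphi_\varepsilon(t/h_n) - \varphi_\varepsilon(t/h_n) |\, |\varphi_K(t)|\, dt = O_P(h_n^{-2\alpha} m^{-1/2}),
\]
we have that
\[
\| \hat K_n^2 - K_n^2 \|_{\mathbb{R}} \le \| \hat K_n - K_n \|_{\mathbb{R}} \| \hat K_n + K_n \|_{\mathbb{R}} = O_P(h_n^{-3\alpha} m^{-1/2}),
\]
which implies that the last term on the right hand side of (A.5) is $O_P(h_n^{-3\alpha} m^{-1/2})$. To bound the second term, observe first that, since $E[Y \mid W = w] f_W(w) = ((g f_X) * f_\varepsilon)(w)$ is bounded (in absolute value) by $\| g f_X \|_{\mathbb{R}}$,
\[
\| E[ \{ Y - g(\cdot) \} K_n^2((\cdot - W)/h_n) ] \|_I \le h_n ( \| g f_X \|_{\mathbb{R}} + \|g\|_I \| f_W \|_{\mathbb{R}} ) \int_{\mathbb{R}} K_n^2(w)\, dw \lesssim h_n^{-2\alpha+1}.
\]
Hence,
\[
\begin{aligned}
\left\| \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(\cdot) \} K_n^2((\cdot - W_j)/h_n) \right\|_I &\le \underbrace{ \| E[ \{ Y - g(\cdot) \} K_n^2((\cdot - W)/h_n) ] \|_I }_{= O(h_n^{-2\alpha+1})} \\
&\quad + \left\| \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(\cdot) \} K_n^2((\cdot - W_j)/h_n) - E[ \{ Y - g(\cdot) \} K_n^2((\cdot - W)/h_n) ] \right\|_I.
\end{aligned}
\]
The second term on the right hand side is identical to
\[
\left\| \frac{1}{n} \sum_{j=1}^{n} \{ f(Y_j, W_j) - E[f(Y, W)] \} \right\|_{\mathcal{F}_n^{(3)}}.
\]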

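In view of the covering number bound for $\mathcal{F}_n^{(3)}$ given in Lemma A.6, together with the relevant theorem in van der Vaart and Wellner (1996), the expectation of the last term is
\[
\lesssim n^{-1/2} \sqrt{ E[ \{ F_n^{(3)}(Y, W) \}^2 ] } \lesssim h_n^{-2\alpha} n^{-1/2}.
\]
Therefore, the right hand side of (A.5) is
\[
O_P\left\{ h_n^{-4\alpha} (n h_n)^{-1} \log(1/h_n) + h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)}\, ( h_n^{-2\alpha+1} + h_n^{-2\alpha} n^{-1/2} ) + h_n^{-3\alpha} m^{-1/2} \right\},
\]
which is $o_P\{ h_n^{-2\alpha+1} (\log(1/h_n))^{-1} \}$. Hence, since $\inf_{x \in I} s_n^2(x) \gtrsim h_n^{-2\alpha+1}$, we have
\[
\frac{1}{s_n^2(x)} \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - \hat g(x) \}^2 \hat K_n^2((x - W_j)/h_n) = \frac{1}{s_n^2(x)} \frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(x) \}^2 K_n^2((x - W_j)/h_n) + o_P\{ (\log(1/h_n))^{-1} \}
\]
uniformly in $x \in I$. Since $\| A_n^2(\cdot)/s_n^2(\cdot) \|_I = O(h_n^{2\alpha+2\beta+1})$, it remains to prove that
\[
\left\| \frac{1}{n} \sum_{j=1}^{n} \frac{1}{s_n^2(\cdot)} \{ Y_j - g(\cdot) \}^2 K_n^2((\cdot - W_j)/h_n) - E\left[ \frac{1}{s_n^2(\cdot)} \{ Y - g(\cdot) \}^2 K_n^2((\cdot - W)/h_n) \right] \right\|_I = \left\| \frac{1}{n} \sum_{j=1}^{n} \{ f(Y_j, W_j) - E[f(Y, W)] \} \right\|_{\mathcal{F}_n^{(4)}}
\]
is $o_P\{ (\log(1/h_n))^{-1} \}$. In view of the covering number bound for $\mathcal{F}_n^{(4)}$ given in Lemma A.6, together with the relevant theorem in van der Vaart and Wellner (1996), the expectation of the last term is
\[
\lesssim n^{-1/2} \sqrt{ E[ \{ F_n^{(4)}(Y, W) \}^2 ] } \lesssim h_n^{-1} n^{-1/2} = o\{ (\log(1/h_n))^{-1} \}.
\]
This completes the proof.

Proof of Theorem 3.2. We divide the proof into several steps.

Step 1. Define
\[
Z_n^{\xi}(x) = \frac{1}{s_n(x) \sqrt{n}} \sum_{j=1}^{n} \xi_j \left[ \{ Y_j - g(x) \} K_n((x - W_j)/h_n) - \frac{1}{n} \sum_{j'=1}^{n} \{ Y_{j'} - g(x) \} K_n((x - W_{j'})/h_n) \right]
\]
for $x \in I$. We first prove that
\[
\sup_{z \in \mathbb{R}} \left| P\{ \| Z_n^{\xi} \|_I \le z \mid \mathcal{D}_n \} - P\{ \| Z_n^G \|_I \le z \} \right| \stackrel{P}{\to} 0.
\]
To this end, we shall apply Theorem 2.2 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)})$. Let
\[
\nu_n^{\xi}(f) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \xi_j \left\{ f(Y_j, W_j) - \frac{1}{n} \sum_{j'=1}^{n} f(Y_{j'}, W_{j'}) \right\}, \quad f \in \mathcal{F}_n^{(2)}.
\]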

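Then applying Theorem 2.2 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)} \cup (-\mathcal{F}_n^{(2)})$ with $B(f) \equiv 0$, $q = 4$, $A \lesssim 1$, $v \lesssim 1$, $b \lesssim h_n^{-1/2}$, $\sigma \equiv 1$ and $\gamma \equiv 1/\log n$, yields that there exists a random variable $V_n^{\xi}$ of which the conditional distribution given $\mathcal{D}_n$ is identical to the distribution of $\| G_n \|_{\mathcal{F}_n^{(2)}}$ $(= \| Z_n^G \|_I)$, and such that
\[
\left| \| \nu_n^{\xi} \|_{\mathcal{F}_n^{(2)}} - V_n^{\xi} \right| = O_P\left\{ \frac{(\log n)^{9/4}}{n^{1/4} h_n^{1/2}} + \frac{(\log n)^2}{(n h_n)^{1/4}} \right\} = o_P\{ (\log(1/h_n))^{-1/2} \},
\]
which shows that there exists a sequence $\Delta_n \downarrow 0$ such that
\[
P\left\{ \left| \| \nu_n^{\xi} \|_{\mathcal{F}_n^{(2)}} - V_n^{\xi} \right| > \Delta_n (\log(1/h_n))^{-1/2} \,\Big|\, \mathcal{D}_n \right\} \stackrel{P}{\to} 0.
\]
Since $\| \nu_n^{\xi} \|_{\mathcal{F}_n^{(2)}} = \| Z_n^{\xi} \|_I$, we have
\[
P\{ \| Z_n^{\xi} \|_I \le z \mid \mathcal{D}_n \} \le P\{ V_n^{\xi} \le z + \Delta_n (\log(1/h_n))^{-1/2} \mid \mathcal{D}_n \} + o_P(1) = P\{ \| Z_n^G \|_I \le z + \Delta_n (\log(1/h_n))^{-1/2} \} + o_P(1)
\]
uniformly in $z \in \mathbb{R}$, and the anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) yields that
\[
P\{ \| Z_n^G \|_I \le z + \Delta_n (\log(1/h_n))^{-1/2} \} \le P\{ \| Z_n^G \|_I \le z \} + o(1)
\]
uniformly in $z \in \mathbb{R}$. Likewise, we have $P\{ \| Z_n^{\xi} \|_I \le z \mid \mathcal{D}_n \} \ge P\{ \| Z_n^G \|_I \le z \} - o_P(1)$ uniformly in $z \in \mathbb{R}$.

Step 2. In view of the proof of Step 1, in order to prove the result (3.3), it is enough to prove that $\| \hat Z_n^{\xi} - Z_n^{\xi} \|_I = o_P\{ (\log(1/h_n))^{-1/2} \}$. To this end, define
\[
\tilde Z_n^{\xi}(x) = \frac{1}{s_n(x) \sqrt{n}} \sum_{j=1}^{n} \xi_j \{ Y_j - \hat g(x) \} \hat K_n((x - W_j)/h_n)
\]
for $x \in I$, and we first prove that
\[
\| \tilde Z_n^{\xi} - Z_n^{\xi} \|_I = o_P\{ (\log(1/h_n))^{-1/2} \}. \tag{A.6}
\]
We begin with noting that
\[
\frac{1}{n} \sum_{j=1}^{n} \{ Y_j - g(x) \} K_n((x - W_j)/h_n) = h_n \{ \hat\mu^*(x) - E[\hat\mu^*(x)] \} - h_n g(x) \{ \hat f_X^*(x) - E[\hat f_X^*(x)] \} + A_n(x) = O_P\{ h_n^{-\alpha+1} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}
\]
uniformly in $x \in I$, so that it suffices to verify that
\[
\left\| \frac{1}{s_n(\cdot) \sqrt{n}} \left[ \sum_{j=1}^{n} \xi_j \{ Y_j - \hat g(\cdot) \} \hat K_n((\cdot - W_j)/h_n) - \sum_{j=1}^{n} \xi_j \{ Y_j - g(\cdot) \} K_n((\cdot - W_j)/h_n) \right] \right\|_I
\]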

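is $o_P\{ (\log(1/h_n))^{-1/2} \}$. Since $\| 1/s_n \|_I \lesssim h_n^{\alpha-1/2}$, the last term is
\[
\begin{aligned}
&\lesssim h_n^{\alpha-1/2} n^{-1/2} \Bigg\{ \left\| \sum_{j=1}^{n} \xi_j Y_j \{ \hat K_n((\cdot - W_j)/h_n) - K_n((\cdot - W_j)/h_n) \} \right\|_I + \| \hat g - g \|_I \left\| \sum_{j=1}^{n} \xi_j \hat K_n((\cdot - W_j)/h_n) \right\|_I \\
&\qquad\qquad\qquad + \|g\|_I \left\| \sum_{j=1}^{n} \xi_j \{ \hat K_n((\cdot - W_j)/h_n) - K_n((\cdot - W_j)/h_n) \} \right\|_I \Bigg\} \\
&=: h_n^{\alpha-1/2} n^{-1/2} \{ I_n + II_n + III_n \}.
\end{aligned}
\]
Step 2 in the proof of Theorem 2 in Kato and Sasaki (2016) shows that $h_n^{\alpha-1/2} n^{-1/2} III_n = o_P\{ (\log(1/h_n))^{-1/2} \}$. For the second term $II_n$, observe that
\[
II_n \lesssim \| \hat g - g \|_I \int_{-1}^{1} \left| \sum_{j=1}^{n} \xi_j e^{itW_j/h_n} \right| \left| \frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h_n)} \right| dt \le O_P(h_n^{-\alpha}) \| \hat g - g \|_I \int_{-1}^{1} \left| \sum_{j=1}^{n} \xi_j e^{itW_j/h_n} \right| dt = O_P\{ h_n^{-2\alpha-1/2} \sqrt{\log(1/h_n)} \},
\]
so that $h_n^{\alpha-1/2} n^{-1/2} II_n = O_P\{ h_n^{-\alpha-1} n^{-1/2} \sqrt{\log(1/h_n)} \} = o_P\{ (\log(1/h_n))^{-1/2} \}$. For the first term $I_n$, observe that
\[
\begin{aligned}
I_n &\lesssim \int_{-1}^{1} \left| \sum_{j=1}^{n} \xi_j Y_j e^{itW_j/h_n} \right| \left| \frac{1}{\hat\varphi_\varepsilon(t/h_n)} - \frac{1}{\varphi_\varepsilon(t/h_n)} \right| |\varphi_K(t)|\, dt \\
&\lesssim \left\{ \int_{-1}^{1} \left| \sum_{j=1}^{n} \xi_j Y_j e^{itW_j/h_n} \right|^2 dt \right\}^{1/2} \left\{ \int_{-1}^{1} \left| \frac{1}{\hat\varphi_\varepsilon(t/h_n)} - \frac{1}{\varphi_\varepsilon(t/h_n)} \right|^2 dt \right\}^{1/2} = O_P( n^{1/2} h_n^{-2\alpha} m^{-1/2} ),
\end{aligned}
\]
so that $h_n^{\alpha-1/2} n^{-1/2} I_n = o_P\{ (\log(1/h_n))^{-1} \}$. Hence we have proved (A.6). Note that the result of Step 1 and the fact that $E[ \| Z_n^G \|_I ] = O( \sqrt{\log(1/h_n)} )$ imply that $\| Z_n^{\xi} \|_I = O_P( \sqrt{\log(1/h_n)} )$, which in turn implies that $\| \tilde Z_n^{\xi} \|_I = O_P( \sqrt{\log(1/h_n)} )$. Hence
\[
\| \hat Z_n^{\xi} - \tilde Z_n^{\xi} \|_I \le \| s_n(\cdot)/\hat s_n(\cdot) - 1 \|_I \| \tilde Z_n^{\xi} \|_I = o_P\{ (\log(1/h_n))^{-1/2} \},
\]
which leads to (3.3).

Step 3. We shall prove the last two assertions of the theorem. Observe that
\[
\underbrace{ \frac{ \hat f_X(x) \sqrt{n}\, h_n (\hat g(x) - g(x)) }{ \hat s_n(x) } }_{=: \hat Z_n^{\dagger}(x)} - \hat Z_n(x) = \frac{ \sqrt{n}\, h_n (\hat g(x) - g(x)) }{ \hat s_n(x) } \{ \hat f_X(x) - f_X(x) \} + \left\{ \frac{s_n(x)}{\hat s_n(x)} - 1 \right\} \hat Z_n(x),
\]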

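and the right hand side is $o_P\{ (\log(1/h_n))^{-1/2} \}$ uniformly in $x \in I$. To see this, since $\| \hat g - g \|_I = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}$ and $\| \hat f_X - f_X \|_I = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}$ (which follows from Corollary 1 in Kato and Sasaki (2016)), the right hand side of the above displayed equation is
\[
O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \} \cdot O_P( \sqrt{\log(1/h_n)} ) + o_P\{ (\log(1/h_n))^{-1} \} \cdot O_P( \sqrt{\log(1/h_n)} ) = o_P\{ (\log(1/h_n))^{-1/2} \}
\]
uniformly in $x \in I$. Here $\hat Z_n^{\dagger}$ denotes the feasibly studentized process, with $\hat f_X$ and $\hat s_n$ in place of $f_X$ and $s_n$. Now, Theorem 3.1 and the anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) yield that
\[
\sup_{z \in \mathbb{R}} \left| P\{ \| \hat Z_n^{\dagger} \|_I \le z \} - P\{ \| Z_n^G \|_I \le z \} \right| \to 0.
\]
We are to show that $P\{ \| \hat Z_n^{\dagger} \|_I \le \hat c_n(1-\tau) \} \to 1 - \tau$. From the result (3.3), there exists a sequence $\Delta_n \downarrow 0$ such that with probability greater than $1 - \Delta_n$,
\[
\sup_{z \in \mathbb{R}} \left| P\{ \| \hat Z_n^{\xi} \|_I \le z \mid \mathcal{D}_n \} - P\{ \| Z_n^G \|_I \le z \} \right| \le \Delta_n, \tag{A.7}
\]
and let $\mathcal{E}_n$ be the event that (A.7) holds. Taking $\Delta_n \downarrow 0$ more slowly if necessary, we have that $\sup_{z \in \mathbb{R}} | P\{ \| \hat Z_n^{\dagger} \|_I \le z \} - P\{ \| Z_n^G \|_I \le z \} | \le \Delta_n$. Recall that $c_n^G(1-\tau)$ is the $(1-\tau)$-quantile of $\| Z_n^G \|_I$, and observe that on the event $\mathcal{E}_n$,
\[
P\{ \| \hat Z_n^{\xi} \|_I \le c_n^G(1 - \tau + \Delta_n) \mid \mathcal{D}_n \} \ge P\{ \| Z_n^G \|_I \le c_n^G(1 - \tau + \Delta_n) \} - \Delta_n = 1 - \tau,
\]
where the last equality holds since the distribution function of $\| Z_n^G \|_I$ is continuous (which follows from Lemma A.3). Hence on the event $\mathcal{E}_n$, it holds that $\hat c_n(1-\tau) \le c_n^G(1 - \tau + \Delta_n)$, so that
\[
P\{ \| \hat Z_n^{\dagger} \|_I \le \hat c_n(1-\tau) \} \le P\{ \| \hat Z_n^{\dagger} \|_I \le c_n^G(1 - \tau + \Delta_n) \} + \Delta_n \le P\{ \| Z_n^G \|_I \le c_n^G(1 - \tau + \Delta_n) \} + 2\Delta_n = 1 - \tau + 3\Delta_n.
\]
Likewise, we have $P\{ \| \hat Z_n^{\dagger} \|_I \le \hat c_n(1-\tau) \} \ge 1 - \tau - 3\Delta_n$, which shows that $P\{ \| \hat Z_n^{\dagger} \|_I \le \hat c_n(1-\tau) \} \to 1 - \tau$ and thus (3.4) holds.

Finally, the Borell-Sudakov-Tsirelson inequality (van der Vaart and Wellner, 1996, Lemma A.2.2) yields that
\[
c_n^G(1 - \tau + \Delta_n) \le E[ \| Z_n^G \|_I ] + \sqrt{ 2 \log(1/(\tau - \Delta_n)) } \lesssim \sqrt{\log(1/h_n)},
\]
which implies that $\hat c_n(1-\tau) = O_P( \sqrt{\log(1/h_n)} )$. Furthermore,
\[
\sup_{x \in I} \hat s_n(x) \le \sup_{x \in I} \frac{\hat s_n(x)}{s_n(x)} \sup_{x \in I} s_n(x) = O_P( h_n^{-\alpha+1/2} ).
\]
Therefore, the supremum width of the band $\hat{\mathcal{C}}_{1-\tau}$ is
\[
2 \sup_{x \in I} \frac{ \hat s_n(x) }{ \hat f_X(x) \sqrt{n}\, h_n }\, \hat c_n(1-\tau) = O_P\{ h_n^{-\alpha} (n h_n)^{-1/2} \sqrt{\log(1/h_n)} \}.
\]
This completes the proof.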
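The quantities appearing in this proof (the studentized multiplier process, its conditional quantile $\hat c_n(1-\tau)$, and the band half-width $\hat s_n(x) \hat c_n(1-\tau) / \{ \hat f_X(x) \sqrt{n}\, h_n \}$) can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not those of the paper: the measurement error is taken to be Laplace with a known scale `b`, so that $1/\varphi_\varepsilon(t) = 1 + b^2 t^2$ and no auxiliary sample is needed; $K$ is the Gaussian kernel, whose Fourier transform is not compactly supported, chosen only because it gives the closed form $K_n(u) = K(u) - (b/h_n)^2 K''(u)$; and the bandwidth is supplied by the user rather than chosen by the data-driven rule. All function and variable names are hypothetical.

```python
import numpy as np


def deconv_kernel(u, h, b):
    """K_n(u) = K(u) - (b/h)^2 K''(u) for Gaussian K and Laplace(0, b) error (assumed known)."""
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 + (b / h) ** 2 * (1.0 - u ** 2))


def eiv_confidence_band(y, w, grid, h, b, tau=0.1, n_boot=500, seed=0):
    """Return (g_hat, lower, upper, c_hat) on `grid` using the multiplier bootstrap."""
    rng = np.random.default_rng(seed)
    n = y.size
    k = deconv_kernel((grid[:, None] - w[None, :]) / h, h, b)  # shape (G, n)
    f_hat = k.sum(axis=1) / (n * h)          # deconvolution density estimate of X
    mu_hat = (k * y).sum(axis=1) / (n * h)   # estimate of g * f_X
    g_hat = mu_hat / f_hat
    resid_k = (y[None, :] - g_hat[:, None]) * k
    s_hat = np.sqrt((resid_k ** 2).mean(axis=1))
    xi = rng.standard_normal((n_boot, n))    # multipliers, independent of the data
    z_xi = (xi @ resid_k.T) / (s_hat * np.sqrt(n))  # shape (n_boot, G)
    # Conditional (1 - tau)-quantile of the sup of the multiplier process over the grid.
    c_hat = np.quantile(np.abs(z_xi).max(axis=1), 1.0 - tau)
    half_width = s_hat * c_hat / (f_hat * np.sqrt(n) * h)
    return g_hat, g_hat - half_width, g_hat + half_width, c_hat


# Synthetic illustration: g(x) = sin(x), X ~ N(0, 1), Laplace measurement error.
rng = np.random.default_rng(1)
n, b = 2000, 0.2
x = rng.standard_normal(n)
w = x + rng.laplace(scale=b, size=n)
y = np.sin(x) + 0.25 * rng.standard_normal(n)
grid = np.linspace(-1.0, 1.0, 41)
g_hat, lower, upper, c_hat = eiv_confidence_band(y, w, grid, h=0.4, b=b)
```

The supremum is approximated by a maximum over a finite grid, and the half-width formula presumes $\hat f_X(x) > 0$ on the grid, which should be checked in practice because deconvolution density estimates can take negative values.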

A.4. Proof of Theorem 7.1. The proof is completely analogous to those of Theorems 3.1 and 3.2, given the facts that $g(y, x) = E[1(Y \le y) \mid X = x]$ and the function class $\{ 1(\cdot \le y) : y \in J \}$ is a VC class. Hence we omit the detail for brevity.
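The identity $g(y, x) = E[1(Y \le y) \mid X = x]$ means that the point estimator of the conditional distribution function is the regression estimator with $Y_j$ replaced by $1(Y_j \le y)$. A minimal Python sketch of this substitution follows, under the same illustrative assumptions as before, none of which come from the paper: Laplace measurement error with known scale `b`, a Gaussian kernel $K$, a user-supplied bandwidth, and hypothetical function names.

```python
import numpy as np


def deconv_kernel(u, h, b):
    """K_n(u) = K(u) - (b/h)^2 K''(u) for Gaussian K and Laplace(0, b) error (assumed known)."""
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 + (b / h) ** 2 * (1.0 - u ** 2))


def conditional_cdf_estimate(y_obs, w_obs, y_grid, x_grid, h, b):
    """Estimate g(y, x) = P(Y <= y | X = x) on y_grid x x_grid; returns shape (len(y_grid), len(x_grid))."""
    k = deconv_kernel((x_grid[:, None] - w_obs[None, :]) / h, h, b)  # (Gx, n)
    indicators = (y_obs[None, :] <= y_grid[:, None]).astype(float)   # (Gy, n)
    numerator = indicators @ k.T   # sum_j 1(Y_j <= y) K_n((x - W_j)/h)
    denominator = k.sum(axis=1)    # sum_j K_n((x - W_j)/h)
    return numerator / denominator


# Synthetic illustration with a location model Y = X + noise.
rng = np.random.default_rng(2)
n, b = 2000, 0.2
x = rng.standard_normal(n)
w = x + rng.laplace(scale=b, size=n)
y = x + 0.5 * rng.standard_normal(n)
y_grid = np.array([-1e9, -0.5, 0.0, 0.5, 1e9])
x_grid = np.linspace(-1.0, 1.0, 21)
cdf_hat = conditional_cdf_estimate(y, w, y_grid, x_grid, h=0.4, b=b)
```

Because the deconvolution kernel takes negative values, the estimate need not be monotone in $y$ or confined to $[0, 1]$ in finite samples.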

Appendix B. Tables for Section 5

Table 1. Simulated uniform coverage probabilities of $g(x) = x$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}_{n=1}^{\infty}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations. (Columns: model, nominal probability, sample size $n$, and three alternative sequences $\{c_n\}_{n=1}^{\infty}$ under each of $\sigma_X = 2$ and $\sigma_X = 4$.)

Table 2. Simulated uniform coverage probabilities of $g(x) = x^2$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}_{n=1}^{\infty}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations. (Columns: model, nominal probability, sample size $n$, and three alternative sequences $\{c_n\}_{n=1}^{\infty}$ under each of $\sigma_X = 2$ and $\sigma_X = 4$.)

Table 3. Simulated uniform coverage probabilities of $g(x) = x^3$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}_{n=1}^{\infty}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations. (Columns: model, nominal probability, sample size $n$, and three alternative sequences $\{c_n\}_{n=1}^{\infty}$ under each of $\sigma_X = 2$ and $\sigma_X = 4$.)

Table 4. Simulated uniform coverage probabilities of $g(x) = \sin(x)$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}_{n=1}^{\infty}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations. (Columns: model, nominal probability, sample size $n$, and three alternative sequences $\{c_n\}_{n=1}^{\infty}$ under each of $\sigma_X = 2$ and $\sigma_X = 4$.)

Table 5. Simulated uniform coverage probabilities of $g(x) = \cos(x)$ by estimated confidence bands in $I = [-\sigma_X, \sigma_X]$ under normally distributed $X$ with $\sigma_X \in \{2, 4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}_{n=1}^{\infty}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations. (Columns: model, nominal probability, sample size $n$, and three alternative sequences $\{c_n\}_{n=1}^{\infty}$ under each of $\sigma_X = 2$ and $\sigma_X = 4$.)

Appendix C. Figures for Section 6

Figure 1. Estimates and confidence bands for the nonparametric regression of medical expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m$^2$. The vertical axes measure the medical expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands.

Figure 2. Estimates and confidence bands for the nonparametric regression of prescription expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m$^2$. The vertical axes measure the prescription expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded by gray-scaled colors indicate 80%, 90%, and 95% confidence bands.


Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

MAS111 Convergence and Continuity

MAS111 Convergence and Continuity MAS Covergece ad Cotiuity Key Objectives At the ed of the course, studets should kow the followig topics ad be able to apply the basic priciples ad theorems therei to solvig various problems cocerig covergece

More information

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State Bayesia Cotrol Charts for the Two-parameter Expoetial Distributio if the Locatio Parameter Ca Take o Ay Value Betwee Mius Iity ad Plus Iity R. va Zyl, A.J. va der Merwe 2 Quitiles Iteratioal, ruaavz@gmail.com

More information

Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig

More information

Local Polynomial Regression

Local Polynomial Regression Local Polyomial Regressio Joh Hughes October 2, 2013 Recall that the oparametric regressio model is Y i f x i ) + ε i, where f is the regressio fuctio ad the ε i are errors such that Eε i 0. The Nadaraya-Watso

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties

More information

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract Goodess-Of-Fit For The Geeralized Expoetial Distributio By Amal S. Hassa stitute of Statistical Studies & Research Cairo Uiversity Abstract Recetly a ew distributio called geeralized expoetial or expoetiated

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et

More information

A note on self-normalized Dickey-Fuller test for unit root in autoregressive time series with GARCH errors

A note on self-normalized Dickey-Fuller test for unit root in autoregressive time series with GARCH errors Appl. Math. J. Chiese Uiv. 008, 3(): 97-0 A ote o self-ormalized Dickey-Fuller test for uit root i autoregressive time series with GARCH errors YANG Xiao-rog ZHANG Li-xi Abstract. I this article, the uit

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

1 Covariance Estimation

1 Covariance Estimation Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan Deviatio of the Variaces of Classical Estimators ad Negative Iteger Momet Estimator from Miimum Variace Boud with Referece to Maxwell Distributio G. R. Pasha Departmet of Statistics Bahauddi Zakariya Uiversity

More information

Sequences and Limits

Sequences and Limits Chapter Sequeces ad Limits Let { a } be a sequece of real or complex umbers A ecessary ad sufficiet coditio for the sequece to coverge is that for ay ɛ > 0 there exists a iteger N > 0 such that a p a q

More information

arxiv: v1 [math.pr] 13 Oct 2011

arxiv: v1 [math.pr] 13 Oct 2011 A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman: Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

2.2. Central limit theorem.

2.2. Central limit theorem. 36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral Lideberg-Feller CLT, it is most stadard

More information

Asymptotic distribution of the first-stage F-statistic under weak IVs

Asymptotic distribution of the first-stage F-statistic under weak IVs November 6 Eco 59A WEAK INSTRUMENTS III Testig for Weak Istrumets From the results discussed i Weak Istrumets II we kow that at least i the case of a sigle edogeous regressor there are weak-idetificatio-robust

More information

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference

A constructive analysis of convex-valued demand correspondence for weakly uniformly rotund and monotonic preference MPRA Muich Persoal RePEc Archive A costructive aalysis of covex-valued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.ui-mueche.de/55889/

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

Law of the sum of Bernoulli random variables

Law of the sum of Bernoulli random variables Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible

More information

Solution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1

Solution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1 Solutio Sagchul Lee October 7, 017 1 Solutios of Homework 1 Problem 1.1 Let Ω,F,P) be a probability space. Show that if {A : N} F such that A := lim A exists, the PA) = lim PA ). Proof. Usig the cotiuity

More information

Basis for simulation techniques

Basis for simulation techniques Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios

More information

A Proof of Birkhoff s Ergodic Theorem

A Proof of Birkhoff s Ergodic Theorem A Proof of Birkhoff s Ergodic Theorem Joseph Hora September 2, 205 Itroductio I Fall 203, I was learig the basics of ergodic theory, ad I came across this theorem. Oe of my supervisors, Athoy Quas, showed

More information

A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality

A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality A goodess-of-fit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State,

More information

Stat 200 -Testing Summary Page 1

Stat 200 -Testing Summary Page 1 Stat 00 -Testig Summary Page 1 Mathematicias are like Frechme; whatever you say to them, they traslate it ito their ow laguage ad forthwith it is somethig etirely differet Goethe 1 Large Sample Cofidece

More information

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

ECE 901 Lecture 4: Estimation of Lipschitz smooth functions

ECE 901 Lecture 4: Estimation of Lipschitz smooth functions ECE 9 Lecture 4: Estiatio of Lipschitz sooth fuctios R. Nowak 5/7/29 Cosider the followig settig. Let Y f (X) + W, where X is a rado variable (r.v.) o X [, ], W is a r.v. o Y R, idepedet of X ad satisfyig

More information

Bootstrap Intervals of the Parameters of Lognormal Distribution Using Power Rule Model and Accelerated Life Tests

Bootstrap Intervals of the Parameters of Lognormal Distribution Using Power Rule Model and Accelerated Life Tests Joural of Moder Applied Statistical Methods Volume 5 Issue Article --5 Bootstrap Itervals of the Parameters of Logormal Distributio Usig Power Rule Model ad Accelerated Life Tests Mohammed Al-Ha Ebrahem

More information

ON CONVERGENCE OF BASIC HYPERGEOMETRIC SERIES. 1. Introduction Basic hypergeometric series (cf. [GR]) with the base q is defined by

ON CONVERGENCE OF BASIC HYPERGEOMETRIC SERIES. 1. Introduction Basic hypergeometric series (cf. [GR]) with the base q is defined by ON CONVERGENCE OF BASIC HYPERGEOMETRIC SERIES TOSHIO OSHIMA Abstract. We examie the covergece of q-hypergeometric series whe q =. We give a coditio so that the radius of the covergece is positive ad get

More information

Sequences of Definite Integrals, Factorials and Double Factorials

Sequences of Definite Integrals, Factorials and Double Factorials 47 6 Joural of Iteger Sequeces, Vol. 8 (5), Article 5.4.6 Sequeces of Defiite Itegrals, Factorials ad Double Factorials Thierry Daa-Picard Departmet of Applied Mathematics Jerusalem College of Techology

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

Introduction to Probability. Ariel Yadin

Introduction to Probability. Ariel Yadin Itroductio to robability Ariel Yadi Lecture 2 *** Ja. 7 ***. Covergece of Radom Variables As i the case of sequeces of umbers, we would like to talk about covergece of radom variables. There are may ways

More information

Probability and Statistics

Probability and Statistics ICME Refresher Course: robability ad Statistics Staford Uiversity robability ad Statistics Luyag Che September 20, 2016 1 Basic robability Theory 11 robability Spaces A probability space is a triple (Ω,

More information

DISTRIBUTION LAW Okunev I.V.

DISTRIBUTION LAW Okunev I.V. 1 DISTRIBUTION LAW Okuev I.V. Distributio law belogs to a umber of the most complicated theoretical laws of mathematics. But it is also a very importat practical law. Nothig ca help uderstad complicated

More information

Lecture 1 Probability and Statistics

Lecture 1 Probability and Statistics Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

Week 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed

Week 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed Wee 0 A Itroductio to Wavelet regressio. De itio: Wavelet is a fuctio such that f j= j ; j; Zg is a orthoormal basis for L (R). This fuctio is called mother wavelet, which ca be ofte costructed from father

More information

Inference under shape restrictions

Inference under shape restrictions Iferece uder shape restrictios Joachim Freyberger Brado Reeves July 3, 207 Abstract We propose a uiformly valid iferece method for a ukow fuctio or parameter vector satisfyig certai shape restrictios.

More information

MA131 - Analysis 1. Workbook 2 Sequences I

MA131 - Analysis 1. Workbook 2 Sequences I MA3 - Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................

More information

Asymptotic Results for the Linear Regression Model

Asymptotic Results for the Linear Regression Model Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is

More information

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions We have previously leared: KLMED8004 Medical statistics Part I, autum 00 How kow probability distributios (e.g. biomial distributio, ormal distributio) with kow populatio parameters (mea, variace) ca give

More information

Power Comparison of Some Goodness-of-fit Tests

Power Comparison of Some Goodness-of-fit Tests Florida Iteratioal Uiversity FIU Digital Commos FIU Electroic Theses ad Dissertatios Uiversity Graduate School 7-6-2016 Power Compariso of Some Goodess-of-fit Tests Tiayi Liu tliu019@fiu.edu DOI: 10.25148/etd.FIDC000750

More information

STATISTICAL INFERENCE

STATISTICAL INFERENCE STATISTICAL INFERENCE POPULATION AND SAMPLE Populatio = all elemets of iterest Characterized by a distributio F with some parameter θ Sample = the data X 1,..., X, selected subset of the populatio = sample

More information

Posted-Price, Sealed-Bid Auctions

Posted-Price, Sealed-Bid Auctions Posted-Price, Sealed-Bid Auctios Professors Greewald ad Oyakawa 207-02-08 We itroduce the posted-price, sealed-bid auctio. This auctio format itroduces the idea of approximatios. We describe how well this

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Gamma Distribution and Gamma Approximation

Gamma Distribution and Gamma Approximation Gamma Distributio ad Gamma Approimatio Xiaomig Zeg a Fuhua (Frak Cheg b a Xiame Uiversity, Xiame 365, Chia mzeg@jigia.mu.edu.c b Uiversity of Ketucky, Leigto, Ketucky 456-46, USA cheg@cs.uky.edu Abstract

More information

A) is empty. B) is a finite set. C) can be a countably infinite set. D) can be an uncountable set.

A) is empty. B) is a finite set. C) can be a countably infinite set. D) can be an uncountable set. M.A./M.Sc. (Mathematics) Etrace Examiatio 016-17 Max Time: hours Max Marks: 150 Istructios: There are 50 questios. Every questio has four choices of which exactly oe is correct. For correct aswer, 3 marks

More information

FUNDAMENTALS OF REAL ANALYSIS by

FUNDAMENTALS OF REAL ANALYSIS by FUNDAMENTALS OF REAL ANALYSIS by Doğa Çömez Backgroud: All of Math 450/1 material. Namely: basic set theory, relatios ad PMI, structure of N, Z, Q ad R, basic properties of (cotiuous ad differetiable)

More information

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation Chapter Output Aalysis for a Sigle Model Baks, Carso, Nelso & Nicol Discrete-Evet System Simulatio Error Estimatio If {,, } are ot statistically idepedet, the S / is a biased estimator of the true variace.

More information

Probabilistic and Average Linear Widths in L -Norm with Respect to r-fold Wiener Measure

Probabilistic and Average Linear Widths in L -Norm with Respect to r-fold Wiener Measure joural of approximatio theory 84, 3140 (1996) Article No. 0003 Probabilistic ad Average Liear Widths i L -Norm with Respect to r-fold Wieer Measure V. E. Maiorov Departmet of Mathematics, Techio, Haifa,

More information

Solutions: Homework 3

Solutions: Homework 3 Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe

More information

Ma 530 Infinite Series I

Ma 530 Infinite Series I Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li

More information

Some Properties of the Exact and Score Methods for Binomial Proportion and Sample Size Calculation

Some Properties of the Exact and Score Methods for Binomial Proportion and Sample Size Calculation Some Properties of the Exact ad Score Methods for Biomial Proportio ad Sample Size Calculatio K. KRISHNAMOORTHY AND JIE PENG Departmet of Mathematics, Uiversity of Louisiaa at Lafayette Lafayette, LA 70504-1010,

More information

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

A Risk Comparison of Ordinary Least Squares vs Ridge Regression Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer

More information

SRC Technical Note June 17, Tight Thresholds for The Pure Literal Rule. Michael Mitzenmacher. d i g i t a l

SRC Technical Note June 17, Tight Thresholds for The Pure Literal Rule. Michael Mitzenmacher. d i g i t a l SRC Techical Note 1997-011 Jue 17, 1997 Tight Thresholds for The Pure Literal Rule Michael Mitzemacher d i g i t a l Systems Research Ceter 130 Lytto Aveue Palo Alto, Califoria 94301 http://www.research.digital.com/src/

More information

The Gamma function Michael Taylor. Abstract. This material is excerpted from 18 and Appendix J of [T].

The Gamma function Michael Taylor. Abstract. This material is excerpted from 18 and Appendix J of [T]. The Gamma fuctio Michael Taylor Abstract. This material is excerpted from 8 ad Appedix J of [T]. The Gamma fuctio has bee previewed i 5.7 5.8, arisig i the computatio of a atural Laplace trasform: 8. ft

More information

SAMPLING LIPSCHITZ CONTINUOUS DENSITIES. 1. Introduction

SAMPLING LIPSCHITZ CONTINUOUS DENSITIES. 1. Introduction SAMPLING LIPSCHITZ CONTINUOUS DENSITIES OLIVIER BINETTE Abstract. A simple ad efficiet algorithm for geeratig radom variates from the class of Lipschitz cotiuous desities is described. A MatLab implemetatio

More information

ON POINTWISE BINOMIAL APPROXIMATION

ON POINTWISE BINOMIAL APPROXIMATION Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 57-66 ON POINTWISE BINOMIAL APPROXIMATION BY w-functions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece

More information

B Supplemental Notes 2 Hypergeometric, Binomial, Poisson and Multinomial Random Variables and Borel Sets

B Supplemental Notes 2 Hypergeometric, Binomial, Poisson and Multinomial Random Variables and Borel Sets B671-672 Supplemetal otes 2 Hypergeometric, Biomial, Poisso ad Multiomial Radom Variables ad Borel Sets 1 Biomial Approximatio to the Hypergeometric Recall that the Hypergeometric istributio is fx = x

More information

SOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS

SOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS SOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS ALEXANDRE BELLONI, VICTOR CHERNOZHUKOV, DENIS CHETVERIKOV, AND KENGO KATO Abstract. I ecoometric applicatios it is commo

More information