arXiv: v3 [math.ST] 24 Feb 2017

UNIFORM CONFIDENCE BANDS FOR NONPARAMETRIC ERRORS-IN-VARIABLES REGRESSION

KENGO KATO AND YUYA SASAKI

Abstract. This paper develops a method to construct uniform confidence bands for a nonparametric regression function where a predictor variable is subject to a measurement error. We allow the distribution of the measurement error to be unknown, but assume that there is an independent sample from the measurement error distribution. The sample from the measurement error distribution need not be independent from the sample on response and predictor variables. The availability of a sample from the measurement error distribution is satisfied if, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed, are available. The proposed confidence band builds on the deconvolution kernel estimation and a novel application of the multiplier (or wild) bootstrap method. We establish asymptotic validity of the proposed confidence band under ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric errors-in-variables regression. We also propose a novel data-driven method to choose a bandwidth, and conduct simulation studies to verify the finite sample performance of the proposed confidence band. Applying our method to a combination of two empirical data sets, we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI. Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

Date: First arXiv version: February 11. This version: February 27. K. Kato is supported by Grant-in-Aid for Scientific Research (C) (15K03392) from the JSPS. We would like to thank Tatsushi Oka and Holger Dette for useful comments and discussions.

1. Introduction

Consider the nonparametric errors-in-variables (EIV) regression model with classical measurement error
$$Y = g(X) + U,\quad E[U \mid X, \varepsilon] = 0,\qquad W = X + \varepsilon, \tag{1.1}$$
where each of $Y$, $X$, $U$, $W$, and $\varepsilon$ is a univariate random variable, and $\varepsilon$ is independent from $X$. We observe $(Y, W)$, but observe neither $X$ nor $\varepsilon$. Furthermore, we assume that the distribution of $\varepsilon$ is unknown. The variable $X$ is a latent predictor variable, while $\varepsilon$ is a measurement error. Of interest are estimation of and inference on the regression function $g(x) = E[Y \mid X = x]$. In particular, we are interested in constructing uniform confidence bands for $g$. Confidence bands provide a simple graphical description of the extent to which a nonparametric estimator varies at design points, thereby quantifying uncertainties of the nonparametric estimator. However, construction of confidence bands tends to be challenging, especially for complex nonparametric models.[1] Indeed, despite the rich literature on consistent estimation of nonparametric EIV regression, the literature on pointwise or uniform confidence bands for nonparametric EIV regression is limited (see below for a literature review), likely because of its complexity. Even pointwise inference on $g$ under the assumption that the measurement error distribution is known is considered by experts to be difficult.[2] This is because, as discussed in Delaigle et al. (2015): 1) the asymptotic variance of the deconvolution kernel estimator of $g$ is nontrivial to estimate, and so inference based on limiting distributions is difficult to implement; and 2) it is not straightforward to devise a way to implement the bootstrap for inference on $g$ due to the unobservability of $X$ in data. With all these challenges recognized in the literature, the present paper attempts to solve an even more challenging problem: constructing uniform confidence bands for the regression function $g$ without assuming that the measurement error distribution is known.

To deal with the unknown measurement error distribution, we assume that, in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ from the distribution of $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution, where $m = m_n \to \infty$ as $n \to \infty$. (The auxiliary sample $\{\eta_1, \dots, \eta_m\}$ need not be independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$.) For example, in natural science, measurement errors are often due to measuring devices; in such cases, one can obtain preliminary calibration measures in the absence of signal, which produce a sample from the measurement error distribution; see the introduction of Comte and Lacour (2011), for example.
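The data structure just described can be made concrete with a minimal Python sketch (standard library only). It draws a main sample $(Y_j, W_j)$ from model (1.1) together with an auxiliary error sample $\eta_1, \dots, \eta_m$. Everything about the design is a hypothetical choice for illustration and is not taken from the paper: $X \sim N(0,1)$, $U \sim N(0, \sigma_u^2)$, Laplace measurement errors (an ordinary smooth case), and the helper names `laplace_draw` and `simulate_eiv`. Only $(Y, W)$ and $\eta$ are returned, which mirrors the fact that $X$ and $\varepsilon$ are never observed.

```python
import math
import random
import statistics


def laplace_draw(rng, scale):
    # Laplace(0, scale) as scale times the difference of two independent Exp(1) draws
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def simulate_eiv(n, m, g, sigma_u=0.5, eps_scale=0.3, seed=0):
    """Main sample (Y_j, W_j), j = 1..n, from model (1.1) plus auxiliary eta_1..eta_m ~ f_eps.

    Hypothetical design: X ~ N(0, 1); U ~ N(0, sigma_u^2) independent of (X, eps), so that
    E[U | X, eps] = 0; eps ~ Laplace(0, eps_scale) independent of X. X and eps are not returned.
    """
    rng = random.Random(seed)
    ys, ws = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        eps = laplace_draw(rng, eps_scale)
        ys.append(g(x) + rng.gauss(0.0, sigma_u))
        ws.append(x + eps)
    etas = [laplace_draw(rng, eps_scale) for _ in range(m)]
    return ys, ws, etas


if __name__ == "__main__":
    ys, ws, etas = simulate_eiv(5000, 2000, math.sin)
    # Var(W) = Var(X) + Var(eps) = 1 + 2 * 0.3**2 = 1.18 under this design
    print(round(statistics.pvariance(ws), 2))
```

Under this design, $\mathrm{Var}(W) = 1 + 2 \cdot 0.3^2 = 1.18$ and $\mathrm{Var}(\eta) = 0.18$. The sample variances should reproduce these values up to sampling error.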
Other real data scenarios of such additional data availability that are plausible in economics, social sciences, and biomedical sciences include two cases. The first is validation data available for data combination. The second is repeated measurements (panel data) on $X$ with errors, one of which is symmetrically distributed. These patterns of data requirements are often considered in the existing literature on measurement errors that we review below.

Under this setup, we develop a method to construct confidence bands for the regression function $g$. Our method builds on deconvolution kernel estimation (Fan and Truong, 1993) and a novel application of the multiplier (or wild) bootstrap method. Our construction of the multiplier process differs from the standard approach in the error-free case (cf. Neumann and Polzehl, 1998), and is tailored to EIV regression; see Remark 2.1 ahead. Building on nontrivial applications of the probabilistic techniques developed in Chernozhukov et al. (2014a,b, 2016), we establish asymptotic validity of the proposed confidence band, i.e., the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and EIV regression, we focus for a technical reason on the case where the measurement error density is ordinary smooth. That is, the characteristic function of the measurement error distribution decays at most polynomially fast in the tail (cf. Fan, 1991a; Fan and Truong, 1993).

In addition to these contributions, we also propose a novel data-driven method to choose a bandwidth. In the theoretical study, we require the bandwidth to undersmooth the deconvolution kernel estimate, so that the bias is negligible relative to the variance part. Existing data-driven methods for bandwidth selection typically aim at choosing a bandwidth that minimizes the MISE, thereby yielding a non-undersmoothing bandwidth (cf. Delaigle and Hall, 2008). We propose an alternative method for bandwidth selection that aims at yielding an undersmoothing bandwidth. We conduct simulation studies to verify the finite sample performance of the proposed confidence band. The simulation studies show that the proposed confidence band, combined with the proposed bandwidth selection rule, works well. We apply our method to a combination of two data sets, the National Health and Nutrition Examination Survey (NHANES) and the Panel Study of Income Dynamics (PSID). With these data, we draw confidence bands for nonparametric regressions of medical costs on the body mass index (BMI), accounting for measurement errors in BMI.

[1] We refer to Wasserman (2006) and Giné and Nickl (2016) as general references on confidence bands in nonparametric statistical models.
[2] Delaigle et al. (2015), who study pointwise confidence bands for nonparametric EIV regression under the assumption that the measurement error distribution is known, state that "despite their practical importance, to our knowledge confidence bands in nonparametric EIV regression have largely been ignored so far. We show that the problem is particularly complex, much more so than in the standard error-free setting." (Delaigle et al., 2015, p.149)
Finally, we discuss extensions of our results to specification testing, cases with additional error-free regressors, and confidence bands for conditional distribution functions.

In order to locate the present paper in the context of the relevant literature, it is useful to first review measurement error models and deconvolution. For general references, we refer to the books by Fuller (1987), Carroll et al. (2006), Meister (2009) and Horowitz (2009, Chapter 5), and to the surveys by Chen et al. (2011) and Schennach (2016). This literature began with deconvolution kernel density estimation with known error distributions (Carroll and Hall, 1988; Stefanski and Carroll, 1990; Fan, 1991a,b). Estimation with unknown error distributions followed (Diggle and Hall, 1993; Horowitz and Markatou, 1996; Neumann, 1997; Efromovich, 1997; Li and Vuong, 1998; Delaigle et al., 2008; Johannes, 2009; Comte and Lacour, 2011). Diggle and Hall (1993), Neumann (1997), Efromovich (1997), Johannes (2009), and Comte and Lacour (2011) assume the availability of a sample from the measurement error distribution. Horowitz and Markatou (1996) and Delaigle et al. (2008) instead assume repeated measurements (panel data) with symmetrically and identically distributed errors. For repeated measurements (panel data) without symmetry of error distributions, Li and Vuong (1998) propose an alternative density estimator based on Kotlarski's lemma (cf. Kotlarski, 1967; Rao, 1992) that does not require a known error distribution; see also Bonhomme and Robin (2010) and Comte and Kappus (2015) for further developments. Methods to construct confidence bands in deconvolution are developed by Bissantz et al. (2007), Bissantz and Holzmann (2008), van Es and Gugushvili (2008), Lounici and Nickl (2011), and Schmidt-Hieber et al. (2013) for the case of known error distribution, and more recently by Kato and Sasaki (2016) for the case of unknown error distribution.

Similarly to density estimation, the literature on nonparametric EIV regression estimation often takes the deconvolution kernel approach. Fan and Truong (1993) propose to substitute the deconvolution kernel in the Nadaraya–Watson estimator. See also Fan and Masry (1992) for pointwise asymptotic normality, Delaigle and Meister (2007) for extensions to heteroscedastic measurement errors, Delaigle et al. (2009) for local polynomial extensions, and Delaigle et al. (2015) for pointwise inference. These papers focus on the case of known error distribution. Delaigle et al. (2008) estimate the error characteristic function using repeated measurements on $X$ with symmetrically and identically distributed errors, and substitute the estimated error characteristic function into the deconvolution kernel. Schennach (2004) also works with repeated measurements but without assuming symmetry of error distributions, and proposes an alternative approach to estimate the regression function based on Kotlarski's lemma. See also Carroll et al. (1999), Schennach et al. (2012), Schennach and Hu (2013), and Hu and Sasaki (2015). Our method of inference is based on deconvolution kernel estimation.
We mainly focus on three cases: (i) a sample drawn from the error distribution is available; (ii) validation data are available for data combination; and (iii) repeated measurements with errors, one of which is symmetrically distributed, are available. For (ii), data combination with validation data, our model shares similarities (albeit under different assumptions) with nonparametric instrumental variables (NPIV) regression. For NPIV regression, Horowitz and Lee (2012), Chen and Christensen (2015) and Babii (2016) develop methods to construct confidence bands, as we do for nonparametric EIV regression.

We note the following two references as particularly relevant benchmarks for identifying our contributions. One reference is Schennach (2004), which derives pointwise asymptotic normality for a nonparametric EIV regression estimator different from ours, under an unknown error distribution. Relative to this existing result, our contributions are fourfold. First, we provide a method of uniform inference as opposed to a pointwise one. Second, we propose a method of bandwidth selection for valid inference. Third, the existing result left aside the issue of variance estimation and thus is not readily applicable in practice; we provide a bootstrap method for ease of practical implementation. Fourth, we devise lower-level assumptions which are easier to verify with concrete examples of distributions and conditional moment functions. The other reference is Delaigle et al. (2015), which suggests a method of pointwise inference via the bootstrap for nonparametric EIV regression with known error distribution. Relative to this existing result, our contributions are threefold. First, our method allows for an unknown error distribution. Second, we provide a method of uniform inference as opposed to a pointwise one. Third, we provide formal theory supporting the asymptotic validity of our bootstrap method. Delaigle et al. (2015) mention how to modify their methodology to the case where the measurement error distribution is unknown, and to the construction of uniform confidence bands. However, their theoretical results do not formally cover those cases.

Finally, Birke et al. (2010) and Proksch et al. (2015) obtain confidence bands for inverse regression with fixed equidistant designs; the fixed equidistant design assumption is substantial in their setups and analyses. Inverse regression is related to but different from our EIV regression (1.1), and our setup does not allow fixed equidistant designs because of measurement errors. The methodologies and the proof strategies also differ; for example, both of those papers rely on Gumbel approximations for validity of the confidence bands, while we do not. Importantly, to the best of our knowledge, none of the existing results covers uniform confidence bands for EIV regression (1.1), even under the simpler setting where the measurement error distribution is known. The present paper fills this important void.

The rest of the paper is organized as follows. In Section 2, we informally present our methodology to construct uniform confidence bands for $g$. In Section 3, we present asymptotic validity of the proposed confidence band under suitable regularity conditions. In Section 4, we propose a practical method to choose the bandwidth. In Section 5, we conduct simulation studies to verify the finite sample performance of the proposed confidence band. In Section 6, we apply the proposed method to a combination of two empirical data sets.
In Section 7, we discuss extensions of our results to specification testing of the conditional mean function, to cases with additional regressors without measurement errors, and to the construction of confidence bands for the conditional distribution function. Section 8 concludes. All the proofs are deferred to the Appendix.

1.1. Notation. For a nonempty set $T$ and a (complex-valued) function $f$ on $T$, we use the notation $\|f\|_T = \sup_{t \in T}|f(t)|$. Let $\ell^\infty(T)$ denote the Banach space of all bounded real-valued functions on $T$ with norm $\|\cdot\|_T$. The Fourier transform of an integrable function $f$ on $\mathbb{R}$ is defined by $\varphi_f(t) = \int_{\mathbb{R}} e^{itx}f(x)\,dx$, $t \in \mathbb{R}$, where $i = \sqrt{-1}$ denotes the imaginary unit throughout the paper. We refer to Folland (1999) as a basic reference on Fourier analysis. For any positive sequences $a_n$ and $b_n$, we write $a_n \sim b_n$ if $a_n/b_n$ is bounded and bounded away from zero. For any $a, b \in \mathbb{R}$, let $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For $a \in \mathbb{R}$ and $b > 0$, we use the shorthand notation $[a \pm b] = [a - b, a + b]$. Let $\stackrel{d}{=}$ denote equality in distribution.
2. Methodology

In this section, we informally present our methodology to construct confidence bands for $g$. The formal analysis of our confidence bands will be carried out in the next section. We will also discuss some examples of situations where an auxiliary sample from the measurement error distribution is available.

2.1. Deconvolution kernel estimation. We first introduce a deconvolution kernel method to estimate $f_X$ and $g$ under the assumption that the distribution of $\varepsilon$ is known. Let $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ be an independent sample from the distribution of $(Y, W)$. In this paper, we assume that the densities of $X$ and $\varepsilon$ exist and are denoted by $f_X$ and $f_\varepsilon$, respectively. Let $\varphi_W$, $\varphi_X$, and $\varphi_\varepsilon$ denote the characteristic functions of $W$, $X$, and $\varepsilon$, respectively. By the independence between $X$ and $\varepsilon$, the density of $W$ exists and is given by the convolution of the densities of $X$ and $\varepsilon$, namely,
$$f_W(w) = (f_X * f_\varepsilon)(w) = \int_{\mathbb{R}} f_X(w - x)f_\varepsilon(x)\,dx,\quad w \in \mathbb{R},$$
where $*$ denotes convolution. This in turn implies that the characteristic function of $W$ is the product of those of $X$ and $\varepsilon$, namely, $\varphi_W(t) = \varphi_X(t)\varphi_\varepsilon(t)$, $t \in \mathbb{R}$. Suppose that $\varphi_\varepsilon$ is nonvanishing on $\mathbb{R}$ and that $\varphi_X$ is integrable on $\mathbb{R}$ with respect to the Lebesgue measure (we hereafter omit "with respect to the Lebesgue measure"). Then the Fourier inversion formula yields
$$f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t)\,dt = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_W(t)}{\varphi_\varepsilon(t)}\,dt,\quad x \in \mathbb{R}. \tag{2.1}$$
The expression (2.1) leads to a method to estimate $f_X$. However, simply replacing $\varphi_W$ by the empirical characteristic function of $W$, namely, $\hat\varphi_W(t) = n^{-1}\sum_{j=1}^n e^{itW_j}$, $t \in \mathbb{R}$, does not work. Specifically, the function $t \mapsto e^{-itx}\hat\varphi_W(t)/\varphi_\varepsilon(t)$ is not integrable on $\mathbb{R}$. The reason is that $|\varphi_\varepsilon(t)| \to 0$ as $|t| \to \infty$ by the Riemann–Lebesgue lemma. Meanwhile, $\hat\varphi_W$ is the characteristic function of a discrete distribution (the empirical distribution), so $\limsup_{|t| \to \infty}|\hat\varphi_W(t)| = 1$ (cf. Sato, 1999, Proposition 27.28). A standard approach to this problem is to use a kernel function to restrict the integration region in (2.1) to a compact interval.
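A short numerical check of the two facts used above follows, as a sketch under assumed distributions: $X \sim N(0,1)$ and $\varepsilon \sim$ Laplace$(0, b)$ with $b = 0.3$. The helper names are hypothetical. Part (a) shows that the empirical characteristic function of $W$ tracks $\varphi_X\varphi_\varepsilon$ at moderate $t$. Part (b) shows that dividing $\hat\varphi_W$ by $\varphi_\varepsilon$ in the tail gives values of order $n^{-1/2}(1 + b^2t^2)$, even though the target $\varphi_X(t)$ is numerically zero there. This is why the integral in (2.1) cannot be taken over all of $\mathbb{R}$ with $\hat\varphi_W$ plugged in.

```python
import cmath
import math
import random


def ecf(sample, t):
    """Empirical characteristic function (1/n) * sum_j exp(i t Z_j)."""
    return sum(cmath.exp(1j * t * z) for z in sample) / len(sample)


def laplace_cf(t, b):
    """Characteristic function of Laplace(0, b): 1 / (1 + b^2 t^2)."""
    return 1.0 / (1.0 + (b * t) ** 2)


def normal_cf(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-0.5 * t * t)


def simulate_w(n, b, seed=0):
    """Hypothetical design: W = X + eps with X ~ N(0, 1) and eps ~ Laplace(0, b) independent."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + b * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for _ in range(n)]


b = 0.3
ws = simulate_w(10000, b)

# (a) phi_W = phi_X * phi_eps: compare the ECF of W with the product at moderate t.
for t in (0.5, 1.0, 2.0):
    print(t, abs(ecf(ws, t) - normal_cf(t) * laplace_cf(t, b)))

# (b) In the tail, |phihat_W(t)| / phi_eps(t) is sampling noise of order n^{-1/2}
# amplified by 1 + b^2 t^2, while the target phi_X(t) is numerically zero.
tail = [20.0 + k for k in range(41)]
max_ratio = max(abs(ecf(ws, t)) / laplace_cf(t, b) for t in tail)
print(max_ratio, normal_cf(20.0))
```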
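Let $K: \mathbb{R} \to \mathbb{R}$ be a kernel function such that $K$ is integrable on $\mathbb{R}$, $\int_{\mathbb{R}} K(x)\,dx = 1$, and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$ (i.e., $\varphi_K(t) = 0$ for all $|t| > 1$). When $f_\varepsilon$ is known, the deconvolution kernel density estimator of $f_X$ is given by
$$\hat f^*_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\hat\varphi_W(t)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt.$$
This estimator was first considered by Carroll and Hall (1988) and Stefanski and Carroll (1990). Rates of convergence and pointwise asymptotic normality of $\hat f^*_X$ are studied in Fan (1991a,b), among others. Alternatively, by a change of variables, we may rewrite $\hat f^*_X$ as
$$\hat f^*_X(x) = \frac{1}{nh_n}\sum_{j=1}^n K_n\big((x - W_j)/h_n\big), \tag{2.2}$$
where the function $K_n$, called the deconvolution kernel, is defined by
$$K_n(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\varphi_\varepsilon(t/h_n)}\,dt.$$
Note that $K_n$ is real-valued since
$$\overline{K_n(x)} = \frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\frac{\overline{\varphi_K(t)}}{\overline{\varphi_\varepsilon(t/h_n)}}\,dt = \frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\frac{\varphi_K(-t)}{\varphi_\varepsilon(-t/h_n)}\,dt = K_n(x),$$
where $\bar z$ denotes the complex conjugate of a complex number $z$, and the last equality follows from the change of variables $t \mapsto -t$. The second expression (2.2) resembles a standard kernel density estimator without measurement errors. Analogously, Fan and Truong (1993) propose to estimate the regression function $g(x)$ by $\hat g^*(x) = \hat\mu^*(x)/\hat f^*_X(x)$, where
$$\hat\mu^*(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\Big(\frac{1}{n}\sum_{j=1}^n Y_je^{itW_j}\Big)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\,dt = \frac{1}{nh_n}\sum_{j=1}^n Y_jK_n\big((x - W_j)/h_n\big).$$
To understand the rationale behind this estimator, observe that
$$E[Ye^{itW}] = E[\{g(X) + U\}e^{it(X + \varepsilon)}] = E[g(X)e^{it(X + \varepsilon)}] = E[g(X)e^{itX}]\varphi_\varepsilon(t),$$
where the second equality follows from $E[U \mid X, \varepsilon] = 0$ and the third from the independence between $X$ and $\varepsilon$. Moreover, $E[g(X)e^{itX}]$ is the Fourier transform of $gf_X$, i.e., $E[g(X)e^{itX}] = \varphi_{gf_X}(t)$. Hence $\varphi_{gf_X}(t) = E[Ye^{itW}]/\varphi_\varepsilon(t)$, and provided that $\varphi_{gf_X}$ is integrable on $\mathbb{R}$, the Fourier inversion formula yields
$$g(x)f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{E[Ye^{itW}]}{\varphi_\varepsilon(t)}\,dt. \tag{2.3}$$
It is worth pointing out that estimation of $f_X$ and $gf_X$ corresponds to solving certain Fredholm integral equations of the first kind. Therefore, estimation of $f_X$ and $gf_X$ (or $g$) is a statistical ill-posed inverse problem. In fact, $f_X$ and $gf_X$ satisfy $f_X * f_\varepsilon = f_W$ and $(gf_X) * f_\varepsilon = E[Y \mid W = \cdot\,]f_W$. These are Fredholm integral equations of the first kind whose right-hand side functions are directly estimable.[3] Rates of convergence and pointwise asymptotic normality of $\hat g^*$ are studied by Fan and Truong (1993) and Fan and Masry (1992), among others. The discussion so far has presumed that the distribution of $\varepsilon$ is known.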
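The estimators $\hat f^*_X$ and $\hat g^*$ can be sketched as follows when the error is known to be Laplace$(0, b)$. In that case $1/\varphi_\varepsilon(t/h_n) = 1 + b^2t^2/h_n^2$. The kernel choice $\varphi_K(t) = (1 - t^2)^5$ on $[-1, 1]$, the composite Simpson rule, and all function names are illustrative assumptions, not the paper's choices. Because both $\varphi_K$ and $\varphi_\varepsilon$ are real and even here, $K_n(x) = \pi^{-1}\int_0^1 \cos(tx)\varphi_K(t)/\varphi_\varepsilon(t/h_n)\,dt$.

```python
import math
import random

P = 5  # illustrative kernel: phi_K(t) = (1 - t^2)^P on [-1, 1] (an assumption)


def phi_k(t):
    return (1.0 - t * t) ** P if abs(t) <= 1.0 else 0.0


def simpson_nodes(lo, hi, m=200):
    """Composite Simpson nodes and weights on [lo, hi]; m must be even."""
    step = (hi - lo) / m
    nodes = [lo + k * step for k in range(m + 1)]
    weights = [step / 3.0 * (1 if k in (0, m) else (4 if k % 2 else 2)) for k in range(m + 1)]
    return nodes, weights


def make_deconv_kernel(h, b):
    """K_n for a known Laplace(0, b) error, using 1/phi_eps(t/h) = 1 + b^2 t^2 / h^2.

    phi_K and phi_eps are real and even, so K_n(x) = (1/pi) int_0^1 cos(tx) phi_K(t)/phi_eps(t/h) dt.
    """
    nodes, weights = simpson_nodes(0.0, 1.0)
    coef = [w * phi_k(t) * (1.0 + (b * t / h) ** 2) / math.pi for t, w in zip(nodes, weights)]
    return lambda x: sum(c * math.cos(t * x) for c, t in zip(coef, nodes))


def fan_truong(x, ys, ws, h, kn):
    """Return (fhat_X^*(x), ghat^*(x)) with fhat_X^* from (2.2) and ghat^* = muhat^* / fhat_X^*."""
    kv = [kn((x - w) / h) for w in ws]
    f_x = sum(kv) / (len(ws) * h)
    g_x = sum(y * k for y, k in zip(ys, kv)) / sum(kv)
    return f_x, g_x
```

Since $\hat g^*(x)$ is a ratio of two sums with the same weights, a constant response is reproduced exactly, which gives a convenient sanity check. The closed-form value $K_n(0) = \pi^{-1}\int_0^1(1 - t^2)^5(1 + t^2)\,dt = 3584/(9009\pi)$ when $b = h_n$ gives another.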
However, in many applications the distribution of $\varepsilon$ is unknown, and hence the estimators $\hat f^*_X$ and $\hat g^*$ are infeasible.

[3] See, for example, Chen (2007), Carrasco et al. (2007), Cavalier (2008), and Horowitz (2009) for overviews of statistical ill-posed inverse problems.
In the present paper, we assume that there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the distribution of $\varepsilon$:
$$\eta_1, \dots, \eta_m \sim f_\varepsilon \ \text{i.i.d.},$$
where $m = m_n \to \infty$ as $n \to \infty$. We do not assume that $\eta_1, \dots, \eta_m$ are independent from $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$. In Section 2.3, we will discuss examples where such observations from the measurement error distribution are available. Given $\{\eta_1, \dots, \eta_m\}$, we may estimate $\varphi_\varepsilon$ by the empirical characteristic function, namely, $\hat\varphi_\varepsilon(t) = m^{-1}\sum_{j=1}^m e^{it\eta_j}$. We then estimate the deconvolution kernel $K_n$ by the plug-in method:
$$\hat K_n(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h_n)}\,dt.$$
Note that under the regularity conditions stated below, $\inf_{|t| \le h_n^{-1}}|\hat\varphi_\varepsilon(t)| > 0$ with probability approaching one, so that $\hat K_n$ is well-defined with probability approaching one. Note also that $\hat K_n$ is real-valued. Now, we estimate $g(x)$ by $\hat g(x) = \hat\mu(x)/\hat f_X(x)$, where
$$\hat\mu(x) = \frac{1}{nh_n}\sum_{j=1}^n Y_j\hat K_n\big((x - W_j)/h_n\big)\quad\text{and}\quad \hat f_X(x) = \frac{1}{nh_n}\sum_{j=1}^n \hat K_n\big((x - W_j)/h_n\big).$$
Density estimators of the form $\hat f_X$ are studied in Diggle and Hall (1993), Neumann (1997), and Efromovich (1997), among others. Nonparametric regression estimators of the form $\hat g$ are studied in Delaigle et al. (2008), among others.

2.2. Construction of confidence bands. We now describe our method to construct confidence bands for $g$ based on the estimator $\hat g$. Under the regularity conditions stated below, we will show that $\hat g(x) - g(x)$ can be approximated by
$$\frac{1}{f_X(x)nh_n}\sum_{j=1}^n\big[\{Y_j - g(x)\}K_n\big((x - W_j)/h_n\big) - A_n(x)\big]$$
uniformly in $x \in I$. Here $I$ is a compact interval in $\mathbb{R}$ on which $f_X$ is bounded away from zero, and $A_n(x) = E[\{Y - g(x)\}K_n((x - W)/h_n)]$. Let
$$s_n^2(x) = \mathrm{Var}\big(\{Y - g(x)\}K_n((x - W)/h_n)\big),$$
and consider the process
$$Z_n(x) = \frac{1}{\sqrt{n}\,s_n(x)}\sum_{j=1}^n\big[\{Y_j - g(x)\}K_n\big((x - W_j)/h_n\big) - A_n(x)\big],\quad x \in I,$$
where $s_n(x) = \sqrt{s_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I}s_n(x) > 0$ for sufficiently large $n$, so that $Z_n$ is well-defined. Furthermore, we will show that there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, such that as $n \to \infty$,
$$\sup_{z \in \mathbb{R}}\big|P\{\|Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\big| \to 0.$$
Recall that $\|Z_n\|_I = \sup_{x \in I}|Z_n(x)|$. This in turn yields
$$\sup_{z \in \mathbb{R}}\big|P\{\|\hat Z_n\|_I \le z\} - P\{\|Z_n^G\|_I \le z\}\big| \to 0,$$
where $\{\hat Z_n(x) : x \in I\}$ is the process defined by
$$\hat Z_n(x) = \frac{\sqrt{n}\,f_X(x)h_n}{s_n(x)}\big(\hat g(x) - g(x)\big),\quad x \in I. \tag{2.4}$$
Therefore, let $c_n^G(1 - \tau)$ denote the $(1 - \tau)$-quantile of $\|Z_n^G\|_I$ for $\tau \in (0, 1)$. Then a band of the form
$$\hat C^*_{1-\tau}(x) = \Big[\hat g(x) \pm \frac{s_n(x)}{\sqrt{n}\,f_X(x)h_n}\,c_n^G(1 - \tau)\Big],\quad x \in I,$$
will contain $g(x)$, $x \in I$, with probability at least $1 - \tau + o(1)$ as $n \to \infty$. In fact,
$$P\{g(x) \in \hat C^*_{1-\tau}(x)\ \forall x \in I\} = P\{\|\hat Z_n\|_I \le c_n^G(1 - \tau)\} = P\{\|Z_n^G\|_I \le c_n^G(1 - \tau)\} + o(1) \ge 1 - \tau + o(1).$$
In practice, $f_X(x)$, $s_n^2(x)$, and $c_n^G(1 - \tau)$ are all unknown, and we have to estimate them. We estimate $f_X(x)$ by $\hat f_X(x)$ and $s_n^2(x)$ by
$$\hat s_n^2(x) = \frac{1}{n}\sum_{j=1}^n\{Y_j - \hat g(x)\}^2\hat K_n^2\big((x - W_j)/h_n\big).$$
Note that $A_n^2(x)$ is negligible relative to $s_n^2(x)$, so we have ignored $A_n^2(x)$ in the estimation of $s_n^2(x)$. Note also that $\sum_{j=1}^n\{Y_j - \hat g(x)\}\hat K_n((x - W_j)/h_n) = 0$.

Next, we estimate the quantile $c_n^G(1 - \tau)$ by the Gaussian multiplier bootstrap. Generate $\xi_1, \dots, \xi_n \sim N(0, 1)$ i.i.d., independently of the data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$, and consider the multiplier process
$$\hat Z_n^\xi(x) = \frac{1}{\sqrt{n}\,\hat s_n(x)}\sum_{j=1}^n\xi_j\{Y_j - \hat g(x)\}\hat K_n\big((x - W_j)/h_n\big), \tag{2.5}$$
where $\hat s_n(x) = \sqrt{\hat s_n^2(x)}$. Note that under the regularity conditions stated below, $\inf_{x \in I}\hat s_n(x) > 0$ with probability approaching one. Conditionally on the data $\mathcal{D}_n$, $\hat Z_n^\xi$ is a Gaussian process with mean zero and covariance function (presumably) close to that of $Z_n$. Indeed, define
$$f_{n,x}(y, w) = \{y - g(x)\}K_n\big((x - w)/h_n\big)/s_n(x)\quad\text{and}\quad \hat f_{n,x}(y, w) = \{y - \hat g(x)\}\hat K_n\big((x - w)/h_n\big)/\hat s_n(x).$$
Then the covariance function of $\hat Z_n^\xi$ conditionally on $\mathcal{D}_n$ is
$$E[\hat Z_n^\xi(x)\hat Z_n^\xi(x') \mid \mathcal{D}_n] = \frac{1}{n}\sum_{j=1}^n\hat f_{n,x}(Y_j, W_j)\hat f_{n,x'}(Y_j, W_j)$$
for $x, x' \in I$. This estimates the covariance function of $Z_n^G$ given by
$$E[Z_n^G(x)Z_n^G(x')] = E[f_{n,x}(Y, W)f_{n,x'}(Y, W)] - E[f_{n,x}(Y, W)]\,E[f_{n,x'}(Y, W)]$$
for $x, x' \in I$. Hence, we estimate $c_n^G(1 - \tau)$ by
$$\hat c_n(1 - \tau) = \text{conditional } (1 - \tau)\text{-quantile of } \|\hat Z_n^\xi\|_I \text{ given } \mathcal{D}_n,$$
which can be computed via simulations. The resulting confidence band is defined by
$$\hat C_{1-\tau}(x) = \Big[\hat g(x) \pm \frac{\hat s_n(x)}{\sqrt{n}\,\hat f_X(x)h_n}\,\hat c_n(1 - \tau)\Big],\quad x \in I. \tag{2.6}$$
Note that, except for the choice of the bandwidth, this confidence band is completely data-driven. We will discuss the practical choice of the bandwidth in Section 4.

Remark 2.1 (Novelty of our construction of the multiplier process). Consider the error-free case, namely when we can observe $(Y_1, X_1), \dots, (Y_n, X_n)$. There, under suitable regularity conditions, the deviation of a standard kernel regression estimator $\check g$ with kernel $K$ from the true regression function $g$ is uniformly approximated by $\{f_X(x)nh_n\}^{-1}\sum_{j=1}^n U_jK((x - X_j)/h_n)$. So, to construct confidence bands for $g$ via the multiplier bootstrap method, one would construct a multiplier stochastic process of the form
$$x \mapsto \frac{1}{\sqrt{n}\,\sigma_n(x)}\sum_{j=1}^n\xi_jU_jK\big((x - X_j)/h_n\big)\quad\text{with}\quad \sigma_n(x) = \sqrt{\mathrm{Var}\big(UK((x - X)/h_n)\big)}, \tag{2.7}$$
and then compute the conditional $(1 - \tau)$-quantile of the supremum in absolute value of the multiplier process. In practice, we replace $U_j$ and $\sigma_n(x)$ by suitable estimators; for example, a natural estimator of $U_j$ would be $\hat U_j = Y_j - \check g(X_j)$. See, for example, Neumann and Polzehl (1998). See also Section 4.3 in Chernozhukov et al. (2013), which applies the multiplier bootstrap method to a different but related problem: inference in intersection bound models using kernel methods.
In the measurement error case, the deconvolution kernel estimator looks similar on the surface to the error-free kernel estimator. One might therefore be tempted to modify the multiplier process (2.7) by simply replacing $\sigma_n(x)$ and $K((x - X_j)/h_n)$ with $\sqrt{\mathrm{Var}(UK_n((x - W)/h_n))}$ and $K_n((x - W_j)/h_n)$, respectively. This will not result in a valid confidence band, even if $U_1, \dots, U_n$ were assumed to be known. The reason is that, in contrast to the error-free case, approximating $\hat g(x) - g(x)$ by $\{f_X(x)nh_n\}^{-1}\sum_{j=1}^n U_jK_n((x - W_j)/h_n)$ is incorrect, which highlights one distinctive feature of nonparametric EIV regression. Hence, in the present paper, we develop a novel construction of the multiplier process (2.5) tailored to nonparametric EIV regression.

2.3. Examples. In this section, we present a couple of examples where an auxiliary sample from the measurement error distribution is available.

Example 2.1 (Repeated measurements or panel data; Carroll et al. (2006), p.298). Suppose that we observe repeated measurements or panel data on $X$ with measurement errors:
$$W^{(1)} = X + \varepsilon^{(1)},\qquad W^{(2)} = X + \varepsilon^{(2)},$$
where $X$ and $(\varepsilon^{(1)}, \varepsilon^{(2)})$ are independent, and the conditional distribution of $\varepsilon^{(2)}$ given $\varepsilon^{(1)}$ is symmetric. The distribution of $\varepsilon^{(1)}$ need not be symmetric (in particular, the distributions of $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ may differ), and independence between $\varepsilon^{(1)}$ and $\varepsilon^{(2)}$ is not necessary. Define $W = (W^{(1)} + W^{(2)})/2$, $\varepsilon = (\varepsilon^{(1)} + \varepsilon^{(2)})/2$, and $\eta = (W^{(1)} - W^{(2)})/2 = (\varepsilon^{(1)} - \varepsilon^{(2)})/2$. Then
$$W = X + \varepsilon,\qquad \varepsilon \stackrel{d}{=} \eta,$$
where $\eta$ is observable. (The second relation holds because conditional symmetry gives $(\varepsilon^{(1)}, \varepsilon^{(2)}) \stackrel{d}{=} (\varepsilon^{(1)}, -\varepsilon^{(2)})$.) For this panel data setup, Schennach (2004) proposes an alternative estimator of $g$ based on Kotlarski's lemma which does not require the symmetry assumption. The form of Schennach's estimator is more complex than ours, and to the best of our knowledge, there is no existing result on asymptotically valid uniform confidence bands for Schennach's estimator. It is worth noting that while Schennach's approach can drop the symmetry assumption, it requires another technical assumption: the characteristic function $\varphi_X(t) = E[e^{itX}]$ of $X$ does not vanish on the entire real line. Both Schennach (2004) and we (and in fact most papers on deconvolution and EIV regression) assume that the characteristic functions of the error variables do not vanish on $\mathbb{R}$, but our approach does allow $\varphi_X$ to take zeros.
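Example 2.1 turns repeated measurements into exactly the inputs of Sections 2.1 and 2.2, so the whole procedure can be sketched end to end. The steps are: form $W$ and $\eta$ from $(W^{(1)}, W^{(2)})$; plug the empirical characteristic function of $\eta$ into $\hat K_n$; compute $\hat g$, $\hat f_X$ and $\hat s_n$ on a grid standing in for $I$; draw Gaussian multipliers as in (2.5); and read off $\hat c_n(1 - \tau)$ and the band (2.6). This is a minimal Python sketch, not the authors' implementation. The simulation design (a skewed $\varepsilon^{(1)}$ and an independent Laplace $\varepsilon^{(2)}$), the kernel $\varphi_K(t) = (1 - t^2)^5$, the Simpson quadrature, the fixed bandwidth, and all names are assumptions. The data-driven bandwidth of Section 4 is not implemented.

```python
import math
import random

P = 5  # hypothetical kernel choice: phi_K(t) = (1 - t^2)^P on [-1, 1]


def phi_k(t):
    return (1.0 - t * t) ** P if abs(t) <= 1.0 else 0.0


def simulate_panel(n, b=0.2, seed=0):
    """Hypothetical design: X ~ N(0,1), Y = sin(X) + U with U ~ N(0, 0.3^2),
    eps1 = Exp(1) - 1 (skewed), eps2 ~ Laplace(0, b) independent of eps1."""
    rng = random.Random(seed)
    ys, w1, w2 = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        ys.append(math.sin(x) + rng.gauss(0.0, 0.3))
        w1.append(x + rng.expovariate(1.0) - 1.0)
        w2.append(x + b * (rng.expovariate(1.0) - rng.expovariate(1.0)))
    return ys, w1, w2


def panel_to_eiv(w1, w2):
    """Example 2.1: W = (W1 + W2)/2 and eta = (W1 - W2)/2."""
    ws = [(a + c) / 2.0 for a, c in zip(w1, w2)]
    etas = [(a - c) / 2.0 for a, c in zip(w1, w2)]
    return ws, etas


def plug_in_kernel(etas, h, num=100):
    """Khat_n(u) = (1/2pi) int_{-1}^{1} exp(-itu) phi_K(t) / phihat_eps(t/h) dt via Simpson's rule."""
    m, step = len(etas), 2.0 / num
    nodes = [-1.0 + k * step for k in range(num + 1)]
    coefs = []
    for k, t in enumerate(nodes):
        weight = step / 3.0 * (1 if k in (0, num) else (4 if k % 2 else 2))
        s = t / h
        # empirical characteristic function of the auxiliary error sample at s = t/h
        ecf = complex(sum(math.cos(s * e) for e in etas), sum(math.sin(s * e) for e in etas)) / m
        coefs.append(weight * phi_k(t) / (2.0 * math.pi * ecf))

    def khat(u):
        # Re[c exp(-itu)] = Re(c) cos(tu) + Im(c) sin(tu); imaginary parts cancel across +-t nodes
        return sum(c.real * math.cos(t * u) + c.imag * math.sin(t * u) for c, t in zip(coefs, nodes))

    return khat


def multiplier_bootstrap_band(ys, ws, etas, h, x_grid, tau=0.05, n_boot=300, seed=1):
    """Band (2.6) on x_grid: ghat(x) +/- shat(x) * chat / (sqrt(n) * fhat_X(x) * h)."""
    n = len(ys)
    khat = plug_in_kernel(etas, h)
    ghat, fhat, shat, rows = [], [], [], []
    for x in x_grid:
        kv = [khat((x - w) / h) for w in ws]
        f_x = sum(kv) / (n * h)
        if f_x <= 0.0:
            raise ValueError("fhat_X(x) <= 0 on the grid; shrink the grid or change h")
        g_x = sum(y * k for y, k in zip(ys, kv)) / sum(kv)
        resid = [(y - g_x) * k for y, k in zip(ys, kv)]  # {Y_j - ghat(x)} Khat_n((x - W_j)/h)
        s_x = math.sqrt(sum(r * r for r in resid) / n)  # shat_n(x)
        ghat.append(g_x)
        fhat.append(f_x)
        shat.append(s_x)
        rows.append([r / (math.sqrt(n) * s_x) for r in resid])
    rng = random.Random(seed)  # multipliers xi_j ~ N(0, 1), independent of the data
    sups = []
    for _ in range(n_boot):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sups.append(max(abs(sum(a * e for a, e in zip(row, xi))) for row in rows))
    sups.sort()
    c_hat = sups[min(n_boot, math.ceil((1.0 - tau) * n_boot)) - 1]  # chat_n(1 - tau)
    half = [s * c_hat / (math.sqrt(n) * f * h) for s, f in zip(shat, fhat)]
    lower = [g - d for g, d in zip(ghat, half)]
    upper = [g + d for g, d in zip(ghat, half)]
    return ghat, lower, upper, c_hat
```

For example, `ys, w1, w2 = simulate_panel(400)` followed by `ws, etas = panel_to_eiv(w1, w2)` prepares the data. Then `multiplier_bootstrap_band(ys, ws, etas, 0.4, grid)` on an 11-point grid over $[-1, 1]$ returns $\hat g$, the band limits, and $\hat c_n(0.95)$. Each $\hat Z^\xi_n(x)$ has conditional variance exactly one. Up to Monte Carlo error, $\hat c_n$ should therefore fall between the two-sided normal quantile and the Bonferroni bound for the grid size, which gives a quick check for implementation errors.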
The assumption that $\varphi_X$ does not vanish on $\mathbb{R}$ is not innocuous. It is nontrivial to find densities that are compactly supported and have nonvanishing characteristic functions (though these properties are not mutually exclusive; see, e.g., Schennach (2016), Footnote 4). The assumption also excludes, among others, densities convolved with distributions whose characteristic functions take zeros.[4] So, we believe that Schennach's approach and ours are complementary to each other.

Example 2.2 (Data combination[5]). Suppose that we have access to data on $(Y, W)$ and on $(W, X)$ separately, but do not have access to data on $(Y, X)$. Empirical researchers often face this case, and various techniques have been proposed to combine the two separate samples; see the survey by Ridder and Moffitt (2007). To fix ideas, consider the demand model $Y = g(X) + U$, where $Y$ denotes the quantity purchased of a product and $X$ denotes the logarithm of its price. Marketing scientists and economists often use Nielsen Homescan data on quantities and prices to analyze this demand model. However, the homescanned prices in these data are subject to imputation errors $\varepsilon = W - X$. To overcome this issue, Einav et al. (2010) collect data on $(W, X)$ from a large grocery retailer. They match transaction prices $X$ recorded by the retailer (at the store) to the prices $W$ recorded by the Homescan panelists. Einav et al. suggest combining these data with Nielsen Homescan data on $(Y, W)$ to analyze the demand model. Specifically, we can construct a sample $\{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$ from the two separate data sets on $(Y, W)$ and $(W, X)$, where the $\eta_j$ are the differences $W - X$ computed from the validation data on $(W, X)$. In the literature, validation data are used as a way to relax the classical measurement error assumption that $X$ and $\varepsilon$ are independent; see, for example, Chen et al. (2005). While they allow for nonclassical measurement errors, Chen et al. (2005) focus on the case where the parameter of interest is finite dimensional.

It is worth noting that, when validation data on $(X, W)$ are available, the problem of estimating $g$ can be viewed as a nonparametric instrumental variable (NPIV) problem, with $X$ as an endogenous variable and $W$ as an instrumental variable. For NPIV models, see, for example, Newey and Powell (2003), Hall and Horowitz (2005), Blundell et al. (2007), Chen and Reiss (2011), and Horowitz (2011). In fact, observe that $E[Y \mid W] = E[g(X) \mid W]$. For NPIV models, Horowitz and Lee (2012) and the more recent paper by Chen and Christensen (2015) develop methods to construct confidence bands for the structural function using series methods. However, these papers do not formally consider cases where the samples on $(Y, W)$ and $(X, W)$ are different.[6] Moreover, the underlying assumptions of series estimation of NPIV models differ from those of deconvolution kernel estimation in EIV regression. For example, in series estimation of NPIV models, it is often assumed that the distribution of $W$ is compactly supported and that the density of $W$ is bounded away from zero on its support (cf. Blundell et al., 2007; Chen and Christensen, 2015). In EIV regression, by contrast, it is commonly assumed that the characteristic function of the measurement error $\varepsilon$ is nonvanishing on $\mathbb{R}$, which leads to identification of $g$ via (2.3). In many cases, the measurement error $\varepsilon$ then has unbounded support, which in turn implies that $W$ has unbounded support. Further, NPIV and EIV regressions are both statistical ill-posed inverse problems, but their ill-posedness is defined differently. In series estimation of NPIV models, the ill-posedness is defined for given basis functions. In EIV regression, it is defined via how fast the characteristic function of the measurement error distribution decays. Hence we believe that our inference results cover different situations from those developed in the NPIV literature.

[4] For example, convolutions of $k$ uniform densities on $[a, b]$ are piecewise polynomials of degree $k - 1$, and convex combinations of such piecewise polynomials form a rich family of densities, but their characteristic functions take zeros.
[5] We thank Tatsushi Oka for pointing out this example.
[6] Babii (2016) also develops methods to construct confidence bands for Tikhonov regularized estimators in NPIV models. However, his confidence bands are asymptotically conservative: the coverage probabilities are in general strictly larger than the nominal level, even asymptotically.

3. Main results

In this section, we study asymptotic validity of the proposed confidence band (2.6). To this end, we make the following assumption. For any given constants $\beta, B > 0$, let $\Sigma(\beta, B)$ denote the class of functions defined by
$$\Sigma(\beta, B) = \Big\{f: \mathbb{R} \to \mathbb{R} : f \text{ is } k\text{-times differentiable},\ |f^{(k)}(x) - f^{(k)}(y)| \le B|x - y|^{\beta - k}\ \ \forall x, y \in \mathbb{R}\Big\},$$
where $k$ is the integer such that $k < \beta \le k + 1$, and $f^{(k)}$ denotes the $k$-th derivative of $f$ ($f^{(0)} = f$). Let $I$ be a compact interval in $\mathbb{R}$.

Assumption 3.1. We assume the following conditions.
(i) $E[Y^4] < \infty$, the function $w \mapsto E[Y^2 \mid W = w]f_W(w)$ is bounded and continuous, and for each $l = 1, 2$, the function $w \mapsto E[|Y|^{2+l} \mid W = w]f_W(w)$ is bounded.
(ii) The functions $\varphi_X(t) = E[e^{itX}]$ and $\psi_X(t) = E[g(X)e^{itX}]$ for $t \in \mathbb{R}$ are integrable on $\mathbb{R}$.
(iii) The measurement error $\varepsilon$ has finite mean, $E[|\varepsilon|] < \infty$, and its characteristic function $\varphi_\varepsilon(t) = E[e^{it\varepsilon}]$, $t \in \mathbb{R}$, does not vanish on $\mathbb{R}$. Furthermore, there exist constants $C_1 > 1$ and $\alpha > 0$ such that
$$C_1^{-1}|t|^{-\alpha} \le |\varphi_\varepsilon(t)| \le C_1|t|^{-\alpha},\qquad |\varphi_\varepsilon'(t)| \le C_1|t|^{-\alpha-1},\qquad \forall |t| \ge 1.$$
(iv) The functions $f_X$ and $gf_X$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Let $K$ be a real-valued integrable function (kernel) on $\mathbb{R}$, not necessarily nonnegative, such that $\int_{\mathbb{R}} K(x)\,dx = 1$ and its Fourier transform $\varphi_K$ is supported in $[-1, 1]$. Furthermore, $\varphi_K$ is $(k + 3)$-times continuously differentiable with $\varphi_K^{(l)}(0) = 0$ for $l = 1, \dots, k$.
(vi) For all $x \in I$, $f_X(x) > 0$ and $E[\{Y - g(x)\}^2 \mid W = x]f_W(x) > 0$.
Finally, as $n \to \infty$, the bandwidth $h_n$ satisfies
$$\frac{(\log(1/h_n))^2}{(n \wedge m)\,h_n^{2\alpha+2}} \to 0,\qquad \frac{nh_n\log(1/h_n)}{m} \to 0,\qquad \text{and}\qquad \sqrt{n}\,h_n^{\alpha+\beta}\sqrt{h_n\log(1/h_n)} \to 0. \tag{3.1}$$

Condition (i) is a moment condition on $Y$, which we believe is not restrictive.
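The conditions in (3.1), taken as displayed, can be made concrete by setting $h_n = n^{-\gamma}$ and $m = m_n = n^{\mu}$ (both hypothetical polynomial rates) and comparing exponents. Logarithmic factors only decide the boundary cases, so the admissible set of $\gamma$ is an open interval. The sketch below computes this interval with exact fractions. When $m_n \ge n$, the interval is nonempty exactly when $\beta > 1/2$, which matches Condition (iv). Its lower endpoint $1/(2\alpha + 2\beta + 1)$ is the familiar MSE-optimal exponent for ordinary smooth deconvolution, so admissible bandwidths indeed undersmooth.

```python
from fractions import Fraction


def gamma_window(alpha, beta, mu):
    """Open interval (lo, hi) of gamma for which h_n = n^(-gamma) and m_n = n^mu satisfy (3.1).

    Polynomial bookkeeping (log factors only decide the excluded endpoints):
      (log(1/h))^2 / (min(n, m) h^(2 alpha + 2)) -> 0  iff  gamma (2 alpha + 2) < min(1, mu)
      n h log(1/h) / m -> 0                            iff  gamma > 1 - mu
      sqrt(n) h^(alpha + beta) sqrt(h log(1/h)) -> 0    iff  gamma (alpha + beta + 1/2) > 1/2
    Returns None when the window is empty.
    """
    alpha, beta, mu = Fraction(alpha), Fraction(beta), Fraction(mu)
    lo = max(Fraction(1, 2) / (alpha + beta + Fraction(1, 2)), 1 - mu, Fraction(0))
    hi = min(Fraction(1), mu) / (2 * alpha + 2)
    return (lo, hi) if lo < hi else None


if __name__ == "__main__":
    # Laplace-type error (alpha = 2), beta = 2, m_n = n: gamma must lie in (1/9, 1/6)
    print(gamma_window(2, 2, 1))
```

For a Laplace-type error ($\alpha = 2$), $\beta = 2$ and $m_n = n$, the window is $(1/9, 1/6)$. With $m_n = n^{1/2}$ it is empty, because the second condition then forces $\gamma > 1/2$.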
Note that, for each $l = 0, 1, 2$, if $E[|Y|^{2+l} \mid X, \varepsilon] = E[|Y|^{2+l} \mid X]$, then by comparing the Fourier transforms of both sides, we arrive at the identity $E[|Y|^{2+l} \mid W = w] f_W(w) = ((\Upsilon_l f_X) * f_\varepsilon)(w)$, where $\Upsilon_l(x) = E[|Y|^{2+l} \mid X = x]$, and the right-hand side is bounded and continuous if $\Upsilon_l f_X$ is bounded (which allows $\Upsilon_l$ to be unbounded globally). For Condition (ii), we first note that $\psi_X$ is the Fourier transform of $g f_X$ (which is integrable by $E[|Y|] < \infty$). Condition (ii) implies that $f_X$
and $g f_X$ are (continuous and) bounded, which in turn implies that $f_W(w) = \int f_X(w - x) f_\varepsilon(x)\,dx$ is bounded and continuous. Condition (ii) is satisfied if, for example, $f_X$ and $g f_X$ are twice continuously differentiable with integrable derivatives up to the second order; in fact, under such conditions, $|\varphi_X(t)| = o(|t|^{-2})$ and $|\psi_X(t)| = o(|t|^{-2})$ as $|t| \to \infty$. However, differentiability of $f_X$ and $g f_X$ is not strictly necessary for Condition (ii) to hold; for example, a Laplace density is not differentiable, but its Fourier transform is integrable. Condition (iii) is concerned with the characteristic function of the measurement error. Note that finiteness of the first moment of $\varepsilon$ ensures that $\varphi_\varepsilon$ is continuously differentiable. In the present paper, as in Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), we assume that the measurement error density is ordinary smooth, namely, $|\varphi_\varepsilon(t)|$ decays at most polynomially fast as $|t| \to \infty$ (cf. Fan, 1991a). Informally, the smoother $f_\varepsilon$ is, the faster $|\varphi_\varepsilon(t)|$ decays as $|t| \to \infty$, so Condition (iii) restricts the smoothness of $f_\varepsilon$. Laplace and Gamma distributions, together with their convolutions, (suitable) mixtures, and symmetrizations⁷, are typical examples of distributions satisfying Condition (iii), but normal and Cauchy distributions do not satisfy Condition (iii). Normal and Cauchy densities are examples of supersmooth densities, i.e., their characteristic functions decay exponentially fast as $|t| \to \infty$.⁸ Condition (iv) is concerned with the smoothness of the functions $f_X$ and $g$. Condition (v) is about the kernel function. By changes of variables, Condition (v) ensures that $\int |x|^{k+1}|K(x)|\,dx < \infty$ and $\int x^l K(x)\,dx = i^{-l}\varphi_K^{(l)}(0) = 0$ for $l = 1, \dots, k$, that is, $K$ is a $(k+1)$-th order kernel (but we allow for the possibility that $\int x^{k+1}K(x)\,dx = 0$).⁹ Condition (vi) ensures that $\inf_{x \in I} f_X(x) > 0$ (since $f_X$ is continuous) and $\inf_{x \in I} E[\{Y - g(x)\}^2 \mid W = x] f_W(x) > 0$ (see the proof of Lemma A.4(ii)). Note that since $g f_X$ is bounded, we have that $\|g\|_I \le \|g f_X\|_I / \inf_{x \in I} f_X(x) < \infty$.
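To see concretely what Condition (iii) admits and excludes, one can tabulate $|t|^{\alpha}|\varphi_\varepsilon(t)|$ over $|t| \ge 1$: for an ordinary smooth error these values must stay inside $[C_1^{-1}, C_1]$, while for a supersmooth error they collapse to zero. The sketch below is only an illustration; it uses the $\mathrm{Laplace}(0, 2^{-1/2})$ error of Section 5 (with $\alpha = 2$) and a standard normal error for contrast, and the helper names and the grid are our own choices.

```python
import math


def laplace_cf(t: float, b: float) -> float:
    """Characteristic function of Laplace(0, b): 1 / (1 + b^2 t^2)."""
    return 1.0 / (1.0 + (b * t) ** 2)


def normal_cf(t: float, sigma: float) -> float:
    """Characteristic function of N(0, sigma^2): exp(-sigma^2 t^2 / 2)."""
    return math.exp(-0.5 * (sigma * t) ** 2)


def scaled_decay(cf, alpha: float, ts):
    """Return |t|^alpha * |cf(t)| for each t in ts (intended for |t| >= 1).

    Under the two-sided bound in Condition (iii) these values lie in
    [1/C_1, C_1] for some constant C_1 > 1."""
    return [abs(t) ** alpha * abs(cf(t)) for t in ts]


ts = [1.0 + 0.5 * k for k in range(200)]  # grid on [1, 100.5]
lap = scaled_decay(lambda t: laplace_cf(t, 2 ** -0.5), 2.0, ts)
nor = scaled_decay(lambda t: normal_cf(t, 1.0), 2.0, ts)

print(f"Laplace: min {min(lap):.4f}, max {max(lap):.4f}")  # bounded away from 0 and infinity
print(f"Normal at |t| = {ts[-1]}: {nor[-1]:.3e}")          # numerically zero: exponential decay
```

For the Laplace error, $t^2/(1 + t^2/2)$ increases from $2/3$ at $|t| = 1$ toward $2$, so the two-sided bound on $|\varphi_\varepsilon|$ holds with $C_1 = 2$; the normal values vanish long before $|t| = 100$, which is why supersmooth errors fall outside the present analysis.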
⁷ Recall that if a random variable $\eta$ has characteristic function $\varphi_\eta$, then $\eta - \eta'$ for an independent copy $\eta'$ of $\eta$ has characteristic function $|\varphi_\eta|^2$.

⁸ Convolutions of ordinary smooth and supersmooth densities are supersmooth, but mixtures of ordinary smooth and supersmooth densities are ordinary smooth.

⁹ In the simulation studies, we will use a flat-top kernel (McMurry and Politis, 2004), which is an infinite-order kernel.

It is worth mentioning that under these conditions, we have that $s_n^2(x) = \mathrm{Var}(\{Y - g(x)\}K_n((x - W)/h_n)) \asymp h_n^{-2\alpha+1}$ uniformly in $x \in I$ (see Lemma A.4), and the right-hand side is larger by a factor of $h_n^{-2\alpha}$ than the corresponding term in the error-free case (recall that in standard kernel regression without measurement errors, the variance of $U K((x - X)/h_n)$ is $\asymp h_n$). This results in slower rates of convergence of kernel regression estimators in the presence of measurement errors than in the error-free case, and the value of $\alpha$ is a key parameter that controls the difficulty of estimating $g$,
namely, the larger the value of $\alpha$ is, the more difficult estimation of $g$ will be. In other words, the value of $\alpha$ quantifies the degree of ill-posedness of estimation of $g$.

Condition (vii) restricts the bandwidth $h_n$ and the sample size $m$ from the measurement error distribution. The second condition in (3.1) allows $m$ to be of smaller order than $n$, which in particular covers the panel data setup discussed in Example 2.1. The last condition in (3.1) means that we are choosing undersmoothing bandwidths, that is, bandwidths that are of smaller order than the optimal rates for estimation of $g$. Inspection of the proof of Theorem 3.1 shows that without the last condition in (3.1), we have that
$$
\|\hat g - g\|_I = O_P\{h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)}\} + O(h_n^{\beta}),
$$
where the $O(h_n^\beta)$ term comes from the deterministic bias. So, choosing $h_n \sim (n/\log n)^{-1/(2\alpha+2\beta+1)}$ optimizes the rate on the right-hand side, and the resulting rate of convergence of $\|\hat g - g\|_I$ is $O_P\{(n/\log n)^{-\beta/(2\alpha+2\beta+1)}\}$. The last condition in (3.1) requires choosing $h_n$ of smaller order than $(n/\log n)^{-1/(2\alpha+2\beta+1)}$ (by log factors), so that the variance term dominates the bias term. We will discuss the problem of bias after presenting the theorems (see Remark 3.3). For Condition (vii) to be nonvoid, we require $\beta > 1/2$.

We first state a theorem establishing that, under Assumption 3.1, the distribution of $\|\hat Z\|_I = \sup_{x \in I}|\hat Z(x)|$, where $\{\hat Z(x) : x \in I\}$ is defined in (2.4), can be approximated by that of the supremum of a certain Gaussian process; this is a building block for proving validity of the proposed confidence band. Recall that a Gaussian process $\{Z(x) : x \in I\}$ indexed by $I$ is a tight random variable in $\ell^\infty(I)$ if and only if $I$ is totally bounded for the intrinsic pseudometric $\rho_2(x, y) = \sqrt{E[\{Z(x) - Z(y)\}^2]}$ for $x, y \in I$, and $Z$ has sample paths almost surely uniformly $\rho_2$-continuous; see van der Vaart and Wellner (1996, p. 41).

Theorem 3.1 (Gaussian approximation).
Under Assumption 3.1, for each sufficiently large $n$, there exists a tight Gaussian random variable $Z_n^G$ in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$, such that as $n \to \infty$,
$$
\sup_{z \in \mathbb{R}} \left| P\{\|\hat Z\|_I \le z\} - P\{\|Z_n^G\|_I \le z\} \right| \to 0. \tag{3.2}
$$

Theorem 3.1 derives an intermediate Gaussian approximation to the process $\hat Z$, in the sense that the approximating Gaussian process $Z_n^G$ depends on the sample size. It may be possible to further show that, if $I$ is not a singleton, under additional conditions, for some sequences $a_n > 0$ and $b_n$, $a_n(\|Z_n^G\|_I - b_n)$ converges in distribution to a Gumbel distribution. However, while this is mathematically intriguing, we avoid using the Gumbel approximation, since 1) the Gumbel approximation is slow and the coverage error of the resulting confidence band is of order $1/\log n$ (see Hall, 1991), and 2) deriving the Gumbel approximation would require additional restrictive conditions on the measurement error distribution. For example, in a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) derive a
Gumbel approximation to the supremum deviation of the deconvolution kernel density estimator, thereby establishing a Smirnov-Bickel-Rosenblatt type theorem (Smirnov, 1950; Bickel and Rosenblatt, 1973) for the deconvolution kernel density estimator. But to do so, they require more restrictive conditions on the measurement error distribution than those in the present paper (see their Assumption 2).

The following theorem shows asymptotic validity of the proposed confidence band.

Theorem 3.2 (Validity of multiplier bootstrap confidence band). Under Assumption 3.1, as $n \to \infty$,
$$
\sup_{z \in \mathbb{R}} \left| P\{\|\hat Z^\xi\|_I \le z \mid \mathcal{D}_n\} - P\{\|Z_n^G\|_I \le z\} \right| \xrightarrow{P} 0, \tag{3.3}
$$
where $Z_n^G$ is the Gaussian random variable in $\ell^\infty(I)$ given in Theorem 3.1. Therefore, for the confidence band $\hat C_{1-\tau}$ defined in (2.6), we have as $n \to \infty$,
$$
P\{g(x) \in \hat C_{1-\tau}(x)\ \ \forall x \in I\} = 1 - \tau + o(1). \tag{3.4}
$$
Finally, the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Remark 3.1. Inspection of the proof shows that the result (3.4) holds even when $\tau = \tau_n \to 0$ as $n \to \infty$. Furthermore, the supremum width of the band is $O_P\{h_n^{-\alpha}(n h_n)^{-1/2}(\sqrt{\log(1/h_n)} \vee \sqrt{\log(1/\tau_n)})\}$.

Remark 3.2. If we take $h_n = v_n (n/\log n)^{-1/(2\alpha+2\beta+1)}$ for $v_n \sim (\log n)^{-1}$, then the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{(n/\log n)^{-\beta/(2\alpha+2\beta+1)}(\log n)^{\alpha+1/2}\}$.

Remark 3.3 (Bias). For any nonparametric inference problem, how to deal with the deterministic bias is a delicate and difficult problem; see Section 5.7 in Wasserman (2006) for related discussions. In the present paper, we employ undersmoothing bandwidths so that the bias is negligible relative to the variance part. An alternative approach is to estimate the bias at each point and construct a bias-corrected confidence band; see, for example, Eubank and Speckman (1993) and Xia (1998) for the error-free case.¹⁰ However, in EIV regression, estimation of the bias is not quite attractive, for a couple of reasons. First, the bias consists of higher order derivatives of $g$ and $f_X$, and estimation of these higher order derivatives is difficult, especially in the EIV case.
This is because estimation of $g$ and $f_X$ is an ill-posed inverse problem, and rates of convergence of the derivative estimators of $g$ and $f_X$ are even slower than those in the error-free case. Second, one of the popular kernels used in EIV regression and deconvolution is the flat-top kernel (McMurry and Politis, 2004), which is an infinite-order kernel, and if we use a flat-top kernel, then the bias cannot be calculated in closed form.¹¹ See Remark 1 in Bissantz et al. (2007) for a related issue in the deconvolution case.

¹⁰ More recent discussions regarding the problem of bias in nonparametric inference include Hall and Horowitz (2013), Chernozhukov et al. (2014b), Armstrong and Kolesár (2014), Calonico et al. (2015), and Schennach (2015). These papers do not cover EIV regression.

Remark 3.4 (Supersmooth case). In the present paper, we focus on the case where the measurement error density is ordinary smooth, similarly to Bissantz et al. (2007), Schmidt-Hieber et al. (2013), and Delaigle et al. (2015), which study inference in deconvolution and nonparametric EIV regression. If the measurement error density is supersmooth, i.e., its characteristic function decays exponentially fast as $|t| \to \infty$, then 1) in view of the pointwise asymptotic normality result in Fan and Masry (1992), the asymptotic behavior of the variance function $s_n^2(x)$ is much more complex; and 2) minimax rates of convergence for estimation of $g$ under the sup-norm loss are logarithmically slow (i.e., of the form $(\log n)^{-c}$ for some constant $c > 0$), even when the measurement error distribution is assumed to be known (Fan and Truong, 1993). These difficulties prevent us from directly extending our analysis to the supersmooth case. Hence the supersmooth case is left for future research.

The proofs of Theorems 3.1 and 3.2 build on nontrivial applications of the intermediate Gaussian and multiplier bootstrap approximation theorems developed in Chernozhukov et al. (2014a,b, 2016). However, we stress that Theorems 3.1 and 3.2 do not follow directly from the general theorems in Chernozhukov et al. (2014a,b, 2016) and require substantial work. This is because 1) first of all, how to devise a multiplier bootstrap in EIV regression is not apparent, and as discussed in Remark 2.1, our construction of the multiplier process appears to be novel; 2) the population deconvolution kernel $K_n$ is implicitly defined via the Fourier inversion and is substantially different from standard kernels in the error-free case; and 3) the deconvolution kernel $K_n$ is in fact unknown and estimated, so that its estimation error has to be taken into account. An alternative standard technique to derive Gaussian approximations similar to (3.2) is to apply the Komlós-Major-Tusnády (KMT) strong approximation (Komlós et al., 1975).
In a problem of constructing confidence bands in deconvolution with known error distribution, Bissantz et al. (2007) (and Schmidt-Hieber et al. (2013)) use the KMT approximation to derive Gaussian approximations to the deconvolution kernel density estimator. However, the KMT approximation is tailored to empirical processes indexed by univariate functions and hence is not applicable to our problem. Alternatively, we could use Rio's coupling (see Rio, 1994), but to apply Rio's coupling, we would have to assume (at least) that $Y$ is bounded (rather than having a finite fourth moment) and that $K_n$ has total variation of order $h_n^{-\alpha}$ (which requires additional conditions on the measurement error distribution). By employing the techniques developed in Chernozhukov et al. (2014a,b, 2016), we are able to avoid such restrictive conditions.

¹¹ For example, Schennach (2004) and Bissantz et al. (2007) use flat-top kernels in their simulation studies.
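The deconvolution kernel $K_n(x) = K_n(x; h_n)$, written out in Section 4 as $K_n(x; h) = (2\pi)^{-1}\int e^{-itx}\varphi_K(t)/\varphi_\varepsilon(t/h)\,dt$, has no closed form in general but is easy to evaluate numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the flat-top $\varphi_K$ with $b = 1$ and $c = 0.05$ that Section 5 uses, a known $\mathrm{Laplace}(0, 2^{-1/2})$ error (so $\alpha = 2$), and plain trapezoidal quadrature. Because both Fourier transforms are real and even, the integral reduces to a cosine integral over $[0, 1]$. The function names are hypothetical.

```python
import math


def phi_K(t: float, b: float = 1.0, c: float = 0.05) -> float:
    """Flat-top Fourier transform in the McMurry-Politis form (parameters b, c)."""
    a = abs(t)
    if a <= c:
        return 1.0
    if a >= 1.0:
        return 0.0
    return math.exp(-b * math.exp(-b / (a - c) ** 2) / (a - 1.0) ** 2)


def phi_eps_laplace(t: float, scale: float = 2 ** -0.5) -> float:
    """Characteristic function of Laplace(0, scale); here 1 / (1 + t^2 / 2)."""
    return 1.0 / (1.0 + (scale * t) ** 2)


def deconv_kernel(x: float, h: float, phi_eps=phi_eps_laplace, n_grid: int = 2000) -> float:
    """K_n(x; h) = (1/2pi) * integral of exp(-itx) phi_K(t) / phi_eps(t/h) dt.

    phi_K vanishes outside [-1, 1] and the integrand is real and even in t, so
    K_n(x; h) = (1/pi) * integral_0^1 cos(tx) phi_K(t) / phi_eps(t/h) dt,
    approximated here by the trapezoidal rule on n_grid subintervals."""
    step = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        t = k * step
        weight = 0.5 if k in (0, n_grid) else 1.0
        total += weight * math.cos(t * x) * phi_K(t) / phi_eps(t / h)
    return total * step / math.pi


for h in (0.4, 0.2, 0.1, 0.05):
    print(f"h = {h:<5} K_n(0; h) = {deconv_kernel(0.0, h):10.4f}")
```

Halving $h$ from $0.1$ to $0.05$ multiplies $K_n(0; h)$ by a factor between 3 and 4, approaching $2^\alpha = 4$ as $h$ shrinks; this is the growth $\|K_n\|_{\mathbb{R}} = O(h_n^{-\alpha})$ noted at the start of the proof of Theorem 3.1. Passing `phi_eps=lambda t: 1.0` recovers the ordinary kernel $K$.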
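4. Bandwidth selection

The theory developed in the previous section prescribes admissible rates for the bandwidth $h_n$ that require undersmoothing. The literature provides data-driven approaches to bandwidth selection, which typically aim at minimizing the MISE (cf. Delaigle and Hall, 2008). These data-driven approaches tend to yield bandwidth rates that do not undersmooth, contrary to our requirements. In this light, we propose here a novel alternative approach to bandwidth selection. To emphasize the dependence on an arbitrary candidate bandwidth $h > 0$, write
$$
\begin{aligned}
s_n^2(x; h) &= \mathrm{Var}(\{Y - g(x)\}K_n((x - W)/h; h)), \\
A_n(x; h) &= E[\{Y - g(x)\}K_n((x - W)/h; h)], \quad\text{and} \\
K_n(x; h) &= \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\varphi_\varepsilon(t/h)}\,dt.
\end{aligned}
$$
Note that $A_n(x) = A_n(x; h_n)$, $s_n^2(x) = s_n^2(x; h_n)$, and $K_n(x) = K_n(x; h_n)$. An optimal choice $h_n^*$ (ignoring the log factor) balances the uniform squared bias $\|A_n^2(\cdot; h)\|_I$ and the uniform variance $\|s_n^2(\cdot; h)/n\|_I$, i.e.,
$$
\frac{\partial}{\partial h}\|A_n^2(\cdot; h)\|_I\Big|_{h = h_n^*} = -\frac{\partial}{\partial h}\|s_n^2(\cdot; h)/n\|_I\Big|_{h = h_n^*}.
$$
A natural way of undersmoothing is to choose the smallest $h > 0$ such that
$$
c_n\frac{\partial}{\partial h}\|A_n^2(\cdot; h)\|_I \ge -\frac{\partial}{\partial h}\|s_n^2(\cdot; h)/n\|_I
$$
for some $c_n > 1$, where $c_n$ is increasing in $n$. We will try alternative sequences $\{c_n\}_{n=1}^\infty$ in the subsequent simulation studies to recommend practical choices.

In practice, we know neither $g$ nor the distribution of $(Y, X)$. In place of $g$ in the bandwidth selection, we use a polynomial regression $\tilde g$ under EIV, e.g., $\tilde g(x) = \tilde g_0 + \tilde g_1 x$, where
$$
\begin{pmatrix}\tilde g_0 \\ \tilde g_1\end{pmatrix} = \begin{pmatrix} 1 & \frac{1}{n}\sum_{j=1}^n W_j \\ \frac{1}{n}\sum_{j=1}^n W_j & \frac{1}{n}\sum_{j=1}^n W_j^2 - \frac{1}{m}\sum_{j=1}^m \eta_j^2 \end{pmatrix}^{-1}\begin{pmatrix}\frac{1}{n}\sum_{j=1}^n Y_j \\ \frac{1}{n}\sum_{j=1}^n W_j Y_j\end{pmatrix}.
$$
A polynomial of degree three will be employed throughout the simulation studies. We make a grid $0 < h_{n,1} < \cdots < h_{n,J}$ of candidate bandwidths, and then choose $h_{n,j}$ with the smallest $j \in \{2, \dots, J\}$ such that
$$
c_n\left(\|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I\right) \ge -\left(\|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I\right),
$$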
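where
$$
\begin{aligned}
\hat s_n^2(x; h) &= \frac{1}{n}\sum_{j=1}^n \{Y_j - \tilde g(x)\}^2 \hat K_n^2((x - W_j)/h; h) - \hat A_n^2(x; h), \\
\hat A_n(x; h) &= \frac{1}{n}\sum_{j=1}^n \{Y_j - \tilde g(x)\}\hat K_n((x - W_j)/h; h), \quad\text{and} \\
\hat K_n(x; h) &= \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\frac{\varphi_K(t)}{\hat\varphi_\varepsilon(t/h)}\,dt.
\end{aligned}
$$
Because we use finite-sample estimates, neither $\|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I$ nor $\|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I$ needs to be monotone in the index $j$ in general. As such, we monotonize these differences of the estimates in the following manner. Let
$$
\hat A_{n,j} = \|\hat A_n^2(\cdot; h_{n,j})\|_I - \|\hat A_n^2(\cdot; h_{n,j-1})\|_I \quad\text{and}\quad \hat s^2_{n,j} = -\left(\|\hat s_n^2(\cdot; h_{n,j})/n\|_I - \|\hat s_n^2(\cdot; h_{n,j-1})/n\|_I\right).
$$
The monotonization algorithm executes the following assignments in increasing order of $j$:
$$
\hat A_{n,j+1} := \begin{cases}\hat A_{n,j} & \text{if } \hat A_{n,j} > \hat A_{n,j+1}, \\ \hat A_{n,j+1} & \text{if } \hat A_{n,j} \le \hat A_{n,j+1},\end{cases} \qquad \hat s^2_{n,j+1} := \begin{cases}\hat s^2_{n,j} & \text{if } \hat s^2_{n,j} < \hat s^2_{n,j+1}, \\ \hat s^2_{n,j+1} & \text{if } \hat s^2_{n,j} \ge \hat s^2_{n,j+1}.\end{cases}
$$

Remark 4.1. The above guide to bandwidth selection applies to the case of $\alpha > 1/2$. We could accommodate the case of $\alpha \le 1/2$ by modifying this method, replacing $s_n^2(x; h)$, $A_n(x; h)$, $\hat s_n^2(x; h)$, and $\hat A_n(x; h)$ by $s_n^2(x; h)/h^2$, $A_n(x; h)/h$, $\hat s_n^2(x; h)/h^2$, and $\hat A_n(x; h)/h$, respectively. We implemented simulation studies under both of these alternative methods of bandwidth selection, and found that the method described above shows superior performance in terms of the distance between nominal and simulated coverage probabilities for the data generating models that we consider. Therefore, we only suggest the method described above, and present simulation studies below only for this version of the bandwidth selection rule.

5. Simulation studies

5.1. Simulation framework. We consider two data generating models, reflecting two common patterns of data availability. For the first model, the data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^n$ are constructed by
$$
\text{Model 1:}\quad \begin{cases} Y_j = g(X_j) + U_j, & X_j \sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1), \\ W_j = X_j + \varepsilon_j, & \varepsilon_j \overset{d}{=} \eta_j \sim \mathrm{Laplace}(0, 2^{-1/2}), \end{cases}
$$
for $j = 1, \dots, n$, where the primitive latent variables $X_j$, $U_j$, $\varepsilon_j$, and $\eta_j$ are mutually independent. The characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/2)^{-1}$, which is nonvanishing on $\mathbb{R}$ and ordinary smooth of order $\alpha = 2$. The signal-to-noise ratio is $\sqrt{\mathrm{Var}(X)/\mathrm{Var}(\varepsilon)} = \sigma_X$.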
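The linear version of the pilot regression $\tilde g$ displayed in Section 4 is a measurement-error-corrected least squares fit: the only change from ordinary least squares is that $m^{-1}\sum_j \eta_j^2$ is subtracted from $n^{-1}\sum_j W_j^2$. The standard-library sketch below applies it to data drawn from Model 1. The function names are hypothetical, the settings (true $g(x) = 1 + 2x$, $\sigma_X = 2$, $n = m = 100{,}000$) are our own, and the correction relies on $E[\varepsilon] = 0$ and $E[\varepsilon^2] = E[\eta^2]$, both of which hold in Model 1. The simulations in this section use a cubic rather than a linear pilot.

```python
import random


def simulate_model1(n, sigma_x=2.0, g=lambda x: x, seed=0):
    """Draw D_n = {(Y_j, W_j, eta_j)} from Model 1 with Laplace(0, 2^{-1/2}) errors."""
    rng = random.Random(seed)
    b = 2 ** -0.5

    def laplace():
        # The difference of two independent Exp(1) draws is Laplace(0, 1); scale by b.
        return b * (rng.expovariate(1.0) - rng.expovariate(1.0))

    data = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma_x)
        y = g(x) + rng.gauss(0.0, 1.0)
        data.append((y, x + laplace(), laplace()))  # (Y_j, W_j = X_j + eps_j, eta_j)
    return data


def eiv_linear_pilot(data):
    """Solve the corrected 2x2 normal equations of Section 4 for (g0, g1)."""
    n = len(data)
    mean_w = sum(w for _, w, _ in data) / n
    mean_ww = sum(w * w for _, w, _ in data) / n
    mean_ee = sum(e * e for _, _, e in data) / n  # m^{-1} sum eta_j^2, with m = n here
    mean_y = sum(y for y, _, _ in data) / n
    mean_wy = sum(w * y for y, w, _ in data) / n
    a22 = mean_ww - mean_ee                      # estimates E[X^2]
    det = a22 - mean_w ** 2
    g0 = (a22 * mean_y - mean_w * mean_wy) / det
    g1 = (mean_wy - mean_w * mean_y) / det
    return g0, g1


def naive_ols(data):
    """Ordinary least squares of Y on W, ignoring the measurement error."""
    n = len(data)
    mean_w = sum(w for _, w, _ in data) / n
    mean_y = sum(y for y, _, _ in data) / n
    sxy = sum((w - mean_w) * (y - mean_y) for y, w, _ in data)
    sxx = sum((w - mean_w) ** 2 for _, w, _ in data)
    return mean_y - sxy / sxx * mean_w, sxy / sxx


data = simulate_model1(100_000, sigma_x=2.0, g=lambda x: 1.0 + 2.0 * x, seed=1)
print("EIV-corrected:", eiv_linear_pilot(data))  # close to (1, 2)
print("naive OLS:    ", naive_ols(data))         # slope attenuated toward 2 * 4 / 5 = 1.6
```

The naive slope is attenuated by the reliability ratio $\mathrm{Var}(X)/\mathrm{Var}(W) = 4/5$, which is exactly the bias the $\eta$-based correction removes.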
For the second model, we consider the following repeated measurement or panel data setup:
$$
\text{Model 2:}\quad \begin{cases} Y_j = g(X_j) + U_j, & X_j \sim N(0, \sigma_X^2) \text{ and } U_j \sim N(0, 1), \\ W_j^{(1)} = X_j + \varepsilon_j^{(1)}, & \varepsilon_j^{(1)} \sim \mathrm{Laplace}(0, 2^{-1}), \\ W_j^{(2)} = X_j + \varepsilon_j^{(2)}, & \varepsilon_j^{(2)} \sim \mathrm{Laplace}(0, 2^{-1}), \end{cases}
$$
for $j = 1, \dots, n$, where the primitive latent variables $X_j$, $U_j$, $\varepsilon_j^{(1)}$, and $\varepsilon_j^{(2)}$ are mutually independent. We observe $\{(Y_j, W_j^{(1)}, W_j^{(2)})\}_{j=1}^n$. By defining $W_j := (W_j^{(1)} + W_j^{(2)})/2$ and $\eta_j := (W_j^{(1)} - W_j^{(2)})/2$, we obtain the generated data $\mathcal{D}_n = \{(Y_j, W_j, \eta_j)\}_{j=1}^n$ such that $W_j = X_j + \varepsilon_j$ with $\varepsilon_j = (\varepsilon_j^{(1)} + \varepsilon_j^{(2)})/2 \overset{d}{=} \eta_j$. For Model 2, the characteristic function of $\varepsilon_j$ is $\varphi_\varepsilon(t) = (1 + t^2/16)^{-2}$, which is nonvanishing on $\mathbb{R}$ and ordinary smooth of order $\alpha = 4$. The signal-to-noise ratio is given by $\sqrt{\mathrm{Var}(X)/\mathrm{Var}(\varepsilon)} = 2\sigma_X$.

Simulations are run across five different specifications of $g$ and alternative values $\sigma_X \in \{2, 4\}$, which control the signal-to-noise ratio. The five specifications of $g$ are $g(x) = x$, $g(x) = x^2$, $g(x) = x^3$, $g(x) = \sin(x)$, and $g(x) = \cos(x)$. We use Monte Carlo simulations to evaluate the coverage probabilities of our confidence bands for $g$ on the interval $I = [-\sigma_X, \sigma_X]$. We use the kernel function $K$ defined by its Fourier transform $\varphi_K$ given by
$$
\varphi_K(t) = \begin{cases} 1 & \text{if } |t| \le c, \\ \exp\left\{\dfrac{-b\exp(-b/(|t| - c)^2)}{(|t| - 1)^2}\right\} & \text{if } c < |t| < 1, \\ 0 & \text{if } 1 \le |t|, \end{cases}
$$
where $b = 1$ and $c = 0.05$ (cf. McMurry and Politis, 2004; Bissantz et al., 2007). The function $\varphi_K$ is infinitely differentiable with support $[-1, 1]$, and its inverse Fourier transform $K$ is real-valued and integrable with $\int K(x)\,dx = 1$. We follow the bandwidth selection rule discussed in Section 4. In this simulation study, we try alternative sequences $\{c_n\}_{n=1}^\infty$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$.

5.2. Simulation results. Tables 1, 2, 3, 4, and 5 show simulation results for $g(x) = x$, $x^2$, $x^3$, $\sin(x)$, and $\cos(x)$, respectively. Each table contains results for each of Model 1 and Model 2, for each of the three sample sizes $n = 250$, $500$, and $1000$, and for each of $\sigma_X = 2.0$ and $4.0$, which controls the signal-to-noise ratio.
Simulated coverage probabilities are reported for each of the three nominal coverage probabilities, 0.800, 0.900, and 0.950. In all the cases, simulated coverage probabilities are reasonably close to the designed nominal coverage probabilities for large sample sizes. In particular, the results for polynomial specifications exhibit very high coverage accuracy. The high performance for the polynomial specifications may well be attributed to our method of bandwidth selection, which relies on a preliminary polynomial regression under EIV. However, it is notable that the coverage accuracy is reasonably high even for nonpolynomial periodic functions like $g(x) = \sin(x)$ and $g(x) = \cos(x)$.
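The grid search with monotonized differences from Section 4, which drives the bandwidths behind these coverage results, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the inputs are assumed to be precomputed sup-norms $\|\hat A_n^2(\cdot; h_{n,j})\|_I$ and $\|\hat s_n^2(\cdot; h_{n,j})\|_I$ on an increasing grid, the variance differences are sign-flipped so that both sides of the stopping rule are nonnegative, and returning the largest bandwidth when no grid point satisfies the rule is our own hypothetical fallback.

```python
def select_bandwidth(hs, a_sq, s_sq, n, c_n):
    """Undersmoothing bandwidth choice with monotonized differences.

    hs   : increasing candidate bandwidths h_{n,1} < ... < h_{n,J}
    a_sq : a_sq[j] = sup_{x in I} Ahat_n(x; hs[j])**2
    s_sq : s_sq[j] = sup_{x in I} shat_n(x; hs[j])**2
    """
    J = len(hs)
    if J < 2 or len(a_sq) != J or len(s_sq) != J:
        raise ValueError("need matching inputs with at least two bandwidths")
    # Increments of the squared bias and decrements of the variance (divided by n).
    a_inc = [a_sq[j] - a_sq[j - 1] for j in range(1, J)]
    s_dec = [-(s_sq[j] - s_sq[j - 1]) / n for j in range(1, J)]
    # Monotonize in increasing order of j: running max for bias, running min for variance.
    for j in range(len(a_inc) - 1):
        if a_inc[j] > a_inc[j + 1]:
            a_inc[j + 1] = a_inc[j]
        if s_dec[j] < s_dec[j + 1]:
            s_dec[j + 1] = s_dec[j]
    # Smallest j in {2, ..., J} with c_n * (bias increment) >= (variance decrement).
    for j, (da, ds) in enumerate(zip(a_inc, s_dec)):
        if c_n * da >= ds:
            return hs[j + 1]
    return hs[-1]  # hypothetical fallback: the rule never triggers on this grid


# Toy inputs (not from the paper): the raw variance decrements 20, 5, 15, 1 are
# non-monotone; after monotonization they become 20, 5, 5, 1.
hs = [0.1, 0.2, 0.3, 0.4, 0.5]
print(select_bandwidth(hs, [0, 2, 3, 6, 10], [60, 40, 35, 20, 19], n=1, c_n=2.0))  # 0.4
```

Without the monotonization step the toy example would stop only at $h = 0.5$. In practice one would pass the recommended $c_n = (n/100)^{0.3}$; a larger $c_n$ triggers the rule earlier on the grid and therefore undersmooths more.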
There seems to be no systematic pattern as to which of the alternative sequences $\{c_n\}_{n=1}^\infty$ across $c_n = (n/100)^{0.1}$, $c_n = (n/100)^{0.3}$, and $c_n = (n/100)^{0.5}$ tends to yield better coverage results. As such, we recommend the intermediate choice $c_n = (n/100)^{0.3}$ as a practical guideline.

6. Real data analysis

According to the Centers for Disease Control and Prevention (CDC) of the US Department of Health and Human Services, more than one-third (36.5%) of US adults had obesity (defined by body mass index, or BMI, > 30) in the period between 2011 and 2014 (Ogden et al., 2015). The estimated annual medical cost of obesity in the United States was 147 billion 2008 U.S. dollars, with the medical costs for people who are obese being \$1,429 higher than those of people of normal weight (Finkelstein et al., 2009). While there is an extensive body of literature on cost estimation of obesity, a limitation is that commonly used data sets contain only self-reported body measures, and hence the values of BMI generated from them are prone to biases (Bound et al., 2001). More recently, Cawley and Meyerhoefer (2012) use the instrumental variable approach to address this issue in cost estimation of obesity. In this section, we employ our data combination approach to treat the self-reporting errors, and draw confidence bands for nonparametric regressions of medical costs on BMI. We focus on costs measured by medical expenditures. With this said, we note that there are also indirect costs of obesity which we do not account for; e.g., the costs of obesity are known to be passed on to obese workers with employer-sponsored health insurance in the form of lower cash wages, and insurance-providing employers discriminate against obese job seekers in the labor market (Bhattacharya and Bundorf, 2009); see also Cawley (2004). Details of the two data sets which we combine are as follows. The National Health and Nutrition Examination Survey (NHANES) of the CDC contains data on survey responses, medical examination results, and laboratory test results. The survey responses include demographic characteristics, such as gender and age.
In addition to the demographic characteristics, the survey responses also contain self-reported body measures and self-reported health conditions. Among the self-reported body measures are height in inches and weight in pounds. These two variables allow us to construct the BMI in lbs/in² as a generated variable. We convert this unit into the metric unit (kg/m²). The NHANES also contains medical examination results, including clinically measured BMI in kg/m². We treat the BMI constructed from the self-reported body measures as $W_j$, and the clinically measured BMI as $X_j$. Using the NHANES as a validation data set of size $m$, we can compute $\eta_j = W_j - X_j$ for each $j = 1, \dots, m$. The Panel Study of Income Dynamics (PSID) is a longitudinal panel survey of American families conducted by the Survey Research Center at the University of Michigan. This data set contains a long list of variables including demographic characteristics, socioeconomic attributes, expenses, and health conditions, among others. In particular, the PSID contains self-reported body measures of the household head, including height in inches and weight in pounds. These
two variables allow us to construct the body mass index (BMI) in lbs/in² as a generated variable. Again, we convert this unit into the metric unit (kg/m²). The PSID also contains medical and prescription expenses. We treat the BMI constructed from the self-reported body measures as $W_j$, and the medical and prescription expenses as $Y_j$. We note that the information contained in the PSID is mostly at the household level, as opposed to the individual level, and thus $Y_j$ indicates the total medical and prescription expenses of household $j$. To focus on individual rather than household medical and prescription expenses, we only consider the subsample of households of single men with no dependent family members, for which the total medical and prescription expenses of the household equal the individual medical and prescription expenses of the household head. Hence, the reported regression results concern these selected subpopulations. Combining the NHANES of size $m$ and the PSID of size $n$, we obtain the generated data $\mathcal{D}_n = \{Y_1, \dots, Y_n, W_1, \dots, W_n, \eta_1, \dots, \eta_m\}$, to which we can apply our method in order to draw confidence bands for the regression function $g$ of the model $Y = g(X) + U$ with $E[U \mid X, \varepsilon] = 0$. We set $I = [15, 35]$ as the interval on which we draw confidence bands. This interval $I$ has 25 (the WHO cutoff point for overweight) as its midpoint, and is contained in the convex hull of the empirical support of $W$. The kernel function and the bandwidth rule carry over from our simulation studies. The sequence $\{c_n\}_{n=1}^\infty$ used for bandwidth choice is defined by $c_n = (n/100)^{0.3}$, following the recommendation which we made from our simulation results. To account for the different medical conditions across ages, we categorize the sample into the following subsamples: (a) male individuals aged 20–34, (b) male individuals aged 35–49, (c) male individuals aged 50–64, and (d) male individuals aged 65 or above.
Note that this stratification takes into account the fact that ages 64 and 65 mark the cutoff of Medicare eligibility, and hence that group (d) faces different expenditure schedules and different economic incentives for health care utilization from groups (a)–(c); see Card et al. (2008). After deleting observations with missing fields from the NHANES, we obtain the following sample sizes of these four subsamples: (a) $m = 407$, (b) $m = 435$, (c) $m = 407$, and (d) $m = 431$. After deleting observations with missing fields from the PSID 2009 for total medical expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 413$, (b) $n = 181$, (c) $n = 180$, and (d) $n = 64$. Similarly, after deleting observations with missing fields from the PSID 2009 for prescription expenses as the dependent variable $Y$, we obtain the following sample sizes of these four subsamples: (a) $n = 528$, (b) $n = 243$, (c) $n = 247$, and (d) $n = 106$. Note that we use similar survey periods around 2009 for both the NHANES and the PSID to remove potential time effects. Figure 1 displays estimates and confidence bands for total medical expenses in 2009 US dollars as the dependent variable. Figure 2 similarly displays estimates and confidence bands for prescription expenses in 2009 US dollars as the dependent variable. In both figures, the estimates
are indicated by solid black curves. The areas shaded in grayscale indicate 80%, 90%, and 95% confidence bands. The four parts of each figure represent (a) men aged 20 to 34, (b) men aged 35 to 49, (c) men aged 50 to 64, and (d) men aged 65 or above. We see that the levels of both total medical expenses and prescription expenses tend to increase with age, as expected. For groups (a)–(b) of young men, both total medical expenses and prescription expenses exhibit little partial correlation with BMI. For group (c) of middle-aged men, on the other hand, the relations turn positive. For group (d) of senior men, total medical expenses and BMI continue to have a positive relationship, but prescription expenses exhibit little partial correlation with BMI. If we look at the 90% confidence band for group (c) of men aged 50 to 64, annual average total medical expenses are approximately \$5,399–\$17,015 if BMI = 20, approximately \$7,316–\$18,119 if BMI = 25, and approximately \$7,868–\$21,934 if BMI = 30. Likewise, annual average prescription expenses are approximately \$283–\$636 if BMI = 20, approximately \$372–\$761 if BMI = 25, and approximately \$429–\$951 if BMI = 30. These concrete numbers illustrate that confidence bands are useful for making interval predictions of incurred average costs, and this convenient feature adds practical value over existing methods, which only allow for reporting estimates with unknown extents of uncertainty.

7. Extensions

7.1. Application to specification testing. The results of the present paper can be used for specification testing of the regression function $g$. Specification testing in EIV models is important since nonparametric estimation of a regression function has slow rates of convergence, even slower than standard error-free nonparametric regression, while correct specification of a parametric model enables us to estimate the regression function with faster rates, often of order $1/\sqrt{n}$.
Suppose that we want to test whether the regression function $g$ belongs to a parametric class $\{g_\theta : \theta \in \Theta\}$, where $\Theta$ is a subset of a metric space (in most cases a Euclidean space). Popular specifications of $g$ include linear and polynomial functions. In cases where $g$ is linear or polynomial, it is possible to estimate the coefficients at the $\sqrt{n}$ rate under suitable regularity conditions (Fuller, 1987; Chan and Mak, 1985; Hausman et al., 1991; Cheng and Schneeweiss, 1998). Suppose now that $g = g_\theta$ for some $\theta \in \Theta$ and $\theta$ can be estimated by $\hat\theta$ with a sufficiently fast rate, i.e., $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha}\{n h_n \log(1/h_n)\}^{-1/2}\}$, and that Assumption 3.1 is satisfied with $g = g_\theta$. Then it is not difficult to see from the proof of Theorem 3.2 that
$$
\begin{aligned}
\frac{\hat f_X(x)\sqrt{n}\,h_n(\hat g(x) - g_{\hat\theta}(x))}{\hat s_n(x)} &= \frac{\hat f_X(x)\sqrt{n}\,h_n(\hat g(x) - g_\theta(x))}{\hat s_n(x)} + \frac{\hat f_X(x)\sqrt{n}\,h_n(g_\theta(x) - g_{\hat\theta}(x))}{\hat s_n(x)} \\
&= \frac{\hat f_X(x)\sqrt{n}\,h_n(\hat g(x) - g_\theta(x))}{\hat s_n(x)} + o_P\{(\log(1/h_n))^{-1/2}\}
\end{aligned}
$$
uniformly in $x \in I$, so that
$$
P\{g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x) \text{ for some } x \in I\} \to \tau.
$$
Therefore, the test that rejects the hypothesis that $g = g_\theta$ for some $\theta \in \Theta$ if $g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x)$ for some $x \in I$ is asymptotically of level $\tau$. We summarize the above discussion as a corollary.

Corollary 7.1. Suppose that $g = g_\theta$ for some $\theta \in \Theta$, where $\Theta$ is a subset of a metric space, and that Assumption 3.1 is satisfied with $g = g_\theta$. Let $\hat\theta$ be any estimator of $\theta$ such that $\|g_{\hat\theta} - g_\theta\|_I = o_P\{h_n^{-\alpha}\{n h_n \log(1/h_n)\}^{-1/2}\}$; then $P\{g_{\hat\theta}(x) \notin \hat C_{1-\tau}(x) \text{ for some } x \in I\} \to \tau$.

Remark 7.1 (Literature on specification testing in EIV regression). The literature on specification testing for EIV regression is large. See Zhu et al. (2003), Zhu and Cui (2005), Hall and Ma (2007), Song (2008), Otsu and Taylor (2016), and references therein. However, none of those papers considers $L^\infty$-based specification tests.

7.2. Additional regressors without measurement errors. In practical applications, we may have additional regressors $Z$, possibly vector valued, without measurement errors. Suppose that we are interested in estimation of and inference on $g(x, z) = E[Y \mid X = x, Z = z]$. We assume that $E[Y - g(X, Z) \mid X, Z, \varepsilon] = 0$, and that $\varepsilon$ is independent of $X$ conditionally on $Z$. In principle, the analysis can be reduced to the case where there are no additional regressors by conditioning on $Z = z$. If $Z$ is discretely distributed with finitely many mass points, then $g(x, z)$, where $z$ is a mass point, can be estimated by using only observations $j$ for which $Z_j = z$. If $Z$ is continuously distributed, then $g(x, z)$ can be estimated by using observations $j$ for which $Z_j$ is close to $z$, which can be implemented by using kernel weights. However, the detailed analysis of this case is not presented here for brevity.

7.3. Confidence bands for conditional distribution functions. The techniques used to derive confidence bands for the conditional mean in EIV regression can be extended to the conditional distribution function.
Suppose now that we are interested in constructing confidence bands for the conditional distribution function $g(y, x) = P(Y \le y \mid X = x)$ on a compact rectangle $J \times I$, where $J$ and $I$ are compact intervals, and where we do not observe $X$ but instead observe $W = X + \varepsilon$ with $\varepsilon$ (measurement error) being independent of $(Y, X)$. As before, we assume that in addition to an independent sample $\{(Y_1, W_1), \dots, (Y_n, W_n)\}$ on $(Y, W)$, there is an independent sample $\{\eta_1, \dots, \eta_m\}$ from the measurement error distribution. Since $g(y, x) = E[1(Y \le y) \mid X = x]$, where $1(\cdot)$ denotes the indicator function, we may estimate $g(y, x)$ by $\hat g(y, x) = \hat\mu(y, x)/\hat f_X(x)$, where
$$
\hat\mu(y, x) = \frac{1}{n h_n}\sum_{j=1}^n 1(Y_j \le y)\,\hat K_n((x - W_j)/h_n).
$$
To construct a confidence band for $g(y, x)$, we apply the methodology developed in Section 2 with $Y_j$ replaced by $1(Y_j \le y)$ for each $y$. Let
$$
\hat s_n^2(y, x) = \frac{1}{n}\sum_{j=1}^n \{1(Y_j \le y) - \hat g(y, x)\}^2\hat K_n^2((x - W_j)/h_n),
$$
and generate independent standard normal random variables $\xi_1, \dots, \xi_n$ independent of the data $\mathcal{D}_n$. Consider the multiplier stochastic process
$$
\hat Z^\xi(y, x) = \frac{1}{\hat s_n(y, x)\sqrt{n}}\sum_{j=1}^n \xi_j\{1(Y_j \le y) - \hat g(y, x)\}\hat K_n((x - W_j)/h_n),
$$
and for $\tau \in (0, 1)$, let $\hat c_n(1 - \tau)$ = conditional $(1 - \tau)$-quantile of $\|\hat Z^\xi\|_{J \times I}$ given $\mathcal{D}_n$. Then the resulting confidence band for $g(y, x)$ on $J \times I$ is given by
$$
\hat C_{1-\tau}(y, x) = \left[\hat g(y, x) \pm \frac{\hat s_n(y, x)}{\hat f_X(x)\sqrt{n}\,h_n}\,\hat c_n(1 - \tau)\right], \quad (y, x) \in J \times I.
$$
We make the following assumption, which is analogous to Assumption 3.1.

Assumption 7.1. Let $I, J$ be compact intervals in $\mathbb{R}$.
(i) The function $(y, w) \mapsto P(Y \le y \mid W = w) f_W(w)$ is continuous in $w$ uniformly in $y \in J$.
(ii) The characteristic function of $X$, $\varphi_X(t) = E[e^{itX}]$, $t \in \mathbb{R}$, is integrable on $\mathbb{R}$. Furthermore, $\int_{\mathbb{R}}\sup_{y \in J}|E[g(y, X)e^{itX}]|\,dt < \infty$.
(iii) Condition (iii) in Assumption 3.1.
(iv) The functions $f_X$ and $g(y, \cdot) f_X(\cdot)$ belong to $\Sigma(\beta, B)$ for some $\beta > 1/2$ and $B > 0$ for all $y \in J$. Let $k$ denote the integer such that $k < \beta \le k + 1$.
(v) Condition (v) in Assumption 3.1.
(vi) For all $x \in I$, $f_X(x) > 0$, and $\inf_{(y, x) \in J \times I} E[\{1(Y \le y) - g(y, x)\}^2 \mid W = x] f_W(x) > 0$.
(vii) Condition (vii) in Assumption 3.1.

Theorem 7.1. Under Assumption 7.1, as $n \to \infty$, $P\{g(y, x) \in \hat C_{1-\tau}(y, x)\ \ \forall (y, x) \in J \times I\} \to 1 - \tau$. Furthermore, the supremum width of the band $\hat C_{1-\tau}$ is $O_P\{h_n^{-\alpha}(n h_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Remark 7.2. To the best of our knowledge, Theorem 7.1 is also a new result.

8. Conclusion

In this paper, we develop a method to construct uniform confidence bands for the nonparametric EIV regression function $g$. We consider the practically relevant case where the distribution of the measurement error is unknown. We assume that there is an independent sample from the measurement error distribution, where the sample from the measurement error distribution need not be independent from the sample on response and predictor variables. Such a sample from the measurement error distribution is available if there is, for example, either 1) validation data or 2) repeated measurements (panel data) on the latent predictor variable with measurement errors, one of which is symmetrically distributed.
We establish asymptotic validity of the proposed confidence band for ordinary smooth measurement error densities, showing that the proposed confidence band contains the true regression function with probability approaching the nominal coverage probability. To the best of our knowledge, this is the first paper to derive asymptotically valid uniform confidence bands for nonparametric EIV regression. We also propose a practical
method to choose an undersmoothing bandwidth for valid inference. Simulation studies verify the finite sample performance of the proposed confidence band. Finally, we discuss extensions of our results to specification testing, cases with additional regressors without measurement errors, and confidence bands for conditional distribution functions.
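The multiplier bootstrap behind Theorem 3.2 and the band $\hat C_{1-\tau}(y, x)$ of Section 7.3 only needs the terms $e_j(x_k)$ (for instance $\{1(Y_j \le y) - \hat g(y, x_k)\}\hat K_n((x_k - W_j)/h_n)$) evaluated on a finite grid that approximates the index set. The standard-library sketch below works under that grid assumption; the function names, the number of bootstrap draws, and the seeds are hypothetical choices, and computing the $e_j(x_k)$ themselves is left to the caller.

```python
import math
import random


def multiplier_bootstrap_quantile(scores, level=0.95, n_boot=2000, seed=0):
    """Bootstrap critical value for the sup of a studentized multiplier process.

    scores[j][k] holds e_j(x_k) for observation j and grid point x_k.
    Returns (c_hat, s_hat), where s_hat[k] = sqrt(n^{-1} sum_j e_j(x_k)^2) and
    c_hat is the empirical `level`-quantile of
        max_k |sum_j xi_j e_j(x_k)| / (s_hat[k] * sqrt(n)),  xi_j iid N(0, 1),
    with the scores held fixed (i.e., conditionally on the data)."""
    n, n_grid = len(scores), len(scores[0])
    s_hat = [math.sqrt(sum(row[k] ** 2 for row in scores) / n) for k in range(n_grid)]
    rng = random.Random(seed)
    sups = []
    for _ in range(n_boot):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sups.append(max(
            abs(sum(xi[j] * scores[j][k] for j in range(n))) / (s_hat[k] * math.sqrt(n))
            for k in range(n_grid)
        ))
    sups.sort()
    idx = min(n_boot - 1, max(0, math.ceil(level * n_boot) - 1))
    return sups[idx], s_hat


def half_width(s_hat_k, f_hat_k, c_hat, n, h):
    """Half-width s_hat(x) * c_hat / (f_hat_X(x) * sqrt(n) * h_n), as in Section 7.3."""
    return s_hat_k * c_hat / (f_hat_k * math.sqrt(n) * h)


# Sanity check: with a single grid point the studentized statistic is exactly N(0, 1)
# given the data, so the 95% critical value should be near 1.96.
rng = random.Random(42)
scores = [[rng.uniform(-1.0, 1.0)] for _ in range(30)]
c_hat, _ = multiplier_bootstrap_quantile(scores, level=0.95, n_boot=20000, seed=1)
print(round(c_hat, 2))  # close to the two-sided 95% normal point 1.96
```

With more grid points the critical value grows, reflecting the price of uniformity over $I$ (or $J \times I$) relative to a pointwise interval.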
Appendix A. Proofs

A.1. Technical tools. In this section, we collect technical tools that will be used in the proofs of Theorems 3.1 and 3.2. The proofs rely on modern empirical process theory. For a probability measure $Q$ on a measurable space $(S,\mathcal{S})$ and a class of measurable functions $\mathcal{F}$ on $S$ such that $\mathcal{F}\subset L^2(Q)$, let $N(\mathcal{F},\|\cdot\|_{Q,2},\delta)$ denote the $\delta$-covering number for $\mathcal{F}$ with respect to the $L^2(Q)$-seminorm $\|\cdot\|_{Q,2}$. The class $\mathcal{F}$ is said to be pointwise measurable if there exists a countable subclass $\mathcal{G}\subset\mathcal{F}$ such that for every $f\in\mathcal{F}$ there exists a sequence $g_m\in\mathcal{G}$ with $g_m\to f$ pointwise. A function $F:S\to[0,\infty)$ is said to be an envelope for $\mathcal{F}$ if $F(x)\ge\sup_{f\in\mathcal{F}}|f(x)|$ for all $x\in S$. See Section 2.1 in van der Vaart and Wellner (1996) for details.

Lemma A.1 (A useful maximal inequality). Let $X,X_1,\dots,X_n$ be i.i.d. random variables taking values in a measurable space $(S,\mathcal{S})$, and let $\mathcal{F}$ be a pointwise measurable class of (measurable) real-valued functions on $S$ with measurable envelope $F$. Suppose that there exist constants $A\ge e$ and $V\ge1$ such that
$$\sup_QN(\mathcal{F},\|\cdot\|_{Q,2},\delta\|F\|_{Q,2})\le(A/\delta)^V,\quad0<\delta\le1,$$
where $\sup_Q$ is taken over all finitely discrete distributions on $S$. Furthermore, suppose that $0<E[F^2(X)]<\infty$, and let $\sigma^2>0$ be any positive constant such that $\sup_{f\in\mathcal{F}}E[f^2(X)]\le\sigma^2\le E[F^2(X)]$. Define $B=E[\max_{1\le j\le n}F^2(X_j)]$. Then
$$E\left[\left\|\frac{1}{\sqrt{n}}\sum_{j=1}^n\{f(X_j)-E[f(X)]\}\right\|_{\mathcal{F}}\right]\le C\left(\sqrt{V\sigma^2\log\frac{A\sqrt{E[F^2(X)]}}{\sigma}}+\frac{V\sqrt{B}}{\sqrt{n}}\log\frac{A\sqrt{E[F^2(X)]}}{\sigma}\right),$$
where $C>0$ is a universal constant.

Proof. See Corollary 5.1 in Chernozhukov et al. (2014a).

Lemma A.2 (An auxiliary maximal inequality). Let $\zeta_1,\dots,\zeta_n$ be random variables such that $E[|\zeta_j|^r]<\infty$ for all $j=1,\dots,n$ for some $r\ge1$. Then
$$E\left[\max_{1\le j\le n}|\zeta_j|\right]\le n^{1/r}\max_{1\le j\le n}(E[|\zeta_j|^r])^{1/r}.$$

Proof. This inequality is well known, and follows from Jensen's inequality. Indeed,
$$E\left[\max_{1\le j\le n}|\zeta_j|\right]\le\left(E\left[\max_{1\le j\le n}|\zeta_j|^r\right]\right)^{1/r}\le\left(\sum_{j=1}^nE[|\zeta_j|^r]\right)^{1/r}\le n^{1/r}\max_{1\le j\le n}(E[|\zeta_j|^r])^{1/r}.$$

The following anti-concentration inequality for the supremum of a Gaussian process will play a crucial role in the proofs of Theorems 3.1 and 3.2.
Lemma A.3 (Anti-concentration for the supremum of a Gaussian process). Let $T$ be a nonempty set, and let $X=(X_t:t\in T)$ be a tight Gaussian random variable in $\ell^\infty(T)$ with mean zero and $E[X_t^2]=1$ for all $t\in T$. Then for any $h>0$,
$$\sup_{x\in\mathbb{R}}P\{|\|X\|_T-x|\le h\}\le4h(1+E[\|X\|_T]).$$

Proof. See Corollary 2.1 in Chernozhukov et al. (2014b); see also Theorem 3 in Chernozhukov et al. (2015).

A.2. Proof of Theorem 3.1. In what follows, we always assume Assumption 3.1. Before proving Theorem 3.1, we first prove some preliminary lemmas. Recall that $A_n(x)=E[\{Y-g(x)\}K_n((x-W)/h_n)]$ and $s_n^2(x)=\mathrm{Var}(\{Y-g(x)\}K_n((x-W)/h_n))$. Observe that $\|K_n\|_{\mathbb{R}}=O(h_n^{-\alpha})$ under our assumption. In what follows, the notation $\lesssim$ signifies that the left-hand side is bounded by the right-hand side up to a positive constant independent of $n$ and $x$.

Lemma A.4. The following bounds hold: (i) $\|A_n\|_I=O(h_n^{\beta+1})$. (ii) For sufficiently large $n$, $\inf_{x\in I}s_n^2(x)\gtrsim h_n^{-2\alpha+1}$. (iii) For $l=0,1,2$, we have $\sup_{x\in\mathbb{R}}E[|YK_n((x-W)/h_n)|^{2+l}]=O(h_n^{-(2+l)\alpha+1})$.

Proof. (i). Since $E[Ye^{itW}]=E[\{g(X)+U\}e^{it(X+\varepsilon)}]=\psi_X(t)\varphi_\varepsilon(t)$, we have that
$$E[YK_n((x-W)/h_n)]=\frac{h_n}{2\pi}\int e^{-itx}E[Ye^{itW}]\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}dt=\frac{h_n}{2\pi}\int e^{-itx}\psi_X(t)\varphi_K(th_n)dt.$$
Since $\psi_X(\cdot)$ and $\varphi_K(\cdot\,h_n)$ are the Fourier transforms of $gf_X$ and $h_n^{-1}K(\cdot/h_n)$, respectively, the Fourier inversion formula yields that
$$\frac{h_n}{2\pi}\int e^{-itx}\psi_X(t)\varphi_K(th_n)dt=h_n\left(gf_X*\{h_n^{-1}K(\cdot/h_n)\}\right)(x)=\int g(w)f_X(w)K((x-w)/h_n)dw.$$
Note that the far left and right-hand sides are continuous in $x$, and so the equality holds for all $x$. Likewise, we have $E[K_n((x-W)/h_n)]=\int f_X(w)K((x-w)/h_n)dw$ for all $x$, so that
$$A_n(x)=\int\{g(w)-g(x)\}K((x-w)/h_n)f_X(w)dw=h_n\int\{g(x-h_nw)-g(x)\}f_X(x-h_nw)K(w)dw.$$
By the Taylor expansion, for any $x,w$,
$$\{g(x-h_nw)-g(x)\}f_X(x-h_nw)=\sum_{j=1}^{k-1}\frac{(gf_X)^{(j)}(x)-g(x)f_X^{(j)}(x)}{j!}(-h_nw)^j+\frac{(gf_X)^{(k)}(x-\theta h_nw)-g(x)f_X^{(k)}(x-\theta h_nw)}{k!}(-h_nw)^k$$
for some $\theta\in[0,1]$. Since $\int w^jK(w)dw=0$ for $j=1,\dots,k$ and $f_X,gf_X\in\Sigma(\beta,B)$, we have, for $x\in I$,
$$\left|\int\{g(x-h_nw)-g(x)\}f_X(x-h_nw)K(w)dw\right|=\left|\int\left[\{g(x-h_nw)-g(x)\}f_X(x-h_nw)-\sum_{j=1}^{k}\frac{(gf_X)^{(j)}(x)-g(x)f_X^{(j)}(x)}{j!}(-h_nw)^j\right]K(w)dw\right|\le\frac{(1+\|g\|_I)Bh_n^\beta}{k!}\int|w|^\beta|K(w)|dw.$$
This shows that $\|A_n\|_I=O(h_n^{\beta+1})$.

(ii). Since $A_n(x)=E[\{Y-g(x)\}K_n((x-W)/h_n)]=O(h_n^{\beta+1})$ uniformly in $x\in I$, it suffices to show that $\inf_{x\in I}E[\{Y-g(x)\}^2K_n^2((x-W)/h_n)]\gtrsim(1-o(1))h_n^{-2\alpha+1}$. Observe that $E[Y\mid W=w]f_W(w)=((gf_X)*f_\varepsilon)(w)$ (compare the Fourier transforms of both sides), and define
$$V(x,w)=E[\{Y-g(x)\}^2\mid W=w]f_W(w)=(E[Y^2\mid W=w]+g^2(x))f_W(w)-2g(x)((gf_X)*f_\varepsilon)(w).$$
The function $(gf_X)*f_\varepsilon$ is bounded and continuous by boundedness of $gf_X$. Since $E[Y^2\mid W=\cdot\,]$, $f_W$, and $(gf_X)*f_\varepsilon$ are bounded and continuous on $\mathbb{R}$, and $g$ is bounded and continuous on $I$, we have that the function $(x,w)\mapsto V(x,w)$ is bounded and continuous on $I\times\mathbb{R}$. In particular, since $V(x,x)>0$ for all $x\in I$ under our assumption, we have that $\inf_{x\in I}V(x,x)>0$. Now, observe that
$$E[\{Y-g(x)\}^2K_n^2((x-W)/h_n)]=\int V(x,w)K_n^2((x-w)/h_n)dw=h_n\int V(x,x-h_nw)K_n^2(w)dw.$$
Furthermore, we have that
$$\int K_n^2(w)dw=\frac{1}{2\pi}\int\frac{|\varphi_K(t)|^2}{|\varphi_\varepsilon(t/h_n)|^2}dt\gtrsim h_n^{-2\alpha}$$
by Plancherel's theorem. Hence, it suffices to show that
$$\sup_{x\in I}h_n^{2\alpha}\left|\int\{V(x,x-h_nw)-V(x,x)\}K_n^2(w)dw\right|\to0.\qquad\text{(A.1)}$$
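The order $h_n^{-2\alpha}$ of $\int K_n^2$ in the Plancherel step, and the bound $\|K_n\|_{\mathbb{R}}\le(2\pi)^{-1}\int|\varphi_K(t)|/|\varphi_\varepsilon(t/h_n)|\,dt\lesssim h_n^{-\alpha}$, can be checked numerically. The sketch below assumes, for illustration only, a Laplace error with scale $b$, so that $\varphi_\varepsilon(t)=1/(1+b^2t^2)$ is ordinary smooth with $\alpha=2$, and the kernel $\varphi_K(t)=(1-t^2)^3$ on $[-1,1]$; neither choice is taken from the paper, and the helper names are hypothetical. Both Fourier-side integrals are computed with the trapezoidal rule and rescaled by $h_n^{2\alpha}$ and $h_n^{\alpha}$, which should level off as $h_n\downarrow0$.

```python
import numpy as np


def phi_K(t):
    """Assumed kernel Fourier transform: (1 - t^2)^3 on [-1, 1], zero outside."""
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)


def phi_eps(t, b=1.0):
    """Characteristic function of a Laplace(0, b) error (ordinary smooth, alpha = 2)."""
    return 1.0 / (1.0 + (b * t) ** 2)


def trapezoid(f, dx):
    """Composite trapezoidal rule on an equally spaced grid."""
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))


def kernel_norms(h, b=1.0, num=200001):
    """Return (int K_n^2, Fourier-side bound on ||K_n||_R) for bandwidth h."""
    t = np.linspace(-1.0, 1.0, num)           # support of phi_K
    dt = t[1] - t[0]
    ratio = phi_K(t) / phi_eps(t / h, b)      # phi_K(t) / phi_eps(t / h)
    l2_sq = trapezoid(ratio**2, dt) / (2.0 * np.pi)          # Plancherel's theorem
    sup_bound = trapezoid(np.abs(ratio), dt) / (2.0 * np.pi)
    return l2_sq, sup_bound


alpha = 2
for h in (0.1, 0.05, 0.025):
    l2_sq, sup_bound = kernel_norms(h)
    print(h, h ** (2 * alpha) * l2_sq, h**alpha * sup_bound)
```

With $b=1$ the two rescaled quantities approach $(2\pi)^{-1}\int_{-1}^1t^4(1-t^2)^6dt$ and $16/(315\pi)$, respectively, with relative corrections of order $h_n^2$.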
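From the proof of Lemma 3 in Kato and Sasaki (2016), we have that $h_n^{2\alpha}K_n^2(x)\lesssim\min\{1,x^{-2}\}$. By the continuity of $V(x,w)$ (hence its uniform continuity on compact subsets of $I\times\mathbb{R}$), for any $\rho>0$, there exists sufficiently small $\delta>0$ such that $|V(x,x+w)-V(x,x)|\le\rho$ for all $x\in I$ whenever $|w|\le\delta$. Therefore,
$$\sup_{x\in I}\int|V(x,x-h_nw)-V(x,x)|h_n^{2\alpha}K_n^2(w)dw\lesssim\rho\int_{|w|\le\delta/h_n}\min\{1,w^{-2}\}dw+2\|V\|_{I\times\mathbb{R}}\int_{|w|>\delta/h_n}w^{-2}dw\lesssim\rho+o(1).$$
Since $\rho>0$ is arbitrary, (A.1) follows.

(iii). Pick any $l=0,1,2$. Since $\|K_n\|_{\mathbb{R}}\lesssim h_n^{-\alpha}$, we have that
$$E[|YK_n((x-W)/h_n)|^{2+l}]=h_n\int E[|Y|^{2+l}\mid W=x-h_nw]|K_n(w)|^{2+l}f_W(x-h_nw)dw\lesssim h_n^{-l\alpha+1}\int V_l(x-h_nw)K_n^2(w)dw\le h_n^{-l\alpha+1}\|V_l\|_{\mathbb{R}}\int K_n^2(w)dw\lesssim h_n^{-(2+l)\alpha+1},$$
where $V_l(w)=E[|Y|^{2+l}\mid W=w]f_W(w)$. This completes the proof.

Lemma A.5. $\|\hat\varphi_\varepsilon-\varphi_\varepsilon\|_{[-h_n^{-1},h_n^{-1}]}=O_P\{m^{-1/2}\sqrt{\log(1/h_n)}\}$.

Proof. See Lemma 4 in Kato and Sasaki (2016); see also Theorem 4.1 in Neumann and Reiß (2009).

Consider the following classes of functions:
$$\mathcal{F}_n^{(1)}=\{(y,w)\mapsto yK_n((x-w)/h_n):x\in\mathbb{R}\},$$
$$\mathcal{F}_n^{(2)}=\left\{(y,w)\mapsto\frac{1}{s_n(x)}\{y-g(x)\}K_n((x-w)/h_n):x\in I\right\},$$
$$\mathcal{F}_n^{(3)}=\{(y,w)\mapsto\{y-g(x)\}K_n^2((x-w)/h_n):x\in I\},$$
$$\mathcal{F}_n^{(4)}=\left\{(y,w)\mapsto\frac{1}{s_n^2(x)}\{y-g(x)\}^2K_n^2((x-w)/h_n):x\in I\right\}.\qquad\text{(A.2)}$$
In view of the facts that $\|K_n\|_{\mathbb{R}}\lesssim h_n^{-\alpha}$ and $\inf_{x\in I}s_n(x)\gtrsim h_n^{-\alpha+1/2}$, choose constants $D_1,D_2>0$ (independent of $n$) such that $\|K_n\|_{\mathbb{R}}\le D_1h_n^{-\alpha}$ and $\|1/s_n\|_I\le D_2h_n^{\alpha-1/2}$. Let
$$F_n^{(1)}(y,w)=D_1|y|h_n^{-\alpha},\quad F_n^{(2)}(y,w)=D_1D_2(|y|+\|g\|_I)/\sqrt{h_n},\quad F_n^{(3)}(y,w)=D_1^2(|y|+\|g\|_I)h_n^{-2\alpha},\quad F_n^{(4)}(y,w)=\{F_n^{(2)}(y,w)\}^2.$$
Note that $F_n^{(l)}$ is an envelope function for $\mathcal{F}_n^{(l)}$ for each $l=1,\dots,4$.

Lemma A.6. There exist constants $A,v\ge e$ independent of $n$ such that
$$\sup_QN(\mathcal{F}_n^{(l)},\|\cdot\|_{Q,2},\delta\|F_n^{(l)}\|_{Q,2})\le(A/\delta)^v,\quad0<\delta\le1,\qquad\text{(A.3)}$$
for all $l=1,\dots,4$, where $\sup_Q$ is taken over all finitely discrete distributions on $\mathbb{R}^2$.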
Proof. Consider the following classes of functions:
$$\mathcal{K}_n=\{w\mapsto K_n((x-w)/h_n):x\in\mathbb{R}\},\qquad\mathcal{K}_n^2=\{f^2:f\in\mathcal{K}_n\}.$$
Lemma 1 in Kato and Sasaki (2016) and Corollary A.1 in Chernozhukov et al. (2014a) yield that there exist constants $A_1,v_1\ge e$ independent of $n$ such that $\sup_QN(\mathcal{K}_n,\|\cdot\|_{Q,2},D_1h_n^{-\alpha}\delta)\le(A_1/\delta)^{v_1}$ and $\sup_QN(\mathcal{K}_n^2,\|\cdot\|_{Q,2},D_1^2h_n^{-2\alpha}\delta)\le(A_1/\delta)^{v_1}$ for all $0<\delta\le1$. In what follows, we only prove (A.3) for $l=2$; the proofs for the other cases are completely analogous given the above bounds on the covering numbers for $\mathcal{K}_n$ and $\mathcal{K}_n^2$.

Let $\mathcal{H}_n=\{y\mapsto\{y-g(x)\}/s_n(x):x\in I\}$, and observe that, since $\|1/s_n\|_I\le D_2h_n^{\alpha-1/2}$, there exist constants $A_2,v_2\ge e$ independent of $n$ such that $\sup_QN(\mathcal{H}_n,\|\cdot\|_{Q,2},\delta\|H_n\|_{Q,2})\le(A_2/\delta)^{v_2}$ for all $0<\delta\le1$, where $H_n(y)=D_2(|y|+\|g\|_I)h_n^{\alpha-1/2}$ is an envelope function for $\mathcal{H}_n$. This can be verified by a direct calculation, or by observing that $\mathcal{H}_n$ ($\subset\{y\mapsto ay+b:a>0,b\in\mathbb{R}\}$) is a VC subgraph class with VC index at most 4 (cf. van der Vaart and Wellner, 1996, Lemma […]), and applying Theorem […] in van der Vaart and Wellner (1996). Let $\mathcal{H}_n\cdot\mathcal{K}_n:=\{(y,w)\mapsto f_1(y)f_2(w):f_1\in\mathcal{H}_n,f_2\in\mathcal{K}_n\}\supset\mathcal{F}_n^{(2)}$, and note that $H_n(y)D_1h_n^{-\alpha}=F_n^{(2)}(y,w)$. From Corollary A.1 in Chernozhukov et al. (2014a), there exist constants $A_3,v_3\ge e$ independent of $n$ such that $\sup_QN(\mathcal{H}_n\cdot\mathcal{K}_n,\|\cdot\|_{Q,2},\delta\|F_n^{(2)}\|_{Q,2})\le(A_3/\delta)^{v_3}$ for all $0<\delta\le1$. Now, the desired result follows from the observation that $N(\mathcal{F}_n^{(2)},\|\cdot\|_{Q,2},2\delta)\le N(\mathcal{H}_n\cdot\mathcal{K}_n,\|\cdot\|_{Q,2},\delta)$ for all $\delta>0$.

Lemma A.7. We have $\|\hat f_X^*(\cdot)-E[\hat f_X^*(\cdot)]\|_{\mathbb{R}}=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ and $\|E[\hat f_X^*(\cdot)]-f_X(\cdot)\|_{\mathbb{R}}=O(h_n^\beta)=o\{h_n^{-\alpha}(nh_n\log(1/h_n))^{-1/2}\}$. Furthermore, $\|\hat\mu^*(\cdot)-E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Proof. The first two results are implicit in the proofs of Corollaries 1 and 2 in Kato and Sasaki (2016). To prove the last result, we shall apply Lemma A.1 to the class of functions $\mathcal{F}_n^{(1)}$. From Lemma A.4(iii), we have that $\sup_{x\in\mathbb{R}}E[Y^2K_n^2((x-W)/h_n)]=O(h_n^{-2\alpha+1})$.
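In view of the covering number bound for $\mathcal{F}_n^{(1)}$ given in Lemma A.6, we may apply Lemma A.1 to $\mathcal{F}_n^{(1)}$ to conclude that
$$(nh_n)E[\|\hat\mu^*(\cdot)-E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}]=E\left[\left\|\sum_{j=1}^n\{f(Y_j,W_j)-E[f(Y,W)]\}\right\|_{\mathcal{F}_n^{(1)}}\right]\lesssim h_n^{-\alpha}\sqrt{nh_n\log(1/h_n)}+h_n^{-\alpha}\sqrt{E\left[\max_{1\le j\le n}Y_j^2\right]}\log(1/h_n).$$
From Lemma A.2, we have $E[\max_{1\le j\le n}Y_j^2]=O(n^{1/2})$, so that we have
$$(nh_n)E[\|\hat\mu^*(\cdot)-E[\hat\mu^*(\cdot)]\|_{\mathbb{R}}]\lesssim h_n^{-\alpha}\sqrt{nh_n\log(1/h_n)}+h_n^{-\alpha}n^{1/4}\log(1/h_n)\lesssim h_n^{-\alpha}\sqrt{nh_n\log(1/h_n)},$$
where the second inequality follows from the first condition in (3.1). This completes the proof.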
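The step $E[\max_{1\le j\le n}Y_j^2]=O(n^{1/2})$ above is Lemma A.2 with $\zeta_j=Y_j^2$ and $r=2$, which needs only $E[Y^4]<\infty$. A quick Monte Carlo sanity check of that inequality is sketched below; the standard normal choice for $Y_j$ (for which $E[Y^4]=3$), the helper name `lemma_a2_check` and the simulation sizes are arbitrary assumptions made for illustration.

```python
import numpy as np


def lemma_a2_check(n, reps=2000, seed=0):
    """Compare a Monte Carlo estimate of E[max_j zeta_j] with the bound of Lemma A.2.

    Illustrative case: zeta_j = Y_j^2 with Y_j i.i.d. N(0, 1) and r = 2,
    so max_j (E|zeta_j|^r)^{1/r} = (E[Y^4])^{1/2} = sqrt(3).
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((reps, n))
    lhs = np.mean(np.max(y**2, axis=1))  # estimate of E[max_j Y_j^2]
    rhs = np.sqrt(n) * np.sqrt(3.0)      # n^{1/r} (E[Y^4])^{1/r} with r = 2
    return lhs, rhs


for n in (10, 100, 1000):
    lhs, rhs = lemma_a2_check(n)
    print(n, round(float(lhs), 3), round(float(rhs), 3))
```

In this Gaussian example the left-hand side grows only like $\log n$ while the bound grows like $n^{1/2}$, so the lemma is crude; it suffices in the proof because only the polynomial rate enters there.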
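We are now in a position to prove Theorem 3.1.

Proof of Theorem 3.1. We divide the proof into two steps.

Step 1. Let $r_n=h_n^{-\alpha}\{nh_n\log(1/h_n)\}^{-1/2}$. We first prove that
$$\hat g(x)-g(x)=\frac{1}{f_X(x)}\frac{1}{nh_n}\sum_{j=1}^n[\{Y_j-g(x)\}K_n((x-W_j)/h_n)-A_n(x)]+o_P(r_n)$$
uniformly in $x\in I$. To this end, we shall show that $\|\hat\mu-\hat\mu^*\|_{\mathbb{R}}=o_P(r_n)$. First, observe from Lemma A.5 that
$$\inf_{|t|\le h_n^{-1}}|\hat\varphi_\varepsilon(t)|\ge\inf_{|t|\le h_n^{-1}}|\varphi_\varepsilon(t)|-O_P\{m^{-1/2}\sqrt{\log(1/h_n)}\}\gtrsim(1-o_P(1))h_n^{\alpha}.$$
Let $\psi_{YW}(t)=E[Ye^{itW}]=E[\{g(X)+U\}e^{it(X+\varepsilon)}]=\psi_X(t)\varphi_\varepsilon(t)$, and let $\hat\psi_{YW}(t)=n^{-1}\sum_{j=1}^nY_je^{itW_j}$. Decompose $\hat\mu(x)-\hat\mu^*(x)$ as
$$\begin{aligned}
\hat\mu(x)-\hat\mu^*(x)&=\frac{1}{2\pi}\int e^{-itx}\hat\psi_{YW}(t)\frac{\varphi_K(th_n)}{\hat\varphi_\varepsilon(t)}dt-\frac{1}{2\pi}\int e^{-itx}\hat\psi_{YW}(t)\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}dt\\
&=\frac{1}{2\pi}\int_{\{\psi_X\ne0\}}e^{-itx}\varphi_K(th_n)\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}\psi_X(t)dt+\frac{1}{2\pi}\int_{\{\psi_X=0\}}e^{-itx}\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\hat\psi_{YW}(t)\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}dt\\
&\quad-\frac{1}{2\pi}\int_{\{\psi_X\ne0\}}e^{-itx}\varphi_K(th_n)\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}\psi_X(t)dt-\frac{1}{2\pi}\int_{\{\psi_X=0\}}e^{-itx}\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\hat\psi_{YW}(t)dt\\
&=\frac{1}{2\pi}\int_{\{\psi_X\ne0\}}e^{-itx}\varphi_K(th_n)\left\{\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}-1\right\}\left\{\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right\}\psi_X(t)dt\\
&\quad+\frac{1}{2\pi}\int_{\{\psi_X=0\}}e^{-itx}\frac{\varphi_K(th_n)}{\varphi_\varepsilon(t)}\hat\psi_{YW}(t)\left\{\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right\}dt+\frac{1}{2\pi}\int e^{-itx}\left\{\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right\}\varphi_K(th_n)\psi_X(t)dt.
\end{aligned}$$
Hence the Cauchy–Schwarz inequality yields that
$$\begin{aligned}
|\hat\mu(x)-\hat\mu^*(x)|^2&\lesssim\left\{\int_{\{\psi_X\ne0\}\cap[-h_n^{-1},h_n^{-1}]}\left|\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}-1\right|^2|\psi_X(t)|^2dt\right\}\left\{\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right|^2dt\right\}\\
&\quad+h_n^{-2\alpha}\left\{\int_{\{\psi_X=0\}\cap[-h_n^{-1},h_n^{-1}]}|\hat\psi_{YW}(t)|^2dt\right\}\left\{\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right|^2dt\right\}+\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right|^2|\psi_X(t)|dt.\qquad\text{(A.4)}
\end{aligned}$$
We shall bound each term on the right-hand side. Observe that
$$\int_{-h_n^{-1}}^{h_n^{-1}}\left|\frac{\varphi_\varepsilon(t)}{\hat\varphi_\varepsilon(t)}-1\right|^2dt\le O_P(h_n^{-2\alpha})\int_{-h_n^{-1}}^{h_n^{-1}}|\hat\varphi_\varepsilon(t)-\varphi_\varepsilon(t)|^2dt,$$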
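and the integral on the right-hand side is $O_P\{(mh_n)^{-1}\}$ since
$$\int_{-h_n^{-1}}^{h_n^{-1}}E[|\hat\varphi_\varepsilon(t)-\varphi_\varepsilon(t)|^2]dt\le m^{-1}\int_{-h_n^{-1}}^{h_n^{-1}}dt=2(mh_n)^{-1}.$$
Likewise, using the fact that $\psi_X$ is integrable, we have that the last term on the right-hand side of (A.4) is $O_P(h_n^{-2\alpha}m^{-1})$. For any $t$ with $\psi_X(t)\ne0$, we have $E[|\hat\psi_{YW}(t)/\psi_{YW}(t)-1|^2]\le E[Y^2]/\{n|\psi_{YW}(t)|^2\}$, so that
$$E\left[\int_{\{\psi_X\ne0\}\cap[-h_n^{-1},h_n^{-1}]}\left|\frac{\hat\psi_{YW}(t)}{\psi_{YW}(t)}-1\right|^2|\psi_X(t)|^2dt\right]\lesssim\frac{1}{n}\int_{-h_n^{-1}}^{h_n^{-1}}\frac{dt}{|\varphi_\varepsilon(t)|^2}\lesssim h_n^{-2\alpha}(nh_n)^{-1}.$$
Finally, for any $t$ with $\psi_X(t)=0$, we have $\psi_{YW}(t)=0$, so that
$$E\left[\int_{\{\psi_X=0\}\cap[-h_n^{-1},h_n^{-1}]}|\hat\psi_{YW}(t)|^2dt\right]\lesssim(nh_n)^{-1}.$$
Therefore, we have $\|\hat\mu-\hat\mu^*\|_{\mathbb{R}}^2=O_P(h_n^{-4\alpha-2}n^{-1}m^{-1}+h_n^{-2\alpha}m^{-1})=o_P(r_n^2)$.

From Step 2 in the proof of Theorem 1 of Kato and Sasaki (2016), it follows that $\|\hat f_X-\hat f_X^*\|_{\mathbb{R}}=o_P(r_n)$, which in particular implies that $\|\hat f_X-f_X\|_I\le\|\hat f_X-\hat f_X^*\|_I+\|\hat f_X^*-f_X\|_I=o_P(1)$, so that $\|1/\hat f_X\|_I=O_P(1)$. Furthermore,
$$\|\hat\mu^*\|_I\le\|E[\hat\mu^*(\cdot)]\|_I+\|\hat\mu^*(\cdot)-E[\hat\mu^*(\cdot)]\|_I\le\int|\psi_X(t)|dt+o_P(1)=O_P(1).$$
Therefore,
$$\|\hat g-\hat g^*\|_I\le\|1/\hat f_X\|_I\|\hat\mu-\hat\mu^*\|_I+\|\hat\mu^*\|_I\|1/\hat f_X-1/\hat f_X^*\|_I\le o_P(r_n)+O_P(1)\|\hat f_X-\hat f_X^*\|_I=o_P(r_n).$$
Now, observe that
$$\hat g^*(x)-g(x)=\frac{1}{\hat f_X^*(x)}\frac{1}{nh_n}\sum_{j=1}^n\{Y_j-g(x)\}K_n((x-W_j)/h_n).$$
Since $\|A_n\|_I=O(h_n^{\beta+1})=o(h_nr_n)$, we have
$$\hat g^*(x)-g(x)=\frac{1}{\hat f_X^*(x)}\frac{1}{nh_n}\sum_{j=1}^n[\{Y_j-g(x)\}K_n((x-W_j)/h_n)-A_n(x)]+o_P(r_n)$$
uniformly in $x\in I$. Since
$$\frac{1}{nh_n}\sum_{j=1}^n[\{Y_j-g(x)\}K_n((x-W_j)/h_n)-A_n(x)]=\hat\mu^*(x)-E[\hat\mu^*(x)]-g(x)\{\hat f_X^*(x)-E[\hat f_X^*(x)]\}=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$$
uniformly in $x\in I$, and
$$\|1/\hat f_X^*-1/f_X\|_I\le O_P(1)\|\hat f_X^*-f_X\|_I=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\},$$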
we conclude that
$$\hat g^*(x)-g(x)=\frac{1}{f_X(x)}\frac{1}{nh_n}\sum_{j=1}^n[\{Y_j-g(x)\}K_n((x-W_j)/h_n)-A_n(x)]+o_P(r_n)$$
uniformly in $x\in I$. This leads to the desired result of Step 1. Furthermore, the derivation so far yields that $\|\hat g-g\|_I=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$.

Step 2. By Step 1 together with the fact that $\inf_{x\in I}s_n(x)\gtrsim h_n^{-\alpha+1/2}$, we have
$$\hat Z_n(x)=\frac{\sqrt{n}h_nf_X(x)(\hat g(x)-g(x))}{s_n(x)}=\frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^n[\{Y_j-g(x)\}K_n((x-W_j)/h_n)-A_n(x)]+o_P\{(\log(1/h_n))^{-1/2}\}=Z_n(x)+o_P\{(\log(1/h_n))^{-1/2}\}$$
uniformly in $x\in I$. Recall the class of functions $\mathcal{F}_n^{(2)}$ defined in (A.2), and consider the empirical process indexed by $\mathcal{F}_n^{(2)}$:
$$\nu_n(f)=\frac{1}{\sqrt{n}}\sum_{j=1}^n\{f(Y_j,W_j)-E[f(Y,W)]\},\quad f\in\mathcal{F}_n^{(2)}.$$
We apply Theorem 2.1 in Chernozhukov et al. (2016) to approximate $\|\nu_n\|_{\mathcal{F}_n^{(2)}}=\|Z_n\|_I$ by the supremum of a Gaussian process. To this end, we shall verify the conditions in Chernozhukov et al. (2016). First, from the covering number bound for $\mathcal{F}_n^{(2)}$ given in Lemma A.6 and finiteness of the second moment of $F_n^{(2)}(Y,W)$, there exists a tight Gaussian random variable $G_n$ in $\ell^\infty(\mathcal{F}_n^{(2)})$ with mean zero and the same covariance function as $\{\nu_n(f):f\in\mathcal{F}_n^{(2)}\}$. Extend $\nu_n$ linearly to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})=\{f,-f:f\in\mathcal{F}_n^{(2)}\}$, and observe that $\|\nu_n\|_{\mathcal{F}_n^{(2)}}=\sup_{f\in\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})}\nu_n(f)$. Note that from Theorem […] in Giné and Nickl (2016), $G_n$ extends to the linear hull of $\mathcal{F}_n^{(2)}$ in such a way that $G_n$ has linear sample paths, so that $\|G_n\|_{\mathcal{F}_n^{(2)}}=\sup_{f\in\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})}G_n(f)$, and in addition $G_n$ has uniformly continuous paths on the symmetric convex hull of $\mathcal{F}_n^{(2)}$. It is not difficult to verify that the covering number of $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$ is at most twice that of $\mathcal{F}_n^{(2)}$. In particular, $\{G_n(f):f\in\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})\}$ is a tight Gaussian random variable in $\ell^\infty(\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)}))$ with mean zero and the same covariance function as $\{\nu_n(f):f\in\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})\}$. Next, observe that $E[|YK_n((x-W)/h_n)|^{2+l}]\lesssim h_n^{-(2+l)\alpha+1}$ for $l=0,1,2$ from Lemma A.4(iii), so that
$$\sup_{f\in\mathcal{F}_n^{(2)}}E[|f(Y,W)|^{2+l}]\lesssim h_n^{-l/2}$$
for $l=0,1,2$. Furthermore, observe that $E[F_n^{(2)}(Y,W)^4]\lesssim h_n^{-2}(E[Y^4]+\|g\|_I^4)\lesssim h_n^{-2}$. Therefore, applying Theorem 2.1 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$ with $B(f)\equiv0$, $q=4$, $A\lesssim1$, $v\lesssim1$, $b\lesssim h_n^{-1/2}$, $\sigma\lesssim1$ and $\gamma=1/\log n$ yields that there exists a random variable $V_n$ having the same distribution as
$\|G_n\|_{\mathcal{F}_n^{(2)}}$ such that
$$\left|\|\nu_n\|_{\mathcal{F}_n^{(2)}}-V_n\right|=O_P\left\{\frac{(\log n)^{5/4}}{n^{1/4}h_n^{1/2}}+\frac{\log n}{(nh_n)^{1/6}}\right\}=o_P\{(\log(1/h_n))^{-1/2}\}.$$
Now, for $f_{n,x}(y,w)=\{y-g(x)\}K_n((x-w)/h_n)/s_n(x)$, define $Z_n^G(x)=G_n(f_{n,x})$, $x\in I$, and observe that $Z_n^G$ is a tight Gaussian random variable in $\ell^\infty(I)$ with mean zero and the same covariance function as $Z_n$ such that $\|Z_n^G\|_I$ has the same distribution as $V_n$. Since
$$\left|\|\hat Z_n\|_I-V_n\right|\le\|\hat Z_n-Z_n\|_I+\left|\|Z_n\|_I-V_n\right|=o_P\{(\log(1/h_n))^{-1/2}\},$$
there exists a sequence $\delta_n\downarrow0$ such that $P\{|\|\hat Z_n\|_I-V_n|>\delta_n(\log(1/h_n))^{-1/2}\}\le\delta_n$ (which follows from the fact that the Ky Fan metric metrizes convergence in probability; see Theorem […] in Dudley (2002)). Observe that for any $z\in\mathbb{R}$,
$$P\{\|\hat Z_n\|_I\le z\}\le P\{V_n\le z+\delta_n(\log(1/h_n))^{-1/2}\}+P\{|\|\hat Z_n\|_I-V_n|>\delta_n(\log(1/h_n))^{-1/2}\}\le P\{\|Z_n^G\|_I\le z+\delta_n(\log(1/h_n))^{-1/2}\}+\delta_n.$$
The anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) then yields that
$$P\{\|Z_n^G\|_I\le z+\delta_n(\log(1/h_n))^{-1/2}\}\le P\{\|Z_n^G\|_I\le z\}+4\delta_n(\log(1/h_n))^{-1/2}\{1+E[\|Z_n^G\|_I]\}.$$
From the covering number bound for $\mathcal{F}_n^{(2)}$ given in Lemma A.6, together with the facts that $E[F_n^{(2)}(Y,W)^2]\lesssim h_n^{-1}$ and $\mathrm{Var}(f_{n,x}(Y,W))=1$ for all $x\in I$, Dudley's entropy integral bound (cf. van der Vaart and Wellner, 1996, Corollary 2.2.8) yields that
$$E[\|Z_n^G\|_I]=E[\|G_n\|_{\mathcal{F}_n^{(2)}}]\lesssim\int_0^1\sqrt{\log(1/(\delta\sqrt{h_n}))}\,d\delta\lesssim\sqrt{\log(1/h_n)},$$
which implies that $P\{\|Z_n^G\|_I\le z+\delta_n(\log(1/h_n))^{-1/2}\}\le P\{\|Z_n^G\|_I\le z\}+o(1)$ uniformly in $z$. Likewise, we have $P\{\|\hat Z_n\|_I\le z\}\ge P\{\|Z_n^G\|_I\le z\}-o(1)$ uniformly in $z$. This completes the proof.

A.3. Proof of Theorem 3.2. We first prove the following technical lemma.

Lemma A.8. $\|\hat s_n^2(\cdot)/s_n^2(\cdot)-1\|_I=o_P\{(\log(1/h_n))^{-1}\}$.

Proof. Observe that
$$\begin{aligned}\{Y_j-\hat g(x)\}^2\hat K_n^2((x-W_j)/h_n)&=\{Y_j-g(x)\}^2K_n^2((x-W_j)/h_n)+\{g(x)-\hat g(x)\}^2K_n^2((x-W_j)/h_n)\\&\quad+2\{g(x)-\hat g(x)\}\{Y_j-g(x)\}K_n^2((x-W_j)/h_n)+\{Y_j-\hat g(x)\}^2\{\hat K_n^2((x-W_j)/h_n)-K_n^2((x-W_j)/h_n)\},\end{aligned}$$
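so that
$$\begin{aligned}&\left\|\frac{1}{n}\sum_{j=1}^n\{Y_j-\hat g(\cdot)\}^2\hat K_n^2((\cdot-W_j)/h_n)-\frac{1}{n}\sum_{j=1}^n\{Y_j-g(\cdot)\}^2K_n^2((\cdot-W_j)/h_n)\right\|_I\\&\quad\le O_P(h_n^{-2\alpha})\|\hat g-g\|_I^2+2\|\hat g-g\|_I\left\|\frac{1}{n}\sum_{j=1}^n\{Y_j-g(\cdot)\}K_n^2((\cdot-W_j)/h_n)\right\|_I+2\left(\frac{1}{n}\sum_{j=1}^nY_j^2+\|\hat g\|_I^2\right)\|\hat K_n^2-K_n^2\|_{\mathbb{R}}.\qquad\text{(A.5)}\end{aligned}$$
From Step 1 in the proof of Theorem 3.1, $\|\hat g-g\|_I=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$, so that the first term on the right-hand side of (A.5) is $O_P\{h_n^{-4\alpha}(nh_n)^{-1}\log(1/h_n)\}$. Since
$$\|\hat K_n-K_n\|_{\mathbb{R}}\le\frac{1}{2\pi}\int\left|\frac{1}{\hat\varphi_\varepsilon(t/h_n)}-\frac{1}{\varphi_\varepsilon(t/h_n)}\right||\varphi_K(t)|dt\le O_P(h_n^{-2\alpha})\int|\hat\varphi_\varepsilon(t/h_n)-\varphi_\varepsilon(t/h_n)||\varphi_K(t)|dt=O_P(h_n^{-2\alpha}m^{-1/2}),$$
we have that
$$\|\hat K_n^2-K_n^2\|_{\mathbb{R}}\le\|\hat K_n-K_n\|_{\mathbb{R}}\|\hat K_n+K_n\|_{\mathbb{R}}=O_P(h_n^{-3\alpha}m^{-1/2}),$$
which implies that the last term on the right-hand side of (A.5) is $O_P(h_n^{-3\alpha}m^{-1/2})$. To bound the second term, observe first that, since $E[Y\mid W=w]f_W(w)=((gf_X)*f_\varepsilon)(w)$ is bounded (in absolute value) by $\|gf_X\|_{\mathbb{R}}$,
$$\|E[\{Y-g(\cdot)\}K_n^2((\cdot-W)/h_n)]\|_I\le h_n(\|gf_X\|_{\mathbb{R}}+\|g\|_I\|f_W\|_{\mathbb{R}})\int K_n^2(w)dw\lesssim h_n^{-2\alpha+1}.$$
Hence,
$$\left\|\frac{1}{n}\sum_{j=1}^n\{Y_j-g(\cdot)\}K_n^2((\cdot-W_j)/h_n)\right\|_I\le\underbrace{\|E[\{Y-g(\cdot)\}K_n^2((\cdot-W)/h_n)]\|_I}_{=O(h_n^{-2\alpha+1})}+\left\|\frac{1}{n}\sum_{j=1}^n\{Y_j-g(\cdot)\}K_n^2((\cdot-W_j)/h_n)-E[\{Y-g(\cdot)\}K_n^2((\cdot-W)/h_n)]\right\|_I.$$
The second term on the right-hand side is identical to $\left\|n^{-1}\sum_{j=1}^n\{f(Y_j,W_j)-E[f(Y,W)]\}\right\|_{\mathcal{F}_n^{(3)}}$.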
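In view of the covering number bound for $\mathcal{F}_n^{(3)}$ given in Lemma A.6, together with Theorem […] in van der Vaart and Wellner (1996), the expectation of the last term is $\lesssim n^{-1/2}\sqrt{E[\{F_n^{(3)}(Y,W)\}^2]}\lesssim h_n^{-2\alpha}n^{-1/2}$. Therefore, the right-hand side of (A.5) is
$$O_P\left\{h_n^{-4\alpha}(nh_n)^{-1}\log(1/h_n)+h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\,(h_n^{-2\alpha+1}+h_n^{-2\alpha}n^{-1/2})+h_n^{-3\alpha}m^{-1/2}\right\},$$
which is $o_P\{h_n^{-2\alpha+1}(\log(1/h_n))^{-1}\}$. Hence, since $\inf_{x\in I}s_n^2(x)\gtrsim h_n^{-2\alpha+1}$, we have
$$\frac{1}{s_n^2(x)}\frac{1}{n}\sum_{j=1}^n\{Y_j-\hat g(x)\}^2\hat K_n^2((x-W_j)/h_n)=\frac{1}{s_n^2(x)}\frac{1}{n}\sum_{j=1}^n\{Y_j-g(x)\}^2K_n^2((x-W_j)/h_n)+o_P\{(\log(1/h_n))^{-1}\}$$
uniformly in $x\in I$. Since $\|A_n^2(\cdot)/s_n^2(\cdot)\|_I=O(h_n^{2\alpha+2\beta+1})$, it remains to prove that
$$\left\|\frac{1}{n}\sum_{j=1}^n\left[\frac{\{Y_j-g(\cdot)\}^2K_n^2((\cdot-W_j)/h_n)}{s_n^2(\cdot)}-E\left[\frac{\{Y-g(\cdot)\}^2K_n^2((\cdot-W)/h_n)}{s_n^2(\cdot)}\right]\right]\right\|_I=\left\|\frac{1}{n}\sum_{j=1}^n\{f(Y_j,W_j)-E[f(Y,W)]\}\right\|_{\mathcal{F}_n^{(4)}}$$
is $o_P\{(\log(1/h_n))^{-1}\}$. In view of the covering number bound for $\mathcal{F}_n^{(4)}$ given in Lemma A.6, together with Theorem […] in van der Vaart and Wellner (1996), the expectation of the last term is
$$\lesssim n^{-1/2}\sqrt{E[\{F_n^{(4)}(Y,W)\}^2]}\lesssim h_n^{-1}n^{-1/2}=o\{(\log(1/h_n))^{-1}\}.$$
This completes the proof.

Proof of Theorem 3.2. We divide the proof into several steps.

Step 1. Define
$$Z_n^\xi(x)=\frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^n\xi_j\left[\{Y_j-g(x)\}K_n((x-W_j)/h_n)-\frac{1}{n}\sum_{j'=1}^n\{Y_{j'}-g(x)\}K_n((x-W_{j'})/h_n)\right]$$
for $x\in I$. We first prove that
$$\sup_{z\in\mathbb{R}}\left|P\{\|Z_n^\xi\|_I\le z\mid\mathcal{D}_n\}-P\{\|Z_n^G\|_I\le z\}\right|\xrightarrow{P}0.$$
To this end, we shall apply Theorem 2.2 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$. Let
$$\nu_n^\xi(f)=\frac{1}{\sqrt{n}}\sum_{j=1}^n\xi_j\left\{f(Y_j,W_j)-\frac{1}{n}\sum_{j'=1}^nf(Y_{j'},W_{j'})\right\},\quad f\in\mathcal{F}_n^{(2)}.$$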
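Then applying Theorem 2.2 in Chernozhukov et al. (2016) to $\mathcal{F}_n^{(2)}\cup(-\mathcal{F}_n^{(2)})$ with $B(f)\equiv0$, $q=4$, $A\lesssim1$, $v\lesssim1$, $b\lesssim h_n^{-1/2}$, $\sigma\lesssim1$ and $\gamma=1/\log n$ yields that there exists a random variable $V_n^\xi$ of which the conditional distribution given $\mathcal{D}_n$ is identical to the distribution of $\|G_n\|_{\mathcal{F}_n^{(2)}}$ $(=\|Z_n^G\|_I)$, and such that
$$\left|\|\nu_n^\xi\|_{\mathcal{F}_n^{(2)}}-V_n^\xi\right|=O_P\left\{\frac{(\log n)^{9/4}}{n^{1/4}h_n^{1/2}}+\frac{(\log n)^2}{(nh_n)^{1/4}}\right\}=o_P\{(\log(1/h_n))^{-1/2}\},$$
which shows that there exists a sequence $\delta_n\to0$ such that
$$P\left\{\left|\|\nu_n^\xi\|_{\mathcal{F}_n^{(2)}}-V_n^\xi\right|>\delta_n(\log(1/h_n))^{-1/2}\,\Big|\,\mathcal{D}_n\right\}\xrightarrow{P}0.$$
Since $\|\nu_n^\xi\|_{\mathcal{F}_n^{(2)}}=\|Z_n^\xi\|_I$, we have
$$P\{\|Z_n^\xi\|_I\le z\mid\mathcal{D}_n\}\le P\{V_n^\xi\le z+\delta_n(\log(1/h_n))^{-1/2}\mid\mathcal{D}_n\}+o_P(1)=P\{\|Z_n^G\|_I\le z+\delta_n(\log(1/h_n))^{-1/2}\}+o_P(1)$$
uniformly in $z$, and the anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) yields that $P\{\|Z_n^G\|_I\le z+\delta_n(\log(1/h_n))^{-1/2}\}\le P\{\|Z_n^G\|_I\le z\}+o(1)$ uniformly in $z$. Likewise, we have $P\{\|Z_n^\xi\|_I\le z\mid\mathcal{D}_n\}\ge P\{\|Z_n^G\|_I\le z\}-o_P(1)$ uniformly in $z$.

Step 2. In view of the proof of Step 1, in order to prove the result (3.3), it is enough to prove that $\|\hat Z_n^\xi-Z_n^\xi\|_I=o_P\{(\log(1/h_n))^{-1/2}\}$. To this end, define
$$\tilde Z_n^\xi(x)=\frac{1}{s_n(x)\sqrt{n}}\sum_{j=1}^n\xi_j\{Y_j-\hat g(x)\}\hat K_n((x-W_j)/h_n)$$
for $x\in I$, and we first prove that
$$\|\tilde Z_n^\xi-Z_n^\xi\|_I=o_P\{(\log(1/h_n))^{-1/2}\}.\qquad\text{(A.6)}$$
We begin with noting that
$$\frac{1}{n}\sum_{j=1}^n\{Y_j-g(x)\}K_n((x-W_j)/h_n)=h_n\{\hat\mu^*(x)-E[\hat\mu^*(x)]\}-h_ng(x)\{\hat f_X^*(x)-E[\hat f_X^*(x)]\}+A_n(x)=O_P\{h_n^{-\alpha+1}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$$
uniformly in $x\in I$, so that it suffices to verify that
$$\left\|\frac{1}{s_n(\cdot)\sqrt{n}}\left[\sum_{j=1}^n\xi_j\{Y_j-\hat g(\cdot)\}\hat K_n((\cdot-W_j)/h_n)-\sum_{j=1}^n\xi_j\{Y_j-g(\cdot)\}K_n((\cdot-W_j)/h_n)\right]\right\|_I$$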
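is $o_P\{(\log(1/h_n))^{-1/2}\}$. Since $\|1/s_n\|_I\lesssim h_n^{\alpha-1/2}$, the last term is
$$\begin{aligned}&\lesssim h_n^{\alpha-1/2}n^{-1/2}\left\{\left\|\sum_{j=1}^n\xi_jY_j\{\hat K_n((\cdot-W_j)/h_n)-K_n((\cdot-W_j)/h_n)\}\right\|_I+\|\hat g-g\|_I\left\|\sum_{j=1}^n\xi_j\hat K_n((\cdot-W_j)/h_n)\right\|_I\right.\\&\qquad\left.+\|g\|_I\left\|\sum_{j=1}^n\xi_j\{\hat K_n((\cdot-W_j)/h_n)-K_n((\cdot-W_j)/h_n)\}\right\|_I\right\}=:h_n^{\alpha-1/2}n^{-1/2}\{I_n+II_n+III_n\}.\end{aligned}$$
Step 2 in the proof of Theorem 2 in Kato and Sasaki (2016) shows that $h_n^{\alpha-1/2}n^{-1/2}III_n=o_P\{(\log(1/h_n))^{-1/2}\}$. For the second term $II_n$, observe that
$$II_n\le\frac{\|\hat g-g\|_I}{2\pi}\int\left|\sum_{j=1}^n\xi_je^{itW_j/h_n}\right|\frac{|\varphi_K(t)|}{|\hat\varphi_\varepsilon(t/h_n)|}dt\le O_P(h_n^{-\alpha})\|\hat g-g\|_I\int_{-1}^1\left|\sum_{j=1}^n\xi_je^{itW_j/h_n}\right|dt=O_P\{h_n^{-2\alpha-1/2}\sqrt{\log(1/h_n)}\},$$
so that $h_n^{\alpha-1/2}n^{-1/2}II_n=O_P\{h_n^{-\alpha-1}n^{-1/2}\sqrt{\log(1/h_n)}\}=o_P\{(\log(1/h_n))^{-1/2}\}$. For the first term $I_n$, observe that
$$\begin{aligned}I_n&\le\frac{1}{2\pi}\int_{-1}^1\left|\sum_{j=1}^n\xi_jY_je^{itW_j/h_n}\right|\left|\frac{1}{\hat\varphi_\varepsilon(t/h_n)}-\frac{1}{\varphi_\varepsilon(t/h_n)}\right||\varphi_K(t)|dt\\&\le\left\{\int_{-1}^1\left|\sum_{j=1}^n\xi_jY_je^{itW_j/h_n}\right|^2dt\right\}^{1/2}\left\{\int_{-1}^1\left|\frac{1}{\hat\varphi_\varepsilon(t/h_n)}-\frac{1}{\varphi_\varepsilon(t/h_n)}\right|^2dt\right\}^{1/2}=O_P(n^{1/2}h_n^{-2\alpha}m^{-1/2}),\end{aligned}$$
so that $h_n^{\alpha-1/2}n^{-1/2}I_n=O_P(h_n^{-\alpha-1/2}m^{-1/2})=o_P\{(\log(1/h_n))^{-1}\}$. Hence we have proved (A.6). Note that the result of Step 1 and the fact that $E[\|Z_n^G\|_I]=O(\sqrt{\log(1/h_n)})$ imply that $\|Z_n^\xi\|_I=O_P(\sqrt{\log(1/h_n)})$, which in turn implies that $\|\tilde Z_n^\xi\|_I=O_P(\sqrt{\log(1/h_n)})$. Hence
$$\|\hat Z_n^\xi-\tilde Z_n^\xi\|_I\le\|s_n(\cdot)/\hat s_n(\cdot)-1\|_I\|\tilde Z_n^\xi\|_I=o_P\{(\log(1/h_n))^{-1/2}\},$$
which leads to (3.3).

Step 3. We shall prove the last two assertions of the theorem. Observe that
$$\underbrace{\frac{\sqrt{n}h_n\hat f_X(x)(\hat g(x)-g(x))}{\hat s_n(x)}}_{=:\hat Z(x)}-\hat Z_n(x)=\frac{\sqrt{n}h_n(\hat g(x)-g(x))}{\hat s_n(x)}\{\hat f_X(x)-f_X(x)\}+\left\{\frac{s_n(x)}{\hat s_n(x)}-1\right\}\hat Z_n(x),$$
where $\hat Z_n(x)=\sqrt{n}h_nf_X(x)(\hat g(x)-g(x))/s_n(x)$ is as in the proof of Theorem 3.1,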
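and the right-hand side is $o_P\{(\log(1/h_n))^{-1/2}\}$ uniformly in $x\in I$. To see this, since $\|\hat g-g\|_I=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ and $\|\hat f_X-f_X\|_I=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}$ (which follows from Corollary 1 in Kato and Sasaki (2016)), the right-hand side of the above displayed equation is
$$O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}\cdot O_P(\sqrt{\log(1/h_n)})+o_P\{(\log(1/h_n))^{-1}\}\cdot O_P(\sqrt{\log(1/h_n)})=o_P\{(\log(1/h_n))^{-1/2}\}$$
uniformly in $x\in I$. Now, Theorem 3.1 and the anti-concentration inequality for the supremum of a Gaussian process (Lemma A.3) yield that
$$\sup_{z\in\mathbb{R}}\left|P\{\|\hat Z\|_I\le z\}-P\{\|Z_n^G\|_I\le z\}\right|\to0.$$
We are to show that $P\{\|\hat Z\|_I\le\hat c_n(1-\tau)\}\to1-\tau$. From the result (3.3), there exists a sequence $\delta_n\to0$ such that with probability greater than $1-\delta_n$,
$$\sup_{z\in\mathbb{R}}\left|P\{\|\hat Z_n^\xi\|_I\le z\mid\mathcal{D}_n\}-P\{\|Z_n^G\|_I\le z\}\right|\le\delta_n,\qquad\text{(A.7)}$$
and let $E_n$ be the event that (A.7) holds. Taking $\delta_n\to0$ more slowly if necessary, we have that $\sup_{z\in\mathbb{R}}|P\{\|\hat Z\|_I\le z\}-P\{\|Z_n^G\|_I\le z\}|\le\delta_n$. Recall that $c_n^G(1-\tau)$ is the $(1-\tau)$-quantile of $\|Z_n^G\|_I$, and observe that on the event $E_n$,
$$P\{\|\hat Z_n^\xi\|_I\le c_n^G(1-\tau+\delta_n)\mid\mathcal{D}_n\}\ge P\{\|Z_n^G\|_I\le c_n^G(1-\tau+\delta_n)\}-\delta_n=1-\tau,$$
where the last equality holds since the distribution function of $\|Z_n^G\|_I$ is continuous (which follows from Lemma A.3). Hence on the event $E_n$, it holds that $\hat c_n(1-\tau)\le c_n^G(1-\tau+\delta_n)$, so that
$$P\{\|\hat Z\|_I\le\hat c_n(1-\tau)\}\le P\{\|\hat Z\|_I\le c_n^G(1-\tau+\delta_n)\}+\delta_n\le P\{\|Z_n^G\|_I\le c_n^G(1-\tau+\delta_n)\}+2\delta_n=1-\tau+3\delta_n.$$
Likewise, we have $P\{\|\hat Z\|_I\le\hat c_n(1-\tau)\}\ge1-\tau-3\delta_n$, which shows that $P\{\|\hat Z\|_I\le\hat c_n(1-\tau)\}\to1-\tau$, and thus (3.4) holds. Finally, the Borell–Sudakov–Tsirelson inequality (van der Vaart and Wellner, 1996, Lemma A.2.2) yields that
$$c_n^G(1-\tau+\delta_n)\le E[\|Z_n^G\|_I]+\sqrt{2\log(1/(\tau-\delta_n))}\lesssim\sqrt{\log(1/h_n)},$$
which implies that $\hat c_n(1-\tau)=O_P(\sqrt{\log(1/h_n)})$. Furthermore,
$$\sup_{x\in I}\hat s_n(x)\le\sup_{x\in I}\frac{\hat s_n(x)}{s_n(x)}\sup_{x\in I}s_n(x)=O_P(h_n^{-\alpha+1/2}).$$
Therefore, the supremum width of the band $\hat{\mathcal{C}}_{1-\tau}$ is
$$2\sup_{x\in I}\frac{\hat s_n(x)}{\sqrt{n}h_n\hat f_X(x)}\hat c_n(1-\tau)=O_P\{h_n^{-\alpha}(nh_n)^{-1/2}\sqrt{\log(1/h_n)}\}.$$
This completes the proof.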
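Two properties of Gaussian suprema drive Step 3: anti-concentration (Lemma A.3) and the Borell–Sudakov–Tsirelson bound $c(1-\tau)\le E[\|X\|_T]+\sqrt{2\log(1/\tau)}$ for a process with unit-variance coordinates. The sketch below checks both by Monte Carlo in the simplest toy case, a finite index set with i.i.d. standard normal coordinates, so that $\|X\|_T=\max_t|X_t|$. This stand-in for $Z_n^G$, the grid used to approximate the supremum over $x$, the helper name `gaussian_max_checks` and all sizes are assumptions made for illustration.

```python
import numpy as np


def gaussian_max_checks(d=100, width=0.05, tau=0.1, reps=20000, seed=0):
    """Monte Carlo check of Lemma A.3 and the Borell-Sudakov-Tsirelson bound.

    Toy case: X_t i.i.d. N(0, 1) for t = 1, ..., d, so ||X||_T = max_t |X_t|.
    """
    rng = np.random.default_rng(seed)
    sup_norm = np.max(np.abs(rng.standard_normal((reps, d))), axis=1)
    mean_sup = sup_norm.mean()  # Monte Carlo estimate of E||X||_T

    # Lemma A.3: sup_x P{| ||X||_T - x | <= width} <= 4 * width * (1 + E||X||_T);
    # the supremum over x is approximated on a grid.
    grid = np.linspace(sup_norm.min(), sup_norm.max(), 400)
    concentration = max(np.mean(np.abs(sup_norm - x) <= width) for x in grid)
    anti_bound = 4.0 * width * (1.0 + mean_sup)

    # Borell-Sudakov-Tsirelson: (1 - tau)-quantile <= E||X||_T + sqrt(2 log(1/tau)).
    quantile = np.quantile(sup_norm, 1.0 - tau)
    bst_bound = mean_sup + np.sqrt(2.0 * np.log(1.0 / tau))
    return concentration, anti_bound, quantile, bst_bound


print(gaussian_max_checks())
```

Both bounds are typically far from tight in this toy case; what the proof uses is only their dependence on $E[\|Z_n^G\|_I]\lesssim\sqrt{\log(1/h_n)}$.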
A.4. Proof of Theorem 7.1. The proof is completely analogous to those of Theorems 3.1 and 3.2, given the facts that $g(y,x)=E[1(Y\le y)\mid X=x]$ and the function class $\{1(\cdot\le y):y\in J\}$ is a VC class. Hence we omit the details for brevity.
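The proofs above handle the estimators through their Fourier representations, for example $\hat\mu(x)=(2\pi)^{-1}\int e^{-itx}\hat\psi_{YW}(t)\varphi_K(th_n)/\hat\varphi_\varepsilon(t)\,dt$ with $\hat\psi_{YW}(t)=n^{-1}\sum_jY_je^{itW_j}$; $\hat f_X$ is the same expression with $Y_j$ replaced by 1, $\hat\varphi_\varepsilon$ is the empirical characteristic function of the auxiliary error sample, and $\hat g=\hat\mu/\hat f_X$. The Python sketch below evaluates these integrals directly over $|t|\le h_n^{-1}$ with the trapezoidal rule. It is an illustration under stated assumptions rather than the authors' implementation: the kernel $\varphi_K(t)=(1-t^2)^3$ on $[-1,1]$, the Laplace error, the simulated design, the bandwidth $h_n=0.3$, the grid size and the helper names are all choices made here.

```python
import numpy as np


def ecf(sample, t):
    """Empirical characteristic function t -> mean_j exp(i t sample_j)."""
    return np.exp(1j * np.outer(t, sample)).mean(axis=1)


def deconvolution_regression(x, y, w, eps_sample, h, num_t=801):
    """Return (f_hat, mu_hat, g_hat) at the points x via Fourier inversion.

    Assumes phi_K(s) = (1 - s^2)^3 on [-1, 1] (illustrative choice), so the
    integrand vanishes outside |t| <= 1/h.
    """
    t = np.linspace(-1.0 / h, 1.0 / h, num_t)
    dt = t[1] - t[0]
    phi_k = (1.0 - (t * h) ** 2) ** 3            # phi_K(t h_n)
    phi_eps_hat = ecf(eps_sample, t)              # estimated error c.f.
    e_w = np.exp(1j * np.outer(t, w))             # exp(i t W_j), shape (num_t, n)
    phi_w_hat = e_w.mean(axis=1)                  # n^{-1} sum_j exp(i t W_j)
    psi_yw_hat = (e_w * y).mean(axis=1)           # n^{-1} sum_j Y_j exp(i t W_j)
    weights = np.full(num_t, dt)
    weights[[0, -1]] *= 0.5                       # trapezoidal rule weights
    inv = np.exp(-1j * np.outer(x, t)) * weights  # exp(-i t x) dt
    f_hat = (inv @ (phi_w_hat * phi_k / phi_eps_hat)).real / (2.0 * np.pi)
    mu_hat = (inv @ (psi_yw_hat * phi_k / phi_eps_hat)).real / (2.0 * np.pi)
    return f_hat, mu_hat, mu_hat / f_hat


# Illustrative simulation: Y = g(X) + U with g(x) = x, W = X + eps,
# eps ~ Laplace(0, 0.5), plus an independent sample of size m from the error law.
rng = np.random.default_rng(0)
n, m, b = 2000, 2000, 0.5
x_latent = rng.standard_normal(n)
y = x_latent + rng.standard_normal(n)
w = x_latent + rng.laplace(0.0, b, n)
eps_sample = rng.laplace(0.0, b, m)
grid = np.array([-1.0, 0.0, 1.0])
f_hat, mu_hat, g_hat = deconvolution_regression(grid, y, w, eps_sample, h=0.3)
print(np.round(g_hat, 3))
```

With a fixed bandwidth such as this one, the estimate is pulled toward zero away from the origin (smoothing bias); the undersmoothing bandwidth choice is meant to make this bias negligible relative to the band width.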
Appendix B. Tables for Section 5

[Table 1. Regression: $g(x)=x$. Columns: $\sigma_X=2$ and $\sigma_X=4$, each with three choices of the sequence $\{c_n\}_{n=1}^\infty$ (labelled 0.1, 0.3, 0.5); rows: model, nominal probability and sample size $n$. Numerical entries not recoverable from this copy.]

Table 1. Simulated uniform coverage probabilities of $g(x)=x$ by estimated confidence bands in $I=[-\sigma_X,\sigma_X]$ under normally distributed $X$ with $\sigma_X\in\{2,4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.
[Table 2. Regression: $g(x)=x^2$. Columns: $\sigma_X=2$ and $\sigma_X=4$, each with three choices of the sequence $\{c_n\}_{n=1}^\infty$ (labelled 0.1, 0.3, 0.5); rows: model, nominal probability and sample size $n$. Numerical entries not recoverable from this copy.]

Table 2. Simulated uniform coverage probabilities of $g(x)=x^2$ by estimated confidence bands in $I=[-\sigma_X,\sigma_X]$ under normally distributed $X$ with $\sigma_X\in\{2,4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.
[Table 3. Regression: $g(x)=x^3$. Columns: $\sigma_X=2$ and $\sigma_X=4$, each with three choices of the sequence $\{c_n\}_{n=1}^\infty$ (labelled 0.1, 0.3, 0.5); rows: model, nominal probability and sample size $n$. Numerical entries not recoverable from this copy.]

Table 3. Simulated uniform coverage probabilities of $g(x)=x^3$ by estimated confidence bands in $I=[-\sigma_X,\sigma_X]$ under normally distributed $X$ with $\sigma_X\in\{2,4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.
[Table 4. Regression: $g(x)=\sin(x)$. Columns: $\sigma_X=2$ and $\sigma_X=4$, each with three choices of the sequence $\{c_n\}_{n=1}^\infty$ (labelled 0.1, 0.3, 0.5); rows: model, nominal probability and sample size $n$. Numerical entries not recoverable from this copy.]

Table 4. Simulated uniform coverage probabilities of $g(x)=\sin(x)$ by estimated confidence bands in $I=[-\sigma_X,\sigma_X]$ under normally distributed $X$ with $\sigma_X\in\{2,4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.
[Table 5. Regression: $g(x)=\cos(x)$. Columns: $\sigma_X=2$ and $\sigma_X=4$, each with three choices of the sequence $\{c_n\}_{n=1}^\infty$ (labelled 0.1, 0.3, 0.5); rows: model, nominal probability and sample size $n$. Numerical entries not recoverable from this copy.]

Table 5. Simulated uniform coverage probabilities of $g(x)=\cos(x)$ by estimated confidence bands in $I=[-\sigma_X,\sigma_X]$ under normally distributed $X$ with $\sigma_X\in\{2,4\}$ and Laplace distributed $\varepsilon$. Alternative sequences $\{c_n\}$ are used for the bandwidth selection procedure. The simulated probabilities are computed for each of the three nominal coverage probabilities, 80%, 90%, and 95%, based on 2,000 Monte Carlo iterations.

Appendix C. Figures for Section 6
Figure 1. Estimates and confidence bands for the nonparametric regression of medical expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m². The vertical axes measure the medical expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded in grayscale indicate 80%, 90%, and 95% confidence bands.
Figure 2. Estimates and confidence bands for the nonparametric regression of prescription expenses on BMI for (a) men aged from 20 to 34, (b) men aged from 35 to 49, (c) men aged from 50 to 64, and (d) men aged 65 or above. The horizontal axes measure the BMI in kg/m². The vertical axes measure the prescription expenses in 2009 US dollars. The estimates are indicated by solid black curves. The areas shaded in grayscale indicate 80%, 90%, and 95% confidence bands.
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationDefinition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.
4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 6 9/23/2013. Brownian motion. Introduction
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 6 9/23/203 Browia motio. Itroductio Cotet.. A heuristic costructio of a Browia motio from a radom walk. 2. Defiitio ad basic properties
More informationGoodnessOfFit For The Generalized Exponential Distribution. Abstract
GoodessOfFit For The Geeralized Expoetial Distributio By Amal S. Hassa stitute of Statistical Studies & Research Cairo Uiversity Abstract Recetly a ew distributio called geeralized expoetial or expoetiated
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationDimensionfree PACBayesian bounds for the estimation of the mean of a random vector
Dimesiofree PACBayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et
More informationA note on selfnormalized DickeyFuller test for unit root in autoregressive time series with GARCH errors
Appl. Math. J. Chiese Uiv. 008, 3(): 970 A ote o selformalized DickeyFuller test for uit root i autoregressive time series with GARCH errors YANG Xiaorog ZHANG Lixi Abstract. I this article, the uit
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationSingular Continuous Measures by Michael Pejic 5/14/10
Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σalgebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable
More information1 Covariance Estimation
Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.
More information6.867 Machine learning, lecture 7 (Jaakkola) 1
6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit
More informationG. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan
Deviatio of the Variaces of Classical Estimators ad Negative Iteger Momet Estimator from Miimum Variace Boud with Referece to Maxwell Distributio G. R. Pasha Departmet of Statistics Bahauddi Zakariya Uiversity
More informationSequences and Limits
Chapter Sequeces ad Limits Let { a } be a sequece of real or complex umbers A ecessary ad sufficiet coditio for the sequece to coverge is that for ay ɛ > 0 there exists a iteger N > 0 such that a p a q
More informationarxiv: v1 [math.pr] 13 Oct 2011
A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,
More information5.1 A mutual information bound based on metric entropy
Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local
More informationProblems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:
Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationStatisticians use the word population to refer the total number of (potential) observations under consideration
6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space
More information2.2. Central limit theorem.
36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral LidebergFeller CLT, it is most stadard
More informationAsymptotic distribution of the firststage Fstatistic under weak IVs
November 6 Eco 59A WEAK INSTRUMENTS III Testig for Weak Istrumets From the results discussed i Weak Istrumets II we kow that at least i the case of a sigle edogeous regressor there are weakidetificatiorobust
More informationIntegrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number
MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios
More informationDS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set
More informationSeunghee Ye Ma 8: Week 5 Oct 28
Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value
More informationA constructive analysis of convexvalued demand correspondence for weakly uniformly rotund and monotonic preference
MPRA Muich Persoal RePEc Archive A costructive aalysis of covexvalued demad correspodece for weakly uiformly rotud ad mootoic preferece Yasuhito Taaka ad Atsuhiro Satoh. May 04 Olie at http://mpra.ub.uimueche.de/55889/
More informationThe variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.
SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample
More informationLaw of the sum of Bernoulli random variables
Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible
More informationSolution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1
Solutio Sagchul Lee October 7, 017 1 Solutios of Homework 1 Problem 1.1 Let Ω,F,P) be a probability space. Show that if {A : N} F such that A := lim A exists, the PA) = lim PA ). Proof. Usig the cotiuity
More informationBasis for simulation techniques
Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios
More informationA Proof of Birkhoff s Ergodic Theorem
A Proof of Birkhoff s Ergodic Theorem Joseph Hora September 2, 205 Itroductio I Fall 203, I was learig the basics of ergodic theory, ad I came across this theorem. Oe of my supervisors, Athoy Quas, showed
More informationA goodnessoffit test based on the empirical characteristic function and a comparison of tests for normality
A goodessoffit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State,
More informationStat 200 Testing Summary Page 1
Stat 00 Testig Summary Page 1 Mathematicias are like Frechme; whatever you say to them, they traslate it ito their ow laguage ad forthwith it is somethig etirely differet Goethe 1 Large Sample Cofidece
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCIB609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationStatistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
More informationECE 901 Lecture 4: Estimation of Lipschitz smooth functions
ECE 9 Lecture 4: Estiatio of Lipschitz sooth fuctios R. Nowak 5/7/29 Cosider the followig settig. Let Y f (X) + W, where X is a rado variable (r.v.) o X [, ], W is a r.v. o Y R, idepedet of X ad satisfyig
More informationBootstrap Intervals of the Parameters of Lognormal Distribution Using Power Rule Model and Accelerated Life Tests
Joural of Moder Applied Statistical Methods Volume 5 Issue Article 5 Bootstrap Itervals of the Parameters of Logormal Distributio Usig Power Rule Model ad Accelerated Life Tests Mohammed AlHa Ebrahem
More informationON CONVERGENCE OF BASIC HYPERGEOMETRIC SERIES. 1. Introduction Basic hypergeometric series (cf. [GR]) with the base q is defined by
ON CONVERGENCE OF BASIC HYPERGEOMETRIC SERIES TOSHIO OSHIMA Abstract. We examie the covergece of qhypergeometric series whe q =. We give a coditio so that the radius of the covergece is positive ad get
More informationSequences of Definite Integrals, Factorials and Double Factorials
47 6 Joural of Iteger Sequeces, Vol. 8 (5), Article 5.4.6 Sequeces of Defiite Itegrals, Factorials ad Double Factorials Thierry DaaPicard Departmet of Applied Mathematics Jerusalem College of Techology
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More information71. Chapter 4. Part I. Sampling Distributions and Confidence Intervals
71 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7 Sectio 1. Samplig Distributio 73 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationIntroduction to Probability. Ariel Yadin
Itroductio to robability Ariel Yadi Lecture 2 *** Ja. 7 ***. Covergece of Radom Variables As i the case of sequeces of umbers, we would like to talk about covergece of radom variables. There are may ways
More informationProbability and Statistics
ICME Refresher Course: robability ad Statistics Staford Uiversity robability ad Statistics Luyag Che September 20, 2016 1 Basic robability Theory 11 robability Spaces A probability space is a triple (Ω,
More informationDISTRIBUTION LAW Okunev I.V.
1 DISTRIBUTION LAW Okuev I.V. Distributio law belogs to a umber of the most complicated theoretical laws of mathematics. But it is also a very importat practical law. Nothig ca help uderstad complicated
More informationLecture 1 Probability and Statistics
Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark
More informationBinomial Distribution
0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More information4.1 Sigma Notation and Riemann Sums
0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More informationWeek 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed
Wee 0 A Itroductio to Wavelet regressio. De itio: Wavelet is a fuctio such that f j= j ; j; Zg is a orthoormal basis for L (R). This fuctio is called mother wavelet, which ca be ofte costructed from father
More informationInference under shape restrictions
Iferece uder shape restrictios Joachim Freyberger Brado Reeves July 3, 207 Abstract We propose a uiformly valid iferece method for a ukow fuctio or parameter vector satisfyig certai shape restrictios.
More informationMA131  Analysis 1. Workbook 2 Sequences I
MA3  Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................
More informationAsymptotic Results for the Linear Regression Model
Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is
More informationKLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions
We have previously leared: KLMED8004 Medical statistics Part I, autum 00 How kow probability distributios (e.g. biomial distributio, ormal distributio) with kow populatio parameters (mea, variace) ca give
More informationPower Comparison of Some Goodnessoffit Tests
Florida Iteratioal Uiversity FIU Digital Commos FIU Electroic Theses ad Dissertatios Uiversity Graduate School 762016 Power Compariso of Some Goodessoffit Tests Tiayi Liu tliu019@fiu.edu DOI: 10.25148/etd.FIDC000750
More informationSTATISTICAL INFERENCE
STATISTICAL INFERENCE POPULATION AND SAMPLE Populatio = all elemets of iterest Characterized by a distributio F with some parameter θ Sample = the data X 1,..., X, selected subset of the populatio = sample
More informationPostedPrice, SealedBid Auctions
PostedPrice, SealedBid Auctios Professors Greewald ad Oyakawa 2070208 We itroduce the postedprice, sealedbid auctio. This auctio format itroduces the idea of approximatios. We describe how well this
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More informationGamma Distribution and Gamma Approximation
Gamma Distributio ad Gamma Approimatio Xiaomig Zeg a Fuhua (Frak Cheg b a Xiame Uiversity, Xiame 365, Chia mzeg@jigia.mu.edu.c b Uiversity of Ketucky, Leigto, Ketucky 45646, USA cheg@cs.uky.edu Abstract
More informationA) is empty. B) is a finite set. C) can be a countably infinite set. D) can be an uncountable set.
M.A./M.Sc. (Mathematics) Etrace Examiatio 01617 Max Time: hours Max Marks: 150 Istructios: There are 50 questios. Every questio has four choices of which exactly oe is correct. For correct aswer, 3 marks
More informationFUNDAMENTALS OF REAL ANALYSIS by
FUNDAMENTALS OF REAL ANALYSIS by Doğa Çömez Backgroud: All of Math 450/1 material. Namely: basic set theory, relatios ad PMI, structure of N, Z, Q ad R, basic properties of (cotiuous ad differetiable)
More informationChapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol DiscreteEvent System Simulation
Chapter Output Aalysis for a Sigle Model Baks, Carso, Nelso & Nicol DiscreteEvet System Simulatio Error Estimatio If {,, } are ot statistically idepedet, the S / is a biased estimator of the true variace.
More informationProbabilistic and Average Linear Widths in L Norm with Respect to rfold Wiener Measure
joural of approximatio theory 84, 3140 (1996) Article No. 0003 Probabilistic ad Average Liear Widths i L Norm with Respect to rfold Wieer Measure V. E. Maiorov Departmet of Mathematics, Techio, Haifa,
More informationSolutions: Homework 3
Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe
More informationMa 530 Infinite Series I
Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li
More informationSome Properties of the Exact and Score Methods for Binomial Proportion and Sample Size Calculation
Some Properties of the Exact ad Score Methods for Biomial Proportio ad Sample Size Calculatio K. KRISHNAMOORTHY AND JIE PENG Departmet of Mathematics, Uiversity of Louisiaa at Lafayette Lafayette, LA 705041010,
More informationA Risk Comparison of Ordinary Least Squares vs Ridge Regression
Joural of Machie Learig Research 14 (2013) 15051511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer
More informationSRC Technical Note June 17, Tight Thresholds for The Pure Literal Rule. Michael Mitzenmacher. d i g i t a l
SRC Techical Note 1997011 Jue 17, 1997 Tight Thresholds for The Pure Literal Rule Michael Mitzemacher d i g i t a l Systems Research Ceter 130 Lytto Aveue Palo Alto, Califoria 94301 http://www.research.digital.com/src/
More informationThe Gamma function Michael Taylor. Abstract. This material is excerpted from 18 and Appendix J of [T].
The Gamma fuctio Michael Taylor Abstract. This material is excerpted from 8 ad Appedix J of [T]. The Gamma fuctio has bee previewed i 5.7 5.8, arisig i the computatio of a atural Laplace trasform: 8. ft
More informationSAMPLING LIPSCHITZ CONTINUOUS DENSITIES. 1. Introduction
SAMPLING LIPSCHITZ CONTINUOUS DENSITIES OLIVIER BINETTE Abstract. A simple ad efficiet algorithm for geeratig radom variates from the class of Lipschitz cotiuous desities is described. A MatLab implemetatio
More informationON POINTWISE BINOMIAL APPROXIMATION
Iteratioal Joural of Pure ad Applied Mathematics Volume 71 No. 1 2011, 5766 ON POINTWISE BINOMIAL APPROXIMATION BY wfunctions K. Teerapabolar 1, P. Wogkasem 2 Departmet of Mathematics Faculty of Sciece
More informationB Supplemental Notes 2 Hypergeometric, Binomial, Poisson and Multinomial Random Variables and Borel Sets
B671672 Supplemetal otes 2 Hypergeometric, Biomial, Poisso ad Multiomial Radom Variables ad Borel Sets 1 Biomial Approximatio to the Hypergeometric Recall that the Hypergeometric istributio is fx = x
More informationSOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS
SOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS ALEXANDRE BELLONI, VICTOR CHERNOZHUKOV, DENIS CHETVERIKOV, AND KENGO KATO Abstract. I ecoometric applicatios it is commo
More information