SOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS

ALEXANDRE BELLONI, VICTOR CHERNOZHUKOV, DENIS CHETVERIKOV, AND KENGO KATO

Abstract. In econometric applications it is common that the exact form of a conditional expectation is unknown, and having flexible functional forms can lead to improvements over a pre-specified functional form, especially if they nest some successful parametric economically-motivated forms. The series method offers exactly that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size to balance the trade-off between variance and bias. In this work we consider series estimators for the conditional mean in light of four new ingredients: (i) sharp LLNs for matrices derived from the non-commutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L^2$-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge at some rate, and (iv) strong approximations to series-type processes. These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken considerably the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ to $k/n \to 0$, up to log factors, which was available only for spline series before. Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorems when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold whether or not the approximation error vanishes. That is, we derive the strong approximation for the entire estimate of the nonparametric function.
Finally and most importantly, from the point of view of practice, we derive uniform rates, Gaussian approximations, and uniform confidence bands for a wide collection of linear functionals of the conditional expectation function, for example, the function itself, the partial derivative function, the conditional average partial derivative function, and other similar quantities. All of these results are new.

Date: First version: May 2006. This version: January 7. Submitted to ArXiv and for publication: December 3.

JEL Classification: C01, C14.

Key words and phrases. Least squares series, strong approximations, uniform confidence bands.

1. Introduction

Series estimators have been playing a central role in various fields. In econometric applications it is common that the exact form of a conditional expectation is unknown, and having a flexible functional form can lead to improvements over a pre-specified functional form, especially if it nests some successful parametric economic models. Series estimation offers exactly that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$ to balance the trade-off between variance and bias. Moreover, series modelling allows for convenient nesting of some theory-based models, by simply using the corresponding terms as the first $k_0 \le k$ basis functions. For instance, our series could contain linear and quadratic functions to nest the canonical Mincer equations in the context of wage equation modelling or the canonical translog demand and production functions in the context of demand and supply modelling.

Several asymptotic properties of series estimators have been investigated in the literature. The focus has been on convergence rates and asymptotic normality results (see van de Geer, 1990; Andrews, 1991; Eastwood and Gallant, 1991; Gallant and Souza, 1991; Newey, 1997; van de Geer, 2002; Huang, 2003b; Chen, 2007; Cattaneo and Farrell, 2013, and the references therein). This work revisits the topic by making use of new critical ingredients:

1. The sharp LLNs for matrices derived from the non-commutative Khinchin inequalities.
2. The sharp bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L^2$-norms of the least squares approximation of functions (which is bounded or grows like $\log k$ in many cases).
3. Sharp maximal inequalities for processes whose entropy integrals diverge at some rate.
4. Strong approximations to empirical processes of series types.

To the best of our knowledge, our results are the first applications of the first ingredient to statistical estimation problems.
Following its use in this work, some recent working papers have also used related matrix inequalities and extended some results in different directions; e.g., Chen and Christensen (2013) allow β-mixing dependence, and Hansen (2014) handles unbounded regressors and also characterizes a trade-off between the number of finite moments and the allowable rate of expansion of the number of series terms. Regarding the second ingredient, it has already been used by Huang (2003a), but for splines only. All of these ingredients are critical for generating sharp results.

This approach allows us to contribute to the series literature in several directions. First, we weaken considerably the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ (see Newey, 1997) to $k/n \to 0$ (up to logs) for bounded or local bases, which was previously available only for spline series (Huang, 2003a; Stone, 1994) and was recently established for local polynomial partition series (Cattaneo and Farrell, 2013). An example of a bounded basis is the Fourier series; examples of local bases are spline, wavelet, and local polynomial partition series. To be more specific, for such bases we require $k\log k/n \to 0$. Note that this condition is similar to the condition on the bandwidth required for local polynomial (kernel) regression estimators ($\log(1/h)/(nh^d) \to 0$, where $h = 1/k^{1/d}$ is the bandwidth). Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorems when the approximation error vanishes. Under a misspecified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates that hold whether or not the approximation error vanishes. An important contribution here is that we show that the series estimator achieves the optimal uniform rate of convergence under quite general conditions. Previously, the same result was shown only for the local polynomial partition series estimator (Cattaneo and Farrell, 2013). In addition, we derive a functional central limit theorem. By the functional central limit theorem we mean here that the entire estimate of the nonparametric function is uniformly close to a Gaussian process that can change with $n$. That is, we derive the strong approximation for the entire estimate of the nonparametric function.
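To see why the series condition and the kernel condition are comparable, substitute the bandwidth analogue $h = k^{-1/d}$ into the kernel-regression requirement $\log(1/h)/(nh^d) \to 0$. The short calculation below uses only that substitution and treats the dimension $d$ as fixed:

```latex
h = k^{-1/d} \;\Longrightarrow\; h^d = \frac{1}{k}, \qquad \log(1/h) = \frac{\log k}{d},
\qquad\text{so}\qquad
\frac{\log(1/h)}{n h^d} = \frac{k \log k}{d\, n} \to 0
\;\iff\; \frac{k \log k}{n} \to 0 .
```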
Perhaps the most important contribution of the paper is a set of completely new results that provide estimation and inference methods for the entire linear functionals $\theta(\cdot)$ of the conditional mean function $g: \mathcal{X} \to \mathbb{R}$. Examples of linear functionals $\theta(\cdot)$ of interest include

1. the partial derivative function: $x \mapsto \theta(x) = \partial_j g(x)$;
2. the average partial derivative: $\theta = \int \partial_j g(x)\,d\mu(x)$;
3. the conditional average partial derivative: $x^s \mapsto \theta(x^s) = \int \partial_j g(x)\,d\mu(x \mid x^s)$,

where $\partial_j g(x)$ denotes the partial derivative of $g(x)$ with respect to the $j$th component of $x$, $x^s$ is a subvector of $x$, and the measure $\mu$ entering the definitions above is taken as known; the result can be extended to include estimated measures. We derive uniform (in $x$) rates of convergence, large sample distributional approximations, and inference methods for the functions above based on the Gaussian approximation. To the best of our knowledge all these results are new, especially the distributional and inferential results. For example, using these results we can now perform inference on the entire partial derivative function. The only other reference that provides analogous results, but for the quantile series estimator, is Belloni et al. (2011). Before doing the uniform analysis, we also update the pointwise results of Newey (1997) to weaker, more general conditions.

Notation. In what follows, all parameter values are indexed by the sample size $n$, but we omit the index whenever this does not cause confusion. We use the notation $(a)_+ = \max\{a, 0\}$, $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. The $\ell^2$-norm of a vector $v$ is denoted by $\|v\|$, while for a matrix $Q$ the operator norm is denoted by $\|Q\|$. We also use standard notation in the empirical process literature,
$$\mathbb{E}_n[f] = \mathbb{E}_n[f(w_i)] = \frac{1}{n}\sum_{i=1}^n f(w_i) \quad\text{and}\quad \mathbb{G}_n[f] = \mathbb{G}_n[f(w_i)] = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big(f(w_i) - \mathrm{E}[f(w_i)]\big),$$
and we use the notation $a \lesssim b$ to denote $a \le cb$ for some constant $c > 0$ that does not depend on $n$, and $a \lesssim_P b$ to denote $a = O_P(b)$. Moreover, for two random variables $X, Y$ we say that $X =_d Y$ if they have the same probability distribution. Finally, $S^{k-1}$ denotes the space of vectors $\alpha$ in $\mathbb{R}^k$ with unit Euclidean norm: $\|\alpha\| = 1$.

2. Set-Up

Throughout the paper, we consider a sequence of models, indexed by the sample size $n$,
$$y_i = g(x_i) + \epsilon_i, \quad \mathrm{E}[\epsilon_i \mid x_i] = 0, \quad x_i \in \mathcal{X} \subseteq \mathbb{R}^d, \quad i = 1, \dots, n, \tag{2.1}$$
where $y_i$ is a response variable, $x_i$ a vector of covariates (basic regressors), $\epsilon_i$ noise, and $x \mapsto g(x) = \mathrm{E}[y_i \mid x_i = x]$ a regression (conditional mean) function; that is, we consider a triangular array of models with $y_i = y_{i,n}$, $x_i = x_{i,n}$, $\epsilon_i = \epsilon_{i,n}$, and $g = g_n$. We assume that $g \in \mathcal{G}$, where $\mathcal{G}$ is some class of functions. Since we consider a sequence of models indexed by $n$, we allow the function class $\mathcal{G} = \mathcal{G}_n$, to which the regression function $g$ belongs, to depend on $n$ as well.
In addition, we allow $\mathcal{X} = \mathcal{X}_n$ to depend on $n$, but we assume for the sake of simplicity that the diameter of $\mathcal{X}$ is bounded from above uniformly over $n$ (dropping the uniform boundedness condition is possible at the expense of more technicalities; for example, without the uniform boundedness condition, we would have an additional term $\log \operatorname{diam}(\mathcal{X})$ in (4.20) and (4.22) of Lemma 4.2). We denote $\sigma_i^2 = \mathrm{E}[\epsilon_i^2 \mid x_i]$, $\bar\sigma^2 := \sup_{x \in \mathcal{X}} \mathrm{E}[\epsilon_i^2 \mid x_i = x]$, and $\underline{\sigma}^2 := \inf_{x \in \mathcal{X}} \mathrm{E}[\epsilon_i^2 \mid x_i = x]$. For notational convenience, we omit indexing by $n$ where it does not lead to confusion.

Condition A.1 (Sample) For each $n$, random vectors $(y_i, x_i)$, $i = 1, \dots, n$, are i.i.d. and satisfy (2.1).

We approximate the function $x \mapsto g(x)$ by linear forms $x \mapsto p(x)'b$, where $x \mapsto p(x) := (p_1(x), \dots, p_k(x))'$ is a vector of approximating functions that can change with $n$; in particular, $k$ may increase with $n$. We denote the regressors as $p_i := p(x_i) := (p_1(x_i), \dots, p_k(x_i))'$. The next assumption imposes regularity conditions on the regressors.

Condition A.2 (Eigenvalues) Uniformly over all $n$, the eigenvalues of $Q := \mathrm{E}[p_i p_i']$ are bounded above and away from zero.

Condition A.2 imposes the restriction that $p_1(x_i), \dots, p_k(x_i)$ are not too collinear. Given this assumption, it is without loss of generality to impose the following normalization:

Normalization. To simplify notation, we normalize $Q = I$, but we shall treat $Q$ as unknown, that is, we deal with random design.

The following proposition establishes a simple sufficient condition for A.2 based on orthonormal bases with respect to some measure.

Proposition 2.1 (Stability of Bounds on Eigenvalues). Assume that $x_i \sim F$, where $F$ is a probability measure on $\mathcal{X}$, and that the regressors $p_1(x), \dots, p_k(x)$ are orthonormal on $(\mathcal{X}, \mu)$ for some measure $\mu$. Then A.2 is satisfied if $dF/d\mu$ is bounded above and away from zero.

It is well known that the least squares parameter $\beta$ is defined by
$$\beta := \arg\min_{b \in \mathbb{R}^k} \mathrm{E}\big[(y_i - p_i'b)^2\big],$$
which by (2.1) also implies that $\beta = \beta_g$, where $\beta_g$ is defined by
$$\beta_g := \arg\min_{b \in \mathbb{R}^k} \mathrm{E}\big[(g(x_i) - p_i'b)^2\big]. \tag{2.2}$$
We call $x \mapsto g(x)$ the target function and $x \mapsto g_k(x) = p(x)'\beta$ the surrogate function. In this setting, the surrogate function provides the best linear approximation to the target function.
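Proposition 2.1 can be checked numerically on a simple case. The sketch below is an illustration, not the paper's code: it assumes NumPy, takes $\mu$ to be Lebesgue measure on $[0,1]$ with the Legendre basis orthonormalized there, and uses the hypothetical design density $dF/d\mu(x) = 0.5 + x$, which lies between 0.5 and 1.5. Because $\alpha'Q\alpha = \int (p'\alpha)^2\,dF$ and $\int (p'\alpha)^2\,d\mu = 1$ for unit $\alpha$, the eigenvalues of $Q$ should fall inside $[0.5, 1.5]$.

```python
import numpy as np
from numpy.polynomial import legendre


def shifted_legendre_basis(x, k):
    """p(x) = (sqrt(2j+1) * P_j(2x - 1))_{j=0..k-1}, orthonormal on [0, 1]
    with respect to Lebesgue measure (illustrative basis choice)."""
    t = 2.0 * np.asarray(x, dtype=float) - 1.0
    V = legendre.legvander(t, k - 1)          # columns P_0(t), ..., P_{k-1}(t)
    return V * np.sqrt(2.0 * np.arange(k) + 1.0)


def population_gram(k, density, n_quad=256):
    """Q = E[p(x) p(x)'] for x ~ F with Lebesgue density `density` on [0, 1].
    Gauss-Legendre quadrature is exact when density * p p' is a polynomial
    of degree <= 2 * n_quad - 1."""
    t, w = legendre.leggauss(n_quad)          # nodes and weights on [-1, 1]
    x, w = (t + 1.0) / 2.0, w / 2.0           # rescale the rule to [0, 1]
    P = shifted_legendre_basis(x, k)
    return P.T @ (P * (w * density(x))[:, None])


if __name__ == "__main__":
    k = 12
    uniform = lambda x: np.ones_like(x)       # dF/dmu = 1, so Q = I
    tilted = lambda x: 0.5 + x                # hypothetical dF/dmu in [0.5, 1.5]
    print(np.allclose(population_gram(k, uniform), np.eye(k)))
    eig = np.linalg.eigvalsh(population_gram(k, tilted))
    print(eig.min(), eig.max())
```

Replacing `tilted` with a density that vanishes somewhere would typically let the smallest eigenvalue drift toward zero as $k$ grows, which is the kind of collinearity Condition A.2 rules out.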

For all $x \in \mathcal{X}$, let
$$r(x) := r_g(x) := g(x) - p(x)'\beta_g \tag{2.3}$$
denote the approximation error at the point $x$, and let $r_i := r(x_i) = g(x_i) - p(x_i)'\beta_g$ denote the approximation error for observation $i$. Using this notation, we obtain a many regressors model
$$y_i = p_i'\beta + u_i, \quad \mathrm{E}[p_i u_i] = 0, \quad u_i := r_i + \epsilon_i,$$
where the orthogonality condition holds because $\mathrm{E}[p_i r_i] = 0$ by the first-order conditions for (2.2) and $\mathrm{E}[p_i \epsilon_i] = 0$ by (2.1). The least squares estimator of $\beta$ is
$$\hat\beta := \arg\min_{b \in \mathbb{R}^k} \mathbb{E}_n\big[(y_i - p_i'b)^2\big] = \hat Q^{-1}\mathbb{E}_n[p_i y_i], \tag{2.4}$$
where $\hat Q := \mathbb{E}_n[p_i p_i']$. The least squares estimator $\hat\beta$ induces the estimator $\hat g(x) := p(x)'\hat\beta$ for the target function $g(x)$. Then it follows from (2.3) that we can decompose the error in estimating the target function as
$$\hat g(x) - g(x) = p(x)'(\hat\beta - \beta) - r(x),$$
where the first term on the right-hand side is the estimation error and the second term is the approximation error.

We are also interested in various linear functionals $\theta$ of the conditional mean function. As discussed in the introduction, examples include the partial derivative function, the average partial derivative function, and the conditional average partial derivative. Importantly, in each example above we could be interested in estimating $\theta = \theta(w)$ simultaneously for many values $w \in I$. By the linearity of the series approximations, the above parameters can be seen as linear functions of the least squares coefficients $\beta$ up to an approximation error, that is,
$$\theta(w) = \ell_\theta(w)'\beta + r_\theta(w), \quad w \in I, \tag{2.5}$$
where $\ell_\theta(w)'\beta$ is the series approximation, with $\ell_\theta(w)$ denoting the $k$-vector of loadings on the coefficients, and $r_\theta(w)$ is the remainder term, which corresponds to the approximation error. Indeed, the decomposition (2.5) arises from applying different linear operators $A$ to the decomposition $g(\cdot) = p(\cdot)'\beta + r(\cdot)$ and evaluating the resulting functions at $w$:
$$(Ag(\cdot))[w] = (Ap(\cdot))[w]'\beta + (Ar(\cdot))[w]. \tag{2.6}$$
Examples of the operator $A$ corresponding to the cases enumerated in the introduction are given by, respectively,

1. a differential operator: $(Af)[x] = (\partial_j f)[x]$, so that $\ell_\theta(x) = \partial_j p(x)$, $r_\theta(x) = \partial_j r(x)$;
2. an integro-differential operator: $Af = \int \partial_j f(x)\,d\mu(x)$, so that $\ell_\theta = \int \partial_j p(x)\,d\mu(x)$, $r_\theta = \int \partial_j r(x)\,d\mu(x)$;
3. a partial integro-differential operator: $(Af)[x^s] = \int \partial_j f(x)\,d\mu(x \mid x^s)$, so that $\ell_\theta(x^s) = \int \partial_j p(x)\,d\mu(x \mid x^s)$, $r_\theta(x^s) = \int \partial_j r(x)\,d\mu(x \mid x^s)$, where $x^s$ is a subvector of $x$.

For notational convenience, we use the formulation (2.5) in the analysis, instead of the motivational formulation (2.6). We shall provide inference tools that will be valid for inference on the series approximation $\ell_\theta(w)'\beta$, $w \in I$. If the approximation error $r_\theta(w)$, $w \in I$, is small enough compared to the estimation error, these tools will also be valid for inference on the functional of interest $\theta(w)$, $w \in I$. In this case, the series approximation $\ell_\theta(w)'\beta$ is an important intermediary target, whereas the functional $\theta(w)$ is the ultimate target. The inference will be based on the plug-in estimator $\hat\theta(w) := \ell_\theta(w)'\hat\beta$ of the series approximation $\ell_\theta(w)'\beta$ and hence of the final target $\theta(w)$.

3. Approximation Properties of Least Squares

Next we consider approximation properties of the least squares estimator. Not surprisingly, approximation properties must rely on the particular choice of approximating functions. At this point it is instructive to consider particular examples of relevant bases used in the literature. For each example, we state a bound on the following quantity:
$$\xi_k := \sup_{x \in \mathcal{X}} \|p(x)\|.$$
This quantity will play a key role in our analysis.¹ Excellent reviews of approximating properties of different series can also be found in Huang (1998) and Chen (2007), where additional references are provided.

¹ Most results extend directly to the case where $\xi_k \ge \max_{i \le n} \|p(x_i)\|$ holds with probability $1 - o(1)$. We refer to Hansen (2014) for recent results that explicitly allow for unbounded regressors, which required extending the concentration inequalities for matrices.
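The estimator (2.4) and the plug-in estimator $\hat\theta(x) = \ell_\theta(x)'\hat\beta$ for the derivative functional (operator 1 above, with $d = 1$) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes NumPy, a scalar design on $[0,1]$, and the Legendre basis orthonormalized on $[0,1]$; the simulated regression function and the helper names `basis`, `basis_derivative`, and `series_ls` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(x, k):
    """Hypothetical basis choice: sqrt(2j+1) P_j(2x - 1), j = 0..k-1, on [0, 1]."""
    V = legendre.legvander(2.0 * np.asarray(x, dtype=float) - 1.0, k - 1)
    return V * np.sqrt(2.0 * np.arange(k) + 1.0)


def basis_derivative(x, k):
    """l_theta(x) = d p(x)/dx for the basis above; the chain rule for
    t = 2x - 1 contributes the factor 2."""
    t = 2.0 * np.asarray(x, dtype=float) - 1.0
    scale = np.sqrt(2.0 * np.arange(k) + 1.0)
    cols = [2.0 * s * legendre.legval(t, legendre.legder(np.eye(k)[j]))
            for j, s in enumerate(scale)]
    return np.column_stack(cols)


def series_ls(x, y, k):
    """beta_hat = Q_hat^{-1} E_n[p_i y_i] with Q_hat = E_n[p_i p_i'], as in (2.4)."""
    P = basis(x, k)
    n = len(y)
    Q_hat = P.T @ P / n
    return np.linalg.solve(Q_hat, P.T @ y / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 2000, 8
    x = rng.uniform(size=n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)   # simulated data
    beta_hat = series_ls(x, y, k)
    grid = np.linspace(0.0, 1.0, 5)
    g_hat = basis(grid, k) @ beta_hat              # g_hat(x) = p(x)' beta_hat
    dg_hat = basis_derivative(grid, k) @ beta_hat  # theta_hat(x) = l_theta(x)' beta_hat
    print(np.round(g_hat, 2), np.round(dg_hat, 2))
```

Because the functional is linear, the same `beta_hat` serves every $w$ at once; only the loading vector $\ell_\theta(w)$ changes with the functional.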

Example 3.1 (Polynomial series). Let $\mathcal{X} = [0,1]$ and consider a polynomial series given by $p(x) = (1, x, x^2, \dots, x^{k-1})'$. In order to reduce collinearity problems, it is useful to orthonormalize the polynomial series with respect to the Lebesgue measure on $[0,1]$ to get the Legendre polynomial series $p(x) = (1, \sqrt{3}(2x-1), \sqrt{5}(6x^2-6x+1), \dots)'$. The Legendre polynomial series satisfies $\xi_k \lesssim k$; see, for example, Newey (1997).

Example 3.2 (Fourier series). Let $\mathcal{X} = [0,1]$ and consider a Fourier series given by $p(x) = (1, \sqrt{2}\cos(2\pi jx), \sqrt{2}\sin(2\pi jx), j = 1, 2, \dots, (k-1)/2)'$, for $k$ odd. The Fourier series is orthonormal with respect to the Lebesgue measure on $[0,1]$ and satisfies $\xi_k \lesssim \sqrt{k}$, which follows trivially from the fact that every element of $p(x)$ is bounded in absolute value by $\sqrt{2}$.

Example 3.3 (Spline series). Let $\mathcal{X} = [0,1]$ and consider the linear regression spline series, or regression spline series of order 1, with a finite number of equally spaced knots $l_1, \dots, l_{k-2}$ in $\mathcal{X}$:
$$p(x) = (1, x, (x - l_1)_+, \dots, (x - l_{k-2})_+)',$$
or consider the cubic regression spline series, or regression spline series of order 3, with a finite number of equally spaced knots $l_1, \dots, l_{k-4}$:
$$p(x) = (1, x, x^2, x^3, (x - l_1)_+^3, \dots, (x - l_{k-4})_+^3)'.$$
Similarly, one can define the regression spline series of any order $s_0$ (here $s_0$ is a nonnegative integer). The function $x \mapsto p(x)'b$ constructed using regression splines of order $s_0$ is $s_0 - 1$ times continuously differentiable in $x$ for any $b$. Instead of regression splines, it is often helpful to consider B-splines $p(x) = (p_1(x), \dots, p_k(x))'$, which are linear transformations of the regression splines with lower multicollinearity; see De Boor (2001) for an introduction to the theory of splines. B-splines are local in the sense that each B-spline $p_j(x)$ is supported on the interval $[l_{j(1)}, l_{j(2)}]$ for some $j(1)$ and $j(2)$ satisfying $j(2) - j(1) \lesssim 1$, and there are at most $s_0 + 1$ non-zero B-splines on each interval $[l_{j-1}, l_j]$. From this property of B-splines, it is easy to see that the B-spline series satisfies $\xi_k \lesssim \sqrt{k}$; see, for example, Newey (1997).

Example 3.4 (Cohen-Daubechies-Vial wavelet series). Let $\mathcal{X} = [0,1]$ and consider Cohen-Daubechies-Vial (CDV) wavelet bases; see Section 4 in Cohen et al. (1993), Chapter 7.5 in Mallat (2009), and Chapter 7 and Appendix B in Johnstone (2011) for details on CDV wavelet bases. CDV wavelet bases are a class of bases that are orthonormal with respect to the Lebesgue measure on $[0,1]$. Each such basis is built from a Daubechies scaling function $\phi$ (defined on $\mathbb{R}$) and the wavelet $\psi$ of order $s_0$, starting from a fixed resolution level $J_0$ such that $2^{J_0} \ge 2s_0$. The functions $\phi$ and $\psi$ are supported on $[0, 2s_0 - 1]$ and $[-s_0 + 1, s_0]$, respectively. Translate $\phi$ so that it has the support $[-s_0 + 1, s_0]$. Let
$$\phi_{l,m}(x) = 2^{l/2}\phi(2^l x - m), \quad \psi_{l,m}(x) = 2^{l/2}\psi(2^l x - m), \quad l, m \ge 0.$$
Then we can create the CDV wavelet basis from these functions as follows. Take all the functions $\phi_{J_0,m}$, $\psi_{l,m}$, $l \ge J_0$, that are supported in the interior of $[0,1]$ (these are the functions $\phi_{J_0,m}$ with $m = s_0 - 1, \dots, 2^{J_0} - s_0$ and $\psi_{l,m}$ with $m = s_0 - 1, \dots, 2^l - s_0$, $l \ge J_0$). Denote these functions $\tilde\phi_{J_0,m}$, $\tilde\psi_{l,m}$. To this set of functions, add suitable boundary-corrected functions $\tilde\phi_{J_0,0}, \dots, \tilde\phi_{J_0,s_0-2}$, $\tilde\phi_{J_0,2^{J_0}-s_0+1}, \dots, \tilde\phi_{J_0,2^{J_0}-1}$, $\tilde\psi_{l,0}, \dots, \tilde\psi_{l,s_0-2}$, $\tilde\psi_{l,2^l-s_0+1}, \dots, \tilde\psi_{l,2^l-1}$, $l \ge J_0$, so that $\{\tilde\phi_{J_0,m}\}_{0 \le m < 2^{J_0}} \cup \{\tilde\psi_{l,m}\}_{0 \le m < 2^l,\, l \ge J_0}$ forms an orthonormal basis of $L^2[0,1]$. Suppose that $k = 2^J$ for some $J > J_0$. Then the CDV series takes the form:
$$p(x) = (\tilde\phi_{J_0,0}(x), \dots, \tilde\phi_{J_0,2^{J_0}-1}(x), \tilde\psi_{J_0,0}(x), \dots, \tilde\psi_{J-1,2^{J-1}-1}(x))'.$$
This series satisfies $\xi_k \lesssim \sqrt{k}$. This bound can be derived by the same argument as that for B-splines (see, for example, Kato, 2013, Lemma 1 (i) for its proof). CDV wavelet bases are a flexible tool to approximate many different function classes; see, for example, Johnstone (2011), Appendix B.

Example 3.5 (Local polynomial partition series).
Let $\mathcal{X} = [0,1]$ and define a local polynomial partition series as follows. Let $s_0$ be a nonnegative integer. Partition $\mathcal{X}$ as $0 = l_0 < l_1 < \dots < l_{\tilde k - 1} < l_{\tilde k} = 1$, where $\tilde k := [k/(s_0+1)] + 1$ and $[a]$ is the largest integer that is strictly smaller than $a$. For $j = 1, \dots, \tilde k$, define $\delta_j: [0,1] \to \{0,1\}$ by $\delta_j(x) = 1$ if $x \in (l_{j-1}, l_j]$ and 0 otherwise. For $j = 1, \dots, k$, define
$$\tilde p_j(x) := \delta_{[j/(s_0+1)]+1}(x)\, x^{\,j - 1 - (s_0+1)[j/(s_0+1)]}$$
for all $x \in \mathcal{X}$. Finally, define the local polynomial partition series $p_1(\cdot), \dots, p_k(\cdot)$ of order $s_0$ as an orthonormalization of $\tilde p_1(\cdot), \dots, \tilde p_k(\cdot)$ with respect to the Lebesgue (or some other) measure on $\mathcal{X}$. The local polynomial partition series estimator was analyzed in detail in Cattaneo and Farrell (2013). Its properties are somewhat similar to those of the local polynomial estimator of Stone (1982). When the partition $l_0, \dots, l_{\tilde k}$ satisfies $l_j - l_{j-1} \asymp 1/\tilde k$, that is, there exist constants $c, C > 0$ independent of $n$ and $k$ such that $c/\tilde k \le l_j - l_{j-1} \le C/\tilde k$ for all $j = 1, \dots, \tilde k$, and the Lebesgue measure is used, the local polynomial partition series satisfies $\xi_k \lesssim \sqrt{k}$. This bound can be derived by the same argument as that for B-splines.

Example 3.6 (Tensor Products). Generalizations to multiple covariates are straightforward using tensor products of unidimensional series. Suppose that the basic regressors are $x_i = (x_{1i}, \dots, x_{di})'$. Then we can create $d$ series, one for each basic regressor. We then take all interactions of functions from these $d$ series, called tensor products, and collect them into a vector of regressors $p_i$. If each series for a basic regressor has $J$ terms, then the final regressor has dimension $k = J^d$, which explodes exponentially in the dimension $d$. The bounds on $\xi_k$ in terms of $k$ remain the same as in the one-dimensional case.

Each basis described in Examples 3.1-3.6 has different approximation properties, which also depend on the particular class of functions $\mathcal{G}$. The following assumption captures the essence of this dependence in two quantities.

Condition A.3 (Approximation) For each $n$ and $k$, there are finite constants $c_k$ and $\ell_k$ such that for each $f \in \mathcal{G}$,
$$\|r_f\|_{F,2} := \sqrt{\int_{x \in \mathcal{X}} r_f^2(x)\,dF(x)} \le c_k \quad\text{and}\quad \|r_f\|_{F,\infty} := \sup_{x \in \mathcal{X}} |r_f(x)| \le \ell_k c_k.$$

Here $r_f$ is defined by (2.2) and (2.3) with $g$ replaced by $f$. We call $\ell_k$ the Lebesgue factor because of its relation to the Lebesgue constant defined in Section 3.2 below. Together, $c_k$ and $\ell_k$ characterize the approximation properties of the underlying class of functions under the $L^2(\mathcal{X}, F)$ and uniform distances. Note that the constants $c_k = c_k(\mathcal{G})$ and $\ell_k = \ell_k(\mathcal{G})$ are allowed to depend on $n$, but we omit indexing by $n$ for simplicity of notation. Next we discuss primitive bounds on $c_k$ and $\ell_k$.

3.1. Bounds on $c_k$. In what follows, we call the case where $c_k \to 0$ as $k \to \infty$ the correctly specified case. In particular, if the series are formed from bases that span $\mathcal{G}$, then $c_k \to 0$ as $k \to \infty$. However, if the series are formed from bases that do not span $\mathcal{G}$, then $c_k \not\to 0$ as $k \to \infty$. We call any case where $c_k \not\to 0$ the incorrectly specified (misspecified) case. To give an example of the misspecified case, suppose that $d = 2$, so that $x = (x_1, x_2)'$ and $g(x) = g(x_1, x_2)$. Further, suppose that the researcher mistakenly assumes that $g(x)$ is additively separable in $x_1$ and $x_2$: $g(x_1, x_2) = g_1(x_1) + g_2(x_2)$. Given this assumption, the researcher forms the vector of approximating functions $p(x_1, x_2)$ such that each component of this vector depends either on $x_1$ or on $x_2$ but not on both; see Newey (1997) and Newey et al. (1999) for the description of nonparametric series estimators of separately additive models. Then note that if the true function $g(x_1, x_2)$ is not separately additive, linear combinations $p(x_1, x_2)'b$ will not be able to accurately approximate $g(x_1, x_2)$ for any $b$, so that $c_k$ does not converge to zero as $k \to \infty$. Since the analysis of misspecified models plays an important role in econometrics, we include results for both correctly and incorrectly specified models.

To provide a bound on $c_k$, note that for any $f \in \mathcal{G}$,
$$\inf_b \|f - p'b\|_{F,2} \le \inf_b \|f - p'b\|_{F,\infty},$$
so that it suffices to set $c_k$ such that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$. Next, bounds for $\inf_b \|f - p'b\|_{F,\infty}$ are readily available from Approximation Theory; see DeVore and Lorentz (1993).
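The misspecified additive case can be made concrete with a toy calculation. The sketch below is a hypothetical illustration, assuming NumPy, $F = U([0,1]^2)$, the non-additive target $g(x_1, x_2) = x_1 x_2$, and an additive Legendre basis with $J$ terms per coordinate (shared constant). The best additive $L^2$ approximation of $x_1x_2$ is $x_1/2 + x_2/2 - 1/4$, leaving the residual $(x_1 - 1/2)(x_2 - 1/2)$ with $L^2(F)$ norm $\sqrt{(1/12)(1/12)} = 1/12$, so enlarging the basis cannot push $c_k$ to zero.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre01(x, k):
    """Orthonormal Legendre basis on [0, 1] w.r.t. Lebesgue measure."""
    V = legendre.legvander(2.0 * np.asarray(x, dtype=float) - 1.0, k - 1)
    return V * np.sqrt(2.0 * np.arange(k) + 1.0)


def additive_l2_error(g, J, n_quad=64):
    """L2(F) error of the best additive approximation p(x1, x2)'b of g under
    F = U([0,1]^2), where p stacks 1, p_j(x1), p_j(x2) for j = 1..J-1.
    Integrals use a tensor Gauss-Legendre rule, which is exact here."""
    t, w = legendre.leggauss(n_quad)
    u, w = (t + 1.0) / 2.0, w / 2.0                  # 1-d rule on [0, 1]
    x1, x2 = (a.ravel() for a in np.meshgrid(u, u, indexing="ij"))
    wt = np.outer(w, w).ravel()                      # product weights
    B1, B2 = legendre01(x1, J), legendre01(x2, J)
    P = np.column_stack([B1, B2[:, 1:]])             # drop the duplicate constant
    sw = np.sqrt(wt)
    b, *_ = np.linalg.lstsq(P * sw[:, None], g(x1, x2) * sw, rcond=None)
    resid = g(x1, x2) - P @ b
    return np.sqrt(np.sum(wt * resid**2))


if __name__ == "__main__":
    g = lambda x1, x2: x1 * x2                       # not additively separable
    for J in (2, 4, 8, 16):
        print(J, additive_l2_error(g, J))            # 1/12 up to rounding for every J
```

For an additive target such as $x_1^2 + x_2$, the same function returns an error that is zero up to rounding once $J \ge 3$, which is the correctly specified counterpart.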
A typical example is based on the concept of $s$-smooth classes, namely Hölder classes of smoothness order $s$, $\Sigma_s(\mathcal{X})$. For $s \in (0,1]$, the Hölder class of smoothness order $s$, $\Sigma_s(\mathcal{X})$, is defined as the set of all functions $f: \mathcal{X} \to \mathbb{R}$ such that for some $C > 0$,
$$|f(x) - f(\tilde x)| \le C\Big(\sum_{j=1}^d (x_j - \tilde x_j)^2\Big)^{s/2}$$
for all $x = (x_1, \dots, x_d)'$ and $\tilde x = (\tilde x_1, \dots, \tilde x_d)'$ in $\mathcal{X}$. The smallest $C$ satisfying this inequality defines a norm of $f$ in $\Sigma_s(\mathcal{X})$, which we denote by $\|f\|_s$. For $s > 1$, $\Sigma_s(\mathcal{X})$ can be defined as follows. For a $d$-tuple $\alpha = (\alpha_1, \dots, \alpha_d)$ of nonnegative integers, let $D^\alpha = \partial_{x_1}^{\alpha_1} \cdots \partial_{x_d}^{\alpha_d}$. Let $[s]$ denote the largest integer strictly smaller than $s$. Then $\Sigma_s(\mathcal{X})$ is defined as the set of all functions $f: \mathcal{X} \to \mathbb{R}$ such that $f$ is $[s]$ times continuously differentiable and for some $C > 0$,
$$|D^\alpha f(x) - D^\alpha f(\tilde x)| \le C\Big(\sum_{j=1}^d (x_j - \tilde x_j)^2\Big)^{(s - [s])/2} \quad\text{and}\quad |D^\beta f(x)| \le C$$
for all $x = (x_1, \dots, x_d)'$ and $\tilde x = (\tilde x_1, \dots, \tilde x_d)'$ in $\mathcal{X}$ and for all $d$-tuples $\alpha = (\alpha_1, \dots, \alpha_d)$ and $\beta = (\beta_1, \dots, \beta_d)$ of nonnegative integers satisfying $\alpha_1 + \dots + \alpha_d = [s]$ and $\beta_1 + \dots + \beta_d \le [s]$. Again, the smallest $C$ satisfying these inequalities defines a norm of $f$ in $\Sigma_s(\mathcal{X})$, which we denote $\|f\|_s$.

If $\mathcal{G}$ is a set of functions $f$ in $\Sigma_s(\mathcal{X})$ such that $\|f\|_s$ is bounded from above uniformly over all $f \in \mathcal{G}$ (that is, $\mathcal{G}$ is contained in a ball in $\Sigma_s(\mathcal{X})$ of finite radius), then we can take
$$c_k \lesssim k^{-s/d} \tag{3.7}$$
for the polynomial series and $c_k \lesssim k^{-(s \wedge s_0)/d}$ for spline, CDV wavelet, and local polynomial partition series of order $s_0$. If in addition we assume that each element of $\mathcal{G}$ can be extended to a periodic function, then (3.7) also holds for the Fourier series. See, for example, Newey (1997) and Chen (2007) for references.

3.2. Bounds on $\ell_k$. We say that a least squares approximation by a particular series for the function class $\mathcal{G}$ is co-minimal if the Lebesgue factor $\ell_k$ is small in the sense of being a slowly varying function in $k$. A simple bound on $\ell_k$, which is independent of $\mathcal{G}$, is established in the following proposition:

Proposition 3.1. If $c_k$ is chosen so that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$, then Condition A.3 holds with $\ell_k \le 1 + \xi_k$.

The proof of this proposition is based on the ideas of Newey (1997) and is provided in the Appendix. The advantage of the bound established in this proposition is that it is universally applicable. It is, however, not sharp in many cases because, under the normalization $Q = I$, $\xi_k$ satisfies
$$\xi_k^2 \ge \mathrm{E}[\|p(x_i)\|^2] = \mathrm{E}[p(x_i)'p(x_i)] = \operatorname{tr}(Q) = k,$$
so that $\xi_k \ge \sqrt{k}$ in all cases. Much sharper bounds follow from Approximation Theory for some important cases. To apply these bounds, define the Lebesgue constant:
$$\tilde\ell_k := \sup\left\{ \frac{\|p'\beta_f\|_{F,\infty}}{\|f\|_{F,\infty}} : \|f\|_{F,\infty} \ne 0,\ f \in \bar{\mathcal{G}} \right\},$$
where $\bar{\mathcal{G}} = \mathcal{G} + \{p'b : b \in \mathbb{R}^k\} = \{f + p'b : f \in \mathcal{G}, b \in \mathbb{R}^k\}$. The following proposition provides a bound on $\ell_k$ in terms of $\tilde\ell_k$:

Proposition 3.2. If $c_k$ is chosen so that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$, then Condition A.3 holds with $\ell_k = 1 + \tilde\ell_k$.

Note that in all examples above, we provided $c_k$ such that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$, and so the results of Propositions 3.1 and 3.2 apply in our examples. We now provide bounds on $\tilde\ell_k$.

Example 3.7 (Fourier series, continued). For the Fourier series on $\mathcal{X} = [0,1]$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$,
$$\tilde\ell_k \le C_0 \log k + C_1,$$
where here and below $C_0$ and $C_1$ are some universal constants; see Zygmund (2002).

Example 3.8 (Spline series, continued). For continuous B-spline series on $\mathcal{X} = [0,1]$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$, $\tilde\ell_k \le C_0$ under approximately uniform placement of knots; see Huang (2003b). In fact, the result of Huang states that $\tilde\ell_k \le C$ whenever $F$ has a pdf on $[0,1]$ bounded from above by $\bar a$ and from below by $\underline a > 0$, where $C$ is a constant that depends only on $\underline a$ and $\bar a$.

Example 3.9 (Wavelet series, continued). For continuous CDV wavelet series on $\mathcal{X} = [0,1]$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$, $\tilde\ell_k \le C_0$. This result was recently proved by Chen and Christensen (2013), who extended the argument of Huang (2003b) for B-splines to cover wavelets. In fact, the result of Chen and Christensen also shows that $\tilde\ell_k \le C$ whenever $F$ has a pdf on $[0,1]$ bounded from above by $\bar a$ and from below by $\underline a > 0$, where $C$ is a constant that depends only on $\underline a$ and $\bar a$.
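The lower bound $\xi_k \ge \sqrt{k}$ and the gap between bases can be checked numerically. The sketch below assumes NumPy, the $\sqrt{2}$-normalized Fourier basis, and the Legendre basis orthonormalized on $[0,1]$; the helper names are hypothetical, and $\xi_k$ is approximated by a maximum over a grid. For the Fourier basis $\|p(x)\|^2 = 1 + (k-1) = k$ at every $x$, so $\xi_k = \sqrt{k}$ attains the bound. For the Legendre basis, $|P_j| \le 1$ with equality at the endpoints gives $\xi_k^2 = \sum_{j<k}(2j+1) = k^2$, so the universal bound $\ell_k \le 1 + \xi_k$ of Proposition 3.1 grows linearly in $k$ there.

```python
import numpy as np
from numpy.polynomial import legendre


def fourier_basis(x, k):
    """(1, sqrt(2)cos(2 pi j x), sqrt(2)sin(2 pi j x), j = 1..(k-1)/2), k odd."""
    assert k % 2 == 1, "k must be odd"
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for j in range(1, (k - 1) // 2 + 1):
        cols += [np.sqrt(2) * np.cos(2 * np.pi * j * x),
                 np.sqrt(2) * np.sin(2 * np.pi * j * x)]
    return np.column_stack(cols)


def legendre_basis(x, k):
    """Orthonormal Legendre basis on [0, 1]: sqrt(2j+1) P_j(2x - 1)."""
    V = legendre.legvander(2.0 * np.asarray(x, dtype=float) - 1.0, k - 1)
    return V * np.sqrt(2.0 * np.arange(k) + 1.0)


def xi(basis, k, grid_size=2001):
    """Grid approximation of xi_k = sup_x ||p(x)|| over [0, 1] (endpoints included)."""
    grid = np.linspace(0.0, 1.0, grid_size)
    return np.linalg.norm(basis(grid, k), axis=1).max()


if __name__ == "__main__":
    for k in (5, 11, 21, 41):
        print(k, np.sqrt(k), xi(fourier_basis, k), xi(legendre_basis, k))
```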

Example 3.10 (Local polynomial partition series, continued). For local polynomial partition series on $\mathcal{X}$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$, $\tilde\ell_k \le C_0$. To prove this bound, note that first-order conditions imply that for any $f \in \bar{\mathcal{G}}$,
$$\beta_f = Q^{-1}\mathrm{E}[p(x_1)f(x_1)] = \mathrm{E}[p(x_1)f(x_1)].$$
Hence, for any $x \in \mathcal{X}$,
$$|p(x)'\beta_f| = |\mathrm{E}[p(x)'p(x_1)f(x_1)]| \lesssim \|f\|_{F,\infty},$$
where the last inequality follows by noting that the sum $p(x)'p(x_1) = \sum_{j=1}^k p_j(x)p_j(x_1)$ contains at most $s_0 + 1$ nonzero terms, all nonzero terms in the sum are bounded by $\xi_k^2 \lesssim k$, and $p(x)'p(x_1) = 0$ outside of a set with probability bounded from above by $1/k$ up to a constant. The bound follows. Moreover, the bound $\tilde\ell_k \le C$ continues to hold whenever $F$ has a pdf on $[0,1]$ bounded from above by $\bar a$ and from below by $\underline a > 0$, where $C$ is a constant that depends only on $\underline a$ and $\bar a$.

Example 3.11 (Polynomial series, continued). For Chebyshev polynomials with $\mathcal{X} = [-1,1]$, $dF(x)/dx = 1/(\pi\sqrt{1 - x^2})$, and $\mathcal{G} \subset C(\mathcal{X})$,
$$\tilde\ell_k \le C_0 \log k + C_1.$$
This bound follows from a trigonometric representation of Chebyshev polynomials (see, for example, DeVore and Lorentz (1993)) and Example 3.7.

Example 3.12 (Legendre Polynomials). For Legendre polynomials that form an orthonormal basis on $\mathcal{X} = [0,1]$ with respect to $F = U(0,1)$, and $\mathcal{G} = C(\mathcal{X})$,
$$\tilde\ell_k \ge C_0\sqrt{\xi_k} = C_1\sqrt{k}$$
for some constants $C_0, C_1 > 0$; see, for example, DeVore and Lorentz (1993). This means that even though some series schemes generate well-behaved uniform approximations, others, such as Legendre polynomials, do not in general. However, the following example specifies tailored function classes for which Legendre and other series methods do automatically provide uniformly well-behaved approximations.

Example 3.13 (Tailored Function Classes). For each type of series approximation, it is possible to specify function classes for which the Lebesgue factors are constant or slowly varying with $k$. Specifically, consider a collection
$$\mathcal{G}_k = \Big\{ x \mapsto f(x) = p(x)'b + r(x) : \int r(x)p(x)\,dF(x) = 0,\ \|r\|_{F,\infty} \le \ell_k\|r\|_{F,2},\ \|r\|_{F,2} \le c_k \Big\},$$
where $\ell_k \le C$ or $\ell_k \le C\log k$. This example captures the idea that for each type of series functions there are function classes that are well approximated by this type. For example, Legendre polynomials may have poor Lebesgue factors in general, but there are well-defined function classes where Legendre polynomials have well-behaved Lebesgue factors. This explains why polynomial approximations, for example, using Legendre polynomials, are frequently employed in empirical work. We provide an empirically relevant example below, where polynomial approximation works just as well as a B-spline approximation. In economic examples, both polynomial approximations and B-spline approximations are well motivated if we consider them as more flexible forms of well-known, well-motivated functional forms in economics (for example, as more flexible versions of the linear-quadratic Mincer equations, or more flexible versions of translog demand and production functions). The following example illustrates the performance of series approximations using different bases for a real data set.

Example 3.14 (Approximations of Conditional Expected Wage Function). Here $g(x)$ is the mean of log wage ($y$) conditional on education $x \in \{8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20\}$. The function $g(x)$ is computed using population data, namely the 1990 Census data for U.S. men of prime age; see Angrist et al. (2006) for more details. So in this example, we know the true population function $g(x)$. We would like to know how well this function is approximated when common approximation methods are used to form the regressors. For simplicity we assume that $x_i$ is uniformly distributed (otherwise we can weight by the frequency).
In the population, the least squares coefficient solves the approximation problem $\beta = \arg\min_b \mathrm{E}[\{g(x_i) - p_i'b\}^2]$ for $p_i = p(x_i)$, where we form $p(x)$ as (a) a linear spline (Figure 1, left) and (b) a polynomial series (Figure 1, right), such that the dimension of $p(x)$ is either $k = 3$ or $k = 8$. It is clear from these graphs that spline and polynomial series yield similar approximations. In the table below, we also present the $L^2$ and $L^\infty$ norms of the approximation errors:

[Figure 1. Left panel: "Approximation by Linear Splines, K=3 and 8"; right panel: "Approximation by Polynomial Series, K=3 and 8". Vertical axis: cef; horizontal axis: ed.]

Figure 1. Conditional expectation function (cef) of log wage given education (ed) in the 1990 Census data for U.S. men of prime age and its least squares approximation by spline (left panel) and polynomial series (right panel). Solid line: conditional expectation function; dashed line: approximation by k = 3 series terms; dash-dot line: approximation by k = 8 series terms.

                  spline k = 3   spline k = 8   Poly k = 3   Poly k = 8
$L^2$ Error
$L^\infty$ Error

We see from the table that in this example, the Lebesgue factor of the polynomial approximations, defined as the ratio of the $L^\infty$ to the $L^2$ errors, is comparable to the Lebesgue factor of the spline approximations.

4. Limit Theory

4.1. $L^2$ Limit Theory. Having established the set-up, we proceed to derive our results. We start with a result on the $L^2$ rate of convergence. Recall that $\bar\sigma^2 = \sup_{x \in \mathcal{X}} \mathrm{E}[\epsilon_i^2 \mid x_i = x]$. In the theorem below, we assume that $\bar\sigma^2 \lesssim 1$. This is a mild regularity condition.

Theorem 4.1 ($L^2$ rate of convergence). Assume that Conditions A.1-A.3 are satisfied. In addition, assume that $\xi_k^2 \log k/n \to 0$ and $\bar\sigma^2 \lesssim 1$. Then under $c_k \to 0$,
$$\|\hat g - g\|_{F,2} \lesssim_P \sqrt{k/n} + c_k, \tag{4.8}$$
and under $c_k \not\to 0$,
$$\|\hat g - p'\beta\|_{F,2} \lesssim_P \sqrt{k/n} + \big(\ell_k c_k\sqrt{k/n}\big) \wedge \big(\xi_k c_k/\sqrt{n}\big). \tag{4.9}$$

Comment 4.1. (i) This is our first main result in this paper. The condition $\xi_k^2 \log k/n \to 0$, which we impose, weakens (hence generalizes) the conditions imposed in Newey (1997), who required $k\xi_k^2/n \to 0$. For series satisfying $\xi_k \lesssim \sqrt{k}$, the condition $\xi_k^2 \log k/n \to 0$ amounts to
$$k \log k/n \to 0. \tag{4.10}$$
This condition is the same as that imposed in Stone (1994), Huang (2003a), and recently by Cattaneo and Farrell (2013), but the result (4.8) is obtained under the condition (4.10) in Stone (1994) and Huang (2003a) only for spline series and in Cattaneo and Farrell (2013) only for local polynomial partition series. Therefore, our result improves on those in the literature by weakening the rate requirements on the growth of $k$ (with respect to $n$) and/or by allowing for a wider set of series functions.

(ii) Under the correct specification ($c_k \to 0$), the fastest $L^2$ rate of convergence is achieved by setting $k$ so that the approximation error and the sampling error are of the same order, $\sqrt{k/n} \asymp c_k$. One consequence of this result is that for Hölder classes of smoothness order $s$, $\Sigma_s(\mathcal{X})$, with $c_k \lesssim k^{-s/d}$, we obtain the optimal $L^2$ rate of convergence by setting $k \asymp n^{d/(d+2s)}$, which is allowed under our conditions for all $s > 0$ if $\xi_k \lesssim \sqrt{k}$ (Fourier, spline, wavelet, and local polynomial partition series). On the other hand, if $\xi_k$ grows faster than $\sqrt{k}$, then it is not possible to achieve the optimal $L^2$ rate of convergence for some $s > 0$. For example, for the polynomial series considered above, $\xi_k \lesssim k$, and so the condition $\xi_k^2 \log k/n \to 0$ becomes $k^2 \log k/n \to 0$. Hence, the optimal $L^2$ rate of convergence is achieved by polynomial series only if $d/(d+2s) < 1/2$ or, equivalently, $s > d/2$. Even though this condition is somewhat restrictive, it weakens the condition in Newey (1997), who required $k^3/n \to 0$ for polynomial series, so that the optimal $L^2$ rate in his analysis could be achieved only if $d/(d+2s) < 1/3$ or, equivalently, $s > d$.
Therefore, our results allow us to achieve the optimal $L^2$ rate of convergence in a larger set of function classes for particular series.

(iii) The result (4.9) is concerned with the case when the model is misspecified ($c_k\not\to 0$). It shows that when $k/n\to 0$ and $(\ell_kc_k\sqrt{k/n})\wedge(\xi_kc_k/\sqrt n)\to 0$, the estimator $\hat g(\cdot)$ converges in $L^2$ to the surrogate function $p(\cdot)'\beta$ that provides the best linear approximation to the target function $g(\cdot)$. In this case, the estimator $\hat g(\cdot)$ does not generally converge in $L^2$ to the target function $g(\cdot)$.
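The growth-rate bookkeeping in Comment 4.1(ii) reduces to comparing exponents of $n$. The following is a minimal sketch assuming $\xi_k\asymp k^a$ and $c_k\asymp k^{-s/d}$ exactly (constants and log factors dropped); the function names are hypothetical. It recovers the claims above: with $a = 1/2$ the balancing choice $k\asymp n^{d/(d+2s)}$ is admissible for every $s>0$, with $a = 1$ (polynomials) it needs $s>d/2$, and under Newey's condition $k\xi_k^2/n\to 0$ it needs $s>d$.

```python
from fractions import Fraction


def balancing_exponent(d, s):
    """Exponent e with k ~ n**e solving sqrt(k/n) ~ k**(-s/d), i.e. e = d/(d+2s)."""
    return Fraction(d) / (Fraction(d) + 2 * Fraction(s))


def l2_rate_exponent(d, s):
    """At k ~ n**e the L2 rate sqrt(k/n) + c_k is n**(-s/(d+2s))."""
    return Fraction(s) / (Fraction(d) + 2 * Fraction(s))


def growth_ok(d, s, a, newey=False):
    """Check whether k ~ n**(d/(d+2s)) satisfies the growth condition when
    xi_k ~ k**a (a = 1/2: Fourier, splines, wavelets, local polynomial
    partitions; a = 1: polynomials).

    Theorem 4.1 needs xi_k**2 log k / n -> 0, i.e. 2a * e < 1;
    Newey (1997) needs k xi_k**2 / n -> 0, i.e. (1 + 2a) * e < 1.
    Log factors are ignored: they cannot overturn a strict power gap,
    but they do break ties, hence the strict inequality.
    """
    e = balancing_exponent(d, s)
    power = 1 + 2 * Fraction(a) if newey else 2 * Fraction(a)
    return power * e < 1


if __name__ == "__main__":
    half = Fraction(1, 2)
    for s in (Fraction(1, 4), half, 1, 2):
        print(f"d=1, s={s}: rate n^-{l2_rate_exponent(1, s)}, "
              f"spline ok={growth_ok(1, s, half)}, "
              f"poly ok={growth_ok(1, s, 1)}, "
              f"poly (Newey) ok={growth_ok(1, s, 1, newey=True)}")
```

Exact rational arithmetic is used so that boundary cases such as $s = d/2$ are classified correctly rather than being decided by floating-point rounding.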

4.2. Pointwise Limit Theory. Next we focus on pointwise limit theory (some authors refer to pointwise limit theory as local asymptotics; see Huang (2003b)). That is, we study the asymptotic behavior of $\sqrt n\,\alpha'(\hat\beta-\beta)$ and $\sqrt n(\hat g(x)-g(x))$ for particular $\alpha\in S^{k-1}$ and $x\in\mathcal X$. Here $S^{k-1}$ denotes the space of vectors $\alpha$ in $\mathbb R^k$ with unit Euclidean norm: $\|\alpha\| = 1$. Note that both $\alpha$ and $x$ implicitly depend on $n$. As we will show, pointwise results can be achieved under weak conditions similar to those we required in Theorem 4.1. The following lemma plays a key role in our asymptotic pointwise normality result.

Lemma 4.1 (Pointwise Linearization). Assume that Conditions A.1-A.3 are satisfied. In addition, assume that $\xi_k^2\log k/n\to 0$ and $\bar\sigma^2\lesssim 1$. Then for any $\alpha\in S^{k-1}$,
$$\sqrt n\,\alpha'(\hat\beta-\beta) = \alpha'\mathbb G_n[p_i(\epsilon_i+r_i)] + R_{1n}(\alpha), \tag{4.11}$$
where the term $R_{1n}(\alpha)$, summarizing the impact of unknown design, obeys
$$R_{1n}(\alpha)\lesssim_P\sqrt{\frac{\xi_k^2\log k}{n}}\,\big(1+\sqrt k\,\ell_kc_k\big). \tag{4.12}$$
Moreover,
$$\sqrt n\,\alpha'(\hat\beta-\beta) = \alpha'\mathbb G_n[p_i\epsilon_i] + R_{1n}(\alpha) + R_{2n}(\alpha), \tag{4.13}$$
where the term $R_{2n}(\alpha)$, summarizing the impact of the approximation error on the sampling error of the estimator, obeys
$$R_{2n}(\alpha)\lesssim_P\ell_kc_k. \tag{4.14}$$

Comment 4.2. (i) In summary, the only condition that generally matters for linearization (4.11)-(4.12) is that $R_{1n}(\alpha)\to 0$, which holds if $\xi_k^2\log k/n\to 0$ and $k\xi_k^2\ell_k^2c_k^2\log k/n\to 0$. In particular, linearization (4.11)-(4.12) allows for misspecification ($c_k\to 0$ is not required). In principle, linearization (4.13)-(4.14) also allows for misspecification, but the bounds are only useful if the model is correctly specified, so that $\ell_kc_k\to 0$. As in the theorem on the $L^2$ rate of convergence, our main condition is that $\xi_k^2\log k/n\to 0$. (ii) We conjecture that the bound on $R_{1n}(\alpha)$ can be improved for splines to
$$R_{1n}(\alpha)\lesssim_P\sqrt{\frac{\xi_k^2\log k}{n}}\,\big(1+\sqrt{\log k}\,\ell_kc_k\big), \tag{4.15}$$
since this bound is attained by local polynomials and splines are similarly localized.

With the help of Lemma 4.1, we derive our asymptotic pointwise normality result. We will use the following additional notation: $\Omega := Q^{-1}E[(\epsilon_i+r_i)^2p_ip_i']Q^{-1}$ and $\Omega_0 := Q^{-1}E[\epsilon_i^2p_ip_i']Q^{-1}$.
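Both $\Omega$ and $\Omega_0$ are sandwich matrices, and in practice they are replaced by sample analogues. The sketch below is a minimal illustration under assumptions: it uses the plug-in $\hat\Omega = \hat Q^{-1}\hat\Sigma\hat Q^{-1}$ with $\hat Q = \mathbb E_n[p_ip_i']$ and $\hat\Sigma = \mathbb E_n[\hat\epsilon_i^2p_ip_i']$ (the estimators studied in Theorem 4.6 below), applies no degrees-of-freedom correction, does not impose the normalization $Q = I$, and runs on a simulated design with a hypothetical regression function and a monomial basis. `fit_series` and `pointwise_ci` are illustrative names. The reported standard error is $\|\hat\Omega^{1/2}p(x)\|/\sqrt n$, the plug-in version of the normalization $\|s(x)\|/\sqrt n$ in Theorem 4.2 below; the interval is meaningful only when the approximation error at $x$ is negligible (undersmoothing).

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def fit_series(xs, ys, basis):
    """Least squares series fit; returns beta_hat, Q_hat, Sigma_hat, and n."""
    P = [basis(x) for x in xs]
    n, k = len(P), len(P[0])
    Q = [[sum(p[i] * p[j] for p in P) / n for j in range(k)] for i in range(k)]
    c = [sum(p[i] * y for p, y in zip(P, ys)) / n for i in range(k)]
    beta = solve(Q, c)
    res = [y - sum(b * v for b, v in zip(beta, p)) for p, y in zip(P, ys)]
    Sigma = [[sum(e * e * p[i] * p[j] for p, e in zip(P, res)) / n
              for j in range(k)] for i in range(k)]
    return beta, Q, Sigma, n


def pointwise_ci(x, fit, basis, z=1.96):
    """g_hat(x), its standard error sqrt(p(x)' Omega_hat p(x) / n), and a
    normal-approximation interval, with Omega_hat = Q^-1 Sigma Q^-1."""
    beta, Q, Sigma, n = fit
    p = basis(x)
    g_hat = sum(b * v for b, v in zip(beta, p))
    v = solve(Q, p)  # v = Q_hat^{-1} p(x), so p' Omega_hat p = v' Sigma_hat v
    k = len(v)
    var = sum(v[i] * Sigma[i][j] * v[j] for i in range(k) for j in range(k)) / n
    se = math.sqrt(var)
    return g_hat, se, (g_hat - z * se, g_hat + z * se)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1, 1) for _ in range(2000)]
    # Hypothetical design: g(x) = sin(2x) with heteroskedastic noise.
    ys = [math.sin(2 * x) + (0.2 + 0.2 * abs(x)) * rng.gauss(0, 1) for x in xs]
    basis = lambda x: [x ** j for j in range(6)]
    fit = fit_series(xs, ys, basis)
    print(pointwise_ci(0.3, fit, basis))
```

Solving $\hat Q v = p(x)$ instead of inverting $\hat Q$ keeps the computation of $p(x)'\hat\Omega p(x) = v'\hat\Sigma v$ to a single linear solve.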

In the theorem below, we will impose the condition that $\sup_{x\in\mathcal X}E[\epsilon_i^2 1\{|\epsilon_i|>M\}\mid x_i = x]\to 0$ as $M\to\infty$ uniformly over $n$. This is a mild uniform integrability condition. Specifically, it holds if $\sup_{x\in\mathcal X}E[|\epsilon_i|^m\mid x_i = x]\lesssim 1$ for some $m>2$. In addition, we will impose the condition that $1\lesssim\underline\sigma^2$. This condition is used to properly normalize the estimator.

Theorem 4.2 (Pointwise Normality). Assume that Conditions A.1-A.3 are satisfied. In addition, assume that (i) $\sup_{x\in\mathcal X}E[\epsilon_i^2 1\{|\epsilon_i|>M\}\mid x_i = x]\to 0$ as $M\to\infty$ uniformly over $n$, (ii) $1\lesssim\underline\sigma^2$, and (iii) $(\xi_k^2\log k/n)^{1/2}(1+k^{1/2}\ell_kc_k)\to 0$. Then for any $\alpha\in S^{k-1}$,
$$\frac{\sqrt n\,\alpha'(\hat\beta-\beta)}{\|\alpha'\bar\Omega^{1/2}\|} =_d N(0,1) + o_P(1), \tag{4.16}$$
where we set $\bar\Omega = \Omega$, but if $R_{2n}(\alpha)\to_P 0$, then we can set $\bar\Omega = \Omega_0$. Moreover, for any $x\in\mathcal X$ and $s(x) := \bar\Omega^{1/2}p(x)$,
$$\frac{\sqrt n\,p(x)'(\hat\beta-\beta)}{\|s(x)\|} =_d N(0,1) + o_P(1), \tag{4.17}$$
and if the approximation error is negligible relative to the estimation error, namely $\sqrt n\,r(x) = o(\|s(x)\|)$, then
$$\frac{\sqrt n(\hat g(x)-g(x))}{\|s(x)\|} =_d N(0,1) + o_P(1). \tag{4.18}$$

Comment 4.3. (i) This is our second main result in this paper. The result delivers pointwise convergence in distribution for any sequences $\alpha = \alpha_n$ and $x = x_n$ with $\alpha_n\in S^{k-1}$ and $x_n\in\mathcal X$. In fact, the proof of the theorem implies that the convergence is uniform over all such sequences. Note that the normalization factor $\|s(x)\|$ is the pointwise standard error, and it is of typical order $\|s(x)\|\propto\sqrt k$ at most points. In this case the condition for negligibility of the approximation error, $\sqrt n\,r(x)/\|s(x)\|\to 0$, which can be understood as an undersmoothing condition, can be replaced by $\sqrt{n/k}\,\ell_kc_k\to 0$. When $\ell_kc_k\lesssim k^{-s/d}\log k$, which is often the case if $\mathcal G$ is contained in a ball in $\Sigma_s(\mathcal X)$ of finite radius (see our examples in the previous section), this condition substantially weakens an assumption in Newey (1997), who required $\sqrt n\,k^{-s/d}\to 0$ in a similar set-up.

(ii) When applied to splines, our result is somewhat less sharp than that of Huang (2003b).
Specifically, Huang required that $\xi_k^2\log k/n\to 0$ and $(n/k)^{1/2}\ell_kc_k\to 0$, whereas we require $(k\xi_k^2\log k/n)^{1/2}\ell_kc_k\to 0$ in addition to Huang's conditions (see condition (iii) of the theorem). The difference can likely be explained by the fact that we use the linearization bound (4.12), whereas for splines it is likely that (4.15) holds as well.

(iii) More generally, our asymptotic pointwise normality result, as well as other related results in this paper, applies to any problem where the estimator of $g(x) = p(x)'\beta + r(x)$ takes the form $p(x)'\hat\beta$, where $\hat\beta$ admits a linearization of the form (4.11)-(4.14).

4.3. Uniform Limit Theory. Finally, we turn to uniform limit theory. Not surprisingly, stronger conditions are required for our results to hold when compared to the pointwise case. Let $m>2$. We will need the following assumption on the tails of the regression errors.

Condition A.4 (Disturbances). Regression errors satisfy $\sup_{x\in\mathcal X}E[|\epsilon_i|^m\mid x_i = x]\lesssim 1$.

It will be convenient to denote $\alpha(x) := p(x)/\|p(x)\|$ in this subsection. Moreover, denote
$$\xi_k^L := \sup_{x,x'\in\mathcal X:\,x\ne x'}\frac{\|\alpha(x)-\alpha(x')\|}{\|x-x'\|}.$$
We will also need the following assumption on the basis functions to hold with the same $m>2$ as in Condition A.4.

Condition A.5 (Basis). Basis functions are such that (i) $\xi_k^{2m/(m-2)}\log k/n\lesssim 1$, (ii) $\log\xi_k^L\lesssim\log k$, and (iii) $\log\xi_k\lesssim\log k$.

The following lemma provides a uniform linearization of the series estimator and plays a key role in our derivation of the uniform rate of convergence.

Lemma 4.2 (Uniform Linearization). Assume that Conditions A.1-A.5 are satisfied. Then
$$\sqrt n\,\alpha(x)'(\hat\beta-\beta) = \alpha(x)'\mathbb G_n[p_i(\epsilon_i+r_i)] + R_{1n}(\alpha(x)), \tag{4.19}$$
where $R_{1n}(\alpha(x))$, summarizing the impact of unknown design, obeys
$$R_{1n}(\alpha(x))\lesssim_P\sqrt{\frac{\xi_k^2\log k}{n}}\,\big(n^{1/m}\sqrt{\log k}+\sqrt k\,\ell_kc_k\big) =: \bar R_{1n} \tag{4.20}$$
uniformly over $x\in\mathcal X$. Moreover,
$$\sqrt n\,\alpha(x)'(\hat\beta-\beta) = \alpha(x)'\mathbb G_n[p_i\epsilon_i] + R_{1n}(\alpha(x)) + R_{2n}(\alpha(x)), \tag{4.21}$$
where $R_{2n}(\alpha(x))$, summarizing the impact of the approximation error on the sampling error of the estimator, obeys
$$R_{2n}(\alpha(x))\lesssim_P\sqrt{\log k}\,\ell_kc_k =: \bar R_{2n} \tag{4.22}$$
uniformly over $x\in\mathcal X$.
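Conditions A.5 and Comment 4.1 are stated in terms of $\xi_k = \sup_{x\in\mathcal X}\|p(x)\|$. Under the normalization $Q = E[p_ip_i'] = I$, two textbook cases make the contrast between $\xi_k\asymp k$ and $\xi_k\asymp\sqrt k$ concrete. The sketch below assumes $x$ uniform on $[-1,1]$ (density $1/2$), an illustrative choice, and uses hypothetical helper names. For orthonormal Legendre polynomials $\varphi_j = \sqrt{2j+1}\,P_j$, $\|p(x)\|^2 = \sum_{j<k}(2j+1)P_j(x)^2$, which equals $k^2$ at $x = \pm 1$, so $\xi_k = k$. For the piecewise-constant basis on $k$ equal cells (a degree-0 local polynomial partition), exactly one scaled indicator $\sqrt k\,1\{x\in\text{cell}\}$ is active at each $x$, so $\|p(x)\| = \sqrt k$ everywhere. That basis serves only to illustrate $\xi_k$: it is discontinuous, so $\alpha(x)$ jumps between cells and $\xi_k^L$ is infinite for it.

```python
import math


def legendre_orthonormal(k):
    """Basis x -> (sqrt(2j+1) * P_j(x)) for j < k; orthonormal when x ~ Uniform[-1, 1]."""
    def p(x):
        vals = [1.0, x][:k]
        for j in range(1, k - 1):
            # Bonnet recursion: (j + 1) P_{j+1} = (2j + 1) x P_j - j P_{j-1}
            vals.append(((2 * j + 1) * x * vals[j] - j * vals[j - 1]) / (j + 1))
        return [math.sqrt(2 * j + 1) * v for j, v in enumerate(vals)]
    return p


def piecewise_constant(k):
    """Basis x -> (sqrt(k) * 1{x in cell j}) for j < k, using k equal cells of [-1, 1]."""
    def p(x):
        j = min(int((x + 1) / 2 * k), k - 1)  # the right endpoint x = 1 joins the last cell
        return [math.sqrt(k) if i == j else 0.0 for i in range(k)]
    return p


def xi(basis, grid_size=2001):
    """Approximate xi_k = sup_x ||p(x)|| by the maximum over an equispaced grid on [-1, 1]."""
    grid = [-1 + 2 * i / (grid_size - 1) for i in range(grid_size)]
    return max(math.sqrt(sum(v * v for v in basis(x))) for x in grid)


if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        print(k, round(xi(legendre_orthonormal(k)), 6), round(xi(piecewise_constant(k)), 6))
```

Because $|P_j(x)|\le 1$ with equality at $x = \pm 1$, the grid maximum for the Legendre basis is attained at the endpoints, which are included in the grid.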

Comment 4.4. As in the case of pointwise linearization, our results on uniform linearization (4.19)-(4.20) allow for misspecification ($c_k\to 0$ is not required). In principle, linearization (4.21)-(4.22) also allows for misspecification, but the bounds are most useful if the model is correctly specified, so that $(\log k)^{1/2}\ell_kc_k\to 0$. We are not aware of any similar uniform linearization result in the literature. We believe that this result is useful in a variety of problems. Below we use this result to derive a good uniform rate of convergence of the series estimator. Another application of this result would be testing shape restrictions in the nonparametric model.

The following theorem provides the uniform rate of convergence of the series estimator.

Theorem 4.3 (Uniform Rate of Convergence). Assume that Conditions A.1-A.5 are satisfied. Then
$$\sup_{x\in\mathcal X}|\alpha(x)'\mathbb G_n[p_i\epsilon_i]|\lesssim_P\sqrt{\log k}. \tag{4.23}$$
Moreover, for $\bar R_{1n}$ and $\bar R_{2n}$ given above we have
$$\sup_{x\in\mathcal X}|p(x)'(\hat\beta-\beta)|\lesssim_P\frac{\xi_k}{\sqrt n}\big(\sqrt{\log k}+\bar R_{1n}+\bar R_{2n}\big) \tag{4.24}$$
and
$$\sup_{x\in\mathcal X}|\hat g(x)-g(x)|\lesssim_P\frac{\xi_k}{\sqrt n}\big(\sqrt{\log k}+\bar R_{1n}+\bar R_{2n}\big)+\ell_kc_k. \tag{4.25}$$

Comment 4.5. This is our third main result in this paper. Assume that $\mathcal G$ is a ball in $\Sigma_s(\mathcal X)$ of finite radius, $\ell_kc_k\lesssim k^{-s/d}$, $\xi_k\lesssim\sqrt k$, and $\bar R_{1n}+\bar R_{2n}\lesssim(\log k)^{1/2}$. Then the bound in (4.25) becomes
$$\sup_{x\in\mathcal X}|\hat g(x)-g(x)|\lesssim_P\sqrt{\frac{k\log k}{n}}+k^{-s/d}.$$
Therefore, setting $k\asymp(n/\log n)^{d/(2s+d)}$, we obtain
$$\sup_{x\in\mathcal X}|\hat g(x)-g(x)|\lesssim_P\Big(\frac{\log n}{n}\Big)^{s/(2s+d)},$$
which is the optimal uniform rate of convergence in the function class $\Sigma_s(\mathcal X)$; see Stone (1982). To the best of our knowledge, our paper is the first to show that the series estimator attains the optimal uniform rate of convergence under these rather general conditions; see the next comment. We also note here that it has been known for a long time that a local polynomial (kernel) estimator achieves the same optimal uniform rate of convergence (see, for example, Tsybakov (2009)), and it was also shown recently by Cattaneo and Farrell (2013) that the local polynomial partition series estimator achieves the same rate. Recently, in an effort to relax the independence assumption, the working paper Chen and Christensen (2013), which appeared in ArXiv in 2013, approximately 1 year after our paper was posted to ArXiv and submitted for publication,² derived a similar uniform rate of convergence result allowing for β-mixing conditions; see their Theorem 4.1 for specific conditions.

Comment 4.6. Primitive conditions leading to the inequalities $\ell_kc_k\lesssim k^{-s/d}$ and $\xi_k\lesssim\sqrt k$ are discussed in the previous section. Also, under the assumption that $\ell_kc_k\lesssim k^{-s/d}$, the inequality $\bar R_{2n}\lesssim(\log k)^{1/2}$ follows automatically from the definition of $\bar R_{2n}$. Thus, one of the critical conditions to attain the optimal uniform rate of convergence is that we require $\bar R_{1n}\lesssim(\log k)^{1/2}$. Under our other assumptions, this condition holds if $k\log k/n^{1-2/m}\lesssim 1$ and $k^{2-2s/d}/n\lesssim 1$, and so we can set $k\asymp(n/\log n)^{d/(2s+d)}$ if $d/(2s+d)<1-2/m$ and $(2d-2s)/(2s+d)<1$ or, equivalently, $m>2+d/s$ and $s/d>1/4$.

After establishing the auxiliary results on the uniform rate of convergence, we present two results on inference based on the series estimator. The first result on inference is concerned with the strong approximation of a series process by a Gaussian process and is a (relatively) minor extension of the result obtained by Chernozhukov et al. (2013). The extension is undertaken to allow for a non-vanishing specification error, so as to cover misspecified models. In particular, we make a distinction between $\Omega = Q^{-1}E[(\epsilon_i+r_i)^2p_ip_i']Q^{-1}$ and $\Omega_0 = Q^{-1}E[\epsilon_i^2p_ip_i']Q^{-1}$, which are potentially asymptotically different if $\bar R_{2n}\not\to_P 0$. To state the result, let $a_n$ be some sequence of positive numbers satisfying $a_n\to\infty$.

Theorem 4.4 (Strong Approximation by a Gaussian Process). Assume that Conditions A.1-A.5 are satisfied with $m\ge 3$. In addition, assume that (i) $\bar R_{1n} = o_P(a_n^{-1})$, (ii) $1\lesssim\underline\sigma^2$, and (iii) $a_n^6k^4\xi_k^2(1+\ell_k^3c_k^3)^2\log^2n/n\to 0$.
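Then for some $N_k\sim N(0,I_k)$,
$$\frac{\sqrt n\,\alpha(x)'(\hat\beta-\beta)}{\|\alpha(x)'\bar\Omega^{1/2}\|} =_d \frac{\alpha(x)'\bar\Omega^{1/2}}{\|\alpha(x)'\bar\Omega^{1/2}\|}N_k + o_P(a_n^{-1})\ \text{in }\ell^\infty(\mathcal X), \tag{4.26}$$
so that for $s(x) = \bar\Omega^{1/2}p(x)$,
$$\frac{\sqrt n\,p(x)'(\hat\beta-\beta)}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k + o_P(a_n^{-1})\ \text{in }\ell^\infty(\mathcal X), \tag{4.27}$$
and if $\sup_{x\in\mathcal X}\sqrt n\,|r(x)|/\|s(x)\| = o(a_n^{-1})$, then
$$\frac{\sqrt n(\hat g(x)-g(x))}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k + o_P(a_n^{-1})\ \text{in }\ell^\infty(\mathcal X), \tag{4.28}$$
where we set $\bar\Omega = \Omega$, but if $\bar R_{2n} = o_P(a_n^{-1})$, then we can set $\bar\Omega = \Omega_0$.

² Our paper was submitted for publication and to ArXiv on December 3, … Our result as stated here has not changed since the original submission.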
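One practical reading of Theorem 4.4 is that critical values for uniform confidence bands can be taken from the distribution of $\sup_x|G_k(x)|$, $G_k(x) = \alpha(x)'\bar\Omega^{1/2}N_k/\|\alpha(x)'\bar\Omega^{1/2}\|$, which depends on the data only through $\bar\Omega$. The sketch below is a Monte Carlo illustration under assumptions: the covariance matrix passed in (in practice an estimate such as $\hat\Omega$ from Theorem 4.6) and the finite grid standing in for $\mathcal X$ are user-supplied, and `sup_t_critical_value` and `cholesky` are hypothetical helpers. It relies on the fact that if $\bar\Omega = LL'$ (Cholesky), then $p(x)'LN_k/\sqrt{p(x)'\bar\Omega p(x)}$ has the same joint law over $x$ as $G_k(x)$, since both are centered Gaussian processes with the same covariance function; the scaling by $\|p(x)\|$ in $\alpha(x)$ cancels. A band $\hat g(x)\pm c\,\|\hat\Omega^{1/2}p(x)\|/\sqrt n$ built from the returned $c$ inherits the theorem's conditions, including undersmoothing.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L' = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sup_t_critical_value(omega, basis, grid, level=0.95, draws=5000, seed=0):
    """Monte Carlo `level`-quantile of sup_{x in grid} |G_k(x)|, where
    G_k(x) = p(x)' L N_k / sqrt(p(x)' omega p(x)), omega = L L', N_k ~ N(0, I_k)."""
    L = cholesky(omega)
    k = len(L)
    rows = []
    for x in grid:
        p = basis(x)
        pl = [sum(p[i] * L[i][j] for i in range(k)) for j in range(k)]  # row vector p(x)'L
        norm = math.sqrt(sum(v * v for v in pl))  # equals sqrt(p(x)' omega p(x))
        rows.append([v / norm for v in pl])
    rng = random.Random(seed)
    sups = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sups.append(max(abs(sum(r[j] * z[j] for j in range(k))) for r in rows))
    sups.sort()
    return sups[min(math.ceil(level * draws) - 1, draws - 1)]


if __name__ == "__main__":
    cubic = lambda x: [1.0, x, x * x, x ** 3]
    # Placeholder covariance; in practice plug in an estimate such as Omega_hat.
    omega = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    grid = [-1 + i / 20 for i in range(41)]
    print(sup_t_critical_value(omega, cubic, grid))
```

With $k = 1$ the process is constant in $x$, so the routine reduces to the 95% quantile of $|N(0,1)|$, about 1.96; for $k>1$ the supremum over the grid yields a larger, band-wide critical value.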

Comment 4.7. One might hope to have a result of the form
$$\frac{\sqrt n(\hat g(x)-g(x))}{\|s(x)\|}\to_d G(x)\ \text{in }\ell^\infty(\mathcal X), \tag{4.29}$$
where $\{G(x): x\in\mathcal X\}$ is some fixed zero-mean Gaussian process. However, one can show that the process on the left-hand side of (4.29) is not asymptotically equicontinuous, and so it does not have a limit distribution. Instead, Theorem 4.4 provides an approximation of the series process by a sequence of zero-mean Gaussian processes $\{G_k(x): x\in\mathcal X\}$,
$$G_k(x) := \frac{\alpha(x)'\bar\Omega^{1/2}}{\|\alpha(x)'\bar\Omega^{1/2}\|}N_k,$$
with a stochastic error of size $o_P(a_n^{-1})$. Since $a_n\to\infty$, under our conditions the theorem implies that the series process is well approximated by a Gaussian process, and so the theorem can be interpreted as saying that in large samples the distribution of the series process depends on the distribution of the data only via the covariance matrix $\bar\Omega$; hence, it allows us to perform inference based on the whole series process. Note that the conditions of the theorem are quite strong in terms of growth requirements on $k$, but the result of the theorem is also much stronger than the pointwise normality result: it asserts that the entire series process is uniformly close to a Gaussian process of the stated form.

Our result on the strong approximation by a Gaussian process plays an important role in our second result on inference, which is concerned with the weighted bootstrap. Consider a set of weights $h_1,\dots,h_n$ that are i.i.d. draws from the standard exponential distribution and are independent of the data. For each draw of such weights, define the weighted bootstrap draw of the least squares estimator as a solution to the least squares problem weighted by $h_1,\dots,h_n$, namely
$$\hat\beta^b\in\arg\min_{b\in\mathbb R^k}\mathbb E_n[h_i(y_i-p_i'b)^2]. \tag{4.30}$$
For all $x\in\mathcal X$, denote $\hat g^b(x) = p(x)'\hat\beta^b$. The following theorem establishes a new result stating that the weighted bootstrap distribution is valid for approximating the distribution of the series process.

Theorem 4.5 (Weighted Bootstrap Method). (1) Assume that Conditions A.1-A.5 are satisfied.
In addition, assume that $(\xi_k(\log n)^{1/2})^{2m/(m-2)}/n\lesssim 1$. Then the weighted bootstrap process satisfies
$$\sqrt n\,\alpha(x)'(\hat\beta^b-\hat\beta) = \alpha(x)'\mathbb G_n[(h_i-1)p_i(\epsilon_i+r_i)] + R^b_{1n}(\alpha(x)),$$
where $R^b_{1n}(\alpha(x))$ obeys
$$R^b_{1n}(\alpha(x))\lesssim_P\sqrt{\frac{\xi_k^2\log^3n}{n}}\,\big(n^{1/m}\sqrt{\log n}+\sqrt k\,\ell_kc_k\big) =: \bar R^b_{1n} \tag{4.31}$$
uniformly over $x\in\mathcal X$. (2) If, in addition, Conditions A.4 and A.5 are satisfied with $m\ge 3$ and (i) $\bar R^b_{1n} = o_P(a_n^{-1})$, (ii) $1\lesssim\underline\sigma^2$, and (iii) $a_n^6k^4\xi_k^2(1+\ell_k^3c_k^3)^2\log^2n/n\to 0$ hold, then for $s(x) = \bar\Omega^{1/2}p(x)$ and some $N_k\sim N(0,I_k)$,
$$\frac{\sqrt n\,p(x)'(\hat\beta^b-\hat\beta)}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k + o_P(a_n^{-1})\ \text{in }\ell^\infty(\mathcal X), \tag{4.32}$$
and so
$$\frac{\sqrt n(\hat g^b(x)-\hat g(x))}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k + o_P(a_n^{-1})\ \text{in }\ell^\infty(\mathcal X), \tag{4.33}$$
where we set $\bar\Omega = \Omega$, but if $\bar R_{2n} = o_P(a_n^{-1})$, then we can set $\bar\Omega = \Omega_0$. (3) Moreover, the bounds (4.31), (4.32), and (4.33) continue to hold in P-probability if we replace the unconditional probability $P$ by the conditional probability computed given the data, namely if we replace $P$ by $P(\cdot\mid\mathcal D)$, where $\mathcal D = \{(x_i,y_i): i = 1,\dots,n\}$.

Comment 4.8. (i) This is our fourth main, and new, result in this paper. The theorem implies that the weighted bootstrap process can be approximated by a copy of the same Gaussian process as that used to approximate the original series process. (ii) We emphasize that the theorem does not require correct specification; that is, the case $c_k\not\to 0$ is allowed. Also, in this theorem, the symbol $P$ refers to the joint probability measure with respect to the data $\mathcal D = \{(x_i,y_i): i = 1,\dots,n\}$ and the set of bootstrap weights $\{h_i: i = 1,\dots,n\}$.

We close this section by establishing sufficient conditions for consistent estimation of $\Omega$. Recall that $Q = E[p_ip_i'] = I$. In addition, denote $\Sigma = E[(\epsilon_i+r_i)^2p_ip_i']$, $\hat Q = \mathbb E_n[p_ip_i']$, and $\hat\Sigma = \mathbb E_n[\hat\epsilon_i^2p_ip_i']$, where $\hat\epsilon_i = y_i - p_i'\hat\beta$, and let $v_n = (E[\max_{1\le i\le n}|\epsilon_i|^2])^{1/2}$.

Theorem 4.6 (Matrices Estimation). Assume that Conditions A.1-A.5 are satisfied. In addition, assume that $\bar R_{1n}+\bar R_{2n}\lesssim(\log k)^{1/2}$. Then
$$\|\hat Q - Q\|\lesssim_P\sqrt{\frac{\xi_k^2\log k}{n}} = o(1)\quad\text{and}\quad\|\hat\Sigma-\Sigma\|\lesssim_P(v_n\vee 1+\ell_kc_k)\sqrt{\frac{\xi_k^2\log k}{n}} = o(1).$$

Moreover, for $\hat\Omega = \hat Q^{-1}\hat\Sigma\hat Q^{-1}$ and $\Omega = Q^{-1}\Sigma Q^{-1}$,
$$\|\hat\Omega-\Omega\|\lesssim_P(v_n\vee 1+\ell_kc_k)\sqrt{\frac{\xi_k^2\log k}{n}} = o(1).$$

Comment 4.9. Theorem 4.6 allows for consistent estimation of the matrix $Q$ under the mild condition $\xi_k^2\log k/n\to 0$, and for consistent estimation of the matrices $\Sigma$ and $\Omega$ under somewhat more restrictive conditions. Not surprisingly, the estimation of $\Sigma$ and $\Omega$ depends on the tail behavior of the error term via the value of $v_n$. Note that under Condition A.4, we have $v_n\lesssim n^{1/m}$.

5. Rates and Inference on Linear Functionals

In this section, we derive rates and inference results for linear functionals $\theta(w)$, $w\in\mathcal I$, of the conditional expectation function, such as its derivative, average derivative, or conditional average derivative. To a large extent, with the exception of Theorem 5.6, the results presented in this section can be considered as an extension of the results presented in Section 4, and so comments similar to those given in Section 4 apply. Theorem 5.6 deals with the construction of uniform confidence bands for linear functionals under weak conditions and is a new result.

By the linearity of the series approximations, the linear functionals can be seen as linear functions of the least squares coefficients $\beta$ up to an approximation error, that is,
$$\theta(w) = \ell_\theta(w)'\beta + r_\theta(w),\quad w\in\mathcal I,$$
where $\ell_\theta(w)'\beta$ is the series approximation, with $\ell_\theta(w)$ denoting the $k$-vector of loadings on the coefficients, and $r_\theta(w)$ is the remainder term, which corresponds to the approximation error. Throughout this section, we assume that $\mathcal I$ is a subset of some Euclidean space $\mathbb R^l$ equipped with its usual norm. We allow $\mathcal I = \mathcal I_n$ to depend on $n$, but for simplicity we assume that the diameter of $\mathcal I$ is bounded from above uniformly over $n$. Results allowing for the case where $\mathcal I$ expands as $n$ grows can be covered as well, with slightly more technicalities.

In order to perform inference, we construct estimators of $\sigma_\theta^2(w) = \ell_\theta(w)'\Omega\,\ell_\theta(w)/n$, the variance of the associated linear functionals, as
$$\hat\sigma_\theta^2(w) = \ell_\theta(w)'\hat\Omega\,\ell_\theta(w)/n. \tag{5.34}$$
In what follows, it will be convenient to have the following result on the consistency of $\hat\sigma_\theta(w)$:


More information

An Introduction to Asymptotic Theory

An Introduction to Asymptotic Theory A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece

More information

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP

Entropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.

More information

Week 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed

Week 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed Wee 0 A Itroductio to Wavelet regressio. De itio: Wavelet is a fuctio such that f j= j ; j; Zg is a orthoormal basis for L (R). This fuctio is called mother wavelet, which ca be ofte costructed from father

More information

MATHEMATICAL SCIENCES PAPER-II

MATHEMATICAL SCIENCES PAPER-II MATHEMATICAL SCIENCES PAPER-II. Let {x } ad {y } be two sequeces of real umbers. Prove or disprove each of the statemets :. If {x y } coverges, ad if {y } is coverget, the {x } is coverget.. {x + y } coverges

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

A Weak Law of Large Numbers Under Weak Mixing

A Weak Law of Large Numbers Under Weak Mixing A Weak Law of Large Numbers Uder Weak Mixig Bruce E. Hase Uiversity of Wiscosi Jauary 209 Abstract This paper presets a ew weak law of large umbers (WLLN) for heterogeous depedet processes ad arrays. The

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information

Law of the sum of Bernoulli random variables

Law of the sum of Bernoulli random variables Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

ECE 901 Lecture 13: Maximum Likelihood Estimation

ECE 901 Lecture 13: Maximum Likelihood Estimation ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Amog the most useful of the various geeralizatios of KolmogorovâĂŹs strog law of large umbers are the ergodic theorems of Birkhoff ad Kigma, which exted the validity of the

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

Math 525: Lecture 5. January 18, 2018

Math 525: Lecture 5. January 18, 2018 Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the

More information

1 Approximating Integrals using Taylor Polynomials

1 Approximating Integrals using Taylor Polynomials Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Marcinkiwiecz-Zygmund Type Inequalities for all Arcs of the Circle

Marcinkiwiecz-Zygmund Type Inequalities for all Arcs of the Circle Marcikiwiecz-ygmud Type Iequalities for all Arcs of the Circle C.K. Kobidarajah ad D. S. Lubisky Mathematics Departmet, Easter Uiversity, Chekalady, Sri Laka; Mathematics Departmet, Georgia Istitute of

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

Solution to Chapter 2 Analytical Exercises

Solution to Chapter 2 Analytical Exercises Nov. 25, 23, Revised Dec. 27, 23 Hayashi Ecoometrics Solutio to Chapter 2 Aalytical Exercises. For ay ε >, So, plim z =. O the other had, which meas that lim E(z =. 2. As show i the hit, Prob( z > ε =

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics ad Egieerig Lecture otes Sergei V. Shabaov Departmet of Mathematics, Uiversity of Florida, Gaiesville, FL 326 USA CHAPTER The theory of covergece. Numerical sequeces..

More information

Asymptotic Results for the Linear Regression Model

Asymptotic Results for the Linear Regression Model Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

1 Covariance Estimation

1 Covariance Estimation Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.

More information

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15 17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig

More information

Lecture 19. sup y 1,..., yn B d n

Lecture 19. sup y 1,..., yn B d n STAT 06A: Polyomials of adom Variables Lecture date: Nov Lecture 19 Grothedieck s Iequality Scribe: Be Hough The scribes are based o a guest lecture by ya O Doell. I this lecture we prove Grothedieck s

More information

On Binscatter Supplemental Appendix

On Binscatter Supplemental Appendix O Biscatter Supplemetal Appedix Matias D. Cattaeo Richard K. Crump Max H. Farrell Yigjie Feg February 12, 2019 Abstract This plemet collects all techical proofs, more geeral theoretical results tha those

More information

Diagonal approximations by martingales

Diagonal approximations by martingales Alea 7, 257 276 200 Diagoal approximatios by martigales Jaa Klicarová ad Dalibor Volý Faculty of Ecoomics, Uiversity of South Bohemia, Studetsa 3, 370 05, Cese Budejovice, Czech Republic E-mail address:

More information

Lecture 8: Convergence of transformations and law of large numbers

Lecture 8: Convergence of transformations and law of large numbers Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information