SOME NEW ASYMPTOTIC THEORY FOR LEAST SQUARES SERIES: POINTWISE AND UNIFORM RESULTS

ALEXANDRE BELLONI, VICTOR CHERNOZHUKOV, DENIS CHETVERIKOV, AND KENGO KATO

Abstract. In econometric applications it is common that the exact form of a conditional expectation is unknown and having flexible functional forms can lead to improvements over a pre-specified functional form, especially if they nest some successful parametric economically-motivated forms. Series method offers exactly that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$ to balance the trade off between variance and bias. In this work we consider series estimators for the conditional mean in light of four new ingredients: (i) sharp LLNs for matrices derived from the non-commutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L^2$-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge at some rate, and (iv) strong approximations to series-type processes. These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken considerably the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ to $k/n \to 0$, up to log factors, which was available only for spline series before. Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorems results when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold if the approximation error vanishes or not. That is, we derive the strong approximation for the entire estimate of the nonparametric function.
Finally and most importantly, from a point of view of practice, we derive uniform rates, Gaussian approximations, and uniform confidence bands for a wide collection of linear functionals of the conditional expectation function, for example, the function itself, the partial derivative function, the conditional average partial derivative function, and other similar quantities. All of these results are new.

Date: First version: May 2006, This version is of January 7, 2015. Submitted to ArXiv and for publication: December 3, 2012.
JEL Classification: C01, C14.
Key words and phrases. least squares series, strong approximations, uniform confidence bands.

1. Introduction

Series estimators have been playing a central role in various fields. In econometric applications it is common that the exact form of a conditional expectation is unknown and having a flexible functional form can lead to improvements over a pre-specified functional form, especially if it nests some successful parametric economic models. Series estimation offers exactly that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$ to balance the trade off between variance and bias. Moreover, the series modelling allows for convenient nesting of some theory-based models, by simply using corresponding terms as the first $k_0 \le k$ basis functions. For instance, our series could contain linear and quadratic functions to nest the canonical Mincer equations in the context of wage equation modelling or the canonical translog demand and production functions in the context of demand and supply modelling.

Several asymptotic properties of series estimators have been investigated in the literature. The focus has been on convergence rates and asymptotic normality results (see van de Geer, 1990; Andrews, 1991; Eastwood and Gallant, 1991; Gallant and Souza, 1991; Newey, 1997; van de Geer, 2002; Huang, 2003b; Chen, 2007; Cattaneo and Farrell, 2013, and the references therein).

This work revisits the topic by making use of new critical ingredients:

1. The sharp LLNs for matrices derived from the non-commutative Khinchin inequalities.
2. The sharp bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L^2$-norms of the least squares approximation of functions (which is bounded or grows like $\log k$ in many cases).
3. Sharp maximal inequalities for processes whose entropy integrals diverge at some rate.
4. Strong approximations to empirical processes of series types.

To the best of our knowledge, our results are the first applications of the first ingredient to statistical estimation problems.
After the use in this work, some recent working papers are also using related matrix inequalities and extending some results in different directions, e.g. Chen and Christensen (2013) allows β-mixing dependence, and Hansen (2014) handles unbounded regressors and also characterizes a trade-off between the number of finite moments and the allowable rate of expansion of the number of series terms. Regarding the second ingredient, it has already been used by Huang (2003a) but for splines only. All of these ingredients are critical for generating sharp results.

This approach allows us to contribute to the series literature in several directions. First, we weaken considerably the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ (see Newey, 1997) to $k/n \to 0$ (up to logs) for bounded or local bases which was previously available only for spline series (Huang, 2003a; Stone, 1994), and recently established for local polynomial partition series (Cattaneo and Farrell, 2013). An example of a bounded basis is Fourier series; examples of local bases are spline, wavelet, and local polynomial partition series. To be more specific, for such bases we require $k \log k / n \to 0$. Note that the last condition is similar to the condition on the bandwidth value required for local polynomial (kernel) regression estimators ($h^{-d} \log(1/h)/n \to 0$ where $h = 1/k^{1/d}$ is the bandwidth value).

Second, under the same weak conditions we derive $L^2$ rates and pointwise central limit theorems results when the approximation error vanishes. Under a misspecified model, i.e. when the approximation error does not vanish, analogous results are also shown.

Third, under stronger conditions we derive uniform rates that hold if the approximation error vanishes or not. An important contribution here is that we show that the series estimator achieves the optimal uniform rate of convergence under quite general conditions. Previously, the same result was shown only for local polynomial partition series estimator (Cattaneo and Farrell, 2013). In addition, we derive a functional central limit theorem. By the functional central limit theorem we mean here that the entire estimate of the nonparametric function is uniformly close to a Gaussian process that can change with $n$. That is, we derive the strong approximation for the entire estimate of the nonparametric function.
Perhaps the most important contribution of the paper is a set of completely new results that provide estimation and inference methods for the entire linear functionals $\theta(\cdot)$ of the conditional mean function $g : \mathcal{X} \to \mathbb{R}$. Examples of linear functionals $\theta(\cdot)$ of interest include

1. the partial derivative function: $x \mapsto \theta(x) = \partial_j g(x)$;
2. the average partial derivative: $\theta = \int \partial_j g(x)\, d\mu(x)$;
3. the conditional average partial derivative: $x_s \mapsto \theta(x_s) = \int \partial_j g(x)\, d\mu(x \mid x_s)$,

where $\partial_j g(x)$ denotes the partial derivative of $g(x)$ with respect to the $j$th component of $x$, $x_s$ is a subvector of $x$, and the measure $\mu$ entering the definitions above is taken as known; the result can be extended to include estimated measures. We derive uniform (in $x$) rates of convergence, large sample distributional approximations, and inference methods for the functions above based on the Gaussian approximation. To the best of our knowledge all these results are new, especially the distributional and inferential results. For example, using these results we can now perform inference on the entire partial derivative function. The only other reference that provides analogous results but for quantile series estimator is Belloni et al. (2011). Before doing uniform analysis, we also update the pointwise results of Newey (1997) to weaker, more general conditions.

Notation. In what follows, all parameter values are indexed by the sample size $n$, but we omit the index whenever this does not cause confusion. We use the notation $(a)_+ = \max\{a,0\}$, $a \vee b = \max\{a,b\}$ and $a \wedge b = \min\{a,b\}$. The $\ell^2$-norm of a vector $v$ is denoted by $\|v\|$, while for a matrix $Q$ the operator norm is denoted by $\|Q\|$. We also use standard notation in the empirical process literature,

$$E_n[f] = E_n[f(w_i)] = \frac{1}{n}\sum_{i=1}^n f(w_i) \quad\text{and}\quad G_n[f] = G_n[f(w_i)] = \frac{1}{\sqrt{n}}\sum_{i=1}^n \big(f(w_i) - E[f(w_i)]\big),$$

and we use the notation $a \lesssim b$ to denote $a \le cb$ for some constant $c > 0$ that does not depend on $n$; and $a \lesssim_P b$ to denote $a = O_P(b)$. Moreover, for two random variables $X, Y$ we say that $X =_d Y$ if they have the same probability distribution. Finally, $S^{k-1}$ denotes the space of vectors $\alpha$ in $\mathbb{R}^k$ with unit Euclidean norm: $\|\alpha\| = 1$.

2. Set-Up

Throughout the paper, we consider a sequence of models, indexed by the sample size $n$,

$$y_i = g(x_i) + \epsilon_i, \quad E[\epsilon_i \mid x_i] = 0, \quad x_i \in \mathcal{X} \subseteq \mathbb{R}^d, \quad i = 1,\dots,n, \qquad (2.1)$$

where $y_i$ is a response variable, $x_i$ a vector of covariates (basic regressors), $\epsilon_i$ noise, and $x \mapsto g(x) = E[y_i \mid x_i = x]$ a regression (conditional mean) function; that is, we consider a triangular array of models with $y_i = y_{i,n}$, $x_i = x_{i,n}$, $\epsilon_i = \epsilon_{i,n}$, and $g = g_n$. We assume that $g \in \mathcal{G}$ where $\mathcal{G}$ is some class of functions. Since we consider a sequence of models indexed by $n$, we allow the function class $\mathcal{G} = \mathcal{G}_n$, where the regression function $g$ belongs to, to depend on $n$ as well.
In addition, we allow $\mathcal{X} = \mathcal{X}_n$ to depend on $n$ but we assume for the sake of simplicity that the diameter of $\mathcal{X}$ is bounded from above uniformly over $n$ (dropping the uniform boundedness condition is possible at the expense of more technicalities; for example, without uniform boundedness condition, we would have an additional term $\log \mathrm{diam}(\mathcal{X})$ in (4.20) and (4.22) of Lemma 4.2). We denote $\sigma_i^2 = E[\epsilon_i^2 \mid x_i]$, $\bar\sigma^2 := \sup_{x \in \mathcal{X}} E[\epsilon_i^2 \mid x_i = x]$, and $\underline{\sigma}^2 := \inf_{x \in \mathcal{X}} E[\epsilon_i^2 \mid x_i = x]$. For notational convenience, we omit indexing by $n$ where it does not lead to confusion.

Condition A.1 (Sample) For each $n$, random vectors $(y_i, x_i')'$, $i = 1,\dots,n$, are i.i.d. and satisfy (2.1).

We approximate the function $x \mapsto g(x)$ by linear forms $x \mapsto p(x)'b$, where $x \mapsto p(x) := (p_1(x),\dots,p_k(x))'$ is a vector of approximating functions that can change with $n$; in particular, $k$ may increase with $n$. We denote the regressors as $p_i := p(x_i) := (p_1(x_i),\dots,p_k(x_i))'$. The next assumption imposes regularity conditions on the regressors.

Condition A.2 (Eigenvalues) Uniformly over all $n$, eigenvalues of $Q := E[p_i p_i']$ are bounded above and away from zero.

Condition A.2 imposes the restriction that $p_1(x_i),\dots,p_k(x_i)$ are not too co-linear. Given this assumption, it is without loss of generality to impose the following normalization:

Normalization. To simplify notation, we normalize $Q = I$, but we shall treat $Q$ as unknown, that is we deal with random design.

The following proposition establishes a simple sufficient condition for A.2 based on orthonormal bases with respect to some measure.

Proposition 2.1 (Stability of Bounds on Eigenvalues). Assume that $x_i \sim F$ where $F$ is a probability measure on $\mathcal{X}$, and that the regressors $p_1(x),\dots,p_k(x)$ are orthonormal on $(\mathcal{X},\mu)$ for some measure $\mu$. Then A.2 is satisfied if $dF/d\mu$ is bounded above and away from zero.

It is well known that the least squares parameter $\beta$ is defined by

$$\beta := \arg\min_{b \in \mathbb{R}^k} E\big[(y_i - p_i'b)^2\big],$$

which by (2.1) also implies that $\beta = \beta_g$ where $\beta_g$ is defined by

$$\beta_g := \arg\min_{b \in \mathbb{R}^k} E\big[(g(x_i) - p_i'b)^2\big]. \qquad (2.2)$$

We call $x \mapsto g(x)$ the target function and $x \mapsto g_k(x) = p(x)'\beta$ the surrogate function. In this setting, the surrogate function provides the best linear approximation to the target function.
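To make the surrogate function concrete, the following is a minimal numerical sketch, not part of the paper. It assumes $x_i \sim U(0,1)$ and uses the orthonormal Legendre basis on $[0,1]$ (so that $Q = I$ and $\beta_g = E[p(x_i) g(x_i)]$); the target $g = \exp$, the quadrature rule, and all function names are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def uniform_quadrature(n_nodes=200):
    """Gauss-Legendre nodes and weights for integrals against U(0, 1)."""
    t, w = leg.leggauss(n_nodes)       # rule on [-1, 1]
    return 0.5 * (t + 1.0), 0.5 * w    # rescaled to [0, 1]; weights sum to 1


def legendre_basis(x, k):
    """Rows are p(x)' with p_j(x) = sqrt(2j + 1) * P_j(2x - 1), j = 0..k-1."""
    x = np.asarray(x, dtype=float)
    scale = np.sqrt(2.0 * np.arange(k) + 1.0)
    return leg.legvander(2.0 * x - 1.0, k - 1) * scale


def surrogate(g, k, n_nodes=200):
    """Return beta_g = E[p(x) g(x)] and the L2(F) norm of r = g - p'beta_g."""
    x, w = uniform_quadrature(n_nodes)
    P = legendre_basis(x, k)
    beta_g = P.T @ (w * g(x))          # Q = I, so no matrix inverse is needed
    r = g(x) - P @ beta_g              # approximation error on the nodes
    return beta_g, np.sqrt(np.sum(w * r ** 2))


# Hypothetical target g(x) = exp(x): the L2 error shrinks as k grows.
for k in (2, 4, 6):
    _, err = surrogate(np.exp, k)
    print(k, err)
```

Because the basis is orthonormal under $F$, the coefficients are plain inner products; with a non-orthonormal basis one would solve the normal equations with $Q = E[p_i p_i']$ instead.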

For all $x \in \mathcal{X}$, let

$$r(x) := r_g(x) := g(x) - p(x)'\beta_g \qquad (2.3)$$

denote the approximation error at the point $x$, and let $r_i := r(x_i) = g(x_i) - p(x_i)'\beta_g$ denote the approximation error for the observation $i$. Using this notation, we obtain a many regressors model

$$y_i = p_i'\beta + u_i, \quad E[u_i x_i] = 0, \quad u_i := r_i + \epsilon_i.$$

The least squares estimator of $\beta$ is

$$\hat\beta := \arg\min_{b \in \mathbb{R}^k} E_n\big[(y_i - p_i'b)^2\big] = \hat Q^{-1} E_n[p_i y_i] \qquad (2.4)$$

where $\hat Q := E_n[p_i p_i']$. The least squares estimator $\hat\beta$ induces the estimator $\hat g(x) := p(x)'\hat\beta$ for the target function $g(x)$. Then it follows from (2.3) that we can decompose the error in estimating the target function as

$$\hat g(x) - g(x) = p(x)'(\hat\beta - \beta) - r(x),$$

where the first term on the right-hand side is the estimation error and the second term is the approximation error.

We are also interested in various linear functionals $\theta$ of the conditional mean function. As discussed in the introduction, examples include the partial derivative function, the average partial derivative function, and the conditional average partial derivative. Importantly, in each example above we could be interested in estimating $\theta = \theta(w)$ simultaneously for many values $w \in \mathcal{I}$. By the linearity of the series approximations, the above parameters can be seen as linear functions of the least squares coefficients $\beta$ up to an approximation error, that is

$$\theta(w) = \ell_\theta(w)'\beta + r_\theta(w), \quad w \in \mathcal{I}, \qquad (2.5)$$

where $\ell_\theta(w)'\beta$ is the series approximation, with $\ell_\theta(w)$ denoting the $k$-vector of loadings on the coefficients, and $r_\theta(w)$ is the remainder term, which corresponds to the approximation error. Indeed, the decomposition (2.5) arises from the application of different linear operators $\mathcal{A}$ to the decomposition $g(\cdot) = p(\cdot)'\beta + r(\cdot)$ and evaluating the resulting functions at $w$:

$$(\mathcal{A}g(\cdot))[w] = (\mathcal{A}p(\cdot))[w]'\beta + (\mathcal{A}r(\cdot))[w]. \qquad (2.6)$$

Examples of the operator $\mathcal{A}$ corresponding to the cases enumerated in the introduction are given by, respectively,

1. a differential operator: $(\mathcal{A}f)[x] = (\partial_j f)[x]$, so that $\ell_\theta(x) = \partial_j p(x)$, $r_\theta(x) = \partial_j r(x)$;
2. an integro-differential operator: $\mathcal{A}f = \int \partial_j f(x)\, d\mu(x)$, so that $\ell_\theta = \int \partial_j p(x)\, d\mu(x)$, $r_\theta = \int \partial_j r(x)\, d\mu(x)$;
3. a partial integro-differential operator: $(\mathcal{A}f)[x_s] = \int \partial_j f(x)\, d\mu(x \mid x_s)$, so that $\ell_\theta(x_s) = \int \partial_j p(x)\, d\mu(x \mid x_s)$, $r_\theta(x_s) = \int \partial_j r(x)\, d\mu(x \mid x_s)$, where $x_s$ is a subvector of $x$.

For notational convenience, we use the formulation (2.5) in the analysis, instead of the motivational formulation (2.6).

We shall provide the inference tools that will be valid for inference on the series approximation $\ell_\theta(w)'\beta$, $w \in \mathcal{I}$. If the approximation error $r_\theta(w)$, $w \in \mathcal{I}$, is small enough as compared to the estimation error, these tools will also be valid for inference on the functional of interest $\theta(w)$, $w \in \mathcal{I}$. In this case, the series approximation $\ell_\theta(w)'\beta$ is an important intermediary target, whereas the functional $\theta(w)$ is the ultimate target. The inference will be based on the plug-in estimator $\hat\theta(w) := \ell_\theta(w)'\hat\beta$ of the series approximation $\ell_\theta(w)'\beta$ and hence of the final target $\theta(w)$.

3. Approximation Properties of Least Squares

Next we consider approximation properties of the least squares estimator. Not surprisingly, approximation properties must rely on the particular choice of approximating functions. At this point it is instructive to consider particular examples of relevant bases used in the literature. For each example, we state a bound on the following quantity:

$$\xi_k := \sup_{x \in \mathcal{X}} \|p(x)\|.$$

This quantity will play a key role in our analysis (see Footnote 1). Excellent reviews of approximating properties of different series can also be found in Huang (1998) and Chen (2007), where additional references are provided.

Footnote 1. Most results extend directly to the case that $\xi_k \ge \max_{i \le n} \|p(x_i)\|$ holds with probability $1 - o(1)$. We refer to Hansen (2014) for recent results that explicitly allow for unbounded regressors, which required extending the concentration inequalities for matrices.
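The estimator (2.4) and the plug-in estimator of a linear functional can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: it uses the unnormalized power series $p(x) = (1, x, \dots, x^{k-1})'$ because its derivative loadings $\ell_\theta(x) = \partial p(x)$ are simple to write down, a simulated design with $x_i \sim U(0,1)$, and a hypothetical target $g(x) = \sin(2x)$. All function names are made up for the illustration.

```python
import numpy as np


def power_basis(x, k):
    """Rows are p(x)' = (1, x, ..., x^{k-1})."""
    return np.vander(np.asarray(x, dtype=float), k, increasing=True)


def power_basis_derivative(x, k):
    """Rows are the loadings l_theta(x)' = (0, 1, 2x, ..., (k-1) x^{k-2})."""
    x = np.asarray(x, dtype=float)
    D = np.zeros((x.size, k))
    for j in range(1, k):
        D[:, j] = j * x ** (j - 1)
    return D


def series_ols(x, y, k):
    """beta_hat = Q_hat^{-1} E_n[p_i y_i] with Q_hat = E_n[p_i p_i']."""
    P = power_basis(x, k)
    n = len(y)
    Q_hat = P.T @ P / n
    return np.linalg.solve(Q_hat, P.T @ y / n)


def g(x):
    """Hypothetical target function used only for this simulation."""
    return np.sin(2.0 * x)


rng = np.random.default_rng(0)
n, k = 2000, 5
x = rng.uniform(0.0, 1.0, n)
y = g(x) + 0.1 * rng.standard_normal(n)

beta_hat = series_ols(x, y, k)
grid = np.linspace(0.1, 0.9, 9)
g_hat = power_basis(grid, k) @ beta_hat                  # estimate of g(x)
theta_hat = power_basis_derivative(grid, k) @ beta_hat   # plug-in estimate of g'(x)
```

The fitted values do not depend on the normalization $Q = I$: any nonsingular linear transformation of $p$ gives the same $\hat g$, so the power series is used here only for readability. For larger $k$ an orthonormalized or B-spline basis would be numerically safer.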

Example 3.1 (Polynomial series). Let $\mathcal{X} = [0,1]$ and consider a polynomial series given by

$$p(x) = (1, x, x^2, \dots, x^{k-1})'.$$

In order to reduce collinearity problems, it is useful to orthonormalize the polynomial series with respect to the Lebesgue measure on $[0,1]$ to get the Legendre polynomial series

$$p(x) = \big(1, \sqrt{3}\,x, \sqrt{5/4}\,(3x^2 - 1), \dots\big)'.$$

The Legendre polynomial series satisfies

$$\xi_k \lesssim k;$$

see, for example, Newey (1997).

Example 3.2 (Fourier series). Let $\mathcal{X} = [0,1]$ and consider a Fourier series given by

$$p(x) = \big(1, \cos(2\pi j x), \sin(2\pi j x),\ j = 1,2,\dots,(k-1)/2\big)',$$

for $k$ odd. Fourier series is orthonormal with respect to the Lebesgue measure on $[0,1]$ and satisfies

$$\xi_k \lesssim \sqrt{k},$$

which follows trivially from the fact that every element of $p(x)$ is bounded in absolute value by one.

Example 3.3 (Spline series). Let $\mathcal{X} = [0,1]$ and consider the linear regression spline series, or regression spline series of order 1, with a finite number of equally spaced knots $l_1,\dots,l_{k-2}$ in $\mathcal{X}$:

$$p(x) = \big(1, x, (x - l_1)_+, \dots, (x - l_{k-2})_+\big)',$$

or consider the cubic regression spline series, or regression spline series of order 3, with a finite number of equally spaced knots $l_1,\dots,l_{k-4}$:

$$p(x) = \big(1, x, x^2, x^3, (x - l_1)_+^3, \dots, (x - l_{k-4})_+^3\big)'.$$

Similarly, one can define the regression spline series of any order $s_0$ (here $s_0$ is a nonnegative integer). The function $x \mapsto p(x)'b$ constructed using regression splines of order $s_0$ is $s_0 - 1$ times continuously differentiable in $x$ for any $b$. Instead of regression splines, it is often helpful to consider B-splines $p(x) = (p_1(x),\dots,p_k(x))'$, which are linear transformations of the regression splines with lower multicollinearity; see De Boor (2001) for the introduction to the theory of splines. B-splines are local in the sense that each B-spline $p_j(x)$ is supported on the interval $[l_{j(1)}, l_{j(2)}]$ for some $j(1)$ and $j(2)$ satisfying $j(2) - j(1) \lesssim 1$ and there are at most $s_0 + 1$ non-zero B-splines on each interval $[l_{j-1}, l_j]$. From this property of B-splines, it is easy to see that B-spline series satisfies

$$\xi_k \lesssim \sqrt{k};$$

see, for example, Newey (1997).

Example 3.4 (Cohen-Daubechies-Vial wavelet series). Let $\mathcal{X} = [0,1]$ and consider Cohen-Daubechies-Vial (CDV) wavelet bases; see Section 4 in Cohen et al. (1993), Chapter 7.5 in Mallat (2009), and Chapter 7 and Appendix B in Johnstone (2011) for details on CDV wavelet bases. CDV wavelet bases are a class of bases that are orthonormal with respect to the Lebesgue measure on $[0,1]$. Each such basis is built from a Daubechies scaling function $\phi$ (defined on $\mathbb{R}$) and the wavelet $\psi$ of order $s_0$ starting from a fixed resolution level $J_0$ such that $2^{J_0} \ge 2 s_0$. The functions $\phi$ and $\psi$ are supported on $[0, 2s_0 - 1]$ and $[-s_0 + 1, s_0]$, respectively. Translate $\phi$ so that it has the support $[-s_0 + 1, s_0]$. Let

$$\phi_{l,m}(x) = 2^{l/2}\phi(2^l x - m), \quad \psi_{l,m}(x) = 2^{l/2}\psi(2^l x - m), \quad l, m \ge 0.$$

Then we can create the CDV wavelet basis from these functions as follows. Take all the functions $\phi_{J_0,m}$, $\psi_{l,m}$, $l \ge J_0$, that are supported in the interior of $[0,1]$ (these are functions $\phi_{J_0,m}$ with $m = s_0 - 1, \dots, 2^{J_0} - s_0$ and $\psi_{l,m}$ with $m = s_0 - 1, \dots, 2^l - s_0$, $l \ge J_0$). Denote these functions $\tilde\phi_{J_0,m}$, $\tilde\psi_{l,m}$. To this set of functions, add suitable boundary corrected functions

$$\tilde\phi_{J_0,0}, \dots, \tilde\phi_{J_0,s_0-2},\ \tilde\phi_{J_0,2^{J_0}-s_0+1}, \dots, \tilde\phi_{J_0,2^{J_0}-1},\ \tilde\psi_{l,0}, \dots, \tilde\psi_{l,s_0-2},\ \tilde\psi_{l,2^{l}-s_0+1}, \dots, \tilde\psi_{l,2^{l}-1},\quad l \ge J_0,$$

so that $\{\tilde\phi_{J_0,m}\}_{0 \le m < 2^{J_0}} \cup \{\tilde\psi_{l,m}\}_{0 \le m < 2^l,\, l \ge J_0}$ forms an orthonormal basis of $L^2[0,1]$. Suppose that $k = 2^J$ for some $J > J_0$. Then the CDV series takes the form:

$$p(x) = \big(\tilde\phi_{J_0,0}(x), \dots, \tilde\phi_{J_0,2^{J_0}-1}(x),\ \tilde\psi_{J_0,0}(x), \dots, \tilde\psi_{J-1,2^{J-1}-1}(x)\big)'.$$

This series satisfies

$$\xi_k \lesssim \sqrt{k}.$$

This bound can be derived by the same argument as that for B-splines (see, for example, Kato, 2013, Lemma 1 (i) for its proof). CDV wavelet bases are a flexible tool to approximate many different function classes. See, for example, Johnstone (2011), Appendix B.

Example 3.5 (Local polynomial partition series). Let $\mathcal{X} = [0,1]$ and define a local polynomial partition series as follows. Let $s_0$ be a nonnegative integer. Partition $\mathcal{X}$ as $0 = l_0 < l_1 < \dots < l_{\tilde k - 1} < l_{\tilde k} = 1$ where $\tilde k := [k/(s_0+1)] + 1$ and $[a]$ is the largest integer that is strictly smaller than $a$. For $j = 1,\dots,\tilde k$, define $\delta_j : [0,1] \to \{0,1\}$ by $\delta_j(x) = 1$ if $x \in (l_{j-1}, l_j]$ and 0 otherwise. For $j = 1,\dots,k$, define

$$\tilde p_j(x) := \delta_{[j/(s_0+1)]+1}(x)\, x^{\,j-1-(s_0+1)[j/(s_0+1)]}$$

for all $x \in \mathcal{X}$. Finally, define the local polynomial partition series $p_1(\cdot),\dots,p_k(\cdot)$ of order $s_0$ as an orthonormalization of $\tilde p_1(\cdot),\dots,\tilde p_k(\cdot)$ with respect to the Lebesgue (or some other) measure on $\mathcal{X}$. The local polynomial partition series estimator was analyzed in detail in Cattaneo and Farrell (2013). Its properties are somewhat similar to those of the local polynomial estimator of Stone (1982). When the partition $l_0,\dots,l_{\tilde k}$ satisfies $l_j - l_{j-1} \asymp 1/\tilde k$, that is, there exist constants $c, C > 0$ independent of $n$ and such that $c/\tilde k \le l_j - l_{j-1} \le C/\tilde k$ for all $j = 1,\dots,\tilde k$, and the Lebesgue measure is used, the local polynomial partition series satisfies

$$\xi_k \lesssim \sqrt{k}.$$

This bound can be derived by the same argument as that for B-splines.

Example 3.6 (Tensor Products). Generalizations to multiple covariates are straightforward using tensor products of unidimensional series. Suppose that the basic regressors are $x_i = (x_{1i},\dots,x_{di})'$. Then we can create $d$ series for each basic regressor. Then we take all interactions of functions from these $d$ series, called tensor products, and collect them into a vector of regressors $p_i$. If each series for a basic regressor has $J$ terms, then the final regressor has dimension $k = J^d$, which explodes exponentially in the dimension $d$. The bounds on $\xi_k$ in terms of $k$ remain the same as in the one-dimensional case.

Each basis described in Examples 3.1-3.6 has different approximation properties which also depend on the particular class of functions $\mathcal{G}$. The following assumption captures the essence of this dependence into two quantities.

Condition A.3 (Approximation) For each $n$ and $k$, there are finite constants $c_k$ and $\ell_k$ such that for each $f \in \mathcal{G}$,

$$\|r_f\|_{F,2} := \sqrt{\int_{x \in \mathcal{X}} r_f^2(x)\, dF(x)} \le c_k \quad\text{and}\quad \|r_f\|_{F,\infty} := \sup_{x \in \mathcal{X}} |r_f(x)| \le \ell_k c_k.$$

Here $r_f$ is defined by (2.2) and (2.3) with $g$ replaced by $f$. We call $\ell_k$ the Lebesgue factor because of its relation to the Lebesgue constant defined in Section 3.2 below. Together $c_k$ and $\ell_k$ characterize the approximation properties of the underlying class of functions under $L^2(\mathcal{X},F)$ and uniform distances. Note that constants $c_k = c_k(\mathcal{G})$ and $\ell_k = \ell_k(\mathcal{G})$ are allowed to depend on $n$ but we omit indexing by $n$ for simplicity of notation. Next we discuss primitive bounds on $c_k$ and $\ell_k$.

3.1. Bounds on $c_k$. In what follows, we call the case where $c_k \to 0$ as $k \to \infty$ the correctly specified case. In particular, if the series are formed from bases that span $\mathcal{G}$, then $c_k \to 0$ as $k \to \infty$. However, if series are formed from bases that do not span $\mathcal{G}$, then $c_k \not\to 0$ as $k \to \infty$. We call any case where $c_k \not\to 0$ the incorrectly specified (misspecified) case.

To give an example of the misspecified case, suppose that $d = 2$, so that $x = (x_1, x_2)'$ and $g(x) = g(x_1, x_2)$. Further, suppose that the researcher mistakenly assumes that $g(x)$ is additively separable in $x_1$ and $x_2$: $g(x_1, x_2) = g_1(x_1) + g_2(x_2)$. Given this assumption, the researcher forms the vector of approximating functions $p(x_1, x_2)$ such that each component of this vector depends either on $x_1$ or $x_2$ but not on both; see Newey (1997) and Newey et al. (1999) for the description of nonparametric series estimators of separately additive models. Then note that if the true function $g(x_1, x_2)$ is not separately additive, linear combinations $p(x_1, x_2)'b$ will not be able to accurately approximate $g(x_1, x_2)$ for any $b$, so that $c_k$ does not converge to zero as $k \to \infty$. Since analysis of misspecified models plays an important role in econometrics, we include results both for correctly and incorrectly specified models.

To provide a bound on $c_k$, note that for any $f \in \mathcal{G}$,

$$\inf_b \|f - p'b\|_{F,2} \le \inf_b \|f - p'b\|_{F,\infty},$$

so that it suffices to set $c_k$ such that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$. Next, the bounds for $\inf_b \|f - p'b\|_{F,\infty}$ are readily available from the Approximation Theory; see DeVore and Lorentz (1993).
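The misspecified additive example above can be checked numerically. The sketch below is an illustration only, under assumptions that are not in the paper: $x \sim U([0,1]^2)$, an additive power-series basis with $J$ terms per coordinate, and the hypothetical non-additive target $g(x_1, x_2) = x_1 x_2$. For this target the best additive approximation is $x_1/2 + x_2/2 - 1/4$, the residual is $(x_1 - 1/2)(x_2 - 1/2)$, and its $L^2(F)$ norm is $1/12$ no matter how large $J$ is.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def additive_design(x1, x2, J):
    """Constant, powers of x1, powers of x2; no interaction terms."""
    cols = [np.ones_like(x1)]
    cols += [x1 ** j for j in range(1, J + 1)]
    cols += [x2 ** j for j in range(1, J + 1)]
    return np.column_stack(cols)


def additive_l2_error(g, J, n_nodes=40):
    """L2(F) norm of g minus its best additive approximation, F = U([0,1]^2)."""
    t, w = leg.leggauss(n_nodes)
    u, wu = 0.5 * (t + 1.0), 0.5 * w              # 1-d rule on [0, 1]
    X1, X2 = np.meshgrid(u, u, indexing="ij")
    W = np.outer(wu, wu).ravel()                   # product-rule weights
    x1, x2 = X1.ravel(), X2.ravel()
    P = additive_design(x1, x2, J)
    sw = np.sqrt(W)
    # Weighted least squares on the quadrature nodes is the L2(F) projection.
    b, *_ = np.linalg.lstsq(P * sw[:, None], g(x1, x2) * sw, rcond=None)
    r = g(x1, x2) - P @ b
    return np.sqrt(np.sum(W * r ** 2))


for J in (1, 3, 6):
    print(J, additive_l2_error(lambda a, b: a * b, J))   # stays near 1/12
```

The error for a single function only gives a lower bound on $c_k$ for any class $\mathcal{G}$ containing it, which is enough to show that $c_k$ cannot vanish in this setting.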
A typical example is based on the concept of $s$-smooth classes, namely Hölder classes of smoothness order $s$, $\Sigma_s(\mathcal{X})$. For $s \in (0,1]$, the Hölder class of smoothness order $s$, $\Sigma_s(\mathcal{X})$, is defined as the set of all functions $f : \mathcal{X} \to \mathbb{R}$ such that for $C > 0$,

$$|f(x) - f(\tilde x)| \le C\Big(\sum_{j=1}^d (x_j - \tilde x_j)^2\Big)^{s/2}$$

for all $x = (x_1,\dots,x_d)'$ and $\tilde x = (\tilde x_1,\dots,\tilde x_d)'$ in $\mathcal{X}$. The smallest $C$ satisfying this inequality defines a norm of $f$ in $\Sigma_s(\mathcal{X})$, which we denote by $\|f\|_s$. For $s > 1$, $\Sigma_s(\mathcal{X})$ can be defined as follows. For a $d$-tuple $\alpha = (\alpha_1,\dots,\alpha_d)$ of nonnegative integers, let $D^\alpha = \partial_{x_1}^{\alpha_1} \cdots \partial_{x_d}^{\alpha_d}$. Let $[s]$ denote the largest integer strictly smaller than $s$. Then $\Sigma_s(\mathcal{X})$ is defined as the set of all functions $f : \mathcal{X} \to \mathbb{R}$ such that $f$ is $[s]$ times continuously differentiable and for some $C > 0$,

$$|D^\alpha f(x) - D^\alpha f(\tilde x)| \le C\Big(\sum_{j=1}^d (x_j - \tilde x_j)^2\Big)^{(s-[s])/2} \quad\text{and}\quad |D^\beta f(x)| \le C$$

for all $x = (x_1,\dots,x_d)'$ and $\tilde x = (\tilde x_1,\dots,\tilde x_d)'$ in $\mathcal{X}$ and for all $d$-tuples $\alpha = (\alpha_1,\dots,\alpha_d)$ and $\beta = (\beta_1,\dots,\beta_d)$ of nonnegative integers satisfying $\alpha_1 + \dots + \alpha_d = [s]$ and $\beta_1 + \dots + \beta_d \le [s]$. Again, the smallest $C$ satisfying these inequalities defines a norm of $f$ in $\Sigma_s(\mathcal{X})$, which we denote $\|f\|_s$.

If $\mathcal{G}$ is a set of functions $f$ in $\Sigma_s(\mathcal{X})$ such that $\|f\|_s$ is bounded from above uniformly over all $f \in \mathcal{G}$ (that is, $\mathcal{G}$ is contained in a ball in $\Sigma_s(\mathcal{X})$ of finite radius), then we can take

$$c_k \lesssim k^{-s/d} \qquad (3.7)$$

for the polynomial series and $c_k \lesssim k^{-(s \wedge s_0)/d}$ for spline, CDV wavelet, and local polynomial partition series of order $s_0$. If in addition we assume that each element of $\mathcal{G}$ can be extended to a periodic function, then (3.7) also holds for the Fourier series. See, for example, Newey (1997) and Chen (2007) for references.

3.2. Bounds on $\ell_k$. We say that a least squares approximation by a particular series for the function class $\mathcal{G}$ is co-minimal if the Lebesgue factor $\ell_k$ is small in the sense of being a slowly varying function in $k$. A simple bound on $\ell_k$, which is independent of $\mathcal{G}$, is established in the following proposition:

Proposition 3.1. If $c_k$ is chosen so that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$, then Condition A.3 holds with $\ell_k \le 1 + \xi_k$.

The proof of this proposition is based on the ideas of Newey (1997) and is provided in the Appendix. The advantage of the bound established in this proposition is that it is universally applicable. It is, however, not sharp in many cases because $\xi_k$ satisfies

$$\xi_k^2 \ge E[\|p(x_i)\|^2] = E[p(x_i)'p(x_i)] = k$$

so that $\xi_k \ge \sqrt{k}$ in all cases. Much sharper bounds follow from Approximation Theory for some important cases. To apply these bounds, define the Lebesgue constant:

$$\tilde\ell_k := \sup\Big( \frac{\|p'\beta_f\|_{F,\infty}}{\|f\|_{F,\infty}} : \|f\|_{F,\infty} \ne 0,\ f \in \bar{\mathcal{G}} \Big),$$

where $\bar{\mathcal{G}} = \mathcal{G} + \{p'b : b \in \mathbb{R}^k\} = \{f + p'b : f \in \mathcal{G}, b \in \mathbb{R}^k\}$. The following proposition provides a bound on $\ell_k$ in terms of $\tilde\ell_k$:

Proposition 3.2. If $c_k$ is chosen so that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$, then Condition A.3 holds with $\ell_k = 1 + \tilde\ell_k$.

Note that in all examples above, we provided $c_k$ such that $c_k \ge \sup_{f \in \mathcal{G}} \inf_b \|f - p'b\|_{F,\infty}$, and so the results of Propositions 3.1 and 3.2 apply in our examples. We now provide bounds on $\tilde\ell_k$.

Example 3.7 (Fourier series, continued). For Fourier series on $\mathcal{X} = [0,1]$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$,

$$\tilde\ell_k \le C_0 \log k + C_1,$$

where here and below $C_0$ and $C_1$ are some universal constants; see Zygmund (2002).

Example 3.8 (Spline series, continued). For continuous B-spline series on $\mathcal{X} = [0,1]$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$,

$$\tilde\ell_k \le C_0,$$

under approximately uniform placement of knots; see Huang (2003b). In fact, the result of Huang states that $\tilde\ell_k \le C$ whenever $F$ has the pdf on $[0,1]$ bounded from above by $\bar a$ and below from zero by $\underline{a}$ where $C$ is a constant that depends only on $\underline{a}$ and $\bar a$.

Example 3.9 (Wavelet series, continued). For continuous CDV wavelet series on $\mathcal{X} = [0,1]$, $F = U(0,1)$, and $\mathcal{G} \subset C(\mathcal{X})$,

$$\tilde\ell_k \le C_0.$$

The proof of this result was recently obtained by Chen and Christensen (2013) who extended the argument of Huang (2003b) for B-splines to cover wavelets. In fact, the result of Chen and Christensen also shows that $\tilde\ell_k \le C$ whenever $F$ has the pdf on $[0,1]$ bounded from above by $\bar a$ and below from zero by $\underline{a}$ where $C$ is a constant that depends only on $\underline{a}$ and $\bar a$.
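As a numerical aside on the quantity $\xi_k = \sup_x \|p(x)\|$ that enters Proposition 3.1, the sketch below evaluates it on a grid for two bases on $[0,1]$. It is an illustration under assumptions, not the paper's computation: the Legendre basis is normalized as $p_j(x) = \sqrt{2j+1}\,P_j(2x-1)$, and the Fourier basis carries an explicit $\sqrt{2}$ scaling on the sine and cosine terms so that it is exactly orthonormal under $U(0,1)$. With these conventions $\xi_k = k$ for Legendre and $\xi_k = \sqrt{k}$ for Fourier, in line with the rates stated in Examples 3.1 and 3.2 and with the lower bound $\xi_k \ge \sqrt{k}$.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def legendre_basis(x, k):
    """Orthonormal Legendre basis on [0, 1]: sqrt(2j + 1) * P_j(2x - 1)."""
    x = np.asarray(x, dtype=float)
    scale = np.sqrt(2.0 * np.arange(k) + 1.0)
    return leg.legvander(2.0 * x - 1.0, k - 1) * scale


def fourier_basis(x, k):
    """(1, sqrt2 cos(2 pi j x), sqrt2 sin(2 pi j x)), j = 1..(k-1)/2, k odd."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for j in range(1, (k - 1) // 2 + 1):
        cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * j * x))
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * j * x))
    return np.column_stack(cols)


def xi_k(basis, k, n_grid=2001):
    """Grid approximation to sup over x in [0, 1] of the Euclidean norm of p(x)."""
    grid = np.linspace(0.0, 1.0, n_grid)
    return np.max(np.linalg.norm(basis(grid, k), axis=1))


for k in (3, 7, 15):
    print(k, xi_k(legendre_basis, k), xi_k(fourier_basis, k), np.sqrt(k))
```

The grid includes the endpoint $x = 1$, where the Legendre norm attains its maximum, so the grid value coincides with the supremum for that basis; for other bases a grid maximum is only a lower bound on $\xi_k$.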

Example 3.10 (Local polynomial partition series, continued). For local polynomial partition series on $\mathcal{X}$, $F=U(0,1)$, and $\mathcal{G}\subset C(\mathcal{X})$,

$$\tilde\ell_k \le C_0.$$

To prove this bound, note that first order conditions imply that for any $f\in\bar{\mathcal{G}}$,

$$\beta_f = Q^{-1}E[p(x_1)f(x_1)] = E[p(x_1)f(x_1)].$$

Hence, for any $x\in\mathcal{X}$,

$$|p(x)'\beta_f| = |E[p(x)'p(x_1)f(x_1)]| \lesssim \|f\|_{F,\infty},$$

where the last inequality follows by noting that the sum $p(x)'p(x_1)=\sum_{j=1}^k p_j(x)p_j(x_1)$ contains at most $s_0+1$ nonzero terms, all nonzero terms in the sum are bounded by $\xi_k^2\lesssim k$, and $p(x)'p(x_1)=0$ outside of a set with probability bounded from above by $1/k$ up to a constant. The bound follows. Moreover, the bound $\tilde\ell_k\le C$ continues to hold whenever $F$ has a pdf on $[0,1]$ that is bounded from above by $\bar a$ and bounded away from zero by $\underline{a}$, where $C$ is a constant that depends only on $\underline{a}$ and $\bar a$.

Example 3.11 (Polynomial series, continued). For Chebyshev polynomials with $\mathcal{X}=[0,1]$, $dF(x)/dx = 1/\sqrt{1-x^2}$, and $\mathcal{G}\subset C(\mathcal{X})$,

$$\tilde\ell_k \le C_0\log k + C_1.$$

This bound follows from a trigonometric representation of Chebyshev polynomials (see, for example, DeVore and Lorentz (1993)) and Example 3.7.

Example 3.12 (Legendre Polynomials). For Legendre polynomials that form an orthonormal basis on $\mathcal{X}=[0,1]$ with respect to $F=U(0,1)$, and $\mathcal{G}=C(\mathcal{X})$,

$$\tilde\ell_k \ge C_0\xi_k = C_1 k,$$

for some constants $C_0, C_1>0$; see, for example, DeVore and Lorentz (1993). This means that even though some series schemes generate well-behaved uniform approximations, others, such as Legendre polynomials, do not in general. However, the following example specifies tailored function classes for which Legendre and other series methods do automatically provide uniformly well-behaved approximations.

Example 3.13 (Tailored Function Classes). For each type of series approximation, it is possible to specify function classes for which the Lebesgue factors are constant or slowly varying with $k$. Specifically, consider a collection

$$\mathcal{G}_k = \Big\{x\mapsto f(x)=p(x)'b+r(x) : \int r(x)p(x)\,dF(x)=0,\ \|r\|_{F,\infty}\le \ell_k\|r\|_{F,2},\ \|r\|_{F,2}\le c_k\Big\},$$

where $\ell_k\le C$ or $\ell_k\le C\log k$. This example captures the idea that for each type of series functions there are function classes that are well approximated by this type. For example, Legendre polynomials may have poor Lebesgue factors in general, but there are well-defined function classes where Legendre polynomials have well-behaved Lebesgue factors. This explains why polynomial approximations, for example, using Legendre polynomials, are frequently employed in empirical work. We provide an empirically relevant example below, where polynomial approximation works just as well as a B-spline approximation. In economic examples, both polynomial approximations and B-spline approximations are well motivated if we consider them as more flexible forms of well-known, well-motivated functional forms in economics (for example, as more flexible versions of the linear-quadratic Mincer equations, or the more flexible versions of translog demand and production functions).

The following example illustrates the performance of the series estimator using different bases for a real data set.

Example 3.14 (Approximations of Conditional Expected Wage Function). Here $g(x)$ is the mean of log wage ($y$) conditional on education

$$x\in\{8,9,10,11,12,13,14,16,17,18,19,20\}.$$

The function $g(x)$ is computed using population data, namely the 1990 Census data for U.S. men of prime age; see Angrist et al. (2006) for more details. So in this example, we know the true population function $g(x)$. We would like to know how well this function is approximated when common approximation methods are used to form the regressors. For simplicity we assume that $x_i$ is uniformly distributed (otherwise we can weight by the frequency).
In population, the least squares estimator solves the approximation problem $\beta = \arg\min_b E[\{g(x_i)-p_i'b\}^2]$ for $p_i=p(x_i)$, where we form $p(x)$ as (a) linear spline (Figure 1, left) and (b) polynomial series (Figure 1, right), such that the dimension of $p(x)$ is either $k=3$ or $k=8$. It is clear from these graphs that spline and polynomial series yield similar approximations. In the table below, we also present $L^2$ and $L^\infty$ norms of approximating errors:

[Figure 1 placeholder. Left panel: "Approximation by Linear Splines, K=3 and 8". Right panel: "Approximation by Polynomial Series, K=3 and 8". Vertical axis: cef; horizontal axis: ed.]

Figure 1. Conditional expectation function (cef) of log wage given education (ed) in the 1990 Census data for U.S. men of prime age and its least squares approximation by spline (left panel) and polynomial series (right panel). Solid line: conditional expectation function; dashed line: approximation by $k=3$ series terms; dash-dot line: approximation by $k=8$ series terms.

                     spline k = 3   spline k = 8   Poly k = 3   Poly k = 8
$L^2$ Error              0.12           0.08          0.12         0.05
$L^\infty$ Error         0.29           0.17          0.30         0.12

We see from the table that in this example, the Lebesgue factor, which is defined as the ratio of $L^\infty$ to $L^2$ errors, of the polynomial approximations is comparable to the Lebesgue factor of the spline approximations.

4. Limit Theory

4.1. $L^2$ Limit Theory. After we have established the set-up, we proceed to derive our results. We start with a result on the $L^2$ rate of convergence. Recall that $\bar\sigma^2 = \sup_{x\in\mathcal{X}}E[\epsilon_i^2\mid x_i=x]$. In the theorem below, we assume that $\bar\sigma^2\lesssim 1$. This is a mild regularity condition.

Theorem 4.1 ($L^2$ rate of convergence). Assume that Conditions A.1-A.3 are satisfied. In addition, assume that $\xi_k^2\log k/n\to 0$ and $\bar\sigma^2\lesssim 1$. Then under $c_k\to 0$,

$$\|\hat g-g\|_{F,2} \lesssim_P \sqrt{k/n}+c_k, \tag{4.8}$$

and under $c_k\not\to 0$,

$$\|\hat g-p'\beta\|_{F,2} \lesssim_P \sqrt{k/n}+\big(\ell_k c_k\sqrt{k/n}\big)\wedge\big(\xi_k c_k/\sqrt{n}\big). \tag{4.9}$$

Comment 4.1. (i) This is our first main result in this paper. The condition $\xi_k^2\log k/n\to 0$, which we impose, weakens (hence generalizes) the conditions imposed in Newey (1997), who required $k\xi_k^2/n\to 0$. For series satisfying $\xi_k\lesssim\sqrt{k}$, the condition $\xi_k^2\log k/n\to 0$ amounts to

$$k\log k/n\to 0. \tag{4.10}$$

This condition is the same as that imposed in Stone (1994), Huang (2003a), and recently by Cattaneo and Farrell (2013), but the result (4.8) is obtained under the condition (4.10) in Stone (1994) and Huang (2003a) only for spline series and in Cattaneo and Farrell (2013) only for local polynomial partition series. Therefore, our result improves on those in the literature by weakening the rate requirements on the growth of $k$ (with respect to $n$) and/or by allowing for a wider set of series functions.

(ii) Under correct specification ($c_k\to 0$), the fastest $L^2$ rate of convergence is achieved by setting $k$ so that the approximation error and the sampling error are of the same order,

$$\sqrt{k/n}\asymp c_k.$$

One consequence of this result is that for Hölder classes of smoothness order $s$, $\Sigma_s(\mathcal{X})$, with $c_k\lesssim k^{-s/d}$, we obtain the optimal $L^2$ rate of convergence by setting $k\asymp n^{d/(d+2s)}$, which is allowed under our conditions for all $s>0$ if $\xi_k\lesssim\sqrt{k}$ (Fourier, spline, wavelet, and local polynomial partition series). On the other hand, if $\xi_k$ is growing faster than $\sqrt{k}$, then it is not possible to achieve the optimal $L^2$ rate of convergence for some $s>0$. For example, for the polynomial series considered above, $\xi_k\lesssim k$, and so the condition $\xi_k^2\log k/n\to 0$ becomes $k^2\log k/n\to 0$. Hence, the optimal $L^2$ rate of convergence is achieved by polynomial series only if $d/(d+2s)<1/2$ or, equivalently, $s>d/2$. Even though this condition is somewhat restrictive, it weakens the condition in Newey (1997), who required $k^3/n\to 0$ for polynomial series, so that the optimal $L^2$ rate in his analysis could be achieved only if $d/(d+2s)<1/3$ or, equivalently, $s>d$. Therefore, our results make it possible to achieve the optimal $L^2$ rate of convergence in a larger set of classes of functions for particular series.

(iii) The result (4.9) is concerned with the case when the model is misspecified ($c_k\not\to 0$). It shows that when $k/n\to 0$ and $(\ell_k c_k\sqrt{k/n})\wedge(\xi_k c_k/\sqrt{n})\to 0$, the estimator $\hat g(\cdot)$ converges in $L^2$ to the surrogate function $p(\cdot)'\beta$ that provides the best linear approximation to the target function $g(\cdot)$. In this case, the estimator $\hat g(\cdot)$ does not generally converge in $L^2$ to the target function $g(\cdot)$.
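The balancing argument in Comment 4.1(ii) can be illustrated with a short calculation. The following is a minimal sketch with all constants set to one (an assumption made only for illustration; the results above are stated up to constants), and the function names are hypothetical. It computes $k=n^{d/(d+2s)}$, confirms that the sampling error $\sqrt{k/n}$ and the approximation error $k^{-s/d}$ then coincide, and evaluates the growth condition $\xi_k^2\log k/n$ when $\xi_k=k^{a}$ with $a=1/2$ (for example, splines) or $a=1$ (polynomials).

```python
import math


def balanced_k(n, s, d):
    """k = n^{d/(d+2s)}: equates sqrt(k/n) with k^{-s/d} (constants set to 1)."""
    return n ** (d / (d + 2.0 * s))


def l2_rate(n, s, d):
    """Resulting L2 rate n^{-s/(d+2s)}."""
    return n ** (-s / (d + 2.0 * s))


def growth_condition(n, k, a):
    """xi_k^2 log k / n under the illustrative assumption xi_k = k^a."""
    return k ** (2.0 * a) * math.log(k) / n


n, d = 10 ** 6, 1
for s in (0.25, 2.0):
    k = balanced_k(n, s, d)
    print(
        f"s={s}: k={k:.1f}, rate={l2_rate(n, s, d):.4g}, "
        f"splines={growth_condition(n, k, 0.5):.3g}, "
        f"polynomials={growth_condition(n, k, 1.0):.3g}"
    )
```

With $d=1$, the polynomial case requires $s>d/2=1/2$: for $s=2$ the quantity $k^2\log k/n$ is small at the balanced $k$, while for $s=0.25$ it is large, in line with the comment.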

4.2. Pointwise Limit Theory. Next we focus on pointwise limit theory (some authors refer to pointwise limit theory as local asymptotics; see Huang (2003b)). That is, we study the asymptotic behavior of $\sqrt{n}\,\alpha'(\hat\beta-\beta)$ and $\sqrt{n}(\hat g(x)-g(x))$ for particular $\alpha\in S^{k-1}$ and $x\in\mathcal{X}$. Here $S^{k-1}$ denotes the space of vectors $\alpha$ in $\mathbb{R}^k$ with unit Euclidean norm: $\|\alpha\|=1$. Note that both $\alpha$ and $x$ implicitly depend on $n$. As we will show, pointwise results can be achieved under weak conditions similar to those we required in Theorem 4.1. The following lemma plays a key role in our asymptotic pointwise normality result.

Lemma 4.1 (Pointwise Linearization). Assume that Conditions A.1-A.3 are satisfied. In addition, assume that $\xi_k^2\log k/n\to 0$ and $\bar\sigma^2\lesssim 1$. Then for any $\alpha\in S^{k-1}$,

$$\sqrt{n}\,\alpha'(\hat\beta-\beta) = \alpha'\mathbb{G}_n[p_i(\epsilon_i+r_i)]+R_{1n}(\alpha), \tag{4.11}$$

where the term $R_{1n}(\alpha)$, summarizing the impact of unknown design, obeys

$$R_{1n}(\alpha) \lesssim_P \sqrt{\frac{\xi_k^2\log k}{n}}\,\big(1+\sqrt{k}\,\ell_k c_k\big). \tag{4.12}$$

Moreover,

$$\sqrt{n}\,\alpha'(\hat\beta-\beta) = \alpha'\mathbb{G}_n[p_i\epsilon_i]+R_{1n}(\alpha)+R_{2n}(\alpha), \tag{4.13}$$

where the term $R_{2n}(\alpha)$, summarizing the impact of approximation error on the sampling error of the estimator, obeys

$$R_{2n}(\alpha) \lesssim_P \ell_k c_k. \tag{4.14}$$

Comment 4.2. (i) In summary, the only condition that generally matters for linearization (4.11)-(4.12) is that $R_{1n}(\alpha)\to 0$, which holds if $\xi_k^2\log k/n\to 0$ and $k\xi_k^2\ell_k^2c_k^2\log k/n\to 0$. In particular, linearization (4.11)-(4.12) allows for misspecification ($c_k\to 0$ is not required). In principle, linearization (4.13)-(4.14) also allows for misspecification, but the bounds are only useful if the model is correctly specified, so that $\ell_k c_k\to 0$. As in the theorem on the $L^2$ rate of convergence, our main condition is that $\xi_k^2\log k/n\to 0$.

(ii) We conjecture that the bound on $R_{1n}(\alpha)$ can be improved for splines to

$$R_{1n}(\alpha) \lesssim_P \sqrt{\frac{\xi_k^2\log k}{n}}\,\big(1+\sqrt{\log k}\,\ell_k c_k\big), \tag{4.15}$$

since it is attained by local polynomials and splines are also similarly localized.

With the help of Lemma 4.1, we derive our asymptotic pointwise normality result. We will use the following additional notation:

$$\tilde\Omega := Q^{-1}E[(\epsilon_i+r_i)^2p_ip_i']Q^{-1} \quad\text{and}\quad \Omega_0 := Q^{-1}E[\epsilon_i^2p_ip_i']Q^{-1}.$$
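The sandwich form of the matrices just defined can be sketched with sample analogs. The following is a minimal illustration, not a definitive implementation: it assumes simulated data, a power-series basis, and plug-in estimation in which expectations are replaced by sample averages and $\epsilon_i+r_i$ by least squares residuals. The function names `fit_series` and `pointwise_se` and the data-generating choices are hypothetical.

```python
import numpy as np


def fit_series(P, y):
    """Least squares series fit with a plug-in sandwich matrix.

    P is the n x k matrix with rows p(x_i)'; y is the n-vector of outcomes.
    Returns (beta_hat, Omega_hat) with Omega_hat = Q_hat^{-1} Sigma_hat Q_hat^{-1}.
    """
    n = P.shape[0]
    Q_hat = P.T @ P / n                              # sample analog of Q
    beta_hat = np.linalg.solve(Q_hat, P.T @ y / n)   # least squares coefficients
    resid = y - P @ beta_hat                         # estimates eps_i + r_i
    Sigma_hat = (P * resid[:, None] ** 2).T @ P / n  # average of resid^2 * p p'
    Q_inv = np.linalg.inv(Q_hat)
    return beta_hat, Q_inv @ Sigma_hat @ Q_inv


def pointwise_se(p_x, Omega_hat, n):
    """Standard error of p(x)' beta_hat: sqrt(p(x)' Omega_hat p(x) / n)."""
    return float(np.sqrt(p_x @ Omega_hat @ p_x / n))


# Simulated example (all choices below are illustrative assumptions).
rng = np.random.default_rng(0)
n, k = 500, 4
x = rng.uniform(size=n)
P = np.vander(x, k, increasing=True)                 # columns 1, x, x^2, x^3
y = np.sin(2.0 * np.pi * x) + 0.5 * rng.standard_normal(n)

beta_hat, Omega_hat = fit_series(P, y)
se_at_first_point = pointwise_se(P[0], Omega_hat, n)
print(beta_hat, se_at_first_point)
```

Because the residuals combine the noise and the approximation error, this plug-in targets the matrix built from $(\epsilon_i+r_i)^2$ rather than $\Omega_0$.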

In the theorem below, we will impose the condition that $\sup_{x\in\mathcal{X}}E[\epsilon_i^2 1\{|\epsilon_i|>M\}\mid x_i=x]\to 0$ as $M\to\infty$ uniformly over $n$. This is a mild uniform integrability condition. Specifically, it holds if for some $m>2$, $\sup_{x\in\mathcal{X}}E[|\epsilon_i|^m\mid x_i=x]\lesssim 1$. In addition, we will impose the condition that $1\lesssim\underline{\sigma}^2$. This condition is used to properly normalize the estimator.

Theorem 4.2 (Pointwise Normality). Assume that Conditions A.1-A.3 are satisfied. In addition, assume that (i) $\sup_{x\in\mathcal{X}}E[\epsilon_i^2 1\{|\epsilon_i|>M\}\mid x_i=x]\to 0$ as $M\to\infty$ uniformly over $n$, (ii) $1\lesssim\underline{\sigma}^2$, and (iii) $(\xi_k^2\log k/n)^{1/2}(1+k^{1/2}\ell_k c_k)\to 0$. Then for any $\alpha\in S^{k-1}$,

$$\sqrt{n}\,\frac{\alpha'(\hat\beta-\beta)}{\|\alpha'\Omega^{1/2}\|} =_d N(0,1)+o_P(1), \tag{4.16}$$

where we set $\Omega=\tilde\Omega$, but if $R_{2n}(\alpha)\to_P 0$, then we can set $\Omega=\Omega_0$. Moreover, for any $x\in\mathcal{X}$ and $s(x):=\Omega^{1/2}p(x)$,

$$\sqrt{n}\,\frac{p(x)'(\hat\beta-\beta)}{\|s(x)\|} =_d N(0,1)+o_P(1), \tag{4.17}$$

and if the approximation error is negligible relative to the estimation error, namely $\sqrt{n}\,r(x)=o(\|s(x)\|)$, then

$$\sqrt{n}\,\frac{\hat g(x)-g(x)}{\|s(x)\|} =_d N(0,1)+o_P(1). \tag{4.18}$$

Comment 4.3. (i) This is our second main result in this paper. The result delivers pointwise convergence in distribution for any sequences $\alpha=\alpha_n$ and $x=x_n$ with $\alpha_n\in S^{k-1}$ and $x_n\in\mathcal{X}$. In fact, the proof of the theorem implies that the convergence is uniform over all sequences. Note that the normalization factor $\|s(x)\|$ is the pointwise standard error, and it is of a typical order $\|s(x)\|\propto\sqrt{k}$ at most points. In this case the condition for negligibility of the approximation error, $\sqrt{n}\,r(x)/\|s(x)\|\to 0$, which can be understood as an undersmoothing condition, can be replaced by

$$\sqrt{n/k}\,\ell_k c_k\to 0.$$

When $\ell_k c_k\lesssim k^{-s/d}\log k$, which is often the case if $\mathcal{G}$ is contained in a ball in $\Sigma_s(\mathcal{X})$ of finite radius (see our examples in the previous section), this condition substantially weakens an assumption in Newey (1997), who required $\sqrt{n}\,k^{-s/d}\to 0$ in a similar set-up.

(ii) When applied to splines, our result is somewhat less sharp than that of Huang (2003b). Specifically, Huang required that $\xi_k^2\log k/n\to 0$ and $(n/k)^{1/2}\ell_k c_k\to 0$, whereas we require $(k\xi_k^2\log k/n)^{1/2}\ell_k c_k\to 0$ in addition to Huang's conditions (see condition (iii) of the theorem). The difference can likely be explained by the fact that we use linearization bound (4.12), whereas for splines it is likely that (4.15) holds as well.

(iii) More generally, our asymptotic pointwise normality result, as well as other related results in this paper, applies to any problem where the estimator of $g(x)=p(x)'\beta+r(x)$ takes the form $p(x)'\hat\beta$, where $\hat\beta$ admits linearization of the form (4.11)-(4.14).

4.3. Uniform Limit Theory. Finally, we turn to a uniform limit theory. Not surprisingly, stronger conditions are required for our results to hold when compared to the pointwise case. Let $m>2$. We will need the following assumption on the tails of the regression errors.

Condition A.4 (Disturbances) Regression errors satisfy $\sup_{x\in\mathcal{X}}E[|\epsilon_i|^m\mid x_i=x]\lesssim 1$.

It will be convenient to denote $\alpha(x):=p(x)/\|p(x)\|$ in this subsection. Moreover, denote

$$\xi_k^L := \sup_{x,x'\in\mathcal{X}:\,x\ne x'}\frac{\|\alpha(x)-\alpha(x')\|}{\|x-x'\|}.$$

We will also need the following assumption on the basis functions to hold with the same $m>2$ as that in Condition A.4.

Condition A.5 (Basis) Basis functions are such that (i) $\xi_k^{2m/(m-2)}\log k/n\lesssim 1$, (ii) $\log\xi_k^L\lesssim\log k$, and (iii) $\log\xi_k\lesssim\log k$.

The following lemma provides uniform linearization of the series estimator and plays a key role in our derivation of the uniform rate of convergence.

Lemma 4.2 (Uniform Linearization). Assume that Conditions A.1-A.5 are satisfied. Then

$$\sqrt{n}\,\alpha(x)'(\hat\beta-\beta) = \alpha(x)'\mathbb{G}_n[p_i(\epsilon_i+r_i)]+R_{1n}(\alpha(x)), \tag{4.19}$$

where $R_{1n}(\alpha(x))$, summarizing the impact of unknown design, obeys

$$R_{1n}(\alpha(x)) \lesssim_P \sqrt{\frac{\xi_k^2\log k}{n}}\,\big(n^{1/m}\sqrt{\log k}+\sqrt{k}\,\ell_k c_k\big) =: \bar R_{1n} \tag{4.20}$$

uniformly over $x\in\mathcal{X}$. Moreover,

$$\sqrt{n}\,\alpha(x)'(\hat\beta-\beta) = \alpha(x)'\mathbb{G}_n[p_i\epsilon_i]+R_{1n}(\alpha(x))+R_{2n}(\alpha(x)), \tag{4.21}$$

where $R_{2n}(\alpha(x))$, summarizing the impact of approximation error on the sampling error of the estimator, obeys

$$R_{2n}(\alpha(x)) \lesssim_P \sqrt{\log k}\,\ell_k c_k =: \bar R_{2n} \tag{4.22}$$

uniformly over $x\in\mathcal{X}$.

Comment 4.4. As in the case of pointwise linearization, our results on uniform linearization (4.19)-(4.20) allow for misspecification ($c_k\to 0$ is not required). In principle, linearization (4.21)-(4.22) also allows for misspecification, but the bounds are most useful if the model is correctly specified so that $(\log k)^{1/2}\ell_k c_k\to 0$. We are not aware of any similar uniform linearization result in the literature. We believe that this result is useful in a variety of problems. Below we use this result to derive a good uniform rate of convergence of the series estimator. Another application of this result would be in testing shape restrictions in the nonparametric model.

The following theorem provides the uniform rate of convergence of the series estimator.

Theorem 4.3 (Uniform Rate of Convergence). Assume that Conditions A.1-A.5 are satisfied. Then

$$\sup_{x\in\mathcal{X}}|\alpha(x)'\mathbb{G}_n[p_i\epsilon_i]| \lesssim_P \sqrt{\log k}. \tag{4.23}$$

Moreover, for $\bar R_{1n}$ and $\bar R_{2n}$ given above we have

$$\sup_{x\in\mathcal{X}}|p(x)'(\hat\beta-\beta)| \lesssim_P \frac{\xi_k}{\sqrt{n}}\big(\sqrt{\log k}+\bar R_{1n}+\bar R_{2n}\big) \tag{4.24}$$

and

$$\sup_{x\in\mathcal{X}}|\hat g(x)-g(x)| \lesssim_P \frac{\xi_k}{\sqrt{n}}\big(\sqrt{\log k}+\bar R_{1n}+\bar R_{2n}\big)+\ell_k c_k. \tag{4.25}$$

Comment 4.5. This is our third main result in this paper. Assume that $\mathcal{G}$ is a ball in $\Sigma_s(\mathcal{X})$ of finite radius, $\ell_k c_k\lesssim k^{-s/d}$, $\xi_k\lesssim\sqrt{k}$, and $\bar R_{1n}+\bar R_{2n}\lesssim(\log k)^{1/2}$. Then the bound in (4.25) becomes

$$\sup_{x\in\mathcal{X}}|\hat g(x)-g(x)| \lesssim_P \sqrt{\frac{k\log k}{n}}+k^{-s/d}.$$

Therefore, setting $k\asymp(\log n/n)^{-d/(2s+d)}$, we obtain

$$\sup_{x\in\mathcal{X}}|\hat g(x)-g(x)| \lesssim_P \left(\frac{\log n}{n}\right)^{s/(2s+d)},$$

which is the optimal uniform rate of convergence in the function class $\Sigma_s(\mathcal{X})$; see Stone (1982). To the best of our knowledge, our paper is the first to show that the series estimator attains the optimal uniform rate of convergence under these rather general conditions; see the next comment. We also note here that it has been known for a long time that a local polynomial (kernel) estimator achieves the same optimal uniform rate of convergence; see, for example, Tsybakov (2009), and it was also shown recently by Cattaneo and Farrell (2013) that the local polynomial partition series estimator also achieves the same rate. Recently, in an effort to relax the independence assumption, the working paper Chen and Christensen (2013), which appeared in ArXiv in 2013, approximately 1 year after our paper was posted to ArXiv and submitted for publication,² derived a similar uniform rate of convergence result allowing for β-mixing conditions; see their Theorem 4.1 for specific conditions.

Comment 4.6. Primitive conditions leading to the inequalities $\ell_k c_k\lesssim k^{-s/d}$ and $\xi_k\lesssim\sqrt{k}$ are discussed in the previous section. Also, under the assumption that $\ell_k c_k\lesssim k^{-s/d}$, the inequality $\bar R_{2n}\lesssim(\log k)^{1/2}$ follows automatically from the definition of $\bar R_{2n}$. Thus, one of the critical conditions to attain the optimal uniform rate of convergence is that we require $\bar R_{1n}\lesssim(\log k)^{1/2}$. Under our other assumptions, this condition holds if $k\log k/n^{1-2/m}\lesssim 1$ and $k^{2-2s/d}/n\lesssim 1$, and so we can set $k\asymp(\log n/n)^{-d/(2s+d)}$ if $d/(2s+d)<1-2/m$ and $(2d-2s)/(2s+d)<1$ or, equivalently, $m>2+d/s$ and $s/d>1/4$.

After establishing the auxiliary results on the uniform rate of convergence, we present two results on inference based on the series estimator. The first result on inference is concerned with the strong approximation of a series process by a Gaussian process and is a (relatively) minor extension of the result obtained by Chernozhukov et al. (2013). The extension is undertaken to allow for a non-vanishing specification error to cover misspecified models. In particular, we make a distinction between $\tilde\Omega=Q^{-1}E[(\epsilon_i+r_i)^2p_ip_i']Q^{-1}$ and $\Omega_0=Q^{-1}E[\epsilon_i^2p_ip_i']Q^{-1}$, which are potentially asymptotically different if $\bar R_{2n}\not\to_P 0$. To state the result, let $a_n$ be some sequence of positive numbers satisfying $a_n\to\infty$.

Theorem 4.4 (Strong Approximation by a Gaussian Process). Assume that Conditions A.1-A.5 are satisfied with $m\ge 3$. In addition, assume that (i) $\bar R_{1n}=o_P(a_n^{-1})$, (ii) $1\lesssim\underline{\sigma}^2$, and (iii) $a_n^6k^4\xi_k^2(1+\ell_k^3c_k^3)^2\log^2 n/n\to 0$.
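Then for some $N_k\sim N(0,I_k)$,

$$\sqrt{n}\,\frac{\alpha(x)'(\hat\beta-\beta)}{\|\alpha(x)'\Omega^{1/2}\|} =_d \frac{\alpha(x)'\Omega^{1/2}}{\|\alpha(x)'\Omega^{1/2}\|}N_k+o_P(a_n^{-1}) \quad\text{in } \ell^\infty(\mathcal{X}), \tag{4.26}$$

so that for $s(x)=\Omega^{1/2}p(x)$,

$$\sqrt{n}\,\frac{p(x)'(\hat\beta-\beta)}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k+o_P(a_n^{-1}) \quad\text{in } \ell^\infty(\mathcal{X}), \tag{4.27}$$

and if $\sup_{x\in\mathcal{X}}\sqrt{n}\,|r(x)|/\|s(x)\|=o(a_n^{-1})$, then

$$\sqrt{n}\,\frac{\hat g(x)-g(x)}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k+o_P(a_n^{-1}) \quad\text{in } \ell^\infty(\mathcal{X}), \tag{4.28}$$

where we set $\Omega=\tilde\Omega$, but if $\bar R_{2n}=o_P(a_n^{-1})$, then we can set $\Omega=\Omega_0$.

² Our paper was submitted for publication and to ArXiv on December 3, 2012. Our result as stated here did not change since the original submission.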
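The approximating Gaussian process in Theorem 4.4, $x\mapsto s(x)'N_k/\|s(x)\|$, is straightforward to simulate once a basis and a matrix $\Omega$ are given. The following is a minimal sketch under stated assumptions: the supremum over $\mathcal{X}$ is replaced by a maximum over a finite grid, $\Omega$ is supplied by the user, and the quantile of the simulated supremum is reported purely as an illustration. The function name `gaussian_sup_quantile` and the power-series basis are hypothetical choices for this example.

```python
import numpy as np


def gaussian_sup_quantile(P_grid, Omega, level=0.95, draws=5000, seed=0):
    """Quantile of max_x |s(x)' N_k| / ||s(x)|| over the rows of P_grid.

    P_grid holds p(x)' for each grid point x; s(x) = Omega^{1/2} p(x).
    """
    rng = np.random.default_rng(seed)
    # Symmetric square root of Omega via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(Omega)
    root = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    S = P_grid @ root                                    # rows are s(x)'
    S = S / np.linalg.norm(S, axis=1, keepdims=True)     # unit-variance process
    N = rng.standard_normal((S.shape[1], draws))         # columns are draws of N_k
    sup_abs = np.abs(S @ N).max(axis=0)                  # max over the grid, per draw
    return float(np.quantile(sup_abs, level))


# Illustrative inputs: cubic power series on [0, 1] and Omega = I.
grid = np.linspace(0.0, 1.0, 101)
P_grid = np.vander(grid, 4, increasing=True)
q95 = gaussian_sup_quantile(P_grid, np.eye(4))
print(q95)
```

Since each coordinate of the normalized process is standard normal and $|s(x)'N_k|/\|s(x)\|\le\|N_k\|$, the simulated quantile lies between the corresponding quantiles of $|N(0,1)|$ and of $\|N_k\|$.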

Comment 4.7. One might hope to have a result of the form

$$\sqrt{n}\,\frac{\hat g(x)-g(x)}{\|s(x)\|} \to_d G(x) \quad\text{in } \ell^\infty(\mathcal{X}), \tag{4.29}$$

where $\{G(x):x\in\mathcal{X}\}$ is some fixed zero-mean Gaussian process. However, one can show that the process on the left-hand side of (4.29) is not asymptotically equicontinuous, and so it does not have a limit distribution. Instead, Theorem 4.4 provides an approximation of the series process by a sequence of zero-mean Gaussian processes $\{G_k(x):x\in\mathcal{X}\}$,

$$G_k(x) := \frac{\alpha(x)'\Omega^{1/2}}{\|\alpha(x)'\Omega^{1/2}\|}N_k,$$

with a stochastic error of size $o_P(a_n^{-1})$. Since $a_n\to\infty$, under our conditions the theorem implies that the series process is well approximated by a Gaussian process, and so the theorem can be interpreted as saying that in large samples, the distribution of the series process depends on the distribution of the data only via the covariance matrix $\Omega$; hence, it allows us to perform inference based on the whole series process. Note that the conditions of the theorem are quite strong in terms of growth requirements on $k$, but the result of the theorem is also much stronger than the pointwise normality result: it asserts that the entire series process is uniformly close to a Gaussian process of the stated form.

Our result on the strong approximation by a Gaussian process plays an important role in our second result on inference, which is concerned with the weighted bootstrap. Consider a set of weights $h_1,\dots,h_n$ that are i.i.d. draws from the standard exponential distribution and are independent of the data. For each draw of such weights, define the weighted bootstrap draw of the least squares estimator as a solution to the least squares problem weighted by $h_1,\dots,h_n$, namely

$$\hat\beta^b \in \arg\min_{b\in\mathbb{R}^k}\mathbb{E}_n[h_i(y_i-p_i'b)^2]. \tag{4.30}$$

For all $x\in\mathcal{X}$, denote $\hat g^b(x)=p(x)'\hat\beta^b$. The following theorem establishes a new result that states that the weighted bootstrap distribution is valid for approximating the distribution of the series process.

Theorem 4.5 (Weighted Bootstrap Method). (1) Assume that Conditions A.1-A.5 are satisfied. In addition, assume that $(\xi_k(\log n)^{1/2})^{2m/(m-2)}\lesssim n$. Then the weighted bootstrap process satisfies

$$\sqrt{n}\,\alpha(x)'(\hat\beta^b-\hat\beta) = \alpha(x)'\mathbb{G}_n[(h_i-1)p_i(\epsilon_i+r_i)]+R_{1n}^b(\alpha(x)),$$

where $R_{1n}^b(\alpha(x))$ obeys

$$R_{1n}^b(\alpha(x)) \lesssim_P \sqrt{\frac{\xi_k^2\log^3 n}{n}}\,\big(n^{1/m}\sqrt{\log n}+\sqrt{k}\,\ell_k c_k\big) =: \bar R_{1n}^b \tag{4.31}$$

uniformly over $x\in\mathcal{X}$.

(2) If, in addition, Conditions A.4 and A.5 are satisfied with $m\ge 3$ and (i) $\bar R_{1n}^b=o_P(a_n^{-1})$, (ii) $1\lesssim\underline{\sigma}^2$, and (iii) $a_n^6k^4\xi_k^2(1+\ell_k^3c_k^3)^2\log^2 n/n\to 0$ hold, then for $s(x)=\Omega^{1/2}p(x)$ and some $N_k\sim N(0,I_k)$,

$$\sqrt{n}\,\frac{p(x)'(\hat\beta^b-\hat\beta)}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k+o_P(a_n^{-1}) \quad\text{in } \ell^\infty(\mathcal{X}), \tag{4.32}$$

and so

$$\sqrt{n}\,\frac{\hat g^b(x)-\hat g(x)}{\|s(x)\|} =_d \frac{s(x)'}{\|s(x)\|}N_k+o_P(a_n^{-1}) \quad\text{in } \ell^\infty(\mathcal{X}), \tag{4.33}$$

where we set $\Omega=\tilde\Omega$, but if $\bar R_{2n}=o_P(a_n^{-1})$, then we can set $\Omega=\Omega_0$.

(3) Moreover, the bounds (4.31), (4.32), and (4.33) continue to hold in $P$-probability if we replace the unconditional probability $P$ by the conditional probability computed given the data, namely if we replace $P$ by $P(\cdot\mid D)$ where $D=\{(x_i,y_i):i=1,\dots,n\}$.

Comment 4.8. (i) This is our fourth main and new result in this paper. The theorem implies that the weighted bootstrap process can be approximated by a copy of the same Gaussian process as that used to approximate the original series process.

(ii) We emphasize that the theorem does not require correct specification, that is, the case $c_k\not\to 0$ is allowed. Also, in this theorem, the symbol $P$ refers to a joint probability measure with respect to the data $D=\{(x_i,y_i):i=1,\dots,n\}$ and the set of bootstrap weights $\{h_i:i=1,\dots,n\}$.

We close this section by establishing sufficient conditions for consistent estimation of $\Omega$. Recall that $Q=E[p_ip_i']=I$. In addition, denote $\Sigma=E[(\epsilon_i+r_i)^2p_ip_i']$, $\hat Q=\mathbb{E}_n[p_ip_i']$, and $\hat\Sigma=\mathbb{E}_n[\hat\epsilon_i^2p_ip_i']$ where $\hat\epsilon_i=y_i-p_i'\hat\beta$, and let $v_n=(E[\max_{1\le i\le n}|\epsilon_i|^2])^{1/2}$.

Theorem 4.6 (Matrices Estimation). Assume that Conditions A.1-A.5 are satisfied. In addition, assume that $\bar R_{1n}+\bar R_{2n}\lesssim(\log k)^{1/2}$. Then

$$\|\hat Q-Q\| \lesssim_P \sqrt{\frac{\xi_k^2\log k}{n}} = o(1) \quad\text{and}\quad \|\hat\Sigma-\Sigma\| \lesssim_P (v_n\vee 1+\ell_k c_k)\sqrt{\frac{\xi_k^2\log k}{n}} = o(1).$$

Moreover, for $\hat\Omega=\hat Q^{-1}\hat\Sigma\hat Q^{-1}$ and $\Omega=Q^{-1}\Sigma Q^{-1}$,

$$\|\hat\Omega-\Omega\| \lesssim_P (v_n\vee 1+\ell_k c_k)\sqrt{\frac{\xi_k^2\log k}{n}} = o(1).$$

Comment 4.9. Theorem 4.6 allows for consistent estimation of the matrix $Q$ under the mild condition $\xi_k^2\log k/n\to 0$ and for consistent estimation of the matrices $\Sigma$ and $\Omega$ under somewhat more restrictive conditions. Not surprisingly, the estimation of $\Sigma$ and $\Omega$ depends on the tail behavior of the error term via the value of $v_n$. Note that under Condition A.4, we have that $v_n\lesssim n^{1/m}$.

5. Rates and Inference on Linear Functionals

In this section, we derive rates and inference results for linear functionals $\theta(w)$, $w\in\mathcal{I}$, of the conditional expectation function such as its derivative, average derivative, or conditional average derivative. To a large extent, with the exception of Theorem 5.6, the results presented in this section can be considered as an extension of results presented in Section 4, and so similar comments can be applied as those given in Section 4. Theorem 5.6 deals with the construction of uniform confidence bands for linear functionals under weak conditions and is a new result.

By the linearity of the series approximations, the linear functionals can be seen as linear functions of the least squares coefficients $\beta$ up to an approximation error, that is

$$\theta(w) = \ell_\theta(w)'\beta+r_\theta(w), \quad w\in\mathcal{I},$$

where $\ell_\theta(w)'\beta$ is the series approximation, with $\ell_\theta(w)$ denoting the $k$-vector of loadings on the coefficients, and $r_\theta(w)$ is the remainder term, which corresponds to the approximation error. Throughout this section, we assume that $\mathcal{I}$ is a subset of some Euclidean space $\mathbb{R}^l$ equipped with its usual norm. We allow $\mathcal{I}=\mathcal{I}_n$ to depend on $n$, but for simplicity, we assume that the diameter of $\mathcal{I}$ is bounded from above uniformly over $n$. Results allowing for the case where $\mathcal{I}$ is expanding as $n$ grows can be covered as well with slightly more technicalities.

In order to perform inference, we construct estimators of $\sigma_\theta^2(w)=\ell_\theta(w)'\Omega\ell_\theta(w)/n$, the variance of the associated linear functionals, as

$$\hat\sigma_\theta^2(w) = \ell_\theta(w)'\hat\Omega\ell_\theta(w)/n. \tag{5.34}$$

In what follows, it will be convenient to have the following result on consistency of $\hat\sigma_\theta(w)$: