On Binscatter Supplemental Appendix


Matias D. Cattaneo (Department of Economics and Department of Statistics, University of Michigan)
Richard K. Crump (Capital Markets Function, Federal Reserve Bank of New York)
Max H. Farrell (Booth School of Business, University of Chicago)
Yingjie Feng (Department of Economics, University of Michigan)

February 12, 2019

Abstract

This supplement collects all technical proofs, more general theoretical results than those reported in the main paper, and other methodological and numerical results. New theoretical results for partitioning-based series estimation are obtained that may be of independent interest. See also the Stata and R companion software.

Contents

SA-1 Setup
  SA-1.1 Assumptions
  SA-1.2 Notation
SA-2 Technical Lemmas
SA-3 Main Results
  SA-3.1 Integrated Mean Squared Error
  SA-3.2 Pointwise Inference
  SA-3.3 Uniform Inference
  SA-3.4 Applications
SA-4 Implementation Details
  SA-4.1 Rule-of-thumb Selector
  SA-4.2 Direct-plug-in Selector
SA-5 Proof
  SA-5.1 Proof of Lemma SA-2.1
  SA-5.2 Proof of Lemma SA-2.2
  SA-5.3 Proof of Lemma SA-2.3
  SA-5.4 Proof of Lemma SA-2.4
  SA-5.5 Proof of Lemma SA-2.5
  SA-5.6 Proof of Lemma SA-2.6
  SA-5.7 Proof of Lemma SA-2.7
  SA-5.8 Proof of Lemma SA-2.8
  SA-5.9 Proof of Lemma SA-2.9
  SA-5.10 Proof of Lemma SA-2.10
  SA-5.11 Proof of Theorem SA-3.1
  SA-5.12 Proof of Corollary SA-3.1
  SA-5.13 Proof of Theorem SA-3.2
  SA-5.14 Proof of Corollary SA-3.2
  SA-5.15 Proof of Theorem SA-3.3
  SA-5.16 Proof of Theorem SA-3.4
  SA-5.17 Proof of Theorem SA-3.5
  SA-5.18 Proof of Corollary SA-3.3
  SA-5.19 Proof of Theorem SA-3.6

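SA-1 Setup

This section repeats the setup in the main paper and introduces some notation for the main analysis. Suppose that $\{(y_i, x_i, \mathbf{w}_i'): 1 \le i \le n\}$ is a random sample satisfying the following regression model

$$y_i = \mu(x_i) + \mathbf{w}_i'\gamma + \epsilon_i, \qquad \mathbb{E}[\epsilon_i \mid x_i, \mathbf{w}_i] = 0, \tag{SA-1.1}$$

where $y_i$ is a scalar response variable, $x_i$ is a scalar covariate, $\mathbf{w}_i$ is a vector of additional control variables of dimension $d$, and the parameter of interest is the nonparametric component $\mu(\cdot)$.

Binscatter estimators are usually constructed based on quantile-spaced partitions. Specifically, the relevant support of $x_i$ is partitioned into $J$ disjoint intervals employing the empirical quantiles, leading to the partitioning scheme $\hat\Delta = \{\hat{B}_1, \hat{B}_2, \ldots, \hat{B}_J\}$, where

$$\hat{B}_j = \begin{cases} \big[x_{(1)},\ x_{(\lfloor n/J \rfloor)}\big) & \text{if } j = 1, \\ \big[x_{(\lfloor n(j-1)/J \rfloor)},\ x_{(\lfloor nj/J \rfloor)}\big) & \text{if } j = 2, 3, \ldots, J-1, \\ \big[x_{(\lfloor n(J-1)/J \rfloor)},\ x_{(n)}\big] & \text{if } j = J, \end{cases}$$

$x_{(i)}$ denotes the $i$-th order statistic of the sample $\{x_1, x_2, \ldots, x_n\}$, and $\lfloor \cdot \rfloor$ is the floor operator. The number of bins $J$ plays the role of the tuning parameter for the binscatter method and is assumed to diverge: $J \to \infty$ as $n \to \infty$ throughout the supplement, unless explicitly stated otherwise.

In the main paper, the $p$-th order piecewise polynomial basis, for some choice of $p = 0, 1, 2, \ldots$, is defined as

$$\hat{\mathbf{b}}(x) = \big[\mathbb{1}_{\hat{B}_1}(x),\ \mathbb{1}_{\hat{B}_2}(x),\ \ldots,\ \mathbb{1}_{\hat{B}_J}(x)\big]' \otimes \big[1,\ x,\ \ldots,\ x^p\big]',$$

where $\mathbb{1}_A(x) = \mathbb{1}(x \in A)$, with $\mathbb{1}(\cdot)$ denoting the indicator function, and $\otimes$ is the tensor (Kronecker) product operator. Without loss of generality, we redefine $\hat{\mathbf{b}}(x)$ as a standardized rotated basis for convenience of analysis. Specifically, for each $\alpha = 0, \ldots, p$ and $j = 1, \ldots, J$, the polynomial basis of degree $\alpha$ supported on $\hat{B}_j$ is rotated and rescaled:

$$\mathbb{1}_{\hat{B}_j}(x)\, x^{\alpha} \;\mapsto\; \sqrt{J}\, \mathbb{1}_{\hat{B}_j}(x) \left(\frac{x - x_{(\lfloor n(j-1)/J \rfloor)}}{\hat{h}_j}\right)^{\alpha},$$

where $\hat{h}_j = x_{(\lfloor nj/J \rfloor)} - x_{(\lfloor n(j-1)/J \rfloor)}$. Thus, each local polynomial is centered at the start of each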
bin and scaled by the length of the bin. The factor $\sqrt{J}$ is an additional scaling that simplifies some expressions in our results. We keep the notation $\hat{\mathbf{b}}(x)$ for this redefined basis, since it is equivalent to the original one in the sense that both span the same (linear) function space.

Imposing the restriction that the estimated function is $(s-1)$-times continuously differentiable for $1 \le s \le p$, we introduce a new basis

$$\hat{\mathbf{b}}_s(x) = \big(\hat{b}_{s,1}(x), \ldots, \hat{b}_{s,K_s}(x)\big)' = \hat{T}_s \hat{\mathbf{b}}(x), \qquad K_s = (p+1)J - s(J-1),$$

where $\hat{T}_s := T_s(\hat\Delta)$ is a $K_s \times (p+1)J$ matrix depending on $\hat\Delta$, which transforms a piecewise polynomial basis into a smoothed binscatter basis. When $s = 0$, we let $\hat{T}_0 = I_{(p+1)J}$, the identity matrix of dimension $(p+1)J$. Thus $\hat{\mathbf{b}}_0(x) = \hat{\mathbf{b}}(x)$, the discontinuous basis without any constraints. When $s = p$, $\hat{\mathbf{b}}_p(x)$ is the well-known B-spline basis of order $p+1$ with simple knots. When $0 < s < p$, the basis can be defined similarly as B-splines with knots of certain multiplicities; see Definition 4.1 in Section 4 of Schumaker (2007) for more details. Note that we require $s \le p$, since if $s = p+1$, $\hat{\mathbf{b}}_s(x)$ reduces to a global polynomial basis of degree $p$.

A key feature of the transformation matrix $\hat{T}_s$ employed in the analysis is that every row has at most $(p+1)^2$ nonzeros and every column has at most $p+1$ nonzeros. The explicit expressions of these elements are very cumbersome. The proof of Lemma SA-2.2 describes the structure of $\hat{T}_s$ in more detail and provides an explicit representation for $\hat{T}_p$.

Given such a choice of basis, a covariate-adjusted (generalized) binscatter estimator is

$$\hat\mu^{(v)}(x) = \hat{\mathbf{b}}_s^{(v)}(x)'\hat\beta, \qquad (\hat\beta', \hat\gamma')' = \arg\min_{\beta, \gamma} \sum_{i=1}^{n} \big(y_i - \hat{\mathbf{b}}_s(x_i)'\beta - \mathbf{w}_i'\gamma\big)^2, \qquad s \le p, \tag{SA-1.2}$$

where $\hat{\mathbf{b}}_s^{(v)}(x) = d^v \hat{\mathbf{b}}_s(x)/dx^v$ for some $v \in \mathbb{Z}_+$ such that $v \le p$.

SA-1.1 Assumptions

We impose the following assumption on the data generating process, which is more general than the one presented in the main paper. We use $\lambda_{\min}(A)$ to denote the minimum eigenvalue of a square matrix $A$.

Assumption SA-1 (Data Generating Process).

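(i) $\{(y_i, x_i, \mathbf{w}_i'): 1 \le i \le n\}$ are i.i.d. satisfying (SA-1.1), and $x_i$ follows a distribution function $F(x)$ with a continuous (Lebesgue) density $f(x)$ bounded away from zero; $\mu(\cdot)$ is $(p+1)$-times continuously differentiable;
(ii) $\sigma^2(x) := \mathbb{E}[\epsilon_i^2 \mid x_i = x]$ is continuous and bounded away from zero, and $\sup_{x \in \mathcal{X}} \mathbb{E}[|\epsilon_i|^{\nu} \mid x_i = x] \lesssim 1$ for some $\nu > 2$;
(iii) $\mathbb{E}[\mathbf{w}_i \mid x_i = x]$ is $\varsigma$-times continuously differentiable for some $\varsigma \ge 1$, $\sup_{x \in \mathcal{X}} \mathbb{E}[\|\mathbf{w}_i\|^{\nu} \mid x_i = x] \lesssim 1$, $\mathbb{E}[\|\mathbf{w}_i - \mathbb{E}[\mathbf{w}_i \mid x_i]\|^4 \mid x_i] \lesssim 1$, $\lambda_{\min}\big(\mathbb{E}[(\mathbf{w}_i - \mathbb{E}[\mathbf{w}_i \mid x_i])(\mathbf{w}_i - \mathbb{E}[\mathbf{w}_i \mid x_i])' \mid x_i]\big) \gtrsim 1$, and $\mathbb{E}[\epsilon_i^2 \mid \mathbf{w}_i, x_i] \lesssim 1$.

Parts (i) and (ii) are standard conditions employed in the nonparametric series literature (Cattaneo, Farrell, and Feng, 2018, and references therein). Part (iii) includes a set of conditions similar to those used in Cattaneo, Jansson, and Newey (2018a,b) to analyze the semiparametric partially linear regression model, which ensures the negligibility of the estimation error of $\hat\gamma$. The conditions in Part (i) imply that the (relevant) support of $x_i$, denoted as $\mathcal{X}$, is a compact interval. Without loss of generality, it is normalized to $[0, 1]$ throughout the supplemental appendix.

SA-1.2 Notation

For vectors, $\|\cdot\|$ denotes the Euclidean norm, $\|\cdot\|_\infty$ denotes the sup-norm, and $\|\cdot\|_0$ denotes the number of nonzeros. For matrices, $\|\cdot\|$ is the operator matrix norm induced by the $L_2$ norm, and $\|\cdot\|_\infty$ is the matrix norm induced by the supremum norm, i.e., the maximum absolute row sum of a matrix. For a square matrix $A$, $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the maximum and minimum eigenvalues of $A$, respectively. We use $\mathcal{S}^L$ to denote the unit sphere in $\mathbb{R}^L$, i.e., $\|\mathbf{a}\| = 1$ for any $\mathbf{a} \in \mathcal{S}^L$. For a real-valued function $g(\cdot)$ defined on a measure space $\mathcal{Z}$, let $\|g\|_{Q,2} := (\int_{\mathcal{Z}} g^2 \, dQ)^{1/2}$ be its $L_2$-norm with respect to the measure $Q$. In addition, let $\|g\|_\infty = \sup_{z \in \mathcal{Z}} |g(z)|$ be the $L_\infty$-norm of $g(\cdot)$, and let $g^{(v)}(z) = d^v g(z)/dz^v$ be its $v$-th derivative for $v \ge 0$. For sequences of numbers or random variables, we use $a_n \lesssim b_n$ to denote that $\limsup_n |a_n/b_n|$ is finite, $a_n \lesssim_{\mathbb{P}} b_n$ or $a_n = O_{\mathbb{P}}(b_n)$ to denote $\lim_{\varepsilon \to \infty} \limsup_n \mathbb{P}[|a_n/b_n| \ge \varepsilon] = 0$; $a_n = o(b_n)$ implies $a_n/b_n \to 0$, and $a_n = o_{\mathbb{P}}(b_n)$ implies that $a_n/b_n \to_{\mathbb{P}} 0$, where $\to_{\mathbb{P}}$ denotes convergence in probability. $a_n \asymp b_n$ implies that $a_n \lesssim b_n$ and $b_n \lesssim a_n$. For two random variables $X$ and $Y$, $X =_d Y$ implies that they have the same probability distribution.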
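To make the construction of Section SA-1 concrete, the following is a minimal NumPy sketch of the quantile-spaced partition $\hat\Delta$ and the rotated, rescaled piecewise polynomial basis $\hat{\mathbf{b}}_0(x)$ (the case $s = 0$ only). The function names are hypothetical and the code is not the authors' companion software. It assumes $n \ge J$ and no ties in $x$, as a continuous density implies.

```python
import numpy as np

# Hypothetical helper names; an illustration of Section SA-1, s = 0 only.

def quantile_breaks(x, J):
    """Breakpoints x_(1), x_(floor(nj/J)) for j = 1, ..., J-1, and x_(n)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    # Order statistics are 1-indexed in the text, hence the "- 1" for NumPy.
    inner = [xs[(n * j) // J - 1] for j in range(1, J)]
    return np.concatenate(([xs[0]], inner, [xs[-1]]))

def bin_index(x, breaks):
    """0-based bin index: bins are [tau_{j-1}, tau_j), and the last bin is closed."""
    J = breaks.size - 1
    idx = np.searchsorted(breaks, x, side="right") - 1
    return np.clip(idx, 0, J - 1)

def rotated_basis(x, breaks, p):
    """sqrt(J) * 1{x in B_j} * ((x - tau_{j-1}) / h_j)^alpha, alpha = 0..p.

    Columns are ordered bin by bin, matching the Kronecker product
    [1_{B_1}, ..., 1_{B_J}]' (x) [1, x, ..., x^p]'.
    """
    x = np.asarray(x, dtype=float)
    J = breaks.size - 1
    h = np.diff(breaks)                  # bin lengths h_j
    j = bin_index(x, breaks)
    u = (x - breaks[j]) / h[j]           # relative position inside the bin, in [0, 1]
    B = np.zeros((x.size, (p + 1) * J))
    rows = np.arange(x.size)
    for alpha in range(p + 1):
        B[rows, j * (p + 1) + alpha] = np.sqrt(J) * u**alpha
    return B

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
breaks = quantile_breaks(x, J=10)
B = rotated_basis(x, breaks, p=2)        # one block of p + 1 = 3 columns per bin
```

The inner breakpoints are themselves order statistics. With $n = 1000$ and $J = 10$, each bin therefore holds between 99 and 101 observations.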

We employ standard empirical process notation: $\mathbb{E}_n[g(x_i)] = \frac{1}{n}\sum_{i=1}^{n} g(x_i)$ and $\mathbb{G}_n[g(x_i)] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \big(g(x_i) - \mathbb{E}[g(x_i)]\big)$. In addition, we employ the notion of covering number extensively in the proofs. Specifically, consider a measurable space $(S, \mathcal{S})$ and a suitably measurable class of functions $\mathcal{G}$ mapping $S$ to $\mathbb{R}$, equipped with a measurable envelope function $\bar{G}(z) \ge \sup_{g \in \mathcal{G}} |g(z)|$. The covering number $N(\mathcal{G}, L_2(Q), \varepsilon)$ is the minimal number of $L_2(Q)$-balls of radius $\varepsilon$ needed to cover $\mathcal{G}$. The covering number of $\mathcal{G}$ relative to the envelope is denoted as $N(\mathcal{G}, L_2(Q), \varepsilon \|\bar{G}\|_{Q,2})$. Given the random partition $\hat\Delta$, we use $\mathbb{E}_{\hat\Delta}[\cdot]$ to denote an expectation taken with the partition understood as fixed.

To further simplify notation, we let $\{\hat\tau_0 \le \hat\tau_1 \le \cdots \le \hat\tau_J\}$ denote the empirical quantile sequence employed by $\hat\Delta$. Accordingly, let $\{\tau_0 \le \cdots \le \tau_J\}$ be the population quantile sequence, i.e., $\tau_j = F^{-1}(j/J)$ for $0 \le j \le J$. Then $\Delta = \{B_1, \ldots, B_J\}$ denotes the partition based on population quantiles, i.e.,

$$B_j = \begin{cases} [\tau_0, \tau_1) & \text{if } j = 1, \\ [\tau_{j-1}, \tau_j) & \text{if } j = 2, 3, \ldots, J-1, \\ [\tau_{J-1}, \tau_J] & \text{if } j = J. \end{cases}$$

Let $h_j = F^{-1}(j/J) - F^{-1}((j-1)/J)$ be the width of $B_j$, and let $\mathbf{b}_s(x)$ denote the (smooth) binscatter basis based on the nonrandom partition $\Delta$. Moreover, the $x_i$'s are collected in a matrix $\mathbf{X} = [x_1, \ldots, x_n]'$, all the data are collected in $\mathbf{D} = \{(y_i, x_i, \mathbf{w}_i'): 1 \le i \le n\}$, $\lceil z \rceil$ outputs the smallest integer no less than $z$, and $a \wedge b = \min\{a, b\}$. Finally, we sometimes write $\mathbf{b}_s(x; \Delta) = (b_{s,1}(x; \Delta), \ldots, b_{s,K_s}(x; \Delta))'$ to emphasize that a binscatter basis is constructed based on a particular partition $\Delta$. Clearly, $\hat{\mathbf{b}}_s(x) = \mathbf{b}_s(x; \hat\Delta)$ and $\mathbf{b}_s(x) = \mathbf{b}_s(x; \Delta)$.

SA-2 Technical Lemmas

This section collects a set of technical lemmas, which are key ingredients of our main theorems. The following expression of the coefficient estimators, also known as backfitting in the statistics literature, will be quite convenient for the theoretical analysis:

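$$\hat\beta = (\hat{B}'\hat{B})^{-1}\hat{B}'(Y - W\hat\gamma), \qquad \hat\gamma = (W'M_{\hat{B}}W)^{-1}(W'M_{\hat{B}}Y),$$

where $Y = (y_1, \ldots, y_n)'$, $\hat{B} = (\hat{\mathbf{b}}_s(x_1), \ldots, \hat{\mathbf{b}}_s(x_n))'$, $W = (\mathbf{w}_1, \ldots, \mathbf{w}_n)'$, and $M_{\hat{B}} = I_n - \hat{B}(\hat{B}'\hat{B})^{-1}\hat{B}'$.

It is well known that the least squares estimator provides a best linear approximation to the target function. For any given partition $\Delta$, the population least squares coefficient is defined as $\beta_\mu(\Delta) := \arg\min_{\beta} \mathbb{E}[(\mu(x_i) - \mathbf{b}_s(x_i; \Delta)'\beta)^2]$. Accordingly, $r_\mu(x; \Delta) = \mu(x) - \mathbf{b}_s(x; \Delta)'\beta_\mu(\Delta)$ denotes the $L_2$ approximation error. We let $\bar\beta_\mu := \beta_\mu(\hat\Delta)$, $\beta_\mu := \beta_\mu(\Delta)$, $\bar{r}_\mu(x) := r_\mu(x; \hat\Delta)$ and $r_\mu(x) := r_\mu(x; \Delta)$. In addition, we introduce the following definitions:

$$\hat{Q} := \mathbb{E}_n[\hat{\mathbf{b}}_s(x_i)\hat{\mathbf{b}}_s(x_i)'], \qquad Q := Q(\Delta) := \mathbb{E}[\mathbf{b}_s(x_i)\mathbf{b}_s(x_i)'],$$

$$\hat\Sigma := \mathbb{E}_n\big[\hat{\mathbf{b}}_s(x_i)\hat{\mathbf{b}}_s(x_i)'(y_i - \hat{\mathbf{b}}_s(x_i)'\hat\beta - \mathbf{w}_i'\hat\gamma)^2\big], \qquad \bar\Sigma := \mathbb{E}_n\big[\mathbb{E}[\hat{\mathbf{b}}_s(x_i)\hat{\mathbf{b}}_s(x_i)'\epsilon_i^2 \mid \mathbf{X}]\big], \qquad \Sigma := \Sigma(\Delta) := \mathbb{E}[\mathbf{b}_s(x_i)\mathbf{b}_s(x_i)'\epsilon_i^2],$$

$$\bar\Omega(x) := \Omega(x; \hat\Delta) := \hat{\mathbf{b}}_s^{(v)}(x)'\hat{Q}^{-1}\bar\Sigma\hat{Q}^{-1}\hat{\mathbf{b}}_s^{(v)}(x), \qquad \Omega(x) := \Omega(x; \Delta) := \mathbf{b}_s^{(v)}(x)'Q^{-1}\Sigma Q^{-1}\mathbf{b}_s^{(v)}(x).$$

All quantities with a hat or a bar depend on the random partition $\hat\Delta$; in particular, $\bar\Omega(x)$ depends on $\hat\Delta$ through the basis $\hat{\mathbf{b}}_s^{(v)}(x)$. Quantities without accents are nonrandom.

The asymptotic properties of partitioning-based estimators rely on a partition that is not too irregular (Cattaneo, Farrell, and Feng, 2018). In the binscatter setting, we let $\bar{f} = \sup_{x \in \mathcal{X}} f(x)$ and $\underline{f} = \inf_{x \in \mathcal{X}} f(x)$, and for any partition $\Delta$ with $J$ bins, we let $h_j(\Delta)$ denote the length of the $j$-th bin in $\Delta$. Then we introduce the family of partitions

$$\Pi = \left\{\Delta : \frac{\max_{1 \le j \le J} h_j(\Delta)}{\min_{1 \le j \le J} h_j(\Delta)} \le \frac{3\bar{f}}{\underline{f}}\right\}. \tag{SA-2.1}$$

Intuitively, if a partition belongs to $\Pi$, then the lengths of its bins do not differ too much, a property usually referred to as quasi-uniformity in approximation theory. Our first lemma shows that a quantile-spaced partition possesses this property with probability approaching one.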
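Before the lemmas, the backfitting representation of $(\hat\beta, \hat\gamma)$ above can be checked numerically against the joint least squares fit in (SA-1.2). The sketch below uses a hypothetical simulated design, a piecewise linear basis on quantile bins, and a hypothetical helper `backfit`. It forms $M_{\hat B}A$ as least-squares residuals of $A$ on $\hat B$ rather than building the $n \times n$ projection matrix. Any full-column-rank basis would give the same agreement, since the identity is the Frisch-Waugh-Lovell theorem.

```python
import numpy as np

def backfit(Y, B, W):
    """Backfitting form of (SA-1.2) (hypothetical helper):
    gamma_hat = (W' M_B W)^{-1} W' M_B Y,  beta_hat = (B'B)^{-1} B'(Y - W gamma_hat).
    M_B A is computed as least-squares residuals of A on B."""
    def resid(A):
        coef, *_ = np.linalg.lstsq(B, A, rcond=None)
        return A - B @ coef
    MW, MY = resid(W), resid(Y)
    gamma = np.linalg.solve(W.T @ MW, W.T @ MY)
    beta, *_ = np.linalg.lstsq(B, Y - W @ gamma, rcond=None)
    return beta, gamma

# Simulated data (hypothetical design): piecewise linear basis on J quantile bins.
rng = np.random.default_rng(1)
n, J = 500, 8
x = rng.uniform(size=n)
W = rng.normal(size=(n, 2)) + x[:, None]          # controls correlated with x
Y = np.sin(2 * np.pi * x) + W @ np.array([0.5, -1.0]) + 0.1 * rng.normal(size=n)
edges = np.quantile(x, np.linspace(0, 1, J + 1))
j = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, J - 1)
B = np.zeros((n, 2 * J))
B[np.arange(n), 2 * j] = 1.0                      # local intercepts
B[np.arange(n), 2 * j + 1] = x - edges[j]         # local slopes

beta, gamma = backfit(Y, B, W)
joint, *_ = np.linalg.lstsq(np.hstack([B, W]), Y, rcond=None)
```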

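Lemma SA-2.1 (Quasi-Uniformity of Quantile-Spaced Partitions). Suppose that Assumption SA-1(i) holds. If $\frac{J\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then (i) $\max_{1 \le j \le J} |\hat{h}_j - h_j| \lesssim_{\mathbb{P}} J^{-1}\sqrt{J\log J/n}$, and (ii) $\hat\Delta \in \Pi$ with probability approaching one.

As discussed previously, $\hat{T}_s$ links the more complex spline basis with a simple piecewise polynomial basis. Recall that $\hat{T}_s = T_s(\hat\Delta)$ depends on the quantile-based partition. The next lemma describes its key features and gives a precise definition of $T_s := T_s(\Delta)$, the transformation matrix corresponding to the nonrandom basis $\mathbf{b}_s(x)$, i.e., $\mathbf{b}_s(x) = T_s\mathbf{b}_0(x)$.

Lemma SA-2.2 (Transformation Matrix). Suppose that Assumption SA-1(i) holds. If $\frac{J\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $\hat{\mathbf{b}}_s(x) = \hat{T}_s\hat{\mathbf{b}}_0(x)$ with $\|\hat{T}_s\|_\infty \lesssim_{\mathbb{P}} 1$, $\|\hat{T}_s\| \lesssim_{\mathbb{P}} 1$, $\|\hat{T}_s - T_s\|_\infty \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$, and $\|\hat{T}_s - T_s\| \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$.

The next lemma characterizes the local basis $\hat{\mathbf{b}}_s(x)$ and the associated Gram matrix.

Lemma SA-2.3 (Local Basis). Suppose that Assumption SA-1(i) holds. Then $\sup_{x \in \mathcal{X}} \|\mathbf{b}_s^{(v)}(x)\|_0 \le (p+1)^2$, and $1 \lesssim \lambda_{\min}(Q) \le \lambda_{\max}(Q) \lesssim 1$. If, in addition, $\frac{J\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $\sup_{x \in \mathcal{X}} \|\hat{\mathbf{b}}_s^{(v)}(x)\| \lesssim_{\mathbb{P}} J^{\frac{1}{2}+v}$, $\|\hat{Q} - Q\| \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$, $\|\hat{Q}^{-1}\|_\infty \lesssim_{\mathbb{P}} 1$, and $\|\hat{Q}^{-1} - Q^{-1}\|_\infty \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$.

The next lemma shows that the limiting variance is bounded from above and below if properly scaled, which is key to pointwise and uniform inference. Recall that $\bar\Omega(x) = \Omega(x; \hat\Delta)$ and $\Omega(x) = \Omega(x; \Delta)$.

Lemma SA-2.4 (Asymptotic Variance). Suppose that Assumption SA-1(i)-(ii) hold. If $\frac{J\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $J^{1+2v} \lesssim_{\mathbb{P}} \inf_{x \in \mathcal{X}} \bar\Omega(x) \le \sup_{x \in \mathcal{X}} \bar\Omega(x) \lesssim_{\mathbb{P}} J^{1+2v}$, and $J^{1+2v} \lesssim \inf_{x \in \mathcal{X}} \Omega(x) \le \sup_{x \in \mathcal{X}} \Omega(x) \lesssim J^{1+2v}$.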
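Lemmas SA-2.1 and SA-2.3 can be illustrated numerically in a simple case. The sketch below uses a hypothetical helper name, a uniform design, $p = 1$ and $s = 0$. It computes $\hat Q$ and the ratio $\max_j \hat h_j / \min_j \hat h_j$ for several values of $J$.

For uniform $x$, each diagonal block of $\hat Q$ should be close to the $2 \times 2$ Hilbert matrix $[1/(a+b+1)]_{a,b=0,1}$, whose eigenvalues are about 0.066 and 1.268. The extreme eigenvalues of $\hat Q$ should therefore stay near these values as $J$ grows, in line with $1 \lesssim \lambda_{\min}(Q) \le \lambda_{\max}(Q) \lesssim 1$. Because each observation loads on a single bin, the off-block entries are exactly zero.

```python
import numpy as np

def gram_s0(x, J, p):
    """Empirical Gram matrix Q_hat = E_n[b_0(x_i) b_0(x_i)'] for the rotated s = 0
    basis on the quantile-spaced partition, plus the bin lengths (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    xs = np.sort(x)
    n = xs.size
    br = np.r_[xs[0], [xs[n * j // J - 1] for j in range(1, J)], xs[-1]]
    j = np.clip(np.searchsorted(br, x, side="right") - 1, 0, J - 1)
    h = np.diff(br)
    u = (x - br[j]) / h[j]
    B = np.zeros((n, (p + 1) * J))
    for a in range(p + 1):
        B[np.arange(n), j * (p + 1) + a] = np.sqrt(J) * u**a
    return B.T @ B / n, h

rng = np.random.default_rng(2)
x = rng.uniform(size=20000)
for J in (5, 10, 20):
    Q, h = gram_s0(x, J, p=1)
    eig = np.linalg.eigvalsh(Q)
    # smallest / largest eigenvalue of Q_hat and the bin-length ratio of Lemma SA-2.1
    print(J, eig.min(), eig.max(), h.max() / h.min())
```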

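As explained before, $r_\mu(x)$ is understood as the $L_2$ approximation error of least squares estimators for $\mu(x)$. The next two lemmas establish bounds on $\bar{r}_\mu(x)$ and on its projection onto the space spanned by $\hat{\mathbf{b}}_s(x)$ in terms of the sup-norm.

Lemma SA-2.5 ($L_2$ Approximation Error). Under Assumption SA-1(i), if $\frac{J\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $\sup_{x \in \mathcal{X}} |\hat{\mathbf{b}}_s^{(v)}(x)'\bar\beta_\mu - \mu^{(v)}(x)| \lesssim_{\mathbb{P}} J^{-p-1+v}$.

Lemma SA-2.6 (Projection of $L_2$ Approximation Error). Under Assumption SA-1(i), if $\frac{J\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $\sup_{x \in \mathcal{X}} \big|\hat{\mathbf{b}}_s^{(v)}(x)'\hat{Q}^{-1}\mathbb{E}_n[\hat{\mathbf{b}}_s(x_i)\bar{r}_\mu(x_i)]\big| \lesssim_{\mathbb{P}} \sqrt{\frac{J\log J}{n}}\, J^{-p-1+v}$.

The next lemma gives a bound on the variance component of the binscatter estimator, which is the main building block of uniform convergence.

Lemma SA-2.7 (Uniform Convergence: Variance). Suppose that Assumption SA-1(i)-(ii) hold. If $\frac{J^{\frac{\nu}{\nu-2}}\log J}{n} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $\sup_{x \in \mathcal{X}} \big|\hat{\mathbf{b}}_s^{(v)}(x)'\hat{Q}^{-1}\mathbb{E}_n[\hat{\mathbf{b}}_s(x_i)\epsilon_i]\big| \lesssim_{\mathbb{P}} \sqrt{\frac{J\log J}{n}}\, J^{v}$.

Let $\{a_n : n \ge 1\}$ be a sequence of non-vanishing constants, which will be used later to characterize the strong approximation rate. The next lemma shows that, under certain conditions, the estimation of $\gamma$ does not affect the asymptotic inference on the nonparametric component.

Lemma SA-2.8 (Covariate Adjustment). Suppose that Assumption SA-1 holds. If $\frac{J\log J}{n} = o(1)$, $\frac{a_n}{\sqrt{J}} = o(1)$, $a_n\sqrt{n}\, J^{-p-1-(\varsigma \wedge (p+1))} = o(1)$ and $\frac{\log n}{J} = o(1)$, then $\hat\gamma - \gamma = o_{\mathbb{P}}(a_n^{-1}\sqrt{J/n})$, and $\big\|\hat{\mathbf{b}}_s^{(v)}(x)'\hat{Q}^{-1}\mathbb{E}_n[\hat{\mathbf{b}}_s(x_i)\mathbf{w}_i']\big\| \lesssim_{\mathbb{P}} J^{v}$ for each $x \in \mathcal{X}$. If, in addition, $\frac{J^{\frac{\nu}{\nu-2}}\log J}{n} \lesssim 1$, then $\sup_{x \in \mathcal{X}} \big\|\hat{\mathbf{b}}_s^{(v)}(x)'\hat{Q}^{-1}\mathbb{E}_n[\hat{\mathbf{b}}_s(x_i)\mathbf{w}_i']\big\| \lesssim_{\mathbb{P}} J^{v}$.

Collecting the previous results, the next lemma establishes the rate of uniform convergence for binscatter estimators.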

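Lemma SA-2.9 (Uniform Convergence). Suppose that Assumption SA-1 holds. If $\sqrt{n}\, J^{-p-1-(\varsigma \wedge (p+1))} = o(1)$ and $\frac{J^{\frac{\nu}{\nu-2}}\log J}{n} \lesssim 1$, then

$$\sup_{x \in \mathcal{X}} |\hat\mu^{(v)}(x) - \mu^{(v)}(x)| \lesssim_{\mathbb{P}} \sqrt{\frac{J\log J}{n}}\, J^{v} + J^{-p-1+v}.$$

The last lemma shows that the proposed variance estimator is consistent.

Lemma SA-2.10 (Variance Estimate). Suppose that Assumption SA-1 holds. If $\frac{J^{\frac{\nu}{\nu-2}}(\log J)^{\frac{\nu}{\nu-2}}}{n} = o(1)$ and $\sqrt{n}\, J^{-p-1-(\varsigma \wedge (p+1))} = o(1)$, then

$$\|\hat\Sigma - \bar\Sigma\| \lesssim_{\mathbb{P}} J^{-p-1} + \sqrt{\frac{J\log J}{n^{1-\frac{2}{\nu}}}}.$$

As a result, $\sup_{x \in \mathcal{X}} |\hat\Omega(x) - \bar\Omega(x)| \lesssim_{\mathbb{P}} J^{1+2v}\Big(J^{-p-1} + \sqrt{\frac{J\log J}{n^{1-\frac{2}{\nu}}}}\Big)$.

SA-3 Main Results

SA-3.1 Integrated Mean Squared Error

The following theorem proves the result stated in Theorem 1 of the main paper.

Theorem SA-3.1 (IMSE). Suppose that Assumption SA-1 holds. Let $\omega(x)$ be a continuous weighting function over $\mathcal{X}$ bounded away from zero. If $\frac{J\log J}{n} = o(1)$ and $\sqrt{n}\, J^{-p-1-(\varsigma \wedge (p+1))} = o(1)$, then

$$\int_{\mathcal{X}} \mathbb{E}\big[(\hat\mu^{(v)}(x) - \mu^{(v)}(x))^2 \mid \mathbf{D}\big]\,\omega(x)\,dx = \frac{J^{1+2v}}{n}\mathscr{V}_n(p,s,v) + J^{-2(p+1-v)}\mathscr{B}_n(p,s,v) + o_{\mathbb{P}}\Big(\frac{J^{1+2v}}{n} + J^{-2(p+1-v)}\Big),$$

where

$$\mathscr{V}_n(p,s,v) := J^{-(1+2v)}\operatorname{trace}\Big(\hat{Q}^{-1}\bar\Sigma\hat{Q}^{-1}\int_{\mathcal{X}}\hat{\mathbf{b}}_s^{(v)}(x)\hat{\mathbf{b}}_s^{(v)}(x)'\omega(x)\,dx\Big) \asymp_{\mathbb{P}} 1,$$

$$\mathscr{B}_n(p,s,v) := J^{2p+2-2v}\int_{\mathcal{X}}\big(\hat{\mathbf{b}}_s^{(v)}(x)'\bar\beta_\mu - \mu^{(v)}(x)\big)^2\omega(x)\,dx \lesssim_{\mathbb{P}} 1.$$

As a consequence, the IMSE-optimal number of bins is

$$J_{\texttt{IMSE}} = \left\lceil \left(\frac{2(p-v+1)\mathscr{B}_n(p,s,v)}{(1+2v)\mathscr{V}_n(p,s,v)}\right)^{\frac{1}{2p+3}} n^{\frac{1}{2p+3}} \right\rceil.$$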
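The selector in Theorem SA-3.1 is the closed-form minimizer, over $J$, of the two leading terms $\frac{J^{1+2v}}{n}\mathscr V + J^{-2(p+1-v)}\mathscr B$, rounded up. Setting the derivative of those terms to zero gives $J^{2p+3} = \frac{2(p+1-v)\mathscr B\, n}{(1+2v)\mathscr V}$. The sketch below is minimal: function names are hypothetical, and the constants $\mathscr B$ and $\mathscr V$ are user-supplied numbers. It compares the formula with a brute-force search over integer $J$.

```python
import math

def j_imse(n, p, v, B_const, V_const):
    """IMSE-optimal number of bins (Theorem SA-3.1), hypothetical helper:
    ceil( (2(p - v + 1) B / ((1 + 2v) V))^(1/(2p+3)) * n^(1/(2p+3)) )."""
    ratio = 2 * (p - v + 1) * B_const / ((1 + 2 * v) * V_const)
    return math.ceil((ratio * n) ** (1.0 / (2 * p + 3)))

def imse_leading(J, n, p, v, B_const, V_const):
    """Leading IMSE terms: J^(1+2v)/n * V + J^(-2(p+1-v)) * B."""
    return J ** (1 + 2 * v) / n * V_const + J ** (-2 * (p + 1 - v)) * B_const

# n = 1000, p = v = 0 and unit constants: closed form versus brute force over J.
J_star = j_imse(1000, 0, 0, 1.0, 1.0)
J_grid = min(range(1, 101), key=lambda J: imse_leading(J, 1000, 0, 0, 1.0, 1.0))
```

In this example both give $J = 13$, since $(2n)^{1/3} \approx 12.6$. In general, the ceiling of the continuous minimizer need not coincide with the integer minimizer.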

Regarding the bias component $\mathscr{B}_n(p,s,v)$, a more explicit but more cumbersome expression is available in the proof, which forms the foundation of our bin selection procedure discussed in Section SA-4. However, for $s = 0$, both variance and bias terms admit concise explicit formulas, as shown in the following corollary. To state the results, we introduce a polynomial function $\mathscr{B}_p(x)$ for $p \in \mathbb{Z}_+$ such that $\binom{2p}{p}\mathscr{B}_p(x)$ is the shifted Legendre polynomial of degree $p$ on $[0,1]$. These polynomials are orthogonal on $[0,1]$ with respect to the Lebesgue measure. In addition, let $\psi(z) = (1, z, \ldots, z^p)'$.

Corollary SA-3.1. Under the assumptions in Theorem SA-3.1, $\mathscr{V}_n(p,0,v) = \mathscr{V}(p,0,v) + o_{\mathbb{P}}(1)$ and $\mathscr{B}_n(p,0,v) = \mathscr{B}(p,0,v) + o_{\mathbb{P}}(1)$, where

$$\mathscr{V}(p,0,v) := \operatorname{trace}\Big\{\Big(\int_0^1\psi(z)\psi(z)'\,dz\Big)^{-1}\int_0^1\psi^{(v)}(z)\psi^{(v)}(z)'\,dz\Big\}\int_{\mathcal{X}}\sigma^2(x)f(x)^{2v}\omega(x)\,dx,$$

$$\mathscr{B}(p,0,v) := \frac{\int_0^1\mathscr{B}_{p+1-v}(z)^2\,dz}{((p+1-v)!)^2}\int_{\mathcal{X}}\frac{[\mu^{(p+1)}(x)]^2}{f(x)^{2p+2-2v}}\omega(x)\,dx.$$

Remark SA-3.1. The above corollary implies that the bias constant $\mathscr{B}(p,0,v)$ is nonzero unless $\mu^{(p+1)}(x)$ is zero almost everywhere on $\mathcal{X}$. For $s > 0$, notice that $\hat{\mathbf{b}}_s^{(v)}(x)'\bar\beta_\mu$ can be viewed as an approximation of $\mu^{(v)}(x)$ in the space spanned by piecewise polynomials of order $(p-v)$. According to the above corollary, the best $L_2$ approximation error in this space is bounded away from zero if rescaled by $J^{p+1-v}$. Since $\hat{\mathbf{b}}_s^{(v)}(x)'\bar\beta_\mu$ is a non-optimal $L_2$ approximation in such a space, its $L_2$ error is no smaller than that of the best one. Since $\omega(x)$ and $f(x)$ are both bounded and bounded away from zero, this fact implies that, except for the quite special case mentioned previously, $\mathscr{B}_n(p,s,v) \gtrsim_{\mathbb{P}} 1$, a slightly stronger result than that in Theorem SA-3.1. In all the analysis that follows, we simply exclude this special case in which the leading bias degenerates, and thus $J_{\texttt{IMSE}} \asymp n^{\frac{1}{2p+3}}$.

SA-3.2 Pointwise Inference

We consider statistical inference based on the Studentized t-statistic:

$$\hat{T}_p(x) = \frac{\hat\mu^{(v)}(x) - \mu^{(v)}(x)}{\sqrt{\hat\Omega(x)/n}}.$$
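The Studentized statistic needs the sandwich variance $\hat\Omega(x)$. The sketch below assumes it is the plug-in version of $\bar\Omega(x)$ from Section SA-2, with $\hat\Sigma$ in place of $\bar\Sigma$, consistent with Lemma SA-2.10. That is, $\hat\Omega(x) = \hat{\mathbf b}^{(v)}_s(x)'\hat Q^{-1}\hat\Sigma\hat Q^{-1}\hat{\mathbf b}^{(v)}_s(x)$. It covers only $v = 0$ without covariates, and `pointwise_ci` is a hypothetical name. It returns $\hat\mu(x)$, $\sqrt{\hat\Omega(x)/n}$ and the interval $\hat\mu(x) \pm \mathfrak c\sqrt{\hat\Omega(x)/n}$, with $\mathfrak c = \Phi^{-1}(0.975)$ by default.

```python
import numpy as np

def pointwise_ci(B, Y, b_x, crit=1.959963984540054):
    """Sandwich standard error and interval mu_hat(x) +/- crit * sqrt(Omega_hat(x)/n)
    (hypothetical helper), with Omega_hat(x) = b(x)' Q^{-1} Sigma Q^{-1} b(x),
    Q = E_n[b b'] and Sigma = E_n[b b' e^2] from least-squares residuals."""
    n = B.shape[0]
    Q = B.T @ B / n
    beta = np.linalg.solve(Q, B.T @ Y / n)
    e = Y - B @ beta
    Sigma = (B * e[:, None] ** 2).T @ B / n
    Qinv_b = np.linalg.solve(Q, b_x)
    omega = Qinv_b @ Sigma @ Qinv_b
    mu = b_x @ beta
    se = np.sqrt(omega / n)
    return mu, se, (mu - crit * se, mu + crit * se)

# p = 0 rotated basis on J quantile bins; evaluate at a point in the third bin.
rng = np.random.default_rng(3)
n, J = 1000, 5
x = rng.uniform(size=n)
Y = x**2 + 0.2 * rng.normal(size=n)
edges = np.quantile(x, np.linspace(0, 1, J + 1))
j = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, J - 1)
B = np.zeros((n, J))
B[np.arange(n), j] = np.sqrt(J)
b_x = np.zeros(J)
b_x[2] = np.sqrt(J)
mu, se, ci = pointwise_ci(B, Y, b_x)
```

For $p = 0$, the point estimate is the sample mean of $y_i$ in the bin containing $x$. The standard error reduces to the within-bin standard deviation divided by the square root of the bin size.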

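Let $\Phi(\cdot)$ be the cumulative distribution function of a standard normal random variable. The following theorem proves Lemma 1 of the main paper.

Theorem SA-3.2. Suppose that Assumption SA-1 holds. If $\sup_{x \in \mathcal{X}}\mathbb{E}[|\epsilon_i|^{\nu} \mid x_i = x] \lesssim 1$ for some $\nu \ge 3$, $\frac{J^{\frac{\nu}{\nu-2}}(\log J)^{\frac{\nu}{\nu-2}}}{n} = o(1)$ and $nJ^{-2p-3} = o(1)$, then

$$\sup_{u \in \mathbb{R}}\big|\mathbb{P}(\hat{T}_p(x) \le u) - \Phi(u)\big| = o(1), \qquad \text{for each } x \in \mathcal{X}.$$

Let $\hat{I}_p(x) = \big[\hat\mu^{(v)}(x) \pm \mathfrak{c}\sqrt{\hat\Omega(x)/n}\big]$ for some critical value $\mathfrak{c}$ to be specified. Given the above theorem, we have the following corollary, which establishes the result stated in Theorem 2 of the main paper.

Corollary SA-3.2. For given $p$, suppose that the conditions in Theorem SA-3.2 hold, and further assume that $\mu(x)$ and $\mathbb{E}[\mathbf{w}_i \mid x_i = x]$ are $(p+q+1)$-times continuously differentiable for some $q \ge 1$. If $J = J_{\texttt{IMSE}}$ and $\mathfrak{c} = \Phi^{-1}(1-\alpha/2)$, then

$$\mathbb{P}\big[\mu^{(v)}(x) \in \hat{I}_{p+q}(x)\big] = 1 - \alpha + o(1), \qquad \text{for all } x \in \mathcal{X}.$$

SA-3.3 Uniform Inference

Recall that $\{a_n : n \ge 1\}$ is a sequence of non-vanishing constants. We first show that the (feasible) Studentized t-statistic process $\{\hat{T}_p(x) : x \in \mathcal{X}\}$ can be approximated by a Gaussian process, in a proper sense, at a certain rate.

Theorem SA-3.3 (Strong Approximation). Suppose that Assumption SA-1 holds. If $\frac{J(\log J)^{\nu}}{n} = o(a_n^{-2})$, $J^{-1} = o(a_n^{-2})$ and $nJ^{-2p-3} = o(a_n^{-2})$, then, on a properly enriched probability space, there exists some $K_s$-dimensional standard normal random vector $N_{K_s}$ such that for any $\eta > 0$,

$$\mathbb{P}\Big(\sup_{x \in \mathcal{X}}|\hat{T}_p(x) - Z_p(x)| > \eta a_n^{-1}\Big) = o(1), \qquad Z_p(x) = \frac{\hat{\mathbf{b}}_0^{(v)}(x)'T_s'Q^{-1}\Sigma^{1/2}N_{K_s}}{\sqrt{\Omega(x)}}.$$

The approximating process $\{Z_p(x) : x \in \mathcal{X}\}$ is a Gaussian process conditional on $\mathbf{X}$ by construction. In practice, one can replace all unknowns in $Z_p(x)$ by their sample analogues, and then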
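construct the following feasible (conditional) Gaussian process:

$$\hat{Z}_p(x) = \frac{\hat{\mathbf{b}}_s^{(v)}(x)'\hat{Q}^{-1}\hat\Sigma^{1/2}N_{K_s}}{\sqrt{\hat\Omega(x)}},$$

where $N_{K_s}$ denotes a $K_s$-dimensional standard normal vector independent of the data $\mathbf{D} = \{(y_i, x_i, \mathbf{w}_i'): 1 \le i \le n\}$.

Theorem SA-3.4 (Plug-in Approximation). Suppose that the conditions in Theorem SA-3.3 hold. Then, on a properly enriched probability space, there exists a $K_s$-dimensional standard normal vector $N_{K_s}$ independent of $\mathbf{D}$ such that for any $\eta > 0$,

$$\mathbb{P}\Big[\sup_{x \in \mathcal{X}}|\hat{Z}_p(x) - Z_p(x)| > \eta a_n^{-1} \,\Big|\, \mathbf{D}\Big] = o_{\mathbb{P}}(1).$$

SA-3.4 Applications

Theorems SA-3.3 and SA-3.4 offer a way to approximate the distribution of the whole t-statistic process. A direct application of this result is the construction of uniform confidence bands, which relies on a distributional approximation to the supremum of the t-statistic process. The following theorem proves Lemma 2 of the main paper.

Theorem SA-3.5 (Supremum Approximation). Suppose that the conditions of Theorem SA-3.3 hold with $a_n = \sqrt{\log J}$. Then

$$\sup_{u \in \mathbb{R}}\Big|\mathbb{P}\Big(\sup_{x \in \mathcal{X}}|\hat{T}_p(x)| \le u\Big) - \mathbb{P}\Big(\sup_{x \in \mathcal{X}}|\hat{Z}_p(x)| \le u \,\Big|\, \mathbf{D}\Big)\Big| = o_{\mathbb{P}}(1).$$

Using the above theorem, we have the following corollary, which is the result stated in Theorem 3 of the main paper.

Corollary SA-3.3. For given $p$, suppose that the conditions in Theorem SA-3.5 hold and $J = J_{\texttt{IMSE}}$. Further, assume that $\mu(x)$ and $\mathbb{E}[\mathbf{w}_i \mid x_i = x]$ are $(p+q+1)$-times continuously differentiable for some $q \ge 1$. If $\mathfrak{c} = \inf\big\{c \in \mathbb{R}_+ : \mathbb{P}[\sup_{x \in \mathcal{X}}|\hat{Z}_{p+q}(x)| \le c \mid \mathbf{D}] \ge 1 - \alpha\big\}$, then

$$\mathbb{P}\big[\mu^{(v)}(x) \in \hat{I}_{p+q}(x), \text{ for all } x \in \mathcal{X}\big] = 1 - \alpha + o(1).$$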
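In practice, the critical value $\mathfrak c$ in Corollary SA-3.3 (and in Theorem SA-3.6) is obtained by simulating $\hat Z_p$ conditional on the data. The sketch below makes several assumptions:

- The supremum over $\mathcal X$ is approximated by a maximum over a finite grid.
- $\hat\Sigma^{1/2}$ is replaced by an eigendecomposition root $R$ with $RR' = \hat\Sigma$, which gives $\hat Z_p$ the same distribution.
- The basis matrix on the grid, $\hat Q$ and $\hat\Sigma$ are taken as given.

The function name is hypothetical.

```python
import numpy as np

def sup_critical_value(B_grid, Q, Sigma, alpha=0.05, reps=5000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of max over a grid of |Z_hat(x)| (hypothetical helper).

    B_grid: (G, K) basis evaluated at the grid points; Q, Sigma: (K, K) estimates.
    Z_hat(x) = b(x)' Q^{-1} R N / sqrt(Omega_hat(x)), N ~ N(0, I_K), R R' = Sigma.
    """
    rng = np.random.default_rng(seed)
    A = np.linalg.solve(Q, B_grid.T).T                  # rows are b(x)' Q^{-1}
    vals, vecs = np.linalg.eigh(Sigma)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))      # root @ root.T equals Sigma
    L = A @ root
    sd = np.sqrt(np.sum(L**2, axis=1))                   # sqrt(Omega_hat(x)) on the grid
    N = rng.standard_normal((L.shape[1], reps))
    Z = (L @ N) / sd[:, None]                            # (G, reps) draws of Z_hat
    return np.quantile(np.max(np.abs(Z), axis=0), 1 - alpha)
```

With a single grid point, the simulated quantile is close to $\Phi^{-1}(0.975) \approx 1.96$. With ten independent coordinates, it rises to about 2.8, which shows why uniform bands are wider than pointwise intervals.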

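As another application, the main paper discusses two classes of hypothesis testing problems: testing parametric specifications and testing certain shape restrictions. To be specific, consider the following two problems:

(i) $\ddot{H}_0: \sup_{x \in \mathcal{X}}|\mu^{(v)}(x) - m^{(v)}(x, \theta)| = 0$ for some $\theta \in \Theta$, vs. $\ddot{H}_A: \sup_{x \in \mathcal{X}}|\mu^{(v)}(x) - m^{(v)}(x, \theta)| > 0$ for all $\theta \in \Theta$.
(ii) $\dot{H}_0: \sup_{x \in \mathcal{X}}(\mu^{(v)}(x) - m^{(v)}(x, \theta)) \le 0$ for a certain $\theta \in \Theta$, vs. $\dot{H}_A: \sup_{x \in \mathcal{X}}(\mu^{(v)}(x) - m^{(v)}(x, \theta)) > 0$ for $\theta \in \Theta$.

The testing problem in (i) can be viewed as a two-sided test where the equality between two functions holds uniformly over $x \in \mathcal{X}$. In this case, we introduce $\hat\theta$ as a consistent estimator of $\theta$ under $\ddot{H}_0$. Then we rely on the following test statistic:

$$\ddot{T}_p(x) = \frac{\hat\mu^{(v)}(x) - m^{(v)}(x, \hat\theta)}{\sqrt{\hat\Omega(x)/n}}.$$

The null hypothesis is rejected if $\sup_{x \in \mathcal{X}}|\ddot{T}_p(x)| > \mathfrak{c}$ for some critical value $\mathfrak{c}$.

The testing problem in (ii) can be viewed as a one-sided test where the inequality holds uniformly over $x \in \mathcal{X}$. Importantly, under both $\dot{H}_0$ and $\dot{H}_A$, we fix $\theta$ to be the same value in $\Theta$. In such a case, we introduce $\hat\theta$ as a consistent estimator of $\theta$ under both $\dot{H}_0$ and $\dot{H}_A$. Then we rely on the following test statistic:

$$\dot{T}_p(x) = \frac{\hat\mu^{(v)}(x) - m^{(v)}(x, \hat\theta)}{\sqrt{\hat\Omega(x)/n}}.$$

The null hypothesis is rejected if $\sup_{x \in \mathcal{X}}\dot{T}_p(x) > \mathfrak{c}$ for some critical value $\mathfrak{c}$. The following theorem characterizes the size and power of such tests.

Theorem SA-3.6 (Hypothesis Testing). Let the conditions in Theorem SA-3.3 hold with $a_n = \sqrt{\log J}$.

(i) (Specification) Let $\mathfrak{c} = \inf\big\{c \in \mathbb{R}_+ : \mathbb{P}[\sup_{x \in \mathcal{X}}|\hat{Z}_p(x)| \le c \mid \mathbf{D}] \ge 1 - \alpha\big\}$.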

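Under $\ddot{H}_0$, if $\sup_{x \in \mathcal{X}}|\mu^{(v)}(x) - m^{(v)}(x; \hat\theta)| = o_{\mathbb{P}}\Big(\sqrt{\frac{J^{1+2v}}{n\log J}}\Big)$, then $\lim_{n \to \infty}\mathbb{P}\big[\sup_{x \in \mathcal{X}}|\ddot{T}_p(x)| > \mathfrak{c}\big] = \alpha$.

Under $\ddot{H}_A$, if there exists some $\bar\theta \in \Theta$ such that $\sup_{x \in \mathcal{X}}|m^{(v)}(x, \hat\theta) - m^{(v)}(x, \bar\theta)| = o_{\mathbb{P}}(1)$, and $J^{v}\sqrt{\frac{J\log J}{n}} = o(1)$, then $\lim_{n \to \infty}\mathbb{P}\big[\sup_{x \in \mathcal{X}}|\ddot{T}_p(x)| > \mathfrak{c}\big] = 1$.

(ii) (Shape Restriction) Let $\mathfrak{c} = \inf\big\{c \in \mathbb{R}_+ : \mathbb{P}[\sup_{x \in \mathcal{X}}\hat{Z}_p(x) \le c \mid \mathbf{D}] \ge 1 - \alpha\big\}$. Assume that $\sup_{x \in \mathcal{X}}|m^{(v)}(x; \hat\theta) - m^{(v)}(x, \theta)| = o_{\mathbb{P}}\Big(\sqrt{\frac{J^{1+2v}}{n\log J}}\Big)$.

Under $\dot{H}_0$, $\limsup_{n \to \infty}\mathbb{P}\big[\sup_{x \in \mathcal{X}}\dot{T}_p(x) > \mathfrak{c}\big] \le \alpha$.

Under $\dot{H}_A$, if $J^{v}\sqrt{\frac{J\log J}{n}} = o(1)$, then $\lim_{n \to \infty}\mathbb{P}\big[\sup_{x \in \mathcal{X}}\dot{T}_p(x) > \mathfrak{c}\big] = 1$.

The robust bias-corrected testing procedures given in Theorems 4 and 5 of the main paper are immediate corollaries of Theorem SA-3.6, once the stronger conditions on the smoothness of $\mu(x)$ and $\mathbb{E}[\mathbf{w}_i \mid x_i = x]$ are assumed. To conserve space, we do not repeat their statements.

SA-4 Implementation Details

We discuss the implementation details for data-driven selection of the number of bins, based on the integrated mean squared error expansion (see Theorem SA-3.1 and Corollary SA-3.1 above). We offer two procedures for estimating the bias and variance constants. Once these estimates, $\hat{\mathscr{B}}_n(p,s,v)$ and $\hat{\mathscr{V}}_n(p,s,v)$, are available, the estimated optimal $J$ is

$$\hat{J}_{\texttt{IMSE}} = \left\lceil \left(\frac{2(p-v+1)\hat{\mathscr{B}}_n(p,s,v)}{(1+2v)\hat{\mathscr{V}}_n(p,s,v)}\right)^{\frac{1}{2p+3}} n^{\frac{1}{2p+3}} \right\rceil.$$

For concreteness, we always use $\omega(x) = f(x)$ as the weighting function.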
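For $s = 0$, the constants in Corollary SA-3.1, which the rule-of-thumb selector below relies on, factor into a known "shape" constant times a functional of $\sigma^2$, $f$ and $\mu^{(p+1)}$. The sketch below computes only the two shape constants, under hypothetical function names:

- the trace factor of $\mathscr V(p,0,v)$, using $\int_0^1\psi\psi' = [1/(a+b+1)]_{a,b}$ (a Hilbert matrix);
- the factor $\int_0^1\mathscr B_{p+1-v}(z)^2dz/((p+1-v)!)^2$ of $\mathscr B(p,0,v)$, by Gauss-Legendre quadrature.

Two closed forms serve as checks. For $v = 0$ the trace equals $p+1$. Also $\int_0^1\mathscr B_m(z)^2dz = 1/\big((2m+1)\binom{2m}{m}^2\big)$, for example $1/12$ when $m = 1$.

```python
import math
import numpy as np

def variance_trace(p, v):
    """trace{(int_0^1 psi psi')^{-1} int_0^1 psi^(v) psi^(v)'}, psi(z) = (1, z, ..., z^p)'
    (hypothetical helper)."""
    a = np.arange(p + 1)
    H = 1.0 / (a[:, None] + a[None, :] + 1)                 # int_0^1 z^(a+b) dz
    # d^v/dz^v z^k = k!/(k-v)! z^(k-v); math.perm(k, v) is 0 when v > k
    coef = np.array([math.perm(int(k), v) for k in a], dtype=float)
    e = np.clip(a - v, 0, None)
    Hv = np.outer(coef, coef) / (e[:, None] + e[None, :] + 1)
    return float(np.trace(np.linalg.solve(H, Hv)))

def bias_shape_const(p, v):
    """int_0^1 B_m(z)^2 dz / (m!)^2 with m = p + 1 - v, where B_m is the shifted
    Legendre polynomial of degree m divided by binom(2m, m) (hypothetical helper)."""
    m = p + 1 - v
    t, w = np.polynomial.legendre.leggauss(m + 2)           # exact up to degree 2m + 3
    c = np.zeros(m + 1)
    c[m] = 1.0
    # On [0, 1], P_m(2z - 1) evaluated at z = (t + 1)/2 is simply P_m(t).
    Bm = np.polynomial.legendre.legval(t, c) / math.comb(2 * m, m)
    return float(0.5 * np.sum(w * Bm**2)) / math.factorial(m) ** 2
```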

SA-4.1 Rule-of-thumb Selector

A rule-of-thumb choice of $J$ is obtained based on Corollary SA-3.1, in which case $s = 0$. Regarding the variance constant $\mathscr{V}(p,0,v)$, the unknowns are the density function $f(x)$ and the conditional variance $\sigma^2(x)$. A Gaussian reference model is employed for $f(x)$. For the conditional variance, we note that $\sigma^2(x) = \mathbb{E}[y_i^2 \mid x_i, \mathbf{w}_i] - (\mathbb{E}[y_i \mid x_i, \mathbf{w}_i])^2$. The two conditional expectations can be approximated by global polynomial regressions of degree $p+1$. Then, the variance constant is estimated by

$$\hat{\mathscr{V}}(p,0,v) = \operatorname{trace}\Big\{\Big(\int_0^1\psi(z)\psi(z)'\,dz\Big)^{-1}\int_0^1\psi^{(v)}(z)\psi^{(v)}(z)'\,dz\Big\}\frac{1}{n}\sum_{i=1}^{n}\hat\sigma^2(x_i)\hat{f}(x_i)^{2v}.$$

Regarding the bias constant, the unknowns are $f(x)$, which is estimated using the Gaussian reference model, and $\mu^{(p+1)}(x)$, which can be estimated based on the global regression that approximates $\mathbb{E}[y_i \mid x_i, \mathbf{w}_i]$. Then the bias constant is estimated by

$$\hat{\mathscr{B}}(p,0,v) = \frac{\int_0^1\mathscr{B}_{p+1-v}(z)^2\,dz}{((p+1-v)!)^2}\,\frac{1}{n}\sum_{i=1}^{n}\frac{[\hat\mu^{(p+1)}(x_i)]^2}{\hat{f}(x_i)^{2p+2-2v}}.$$

The resulting $J$ selector employs the correct rate but an inconsistent constant approximation. Recall that $s$ does not change the rate of $J_{\texttt{IMSE}}$. Thus, even for $s > 0$, this selector still delivers the correct rate.

SA-4.2 Direct-plug-in Selector

The direct-plug-in selector is implemented based on the binscatter estimators and applies to any user-specified $p$, $s$ and $v$. It requires a preliminary choice of $J$, for which the rule-of-thumb selector described above can be used. More generally, suppose that a preliminary choice $J_{\rm pre}$ is given; then a binscatter basis $\hat{\mathbf{b}}_s(x)$ (of order $p$) can be constructed immediately on the preliminary partition. After implementing a binscatter regression using this basis and partition, the variance constant can be estimated using a standard variance estimator, such as the one in Lemma SA-2.10. Regarding the bias constant, we employ the uniform approximation (SA-5.6) in the proof of Theorem SA-3.1. The key idea of the bias representation is to orthogonalize the leading error
of the uniform approximation based on splines with simple knots (i.e., $p$ smoothness constraints are imposed) with respect to the preliminary binscatter basis $\hat{\mathbf{b}}_s(x)$. Specifically, the key unknown in the expression of the leading error is $\mu^{(p+1)}(x)$, which can be estimated by implementing a binscatter regression of order $p+1$ (with the preliminary partition unchanged). Plugging this estimate into (SA-5.7) and replacing all other quantities in it by their sample analogues yields a bias constant estimate. By this construction, the direct-plug-in selector employs the correct rate and a consistent constant approximation for any $p$, $s$ and $v$.

SA-5 Proof

SA-5.1 Proof of Lemma SA-2.1

Proof. The first result follows by Lemma SA2 of Calonico, Cattaneo, and Titiunik (2015). To show the second result, first consider the deterministic partition sequence based on the population quantiles. By the mean value theorem,

$$h_j = F^{-1}\Big(\frac{j}{J}\Big) - F^{-1}\Big(\frac{j-1}{J}\Big) = \frac{1}{J}\,\frac{1}{f(F^{-1}(\xi))} \asymp \frac{1}{J},$$

where $\xi$ is some point between $(j-1)/J$ and $j/J$. Since $f$ is bounded and bounded away from zero, $\max_{1 \le j \le J}h_j/\min_{1 \le j \le J}h_j \le \bar{f}/\underline{f}$. Using the first result, we have, with probability approaching one, $\max_{1 \le j \le J}|\hat{h}_j - h_j| \le J^{-1}\bar{f}^{-1}/2$. Then,

$$\frac{\max_{1 \le j \le J}\hat{h}_j}{\min_{1 \le j \le J}\hat{h}_j} \le \frac{\max_{1 \le j \le J}h_j + \max_{1 \le j \le J}|\hat{h}_j - h_j|}{\min_{1 \le j \le J}h_j - \max_{1 \le j \le J}|\hat{h}_j - h_j|} \le \frac{3\bar{f}}{\underline{f}},$$

and the desired result follows.

SA-5.2 Proof of Lemma SA-2.2

Proof. For $s = 0$, the result is trivial. For $0 < s \le p$, $\hat{\mathbf{b}}_s(x)$ is formally known as the B-spline basis of order $p+1$ with knots $\{\hat\tau_1, \ldots, \hat\tau_{J-1}\}$ of multiplicities $(p-s+1, \ldots, p-s+1)$. See Schumaker
(2007, Definition 4.1). Specifically, such a basis is constructed on an extended knot sequence $\{\xi_j\}_{j=1}^{2(p+1)+(p-s+1)(J-1)}$ satisfying

$$\xi_1 \le \cdots \le \xi_{p+1} \le 0, \qquad 1 \le \xi_{p+2+(p-s+1)(J-1)} \le \cdots \le \xi_{2(p+1)+(p-s+1)(J-1)},$$

and

$$\xi_{p+2}, \ldots, \xi_{p+1+(p-s+1)(J-1)} = \underbrace{\hat\tau_1, \ldots, \hat\tau_1}_{p-s+1}, \ \ldots, \ \underbrace{\hat\tau_{J-1}, \ldots, \hat\tau_{J-1}}_{p-s+1}.$$

By the well-known recursive relation of splines, a typical function $\hat{b}_{s,l}(x)$ in $\hat{\mathbf{b}}_s(x)$ supported on $(\xi_l, \xi_{l+p+1})$ is expressed as

$$\hat{b}_{s,l}(x) = \sqrt{J}\sum_{j=l+1}^{l+p+1}C_j(x)\,\mathbb{1}(x \in [\xi_{j-1}, \xi_j)),$$

where each $C_j(x)$ is a polynomial of degree $p$, obtained as a sum of products of $p$ linear polynomials. See De Boor (1978, Section I, Equation (19)). Since $s \le p$, we always have $\xi_l < \xi_{l+p+1}$. Thus, the support of such a basis function is well defined. Specifically, all $C_j(x)$'s take the following form:

$$C_j(x) = \sum_{\iota=1}^{M}\prod_{(k,k') \in \mathcal{K}_\iota}(-1)^{c_{k,k'}}\frac{x - \xi_k}{\xi_{k'} - \xi_k}.$$

Here, the convention is that $0/0 = 0$, $M \le 2^p$ is a constant denoting the number of summands, the cardinality of the index pair set $\mathcal{K}_\iota$ is exactly $p$, and $c_{k,k'}$ is a constant used to change the sign of the summand. These indices may depend on $j$, which is omitted for notational simplicity. As explained previously, such a function is supported on at least one bin. We want to represent such a function linearly in terms of $\hat{\mathbf{b}}_0(x)$, with typical element

$$\hat\varphi_{j,\alpha}(x) = \sqrt{J}\,\mathbb{1}_{\hat{B}_j}(x)\Big(\frac{x - \hat\tau_{j-1}}{\hat{h}_j}\Big)^{\alpha}, \qquad 0 \le \alpha \le p, \quad 1 \le j \le J. \tag{SA-5.1}$$

Suppose, without loss of generality, that $\xi_{j-1} < \xi_j$ and $(\xi_{j-1}, \xi_j)$ is a cell within the support of $\hat{b}_{s,l}(x)$. Let $c_{j,\alpha}$ be the coefficient of $\hat\varphi_{j,\alpha}(x)$ in the linear representation of $\hat{b}_{s,l}(x)$. Using the above results,
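it takes the following form:

$$c_{j,\alpha} = \sum_{\iota=1}^{M}\frac{(\xi_j - \xi_{j-1})^{\alpha}\sum_{l=1}^{C_p^{p-\alpha}}\prod_{k=k_{l\iota,1}}^{k_{l\iota,p-\alpha}}(\xi_{j-1} - \xi_k)}{\prod_{(k,k') \in \mathcal{K}_\iota}(-1)^{c_{k,k'}}(\xi_{k'} - \xi_k)},$$

where the inner sum runs over the $C_p^{p-\alpha} = \binom{p}{p-\alpha}$ ways of selecting $p-\alpha$ of the $p$ linear factors in the $\iota$-th summand, with selected indices $k_{l\iota,1}, \ldots, k_{l\iota,p-\alpha}$. This follows by writing each factor $x - \xi_k$ as $(x - \xi_{j-1}) + (\xi_{j-1} - \xi_k)$ and collecting the powers of $(x - \xi_{j-1})$. The quantities within the summation only depend on distances between knots, which are no greater than $(p+1)\max_{1 \le j \le J}\hat{h}_j$, since the support covers at most $p+1$ bins. Both denominator and numerator are products of $p$ such distances, and hence, by Lemma SA-2.1, $\sup_{j,\alpha}|c_{j,\alpha}| \lesssim_{\mathbb{P}} 1$. Since each row and column of $\hat{T}_s$ contains only a finite number of nonzeros, $\|\hat{T}_s\|_\infty \lesssim_{\mathbb{P}} 1$ and $\|\hat{T}_s\| \lesssim_{\mathbb{P}} 1$. Using the fact that $\max_{1 \le j \le J}|\hat{h}_j - h_j| \lesssim_{\mathbb{P}} J^{-1}\sqrt{J\log J/n}$, given in the proof of Lemma SA-2.1, and noticing the form of $c_{j,\alpha}$, we have $\max_{k,l}|(\hat{T}_s - T_s)_{k,l}| \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$, where $(\hat{T}_s - T_s)_{k,l}$ is the $(k,l)$-th element of $\hat{T}_s - T_s$. Since $\hat{T}_s - T_s$ has only a finite number of nonzeros in every row and column, $\|\hat{T}_s - T_s\|_\infty \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$ and $\|\hat{T}_s - T_s\| \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$.

Finally, we give an explicit expression of $c_{j,\alpha}$ for the case $s = p$, which may be of independent interest. In this case, $\hat{\mathbf{b}}_s(x)$ is the usual B-spline basis with simple knots. Let $\hat{b}_{s,l}(x)$ be a typical basis function supported on $[\hat\tau_l, \hat\tau_{l+p+1}]$. Then, using the recursive formula of B-splines, by induction we have

$$\hat{b}_{s,l}(x) = \sqrt{J}\,(\hat\tau_{l+p+1} - \hat\tau_l)\sum_{j=l}^{l+p+1}\frac{(x - \hat\tau_j)_+^{p}}{\prod_{k=l,\,k \ne j}^{l+p+1}(\hat\tau_k - \hat\tau_j)}, \tag{SA-5.2}$$

where $(z)_+$ equals $z$ if $z \ge 0$ and $0$ otherwise. Since $\hat{b}_{s,l}(x)$ is zero outside of $(\hat\tau_l, \hat\tau_{l+p+1})$, it can be written as a linear combination of $\hat\varphi_{j,\alpha}(x)$, $j = l+1, \ldots, l+p+1$, $\alpha = 0, \ldots, p$:

$$\hat{b}_{s,l}(x) = \sum_{\alpha=0}^{p}\sum_{j=l+1}^{l+p+1}c_{j,\alpha}\hat\varphi_{j,\alpha}(x), \qquad \text{for some } c_{j,\alpha}. \tag{SA-5.3}$$

For a generic cell $(\hat\tau_{j-1}, \hat\tau_j) \subset (\hat\tau_l, \hat\tau_{l+p+1})$, the truncated polynomials $(x - \hat\tau_k)_+^p$ do not contribute to the coefficients of $\hat\varphi_{j,\alpha}(x)$ if $k > j-1$. For any $l \le k \le j-1$, we can expand $(x - \hat\tau_k)_+^p$ on $(\hat\tau_{j-1}, \hat\tau_j)$ as

$$(x - \hat\tau_k)^p = (x - \hat\tau_{j-1} + \hat\tau_{j-1} - \hat\tau_k)^p = \sum_{\alpha=0}^{p}\binom{p}{\alpha}(x - \hat\tau_{j-1})^{\alpha}(\hat\tau_{j-1} - \hat\tau_k)^{p-\alpha}.$$

Thus, the contribution of $(x - \hat\tau_k)_+^p$ to the coefficients of $\hat\varphi_{j,\alpha}(x)$ in Equation (SA-5.3), combined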
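with its coefficient in Equation (SA-5.2), is

$$\binom{p}{\alpha}(\hat\tau_{j-1} - \hat\tau_k)^{p-\alpha}(\hat\tau_j - \hat\tau_{j-1})^{\alpha}\,\frac{\hat\tau_{l+p+1} - \hat\tau_l}{\prod_{k'=l,\,k' \ne k}^{l+p+1}(\hat\tau_{k'} - \hat\tau_k)}.$$

Collecting all such coefficients contributed by $(x - \hat\tau_k)_+^p$, $k = l, \ldots, j-1$, we obtain

$$c_{j,\alpha} = \sum_{k=l}^{j-1}\binom{p}{\alpha}(\hat\tau_{j-1} - \hat\tau_k)^{p-\alpha}(\hat\tau_j - \hat\tau_{j-1})^{\alpha}\,\frac{\hat\tau_{l+p+1} - \hat\tau_l}{\prod_{k'=l,\,k' \ne k}^{l+p+1}(\hat\tau_{k'} - \hat\tau_k)}.$$

SA-5.3 Proof of Lemma SA-2.3

Proof. The sparsity of the basis follows by construction. The upper bound on the maximum eigenvalue of $Q$ follows from Lemma SA-2.2 and the quasi-uniformity of population quantiles shown in the proof of Lemma SA-2.1. Also, in view of Lemma SA-2.1, the lower bound on the minimum eigenvalue of $Q$ follows from Theorem 4.41 of Schumaker (2007), by which the minimum eigenvalue of $Q/J$ (the scaling factor dropped) is bounded below by $\min_{1 \le j \le J}h_j$ up to some universal constant.

To show the bound on $\hat{\mathbf{b}}_s^{(v)}(x)$, notice that when $s = 0$, for any $x \in \mathcal{X}$ and any $j = 1, \ldots, J(p+1)$, $0 \le \hat{b}_{0,j}(x) \le \sqrt{J}$. Define $\hat\varphi_{j,\alpha}(x)$ as in Equation (SA-5.1). Since

$$\sup_{x \in \mathcal{X}}|\hat\varphi^{(v)}_{j,\alpha}(x)| = \sup_{x \in \mathcal{X}}\Big|\sqrt{J}\,\frac{\alpha(\alpha-1)\cdots(\alpha-v+1)}{\hat{h}_j^{v}}\Big(\frac{x - \hat\tau_{j-1}}{\hat{h}_j}\Big)^{\alpha-v}\mathbb{1}_{\hat{B}_j}(x)\Big| \lesssim \frac{\sqrt{J}}{\hat{h}_j^{v}},$$

the bound on $\hat{\mathbf{b}}_s^{(v)}(x)$ follows from Lemma SA-2.1 and Lemma SA-2.2.

Now, we prove the convergence of $\hat{Q}$. In view of Lemma SA-2.2, it suffices to show the convergence of $\hat{Q}$ when $s = 0$, i.e., $\|\mathbb{E}_n[\hat{\mathbf{b}}_0(x_i)\hat{\mathbf{b}}_0(x_i)'] - \mathbb{E}[\mathbf{b}_0(x_i)\mathbf{b}_0(x_i)']\| \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$. By Lemma SA-2.1, with probability approaching one, $\hat\Delta$ ranges within the family of partitions $\Pi$. Let $\mathcal{A}_n$ denote the event on which $\hat\Delta \in \Pi$. Thus, $\mathbb{P}(\mathcal{A}_n^c) = o(1)$. On $\mathcal{A}_n$,

$$\big\|\mathbb{E}_n[\hat{\mathbf{b}}_0(x_i)\hat{\mathbf{b}}_0(x_i)'] - \mathbb{E}_{\hat\Delta}[\hat{\mathbf{b}}_0(x_i)\hat{\mathbf{b}}_0(x_i)']\big\| \le \sup_{\Delta \in \Pi}\big\|\mathbb{E}_n[\mathbf{b}_0(x_i;\Delta)\mathbf{b}_0(x_i;\Delta)'] - \mathbb{E}[\mathbf{b}_0(x_i;\Delta)\mathbf{b}_0(x_i;\Delta)']\big\|.$$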

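By the relation between matrix norms, the right-hand side of the above inequality is further bounded by

$$\sup_{\Delta \in \Pi}\big\|\mathbb{E}_n[\mathbf{b}_0(x_i;\Delta)\mathbf{b}_0(x_i;\Delta)'] - \mathbb{E}[\mathbf{b}_0(x_i;\Delta)\mathbf{b}_0(x_i;\Delta)']\big\|_\infty.$$

Let $a_{kl}$ be a generic $(k,l)$-th entry of the matrix inside the matrix norm, i.e.,

$$a_{kl} = \mathbb{E}_n[b_{0,k}(x_i;\Delta)b_{0,l}(x_i;\Delta)] - \mathbb{E}[b_{0,k}(x_i;\Delta)b_{0,l}(x_i;\Delta)].$$

Clearly, if $b_{0,k}(\cdot;\Delta)$ and $b_{0,l}(\cdot;\Delta)$ are basis functions with different supports, $a_{kl}$ is zero. Now define the following function class:

$$\mathcal{G} = \big\{x \mapsto b_{0,k}(x;\Delta)b_{0,l}(x;\Delta) : 1 \le k, l \le J(p+1),\ \Delta \in \Pi\big\}.$$

For such a class, $\sup_{g \in \mathcal{G}}\|g\|_\infty \lesssim J$ and $\sup_{g \in \mathcal{G}}\mathbb{V}[g] \le \sup_{g \in \mathcal{G}}\mathbb{E}[g^2] \lesssim J$, where the second result follows from the fact that the supports of $b_{0,k}(\cdot;\Delta)$ and $b_{0,l}(\cdot;\Delta)$ shrink at the rate $J^{-1}$. In addition, each function in $\mathcal{G}$ is simply a dilation and translation of a polynomial function supported on $[0,1]$, plus a zero function, and the number of polynomial degrees is finite. Then, by Proposition … of Giné and Nickl (2016), the collection $\mathcal{G}$ is of VC type, i.e., there exist some constants $C_z$ and $z > 6$ such that

$$N\big(\mathcal{G}, L_2(Q), \varepsilon\|\bar{G}\|_{L_2(Q)}\big) \le \Big(\frac{C_z}{\varepsilon}\Big)^{2z},$$

for $\varepsilon$ small enough, where we take $\bar{G} = CJ$ for some constant $C > 0$ large enough. By Theorem 6.1 of Belloni, Chernozhukov, Chetverikov, and Kato (2015),

$$\mathbb{E}\Big[\sup_{g \in \mathcal{G}}\Big|\sum_{i=1}^{n}\big(g(x_i) - \mathbb{E}[g(x_i)]\big)\Big|\Big] \lesssim \sqrt{nJ\log J} + J\log J,$$

implying that $\sup_{g \in \mathcal{G}}\big|\frac{1}{n}\sum_{i=1}^{n}(g(x_i) - \mathbb{E}[g(x_i)])\big| \lesssim_{\mathbb{P}} \sqrt{J\log J/n}$. Since any row or column of the matrix $(a_{kl})$ contains only a finite number of nonzero entries, only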
22 depedig o p, the above result suffices to show that E b 0 (x i b 0 (x i E b 0 (x i b 0 (x i P J log J/. Next, Let α kl be a geeric (k, lth etry of E b0 (x i b 0 (x i /J E b 0 (x i b 0 (x i /J, where by dividig the matrix by J, we drop the ormalizig costat oly for otatio simplicity. By defiitio, it is either equal to zero or ca be rewritte as α kl = =ĥj ĥ j B j ( x ˆτj 1 0 =(ĥj h j lf(xdx ( x τj lf(xdx B j h j 1 z l f(zĥj + ˆτ j dz h j z l f(zh j + τ j dz 1 0 z l f(zĥj + ˆτ j dz + h j z l( f(zĥj + ˆτ j f(zh j + τ j dz (SA-5.4 for some 1 j J ad 0 l 2p. By Assumptio SA-1 ad Lemma SA2 of Caloico, Cattaeo, ad Titiuik (2015, max 1 j J f(ˆτ j 1 ad max 1 j J ĥj h j P J 1 J log J/. Also, Lemma SA2 of Caloico, Cattaeo, ad Titiuik (2015 implies that z 0,1 max ˆτ j + zĥj (τ j + zh j P J log J/. 1 j J Sice f( is uiformly cotiuous o, the secod term i (SA-5.4 is also O P (J 1 J log J/. Agai, usig the sparsity structure of the matrix α kl, the above result suffices to show that E b 0 (x i b 0 (x i Q P J log J/. Give the above fact, it follows that Q 1 P 1. Notice that Q ad Q are baded matrices with fiite bad width. The the bouds o Q ad Q 1 Q 1 hold by Theorem 2.2 of Demko (1977. This completes the proof. SA-5.4 Proof of Lemma SA-2.4 Proof. Sice Eɛ 2 i x i = x is bouded ad bouded away from zero uiformly over x, Q Σ Q. The, by Lemma SA-2.3, 1 P λ mi ( Σ λ max ( Σ P 1. The upper boud o Ω(x immediately follows by Lemma SA-2.3. To establish the lower boud, it suffices to show if b (v s (x P J 1/2+v. For s = 0, such 20
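a bound is trivial by construction. For other $s$, we only need to consider the case $\Delta\in\Pi$. Introduce an auxiliary function $\rho(x)=(x-x_0)^v/h_{x_0}^v$ for an arbitrary point $x_0$, where $h_{x_0}$ is the length of $B_{x_0}$, the bin containing $x_0$ in a given partition $\Delta\in\Pi$. Let $\{\psi_j\}_{j=1}^{K_s}$ be the dual basis for the B-splines $\tilde b(x):=b_s(x;\Delta)/\sqrt J$, constructed as in Theorem 4.41 of Schumaker (2007). The scaling factor $\sqrt J$ is dropped temporarily so that the definition of $\tilde b(x)$ is consistent with that theorem. Since the B-spline basis reproduces polynomials,
$$J^v\lesssim\rho^{(v)}(x_0)=\sum_{j=1}^{K_s}(\psi_j\rho)\,\tilde b_{s,j}^{(v)}(x_0).$$
For any $x_0$, only a finite number of basis functions in $b_s(x;\Delta)$ are supported on $B_{x_0}$. By Theorem 4.41 of Schumaker (2007), for such basis functions $b_{s,j}(x)$ we have $|\psi_j\rho|\lesssim\|\rho\|_{L_\infty(I_j)}$, where $I_j$ denotes the support of $b_{s,j}(x)$. All points within such $I_j$ are no farther than $(p+1)\max_{1\le j\le J}h_j(\Delta)$ from $x_0$, where $h_j(\Delta)$ denotes the length of the $j$th bin in $\Delta$. Hence $\|\rho\|_{L_\infty(I_j)}\lesssim1$. Since the sum above has finitely many nonzero terms with bounded coefficients, $\|\tilde b_s^{(v)}(x_0)\|\gtrsim J^v$, i.e., $\|b_s^{(v)}(x_0;\Delta)\|\gtrsim J^{1/2+v}$, and the desired lower bound follows. The bound on $\widehat\Omega(x)$ can be established similarly.

SA-5.5 Proof of Lemma SA-2.5

Proof. By Lemma SA-2.1, it suffices to establish the approximation power of $b_s(x;\Delta)$ for all $\Delta\in\Pi$. For $v=0$, by Theorem 6.27 of Schumaker (2007),
$$\sup_{\Delta\in\Pi}\min_{\beta\in\mathbb R^{K_s}}\|\mu(x)-b_s(x;\Delta)'\beta\|_\infty\lesssim J^{-p-1}.$$
By Huang (2003) and Assumption SA-1, the Lebesgue factor of spline bases is bounded. Then the bound on the uniform approximation error coincides with that for the $L_2$ projection error up to a universal constant. For $v>0$, again, we only need to consider the case $\Delta\in\Pi$. For any $\Delta\in\Pi$, we can take the $L_\infty$ approximation
$$\|\mu(x)-b_s(x;\Delta)'\beta_\infty(\Delta)\|_\infty\lesssim J^{-p-1},\qquad\|\mu^{(v)}(x)-b_s^{(v)}(x;\Delta)'\beta_\infty(\Delta)\|_\infty\lesssim J^{-p-1+v}$$
for some $\beta_\infty(\Delta)\in\mathbb R^{K_s}$. Such a construction exists by Lemma SA-6.1 of Cattaneo, Farrell, and Feng (2018). Then,
$$\begin{aligned}\|\mu^{(v)}(x)-b_s^{(v)}(x;\Delta)'\beta_\mu(\Delta)\|_\infty&\le\|\mu^{(v)}(x)-b_s^{(v)}(x;\Delta)'\beta_\infty(\Delta)\|_\infty+\|b_s^{(v)}(x;\Delta)'(\beta_\infty(\Delta)-\beta_\mu(\Delta))\|_\infty\\&\lesssim J^{-p-1+v}+\|b_s^{(v)}(x;\Delta)'(\beta_\infty(\Delta)-\beta_\mu(\Delta))\|_\infty.\end{aligned}$$
By definition of $\beta_\mu(\Delta)$,
$$\beta_\mu(\Delta)-\beta_\infty(\Delta)=\mathbb E[b_s(x_i;\Delta)b_s(x_i;\Delta)']^{-1}\mathbb E[b_s(x_i;\Delta)r_\infty(x_i;\Delta)],$$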
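where $r_\infty(x_i;\Delta)=\mu(x_i)-b_s(x_i;\Delta)'\beta_\infty(\Delta)$. By Lemma SA-2.3, $\|\mathbb E[b_s(x_i;\Delta)b_s(x_i;\Delta)']^{-1}\|_\infty\lesssim1$ uniformly over $\Delta\in\Pi$. Since each $b_{s,l}(x;\Delta)$ is supported on a finite number of bins, $\|\mathbb E[b_s(x_i;\Delta)r_\infty(x_i;\Delta)]\|_\infty\lesssim J^{-p-1-1/2}$, and the desired result follows because $b_s^{(v)}(x;\Delta)$ has only finitely many nonzero entries and $\sup_x\|b_s^{(v)}(x;\Delta)\|\lesssim J^{1/2+v}$.

SA-5.6 Proof of Lemma SA-2.6

Proof. Note that
$$\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)r_\mu(x_i)]=A_1(x)+A_2(x),$$
with $A_1(x):=\hat b_s^{(v)}(x)'(\widehat Q^{-1}-Q^{-1})\mathbb E_n[\hat b_s(x_i)r_\mu(x_i)]$ and $A_2(x):=\hat b_s^{(v)}(x)'Q^{-1}\mathbb E_n[\hat b_s(x_i)r_\mu(x_i)]$. By definition of $r_\mu(\cdot)$, we have $\mathbb E[\hat b_s(x_i)r_\mu(x_i)]=0$. Define the function class
$$\mathcal G:=\big\{x\mapsto b_{s,l}(x;\Delta)r_\mu(x;\Delta):1\le l\le K_s,\ \Delta\in\Pi\big\}.$$
By Lemma SA-2.5, $\sup_{\Delta\in\Pi}\|r_\mu(x;\Delta)\|_\infty\lesssim J^{-p-1}$. Then $\sup_{g\in\mathcal G}\|g\|_\infty\lesssim J^{-p-1+1/2}$ and $\sup_{g\in\mathcal G}\mathbb V[g]\lesssim J^{-2(p+1)}$. In addition, any function $g\in\mathcal G$ can be rewritten as
$$g(x)=b_{s,l}(x;\Delta)\big(\mu(x)-b_s(x;\Delta)'\beta_\mu(\Delta)\big)=b_{s,l}(x;\Delta)\mu(x)-b_{s,l}(x;\Delta)\sum_{k=\underline k}^{\underline k+p}b_{s,k}(x;\Delta)\beta_{\mu,k}(\Delta)$$
for some $1\le l,\underline k\le K_s$, where $\beta_{\mu,k}(\Delta)$ denotes the $k$th element of $\beta_\mu(\Delta)$. Here we use the sparsity of the partitioning basis: a summand in the second term is nonzero only if $b_{s,l}(x;\Delta)$ and $b_{s,k}(x;\Delta)$ have overlapping supports, and for each $l$ there are only finitely many such functions $b_{s,k}(x;\Delta)$. Then, using the same argument as in the proof of Lemma SA-2.3,
$$N\big(\mathcal G,L_2(Q),\varepsilon\|\bar G\|_{L_2(Q)}\big)\lesssim\Big(\frac{J^l}{\varepsilon}\Big)^z$$
for some finite $l$ and $z$, with envelope $\bar G=CJ^{-p-1+1/2}$ for $C$ large enough. By Theorem 6.1 of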
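Belloni, Chernozhukov, Chetverikov, and Kato (2015),
$$\sup_{g\in\mathcal G}\Big|\frac1n\sum_{i=1}^ng(x_i)\Big|\lesssim_{\mathbb P}J^{-p-1}\sqrt{\frac{\log J}{n}}+\frac{J^{-p-1+1/2}\log J}{n},$$
and, by Lemma SA-2.3, $\|\widehat Q^{-1}-Q^{-1}\|_\infty\lesssim_{\mathbb P}\sqrt{J\log J/n}$. Then, using the bound on the basis given in Lemma SA-2.3,
$$\sup_x|A_1(x)|\lesssim_{\mathbb P}J^{1/2+v}\sqrt{\frac{J\log J}{n}}\,J^{-p-1}\sqrt{\frac{\log J}{n}}=J^{-p-1+v}\frac{J\log J}{n},\qquad\sup_x|A_2(x)|\lesssim_{\mathbb P}J^{1/2+v}J^{-p-1}\sqrt{\frac{\log J}{n}}=J^{-p-1+v}\sqrt{\frac{J\log J}{n}}.$$
These results complete the proof.

SA-5.7 Proof of Lemma SA-2.7

Proof. By Lemmas SA-2.2 and SA-2.3, $\|\hat b_s^{(v)}(x)\|\lesssim_{\mathbb P}J^{1/2+v}$, $\|\widehat Q^{-1}\|_\infty\lesssim_{\mathbb P}1$ and $\|\widehat T_s\|_\infty\lesssim_{\mathbb P}1$. Define the function class
$$\mathcal G=\big\{(x_1,\varepsilon_1)\mapsto b_{0,l}(x_1;\Delta)\varepsilon_1:1\le l\le J(p+1),\ \Delta\in\Pi\big\}.$$
Then $\sup_{g\in\mathcal G}|g|\lesssim\sqrt J|\varepsilon_1|$, so we take the envelope $\bar G=C\sqrt J|\varepsilon_1|$ for some $C$ large enough. Moreover, $\sup_{g\in\mathcal G}\mathbb V[g]\lesssim1$ and, as in the proof of Lemma SA-2.3, $\mathcal G$ is of VC type. By Proposition 6.1 of Belloni, Chernozhukov, Chetverikov, and Kato (2015),
$$\sup_{g\in\mathcal G}\Big|\frac1n\sum_{i=1}^ng(x_i,\varepsilon_i)\Big|\lesssim_{\mathbb P}\sqrt{\frac{\log J}{n}}+\frac{J^{1/2}n^{1/\nu}\log J}{n},$$
and the desired result follows.

SA-5.8 Proof of Lemma SA-2.8

Proof. We first show the convergence of $\hat\gamma$. Denote the $(i,j)$th element of $M_B$ by $M_{ij}$. Then,
$$\hat\gamma-\gamma=\Big(\frac1n\sum_{i=1}^n\sum_{j=1}^nM_{ij}w_iw_j'\Big)^{-1}\Big(\frac1n\sum_{i=1}^n\sum_{j=1}^nw_iM_{ij}\big(\mu(x_j)+\varepsilon_j\big)\Big).$$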
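
The partialling-out step behind this display can be made concrete with a minimal numerical sketch (not the authors' software). It assumes that $M_B=I-B(B'B)^{-1}B'$ is the annihilator of the design matrix $B$ whose rows are $\hat b_s(x_i)'$, taken here as the $s=0$ piecewise polynomial basis on empirical-quantile bins, so that the display corresponds to $\hat\gamma=(W'M_BW)^{-1}W'M_By$. The simulated design ($x_i$ uniform, $\mu(x)=\sin(2\pi x)$, Gaussian errors, two controls) and the helper names `basis_at` and `annihilate` are hypothetical choices for illustration.

```python
import numpy as np


def basis_at(x, edges, p):
    """Standardized piecewise polynomial basis (s = 0) on the bins in `edges`:
    column j*(p+1)+a equals sqrt(J) * 1{x in B_j} * ((x - tau_j) / h_j)**a."""
    J = edges.size - 1
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, J - 1)
    z = (x - edges[idx]) / (edges[idx + 1] - edges[idx])
    B = np.zeros((x.size, J * (p + 1)))
    rows = np.arange(x.size)
    for a in range(p + 1):
        B[rows, idx * (p + 1) + a] = np.sqrt(J) * z**a
    return B


def annihilate(B, V):
    """Apply M_B = I - B (B'B)^{-1} B' to V via least squares (no n x n matrix)."""
    coef, *_ = np.linalg.lstsq(B, V, rcond=None)
    return V - B @ coef


rng = np.random.default_rng(0)
n, J, p = 5_000, 10, 1
x = rng.uniform(size=n)
W = np.column_stack([x + rng.normal(size=n), rng.normal(size=n)])  # E[w_i | x_i] = (x_i, 0)'
gamma = np.array([1.0, -0.5])
y = np.sin(2 * np.pi * x) + W @ gamma + rng.normal(size=n)  # mu(x) = sin(2*pi*x)

edges = np.quantile(x, np.linspace(0.0, 1.0, J + 1))  # empirical-quantile bin edges
B = basis_at(x, edges, p)  # K_s = J * (p + 1) columns
W_t, y_t = annihilate(B, W), annihilate(B, y)
gamma_hat = np.linalg.solve(W_t.T @ W_t, W_t.T @ y_t)  # (W'M_B W)^{-1} W'M_B y

# trace(M_B) = n - sum of leverages; the leverages sum to K_s when B has full column rank.
Q_orth, _ = np.linalg.qr(B)
trace_MB = n - np.sum(Q_orth**2)
print(gamma_hat, trace_MB)
```

Applying $M_B$ through least-squares residuals avoids storing an $n\times n$ matrix, and the QR step checks the identity $\operatorname{trace}(M_B)=n-K_s$ for this basis.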

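Define $V=W-\mathbb E[W|X]$ and $H=\mathbb E[W|X]$. Then $W'M_BW=V'M_BV+H'M_BH+H'M_BV+V'M_BH$. We have
$$\frac1nV'M_BV=\frac1n\sum_{i=1}^nM_{ii}v_iv_i'+\frac1n\sum_{i=1}^n\sum_{j\ne i}M_{ij}v_iv_j'=\frac1n\sum_{i=1}^nM_{ii}\mathbb E[v_iv_i'|x_i]+O_{\mathbb P}(n^{-1/2})\gtrsim_{\mathbb P}1,$$
where the penultimate equality holds by Lemma SA-1 of Cattaneo, Jansson, and Newey (2018b) and the last by $\frac1n\sum_{i=1}^nM_{ii}=1-K_s/n\gtrsim1$. Moreover, $H'M_BH\ge0$, and $H'M_BV$ has mean zero conditional on $X$; by Lemma SA-1 of Cattaneo, Jansson, and Newey (2018b),
$$\frac1nH'M_BV\lesssim_{\mathbb P}\frac1n\big(\operatorname{trace}(H'H)\big)^{1/2}=\frac{\|H\|_F}{n}=o_{\mathbb P}(1),$$
where $\|\cdot\|_F$ denotes the Frobenius norm. Therefore, we conclude that $\frac1nW'M_BW\gtrsim_{\mathbb P}1+o_{\mathbb P}(1)$. On the other hand, $\frac1n\sum_{i=1}^n\sum_{j=1}^nw_iM_{ij}\varepsilon_j$ has mean zero and variance of order $O(1/n)$ by Lemma SA-2 of Cattaneo, Jansson, and Newey (2018b). In addition, as in Lemma 2 of Cattaneo, Jansson, and Newey (2018a), let $G=(\mu(x_1),\dots,\mu(x_n))'$ and note that
$$\frac1n\|W'M_BG\|\le\frac1n\|H'M_BG\|+\frac1n\|V'M_BG\|\lesssim_{\mathbb P}\Big(\frac{\operatorname{trace}(H'M_BH)}{n}\Big)^{1/2}\Big(\frac{G'M_BG}{n}\Big)^{1/2}+\frac{(G'M_BG)^{1/2}}{n}\lesssim_{\mathbb P}J^{-\varsigma}J^{-(p+1)}+\frac{J^{-p-1}}{\sqrt n}.$$
Then the first result follows from the rate restrictions imposed. To show the second result, note that by Lemmas SA-2.2 and SA-2.3, $\|\hat b_s^{(v)}(x)\|\lesssim_{\mathbb P}J^{1/2+v}$, $\|\widehat Q^{-1}\|_\infty\lesssim_{\mathbb P}1$ and $\|\widehat T_s\|_\infty\lesssim_{\mathbb P}1$. The matrix $\mathbb E_n[\hat b_0(x_i)w_i']$ is $J(p+1)\times d$ and can be decomposed as follows:
$$\mathbb E_n[\hat b_0(x_i)w_i']=\mathbb E_n\big[\hat b_0(x_i)\mathbb E[w_i|x_i]'\big]+\mathbb E_n\big[\hat b_0(x_i)(w_i-\mathbb E[w_i|x_i])'\big].$$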
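
This decomposition is controlled with the Gram-matrix facts of Lemma SA-2.3: $\widehat Q$ is banded (block diagonal when $s=0$) with eigenvalues bounded away from zero and infinity, and averages such as $\mathbb E_n[\hat b_0(x_i)\mathbb E[w_i|x_i]']$ have entries of order $J^{-1/2}$. A small simulation can illustrate both. This sketch assumes uniform $x_i$, $p=1$, and a hypothetical smooth function $g(x)=1+x$ standing in for $\mathbb E[w_i|x_i=x]$; the helper `basis_at` is hypothetical and implements the standardized rotated basis described in SA-1.

```python
import numpy as np


def basis_at(x, edges, p):
    """Standardized piecewise polynomial basis (s = 0) on the bins in `edges`:
    column j*(p+1)+a equals sqrt(J) * 1{x in B_j} * ((x - tau_j) / h_j)**a."""
    J = edges.size - 1
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, J - 1)
    z = (x - edges[idx]) / (edges[idx + 1] - edges[idx])
    B = np.zeros((x.size, J * (p + 1)))
    rows = np.arange(x.size)
    for a in range(p + 1):
        B[rows, idx * (p + 1) + a] = np.sqrt(J) * z**a
    return B


rng = np.random.default_rng(0)
n, p = 20_000, 1
x = rng.uniform(size=n)
g = 1.0 + x  # hypothetical stand-in for a conditional mean such as E[w_i | x_i]

results = {}
for J in (10, 20, 40):
    edges = np.quantile(x, np.linspace(0.0, 1.0, J + 1))  # empirical-quantile bins
    B = basis_at(x, edges, p)
    Q_hat = B.T @ B / n
    # Entries linking different bins vanish, so Q_hat is block diagonal.
    off_block = np.kron(np.eye(J), np.ones((p + 1, p + 1))) == 0
    eig = np.linalg.eigvalsh(Q_hat)
    # Entries of E_n[b_0(x_i) g(x_i)] are O(J^{-1/2}); after scaling by sqrt(J) they stay O(1).
    scaled = np.sqrt(J) * np.abs(B.T @ g / n).max()
    results[J] = (np.abs(Q_hat[off_block]).max(), eig.min(), eig.max(), scaled)

for J, (off, lo, hi, sc) in results.items():
    print(f"J={J:3d}  max off-block={off:.1e}  eig in [{lo:.3f}, {hi:.3f}]  scaled={sc:.3f}")
```

For uniform $x_i$ each diagonal block is close to $\int_0^1\psi(z)\psi(z)'dz$ with $\psi(z)=(1,z,\dots,z^p)'$, so the printed eigenvalue range should stay roughly fixed as $J$ grows.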

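By the argument in the proof of Lemma SA-2.3 and the conditions that $\mathbb E[w_{l,i}|x_i=x]\lesssim1$ and $J\log J/n=o(1)$, $\|\mathbb E_n[\hat b_0(x_i)\mathbb E[w_i|x_i]']\|_\infty\lesssim_{\mathbb P}J^{-1/2}$. Regarding the second term, note that it is a mean-zero sequence, and for the $l$th covariate in $w$, $l=1,\dots,d$,
$$\mathbb V\Big[\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n\big[\hat b_s(x_i)(w_{l,i}-\mathbb E[w_{l,i}|x_i])\big]\,\Big|\,X\Big]=\frac1n\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n\big[\hat b_s(x_i)\hat b_s(x_i)'\mathbb V[w_{l,i}|x_i]\big]\widehat Q^{-1}\hat b_s^{(v)}(x)\lesssim_{\mathbb P}\frac{J^{1+2v}}{n}.$$
Thus the second result follows by Markov's inequality. Now suppose $J\log J/n^{(\nu-2)/\nu}\lesssim1$ also holds. Using the argument given in Lemma SA-2.7 and the assumption that $\mathbb E[|w_{l,i}|^\nu|x_i=x]\lesssim1$ for all $l$, we have $\|\mathbb E_n[\hat b_s(x_i)(w_{l,i}-\mathbb E[w_{l,i}|x_i])]\|_\infty\lesssim_{\mathbb P}\sqrt{\log J/n}$. Thus the last result follows.

SA-5.9 Proof of Lemma SA-2.9

Proof. Notice that
$$\hat\mu^{(v)}(x)-\mu^{(v)}(x)=\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)\varepsilon_i]+\big(\hat b_s^{(v)}(x)'\beta_\mu-\mu^{(v)}(x)\big)+\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)r_\mu(x_i)]+\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)w_i'](\gamma-\hat\gamma).\tag{SA-5.5}$$
Then the result follows by Lemmas SA-2.5, SA-2.6, SA-2.7 and SA-2.8.

SA-5.10 Proof of Lemma SA-2.10

Proof. Since
$$\hat\varepsilon_i:=y_i-\hat b_s(x_i)'\hat\beta-w_i'\hat\gamma=\varepsilon_i+\mu(x_i)-\hat b_s(x_i)'\hat\beta-w_i'(\hat\gamma-\gamma)=:\varepsilon_i+u_i,$$
we can write
$$\begin{aligned}&\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\hat\varepsilon_i^2]-\mathbb E[\hat b_s(x_i)\hat b_s(x_i)'\sigma^2(x_i)]\\&\quad=\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'u_i^2]+2\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'u_i\varepsilon_i]+\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'(\varepsilon_i^2-\sigma^2(x_i))]\\&\qquad+\big(\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\sigma^2(x_i)]-\mathbb E[\hat b_s(x_i)\hat b_s(x_i)'\sigma^2(x_i)]\big)\\&\quad=:V_1+V_2+V_3+V_4.\end{aligned}$$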
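
The matrix $\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\hat\varepsilon_i^2]$ on the left-hand side feeds a sandwich variance estimator. The following is a minimal sketch under loud assumptions: $s=0$, $p=0$, $v=0$, no controls $w_i$ (so $\hat\varepsilon_i=y_i-\hat b_s(x_i)'\hat\beta$), and the sandwich form $\widehat\Omega(x)=\hat b_s(x)'\widehat Q^{-1}\widehat\Sigma\widehat Q^{-1}\hat b_s(x)$ with $\widehat\Sigma=\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\hat\varepsilon_i^2]$. The simulated design and the helper `basis_at` are hypothetical.

```python
import numpy as np


def basis_at(x, edges, p):
    """Standardized piecewise polynomial basis (s = 0) on the bins in `edges`."""
    J = edges.size - 1
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, J - 1)
    z = (x - edges[idx]) / (edges[idx + 1] - edges[idx])
    B = np.zeros((x.size, J * (p + 1)))
    rows = np.arange(x.size)
    for a in range(p + 1):
        B[rows, idx * (p + 1) + a] = np.sqrt(J) * z**a
    return B


rng = np.random.default_rng(1)
n, J, p = 20_000, 10, 0
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + rng.normal(size=n)  # sigma^2(x) = 1; no controls w_i here

edges = np.quantile(x, np.linspace(0.0, 1.0, J + 1))
B = basis_at(x, edges, p)
Q_hat = B.T @ B / n
beta_hat = np.linalg.solve(Q_hat, B.T @ y / n)
eps_hat = y - B @ beta_hat                         # residuals
Sigma_hat = (B * eps_hat[:, None] ** 2).T @ B / n  # E_n[b b' eps_hat^2]

x0 = np.array([0.25, 0.5, 0.75])
A = basis_at(x0, edges, p) @ np.linalg.inv(Q_hat)      # rows: b(x0)' Q_hat^{-1}
Omega_hat = np.einsum("ij,jk,ik->i", A, Sigma_hat, A)  # b' Q^{-1} Sigma Q^{-1} b
se = np.sqrt(Omega_hat / n)  # pointwise standard errors of mu_hat(x0)
print(Omega_hat / J, se)
```

With homoskedastic unit-variance errors and uniform $x_i$, $\widehat\Omega(x)/J$ should be close to one, in line with $\bar\Omega(x)\asymp J^{1+2v}$ at $v=0$.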

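Step 1: For $V_1$, we further write $u_i=\big(\mu(x_i)-\hat b_s(x_i)'\hat\beta\big)-w_i'(\hat\gamma-\gamma)=:u_{i1}-u_{i2}$. Then
$$V_1=\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'(u_{i1}^2+u_{i2}^2-2u_{i1}u_{i2})]=:V_{11}+V_{12}-V_{13}.$$
Since $\pm2\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'u_{i1}u_{i2}]\preceq\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'(u_{i1}^2+u_{i2}^2)]$, it suffices to bound $V_{11}$ and $V_{12}$. For $V_{11}$,
$$\|V_{11}\|\le\max_{1\le i\le n}|u_{i1}|^2\,\|\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)']\|\lesssim_{\mathbb P}\frac{J\log J}{n}+J^{-2(p+1)},$$
where the last inequality holds by Lemmas SA-2.3 and SA-2.9. On the other hand,
$$V_{12}=\mathbb E_n\Big[\hat b_s(x_i)\hat b_s(x_i)'\Big(\sum_{l=1}^dw_{il}^2(\hat\gamma_l-\gamma_l)^2+\sum_{l\ne l'}w_{il}w_{il'}(\hat\gamma_l-\gamma_l)(\hat\gamma_{l'}-\gamma_{l'})\Big)\Big]\lesssim\mathbb E_n\Big[\hat b_s(x_i)\hat b_s(x_i)'\sum_{l=1}^dw_{il}^2(\hat\gamma_l-\gamma_l)^2\Big]$$
by the $C_r$-inequality. By Lemma SA-2.8, $\|\hat\gamma-\gamma\|^2=o_{\mathbb P}(J/n)$. Then it suffices to show that $\|\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'w_{il}^2]\|\lesssim_{\mathbb P}1$ for every $l=1,\dots,d$. Under the conditions given in the theorem, this bound can be established using the argument given in Steps 3 and 4.

Step 2: For $V_2$, we have $V_2=2\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\varepsilon_i(u_{i1}-u_{i2})]=:V_{21}-V_{22}$. Then
$$\|V_{21}\|\lesssim\max_{1\le i\le n}|u_{i1}|\Big(\|\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)']\|+\|\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\varepsilon_i^2]\|\Big)\lesssim_{\mathbb P}\sqrt{\frac{J\log J}{n}}+J^{-p-1},$$
where the last step follows from Lemma SA-2.3 and the result given in the next step. In addition, $V_{22}=2\mathbb E_n\big[\hat b_s(x_i)\hat b_s(x_i)'\varepsilon_i\sum_{l=1}^dw_{il}(\hat\gamma_l-\gamma_l)\big]$. Then, since $\pm2\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'\varepsilon_iw_{il}]\preceq\mathbb E_n[\hat b_s(x_i)\hat b_s(x_i)'(\varepsilon_i^2+w_{il}^2)]$, the result can be established using the strategy given in the next step.

Step 3: For $V_3$, in view of Lemmas SA-2.1 and SA-2.2, it suffices to show that
$$\sup_{\Delta\in\Pi}\big\|\mathbb E_n[b_0(x_i;\Delta)b_0(x_i;\Delta)'(\varepsilon_i^2-\sigma^2(x_i))]\big\|\lesssim_{\mathbb P}\sqrt{\frac{J\log J}{n^{(\nu-2)/\nu}}}.$$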
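
The matrix comparisons used in Steps 1 and 2 rest on an elementary positive-semidefiniteness argument. It is written out here for generic scalars $a_i,b_i$, which are placeholders: for instance $a_i=u_{i1},b_i=u_{i2}$ in Step 1 and $a_i=\varepsilon_i,b_i=w_{il}$ in Step 2.

```latex
For any fixed $c\in\mathbb{R}^{K_s}$,
\[
  c'\,\mathbb{E}_n\big[\hat b_s(x_i)\hat b_s(x_i)'(a_i^2+b_i^2\mp 2a_ib_i)\big]\,c
  = \mathbb{E}_n\big[(c'\hat b_s(x_i))^2(a_i\mp b_i)^2\big]\ \ge\ 0,
\]
so $\pm 2\,\mathbb{E}_n[\hat b_s\hat b_s' a_ib_i]\preceq\mathbb{E}_n[\hat b_s\hat b_s'(a_i^2+b_i^2)]$.
Since a symmetric $A$ with $-C\preceq A\preceq C$ satisfies
$|c'Ac|\le c'Cc\le\|C\|$ for every unit vector $c$,
\[
  \big\|2\,\mathbb{E}_n[\hat b_s\hat b_s' a_ib_i]\big\|
  \le \big\|\mathbb{E}_n[\hat b_s\hat b_s'(a_i^2+b_i^2)]\big\|.
\]
Likewise, $u_{i2}=\sum_{l=1}^d w_{il}(\hat\gamma_l-\gamma_l)$ and the $C_r$-inequality
$\big(\sum_{l=1}^d t_l\big)^2\le d\sum_{l=1}^d t_l^2$ give
\[
  V_{12}\preceq d\,\mathbb{E}_n\Big[\hat b_s\hat b_s'\sum_{l=1}^d w_{il}^2(\hat\gamma_l-\gamma_l)^2\Big].
\]
```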

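For notational simplicity, write $\eta_i=\varepsilon_i^2-\sigma^2(x_i)$,
$$\eta_i^-=\eta_i\mathbb 1(|\eta_i|\le M)-\mathbb E[\eta_i\mathbb 1(|\eta_i|\le M)|x_i],\qquad\eta_i^+=\eta_i\mathbb 1(|\eta_i|>M)-\mathbb E[\eta_i\mathbb 1(|\eta_i|>M)|x_i]$$
for some $M>0$ to be specified later. Since $\mathbb E[\eta_i|x_i]=0$, $\eta_i=\eta_i^-+\eta_i^+$. Then define the function class
$$\mathcal G=\big\{(x_1,\eta_1)\mapsto b_{0,l}(x_1;\Delta)b_{0,k}(x_1;\Delta)\eta_1:1\le l\le J(p+1),\ 1\le k\le J(p+1),\ \Delta\in\Pi\big\}.$$
For $g\in\mathcal G$, $g(x_1,\eta_1)=g(x_1,\eta_1^-)+g(x_1,\eta_1^+)$. For the truncated piece, $\sup_{g\in\mathcal G}|g(x_1,\eta_1^-)|\lesssim JM$ and
$$\sup_{g\in\mathcal G}\mathbb V[g(x_1,\eta_1^-)]\lesssim\sup_{\Delta\in\Pi}\max_{1\le l,k\le J(p+1)}\mathbb E[b_{0,l}^2(x_1;\Delta)b_{0,k}^2(x_1;\Delta)]\,\sup_x\mathbb E[(\eta_1^-)^2|x_1=x]\lesssim JM\sup_x\mathbb E[|\eta_1||x_1=x]\lesssim JM.$$
The VC condition holds by the same argument as in the proof of Lemma SA-2.3. Then, using Proposition 6.2 of Belloni, Chernozhukov, Chetverikov, and Kato (2015),
$$\mathbb E\Big[\sup_{g\in\mathcal G}\big|\mathbb E_n[g(x_i,\eta_i^-)]\big|\Big]\lesssim\sqrt{\frac{JM\log(JM)}{n}}+\frac{JM\log(JM)}{n}.$$
Regarding the tail, we apply a theorem in van der Vaart and Wellner (1996) and obtain
$$\begin{aligned}\mathbb E\Big[\sup_{g\in\mathcal G}\big|\mathbb E_n[g(x_i,\eta_i^+)]\big|\Big]&\lesssim\sqrt{\frac{J\log J}{n}}\,\mathbb E\Big[\sqrt{\mathbb E_n[|\eta_i^+|^2]}\Big]\\&\lesssim\sqrt{\frac{J\log J}{n}}\Big(\mathbb E\big[\max_{1\le i\le n}|\eta_i^+|\big]\Big)^{1/2}\big(\mathbb E\,\mathbb E_n[|\eta_i^+|]\big)^{1/2}\\&\lesssim\sqrt{\frac{J\log J}{n}}\,n^{1/\nu}M^{-(\nu-2)/4},\end{aligned}$$
where the second line follows from the Cauchy-Schwarz inequality and the third line uses the facts that $\mathbb E[\max_{1\le i\le n}|\eta_i^+|]\lesssim\mathbb E[\max_{1\le i\le n}\varepsilon_i^2]\lesssim n^{2/\nu}$ and $\mathbb E\,\mathbb E_n[|\eta_i^+|]\lesssim\mathbb E[|\eta_1|\mathbb 1(|\eta_1|>M)]\lesssim\mathbb E[|\varepsilon_1|^\nu]M^{-(\nu-2)/2}$. The desired result then follows by setting $M=J^{2/(\nu-2)}$ and using the sparsity of the basis.

Step 4: For $V_4$, note that $\mathbb E[\varepsilon_i^2|x_i=x]\lesssim1$ by Assumption SA-1. Then, by the same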
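argument as in the proof of Lemma SA-2.3,
$$\sup_{\Delta\in\Pi}\big\|\mathbb E_n[b_s(x_i;\Delta)b_s(x_i;\Delta)'\sigma^2(x_i)]-\mathbb E[b_s(x_i;\Delta)b_s(x_i;\Delta)'\varepsilon_i^2]\big\|\lesssim_{\mathbb P}\sqrt{\frac{J\log J}{n}}$$
and
$$\big\|\mathbb E[\hat b_s(x_i)\hat b_s(x_i)'\varepsilon_i^2]-\mathbb E[b_s(x_i)b_s(x_i)'\varepsilon_i^2]\big\|\lesssim_{\mathbb P}\sqrt{\frac{J\log J}{n}}.$$
Then the proof is complete.

SA-5.11 Proof of Theorem SA-3.1

Proof. The proof is divided into several steps.

Step 1: We rely on the decomposition (SA-5.5). By Lemma SA-2.8, the variance of the last term is of smaller order, and thus it suffices to characterize the conditional variance of $A(x):=\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)\varepsilon_i]$. By Lemma SA-2.3,
$$\int\mathbb V[A(x)|X]\,\omega(x)\,dx=\frac1n\operatorname{trace}\Big(Q^{-1}\Sigma Q^{-1}\int\hat b_s^{(v)}(x)\hat b_s^{(v)}(x)'\omega(x)\,dx\Big)+o_{\mathbb P}\Big(\frac{J^{1+2v}}{n}\Big).$$
In fact, using the argument given in the proof of Lemma SA-2.3, we also have
$$\Big\|\int\hat b_s^{(v)}(x)\hat b_s^{(v)}(x)'\omega(x)\,dx-\int b_s^{(v)}(x)b_s^{(v)}(x)'\omega(x)\,dx\Big\|=o_{\mathbb P}(J^{2v}),$$
and since $\sigma^2(x)$ and $\omega(x)$ are bounded and bounded away from zero,
$$\mathscr V(p,s,v)=J^{-(1+2v)}\operatorname{trace}\Big(Q^{-1}\Sigma Q^{-1}\int b_s^{(v)}(x)b_s^{(v)}(x)'\omega(x)\,dx\Big)\asymp1.$$

Step 2: By decomposition (SA-5.5),
$$\mathbb E[\hat\mu^{(v)}(x)|X,W]-\mu^{(v)}(x)=\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)r_\mu(x_i)]+\big(\hat b_s^{(v)}(x)'\beta_\mu-\mu^{(v)}(x)\big)+\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb E_n[\hat b_s(x_i)w_i']\,\mathbb E[\gamma-\hat\gamma|X,W]=:B_1(x)+B_2(x)+B_3(x).$$
By Lemma SA-2.6, $\int B_1(x)^2\omega(x)\,dx=o_{\mathbb P}(J^{-2p-2+2v})$. By Lemma SA-2.8, $\int B_3(x)^2\omega(x)\,dx=o_{\mathbb P}(J^{-2p-2+2v})$. By Lemma SA-2.5, $\int B_2(x)^2\omega(x)\,dx\lesssim_{\mathbb P}J^{-2p-2+2v}$. By the Cauchy-Schwarz inequality, we can safely ignore the integrals of the cross-product terms in the IMSE expansion, and thus the leading term in the integrated squared bias is $\int B_2(x)^2\omega(x)\,dx$, with
$$J^{2p+2-2v}\int\big(\hat b_s^{(v)}(x)'\beta_\mu-\mu^{(v)}(x)\big)^2\omega(x)\,dx\lesssim_{\mathbb P}1.$$
Then, by Lemma SA-6.1 of Cattaneo, Farrell, and Feng (2018), for $s=p$,
$$\sup_x\Big|\mu^{(v)}(x)-\hat b_p^{(v)}(x)'\beta_\infty(\widehat\Delta)-\frac{\mu^{(p+1)}(x)\hat h_x^{p+1-v}}{(p+1-v)!}E_{p+1-v}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)\Big|=o_{\mathbb P}(J^{-(p+1-v)}),\tag{SA-5.6}$$
where for each $m\in\mathbb Z_+$, $E_m(\cdot)$ is the $m$th Bernoulli polynomial, $\hat\tau_x^L$ is the start of the (random) interval in $\widehat\Delta$ containing $x$, and $\hat h_x$ denotes its length. Note that when $s<p$, $\hat b_p(x)'\beta_\infty(\widehat\Delta)$ is still an element of the space spanned by $\hat b_s(x)$. In other words, it provides a valid approximation of $\mu^{(v)}(x)$ in the larger space in terms of the $\|\cdot\|_\infty$-norm. Since $\hat b_p(\cdot)'\beta_\infty(\widehat\Delta)$ lies in the span of $\hat b_s(\cdot)$, its projection onto that span reproduces it exactly, so only the approximation error in (SA-5.6) enters the second equality below. It then follows that
$$\begin{aligned}\hat b_s^{(v)}(x)'\beta_\mu-\mu^{(v)}(x)&=\hat b_s^{(v)}(x)'\mathbb E[\hat b_s(x_i)\hat b_s(x_i)']^{-1}\mathbb E[\hat b_s(x_i)\mu(x_i)]-\mu^{(v)}(x)\\&=\hat b_s^{(v)}(x)'\mathbb E[\hat b_s(x_i)\hat b_s(x_i)']^{-1}\mathbb E\Big[\hat b_s(x_i)\frac{\mu^{(p+1)}(x_i)\hat h_{x_i}^{p+1}}{(p+1)!}E_{p+1}\Big(\frac{x_i-\hat\tau_{x_i}^L}{\hat h_{x_i}}\Big)\Big]\\&\qquad-\frac{\mu^{(p+1)}(x)\hat h_x^{p+1-v}}{(p+1-v)!}E_{p+1-v}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)+o_{\mathbb P}(J^{-p-1+v})\\&=J^{-p-1}\hat b_s^{(v)}(x)'Q^{-1}\widehat T_s\,\mathbb E\Big[\hat b_0(x_i)\frac{\mu^{(p+1)}(x_i)}{(p+1)!f(x_i)^{p+1}}E_{p+1}\Big(\frac{x_i-\hat\tau_{x_i}^L}{\hat h_{x_i}}\Big)\Big]\\&\qquad-J^{-p-1+v}\frac{\mu^{(p+1)}(x)}{(p+1-v)!f(x)^{p+1-v}}E_{p+1-v}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)+o_{\mathbb P}(J^{-p-1+v}),\end{aligned}\tag{SA-5.7}$$
where the last step uses Lemmas SA-2.1 to SA-2.3, and $o_{\mathbb P}(\cdot)$ above is understood in terms of the $\|\cdot\|_\infty$-norm over $x$. Taking the integral of the squared bias and using Assumption SA-1 and Lemmas SA-2.1 to SA-2.3 again, we have three leading terms:
$$\begin{aligned}M_1&:=\int\Big(J^{-p-1+v}\frac{\mu^{(p+1)}(x)}{(p+1-v)!f(x)^{p+1-v}}E_{p+1-v}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)\Big)^2\omega(x)\,dx\\&=J^{-2p-2+2v}\frac{|E_{2p+2-2v}|}{(2p+2-2v)!}\int\frac{\mu^{(p+1)}(x)^2}{f(x)^{2p+2-2v}}\omega(x)\,dx+o_{\mathbb P}(J^{-2p-2+2v}),\end{aligned}$$
$$M_2:=J^{-2p-2}\int\Big(\hat b_s^{(v)}(x)'Q^{-1}\widehat T_s\,\mathbb E\Big[\hat b_0(x_i)\frac{\mu^{(p+1)}(x_i)}{(p+1)!f(x_i)^{p+1}}E_{p+1}\Big(\frac{x_i-\hat\tau_{x_i}^L}{\hat h_{x_i}}\Big)\Big]\Big)^2\omega(x)\,dx,$$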

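$$M_2=J^{-2p-2}\xi_{0,f}'T_s'Q^{-1}\Big(\int b_s^{(v)}(x)b_s^{(v)}(x)'\omega(x)\,dx\Big)Q^{-1}T_s\xi_{0,f}+o_{\mathbb P}(J^{-2p-2+2v}),$$
$$\begin{aligned}M_3&:=J^{-2p-2+v}\int\Big\{\Big(\hat b_s^{(v)}(x)'Q^{-1}\widehat T_s\,\mathbb E\Big[\hat b_0(x_i)\frac{\mu^{(p+1)}(x_i)}{(p+1)!f(x_i)^{p+1}}E_{p+1}\Big(\frac{x_i-\hat\tau_{x_i}^L}{\hat h_{x_i}}\Big)\Big]\Big)\Big(\frac{\mu^{(p+1)}(x)}{(p+1-v)!f(x)^{p+1-v}}E_{p+1-v}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)\Big)\Big\}\omega(x)\,dx\\&=J^{-2p-2+v}\xi_{0,f}'T_s'Q^{-1}T_s\xi_{v,\omega}+o_{\mathbb P}(J^{-2p-2+2v}),\end{aligned}$$
where $E_{2p+2-2v}$ in $M_1$ is the $(2p+2-2v)$th Bernoulli number and, for a weighting function $\lambda(\cdot)$ (to be replaced by $f(\cdot)$ or $\omega(\cdot)$, respectively), we define
$$\xi_{v,\lambda}=\int b_0^{(v)}(x)\frac{\mu^{(p+1)}(x)}{(p+1-v)!f(x)^{p+1-v}}E_{p+1-v}\Big(\frac{x-\tau_x^L}{h_x}\Big)\lambda(x)\,dx.$$
Here $\tau_x^L$ and $h_x$ are defined in the same way as $\hat\tau_x^L$ and $\hat h_x$, but with respect to $\Delta$, the partition based on population quantiles. Therefore, the leading terms depend only on the non-random partition and other deterministic functions, and they coincide with the leading bias obtained by repeating the above derivation with $\widehat\Delta=\Delta$. Then the proof is complete.

SA-5.12 Proof of Corollary SA-3.1

Proof. The proof is divided into two steps.

Step 1: Consider the special case $s=0$. $\mathscr V(p,0,v)$ depends on three matrices: $Q$, $\Sigma$ and $\int b_0^{(v)}(x)b_0^{(v)}(x)'\omega(x)\,dx$. Importantly, they are block diagonal with finite block sizes, and the basis functions that form these matrices have local supports. Then, by continuity of $\omega(x)$, $f(x)$ and $\sigma^2(x)$, these matrices can be further approximated:
$$Q=\check QD_f+o_{\mathbb P}(1),\qquad\Sigma=\check QD_{\sigma^2f}+o_{\mathbb P}(1),\qquad\int b_0^{(v)}(x)b_0^{(v)}(x)'\omega(x)\,dx=\check Q_vD_\omega+o_{\mathbb P}(J^{2v}),$$
where $\check Q=\int b_0(x)b_0(x)'dx$, $\check Q_v=\int b_0^{(v)}(x)b_0^{(v)}(x)'dx$, $D_f=\operatorname{diag}\{f(\check x_1),\dots,f(\check x_{J(p+1)})\}$, $D_{\sigma^2f}=\operatorname{diag}\{\sigma^2(\check x_1)f(\check x_1),\dots,\sigma^2(\check x_{J(p+1)})f(\check x_{J(p+1)})\}$, and $D_\omega=\operatorname{diag}\{\omega(\check x_1),\dots,\omega(\check x_{J(p+1)})\}$.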

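Here $o_{\mathbb P}(\cdot)$ in the above equations means that the operator norm of the matrix difference is $o_{\mathbb P}(\cdot)$, and for $l=1,\dots,J(p+1)$, each $\check x_l$ is an arbitrary point in the support of $b_{0,l}(x)$. For simplicity, we choose these points such that $\check x_l=\check x_{l'}$ whenever $b_{0,l}(\cdot)$ and $b_{0,l'}(\cdot)$ have the same support. Therefore,
$$\int\mathbb V[A(x)|X]\,\omega(x)\,dx=\frac1n\operatorname{trace}\big(D_{\sigma^2\omega/f}\check Q^{-1}\check Q_v\big)+o_{\mathbb P}\Big(\frac{J^{1+2v}}{n}\Big),$$
where $D_{\sigma^2\omega/f}=\operatorname{diag}\{\sigma^2(\check x_1)\omega(\check x_1)/f(\check x_1),\dots,\sigma^2(\check x_{J(p+1)})\omega(\check x_{J(p+1)})/f(\check x_{J(p+1)})\}$. Finally, by a change of variables, we can write $\check Q^{-1}\check Q_v$ as another block-diagonal matrix $\widetilde Q=\operatorname{diag}\{\widetilde Q_1,\dots,\widetilde Q_J\}$, whose $l$th block, $l=1,\dots,J$, can be written as
$$\widetilde Q_l=h_l^{-2v}\Big(\int_0^1\psi(z)\psi(z)'dz\Big)^{-1}\int_0^1\psi^{(v)}(z)\psi^{(v)}(z)'dz,$$
where $\psi(z)=(1,z,\dots,z^p)'$ and $h_l$ is the length of the $l$th bin. Employing Lemma SA-2.1 and letting the trace converge to the Riemann integral, we conclude that
$$\int\mathbb V[A(x)|X]\,\omega(x)\,dx=\frac{J^{1+2v}}{n}\mathscr V(p,0,v)+o_{\mathbb P}\Big(\frac{J^{1+2v}}{n}\Big),$$
where
$$\mathscr V(p,0,v):=\operatorname{trace}\Big\{\Big(\int_0^1\psi(z)\psi(z)'dz\Big)^{-1}\int_0^1\psi^{(v)}(z)\psi^{(v)}(z)'dz\Big\}\int\sigma^2(x)f(x)^{2v}\omega(x)\,dx.$$

Step 2: Now consider the integrated squared bias in the same special case $s=0$. By Lemma A.3 of Cattaneo, Farrell, and Feng (2018), we can construct an $L_\infty$ approximation error
$$r_\infty^{(v)}(x;\widehat\Delta):=\mu^{(v)}(x)-b_0^{(v)}(x;\widehat\Delta)'\beta_\infty(\widehat\Delta)=\frac{\mu^{(p+1)}(x)\hat h_x^{p+1-v}}{(p+1-v)!}B_{p+1-v}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)+o_{\mathbb P}(J^{-(p+1-v)}),$$
where for each $m\in\mathbb Z_+$, $\binom{2m}{m}B_m(\cdot)$ is the $m$th shifted Legendre polynomial on $[0,1]$, $\hat\tau_x^L$ is the start of the (random) interval in $\widehat\Delta$ containing $x$, and $\hat h_x$ denotes its length. In addition,
$$\begin{aligned}\max_{1\le j\le J(p+1)}\big|\mathbb E[\hat b_{0,j}(x)r_\infty(x;\widehat\Delta)]\big|&=\max_{1\le j\le J(p+1)}\Big|\int\hat b_{0,j}(x)r_\infty(x;\widehat\Delta)f(x)\,dx\Big|\\&=\max_{1\le j\le J(p+1)}\Big|\int_{\hat\tau_x^L}^{\hat\tau_x^L+\hat h_x}\hat b_{0,j}(x)r_\infty(x;\widehat\Delta)f(\hat\tau_x^L)\,dx\Big|+o_{\mathbb P}(J^{-p-1-1/2}).\end{aligned}$$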

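Continuing,
$$\max_{1\le j\le J(p+1)}\Big|\int_{\hat\tau_x^L}^{\hat\tau_x^L+\hat h_x}\hat b_{0,j}(x)r_\infty(x;\widehat\Delta)f(\hat\tau_x^L)\,dx\Big|=\max_{1\le j\le J(p+1)}\Big|\frac{f(\hat\tau_x^L)\mu^{(p+1)}(x_j)\hat h_x^{p+1}}{(p+1)!}\int_{\hat\tau_x^L}^{\hat\tau_x^L+\hat h_x}\hat b_{0,j}(x)B_{p+1}\Big(\frac{x-\hat\tau_x^L}{\hat h_x}\Big)dx\Big|+o_{\mathbb P}(J^{-p-1-1/2})=o_{\mathbb P}(J^{-p-1-1/2}),$$
where the last equality follows by a change of variables and the orthogonality of Legendre polynomials, because on each bin $\hat b_{0,j}$ is a polynomial of degree at most $p$. Thus $r_\infty(x;\widehat\Delta)$ is approximately orthogonal to the space spanned by $\hat b_0(x)$. Immediately, $\|\mathbb E[\hat b_0(x)r_\infty(x;\widehat\Delta)]\|=o_{\mathbb P}(J^{-p-1})$. Since $\mathbb E[\hat b_0(x)r_\mu(x;\widehat\Delta)]=0$,
$$\mathbb E\big[\hat b_0(x)\big(r_\mu(x;\widehat\Delta)-r_\infty(x;\widehat\Delta)\big)\big]=\mathbb E[\hat b_0(x)\hat b_0(x)']\big(\beta_\infty(\widehat\Delta)-\beta_\mu(\widehat\Delta)\big)=o_{\mathbb P}(J^{-p-1}).$$
By Lemma SA-2.3, $\lambda_{\min}(\mathbb E[\hat b_0(x_i)\hat b_0(x_i)'])\gtrsim_{\mathbb P}1$, and thus $\|\beta_\infty(\widehat\Delta)-\beta_\mu(\widehat\Delta)\|=o_{\mathbb P}(J^{-p-1})$. Then
$$\int\big(\hat b_0^{(v)}(x)'(\beta_\infty(\widehat\Delta)-\beta_\mu(\widehat\Delta))\big)^2\omega(x)\,dx\le\lambda_{\max}\Big(\int\hat b_0^{(v)}(x)\hat b_0^{(v)}(x)'\omega(x)\,dx\Big)\|\beta_\infty(\widehat\Delta)-\beta_\mu(\widehat\Delta)\|^2=o_{\mathbb P}(J^{-2p-2+2v}).$$
Therefore, the leading term in the integrated squared bias can be represented by the $L_\infty$ approximation error:
$$\int B_2(x)^2\omega(x)\,dx=\int\big(\mu^{(v)}(x)-\hat b_0^{(v)}(x)'\beta_\infty(\widehat\Delta)\big)^2\omega(x)\,dx+o_{\mathbb P}(J^{-2p-2+2v}).$$
Finally, using Lemma SA-2.1, a change of variables, and the definition of the Riemann integral, we conclude that
$$\int\big(\mathbb E[\hat\mu^{(v)}(x)|X,W]-\mu^{(v)}(x)\big)^2\omega(x)\,dx=J^{-2(p+1-v)}\mathscr B(p,0,v)+o_{\mathbb P}(J^{-2p-2+2v}),$$
where
$$\mathscr B(p,0,v)=\frac{\int_0^1B_{p+1-v}(z)^2\,dz}{\big((p+1-v)!\big)^2}\int\frac{\mu^{(p+1)}(x)^2\omega(x)}{f(x)^{2p+2-2v}}\,dx.$$
Then the proof is complete.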
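
Two Riemann-integral constants drive the leading bias, with $m=p+1-v$: $|E_{2m}|/(2m)!$ in $M_1$ (case $s=p$) and $\int_0^1B_m(z)^2dz/(m!)^2$ in $\mathscr B(p,0,v)$ (case $s=0$). Both can be computed exactly. The sketch below uses plain Python with `fractions`, and all helper names are hypothetical. It builds Bernoulli numbers and polynomials and uses the normalization "$\binom{2m}{m}B_m$ is the shifted Legendre polynomial" stated above. It also checks two facts: the identity $\int_0^1E_m(z)^2dz/(m!)^2=|E_{2m}|/(2m)!$, and the orthogonality of shifted Legendre polynomials to lower-degree polynomials, which underlies the approximate orthogonality of $r_\infty$.

```python
from fractions import Fraction
from math import comb, factorial


def poly_mul(a, b):
    """Product of two polynomials stored as coefficient lists (index = power)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def integrate01(a):
    """Exact integral over [0, 1] of the polynomial with coefficient list a."""
    return sum((Fraction(c) / (k + 1) for k, c in enumerate(a)), Fraction(0))


def bernoulli_numbers(N):
    """B_0, ..., B_N from sum_{k=0}^{m} C(m+1, k) B_k = 0 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def bernoulli_poly(m, B):
    """Coefficients of the Bernoulli polynomial E_m(x) = sum_k C(m, k) B_k x^(m-k)."""
    coef = [Fraction(0)] * (m + 1)
    for k in range(m + 1):
        coef[m - k] += comb(m, k) * B[k]
    return coef


def shifted_legendre(m):
    """Coefficients of the m-th shifted Legendre polynomial on [0, 1]."""
    return [Fraction((-1) ** (m + k) * comb(m, k) * comb(m + k, k)) for k in range(m + 1)]


def bias_constants(p, v):
    """(constant in M_1 for s = p, constant in B(p, 0, v) for s = 0), with m = p + 1 - v."""
    m = p + 1 - v
    B = bernoulli_numbers(2 * m)
    spline_const = abs(B[2 * m]) / factorial(2 * m)
    B_m = [c / comb(2 * m, m) for c in shifted_legendre(m)]  # C(2m, m) B_m = shifted Legendre
    binscatter_const = integrate01(poly_mul(B_m, B_m)) / factorial(m) ** 2
    return spline_const, binscatter_const


# Exact checks of the identities used above.
Bn = bernoulli_numbers(12)
for m in range(1, 6):
    E_m = bernoulli_poly(m, Bn)
    assert integrate01(poly_mul(E_m, E_m)) / factorial(m) ** 2 == abs(Bn[2 * m]) / factorial(2 * m)
    P = shifted_legendre(m)
    assert P[-1] == comb(2 * m, m)  # leading coefficient C(2m, m)
    assert all(integrate01(poly_mul(P, [Fraction(0)] * j + [Fraction(1)])) == 0 for j in range(m))

for p in range(3):
    print(p, bias_constants(p, 0))
```

For example, with $p=2$ and $v=0$ the two constants are $1/30240$ and $1/100800$.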

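SA-5.13 Proof of Theorem SA-3.2

Proof. In view of Lemmas SA-2.5 to SA-2.8, we first show that
$$\bar\Omega(x)^{-1/2}\hat b_s^{(v)}(x)'\widehat Q^{-1}\mathbb G_n[\hat b_s(x_i)\varepsilon_i]=:\mathbb G_n[a_i\varepsilon_i]$$
is asymptotically normal. Conditional on $X$, $\{a_i\varepsilon_i\}$ is a mean-zero independent sequence over $i$, and $\mathbb G_n[a_i\varepsilon_i]$ has variance equal to 1. Then, by the Berry-Esseen inequality,
$$\sup_{u\in\mathbb R}\big|\mathbb P(\mathbb G_n[a_i\varepsilon_i]\le u|X)-\Phi(u)\big|\lesssim\min\Big(1,\frac{\sum_{i=1}^n\mathbb E[|a_i\varepsilon_i|^3|X]}{n^{3/2}}\Big).$$
Now, using Lemmas SA-2.3 and SA-2.4,
$$\begin{aligned}\frac1{n^{3/2}}\sum_{i=1}^n\mathbb E[|a_i\varepsilon_i|^3|X]&\lesssim\frac{\bar\Omega(x)^{-3/2}}{n^{3/2}}\sum_{i=1}^n\mathbb E\big[|\hat b_s^{(v)}(x)'\widehat Q^{-1}\hat b_s(x_i)\varepsilon_i|^3\,\big|\,X\big]\\&\lesssim\frac{\bar\Omega(x)^{-3/2}}{n^{3/2}}\sum_{i=1}^n|\hat b_s^{(v)}(x)'\widehat Q^{-1}\hat b_s(x_i)|^3\\&\le\frac{\bar\Omega(x)^{-3/2}}{n^{1/2}}\sup_z|\hat b_s^{(v)}(x)'\widehat Q^{-1}\hat b_s(z)|\,\frac1n\sum_{i=1}^n|\hat b_s^{(v)}(x)'\widehat Q^{-1}\hat b_s(x_i)|^2\\&\lesssim_{\mathbb P}\frac1{\sqrt n}\,\frac{J^{1+v}J^{1+2v}}{J^{3/2+3v}}=\sqrt{\frac Jn}\to0,\end{aligned}$$
since $J/n=o(1)$. By Lemma SA-2.10, the above weak convergence still holds if $\bar\Omega(x)$ is replaced by $\widehat\Omega(x)$. Now the desired result follows by Lemmas SA-2.5, SA-2.6 and SA-2.8.

SA-5.14 Proof of Corollary SA-3.2

Proof. Note that for a given $p$, $J_{\mathtt{IMSE}}\asymp n^{1/(2p+3)}$ by Theorem SA-3.1. Then, for the $(p+q)$th-order binscatter estimator, $nJ_{\mathtt{IMSE}}^{-2p-2q-3}=o(1)$ and $J_{\mathtt{IMSE}}^2\log^2J_{\mathtt{IMSE}}/n=o(1)$. Hence the conclusion of Theorem SA-3.2 holds for the $(p+q)$th-order binscatter estimator, and the result immediately follows.

SA-5.15 Proof of Theorem SA-3.3

Proof. The proof is divided into several steps.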
