Inference under shape restrictions

Joachim Freyberger, Brandon Reeves

July 3, 2017

Abstract

We propose a uniformly valid inference method for an unknown function or parameter vector satisfying certain shape restrictions. The method applies very generally, namely to a wide range of finite dimensional and nonparametric problems, such as regressions or instrumental variable estimation, to both kernel and series estimators, and to many different shape restrictions. A major application of our inference method is to construct uniform confidence bands for an unknown function of interest. Our confidence bands are asymptotically equivalent to standard unrestricted confidence bands if the true function strictly satisfies all shape restrictions, but they can be much smaller if some of the shape restrictions are binding or close to binding. We illustrate these sizable width gains as well as the wide applicability of our method in Monte Carlo simulations and in an empirical application.

Keywords: Shape restrictions, inference, nonparametric, uniform confidence bands.

We thank Richard Blundell, Ivan Canay, Bruce Hansen, Joel Horowitz, Philipp Ketz, Matt Masten, Francesca Molinari, Taisuke Otsu, Jack Porter, Azeem Shaikh, Xiaoxia Shi, Alex Torgovitsky, Daniel Wilhelm, and seminar participants at UW Madison, UCL, LSE, Boston College, Northwestern University, and Humboldt University for helpful comments and discussions. We also thank Richard Blundell, Joel Horowitz, and Matthias Parey for sharing their data.

Joachim Freyberger: Department of Economics, University of Wisconsin - Madison. Email: jfreyberger@ssc.wisc.edu.
Brandon Reeves: Department of Economics, University of Wisconsin - Madison. Email: breeves@wisc.edu.

1 Introduction

Researchers can often use either parametric or nonparametric methods to estimate the parameters of a model. Parametric estimators have favorable properties, such as good finite sample precision and fast rates of convergence, and it is usually straightforward to use them for inference. However, parametric models are often misspecified. Specifically, economic theory rarely implies a particular functional form, such as a linear or quadratic demand function, and conclusions drawn from an incorrect parametric model can be misleading. Nonparametric methods, on the other hand, do not impose strong functional form assumptions, but as a consequence, confidence intervals obtained from them are often much wider.

In this paper we explore shape restrictions in order to restrict the class of functions without imposing arbitrary parametric assumptions. Shape restrictions are often reasonable assumptions, such as assuming that the return to education is positive, and they can be implied by economic theory. For example, demand functions are generally monotonically decreasing in prices, cost functions are monotone increasing, homogeneous of degree 1, and concave in input prices, Engel curves of normal goods are monotonically increasing, and utility functions of risk averse agents are concave. There is a long history of estimation under shape restrictions in econometrics and statistics, going back to Hildreth (1954) and Brunk (1955), and obtaining shape restricted estimators is simple in many settings. Moreover, shape restricted estimators can have much better finite sample properties, such as lower mean squared errors, compared to unrestricted estimators. One would therefore hope that the improved finite sample precision translates to smaller confidence sets.

Using shape restrictions for inference is much more complicated than simply obtaining a restricted estimator. The main reason is that the distribution of the restricted estimator depends on where the shape restrictions bind, which is unknown a priori.
In this paper we propose a uniformly valid inference method, which incorporates shape restrictions and can be used to test hypotheses about an unknown function or parameter vector. The method applies very generally, namely to a wide range of finite dimensional and nonparametric problems, such as regressions or instrumental variable estimation, to both kernel and series estimators, and to many different shape restrictions. One major application of our inference method is to construct uniform confidence bands for a function. Such a band consists of a lower bound function and an upper bound function such that the true function is between them with at least a pre-specified probability.

Our confidence bands have desirable properties. In particular, they are asymptotically equivalent to standard unrestricted confidence bands if the true function strictly satisfies all shape restrictions (e.g. if the true function is strictly increasing but the shape restriction is that it is weakly increasing). However, if for the true function some of the shape restrictions are binding or close to binding,[1] the confidence bands are generally much smaller. The decrease in the width reflects the increased precision of the constrained estimator. Moreover, the bands always include the shape restricted estimator of the function and are therefore never empty. Finally, the proposed method provides uniformly valid inference over a large class of distributions, which in particular implies that the confidence bands do not suffer from under-coverage if some of the shape restrictions are close to binding. These cases are empirically relevant. For example, demand functions are likely to be strictly decreasing, but nonparametric estimates are often not monotone, suggesting that the demand function is close to constant for some prices.

To the best of our knowledge, even in a regression model under monotonicity, there are no existing uniform confidence bands that are never empty, uniformly valid, and yield width reductions when the shape restrictions are binding or close to binding. Furthermore, our method applies very generally. For example, our paper is the first to provide such inference results for the nonparametric instrumental variables (NPIV) model under general shape constraints.

Similar to many other inference problems in nonstandard settings, instead of trying to obtain confidence sets directly from the asymptotic distribution of the estimator, our inference procedure is based on test inversion.[2] This means that we start by testing the null hypothesis that the true parameter vector $\theta_0$ is equal to some fixed value $\theta$. In series estimation $\theta_0$ represents the coefficients in the series approximation of a function, and $\theta_0$ can therefore grow in dimension as the sample size increases. The major advantage of the test inversion approach is that under the null hypothesis we know exactly which of the shape restrictions are binding or close to binding.
Therefore, under the null hypothesis, we can approximate the distribution of the estimator in large samples and we can decide whether or not we reject the null hypothesis. We can then collect all values for which we do not reject, which form a confidence set for $\theta_0$. To obtain uniform confidence bands, or confidence sets for other functions of $\theta_0$, we project onto the confidence set for $\theta_0$ (see Section 2 for a simple illustration). We choose the test statistic in such a way that our confidence bands are asymptotically equivalent to standard unrestricted confidence bands if $\theta_0$ is sufficiently in the interior of the parameter space. Thus, in this case, the confidence bands have the right coverage asymptotically. If some of the shape restrictions are binding or close to binding, our inference procedure will generally be conservative due to the projection. However, in these cases we also obtain very sizable width gains compared to a standard unrestricted band. Furthermore, due to test inversion and projections, our inference method can be computationally demanding. We give a sense of the computational costs in Section 6. We also briefly describe recent computational advances, which might help to mitigate these costs.

[1] Analogously to many other papers, closeness to the boundary is relative to the sample size.

[2] Other nonstandard inference settings include autoregressive models (e.g. Mikusheva 2007), weak identification (e.g. Andrews and Cheng 2012), and partial identification (e.g. Andrews and Soares 2010).

In Monte Carlo simulations we construct uniform confidence bands in a series regression framework as well as in the NPIV model under a monotonicity constraint. In the NPIV model the gains of using shape restrictions are generally much higher. For example, we show that with a fourth order polynomial approximation of the true function, the average width gains can be up to 73%, depending on the slope of the true function. We also provide an empirical application, where we estimate demand functions for gasoline, subject to the functions being weakly decreasing. The width gains from using shape restrictions are between 25% and 45% in this setting.

We now explain how our paper fits into the related literature. There is a vast literature on estimation under shape restrictions going back to Hildreth (1954) and Brunk (1955), who suggest estimators under concavity and monotonicity restrictions, respectively. Other related work includes, among many others, Mukerjee (1988), Dierckx (1980), Ramsay (1988), Mammen (1991a), Mammen (1991b), Mammen and Thomas-Agnan (1999), Hall and Huang (2001), Haag, Hoderlein, and Pendakur (2009), Du, Parmeter, and Racine (2013), and Wang and Shen (2013). See also Delecroix and Thomas-Agnan (2000) and Henderson and Parmeter (2009) for additional references. Many of the early papers focus on implementation issues and subsequent papers discuss rates of convergence of shape restricted estimators. Many inference results, such as those by Mammen (1991b), Groeneboom, Jongbloed, and Wellner (2001), Dette, Neumeyer, and Pilz (2006), Birke and Dette (2007), and Pal and Woodroofe (2007), are for points of the function where the shape restrictions do not bind.
It is also well known that a shape restricted estimator has a nonstandard distribution if the shape restrictions bind; see for example Wright (1981) and Geyer (1994). Freyberger and Horowitz (2015) provide inference methods in a partially identified NPIV model under shape restrictions with discrete regressors and instruments. Empirical applications include Matzkin (1994), Lewbel (1995), Ait-Sahalia and Duarte (2003), and Blundell, Horowitz, and Parey (2012, 2017). There is also an interesting literature on risk bounds (e.g. Zhang 2002, Chatterjee, Guntuboyina, and Sen 2015, Chetverikov and Wilhelm 2017) showing, among other things, that a restricted estimator can have a faster rate of convergence than an unrestricted estimator when the true function is close to the boundary. Specifically, the results in Chetverikov and Wilhelm (2017) imply that a monotone estimator in the NPIV setting does not suffer from a slow rate of convergence due to the ill-posed inverse problem if the true function is close to constant. There is also a large, less related literature on testing shape restrictions.

Using existing methods, uniform confidence bands under shape restrictions can be obtained in three distinct ways. First, one could obtain a standard unrestricted confidence band and intersect it with all functions which satisfy the shape restrictions (see for example Dümbgen 1998, 2003). A drawback of the resulting bands is that they can be empty with positive probability and hence, they do not satisfy the reasonableness property of Müller and Norets (2016). Furthermore, in our simulations, our bands are on average much narrower than such monotonized bands. The second possibility is to use the rearrangement approach of Chernozhukov, Fernandez-Val, and Galichon (2009), which works with monotonicity restrictions and is very easy to implement. However, the average width does not change by rearranging the band. Finally, one could use a two step procedure recently suggested by Horowitz and Lee (2017) in a kernel regression framework with very general constraints. In the first step, they estimate the points where the shape restrictions bind. In the second step, they estimate the function under equality constraints and hence, they obtain an asymptotically normally distributed estimator, which they can use to obtain uniform confidence bands. While their approach is computationally much simpler than ours, their main result leads to bands which can suffer from under-coverage if some of the shape restrictions are close to binding. They also suggest using a bias correction term to improve the finite sample coverage probability, but they do not provide any theoretical results for this method.
Chernozhukov, Newey, and Santos (2015) develop a general testing procedure, which allows, among other things, testing shape restrictions and obtaining confidence regions for functionals under shape restrictions. Even though there is some overlap in the settings where both methods apply, the technical arguments are very different. Their method is also based on test inversion and it is robust to partial identification, but it is restricted to conditional moment models and series estimators. We allow for a general setup and general estimators, but we assume point identification. Since our focus is on testing a growing parameter vector, we are able to obtain uniform confidence bands in addition to confidence sets for functionals. However, if the main object of interest is a single functional, their approach might be computationally simpler because they test fixed values of the functional directly, rather than projecting onto a confidence set for the entire parameter vector.

Finally, our paper builds on previous work on inference in nonstandard problems, most importantly the papers of Andrews (1999, 2001) on estimation and testing when a parameter is on the boundary of the parameter space. The main difference between our paper and Andrews' work is that we allow testing for a growing parameter vector while Andrews considers a vector of a fixed dimension. Moreover, we show that our inference method is uniformly valid when the parameters can be either at the boundary, close to the boundary, or away from the boundary. We also use different test statistics because we invert them to obtain confidence bands. Thus, while the general approach is similar, the details of the arguments are very different. Ketz (2017) has a setup similar to that of Andrews but allows for certain parameter sequences that are close to the boundary under non-negativity constraints.

Outline: The remainder of the paper is organized as follows. We start by illustrating the most important features of our inference approach in a very simple example. Section 3 discusses a general setting, including high level assumptions for uniformly valid inference. Sections 4 and 5 provide low level conditions in a regression framework (for both series and kernel estimation) and in the NPIV model, respectively. The remaining sections contain Monte Carlo simulations, the empirical application, and a conclusion. Proofs of the results from Sections 4 and 5, computational details, and additional simulation results are in a supplementary appendix with section numbers S.1, S.2, etc.

Notation: For any matrix $A$, $\|A\|$ denotes the Frobenius norm. For any square matrix $A$, $\|A\|_S = \sup_{\|x\|=1} \|Ax\|$ denotes the spectral norm. For a positive semi-definite matrix $\Omega$ and a vector $a$ let $\|a\|_\Omega = \sqrt{a'\Omega a}$. Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and the largest eigenvalue of a symmetric square matrix $A$. For a sequence of random variables $X_n$ and a class of distributions $\mathcal{P}$ we say that $X_n = o_p(\varepsilon_n)$ uniformly over $P \in \mathcal{P}$ if $\sup_{P \in \mathcal{P}} P(|X_n| \ge \delta \varepsilon_n) \to 0$ for any $\delta > 0$. We say that $X_n = O_p(\varepsilon_n)$ uniformly over $P \in \mathcal{P}$ if for any $\delta > 0$ there are $M_\delta$ and $N_\delta$ such that $\sup_{P \in \mathcal{P}} P(|X_n| \ge M_\delta \varepsilon_n) \le \delta$ for all $n \ge N_\delta$.
2 Illustrative example

We now illustrate the main features of our method in a very simple example. We then explain how these ideas can easily be generalized before introducing the general setup in Section 3. Suppose that $X \sim N(\theta_0, I_{2 \times 2})$ and that we observe a random sample $\{X_i\}_{i=1}^n$ of $X$. Denote the sample average by $\bar{X}$. We are interested in estimating $\theta_0$ under the assumption that $\theta_{0,1} \le \theta_{0,2}$. An unrestricted estimator of $\theta_0$, denoted by $\hat\theta_{ur}$, is

$$\hat\theta_{ur} = \arg\min_{\theta \in \mathbb{R}^2} \; (\theta_1 - \bar{X}_1)^2 + (\theta_2 - \bar{X}_2)^2.$$

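Hence $\hat\theta_{ur} = \bar{X}$. Analogously, a restricted estimator is

$$\hat\theta_r = \arg\min_{\theta \in \mathbb{R}^2 :\, \theta_1 \le \theta_2} \; (\theta_1 - \bar{X}_1)^2 + (\theta_2 - \bar{X}_2)^2 = \arg\min_{\theta \in \mathbb{R}^2 :\, \theta_1 - \theta_2 \le 0} \|\theta - \hat\theta_{ur}\|^2 = \arg\min_{\theta \in \mathbb{R}^2 :\, \theta_1 - \theta_2 \le 0} \|\sqrt{n}(\theta - \theta_0) - \sqrt{n}(\hat\theta_{ur} - \theta_0)\|^2.$$

Let $\lambda = \sqrt{n}(\theta - \theta_0)$. From a change of variables it then follows that

$$\sqrt{n}(\hat\theta_r - \theta_0) = \arg\min_{\lambda \in \mathbb{R}^2 :\, \lambda_1 - \lambda_2 \le \sqrt{n}(\theta_{0,2} - \theta_{0,1})} \|\lambda - \sqrt{n}(\hat\theta_{ur} - \theta_0)\|^2.$$

Let $Z \sim N(0, I_{2 \times 2})$. Since $\sqrt{n}(\hat\theta_{ur} - \theta_0) \sim N(0, I_{2 \times 2})$ we get

$$\sqrt{n}(\hat\theta_r - \theta_0) \overset{d}{=} \arg\min_{\lambda \in \mathbb{R}^2 :\, \lambda_1 - \lambda_2 \le \sqrt{n}(\theta_{0,2} - \theta_{0,1})} \|\lambda - Z\|^2,$$

where $\overset{d}{=}$ means that the random variables on the left and right side have the same distribution. Notice that while the distribution of $\sqrt{n}(\hat\theta_{ur} - \theta_0)$ does not depend on $\theta_0$ and $n$, the distribution of $\sqrt{n}(\hat\theta_r - \theta_0)$ depends on $\sqrt{n}(\theta_{0,2} - \theta_{0,1})$, which measures how close $\theta_0$ is to the boundary of the parameter space relative to $n$. We denote a random variable which has the same distribution as $\sqrt{n}(\hat\theta_r - \theta_0)$ by $Z_n(\theta_0)$. As an example, suppose that $\theta_{0,1} = \theta_{0,2}$. Then $Z_n(\theta_0)$ is the projection of $Z$ on the set $\{z \in \mathbb{R}^2 : z_1 \le z_2\}$.

A 95% confidence region for $\theta_0$ using the unrestricted estimator can be constructed by finding the constant $c_{ur}$ such that

$$P(\max\{|Z_1|, |Z_2|\} \le c_{ur}) = 0.95.$$

It then follows immediately that

$$P\left(\hat\theta_{ur,1} - \frac{c_{ur}}{\sqrt{n}} \le \theta_{0,1} \le \hat\theta_{ur,1} + \frac{c_{ur}}{\sqrt{n}} \text{ and } \hat\theta_{ur,2} - \frac{c_{ur}}{\sqrt{n}} \le \theta_{0,2} \le \hat\theta_{ur,2} + \frac{c_{ur}}{\sqrt{n}}\right) = 0.95.$$

Thus

$$CI_{ur} = \left\{\theta \in \mathbb{R}^2 : \hat\theta_{ur,1} - \frac{c_{ur}}{\sqrt{n}} \le \theta_1 \le \hat\theta_{ur,1} + \frac{c_{ur}}{\sqrt{n}} \text{ and } \hat\theta_{ur,2} - \frac{c_{ur}}{\sqrt{n}} \le \theta_2 \le \hat\theta_{ur,2} + \frac{c_{ur}}{\sqrt{n}}\right\}$$

is a 95% confidence set for $\theta_0$. While there are many different 95% confidence regions for $\theta_0$, rectangular regions are particularly easy to report, especially in larger dimensions, because one only has to report the extreme points of each coordinate.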
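As a small numerical sketch of the unrestricted critical value: because $Z_1$ and $Z_2$ are independent standard normals, $P(\max\{|Z_1|, |Z_2|\} \le c) = (2\Phi(c) - 1)^2$, so $c_{ur}$ can be obtained from the standard normal quantile function. The variable names and the choice $n = 100$ below are illustrative assumptions, not part of the paper.

```python
from statistics import NormalDist

level = 0.95
std_normal = NormalDist()

# Solve (2 * Phi(c) - 1)^2 = level for c.
c_ur = std_normal.inv_cdf((1 + level ** 0.5) / 2)

# Half-width of each side of the square CI_ur for an illustrative n.
n = 100
half_width = c_ur / n ** 0.5

# Coverage implied by c_ur, recomputed from the closed form as a sanity check.
coverage = (2 * std_normal.cdf(c_ur) - 1) ** 2
print(round(c_ur, 3), round(half_width, 4), round(coverage, 4))
```

The resulting critical value is somewhat larger than the familiar 1.96 because the band has to cover both coordinates simultaneously.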

Similarly, now looking at the restricted estimator, for each $\theta \in \mathbb{R}^2$ let $c_{r,n}(\theta)$ be such that

$$P(\max\{|Z_{n,1}(\theta)|, |Z_{n,2}(\theta)|\} \le c_{r,n}(\theta)) = 0.95$$

and define $CI_r$ as

$$CI_r = \left\{\theta \in \mathbb{R}^2 : \theta_1 \le \theta_2, \; \hat\theta_{r,1} - \frac{c_{r,n}(\theta)}{\sqrt{n}} \le \theta_1 \le \hat\theta_{r,1} + \frac{c_{r,n}(\theta)}{\sqrt{n}}, \; \hat\theta_{r,2} - \frac{c_{r,n}(\theta)}{\sqrt{n}} \le \theta_2 \le \hat\theta_{r,2} + \frac{c_{r,n}(\theta)}{\sqrt{n}}\right\}.$$

Again, by construction $P(\theta_0 \in CI_r) = 0.95$.

[Figure 1: Scatter plots of samples illustrating the relation between critical values. Four panels: unrestricted; $\sqrt{n}(\theta_{0,2} - \theta_{0,1}) = 0$; $\sqrt{n}(\theta_{0,2} - \theta_{0,1}) = 1$; $\sqrt{n}(\theta_{0,2} - \theta_{0,1}) = 5$. Horizontal axis: $Z_1$; vertical axis: $Z_2$.]

Figure 1 illustrates the relation between $c_{ur}$ and $c_{r,n}(\theta)$. The first panel shows a random sample of $Z$. The dashed square contains all $z \in \mathbb{R}^2$ such that $\max\{|z_1|, |z_2|\} \le c_{ur}$. The second panel displays the corresponding random sample of $Z_n(\theta_0)$ when $\sqrt{n}(\theta_{0,2} - \theta_{0,1}) = 0$, which is simply the projection of $Z$ on the set $\{z \in \mathbb{R}^2 : z_1 \le z_2\}$. In particular, for each realization $z$ we have $z_n(\theta_0) = z$ if $z_1 \le z_2$ and $z_n(\theta_0) = 0.5\,(z_1 + z_2, z_1 + z_2)'$ if $z_1 > z_2$. Therefore, if $\max\{|z_1|, |z_2|\} \le c_{ur}$, then also $\max\{|z_{n,1}(\theta_0)|, |z_{n,2}(\theta_0)|\} \le c_{ur}$, which immediately implies that $c_{r,n}(\theta_0) \le c_{ur}$. The solid square contains all $z \in \mathbb{R}^2$ such that $\max\{|z_1|, |z_2|\} \le c_{r,n}(\theta_0)$, which is strictly inside the dashed square. The third and fourth panels show similar situations with $\sqrt{n}(\theta_{0,2} - \theta_{0,1}) = 1$ and $\sqrt{n}(\theta_{0,2} - \theta_{0,1}) = 5$, respectively. As $\sqrt{n}(\theta_{0,2} - \theta_{0,1})$ increases, the percentage projected on the solid line decreases and therefore $c_{r,n}(\theta_0)$ gets closer to $c_{ur}$. Moreover, once $\sqrt{n}(\theta_{0,2} - \theta_{0,1})$ is large enough, $c_{r,n}(\theta_0) = c_{ur}$.

[Figure 2: Confidence regions. Four panels: $\hat\theta_{ur} = (0, 0)'$ and $\hat\theta_r = (0, 0)'$; $\hat\theta_{ur} = (0, 0.1)'$ and $\hat\theta_r = (0, 0.1)'$; $\hat\theta_{ur} = (0, 0.3)'$ and $\hat\theta_r = (0, 0.3)'$; $\hat\theta_{ur} = (0.1, -0.1)'$ and $\hat\theta_r = (0, 0)'$. Horizontal axis: $\theta_1$; vertical axis: $\theta_2$.]

Figure 2 shows the resulting confidence regions for $\theta_0$ when $n = 100$, conditional on specific realizations of $\hat\theta_{ur}$ and $\hat\theta_r$. The confidence sets depend on these realizations, but given $\hat\theta_{ur}$ and $\hat\theta_r$, they do not depend on $\theta_0$. The dashed red square is $CI_{ur}$ and the solid blue lines are the boundary of $CI_r$. In the first panel $\hat\theta_{ur} = \hat\theta_r = (0, 0)'$. Since $\hat\theta_{ur} = \hat\theta_r$ and $c_{r,n}(\theta) \le c_{ur}$ for all $\theta \in \mathbb{R}^2$, it holds that $CI_r \subset CI_{ur}$. Also notice that since $c_{r,n}(\theta)$ depends on $\theta$, $CI_r$ is not a triangle, as opposed to the set $CI_{ur} \cap \{\theta \in \mathbb{R}^2 : \theta_1 \le \theta_2\}$. The second and the third panel display similar situations with $\hat\theta_{ur} = \hat\theta_r = (0, 0.1)'$ and $\hat\theta_{ur} = \hat\theta_r = (0, 0.3)'$, respectively. In both cases, $CI_r \subset CI_{ur}$. It also follows from the previous discussion that if $\hat\theta_{ur} = \hat\theta_r$ and if $\hat\theta_{ur,2} - \hat\theta_{ur,1}$ is large enough, then $CI_{ur} = CI_r$. Consequently, for any fixed $\theta_0$ with $\theta_{0,1} < \theta_{0,2}$, it holds that $P(CI_r = CI_{ur}) \to 1$. However, this equivalence does not hold if $\theta_0$ is at the boundary or close to the boundary. Furthermore, it then holds with positive probability that $CI_{ur} \cap \{\theta \in \mathbb{R}^2 : \theta_1 \le \theta_2\} = \emptyset$, while $CI_r$ always contains $\hat\theta_r$. The fourth panel illustrates that if $\hat\theta_{ur} \ne \hat\theta_r$, then $CI_r$ is not a subset of $CI_{ur}$.

The set $CI_r$ is an exact 95% confidence set for $\theta_0$, but it cannot simply be characterized by its extreme points and it can be hard to report with more than two dimensions. Nevertheless, we can use it to construct a rectangular confidence set. To do so, for $j = 1, 2$ define

$$\hat\theta^L_{r,j} = \min_{\theta \in CI_r} \theta_j \quad \text{and} \quad \hat\theta^U_{r,j} = \max_{\theta \in CI_r} \theta_j$$

and

$$\overline{CI}_r = \left\{\theta \in \mathbb{R}^2 : \theta_1 \le \theta_2 \text{ and } \hat\theta^L_{r,1} \le \theta_1 \le \hat\theta^U_{r,1} \text{ and } \hat\theta^L_{r,2} \le \theta_2 \le \hat\theta^U_{r,2}\right\}.$$

It then holds by construction that $CI_r \subseteq \overline{CI}_r$ and therefore $P(\theta_0 \in \overline{CI}_r) \ge 0.95$. Moreover, just as before, if $\hat\theta_{ur} = \hat\theta_r$, then $\overline{CI}_r \subset CI_{ur}$. If for example $\hat\theta_{ur} = \hat\theta_r = (0, 0)'$, then $\hat\theta^U_{r,2} = -\hat\theta^L_{r,1} = c_{ur}/\sqrt{n}$ but $\hat\theta^U_{r,1} = -\hat\theta^L_{r,2} < c_{ur}/\sqrt{n}$, which can be seen from the first panel of Figure 2. Hence, relative to the confidence set from the unrestricted estimator, we obtain width gains for the upper end of the first dimension and the lower end of the second dimension. The width gains decrease as $\hat\theta_{ur}$ moves away from the boundary into the interior of $\Theta_R$. Moreover, for any $\hat\theta_{ur}$ and $\hat\theta_r$ and $j = 1, 2$ we get $\hat\theta^U_{r,j} - \hat\theta^L_{r,j} \le 2c_{ur}/\sqrt{n}$. Thus, the sides of the square $\{\theta \in \mathbb{R}^2 : \hat\theta^L_{r,1} \le \theta_1 \le \hat\theta^U_{r,1} \text{ and } \hat\theta^L_{r,2} \le \theta_2 \le \hat\theta^U_{r,2}\}$ are never longer than the sides of the square $CI_{ur}$.
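The construction in this example can be sketched by simulation. The code below is only an illustration under stated assumptions: the critical values $c_{ur}$ and $c_{r,n}(\theta)$ are approximated by empirical quantiles from a fixed set of simulated draws, the test is inverted on a coarse grid rather than exactly, and the function names (`project`, `critical_value`), the grid, the seed, and the number of draws are all hypothetical choices. It conditions on $\hat\theta_r = (0, 0)'$ and $n = 100$, as in the first panel of Figure 2.

```python
import math
import random

def project(z1, z2, gap):
    """Euclidean projection of (z1, z2) onto {(l1, l2): l1 - l2 <= gap}."""
    excess = z1 - z2 - gap
    if excess <= 0:
        return z1, z2
    return z1 - excess / 2, z2 + excess / 2

def critical_value(gap, draws, level=0.95):
    """Empirical `level` quantile of max(|Z_{n,1}|, |Z_{n,2}|) for a given gap."""
    stats = sorted(max(abs(l1), abs(l2))
                   for l1, l2 in (project(z1, z2, gap) for z1, z2 in draws))
    return stats[math.ceil(level * len(stats)) - 1]

rng = random.Random(0)
draws = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10000)]

# gap = sqrt(n) * (theta_2 - theta_1); an infinite gap means nothing is projected.
c_ur = critical_value(math.inf, draws)
c_r = {gap: critical_value(gap, draws) for gap in (0.0, 1.0, 5.0)}

# Test inversion on a grid, conditional on theta_hat_r = (0, 0) and n = 100.
n, theta_hat_r, step = 100, (0.0, 0.0), 0.01
grid = [k * step for k in range(-30, 31)]
cache = {}      # critical values depend on theta only through the gap
accepted = []   # grid points in (an approximation of) CI_r
for i, t1 in enumerate(grid):
    for j, t2 in enumerate(grid):
        if j < i:            # violates theta_1 <= theta_2
            continue
        k = j - i            # gap measured in grid steps
        if k not in cache:
            cache[k] = critical_value(math.sqrt(n) * k * step, draws)
        stat = math.sqrt(n) * max(abs(theta_hat_r[0] - t1),
                                  abs(theta_hat_r[1] - t2))
        if stat <= cache[k]:
            accepted.append((t1, t2))

# Rectangular set: coordinate-wise extreme points of the accepted region.
lower = [min(t[j] for t in accepted) for j in range(2)]
upper = [max(t[j] for t in accepted) for j in range(2)]
print(c_ur, c_r, lower, upper)
```

In this sketch the simulated critical value is weakly increasing in the gap and coincides with the unrestricted one once the gap is large, and the rectangular bounds are tighter for the upper end of the first coordinate and the lower end of the second, in line with the discussion above. A finer grid or more draws would reduce the approximation error at a higher computational cost.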
Finally, if $\hat\theta_{ur}$ is sufficiently in the interior of $\Theta_R$, then $\overline{CI}_r = CI_{ur}$, which is an important feature of our inference method. We get this equivalence in the interior of $\Theta_R$ because we invert a test based on a particular type of test statistic, namely $\max\{|Z_1|, |Z_2|\}$. If we started out with a different test statistic, such as $Z_1^2 + Z_2^2$, we would not obtain $\overline{CI}_r = CI_{ur}$ in the interior of $\Theta_R$. We return to this result more generally in Section 3.2 and discuss possible alternative ways of constructing confidence regions in Section 8.

This method of constructing confidence sets is easy to generalize. As a first step, let $\Theta_R$ be a restricted parameter space and let $Q_n(\theta)$ be a population objective function. Suppose that the unrestricted estimator $\hat\theta_{ur}$ minimizes $Q_n(\theta)$. Also suppose that $Q_n(\theta)$ is a quadratic function in $\theta$, which implies that $\nabla^2 Q_n(\theta)$ does not depend on $\theta$. Then with $\hat\Omega = \nabla^2 Q_n(\theta)$ we get

$$Q_n(\theta) = Q_n(\hat\theta_{ur}) + \nabla Q_n(\hat\theta_{ur})'(\theta - \hat\theta_{ur}) + \frac{1}{2}(\theta - \hat\theta_{ur})'\hat\Omega(\theta - \hat\theta_{ur})$$

and since $\nabla Q_n(\hat\theta_{ur}) = 0$ it holds that

$$\hat\theta_r = \arg\min_{\theta \in \Theta_R} \|\theta - \hat\theta_{ur}\|^2_{\hat\Omega}.$$

Hence, $\hat\theta_r$ is simply the projection of $\hat\theta_{ur}$ on $\Theta_R$. Thus, just as before, we can use a change of variables and then characterize the distribution of $\sqrt{n}(\hat\theta_r - \theta_0)$ in terms of the distribution of $\sqrt{n}(\hat\theta_{ur} - \theta_0)$ and a local parameter space that depends on $\theta_0$ and $n$.

3 General setup

In this section we discuss a general framework and provide conditions for uniformly valid inference. We start with an informal overview of the inference method and provide the formal assumptions and results in Section 3.1. In Section 3.2 we discuss rectangular confidence regions for general functions of the parameter vector.

Let $\Theta \subseteq \mathbb{R}^{K_n}$ be the parameter space and let $\Theta_R \subseteq \Theta$ be a restricted parameter space. Inference focuses on $\theta_0 \in \Theta_R$. In an example discussed in Section 4.2 we have $\theta_0 = (E(Y \mid X = x_1), \ldots, E(Y \mid X = x_{K_n}))'$, and $K_n$ increases with the sample size. In this case, the confidence regions we obtain are analogous to the ones in the simple example above. For series estimation we take $\theta_0 \in \mathbb{R}^{K_n}$ such that $g_0(x) \approx p_{K_n}(x)'\theta_0$, where $g_0$ is an unknown function of interest and $p_{K_n}(x)$ is a vector of basis functions. A rectangular confidence region for certain functions of $\theta_0$ can then be interpreted as a uniform confidence band for $g_0$; see Section 4.3 for details. Even though $\theta_0$ and $\Theta$ may depend on the sample size, we omit the subscripts for brevity.

As explained in Section 2, in many applications we can obtain a restricted estimator as a projection of an unrestricted estimator on the restricted parameter space. More generally, we assume that there exist $\hat\theta_{ur}$ and $\hat\theta_r$ such that $\hat\theta_r$ is approximately the projection of $\hat\theta_{ur}$ on $\Theta_R$ under some norm $\|\cdot\|_{\hat\Omega}$ (see Assumption 1 below for a formal statement). Moreover, since the rate of convergence may be slower than $1/\sqrt{n}$, let $\kappa_n$ be a sequence of numbers such that $\kappa_n \to \infty$ as $n \to \infty$. Then

$$\hat\theta_r \approx \arg\min_{\theta \in \Theta_R} \|\theta - \hat\theta_{ur}\|^2_{\hat\Omega} = \arg\min_{\theta \in \Theta_R} \|\kappa_n(\theta - \theta_0) - \kappa_n(\hat\theta_{ur} - \theta_0)\|^2_{\hat\Omega}.$$

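Next define

$$\Lambda_n(\theta_0) = \{\lambda \in \mathbb{R}^{K_n} : \lambda = \kappa_n(\theta - \theta_0) \text{ for some } \theta \in \Theta_R\}.$$

Then

$$\kappa_n(\hat\theta_r - \theta_0) \approx \arg\min_{\lambda \in \Lambda_n(\theta_0)} \|\lambda - \kappa_n(\hat\theta_{ur} - \theta_0)\|^2_{\hat\Omega}.$$

We will also assume that $\kappa_n(\hat\theta_{ur} - \theta_0)$ is approximately $N(0, \Sigma)$ distributed (see Assumption 2 for a formal statement) and that we have a consistent estimator of $\Sigma$, denoted by $\hat\Sigma$. Now let $Z \sim N(0, I_{K_n \times K_n})$ be independent of $\hat\Sigma$ and $\hat\Omega$ and define

$$Z_n(\theta, \hat\Sigma, \hat\Omega) = \arg\min_{\lambda \in \Lambda_n(\theta)} \|\lambda - \hat\Sigma^{1/2} Z\|^2_{\hat\Omega}.$$

We will use the distribution of $Z_n(\theta_0, \hat\Sigma, \hat\Omega)$ to approximate the distribution of $\kappa_n(\hat\theta_r - \theta_0)$. This idea is analogous to Andrews (1999, 2001); see for example Theorem 2(e) in Andrews (1999). The main differences are that $\theta_0$ can grow in dimension as $n \to \infty$ and that our local parameter space $\Lambda_n(\theta_0)$ depends on $n$ because we allow $\theta_0$ to be close to the boundary.

Now for $\theta \in \Theta_R$ consider testing $H_0 : \theta_0 = \theta$ based on a test statistic $T$, which depends on $\kappa_n(\hat\theta_r - \theta)$ and $\hat\Sigma$. For example

$$T(\kappa_n(\hat\theta_r - \theta), \hat\Sigma) = \max_{k = 1, \ldots, K_n} \left| \frac{\kappa_n(\hat\theta_{r,k} - \theta_k)}{\sqrt{\hat\Sigma_{kk}}} \right|.$$

We reject $H_0$ if and only if

$$T(\kappa_n(\hat\theta_r - \theta), \hat\Sigma) > c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega),$$

where

$$c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega) = \inf\{c \in \mathbb{R} : P(T(Z_n(\theta, \hat\Sigma, \hat\Omega), \hat\Sigma) \le c \mid \hat\Sigma, \hat\Omega) \ge 1 - \alpha\}.$$

Our $1 - \alpha$ confidence set for $\theta_0$ is then

$$CI = \{\theta \in \Theta_R : T(\kappa_n(\hat\theta_r - \theta), \hat\Sigma) \le c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega)\}.$$

To guarantee that $P(\theta_0 \in CI) \to 1 - \alpha$ uniformly over a class of distributions $\mathcal{P}$ we require

$$\sup_{P \in \mathcal{P}} \left| P\left(T(\kappa_n(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega)\right) - (1 - \alpha) \right| \to 0.$$

Notice that if $\hat\theta_r$ was exactly the projection of $\hat\theta_{ur}$ on $\Theta_R$, if $\kappa_n(\hat\theta_{ur} - \theta_0)$ was exactly $N(0, \Sigma)$ distributed, if $\Sigma$ and $\Omega$ were known, and if $T(Z_n(\theta_0, \Sigma, \Omega), \Sigma)$ was continuously distributed, then by construction

$$P\left(T(\kappa_n(\hat\theta_r - \theta_0), \Sigma) \le c_{1-\alpha,n}(\theta_0, \Sigma, \Omega)\right) = 1 - \alpha,$$

just as in the simple example in Section 2. Therefore, the assumptions below simply guarantee that the various approximation errors are small and that small approximation errors only have a small impact on the distribution of the test statistic.

3.1 Assumptions and main result

Let $\varepsilon_n$ be a sequence of positive numbers with $\varepsilon_n \to 0$. We discuss the role of $\varepsilon_n$ after stating the assumptions. Let $\mathcal{P}$ be a set of distributions satisfying the following assumptions.[3]

Assumption 1. There exists a symmetric, positive semi-definite matrix $\hat\Omega$ such that

$$\kappa_n(\hat\theta_r - \theta_0) = \arg\min_{\lambda \in \Lambda_n(\theta_0)} \|\lambda - \kappa_n(\hat\theta_{ur} - \theta_0)\|^2_{\hat\Omega} + R_n$$

and $\|R_n\| = o_p(\varepsilon_n)$ uniformly over $P \in \mathcal{P}$.

Assumption 2. There exist symmetric, positive definite matrices $\Omega$ and $\Sigma$ and a sequence of random variables $Z_n \sim N(0, \Sigma)$ such that

$$\lambda_{\min}(\Omega)^{-1/2} \|\kappa_n(\hat\theta_{ur} - \theta_0) - Z_n\| = o_p(\varepsilon_n)$$

uniformly over $P \in \mathcal{P}$.

Assumption 3. There exists a constant $C_\lambda > 0$ such that $1/C_\lambda \le \lambda_{\min}(\Sigma) \le C_\lambda$, $1/C_\lambda \le \lambda_{\max}(\Omega) \le C_\lambda$, and

$$\frac{\lambda_{\max}(\Sigma)}{\lambda_{\min}(\Omega)} \|\hat\Sigma - \Sigma\|^2_S = o_p(\varepsilon_n^2 / K_n) \quad \text{and} \quad \frac{\lambda_{\max}(\Sigma)}{\lambda_{\min}(\Omega)^2} \|\hat\Omega - \Omega\|_S = o_p(\varepsilon_n^2 / K_n)$$

uniformly over $P \in \mathcal{P}$.

Assumption 4. $\Theta_R$ is closed and convex and $\theta_0 \in \Theta_R$.

Assumption 5. Let $\Sigma_1$ and $\Sigma_2$ be any symmetric and positive definite matrices such that $1/B \le \lambda_{\min}(\Sigma_1) \le B$ and $1/B \le \lambda_{\min}(\Sigma_2) \le B$ for some constant $B > 0$. There exists a constant $C$, possibly depending on $B$, such that for any $z_1 \in \mathbb{R}^{K_n}$ and $z_2 \in \mathbb{R}^{K_n}$

$$|T(z_1, \Sigma_1) - T(z_2, \Sigma_1)| \le C \|z_1 - z_2\| \quad \text{and} \quad |T(z_1, \Sigma_1) - T(z_1, \Sigma_2)| \le C \|z_1\| \|\Sigma_1 - \Sigma_2\|_S.$$

Assumption 6. There exists $\delta \in (0, \alpha)$ such that for all $\beta \in [\alpha - \delta, \alpha + \delta]$

$$\sup_{P \in \mathcal{P}} \left| P\left(T(Z_n(\theta_0, \Sigma, \Omega), \Sigma) \le c_{1-\beta,n}(\theta_0, \Sigma, \Omega) - \varepsilon_n\right) - (1 - \beta) \right| \to 0$$

and

$$\sup_{P \in \mathcal{P}} \left| P\left(T(Z_n(\theta_0, \Sigma, \Omega), \Sigma) \le c_{1-\beta,n}(\theta_0, \Sigma, \Omega) + \varepsilon_n\right) - (1 - \beta) \right| \to 0.$$

[3] Even though $\theta_0$ depends on $P \in \mathcal{P}$, we do not make the dependence explicit in the notation.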

As demonstrated above, if $\hat\theta_{ur}$ maximizes $Q_n(\theta)$ and if $\nabla^2 Q_n(\theta)$ does not depend on $\theta$, then Assumption 1 holds with $R_n = 0$ and $\hat\Omega = -\nabla^2 Q_n(\theta)$. Andrews (1999) provides general sufficient conditions for a small remainder in a quadratic expansion. The assumption also holds by construction if we simply project $\hat\theta_{ur}$ on $\Theta_R$ to obtain $\hat\theta_r$. More generally, the assumption does not necessarily require $\hat\theta_{ur}$ to be an unrestricted estimator of a criterion function, which may not even exist in some settings if the criterion function is not defined outside of $\Theta_R$. Even in these cases, $\hat\theta_r$ is usually an approximate projection of an asymptotically normally distributed estimator on $\Theta_R$.⁴

Assumption 2 can be verified using a coupling argument and the rate of convergence of $\hat\theta_{ur}$ can be slower than $1/\sqrt{n}$.

Assumption 3 ensures that the estimation errors of $\hat\Sigma$ and $\hat\Omega$ are negligible. If $\lambda_{\min}(\Omega)$ is bounded away from 0 and if $\lambda_{\max}(\Sigma)$ is bounded, then the assumption simply states that $\|\hat\Sigma - \Sigma\|_S = o_p(\varepsilon_n / \sqrt{K_n})$ and $\|\hat\Omega - \Omega\|_S = o_p(\varepsilon_n^2 / K_n)$, which is easy to verify in specific examples. Allowing $\lambda_{\min}(\Omega) \to 0$ is important for ill-posed inverse problems such as NPIV. We explain in Sections 4 and 5 that both $1/C_\lambda \le \lambda_{\min}(\Sigma) \le C_\lambda$ and $1/C_\lambda \le \lambda_{\max}(\Omega) \le C_\lambda$ hold under common assumptions in a variety of settings. We could adapt the assumptions to allow for $\lambda_{\min}(\Sigma) \to 0$ and $\lambda_{\max}(\Omega) \to \infty$, but this would require much more notation.

Assumption 4 holds for example with linear inequality constraints of the form $\Theta_R = \{\theta \in \mathbb{R}^{K_n} : A\theta \le b\}$. Other examples of convex shape restrictions for series estimators are monotonicity, convexity/concavity, increasing returns to scale, or homogeneity of a certain degree, but we rule out Slutsky restrictions, which Horowitz and Lee (2017) allow for. The assumption implies that $\Lambda_n(\theta_0)$ is closed and convex as well. The main purpose of this assumption is to ensure that the projection on $\Lambda_n(\theta_0)$ is nonexpansive, and thus, we could replace it with a higher level assumption.⁵

Assumption 5 imposes continuity conditions on the test statistic. We provide several examples of test statistics satisfying this assumption in Sections 4 and 5.
Assumption 6 is a continuity condition on the distribution of $T(Z_n(\theta_0, \Sigma, \Omega), \Sigma)$, which requires that its distribution function does not become too steep too quickly as $n$ increases. It is usually referred to as an anti-concentration condition and it is not uncommon in these types of testing problems; see e.g. Assumption 6.7 of Chernozhukov, Newey, and Santos (2015). If the distribution function is continuous for any fixed $K_n$, then the assumption is an abstract rate condition on how fast $K_n$ can diverge relative to $\varepsilon_n$. As explained below, to get around this assumption we could take $c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega) + \varepsilon_n$ instead of $c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega)$ as the critical value. Also notice that Assumptions 1-5 impose very few restrictions on the shape restrictions and hence, they are insufficient to guarantee that the distribution function of $T(Z_n(\theta_0, \Sigma, \Omega), \Sigma)$ is continuous.

⁴ See Ketz (2017) for the construction of such an estimator. $\hat\theta_{ur}$ does not even have to be a feasible estimator and we could simply replace $\kappa_n(\hat\theta_{ur} - \theta_0)$ by a random variable $\hat Z$, which is allowed for by our general formulation; specifically see $Z_T$ in Andrews (1999).

⁵ I.e. we use
$$\left\| \arg\min_{\lambda \in \Lambda_n(\theta_0)} \|\lambda - z_1\|^2_{\hat\Omega} - \arg\min_{\lambda \in \Lambda_n(\theta_0)} \|\lambda - z_2\|^2_{\hat\Omega} \right\|_{\hat\Omega} \le C \|z_1 - z_2\|_{\hat\Omega}$$
for some $C > 0$.

We now get the following result.

Theorem 1. Suppose Assumptions 1-5 hold. Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(T(\kappa_n(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega) + \varepsilon_n\right) \ge 1 - \alpha.$$
If in addition Assumption 6 holds then
$$\sup_{P \in \mathcal{P}} \left| P\left(T(\kappa_n(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega)\right) - (1 - \alpha) \right| \to 0.$$

The first part of Theorem 1 implies that if we take $c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega) + \varepsilon$ for any fixed $\varepsilon > 0$ as the critical value, then the rejection probability is asymptotically at most $\alpha$ under the null hypothesis, even if Assumption 6 does not hold. In this case, $\varepsilon_n$ can go to 0 arbitrarily slowly. An alternative interpretation is that with $c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega)$ as the critical value and without Assumption 6, the rejection probability might be larger than $\alpha$ in the limit, but the resulting confidence set is arbitrarily close to the $1 - \alpha$ confidence set. The second part states that the test has the right size asymptotically if Assumptions 1-6 hold.

3.2 Rectangular confidence sets for functions

The previous results yield asymptotically valid confidence regions for $\theta_0$. However, these regions might be hard to report if $K_n$ is large and they may not be the main object of interest. For example, we might be more interested in a uniform confidence band for a function rather than a confidence region for the coefficients in the series expansion. We now discuss how we can use these regions to obtain rectangular confidence sets for functions $h : \mathbb{R}^{K_n} \to \mathbb{R}^{L_n}$ using projections, similar to Section 2 where $h(\theta) = \theta$. Rectangular confidence regions are easy to report because we only have to report the extreme points of each coordinate, which is crucial when $L_n$ is large. Our method applies to general functions, such as function values or average derivatives in nonparametric estimation. In our applications we focus on uniform confidence bands, which we can obtain using specific functions $h$, as explained in Sections 4 and 5.

Define
$$CI = \{\theta \in \Theta_R : T(\kappa_n(\hat\theta_r - \theta), \hat\Sigma) \le c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega)\}$$
and let
$$\hat h_l^L = \inf_{\theta \in CI} h_l(\theta) \quad \text{and} \quad \hat h_l^U = \sup_{\theta \in CI} h_l(\theta), \quad l = 1, \dots, L_n.$$

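Notice that if $\theta_0 \in CI$, then $\hat h_l^L \le h_l(\theta_0)$ and $\hat h_l^U \ge h_l(\theta_0)$ for all $l = 1, \dots, L_n$. We therefore obtain the following corollary.⁶

Corollary 1. Suppose Assumptions 1-6 hold. Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(\hat h_l^L \le h_l(\theta_0) \le \hat h_l^U \text{ for all } l = 1, \dots, L_n\right) \ge 1 - \alpha.$$

A projection for any $T$ satisfying the assumptions above yields a rectangular confidence region with coverage probability at least $1 - \alpha$ in the limit. In the examples discussed in Sections 4 and 5 we pick $T$ such that the resulting confidence region is nonconservative for $\theta_0$ in the interior of $\Theta_R$, just as the confidence sets in Figure 2. In these examples $h_l(\theta) = c_l + q_l'\theta$, where $c_l$ is a constant and $q_l \in \mathbb{R}^{K_n}$, and possibly $L_n > K_n$. We then let
$$T(\kappa_n(\hat\theta_r - \theta), \hat\Sigma) = \max\left\{ \frac{|\kappa_n q_1'(\hat\theta_r - \theta)|}{\sqrt{q_1'\hat\Sigma q_1}}, \dots, \frac{|\kappa_n q_{L_n}'(\hat\theta_r - \theta)|}{\sqrt{q_{L_n}'\hat\Sigma q_{L_n}}} \right\}.$$
Now suppose that for any $\theta \in CI$, the critical value does not depend on $\theta$, which will be the case with probability approaching 1 if $\theta_0$ is in the interior of the parameter space. That is, $c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega) = \hat c$. Then
$$CI = \left\{ \theta \in \Theta_R : h_l(\hat\theta_r) - \hat c\, \frac{\sqrt{q_l'\hat\Sigma q_l}}{\kappa_n} \le h_l(\theta) \le h_l(\hat\theta_r) + \hat c\, \frac{\sqrt{q_l'\hat\Sigma q_l}}{\kappa_n} \text{ for all } l = 1, \dots, L_n \right\}.$$
Moreover, by the definitions of the infimum and the supremum as the largest lower bound and smallest upper bound respectively, it holds that
$$\hat h_l^L \ge h_l(\hat\theta_r) - \hat c\, \frac{\sqrt{q_l'\hat\Sigma q_l}}{\kappa_n} \quad \text{and} \quad \hat h_l^U \le h_l(\hat\theta_r) + \hat c\, \frac{\sqrt{q_l'\hat\Sigma q_l}}{\kappa_n}$$
for all $l = 1, \dots, L_n$ and thus,
$$\hat h_l^L \le h_l(\theta_0) \le \hat h_l^U \text{ for all } l = 1, \dots, L_n \implies \theta_0 \in CI.$$
Consequently
$$P\left(\hat h_l^L \le h_l(\theta_0) \le \hat h_l^U \text{ for all } l = 1, \dots, L_n\right) = P(\theta_0 \in CI).$$
We state a formal result, which guarantees that the projection based confidence set does not suffer from over-coverage if $\theta_0$ is sufficiently in the interior of the parameter space, in Corollary A1 in the appendix. The results can be extended to nonlinear functions $h$ along the lines of Freyberger and Rai (2017).

⁶ Under Assumptions 1-5 only, we could project on $\{\theta \in \Theta_R : T(\kappa_n(\hat\theta_r - \theta), \hat\Sigma) \le c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega) + \varepsilon_n\}$ to obtain the same conclusion as in Corollary 1.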
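When the critical value is constant over the confidence set, the endpoints $h_l(\hat\theta_r) \pm \hat c \sqrt{q_l'\hat\Sigma q_l} / \kappa_n$ derived above can be computed directly. The sketch below does only that: it returns these outer endpoints for linear functions $h_l(\theta) = c_l + q_l'\theta$ and does not solve the constrained projection problem over $\Theta_R$, so it should be read as the bound on $[\hat h_l^L, \hat h_l^U]$ from the text rather than the projected band itself. The function name `rectangular_band` and its argument names are hypothetical.

```python
import numpy as np


def rectangular_band(theta_r, Sigma_hat, Q, c_const, c_hat, kappa_n):
    """Endpoints h_l(theta_r) -/+ c_hat * sqrt(q_l' Sigma_hat q_l) / kappa_n.

    Q holds the vectors q_l' as rows (shape L x K) and c_const holds the
    constants c_l. A theta-independent critical value c_hat is assumed.
    """
    Q = np.atleast_2d(np.asarray(Q, dtype=float))
    center = np.asarray(c_const, dtype=float) + Q @ np.asarray(theta_r, dtype=float)
    # q_l' Sigma_hat q_l for every row l of Q.
    quad = np.einsum("lk,kj,lj->l", Q, Sigma_hat, Q)
    half_width = c_hat * np.sqrt(quad) / kappa_n
    return center - half_width, center + half_width
```

With `Q` equal to the identity matrix and `c_const` equal to zero this reduces to coordinate-wise intervals for $\theta_0$ itself, as in the case $h(\theta) = \theta$.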

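4 Conditional mean estimation

In this section we provide sufficient conditions for Assumptions 1-5 when
$$Y = g_0(X) + U, \quad E(U \mid X) = 0$$
and $Y$, $X$ and $U$ are scalar random variables. We also explain how we can use the projection results to obtain uniform confidence bands for $g_0$. We first assume that $X$ is discretely distributed to illustrate that the inference method can easily be applied to finite dimensional models. We then let $X$ be continuously distributed and discuss both kernel and series estimators. Throughout, we assume that the data is a random sample $\{Y_i, X_i\}_{i=1}^n$. The proofs of all results in this and the following section are in the supplementary appendix.

4.1 Discrete regressors

Suppose that $X$ is discretely distributed with support $\mathcal{X} = \{x_1, \dots, x_K\}$, where $K$ is fixed. Let
$$\theta_0 = \left( E(Y \mid X = x_1), \dots, E(Y \mid X = x_K) \right)'$$
and
$$\hat\theta_{ur} = \left( \frac{\sum_{i=1}^n Y_i 1(X_i = x_1)}{\sum_{i=1}^n 1(X_i = x_1)}, \dots, \frac{\sum_{i=1}^n Y_i 1(X_i = x_K)}{\sum_{i=1}^n 1(X_i = x_K)} \right)'.$$
Define $\sigma^2(x_k) = Var(U \mid X = x_k)$ and $p(x_k) = P(X = x_k) > 0$, and let
$$\Sigma = diag\left( \frac{\sigma^2(x_1)}{p(x_1)}, \dots, \frac{\sigma^2(x_K)}{p(x_K)} \right) \quad \text{and} \quad \hat\Sigma = diag\left( \frac{\hat\sigma^2(x_1)}{\hat p(x_1)}, \dots, \frac{\hat\sigma^2(x_K)}{\hat p(x_K)} \right),$$
where $\hat p(x_k) = \frac{1}{n} \sum_{i=1}^n 1(X_i = x_k)$ and
$$\hat\sigma^2(x_k) = \frac{\sum_{i=1}^n Y_i^2 1(X_i = x_k)}{\sum_{i=1}^n 1(X_i = x_k)} - \left( \frac{\sum_{i=1}^n Y_i 1(X_i = x_k)}{\sum_{i=1}^n 1(X_i = x_k)} \right)^2.$$
Let $\Theta_R$ be a convex subset of $\mathbb{R}^K$, such as $\Theta_R = \{\theta \in \mathbb{R}^K : A\theta \le b\}$. Now define
$$\hat\theta_r = \arg\min_{\theta \in \Theta_R} \|\theta - \hat\theta_{ur}\|^2_{\hat\Sigma^{-1}}$$
and hence $\hat\Omega = \hat\Sigma^{-1}$. Other weight functions $\hat\Omega$, such as the identity matrix, are possible choices as well. We discuss this issue further in Section 8. As a test statistic we use
$$T(z, \hat\Sigma) = \max\left\{ |z_1| / \sqrt{\hat\Sigma_{11}}, \dots, |z_K| / \sqrt{\hat\Sigma_{KK}} \right\}$$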

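because the resulting confidence region of the unrestricted estimator is rectangular, analogous to the one in Section 2. We now get the following result.

Theorem 2. Let $\mathcal{P}$ be the class of distributions satisfying the following assumptions.

1. $\{Y_i, X_i\}_{i=1}^n$ is an iid sample from the distribution of $(Y, X)$ with $\sigma^2(x_k) \in [1/C, C]$, $p(x_k) \ge 1/C$, and $E(U^4 \mid X = x_k) \le C$ for all $k = 1, \dots, K$ and for some $C > 0$.
2. $\Theta_R$ is closed and convex and $\theta_0 \in \Theta_R$.
3. $1/\sqrt{n} = o(\varepsilon_n^3)$.

Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega) + \varepsilon_n\right) \ge 1 - \alpha.$$
If in addition Assumption 6 holds then
$$\sup_{P \in \mathcal{P}} \left| P\left(T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega)\right) - (1 - \alpha) \right| \to 0.$$

Next let $h_l(\theta) = \theta_l$ for $l = 1, \dots, K$. Then the results in Section 3.2 yield a rectangular confidence region for $\theta_0$, which can be interpreted as a uniform confidence band for $g_0(x_1), \dots, g_0(x_K)$. Moreover, Corollary A1 in the appendix shows that the band is nonconservative if $\theta_0$ is sufficiently in the interior of the parameter space.

4.2 Kernel regression

We now suppose that $X$ is continuously distributed with density $f_X$. We denote its support by $\mathcal{X}$ and assume that $\mathcal{X} = [\underline{x}, \bar{x}]$. Let $\{x_1, \dots, x_{K_n}\} \subset \mathcal{X}$ and
$$\theta_0 = \left( E(Y \mid X = x_1), \dots, E(Y \mid X = x_{K_n}) \right)'.$$
Here $K_n$ increases as the sample size increases and thus, our setup is very similar to Horowitz and Lee (2017). Let $K(\cdot)$ be a kernel function and $h_n$ the bandwidth. The unrestricted estimator is
$$\hat\theta_{ur} = \left( \frac{\sum_{i=1}^n Y_i K\left(\frac{x_1 - X_i}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{x_1 - X_i}{h_n}\right)}, \dots, \frac{\sum_{i=1}^n Y_i K\left(\frac{x_{K_n} - X_i}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{x_{K_n} - X_i}{h_n}\right)} \right)'.$$
Define $B = \int K(u)^2 du$ and $\sigma^2(x) = Var(U \mid X = x)$ and let
$$\Sigma = diag\left( \frac{\sigma^2(x_1) B}{f_X(x_1)}, \dots, \frac{\sigma^2(x_{K_n}) B}{f_X(x_{K_n})} \right),$$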

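and
$$\hat\Sigma = diag\left( \frac{\hat\sigma^2(x_1) B}{\hat f_X(x_1)}, \dots, \frac{\hat\sigma^2(x_{K_n}) B}{\hat f_X(x_{K_n})} \right),$$
where $\hat f_X(x_k) = \frac{1}{n h_n} \sum_{i=1}^n K\left(\frac{x_k - X_i}{h_n}\right)$ and
$$\hat\sigma^2(x_k) = \frac{\sum_{i=1}^n Y_i^2 K\left(\frac{x_k - X_i}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{x_k - X_i}{h_n}\right)} - \left( \frac{\sum_{i=1}^n Y_i K\left(\frac{x_k - X_i}{h_n}\right)}{\sum_{i=1}^n K\left(\frac{x_k - X_i}{h_n}\right)} \right)^2.$$
Just as before, let $\Theta_R$ be convex such as $\Theta_R = \{\theta \in \mathbb{R}^{K_n} : A\theta \le b\}$ and define
$$\hat\theta_r = \arg\min_{\theta \in \Theta_R} \|\theta - \hat\theta_{ur}\|^2_{\hat\Sigma^{-1}},$$
implying that $\hat\Omega = \hat\Sigma^{-1}$. Finally, as before we let
$$T(z, \hat\Sigma) = \max\left\{ |z_1| / \sqrt{\hat\Sigma_{11}}, \dots, |z_{K_n}| / \sqrt{\hat\Sigma_{K_n K_n}} \right\}.$$
We get the following result.

Theorem 3. Let $\mathcal{P}$ be the class of distributions satisfying the following assumptions.

1. The data $\{Y_i, X_i\}_{i=1}^n$ is an iid sample where $\mathcal{X} = [\underline{x}, \bar{x}]$.
   (a) $g_0(x)$ and $f_X(x)$ are twice continuously differentiable with uniformly bounded function values and derivatives. $\inf_{x \in \mathcal{X}} f_X(x) \ge 1/C$ for some $C > 0$.
   (b) $\sigma^2(x)$ is twice continuously differentiable, the function and derivatives are uniformly bounded on $\mathcal{X}$, and $\inf_{x \in \mathcal{X}} \sigma^2(x) \ge 1/C$ for some $C > 0$.
   (c) $E(Y^4 \mid X = x) \le C$ for some $C > 0$.
2. $x_k - x_{k-1} > 2h_n$ for all $k$ and $x_1 > \underline{x} + h_n$ and $x_{K_n} < \bar{x} - h_n$.
3. $K(\cdot)$ is a bounded and symmetric pdf with support $[-1, 1]$.
4. $\Theta_R$ is closed and convex and $\theta_0 \in \Theta_R$.
5. $K_n n h_n^5 = o(\varepsilon_n^2)$ and $\frac{K_n^{5/2}}{\sqrt{n h_n}} = o(\varepsilon_n^3)$.

Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(T(\sqrt{n h_n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega) + \varepsilon_n\right) \ge 1 - \alpha.$$
If in addition Assumption 6 holds then
$$\sup_{P \in \mathcal{P}} \left| P\left(T(\sqrt{n h_n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega)\right) - (1 - \alpha) \right| \to 0.$$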
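The unrestricted kernel estimator and the diagonal of $\hat\Sigma$ from this subsection can be computed in a few lines. The sketch below is an illustration only: the Epanechnikov kernel is an assumed choice (it is a bounded, symmetric pdf with support $[-1, 1]$, for which $B = \int K(u)^2 du = 3/5$), and the function names `epanechnikov` and `nw_estimates` are hypothetical. The restricted estimator would additionally require a weighted projection onto $\Theta_R$, which is not shown.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: a bounded, symmetric pdf supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


B_EPANECHNIKOV = 0.6  # integral of K(u)^2 over [-1, 1] for this kernel


def nw_estimates(y, x, grid, h):
    """Nadaraya-Watson estimates at the grid points and the diagonal of Sigma_hat."""
    y, x, grid = (np.asarray(a, dtype=float) for a in (y, x, grid))
    n = y.size
    # W[k, i] = K((x_k - X_i) / h)
    W = epanechnikov((grid[:, None] - x[None, :]) / h)
    denom = W.sum(axis=1)
    theta_ur = (W @ y) / denom                       # kernel-weighted local means
    f_hat = denom / (n * h)                          # kernel density estimate
    sigma2_hat = (W @ y ** 2) / denom - theta_ur ** 2
    Sigma_diag = sigma2_hat * B_EPANECHNIKOV / f_hat
    return theta_ur, Sigma_diag
```

The grid points are assumed to lie at least one bandwidth inside the support and to have data within one bandwidth, so that the denominators are strictly positive.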

The first assumption contains standard smoothness and moment conditions. The second assumption guarantees that estimators of $g_0(x_k)$ and $g_0(x_l)$ for $k \ne l$ are independent, just as in Horowitz and Lee (2017), and it also avoids complications associated with $x_k$ being too close to the boundary of the support. The third assumption imposes standard restrictions on the kernel function and the fourth assumption has been discussed before. The fifth assumption contains rate conditions. Notice that with a fixed $K_n$, these rates are the standard conditions for asymptotic normality with undersmoothing in kernel regression. The rate conditions also imply that $K_n h_n \to 0$, which is similar to Horowitz and Lee (2017). Once again with $h_l(\theta) = \theta_l$ for $l = 1, \dots, K_n$ the results in Section 3.2 yield a rectangular confidence region for $\theta_0$, which is a uniform confidence band for $g_0(x_1), \dots, g_0(x_{K_n})$.

Remark 1. While we use the Nadaraya-Watson estimator for simplicity, the general theory also applies to other estimators, such as local polynomial estimators. Another possibility is to use a bias corrected estimator and the adjusted standard errors suggested by Calonico, Cattaneo, and Farrell (2017). Finally, the general theory can also be adapted to incorporate a worst-case bias as in Armstrong and Kolesár (2016) instead of using the undersmoothing assumption; see Section S.2 for details.

4.3 Series regression

In this section we again assume that $X \in \mathcal{X}$ is continuously distributed, but we use a series estimator. One advantage of a series estimator is that it yields uniform confidence bands for the entire function $g_0$, rather than just a vector of function values. Let $p_{K_n}(x) \in \mathbb{R}^{K_n}$ be a vector of basis functions and write $g_0(x) \approx p_{K_n}(x)'\theta_0$ for some $\theta_0 \in \Theta_R$. We again let $\Theta_R$ be a convex set such as $\{\theta \in \mathbb{R}^{K_n} : A\theta \le b\}$. For example, we could impose the constraints $\nabla p_{K_n}(x_j)'\theta \ge 0$ for $j = 1, \dots, J_n$. Notice that $J_n$ is not restricted, and we could even impose $\nabla p_{K_n}(x)'\theta \ge 0$ for all $x \in \mathcal{X}$ if it is computationally feasible.⁷

⁷ For example, with quadratic splines $\nabla p_{K_n}(x)'\theta \ge 0$ reduces to finitely many inequality constraints.

The unrestricted and restricted estimators are
$$\hat\theta_{ur} = \arg\min_{\theta \in \mathbb{R}^{K_n}} \sum_{i=1}^n \left(Y_i - p_{K_n}(X_i)'\theta\right)^2 \quad \text{and} \quad \hat\theta_r = \arg\min_{\theta \in \Theta_R} \sum_{i=1}^n \left(Y_i - p_{K_n}(X_i)'\theta\right)^2,$$

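respectively. The assumptions ensure that both minimizers are unique with probability approaching 1. Since the objective function is quadratic in $\theta$ we have
$$\sqrt{n}(\hat\theta_r - \theta_0) = \arg\min_{\lambda \in \Lambda_n(\theta_0)} \|\lambda - \sqrt{n}(\hat\theta_{ur} - \theta_0)\|^2_{\hat\Omega},$$
where $\hat\Omega = \frac{1}{n} \sum_{i=1}^n p_{K_n}(X_i) p_{K_n}(X_i)'$ and $\Omega = E(\hat\Omega)$. Define
$$\Sigma = \left(E(p_{K_n}(X_i) p_{K_n}(X_i)')\right)^{-1} E\left(U_i^2 p_{K_n}(X_i) p_{K_n}(X_i)'\right) \left(E(p_{K_n}(X_i) p_{K_n}(X_i)')\right)^{-1}.$$
Also let $\hat U_i = Y_i - p_{K_n}(X_i)'\hat\theta_{ur}$ and
$$\hat\Sigma = \hat\Omega^{-1} \left( \frac{1}{n} \sum_{i=1}^n \hat U_i^2 p_{K_n}(X_i) p_{K_n}(X_i)' \right) \hat\Omega^{-1}.$$
Let $\hat\sigma(x) = \sqrt{p_{K_n}(x)'\hat\Sigma p_{K_n}(x)}$. We use the test statistic
$$T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) = \sup_{x \in \mathcal{X}} \left| \frac{p_{K_n}(x)'\sqrt{n}(\hat\theta_r - \theta_0)}{\hat\sigma(x)} \right|.$$
The following theorem provides conditions to ensure that confidence sets for $\theta_0$ have the correct coverage asymptotically. We then explain how we can use these sets to construct uniform confidence bands for $g_0(x)$. To state the theorem, let $\xi(K_n) = \sup_{x \in \mathcal{X}} \|p_{K_n}(x)\|$.

Theorem 4. Let $\mathcal{P}$ be the class of distributions satisfying the following assumptions.

1. The data $\{Y_i, X_i\}_{i=1}^n$ is an iid sample from the distribution of $(Y, X)$ with $E(U^2 \mid X) \in [1/C, C]$ and $E(U^4 \mid X) \le C$ for some $C > 0$.
2. The basis functions $p_k(\cdot)$ are orthonormal on $\mathcal{X}$ with respect to the $L^2$ norm and $f_X(x) \in [1/C, C]$ for all $x \in \mathcal{X}$ and some $C > 0$.
3. $\Theta_R$ is closed and convex and $\theta_0 \in \Theta_R$ is such that for some constants $C_g$ and $\gamma > 0$
$$\sup_{x \in \mathcal{X}} |g_0(x) - p_{K_n}(x)'\theta_0| \le C_g K_n^{-\gamma}.$$
4. $n K_n^{-2\gamma} = o(\varepsilon_n^2)$, $\frac{\xi(K_n)^2 K_n^4}{n} = o(\varepsilon_n^6)$, and $\frac{\xi(K_n)^4 K_n^3}{n} = o(\varepsilon_n^2)$.

Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega) + \varepsilon_n\right) \ge 1 - \alpha.$$
If in addition Assumption 6 holds then
$$\sup_{P \in \mathcal{P}} \left| P\left(T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega)\right) - (1 - \alpha) \right| \to 0.$$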
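The quantities defined in this subsection (the unrestricted series estimator, $\hat\Omega$, the sandwich estimator $\hat\Sigma$, and the sup-type statistic) can be sketched as follows. Several choices here are assumptions made for illustration: the basis consists of Legendre polynomials rescaled to be orthonormal on $[-1, 1]$, the supremum over $\mathcal{X}$ is approximated by a maximum over a finite grid, and the function names `basis`, `series_fit` and `sup_t_statistic` are hypothetical. The restricted estimator, which requires a constrained least-squares solver, is not shown.

```python
import numpy as np
from numpy.polynomial import legendre


def basis(x, K):
    """First K Legendre polynomials, rescaled to be orthonormal on [-1, 1]."""
    V = legendre.legvander(np.asarray(x, dtype=float), K - 1)
    return V * np.sqrt((2 * np.arange(K) + 1) / 2.0)


def series_fit(y, x, K):
    """Unrestricted least-squares coefficients, Omega_hat and sandwich Sigma_hat."""
    y = np.asarray(y, dtype=float)
    n = y.size
    P = basis(x, K)
    theta_ur, *_ = np.linalg.lstsq(P, y, rcond=None)
    Omega_hat = P.T @ P / n
    resid = y - P @ theta_ur
    meat = (P * resid[:, None] ** 2).T @ P / n       # (1/n) sum U_i^2 p_i p_i'
    Omega_inv = np.linalg.inv(Omega_hat)
    Sigma_hat = Omega_inv @ meat @ Omega_inv
    return theta_ur, Omega_hat, Sigma_hat


def sup_t_statistic(theta_hat, theta, Sigma_hat, grid, n):
    """max over grid of |p(x)' sqrt(n) (theta_hat - theta)| / sigma_hat(x)."""
    P = basis(grid, len(theta))
    sigma_x = np.sqrt(np.einsum("ik,kj,ij->i", P, Sigma_hat, P))
    diff = np.asarray(theta_hat, dtype=float) - np.asarray(theta, dtype=float)
    return float(np.max(np.abs(np.sqrt(n) * (P @ diff)) / sigma_x))
```

A finer grid gives a closer approximation to the supremum at a higher computational cost.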

The first assumption imposes standard moment conditions. The main role of the second assumption is to guarantee that the minimum eigenvalues of $\Sigma$ and $\Omega$ are bounded and bounded away from 0. The third assumption says that $g_0$ can be well approximated by a function satisfying the constraints, and the fourth assumption provides rate conditions. For asymptotic normality of nonlinear functionals Newey (1997) assumes that $n K_n^{-2\gamma} + \xi(K_n)^4 K_n^2 / n \to 0$. For orthonormal polynomials $\xi(K_n) = C_p K_n$ and for splines $\xi(K_n) = C_s \sqrt{K_n}$. Thus, our rate conditions are slightly stronger than the ones in Newey (1997), but we also obtain confidence sets for the $K_n$ dimensional vector $\theta_0$, which we can transform to uniform confidence bands for $g_0$. The last rate condition, $\xi(K_n)^4 K_n^3 / n = o(\varepsilon_n^2)$, is not needed under the additional assumption that $Var(U_i \mid X_i) = \sigma^2 > 0$.

Remark 2. In a finite dimensional regression framework with $K_n = K$, the third assumption always holds and the fourth assumption only requires that $n \to \infty$. In this case the second assumption can be replaced with the full rank condition $\lambda_{\min}(E(p_K(X) p_K(X)')) \ge 1/C$.

To obtain a uniform confidence band for $g_0(x)$, define
$$CI = \{\theta \in \Theta_R : T(\sqrt{n}(\hat\theta_r - \theta), \hat\Sigma) \le c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega)\}$$
and let
$$\hat g_l(x) = \min_{\theta \in CI} p_{K_n}(x)'\theta \quad \text{and} \quad \hat g_u(x) = \max_{\theta \in CI} p_{K_n}(x)'\theta.$$
Also notice that $\|p_{K_n}(x)\|^2$ is bounded away from 0 if the basis functions contain the constant function. We get the following result.

Corollary 2. Suppose the assumptions of Theorem 4 and Assumption 6 hold. Further suppose that $\inf_{x \in \mathcal{X}} \|p_{K_n}(x)\|^2 > 1/C$ for some constant $C > 0$. Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(\hat g_l(x) \le g_0(x) \le \hat g_u(x) \;\; \forall x \in \mathcal{X}\right) \ge 1 - \alpha.$$

Remark 3. Without any restrictions on the parameter space, inverting our test statistic results in a uniform confidence band where the width of the band is proportional to the standard deviation of the estimated function for each $x$. This band can also be obtained as a projection on the underlying confidence set for $\theta_0$; see Freyberger and Rai (2017) for this equivalence result. If $\theta_0$ is sufficiently in the interior of the parameter space, an application of Corollary A1 shows that the restricted band is equivalent to that band with probability approaching 1. In this case the projection based band is not conservative.

Remark 4. Similar to before, Assumption 6 is not needed if the band is obtained by projecting on
$$\{\theta \in \Theta_R : T(\sqrt{n}(\hat\theta_r - \theta), \hat\Sigma) \le c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega) + \varepsilon_n\}.$$

Remark 5. The results can be extended to a partially linear model of the form $Y = g_0(X_1) + X_2'\gamma_0 + U$. The parameter vector $\theta_0$ would then contain both $\gamma_0$ and the coefficients of the series approximation of $g_0$.

5 Instrumental variables estimation

As the final application of the general method we consider the NPIV model
$$Y = g_0(X) + U, \quad E(U \mid Z) = 0,$$
where $X$ and $Z$ are continuously distributed scalar random variables with bounded support. We assume for notational simplicity that $X$ and $Z$ have the same support, $\mathcal{X}$, but this assumption is without loss of generality because $X$ and $Z$ can always be transformed to have support on $[0, 1]$. We assume that $E(U^2 \mid Z) = \sigma^2$ to focus on the complications resulting from the ill-posed inverse problem. Here, the data is a random sample $\{Y_i, X_i, Z_i\}_{i=1}^n$.

As before, let $p_{K_n}(x) \in \mathbb{R}^{K_n}$ be a vector of basis functions and write $g_0(x) \approx p_{K_n}(x)'\theta_0$ for some $\theta_0 \in \Theta_R$, where $\Theta_R$ is a convex subset of $\mathbb{R}^{K_n}$. Let $P_X$ be the $n \times K_n$ matrix, where the $i$th row is $p_{K_n}(X_i)'$, and define $P_Z$ analogously. Let $Y$ be the $n \times 1$ vector containing $Y_i$. Let
$$\hat\theta_{ur} = \arg\min_{\theta \in \mathbb{R}^{K_n}} (Y - P_X\theta)' P_Z (P_Z'P_Z)^{-1} P_Z' (Y - P_X\theta)$$
and
$$\hat\theta_r = \arg\min_{\theta \in \Theta_R} (Y - P_X\theta)' P_Z (P_Z'P_Z)^{-1} P_Z' (Y - P_X\theta).$$
For simplicity we use the same basis functions as well as the same number of basis functions for $X_i$ and $Z_i$. Our results can be generalized to allow for different basis functions and more instruments than regressors. Since the objective function is quadratic in $\theta$ we have
$$\sqrt{n}(\hat\theta_r - \theta_0) = \arg\min_{\lambda \in \Lambda_n(\theta_0)} \|\lambda - \sqrt{n}(\hat\theta_{ur} - \theta_0)\|^2_{\hat\Omega},$$
where $\hat\Omega = \frac{1}{n} (P_X'P_Z)(P_Z'P_Z)^{-1}(P_Z'P_X)$. Furthermore, let $Q_{XZ} = E(p_{K_n}(X_i) p_{K_n}(Z_i)')$. Then
$$\Sigma = \sigma^2 \left( Q_{XZ} \left(E(p_{K_n}(Z_i) p_{K_n}(Z_i)')\right)^{-1} Q_{XZ}' \right)^{-1},$$
which we estimate by $\hat\Sigma = \hat\sigma^2 \hat\Omega^{-1}$ with $\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n \hat U_i^2$ and $\hat U_i = Y_i - p_{K_n}(X_i)'\hat\theta_{ur}$.
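The unrestricted NPIV estimator above has the familiar two-stage least squares closed form, which the following sketch implements together with $\hat\Omega$ and $\hat\Sigma = \hat\sigma^2 \hat\Omega^{-1}$. It assumes that $P_Z'P_Z$ and $P_X'P_Z(P_Z'P_Z)^{-1}P_Z'P_X$ are invertible and takes the basis matrices as given; the function name `npiv_fit` is hypothetical. The restricted estimator would minimize the same quadratic objective over $\Theta_R$ and is not shown.

```python
import numpy as np


def npiv_fit(y, PX, PZ):
    """Unrestricted NPIV (2SLS-type) estimator with Omega_hat and Sigma_hat.

    PX and PZ are n x K matrices whose i-th rows are the basis functions
    evaluated at X_i and Z_i, respectively.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    ZtX = PZ.T @ PX
    # G = (P_Z'P_Z)^{-1} P_Z'P_X, computed without forming the n x n projection.
    G = np.linalg.solve(PZ.T @ PZ, ZtX)
    XPzX = ZtX.T @ G                    # P_X' P_Z (P_Z'P_Z)^{-1} P_Z' P_X
    XPzY = G.T @ (PZ.T @ y)             # P_X' P_Z (P_Z'P_Z)^{-1} P_Z' Y
    theta_ur = np.linalg.solve(XPzX, XPzY)
    Omega_hat = XPzX / n
    resid = y - PX @ theta_ur
    sigma2_hat = np.mean(resid ** 2)
    Sigma_hat = sigma2_hat * np.linalg.inv(Omega_hat)
    return theta_ur, Omega_hat, Sigma_hat
```

When `PZ` equals `PX` the estimator coincides with ordinary least squares, which matches the observation that $X = Z$ is allowed as a special case.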

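As before, $\hat\sigma(x) = \sqrt{p_{K_n}(x)'\hat\Sigma p_{K_n}(x)}$ and the test statistic is
$$T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) = \sup_{x \in \mathcal{X}} \left| \frac{p_{K_n}(x)'\sqrt{n}(\hat\theta_r - \theta_0)}{\hat\sigma(x)} \right|.$$
The following theorem provides conditions to ensure that confidence sets for $\theta_0$ have the correct coverage, and analogously to before we can transform these sets to uniform confidence bands for $g_0(x)$. As before, let $\xi(K_n) = \sup_{x \in \mathcal{X}} \|p_{K_n}(x)\|$.

Theorem 5. Let $\mathcal{P}$ be the class of distributions satisfying the following assumptions.

1. The data $\{Y_i, X_i, Z_i\}_{i=1}^n$ is an iid sample from the distribution of $(Y, X, Z)$ with $E(U^2 \mid Z) = \sigma^2 \in [1/C, C]$ and $E(U^4 \mid Z) \le C$ for some $C > 0$.
2. The functions $p_k(\cdot)$ are orthonormal on $\mathcal{X}$ with respect to the $L^2$ norm and the densities of $X$ and $Z$ are uniformly bounded above and bounded away from 0.
3. $\Theta_R$ is closed and convex and for some function $b(K_n)$ and $\theta_0 \in \Theta_R$
$$\sup_{x \in \mathcal{X}} |g_0(x) - p_{K_n}(x)'\theta_0| \le b(K_n).$$
4. $\lambda_{\min}(Q_{XZ} Q_{XZ}') \ge \tau_{K_n} > 0$ and $\lambda_{\max}(Q_{XZ} Q_{XZ}') \in [1/C, C]$ for some $C < \infty$.
5. $\frac{n\, b(K_n)^2}{\tau_{K_n}^2} = o(\varepsilon_n^2)$ and $\frac{\xi(K_n)^2 K_n^4}{n\, \tau_{K_n}^6} = o(\varepsilon_n^6)$.

Then
$$\liminf_{n \to \infty} \inf_{P \in \mathcal{P}} P\left(T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega) + \varepsilon_n\right) \ge 1 - \alpha.$$
If in addition Assumption 6 holds then
$$\sup_{P \in \mathcal{P}} \left| P\left(T(\sqrt{n}(\hat\theta_r - \theta_0), \hat\Sigma) \le c_{1-\alpha,n}(\theta_0, \hat\Sigma, \hat\Omega)\right) - (1 - \alpha) \right| \to 0.$$

Assumptions 1-3 of the theorem are very similar to those of Theorem 4. Assumption 4 defines a measure of ill-posedness $\tau_{K_n}$, which affects the rate conditions. It is easy to show that $\lambda_{\max}(Q_{XZ} Q_{XZ}')$ is bounded as long as $f_{XZ}$ is square integrable. However, $\lambda_{\max}(Q_{XZ} Q_{XZ}') \le C$ also allows for $X = Z$ as a special case. In fact, in this case, $\tau_{K_n}$ is bounded away from 0 and all assumptions reduce to the ones in the series regression framework with homoskedasticity. Moreover, similar to Remark 2, the assumptions also allow for $K_n$ to be fixed, in which case all conditions reduce to standard assumptions in a parametric IV framework. Finally, the results can also be extended to a partially linear model; see Remark 5.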

6 Monte Carlo simulations

To investigate finite sample properties of our inference method we simulate data from the model
$$Y = g_0(X) + U, \quad E(U \mid Z) = 0,$$
where $X \in [-1, 1]$ and g 0 X = c F 4 X +. 2 Here, $F$ is the cdf of a t-distribution with one degree of freedom and we vary the constant $c$. Figure 3 shows the function for $n = 5{,}000$ and $c \in \{0, 10, 20, 30, 40, 50\}$. Clearly, $c = 0$ corresponds to the constant function. As $c$ increases the slope of $g_0(x)$ increases for every $x$.

Let $\tilde X$, $\tilde Z$, and $U$ be jointly normally distributed with $var(U) = 0.25$ and $var(\tilde Z) = var(\tilde X) = 1$. Let $X = 2F_{\tilde X}(\tilde X) - 1 \sim Unif[-1, 1]$ and $Z = 2F_{\tilde Z}(\tilde Z) - 1 \sim Unif[-1, 1]$. We consider two DGPs. First, we let $cov(\tilde X, U) = 0$. Thus, $X$ is exogenous and we use the series estimator described in Section 4.3. Second, we let $cov(\tilde X, \tilde Z) = 0.7$ and $cov(\tilde X, U) = 0.15$ and use the NPIV estimator. In both cases we focus on uniform confidence bands for $g_0$. In this section we report results with Legendre polynomials as basis functions. In Section S.4 in the supplement we report qualitatively very similar results for quadratic splines. For the series regression setting we take $n = 1{,}000$ and for NPIV we use $n = 5{,}000$. We take sample sizes large enough such that the unrestricted estimator has good coverage properties for a sufficiently large number of series terms, which helps in analyzing how conservative the restricted confidence bands can be. All results are based on 1,000 Monte Carlo simulations.

Figure 3: $g_0$ for different values of $c$ (curves for $c = 0, 10, 20, 30, 40, 50$)

We impose the restriction that $g_0$ is weakly decreasing and we enforce this constraint on 10 equally spaced points. We solve for the uniform confidence bands on 30 equally spaced grid points. Using finer grids has almost no impact on the results, but increases the computational costs.⁸ To solve the optimization problems, we have to calculate $c_{1-\alpha,n}(\theta, \hat\Sigma, \hat\Omega)$, which is not available in closed form. To do so, we take 2000 draws from a multivariate normal distribution and use them to estimate the distribution function of $T(Z_n(\theta, \hat\Sigma, \hat\Omega), \hat\Sigma)$ using a kernel estimator and Silverman's rule of thumb bandwidth. We then take the $1 - \alpha$ quantile of the estimated distribution function as the critical value. Estimating the distribution function simply as a step function yields almost identical critical values for any given $\theta$, but our construction ensures that the estimated critical value is a smooth function of $\theta$. The number of draws from the normal distribution is analogous to the number of bootstrap samples in other settings and using more draws has almost no impact on our results.

Tables 1 and 2 show the simulation results for the series regression model and the NPIV model, respectively. The first column is the order of the polynomial and $K_n = 2$ corresponds to a linear function. We use the same number of basis functions for $X$ and $Z$, but using $K_n + 3$ for the instrument matrix yields very similar results. The third and fourth columns show the coverage rates of uniform confidence bands using the unrestricted and shape restricted method, respectively. The nominal coverage rate is 0.95. For a confidence band $[\hat g_l(x), \hat g_u(x)]$ define the average width as
$$\frac{1}{30} \sum_{j=1}^{30} \left(\hat g_u(x_j) - \hat g_l(x_j)\right),$$
where $\{x_j\}_{j=1}^{30}$ are the grid points. Columns 5 and 6 show the medians of the average widths of the 1,000 simulated data sets for the unrestricted and restricted estimator, respectively. Let $width^s_{ur}$ and $width^s_r$ be the average widths in data set $s$. The last column shows the median of $(width^s_{ur} - width^s_r)/width^s_{ur}$ across the 1,000 simulated data sets.
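The smoothed critical value and the average width described above can be sketched as follows. The text specifies a kernel estimator with Silverman's rule-of-thumb bandwidth but not the kernel itself, so the Gaussian kernel used here is an assumption, as are the function names `silverman_bandwidth`, `smoothed_quantile` and `average_width`. The input `stats` stands for the simulated values of the test statistic, which are taken as given.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ndtr


def silverman_bandwidth(t):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * S^(-1/5)."""
    t = np.asarray(t, dtype=float)
    q75, q25 = np.percentile(t, [75, 25])
    spread = min(t.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * t.size ** (-0.2)


def smoothed_quantile(stats, level=0.95):
    """Quantile of a Gaussian-kernel-smoothed empirical distribution function."""
    stats = np.asarray(stats, dtype=float)
    b = silverman_bandwidth(stats)

    def cdf_minus_level(c):
        # Smoothed CDF: average of normal CDFs centered at the simulated values.
        return ndtr((c - stats) / b).mean() - level

    # The smoothed CDF is essentially 0 and 1 at these two bracket points.
    return brentq(cdf_minus_level, stats.min() - 10 * b, stats.max() + 10 * b)


def average_width(lower, upper):
    """Average of upper - lower over the grid points of a confidence band."""
    return float(np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)))
```

Because the smoothed distribution function is continuous and strictly increasing, the resulting quantile varies smoothly with the simulated values, which is the property the construction in the text is after.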
Even though the mean gains are very similar, we report the median gains to ensure that our gains are not mainly caused by extreme outcomes.

⁸ In the application we use a grid of 100 points for the uniform confidence bands, but we use a coarser grid for the simulations, because our reported results are based on 78,000 estimated confidence bands in total.

In Table 1 we can see that the unrestricted estimator has coverage rates close to 0.95 if $c = 0$. For $K_n = 2$ and $K_n = 3$, the coverage probability drops significantly below 0.95 when $c$ is large because increasing $c$ also increases the approximation bias. For larger values of $K_n$, the coverage probability of the unrestricted band is close to 0.95 for all reported values of $c$. Due to the projection, the coverage probability of the restricted band tends to be above the one of the unrestricted band. When $c$ is large enough, such as $c = 10$ with $K_n = 2$, the two bands are identical with very large probability. The average width of the unrestricted band does not depend on $c$. On the other hand, the average width of the restricted band is