
Nonparametric Estimation of Smooth Conditional Distributions

Bruce E. Hansen
University of Wisconsin
www.ssc.wisc.edu/~hansen

May 2004

Abstract

This paper considers nonparametric estimation of smooth conditional distribution functions (CDFs) using kernel smoothing methods. We propose estimation using a new smoothed local linear (SLL) estimator. Estimation bias is reduced through the use of a local linear estimator rather than local averaging. Estimation variance is reduced through the use of smoothing. Asymptotic analysis of mean integrated squared error (MISE) reveals the form of these efficiency gains, and their magnitudes are demonstrated in numerical simulations. Considerable attention is devoted to the development of a plug-in rule for bandwidths which minimize estimates of the asymptotic MISE. We illustrate the estimation method with an application to the U.S. quarterly GDP growth rate.

Research supported by the National Science Foundation. Department of Economics, 1180 Observatory Drive, University of Wisconsin, Madison, WI 53706.

1 Introduction

This paper considers nonparametric estimation of smooth conditional distribution functions (CDFs). As the CDF is the conditional expectation of an indicator function, it may be estimated by nonparametric regression methods. A recent paper which exploits advances in nonparametric regression for estimation of the CDF is Hall, Wolff and Yao (1999). See also Fan and Yao (2003).

Standard nonparametric regression estimates of the CDF, however, are not smooth and therefore are inefficient. If the dependent variable in these regressions (the indicator function) is replaced by a smooth function, estimation variance can be reduced at the cost of an increase in bias. If the degree of smoothing is carefully chosen, it can result in a lower finite-sample mean integrated squared error (MISE). The advantages of smoothed estimation of distribution functions have been discussed in the statistics literature; for references see Hansen (2004). The extension to CDF estimation considered here is new.

Our analysis concerns a pair of random variables $(Y, X) \in \mathbb{R} \times \mathbb{R}$ that have a smooth joint distribution. Let $f(y)$ and $g(x)$ denote the marginal densities of $Y$ and $X$, and let $f(y \mid x)$ denote the conditional density of $Y$ given $X = x$. The conditional distribution function is

$$F(y \mid x) = \int_{-\infty}^{y} f(u \mid x)\, du.$$

Define as well the derivatives

$$F^{(s)}(y \mid x) = \frac{\partial^s}{\partial x^s} F(y \mid x), \qquad
f^{(s)}(y \mid x) = \frac{\partial^s}{\partial x^s} f(y \mid x), \qquad
f'(y \mid x) = \frac{\partial}{\partial y} f(y \mid x).$$

Given a random sample $\{Y_1, X_1, \ldots, Y_n, X_n\}$ from this distribution, the goal is to estimate $F(y \mid x)$ at a fixed value $x$. To simplify the notation we will sometimes suppress dependence on $x$.

Let $\Phi(s)$ and $\phi(s)$ denote the standard normal distribution and density functions, and let $\Phi_\sigma(s) = \Phi(s/\sigma)$ and $\phi_\sigma(s) = \sigma^{-1}\phi(s/\sigma)$. Let $\phi^{(m)}(s) = (d^m/ds^m)\,\phi(s)$ and $\phi_\sigma^{(m)}(s) = (d^m/ds^m)\,\phi_\sigma(s) = \sigma^{-(m+1)}\phi^{(m)}(s/\sigma)$ denote the derivatives of $\phi(s)$ and $\phi_\sigma(s)$. In particular, $\phi^{(1)}(s) = -s\phi(s)$ and $\phi^{(2)}(s) = (s^2 - 1)\phi(s)$.

Our estimators will make use of two univariate kernel functions $w(s)$ and $k(s)$. We assume that both are symmetric and second-order, are normalized to have a unit variance (e.g. $\int s^2 w(s)\, ds = 1$), and have a finite sixth moment (e.g. $\int s^6 w(s)\, ds < \infty$). Let $R = \int w(s)^2\, ds$ denote the roughness of $w$.

Let

$$K(s) = \int_{-\infty}^{s} k(u)\, du$$

denote the integrated kernel of $k$ and define the constant

$$\psi = 2 \int s\, k(s)\, K(s)\, ds > 0. \tag{1}$$

For our applications we set both $k(s)$ and $w(s)$ equal to the normal kernel $\phi(s)$, in which case $R = 1/(2\sqrt{\pi})$, $\psi = 1/\sqrt{\pi}$ and $K(s) = \Phi(s)$.

Section 2 presents unsmoothed estimators. Section 3 introduces our smoothed estimators. Section 4 develops plug-in bandwidth rules for the local linear and smoothed local linear estimators. Section 5 presents a numerical simulation of performance. Section 6 is a simple application to the U.S. quarterly real GDP growth rate. Section 7 concludes. The proof of Theorem 1 is presented in the Appendix. Gauss code which implements the estimators and bandwidth methods is available on the author's webpage http://www.ssc.wisc.edu/~hansen/

2 Unsmoothed Estimators

Note that $F(y \mid x) = E(1(Y_i \le y) \mid X_i = x)$, and thus it is a regression function. Thus the CDF may be estimated by regression of $1(Y_i \le y)$ on $X_i$. A simple nonparametric estimator is the Nadaraya-Watson (NW) estimator, which takes the form of a local average

$$\hat{F}(y \mid x) = \frac{\sum_{i=1}^{n} w_i\, 1(Y_i \le y)}{\sum_{i=1}^{n} w_i} \tag{2}$$

where

$$w_i = \frac{1}{b}\, w\!\left(\frac{x - X_i}{b}\right) \tag{3}$$

are kernel weights and $b$ is a bandwidth.

Reduced bias may be achieved using a local linear (LL) estimator. Local linear estimation is a special case of local polynomial regression, and is typically recommended for estimation of a conditional mean. For a general treatment see Fan and Gijbels (1996). The estimator is the intercept from a weighted least-squares regression of $1(Y_i \le y)$ on $(x - X_i)$ using the weights $w_i$, and can be written as

$$\tilde{F}(y \mid x) = \frac{\sum_{i=1}^{n} w_i^*\, 1(Y_i \le y)}{\sum_{i=1}^{n} w_i^*} \tag{4}$$

where

$$w_i^* = w_i\left(1 - \hat\beta (x - X_i)\right), \qquad
\hat\beta = \left(\sum_{i=1}^{n} w_i (x - X_i)^2\right)^{-1} \left(\sum_{i=1}^{n} w_i (x - X_i)\right). \tag{5}$$

As shown in Theorem 1(b) of Hall, Wolff and Yao (1999), if $b = c n^{-1/5}$ then

$$E\left(\tilde{F}(y \mid x) - F(y \mid x)\right)^2 \simeq \frac{R\, F(y \mid x)\left(1 - F(y \mid x)\right)}{g(x)\, n b} + \frac{b^4}{4} F^{(2)}(y \mid x)^2 + O\!\left(n^{-6/5}\right).$$

Integrating over $y$ we obtain the MISE

$$\int E\left(\tilde{F}(y \mid x) - F(y \mid x)\right)^2 dy \simeq \frac{R V_0}{g n b} + \frac{b^4}{4} V_1 + O\!\left(n^{-6/5}\right) \tag{6}$$

where $g = g(x)$,

$$V_0 = \int F(y \mid x)\left(1 - F(y \mid x)\right) dy, \qquad
V_1 = \int F^{(2)}(y \mid x)^2\, dy.$$

The first term on the right side of (6) is the integrated asymptotic variance, and the second is the integrated squared asymptotic bias. This bias term is simplified relative to the NW estimator, and this is the source of the improved performance of $\tilde{F}$ over $\hat{F}$.

Hall, Wolff and Yao (1999) point out that the local linear estimator $\tilde{F}$ has two undesirable properties. It may be nonmonotonic in $y$, and it is not constrained to lie in $[0, 1]$. Thus the estimator $\tilde{F}$ is not a CDF. The problem occurs when the weights (5) are negative, which occurs when $\hat\beta(x - X_i) > 1$. A simple solution[1] is to modify $w_i^*$ to exclude negative values. Thus we redefine the weights as follows:

$$w_i^* = \begin{cases} 0 & \hat\beta(x - X_i) > 1 \\ w_i\left(1 - \hat\beta(x - X_i)\right) & \hat\beta(x - X_i) \le 1. \end{cases} \tag{7}$$

This modification is asymptotically negligible, but constrains $\tilde{F}$ to be a valid CDF. Using the modified weights (7) our estimator is a slight modification of the local linear estimator; however, we will continue to refer to it as the local linear (LL) estimator for simplicity.

[1] Hall, Wolff and Yao (1999) proposed an alternative solution, introducing weights to the Nadaraya-Watson estimator which are selected by an empirical likelihood criterion. Their estimator is asymptotically equivalent to ours, but computationally more complicated.
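For concreteness, here is a minimal NumPy sketch of the NW estimator (2)-(3) and the LL estimator (4) with the modified weights (7). The paper's own implementation is in Gauss (see the author's webpage); this Python translation and its function names are illustrative, not the author's code.

```python
import numpy as np
from scipy.stats import norm

def nw_weights(X, x, b):
    """Kernel weights w_i = (1/b) w((x - X_i)/b) of (3), with w = standard normal."""
    return norm.pdf((x - X) / b) / b

def ll_weights(X, x, b):
    """Modified local linear weights w_i* of (5) and (7)."""
    w = nw_weights(X, x, b)
    d = x - X
    beta = np.sum(w * d) / np.sum(w * d ** 2)      # beta-hat in (5)
    wstar = w * (1.0 - beta * d)
    return np.where(beta * d > 1.0, 0.0, wstar)    # truncation (7): drop negatives

def cdf_nw(Y, X, x, y, b):
    """Nadaraya-Watson CDF estimate F-hat(y|x) of (2)."""
    w = nw_weights(X, x, b)
    return np.sum(w * (Y <= y)) / np.sum(w)

def cdf_ll(Y, X, x, y, b):
    """Local linear CDF estimate F-tilde(y|x) of (4) with nonnegative weights (7)."""
    w = ll_weights(X, x, b)
    return np.sum(w * (Y <= y)) / np.sum(w)
```

The truncation in `ll_weights` is what guarantees the estimate is a valid CDF: with all weights nonnegative, the weighted average of indicators is monotone in $y$ and lies in $[0, 1]$.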

3 Smoothed Estimators

For some bandwidth $h > 0$ let $K_h(s) = K(s/h)$ and consider the integrated kernel smooth $K_h(y - Y_i)$. For small $h$ it can be viewed as a smooth approximation to the indicator function $1(Y_i \le y)$. Smoothed CDF estimators replace the indicator function $1(Y_i \le y)$ in (2) and (4) with $K_h(y - Y_i)$, yielding the smoothed Nadaraya-Watson (SNW) estimator

$$\hat{F}_{h,b}(y \mid x) = \frac{\sum_{i=1}^{n} w_i\, K_h(y - Y_i)}{\sum_{i=1}^{n} w_i}$$

and the smoothed local linear (SLL) estimator

$$\tilde{F}_{h,b}(y \mid x) = \frac{\sum_{i=1}^{n} w_i^*\, K_h(y - Y_i)}{\sum_{i=1}^{n} w_i^*}.$$

The weights $w_i$ and $w_i^*$ are defined in (3) and (7), respectively.

Theorem 1. Assume that $F(y \mid x)$ and $g(x)$ are continuously differentiable up to the fourth order in both $y$ and $x$. If $b = c n^{-1/5}$ and $h = O(b)$ as $n \to \infty$, then

$$\int E\left(\tilde{F}_{h,b}(y \mid x) - F(y \mid x)\right)^2 dy = \frac{R (V_0 - h\psi)}{g n b} + \frac{b^4}{4} V_1 + \frac{h^2 b^2}{2} V_2 + \frac{h^4}{4} V_3 + O\!\left(n^{-6/5}\right) \tag{8}$$

where $g = g(x)$ and

$$V_0 = \int F(y \mid x)\left(1 - F(y \mid x)\right) dy, \qquad V_1 = \int F^{(2)}(y \mid x)^2\, dy,$$

$$V_2 = -\int f(y \mid x)\, f^{(2)}(y \mid x)\, dy, \qquad V_3 = \int f'(y \mid x)^2\, dy.$$

The MISE in (8) is a generalization of (6) that allows for $h > 0$. The MISE is strictly decreasing at $h = 0$ (since $\psi > 0$), demonstrating that the asymptotic MISE is minimized at a strictly positive $h$. Thus there are (at least asymptotic) efficiency gains from smoothing.
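The smoothed estimators are a one-line change from the unsmoothed ones. A minimal sketch with the normal integrated kernel $K_h(s) = \Phi(s/h)$, reusing `nw_weights` and `ll_weights` from the earlier sketch (again illustrative, not the author's Gauss code):

```python
import numpy as np
from scipy.stats import norm

def cdf_snw(Y, X, x, y, h, b):
    """Smoothed NW estimate: indicator 1(Y_i <= y) replaced by Phi((y - Y_i)/h)."""
    w = nw_weights(X, x, b)
    return np.sum(w * norm.cdf((y - Y) / h)) / np.sum(w)

def cdf_sll(Y, X, x, y, h, b):
    """Smoothed local linear estimate F-tilde_{h,b}(y|x) of Section 3."""
    w = ll_weights(X, x, b)
    return np.sum(w * norm.cdf((y - Y) / h)) / np.sum(w)
```

As $h \to 0$, $\Phi((y - Y_i)/h) \to 1(Y_i \le y)$ pointwise (away from ties), so these reduce to the estimators of Section 2.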

4 Plug-In Bandwidth Selection

We suggest selecting the bandwidths $(h, b)$ by the plug-in method. Plug-in bandwidths are the values which minimize an estimate of the asymptotic MISE (8). Estimation of the latter requires estimation of the parameters $g$, $V_0$, $V_1$, $V_2$, and $V_3$. In this section we discuss estimation of these components and implementation of the plug-in bandwidth rule. Readers who are interested in the plug-in algorithm but not the motivational details can jump directly to Section 4.5, which summarizes the implementation details.

4.1 Kernel Estimation of g

The normal kernel density estimator is

$$\hat{g} = \frac{1}{n} \sum_{j=1}^{n} \phi_r(x - X_j) \tag{9}$$

where $r$ is a bandwidth. An estimate of the asymptotic MSE-minimizing choice for $r$ is

$$\hat{r} = \left(\frac{\hat{g}_s(x)}{2\sqrt{\pi}\left(\hat{g}_t^{(2)}(x)\right)^2 n}\right)^{1/5} \tag{10}$$

where

$$\hat{g}_s(x) = \frac{1}{n} \sum_{j=1}^{n} \phi_s(x - X_j), \qquad
\hat{g}_t^{(2)}(x) = \frac{1}{n} \sum_{j=1}^{n} \phi_t^{(2)}(x - X_j),$$

and $s$ and $t$ are preliminary bandwidths. Letting $\hat\sigma_x$ denote the sample standard deviation of $X_i$, we suggest the Gaussian reference rules $s = 1.06\, \hat\sigma_x n^{-1/5}$ and $t = 0.94\, \hat\sigma_x n^{-1/9}$, which are MISE-optimal for estimation of the density and its second derivative when $g$ is Gaussian.
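A sketch of the calculation (9)-(10) under the reference rules just stated; the helper name `g_hat` is illustrative. Note $\phi_t^{(2)}(u) = t^{-3}\phi^{(2)}(u/t)$ from the scaling rule in the Introduction.

```python
import numpy as np
from scipy.stats import norm

def g_hat(X, x):
    """Density estimate (9) at x using the plug-in bandwidth r-hat of (10)."""
    n = len(X)
    sx = X.std(ddof=1)
    s = 1.06 * sx * n ** (-1 / 5)       # reference rule for the density
    t = 0.94 * sx * n ** (-1 / 9)       # reference rule for its 2nd derivative
    gs = np.mean(norm.pdf((x - X) / s)) / s                # g-hat_s(x)
    u = (x - X) / t
    g2t = np.mean((u ** 2 - 1.0) * norm.pdf(u)) / t ** 3   # g-hat_t^(2)(x)
    r = (gs / (2 * np.sqrt(np.pi) * g2t ** 2 * n)) ** (1 / 5)
    return np.mean(norm.pdf((x - X) / r)) / r
```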

4.2 Estimation of V_0

The parameter $V_0$ is a function of the conditional distribution $F(y \mid x)$. A parametric estimator of the latter is the intercept from a $p$th-order polynomial regression of $1(Y_i \le y)$ on $(X_i - x)$. For $p \ge 0$ let

$$X_p = \begin{pmatrix} 1 & X_1 - x & \cdots & (X_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & X_n - x & \cdots & (X_n - x)^p \end{pmatrix} \tag{11}$$

and let $1(Y \le y)$ denote the stacked $n \times 1$ vector with elements $1(Y_i \le y)$. Then the estimator can be written as

$$\hat{F}(y \mid x) = \delta_1' \left(X_p' X_p\right)^{-1} X_p'\, 1(Y \le y) \tag{12}$$

where $\delta_1 = (1, 0, \ldots, 0)'$ is the first unit vector.

Given (12), an estimate of $V_0$ is

$$\hat{V}_0 = \int \hat{F}(y \mid x)\left(1 - \hat{F}(y \mid x)\right) dy
= \int \delta_1' \left(X_p' X_p\right)^{-1} X_p'\, 1(Y > y)\, 1(Y \le y)'\, X_p \left(X_p' X_p\right)^{-1} \delta_1\, dy
= \delta_1' \left(X_p' X_p\right)^{-1} X_p' \left(Y - Y'\right)_+ X_p \left(X_p' X_p\right)^{-1} \delta_1 \tag{13}$$

where $(a)_+ = a\, 1(a \ge 0)$ for scalar $a$, and $(Y - Y')_+$ is the $n \times n$ matrix with $(i,j)$ element $(Y_i - Y_j)_+$.

A nonparametric estimator of $V_0$ can be based on a local linear regression. Given a bandwidth $q_0$, let $w_{0i} = \phi_{q_0}(x - X_i)$ denote Gaussian kernel weights and set $W_0 = \mathrm{diag}\{w_{0i}\}$. Then the estimator is

$$\tilde{F}(y \mid x) = \delta_1' \left(X_1' W_0 X_1\right)^{-1} X_1' W_0\, 1(Y \le y)$$

and the estimate of $V_0$ is

$$\tilde{V}_0 = \delta_1' \left(X_1' W_0 X_1\right)^{-1} X_1' W_0 \left(Y - Y'\right)_+ W_0 X_1 \left(X_1' W_0 X_1\right)^{-1} \delta_1. \tag{14}$$

The bandwidth $q_0$ which minimizes the asymptotic mean integrated squared error of $\tilde{F}(y \mid x)$ is

$$q_0 = \left(\frac{V_0}{2\sqrt{\pi}\, g V_1 n}\right)^{1/5}$$

which depends on the unknowns $g$, $V_0$ and $V_1$. (See equation (8) with $h = 0$.) Using the estimates $\hat{g}$ from (9), $\hat{V}_0$ from (13), and $\hat{V}_1$ from (17) below, we obtain the plug-in bandwidth

$$\hat{q}_0 = 0.78 \left(\frac{\hat{V}_0}{\hat{g} \hat{V}_1 n}\right)^{1/5}. \tag{15}$$
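The matrix formulas (13)-(15) translate directly into code. A sketch with illustrative names (`design`, `V0_hat`, `V0_tilde`); the scale of the Gaussian weights cancels in (14), so it is harmless here.

```python
import numpy as np
from scipy.stats import norm

def design(X, x, p):
    """n x (p+1) design matrix X_p of (11): columns (X_i - x)^s, s = 0..p."""
    return np.vander(X - x, p + 1, increasing=True)

def V0_hat(Y, X, x, p=4):
    """Polynomial-regression estimate (13)."""
    Xp = design(X, x, p)
    a1 = np.linalg.solve(Xp.T @ Xp, Xp.T)[0]        # delta_1'(Xp'Xp)^{-1}Xp'
    P = np.maximum(Y[:, None] - Y[None, :], 0.0)    # matrix of (Y_i - Y_j)_+
    return a1 @ P @ a1

def V0_tilde(Y, X, x, g, V0, V1):
    """Local linear estimate (14) with the plug-in bandwidth q0-hat of (15)."""
    n = len(Y)
    q0 = 0.78 * (V0 / (g * V1 * n)) ** (1 / 5)
    w = norm.pdf((x - X) / q0) / q0                 # weights phi_{q0}(x - X_i)
    X1 = design(X, x, 1)
    a1 = np.linalg.solve(X1.T @ (w[:, None] * X1), w * X1.T)[0]
    P = np.maximum(Y[:, None] - Y[None, :], 0.0)
    return a1 @ P @ a1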

4.3 Estimation of V_1

The parameter $V_1$ is a function of the second derivative $F^{(2)}(y \mid x)$, which can be estimated by polynomial regression. For some $p \ge 2$ let $X_p$ be defined as in (11). The estimator is

$$\hat{F}^{(2)}(y \mid x) = \delta_3' \left(X_p' X_p\right)^{-1} X_p'\, 1(Y \le y) \tag{16}$$

where $\delta_3$ is the third unit vector (a one in the third place, and zeros elsewhere).

To calculate $\hat{V}_1$ from (16), observe that for any $c > \max_i Y_i$,

$$\int_{-\infty}^{c} 1(Y \le y)\, 1(Y \le y)'\, dy = D c - \max(Y, Y')$$

where $D$ is an $n \times n$ matrix of ones and $\max(Y, Y')$ is the matrix with $(i,j)$ element $\max(Y_i, Y_j)$, and observe that $\delta_3' (X_p' X_p)^{-1} X_p' D = 0$. Thus

$$\int \hat{F}^{(2)}(y \mid x)^2\, dy = \int_{-\infty}^{c} \delta_3' \left(X_p' X_p\right)^{-1} X_p'\, 1(Y \le y)\, 1(Y \le y)'\, X_p \left(X_p' X_p\right)^{-1} \delta_3\, dy$$
$$= \delta_3' \left(X_p' X_p\right)^{-1} X_p' \left(D c - \max(Y, Y')\right) X_p \left(X_p' X_p\right)^{-1} \delta_3
= -\delta_3' \left(X_p' X_p\right)^{-1} X_p' \max(Y, Y')\, X_p \left(X_p' X_p\right)^{-1} \delta_3$$

where the max operator is element-by-element. Note that this expression is invariant to $c > \max_i Y_i$; thus we define

$$\hat{V}_1 = -\delta_3' \left(X_p' X_p\right)^{-1} X_p' \max(Y, Y')\, X_p \left(X_p' X_p\right)^{-1} \delta_3. \tag{17}$$

This estimator was also used in (15) to construct a plug-in rule for the nonparametric estimate $\tilde{V}_0$.

A nonparametric estimator of $V_1$ can be based on a local cubic regression. (While any local polynomial regression of order two or above is consistent, local cubic regression is recommended for estimation of a regression second derivative.) Given a bandwidth $q_1$, let $w_{1i} = \phi_{q_1}(x - X_i)$ denote Gaussian kernel weights and set $W_1 = \mathrm{diag}\{w_{1i}\}$. The estimator of $F^{(2)}(y \mid x)$ is

$$\tilde{F}^{(2)}(y \mid x) = \delta_3' \left(X_3' W_1 X_3\right)^{-1} X_3' W_1\, 1(Y \le y) \tag{18}$$

and the estimate of $V_1$ is

$$\tilde{V}_1 = -\delta_3' \left(X_3' W_1 X_3\right)^{-1} X_3' W_1 \max(Y, Y')\, W_1 X_3 \left(X_3' W_1 X_3\right)^{-1} \delta_3. \tag{19}$$

Using Fan and Gijbels (1996, Theorem 3.1 and equation (3.20)), the bandwidth which minimizes the asymptotic mean integrated squared error of $\tilde{F}^{(2)}(y \mid x)$ is

$$q_1 = 1.10 \left(\frac{V_0}{g V_4 n}\right)^{1/9}$$

where

$$V_4 = \int F^{(4)}(y \mid x)^2\, dy.$$

The constant $V_4$ may be estimated by

$$\hat{V}_4 = -\delta_5' \left(X_p' X_p\right)^{-1} X_p' \max(Y, Y')\, X_p \left(X_p' X_p\right)^{-1} \delta_5 \tag{20}$$

where $p \ge 4$ and $\delta_5$ is the fifth unit vector, suggesting the plug-in bandwidth

$$\hat{q}_1 = 1.10 \left(\frac{\hat{V}_0}{\hat{g} \hat{V}_4 n}\right)^{1/9}. \tag{21}$$
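A sketch of (17), (19), (20) and (21), reusing `design` from the previous sketch. The unit vectors $\delta_3$ and $\delta_5$ correspond to zero-based row indices 2 and 4; all names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def V_curv(Y, X, x, p, j):
    """-delta' (Xp'Xp)^{-1} Xp' max(Y, Y') Xp (Xp'Xp)^{-1} delta, as in (17)/(20)."""
    Xp = design(X, x, p)
    aj = np.linalg.solve(Xp.T @ Xp, Xp.T)[j]
    M = np.maximum(Y[:, None], Y[None, :])          # element-wise max(Y_i, Y_j)
    return -(aj @ M @ aj)

def V1_hat(Y, X, x, p=4):
    return V_curv(Y, X, x, p, 2)                    # delta_3: third unit vector

def V4_hat(Y, X, x, p=4):
    return V_curv(Y, X, x, p, 4)                    # delta_5: fifth unit vector

def V1_tilde(Y, X, x, g, V0, V4):
    """Local cubic estimate (19) with the plug-in bandwidth q1-hat of (21)."""
    n = len(Y)
    q1 = 1.10 * (V0 / (g * V4 * n)) ** (1 / 9)
    w = norm.pdf((x - X) / q1) / q1
    X3 = design(X, x, 3)
    a3 = np.linalg.solve(X3.T @ (w[:, None] * X3), w * X3.T)[2]
    M = np.maximum(Y[:, None], Y[None, :])
    return -(a3 @ M @ a3)
```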

4.4 Estimation of V_2 and V_3

If $f(y \mid x)$ does not depend on $x$ then we have the simplification $V_3 = \int f'(y)^2\, dy$, which is a well-studied estimation problem. For a bandwidth $a$ let $\mathbf{1}$ be the $n \times 1$ vector of ones and

$$\hat{f}'_a(y) = \frac{1}{n} \sum_{j=1}^{n} \phi_a^{(1)}(y - Y_j) = \frac{1}{n}\, \phi_a^{(1)}(y - Y)'\, \mathbf{1}$$

a Gaussian kernel estimate of $f'(y)$. An estimate of $V_3$ in this case is

$$\hat{V}_3 = \int \hat{f}'_a(y)^2\, dy = \frac{1}{n^2}\, \mathbf{1}' \left(\int \phi_a^{(1)}(y - Y)\, \phi_a^{(1)}(y - Y)'\, dy\right) \mathbf{1} = -\frac{1}{n^2}\, \mathbf{1}'\, \phi_{a^*}^{(2)}(Y - Y')\, \mathbf{1} \tag{22}$$

where $a^* = \sqrt{2}\, a$. The final equality is equation (7.1) of Marron and Wand (1992):

$$\int \phi_a^{(r)}(y - Y_i)\, \phi_a^{(s)}(y - Y_j)\, dy = (-1)^r\, \phi_{\sqrt{2} a}^{(r+s)}(Y_i - Y_j). \tag{23}$$

The estimator (22) is the Jones and Sheather (1991) estimator. Hansen (2004) describes a multiple plug-in method for selection of $a$. We recommend this choice and hold it fixed for the remainder of this sub-section and the next.

Now consider estimation of $V_3$ allowing for dependence on $x$. Note that the estimate of $f'(y)$ in (22) is a regression of $\phi_a^{(1)}(y - Y_i)$ on a constant. Similarly, we can estimate $f'(y \mid x)$ as the intercept in a polynomial regression on $X_i - x$. For $X_p$ defined in (11) let

$$\hat{f}'_a(y \mid x) = \delta_1' \left(X_p' X_p\right)^{-1} X_p'\, \phi_a^{(1)}(y - Y). \tag{24}$$

The associated estimate of $V_3$ is

$$\hat{V}_3 = \int \hat{f}'_a(y \mid x)^2\, dy
= \int \delta_1' \left(X_p' X_p\right)^{-1} X_p'\, \phi_a^{(1)}(y - Y)\, \phi_a^{(1)}(y - Y)'\, X_p \left(X_p' X_p\right)^{-1} \delta_1\, dy$$
$$= -\delta_1' \left(X_p' X_p\right)^{-1} X_p'\, \phi_{a^*}^{(2)}(Y - Y')\, X_p \left(X_p' X_p\right)^{-1} \delta_1. \tag{25}$$

Alternatively, we could use a fully nonparametric estimator such as a local linear estimator. However, the optimal bandwidths depend on derivatives of the conditional density that are difficult to estimate. We thus do not pursue this further complication.

In addition, using analogs of (16) and (24) to construct $\hat{f}_a^{(2)}(y \mid x) = \delta_3'(X_p' X_p)^{-1} X_p'\, \phi_a(y - Y)$ and $\hat{f}_a(y \mid x) = \delta_1'(X_p' X_p)^{-1} X_p'\, \phi_a(y - Y)$, a simple estimator of $V_2$ is

$$\hat{V}_2 = -\int \hat{f}_a^{(2)}(y \mid x)\, \hat{f}_a(y \mid x)\, dy
= -\int \delta_3' \left(X_p' X_p\right)^{-1} X_p'\, \phi_a(y - Y)\, \phi_a(y - Y)'\, X_p \left(X_p' X_p\right)^{-1} \delta_1\, dy$$
$$= -\delta_3' \left(X_p' X_p\right)^{-1} X_p'\, \phi_{a^*}(Y - Y')\, X_p \left(X_p' X_p\right)^{-1} \delta_1 \tag{26}$$

where the final equality uses (23).

4.5 Plug-In Bandwidths

Our plug-in estimate of (8) is

$$M(h, b) = \frac{R (\tilde{V}_0 - h\psi)}{\hat{g}\, n b} + \frac{b^4}{4} \tilde{V}_1 + \frac{h^2 b^2}{2} \hat{V}_2 + \frac{h^4}{4} \hat{V}_3. \tag{27}$$

The plug-in bandwidth pair $(\tilde{h}, \tilde{b})$ jointly minimizes $M(h, b)$. A closed-form solution is not available, so the function needs to be numerically minimized. One caveat is that we must constrain $h < \tilde{V}_0/\psi$ to ensure a sensible solution. (When $h > \tilde{V}_0/\psi$, $M$ is strictly increasing in $b$, so unconstrained minimization sets $b = 0$, which is not sensible.) As a practical matter it makes sense to bound $h$ more substantially; we suggest the constraint $h \le \tilde{V}_0/(2\psi)$, and this is done in our numerical applications.

The plug-in bandwidth for the unsmoothed estimator $\tilde{F}$ is found by setting $h = 0$ and minimizing (27), to find

$$\hat{b} = \left(\frac{R \tilde{V}_0}{\hat{g} \tilde{V}_1 n}\right)^{1/5}.$$
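A sketch of (25)-(27). The bandwidth `a` is taken as given (its selection follows Hansen (2004), which is not reproduced here), `design` comes from the earlier sketch, and SciPy's generic bounded quasi-Newton minimizer stands in for whatever routine the author used; the constraint $h \le \tilde{V}_0/(2\psi)$ is imposed through a bound.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def V23_hat(Y, X, x, a, p=4):
    """V2-hat (26) and V3-hat (25) via the Marron-Wand identity (23)."""
    astar = np.sqrt(2.0) * a
    D = (Y[:, None] - Y[None, :]) / astar
    phi0 = norm.pdf(D) / astar                        # phi_{a*}(Y_i - Y_j)
    phi2 = (D ** 2 - 1.0) * norm.pdf(D) / astar ** 3  # phi_{a*}^(2)(Y_i - Y_j)
    Xp = design(X, x, p)
    A = np.linalg.solve(Xp.T @ Xp, Xp.T)              # (Xp'Xp)^{-1}Xp'
    V2 = -(A[2] @ phi0 @ A[0])                        # (26)
    V3 = -(A[0] @ phi2 @ A[0])                        # (25)
    return V2, V3

def plugin_bandwidths(g, V0t, V1t, V2, V3, n):
    """Numerically minimize M(h, b) of (27) subject to h <= V0-tilde/(2 psi)."""
    R, psi = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0 / np.sqrt(np.pi)
    def M(par):
        h, b = par
        return (R * (V0t - h * psi) / (g * n * b) + b ** 4 * V1t / 4
                + h ** 2 * b ** 2 * V2 / 2 + h ** 4 * V3 / 4)
    b0 = (R * V0t / (g * V1t * n)) ** (1 / 5)         # unsmoothed solution b-hat
    res = minimize(M, x0=[b0 / 2.0, b0],
                   bounds=[(1e-8, V0t / (2.0 * psi)), (1e-8, None)])
    return res.x                                       # (h-tilde, b-tilde)
```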

To summarize, the computation algorithm is as follows. First, $\hat{g}$ is calculated from (9) using (10). Second, for some $p \ge 4$, $\hat{V}_0$, $\hat{V}_1$ and $\hat{V}_4$ are calculated using (13), (17) and (20). Third, using the bandwidth (15), $\tilde{V}_0$ is calculated using (14). Fourth, using the bandwidth (21), $\tilde{V}_1$ is calculated using (19). Fifth, the bandwidth $a$ is calculated using the method of Hansen (2004), allowing $\hat{V}_2$ and $\hat{V}_3$ to be calculated using (26) and (25). Finally, $\hat{g}$, $\tilde{V}_0$, $\tilde{V}_1$, $\hat{V}_2$ and $\hat{V}_3$ are substituted into (27), which is numerically minimized over $(h, b)$ to obtain $(\tilde{h}, \tilde{b})$.

5 Simulation

The performance of the CDF estimators is investigated in a simple simulation experiment using bivariate data. The variables $X_i$ are generated as iid N(0, 1). The variable $Y_i$ is generated as $Y_i = X_i + e_i$ where $e_i$ is iid, independent of $X_i$. The error $e_i$ is distributed as $G$, one of the first nine mixture-normal distributions from Marron and Wand (1992). Consequently, the true CDF for $Y_i$ is $G(y - x)$. For each choice of $G$ we generated 1,000 independent samples of size $n = 50$, 100 and 300. For each sample we estimated the CDF at $x = 1$ and $y \in \{-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0\}$, using the NW, SNW, LL and SLL estimators. Our measure of precision is the exact mean-squared error, averaged over these values of $y$ (as an approximation to the MISE). The standard normal kernel was used for all estimates.

We first abstracted from the issue of bandwidth selection and evaluated the precision of the four estimators using the infeasible finite-sample-optimal bandwidths, calculated by quasi-Newton minimization. The results are presented in the first three columns of Table 1. We have taken the unsmoothed NW estimator as a baseline, and have divided the MISE of the other estimators by that of the NW estimator; thus the numbers in Table 1 represent the improvement attained relative to the NW estimator. The results show that across models and sample sizes, the SLL estimator achieves the lowest MISE. In each case, using local linear estimation rather than local averaging reduces the MISE, and using smoothing lowers the MISE further.

We next investigated the performance of our plug-in bandwidths $\hat{b}$ for the LL estimator and $(\tilde{h}, \tilde{b})$ for the SLL estimator. On each sample described above we calculated these plug-in bandwidths and CDF estimators, and evaluated the MISE across the 1,000 samples. The results are presented in the final two columns of Table 1. As before, the MISE is normalized by the MISE of the infeasible NW estimator; thus the numbers represent the improvement attained by the feasible plug-in estimators relative to this infeasible estimator. In each case, the plug-in SLL estimator has lower MISE than the plug-in LL estimator. In some cases the difference is minor; in other cases it is quite substantial (the largest gains are for Model #5, where the MISE is reduced by about 25%). In each case the MISE is higher than if the infeasible optimal bandwidth had been used, in some cases substantially higher, suggesting that further improvements in bandwidth choice may be possible.
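The design is straightforward to replicate. A sketch for the Gaussian error case (Marron-Wand model #1), reusing `cdf_sll` from the Section 3 sketch; the bandwidths below are placeholders for the demo, not the finite-sample-optimal or plug-in values used in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 100, 1000
x, ys = 1.0, np.linspace(-2.0, 2.0, 9)      # evaluation points from Section 5

mse = np.zeros_like(ys)
for _ in range(reps):
    X = rng.standard_normal(n)
    e = rng.standard_normal(n)              # model #1: Gaussian error
    Y = X + e
    h, b = 0.3, n ** (-1 / 5)               # placeholder bandwidths
    est = np.array([cdf_sll(Y, X, x, y, h, b) for y in ys])
    mse += (est - norm.cdf(ys - x)) ** 2 / reps  # true CDF is G(y - x)
print(mse.mean())                           # crude approximation to the MISE
```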

Overall, the simulation evidence confirms the implications of the asymptotic analysis. We can recommend use of the SLL estimator (with our proposed plug-in bandwidths) for empirical practice.

6 GDP Growth Forecasting

We illustrate the method with a simple application to the U.S. real GDP growth rate. Let $GDP_i$ denote the level of quarterly real GDP and let $y_i = 400\left(\ln(GDP_i) - \ln(GDP_{i-1})\right)$ denote the annualized quarterly growth rate. Set $x_i = y_{i-1}$. We restrict the sample to run from the third quarter of 1984 to the first quarter of 2004 (79 observations). This choice is consistent with the decline in output volatility documented by McConnell and Perez-Quiros (2000).

We estimated the CDF of $y_i$ given $x_i = x$ using the local linear (LL) and smoothed local linear (SLL) estimators, fixing $x$ to equal 2.0% and 4.5%, which are approximately the 25th and 75th quantiles of the unconditional distribution. The CDF was evaluated at 51 evenly spaced increments between -2 and 7, and the bandwidths were calculated by the plug-in method described in Section 4.5. To provide a contrast, we also estimated the CDF using a homoskedastic Gaussian AR(1).

Figure 1 displays the three CDFs for the case $x = 2.0$ and Figure 2 for the case $x = 4.5$. In both cases, the three estimates are meaningfully different from one another. The parametric estimate has a fairly different location and variance. The two nonparametric estimates have the same basic shape, but the unsmoothed local linear estimate is visually quite erratic, while the smoothed local linear estimator appears to apply an intuitively correct degree of smoothing.

It is also interesting to note that the LL estimator uses a plug-in bandwidth of $\hat{b} = 1.59$ in Figure 1 and $\hat{b} = 2.13$ in Figure 2, while the SLL estimator uses $\tilde{b} = 1.44$, $\tilde{h} = 0.53$ and $\tilde{b} = 2.00$, $\tilde{h} = 0.23$, respectively. In both cases, the plug-in rule selects similar values of $b$ for the two estimators (with that for SLL slightly smaller), and selects a considerably smaller value for $h$ than for $b$.
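A sketch of the data construction and the SLL evaluation behind Figure 1, assuming a hypothetical file `gdp_q.txt` holding the quarterly real GDP series, reusing `cdf_sll` from the Section 3 sketch, and plugging in the bandwidths reported above for Figure 1 rather than recomputing them:

```python
import numpy as np

gdp = np.loadtxt("gdp_q.txt")               # hypothetical quarterly real GDP series
y = 400 * np.diff(np.log(gdp))              # annualized quarterly growth rates
Y, X = y[1:], y[:-1]                        # condition y_i on x_i = y_{i-1}
grid = np.linspace(-2.0, 7.0, 51)           # evaluation grid from Section 6
F_sll = [cdf_sll(Y, X, 2.0, yv, h=0.53, b=1.44) for yv in grid]  # Figure 1 bandwidths
```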

7 Conclusion

We have shown how to combine local linear methods and smoothing to produce good nonparametric estimates of the conditional distribution function for bivariate data. We have shown that meaningful improvements in estimation efficiency can be achieved by using these techniques. Furthermore, we have derived data-based plug-in rules for bandwidth selection which appear to work well in practice. These tools should prove useful for empirical application.

This analysis has been confined to bivariate data. An important extension would be to the case of multivariate data. In particular, this extension would require the study and development of feasible bandwidth selection rules.

8 Appendix: Proof of Theorem 1

The proof of Theorem 1 is based on the following set of expansions.

Lemma 1. Let $g^{(s)} = (d^s/dx^s)\, g(x)$ and $K_h^*(Y_i) = K_h(y - Y_i) - F(y \mid x)$. Then

$$E w_i = g(x) + \frac{b^2}{2} g^{(2)}(x) + O(b^4) \tag{28}$$

$$E\left(w_i (x - X_i)\right) = -b^2 g^{(1)}(x) + O(b^4) \tag{29}$$

$$E\left(w_i (x - X_i)^2\right) = b^2 g(x) + O(b^4) \tag{30}$$

$$E\left(w_i K_h^*(Y_i)\right) = \frac{b^2}{2} F^{(2)}(y \mid x)\, g(x) + b^2 F^{(1)}(y \mid x)\, g^{(1)}(x) + \frac{h^2}{2} f'(y \mid x)\, g(x) + O(b^4) \tag{31}$$

$$E\left(w_i (x - X_i) K_h^*(Y_i)\right) = -b^2 F^{(1)}(y \mid x)\, g(x) + O(b^4) \tag{32}$$

$$E\left(w_i^2 K_h^*(Y_i)^2\right) = \frac{R g(x)}{b}\left(F(y \mid x)(1 - F(y \mid x)) - h\psi f(y \mid x)\right) + O(1) \tag{33}$$

$$E\left(w_i^2 (x - X_i)^2 K_h^*(Y_i)^2\right) = O(b). \tag{34}$$

Proof. Throughout, we make use of the assumption that $h = O(b)$ to simplify the bounds. The derivations make use of the following Taylor series expansions, which are valid under the assumption that the functions are fourth-order differentiable:

$$g(x - bv) = g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x) - \frac{b^3 v^3}{6} g^{(3)}(x) + O(b^4) \tag{35}$$

$$F(y - hu \mid v) = F(y \mid v) - hu\, f(y \mid v) + \frac{h^2 u^2}{2} f'(y \mid v) - \frac{h^3 u^3}{6} f''(y \mid v) + O(h^4) \tag{36}$$

$$F(y \mid x - bv) = F(y \mid x) - bv\, F^{(1)}(y \mid x) + \frac{b^2 v^2}{2} F^{(2)}(y \mid x) - \frac{b^3 v^3}{6} F^{(3)}(y \mid x) + O(b^4) \tag{37}$$

$$f'(y \mid x - bv) = f'(y \mid x) - bv\, \frac{\partial}{\partial y} f^{(1)}(y \mid x) + O(b^2). \tag{38}$$

To show (28), by a change of variables and (35),

$$E w_i = \int \frac{1}{b} w\!\left(\frac{x - v}{b}\right) g(v)\, dv = \int w(v)\, g(x - bv)\, dv$$
$$= \int w(v)\left(g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x) - \frac{b^3 v^3}{6} g^{(3)}(x)\right) dv + O(b^4)
= g(x) + \frac{b^2}{2} g^{(2)}(x) + O(b^4).$$

Note that this makes use of the facts that $\int w(v)\, dv = 1$, $\int w(v)\, v\, dv = 0$, $\int w(v)\, v^2\, dv = 1$ and $\int w(v)\, v^3\, dv = 0$. Similarly,

$$E\left(w_i (x - X_i)\right) = \int (x - v)\, \frac{1}{b} w\!\left(\frac{x - v}{b}\right) g(v)\, dv = b \int v\, w(v)\, g(x - bv)\, dv$$
$$= b \int v\, w(v)\left(g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x)\right) dv + O(b^4)
= -b^2 g^{(1)}(x) + O(b^4)$$

yielding (29), and

$$E\left(w_i (x - X_i)^2\right) = \int (x - v)^2\, \frac{1}{b} w\!\left(\frac{x - v}{b}\right) g(v)\, dv = b^2 \int v^2 w(v)\, g(x - bv)\, dv = b^2 g(x) + O(b^4)$$

yielding (30).

To show (31) and (32), first observe that by integration by parts, a change of variables and (36),

$$\int K_h(y - u)\, f(u \mid v)\, du = \int k_h(y - u)\, F(u \mid v)\, du = \int k(u)\, F(y - hu \mid v)\, du$$
$$= \int k(u)\left(F(y \mid v) - hu\, f(y \mid v) + \frac{h^2 u^2}{2} f'(y \mid v) - \frac{h^3 u^3}{6} f''(y \mid v)\right) du + O(h^4)$$
$$= F(y \mid v) + \frac{h^2}{2} f'(y \mid v) + O(h^4). \tag{39}$$

By a change of variables, and using (35) and (37),

$$\int F(y \mid v)\, \frac{1}{b} w\!\left(\frac{x - v}{b}\right) g(v)\, dv = \int F(y \mid x - bv)\, g(x - bv)\, w(v)\, dv$$
$$= \int \left(F(y \mid x) - bv\, F^{(1)}(y \mid x) + \frac{b^2 v^2}{2} F^{(2)}(y \mid x) - \frac{b^3 v^3}{6} F^{(3)}(y \mid x)\right)
\left(g(x) - bv\, g^{(1)}(x) + \frac{b^2 v^2}{2} g^{(2)}(x) - \frac{b^3 v^3}{6} g^{(3)}(x)\right) w(v)\, dv + O(b^4)$$
$$= F(y \mid x)\, g(x) + \frac{b^2}{2} F(y \mid x)\, g^{(2)}(x) + \frac{b^2}{2} F^{(2)}(y \mid x)\, g(x) + b^2 F^{(1)}(y \mid x)\, g^{(1)}(x) + O(b^4). \tag{40}$$

Similarly, by a change of variables and (35) and (38),

$$\int f'(y \mid v)\, \frac{1}{b} w\!\left(\frac{x - v}{b}\right) g(v)\, dv = \int f'(y \mid x - bv)\, g(x - bv)\, w(v)\, dv = f'(y \mid x)\, g(x) + O(b^2). \tag{41}$$

Together, (28), (39), (40) and (41) show that

$$E\left(w_i K_h^*(Y_i)\right) = E\left(w_i K_h(y - Y_i)\right) - E(w_i)\, F(y \mid x)$$
$$= \int\!\!\int \frac{1}{b} w\!\left(\frac{x - v}{b}\right) K_h(y - u)\, f(u \mid v)\, g(v)\, du\, dv - \left(g(x) + \frac{b^2}{2} g^{(2)}(x)\right) F(y \mid x) + O(b^4)$$
$$= \int \left(F(y \mid v) + \frac{h^2}{2} f'(y \mid v)\right) \frac{1}{b} w\!\left(\frac{x - v}{b}\right) g(v)\, dv - \left(g(x) + \frac{b^2}{2} g^{(2)}(x)\right) F(y \mid x) + O(b^4)$$
$$= \frac{h^2}{2} f'(y \mid x)\, g(x) + \frac{b^2}{2} F^{(2)}(y \mid x)\, g(x) + b^2 F^{(1)}(y \mid x)\, g^{(1)}(x) + O(b^4)$$

which is (31).

Similarly, using (29), (39) and similar derivations,

$$E\left(w_i (x - X_i) K_h^*(Y_i)\right) = E\left(w_i (x - X_i) K_h(y - Y_i)\right) - E\left(w_i (x - X_i)\right) F(y \mid x)$$
$$= \int\!\!\int (x - v)\, \frac{1}{b} w\!\left(\frac{x - v}{b}\right) K_h(y - u)\, f(u \mid v)\, g(v)\, dv\, du + b^2 g^{(1)}(x)\, F(y \mid x) + O(b^4)$$
$$= b \int v\, F(y \mid x - bv)\, g(x - bv)\, w(v)\, dv + \frac{h^2 b}{2} \int v\, f'(y \mid x - bv)\, g(x - bv)\, w(v)\, dv + b^2 g^{(1)}(x)\, F(y \mid x) + O(b^4)$$
$$= -b^2 F^{(1)}(y \mid x)\, g(x) + O(b^4)$$

which is (32).

To show (33), first observe that by a change of variables and (35),

$$E w_i^2 = \int \frac{1}{b^2} w\!\left(\frac{x - v}{b}\right)^2 g(v)\, dv = \frac{1}{b} \int w(v)^2\, g(x - bv)\, dv
= \frac{1}{b} \int w(v)^2 \left(g(x) - bv\, g^{(1)}(x)\right) dv + O(b) = \frac{R g(x)}{b} + O(b). \tag{42}$$

Second, using (39), a change of variables, (35) and (37),

$$E\left(w_i^2 K_h(y - Y_i)\right) = \int\!\!\int \frac{1}{b^2} w\!\left(\frac{x - v}{b}\right)^2 K_h(y - u)\, f(u \mid v)\, g(v)\, dv\, du$$
$$= \int \frac{1}{b^2} w\!\left(\frac{x - v}{b}\right)^2 g(v)\, F(y \mid v)\, dv + O\!\left(\frac{h^2}{b}\right)
= \frac{1}{b} \int w(v)^2\, g(x - bv)\, F(y \mid x - bv)\, dv + O(b)$$
$$= \frac{1}{b} \int w(v)^2 \left(g(x) - bv\, g^{(1)}(x)\right)\left(F(y \mid x) - bv\, F^{(1)}(y \mid x)\right) dv + O(b)
= \frac{R g(x)\, F(y \mid x)}{b} + O(b). \tag{43}$$

Third, by integration by parts, a change of variables, and (36),

$$\int K_h(y - u)^2 f(u \mid v)\, du = 2 \int K_h(y - u)\, k_h(y - u)\, F(u \mid v)\, du = 2 \int K(u)\, k(u)\, F(y - hu \mid v)\, du$$
$$= 2 \int K(u)\, k(u)\left(F(y \mid v) - hu\, f(y \mid v)\right) du + O(h^2)
= F(y \mid v) - h\psi f(y \mid v) + O(h^2). \tag{44}$$

The final equality uses the definition (1) and the fact that the symmetry of $k(u)$ about zero implies $2 \int K(u)\, k(u)\, du = 1$.

Fourth, using (44), (35), (37) and (38),

$$E\left(w_i^2 K_h(y - Y_i)^2\right) = \int\!\!\int \frac{1}{b^2} w\!\left(\frac{x - v}{b}\right)^2 K_h(y - u)^2 f(u \mid v)\, g(v)\, dv\, du$$
$$= \int \frac{1}{b^2} w\!\left(\frac{x - v}{b}\right)^2 \left(F(y \mid v) - h\psi f(y \mid v)\right) g(v)\, dv + O\!\left(\frac{h^2}{b}\right)$$
$$= \frac{1}{b} \int w(v)^2 \left(F(y \mid x - bv) - h\psi f(y \mid x - bv)\right) g(x - bv)\, dv + O(b)$$
$$= \frac{R g(x)}{b}\left(F(y \mid x) - h\psi f(y \mid x)\right) + O(1). \tag{45}$$

Together, (42), (43) and (45) show

$$E\left(w_i^2 K_h^*(Y_i)^2\right) = E\left(w_i^2 K_h(y - Y_i)^2\right) - 2 E\left(w_i^2 K_h(y - Y_i)\right) F(y \mid x) + E\left(w_i^2\right) F(y \mid x)^2$$
$$= \frac{R g(x)}{b}\left(F(y \mid x) - h\psi f(y \mid x)\right) - \frac{2 R g(x)\, F(y \mid x)^2}{b} + \frac{R g(x)\, F(y \mid x)^2}{b} + O(1)$$
$$= \frac{R g(x)}{b}\left(F(y \mid x)(1 - F(y \mid x)) - h\psi f(y \mid x)\right) + O(1)$$

which is (33). (34) is shown similarly.

Proof of Theorem 1. From equations (29)-(30),

$$E \hat\beta \simeq \left(E\left(w_i (x - X_i)^2\right)\right)^{-1} E\left(w_i (x - X_i)\right) + O(b^2) = -\frac{g^{(1)}(x)}{g(x)} + O(b^2).$$

Combined with (28), this yields

$$E w_i^* = E w_i - E\hat\beta\, E\left(w_i (x - X_i)\right) + O(b^2) = g(x) + O(b^2). \tag{46}$$

Next, using equations (31) and (32),

$$E\left(w_i^* K_h^*(Y_i)\right) = E\left(w_i K_h^*(Y_i)\right) - E\hat\beta\, E\left(w_i (x - X_i) K_h^*(Y_i)\right) + O(b^4)
= \frac{b^2}{2} F^{(2)}(y \mid x)\, g(x) + \frac{h^2}{2} f'(y \mid x)\, g(x) + O(b^4).$$

Together with (46) we find

$$E\left(\tilde{F}_{h,b}(y \mid x)\right) - F(y \mid x) \simeq \frac{E\left(w_i^* K_h^*(Y_i)\right)}{E w_i^*} + O(b^4)
= \frac{1}{2}\left(b^2 F^{(2)}(y \mid x) + h^2 f'(y \mid x)\right) + O(b^4).$$

Similarly, using (33) and (34),

$$\mathrm{Var}\left(\tilde{F}_{h,b}(y \mid x)\right) \simeq \frac{1}{n} \frac{\mathrm{Var}\left(w_i^* K_h^*(Y_i)\right)}{\left(E w_i^*\right)^2}
\simeq \frac{1}{n} \frac{E\left(w_i^2 K_h^*(Y_i)^2\right)}{g(x)^2} + O\!\left(\frac{1}{n}\right)$$
$$= \frac{R\left(F(y \mid x)(1 - F(y \mid x)) - h\psi f(y \mid x)\right)}{g(x)\, n b} + O\!\left(\frac{1}{n}\right) = O\!\left(n^{-4/5}\right).$$

Together, these expressions and the assumption that $b = c n^{-1/5}$ establish that

$$E\left(\tilde{F}_{h,b}(y \mid x) - F(y \mid x)\right)^2 = \frac{R\left(F(y \mid x)(1 - F(y \mid x)) - h\psi f(y \mid x)\right)}{g(x)\, n b}$$
$$+ \frac{b^4}{4} F^{(2)}(y \mid x)^2 + \frac{h^2 b^2}{2} f'(y \mid x)\, F^{(2)}(y \mid x) + \frac{h^4}{4} f'(y \mid x)^2 + O\!\left(n^{-6/5}\right).$$

which equals the expression in (8) using the equivalence f (y x) F (2) (y x) dy f (y x) f (2) (y x) dy which holds y integration y parts. This completes the proof. 17

Table 1: Normalized MISE of Conditional Distribution Function Estimators
(Entries are MISE relative to the unsmoothed NW estimator with infeasible optimal bandwidth.)

n = 50
                      Optimal Bandwidths        Plug-In Bandwidths
Error Density         SNW    LL     SLL         LL     SLL
Gaussian              0.76   0.70   0.56        0.79   0.67
Skewed Unimodal       0.80   0.64   0.53        0.69   0.60
Strongly Skewed       0.89   0.85   0.78        0.88   0.82
Kurtotic Unimodal     0.82   0.77   0.66        0.81   0.72
Outlier               0.65   0.69   0.49        0.99   0.76
Bimodal               0.76   0.71   0.55        0.84   0.72
Separated Bimodal     0.93   0.82   0.77        0.95   0.92
Asymmetric Bimodal    0.76   0.68   0.52        0.80   0.68
Trimodal              0.78   0.72   0.56        0.86   0.74

n = 100
                      Optimal Bandwidths        Plug-In Bandwidths
Error Density         SNW    LL     SLL         LL     SLL
Gaussian              0.80   0.67   0.58        0.71   0.63
Skewed Unimodal       0.83   0.63   0.54        0.66   0.58
Strongly Skewed       0.84   0.89   0.84        0.92   0.88
Kurtotic Unimodal     0.85   0.76   0.68        0.81   0.73
Outlier               0.61   0.61   0.44        1.06   0.80
Bimodal               0.90   0.66   0.55        0.72   0.64
Separated Bimodal     0.96   0.80   0.77        0.84   0.81
Asymmetric Bimodal    0.81   0.64   0.52        0.69   0.62
Trimodal              0.83   0.68   0.57        0.74   0.66

n = 300
                      Optimal Bandwidths        Plug-In Bandwidths
Error Density         SNW    LL     SLL         LL     SLL
Gaussian              0.86   0.64   0.59        0.66   0.61
Skewed Unimodal       0.88   0.60   0.55        0.62   0.58
Strongly Skewed       0.97   0.93   0.91        1.20   1.16
Kurtotic Unimodal     0.87   0.73   0.68        0.85   0.79
Outlier               0.58   0.51   0.39        0.97   0.75
Bimodal               0.88   0.61   0.56        0.63   0.59
Separated Bimodal     0.97   0.78   0.76        0.81   0.80
Asymmetric Bimodal    0.89   0.59   0.52        0.61   0.57
Trimodal              0.95   0.62   0.65        0.65   0.61

References

[1] Fan, Jianqing and Irene Gijbels (1996): Local Polynomial Modelling and Its Applications. New York: Chapman and Hall.

[2] Fan, Jianqing and Qiwei Yao (2003): Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer.

[3] Hall, Peter, Rodney C. L. Wolff, and Qiwei Yao (1999): "Methods for estimating a conditional distribution function," Journal of the American Statistical Association, 94, 154-163.

[4] Hansen, Bruce E. (2004): "Bandwidth selection for nonparametric distribution estimation," manuscript, University of Wisconsin.

[5] Jones, M.C. and S. J. Sheather (1991): "Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives," Statistics and Probability Letters, 11, 511-514.

[6] Marron, J.S. and M.P. Wand (1992): "Exact mean integrated squared error," Annals of Statistics, 20, 712-736.

[7] McConnell, M.M. and G. Perez-Quiros (2000): "Output fluctuations in the United States: What has changed since the early 1980s?" American Economic Review, 90, 1464-1476.