Nonparametric Regression


Econ 674, Purdue University. Justin L. Tobias. April 8, 2009.

Consider the univariate nonparametric regression model

y_i = m(x_i) + \epsilon_i, \quad i = 1, \ldots, n,

where y_i and x_i are scalars, for simplicity. Note that the marginal density for x might be obtained as:

\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x_i - x}{h}\right).

Similarly, the joint density for x and y could be estimated (simply) as:

\hat{f}(x, y) = \frac{1}{nh^2} \sum_{i=1}^{n} K\left(\frac{x_i - x}{h}\right) K\left(\frac{y_i - y}{h}\right).

The function of interest is the conditional mean

m(x) = E(y \mid x) = \frac{\int y f(x, y)\, dy}{f(x)}.

The denominator can be estimated as shown on the last slide. As for the numerator, substitute in our estimator for the joint density and obtain (assuming a symmetric, mean-zero kernel):

\int y \hat{f}(x, y)\, dy = \frac{1}{nh} \sum_{i=1}^{n} y_i K\left(\frac{x_i - x}{h}\right).

Thus, we have all the pieces we need to obtain a nonparametric estimator (called the Nadaraya-Watson estimator) of the conditional mean. Noting that the leading bandwidth terms cancel in the ratio, we obtain:

\hat{m}(x) = \frac{\sum_{i=1}^{n} y_i K\left(\frac{x_i - x}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x_i - x}{h}\right)}.
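As a concrete illustration (not part of the original slides), here is a minimal numpy sketch of the Nadaraya-Watson estimator with a Gaussian kernel. The function name, bandwidth, and simulated data are all illustrative choices.

import numpy as np

def nadaraya_watson(x, y, x0, h):
    """Local constant (Nadaraya-Watson) estimate of E[y | x = x0].

    A Gaussian kernel is used for illustration; any symmetric,
    mean-zero density works. x, y are 1-d samples; x0 may be a
    scalar or an array of evaluation points."""
    x0 = np.atleast_1d(x0)
    u = (x[None, :] - x0[:, None]) / h      # (m, n) scaled distances
    w = np.exp(-0.5 * u**2)                 # kernel weights; 1/(nh) cancels
    return (w @ y) / w.sum(axis=1)          # ratio of weighted sums

# Usage: recover a noisy sine curve on a small grid
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(500)
grid = np.linspace(-2, 2, 9)
print(np.round(nadaraya_watson(x, y, grid, h=0.2), 2))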

Intuition. Let us now try to justify this estimator in an intuitive way, much as we did in the case of nonparametric density estimation. Suppose that x is discrete-valued and we observe n_0 points with x = x_0. In this case, we might use the sample average as a consistent estimate of the conditional mean function at x_0:

\hat{m}(x_0) = \frac{1}{n_0} \sum_{i: x_i = x_0} y_i.

This technique works great, of course, if x is discrete-valued. However, if x is continuous, the above will not work: we will never observe n_0 points for which x = x_0 exactly.

Intuition. To remedy this problem, we can average those y's for which the x's fall in some interval around x_0. We can then replace \hat{m}(x_0) = \hat{E}(y \mid x = x_0) with the sample average of the y's falling in this region:

\hat{m}(x_0) = \frac{\sum_{i=1}^{n} y_i \, 1(|x_i - x_0| \le h)}{\sum_{i=1}^{n} 1(|x_i - x_0| \le h)}.

Intuition. In the previous estimator, we placed equal weight on all the points in the interval and zero weight on points outside it. More generally, we might replace the indicator function above with a continuous weight function:

\hat{m}(x_0) = \frac{\sum_{i=1}^{n} y_i K\left(\frac{x_i - x_0}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x_i - x_0}{h}\right)},

where, as before, K is a mean-zero symmetric density function.

Under certain regularity conditions, we can establish pointwise consistency of the local constant kernel estimator:

\hat{m}(x_0) \to_p m(x_0).

To save time we defer the proof, although it proceeds much like the proof for the kernel density estimator. The method above also generalizes to higher dimensions; with x a d-vector and a product kernel,

\hat{m}(x_0) = \frac{\sum_{i=1}^{n} y_i \prod_{j=1}^{d} K\left(\frac{x_{ij} - x_{0j}}{h_j}\right)}{\sum_{i=1}^{n} \prod_{j=1}^{d} K\left(\frac{x_{ij} - x_{0j}}{h_j}\right)}.

Local Polynomial Regression. We can consider this problem yet another way, which will lead to an estimator that improves upon the Nadaraya-Watson estimator. As with least squares, we might wish to minimize the objective function:

\min_{m} \sum_{i=1}^{n} \left[y_i - m(x_i)\right]^2.

To this end, we take a second-order expansion of the regression function about the point x_0:

m(x_i) \approx m(x_0) + m'(x_0)(x_i - x_0) + \frac{1}{2} m''(x_0)(x_i - x_0)^2.

Local Polynomial Regression. We then substitute this expansion into our objective function and include a kernel-weighting term. The kernel weight counts points close to x_0 more heavily than points farther from x_0 (as in weighted least squares). Let \alpha_0 = m(x_0), \alpha_1 = m'(x_0), and \alpha_2 = m''(x_0). Then we can rewrite the objective function as:

\min_{\alpha} \sum_{i=1}^{n} \left[y_i - \alpha_0 - \alpha_1 (x_i - x_0) - \frac{\alpha_2}{2} (x_i - x_0)^2\right]^2 K\left(\frac{x_i - x_0}{h_n}\right).

Local Polynomial Regression. We can stack this problem in matrix form:

\min_{\alpha} (y - X\alpha)' W (y - X\alpha),

where

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & (x_1 - x_0) & (x_1 - x_0)^2/2 \\ 1 & (x_2 - x_0) & (x_2 - x_0)^2/2 \\ \vdots & \vdots & \vdots \\ 1 & (x_n - x_0) & (x_n - x_0)^2/2 \end{bmatrix}, \quad
\alpha = \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{bmatrix},

and W = \mathrm{diag}\{K((x_i - x_0)/h_n)\} is the n \times n diagonal matrix of kernel weights.

Local Polynomial Regression. This objective function is just like the one that yields the GLS estimator. Thus, we have:

\hat{\alpha} = (X'WX)^{-1} X'Wy.

The (1,1) element of \hat{\alpha} gives an estimate of the CMF at x_0, and the (2,1) element gives an estimate of the marginal effect m'(x_0) at x_0.
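To make the formula concrete, here is a minimal numpy sketch (an illustration, not the slides' own code) of \hat{\alpha} = (X'WX)^{-1}X'Wy at a single point, again with a Gaussian kernel; the function name and tuning values are assumptions.

import math
import numpy as np

def local_poly(x, y, x0, h, p=2):
    """Local polynomial estimate at x0 via kernel-weighted least squares.

    Column k of the design is (x - x0)^k / k!, matching the Taylor
    expansion, so a[0] estimates m(x0) and a[1] the marginal effect."""
    d = x - x0
    X = np.column_stack([d**k / math.factorial(k) for k in range(p + 1)])
    w = np.exp(-0.5 * (d / h)**2)             # Gaussian kernel weights
    XtW = X.T * w                             # X'W without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)  # (X'WX)^{-1} X'Wy

# Usage: level and slope of the CMF at x0 = 0.5
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 500)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(500)
a = local_poly(x, y, x0=0.5, h=0.3)
print("m-hat:", a[0], "slope-hat:", a[1])     # true values: 1 and 0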

Local Polynomial Regression. So, here is the procedure for fitting a regression model via local polynomial regression:

1. Select a bandwidth h_n and a kernel K.
2. Pick a set of points at which to evaluate the CMF (a 3\sigma rule, perhaps). This could also be all of the x_i.
3. For each point, compute \hat{\alpha} as above. Plotting the (1,1) elements traces out the conditional mean function. (Often, the (2,1) elements are of primary interest; marginal effects mimic regression coefficients.)
4. What happens if you just approximate m(x_i) with a constant \alpha_0? You recover the Nadaraya-Watson (local constant) estimator.

Other points to note: The above also generalizes to the multivariate case. However, there is the curse of dimensionality: the rate of convergence slows with the dimension d of the problem, since the effective local sample size is nh_n^d (assuming a common bandwidth is employed in all dimensions). How can you pick the bandwidth h_n and the kernel K? Standard errors for the above point estimates are rather involved and difficult to compute. Bootstrapping is a possibility, though the bootstrap should correct for the bias of the estimator.

It's an odd world. There is a preference for odd-order fits. Let p be the order of the series expansion (e.g., 1 = linear, 2 = quadratic) and v be the order of the derivative we seek to estimate. Ruppert and Wand (1994, Annals of Statistics) show that bias is reduced and performance at the boundary is improved by setting p - v to be odd. This suggests a preference for local linear regression when estimating the conditional mean function. To get around the curse of dimensionality, some specify a nonparametric part for only one or a few elements of the covariates:

y_i = z_i'\beta + m(x_i) + \epsilon_i.

This is called a partially linear or semilinear model. See Robinson (1988, Econometrica) or Yatchew (1997, Economics Letters) for estimation procedures.

Bandwidth Selection. Fan and Gijbels (1995) derive an optimal bandwidth rule. They consider an asymptotic weighted mean integrated squared error criterion of the form

\int E\left[\left(\hat{m}^{(v)}(x) - m^{(v)}(x)\right)^2\right] w(x)\, dx,

where m^{(v)}(x) is the vth derivative of the CMF in which we are interested, and w(x) is a weighting function. They show the bandwidth which minimizes this criterion is of the form:

h_{opt} = C_{p,v}(K) \left[\frac{\int \sigma^2(x) w_0(x)\, dx}{n \int \left(m^{(p+1)}(x)\right)^2 w_0(x) f(x)\, dx}\right]^{1/(2p+3)}.

Bandwidth Selection. In the above, \sigma^2(x) \equiv E[(y - m(x))^2 \mid x], w(x) \equiv w_0(x) f(x), and C_{p,v}(K) is a constant which depends on the expansion order p, the order v of the derivative sought, and the kernel K. Finally, m^{(p+1)} is the (p+1)th derivative of the unknown function m. The optimal bandwidth can be estimated as:

\hat{h}_n = C_{p,v}(K) \left[\frac{\hat{\sigma}^2 \int w_0(x)\, dx}{\sum_{i=1}^{n} \left(\hat{m}^{(p+1)}(x_i)\right)^2 w_0(x_i)}\right]^{1/(2p+3)}.

We can obtain \hat{\sigma}^2 and \hat{m}^{(p+1)} by running a pilot polynomial regression of y on x, x^2, \ldots, x^{p+3}. A simple starting choice for w_0 is w_0(x) = 1. Finally, note that there are a variety of other bandwidth selectors used in practice (e.g., cross-validation or AIC_c [Hurvich et al. (1998)]).
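A rough plug-in sketch in the spirit of this rule, with p = 1 and w_0 = 1: a global pilot polynomial of order p + 3 supplies \hat{\sigma}^2 and \hat{m}^{(p+1)}. The constant C = 0.776 is the value usually quoted for local linear estimation of the mean with a Gaussian kernel, but treat it (and everything else here) as an assumption.

import numpy as np

def rot_bandwidth(x, y, p=1, C=0.776):
    """Plug-in rule-of-thumb bandwidth in the spirit of Fan and Gijbels.

    A global polynomial of order p+3 supplies the pilot estimates of
    sigma^2 and m^(p+1); w_0 is taken to be 1 on the data range."""
    q = p + 3
    coefs = np.polyfit(x, y, q)                       # pilot fit
    resid = y - np.polyval(coefs, x)
    sigma2 = resid @ resid / (len(x) - q - 1)         # residual variance
    deriv = np.polyval(np.polyder(coefs, p + 1), x)   # m^(p+1) at each x_i
    support = x.max() - x.min()                       # integral of w0 = 1
    return C * (sigma2 * support / np.sum(deriv**2))**(1.0 / (2*p + 3))

# Usage
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 500)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(500)
print("rule-of-thumb h:", rot_bandwidth(x, y))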

[Figure: two panels comparing fitted curves to the true curve for different smoothing parameters, with x ranging over [-2, 2]. The left panel shows a severely undersmoothed fit (\eta = .00001) alongside a moderate fit (\eta = .15); the right panel shows a heavily oversmoothed fit (\eta = 100). Each panel also plots the true curve.]

Generalizing to Multiple Regression. Consider, for illustration, the case of a bivariate nonparametric regression problem:

y_i = m(x_{1i}, x_{2i}) + \epsilon_i.

As before, we can take a (now first-order) approximation of the regression function about a point x_0:

m(x_i) \approx m(x_0) + \nabla m(x_0)'(x_i - x_0).

Substituting this back into our objective function, we can then formulate a weighted least-squares type objective function, as before:

\min_{\alpha} \sum_{i=1}^{n} \left[y_i - \alpha_0 - \alpha_1'(x_i - x_0)\right]^2 K_H(x_i - x_0),

where K is a two-dimensional kernel, H is the bandwidth or smoothing matrix, and K_H(u) = |H|^{-1/2} K(H^{-1/2} u).
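As an illustration of the bivariate case (my sketch, not the slides'), the following numpy code carries out a local linear fit at a point x_0 with a bivariate Gaussian kernel; the data-generating process, bandwidth matrix, and function names are assumptions.

import numpy as np

def local_linear_2d(y, X2, x0, H):
    """Bivariate local linear fit at x0 (a 2-vector).

    Minimizes sum_i [y_i - a0 - a1'(x_i - x0)]^2 K_H(x_i - x0) with a
    bivariate Gaussian kernel; normalizing constants cancel in the
    weighted least squares solution. Returns (m-hat, gradient-hat)."""
    d = X2 - x0                                       # (n, 2) deviations
    Hinv = np.linalg.inv(H)
    w = np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, Hinv, d))
    D = np.column_stack([np.ones(len(y)), d])         # [1, x - x0]
    DtW = D.T * w
    a = np.linalg.solve(DtW @ D, DtW @ y)
    return a[0], a[1:]

# Usage: m(x) = sin(x1) + x2^2
rng = np.random.default_rng(7)
X2 = rng.uniform(-2, 2, (1500, 2))
y = np.sin(X2[:, 0]) + X2[:, 1]**2 + 0.2 * rng.standard_normal(1500)
m0, grad = local_linear_2d(y, X2, np.array([0.5, 1.0]), H=np.diag([0.09, 0.09]))
print("m-hat:", m0, "grad-hat:", grad)    # true: sin(0.5)+1, [cos(0.5), 2]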

Partially Linear Models. Consider a model of the form:

y_i = z_i'\beta + m(x_i) + \epsilon_i.

This model is often called a semilinear or partially linear model. Here, we assume that the z's (which can be large in number) enter in a linear fashion, while x, still assumed to be a scalar, enters nonparametrically. Two questions naturally arise: How should we estimate m? What about \beta?

Partially Linear Models. Robinson's (1988, Econometrica) estimator. Given the above specification, note:

E(y \mid x) = E(z \mid x)'\beta + m(x).

This implies:

y_i - E(y \mid x_i) = [z_i - E(z \mid x_i)]'\beta + \epsilon_i.

If we knew each of the conditional mean functions, we could just run a least squares regression. Robinson's idea is to estimate each of these CMFs nonparametrically, as we have discussed. Thus, we can estimate \beta by running least squares on the regression:

y_i - \hat{E}(y \mid x_i) = [z_i - \hat{E}(z \mid x_i)]'\beta + \text{error}_i.
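A minimal sketch of Robinson's double-residual idea, assuming Nadaraya-Watson fits for each conditional mean (a small helper is repeated here so the snippet is self-contained); the simulated design and bandwidth are illustrative.

import numpy as np

def nw(x, v, h):
    """Nadaraya-Watson fit of E[v | x], evaluated at each sample point."""
    u = (x[None, :] - x[:, None]) / h
    w = np.exp(-0.5 * u**2)
    return (w @ v) / w.sum(axis=1)

def robinson_beta(y, z, x, h):
    """Robinson's estimator: OLS of the y-residuals on the z-residuals,
    with each conditional mean replaced by a kernel regression fit."""
    ry = y - nw(x, y, h)
    rz = z - np.column_stack([nw(x, z[:, k], h) for k in range(z.shape[1])])
    return np.linalg.lstsq(rz, ry, rcond=None)[0]

# Usage: y = z'beta + sin(pi x) + noise, true beta = (1.5, -2)
rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(-2, 2, n)
z = rng.standard_normal((n, 2))
y = z @ np.array([1.5, -2.0]) + np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)
print("beta-hat:", robinson_beta(y, z, x, h=0.2))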

Partially Linear Models.

1. Since \hat{\beta} converges at the standard parametric rate (we can show this), we can asymptotically ignore the fact that \beta is estimated when deriving confidence intervals for m. (Some focus on \beta as the parameter of interest.)
2. This procedure can be quite computationally intensive, since we need to perform k_z + 2 nonparametric regressions in total, where k_z is the number of variables in z.
3. This estimator is asymptotically efficient.

Partially Linear Models. An alternative method has been suggested by Yatchew (1997, Economics Letters). He suggests the use of differencing to eliminate the unknown function m. Note that for a continuous m, if x_i \approx x_j then m(x_i) - m(x_j) \approx 0, so that:

y_i - y_j = (z_i - z_j)'\beta + [m(x_i) - m(x_j)] + (\epsilon_i - \epsilon_j) \approx (z_i - z_j)'\beta + (\epsilon_i - \epsilon_j).

This intuition suggests the following simple estimator:

Partially Linear Models.

1. Sort the data by ascending values of x.
2. Take adjacent differences of the sorted data, and estimate \beta by an OLS regression of the differenced y's on the differenced z's.
3. Given \hat{\beta}, estimate the unknown function m pointwise using a local linear regression of y - z'\hat{\beta} on x, or an alternative nonparametric estimation procedure.

Over a compact support, and under certain regularity conditions on m, the differencing technique asymptotically purges the model of the nonparametric component m, and consistent estimates of \beta are obtained.
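Yatchew's first-order differencing estimator is simple enough to sketch in a few lines; the design below reuses the partially linear setup from the Robinson sketch and is purely illustrative.

import numpy as np

def yatchew_beta(y, z, x):
    """First-order differencing estimator of beta in y = z'beta + m(x) + e.

    Sorting by x and differencing adjacent observations nearly cancels
    the smooth m(x); OLS on the differenced data then recovers beta."""
    order = np.argsort(x)
    dy = np.diff(y[order])
    dz = np.diff(z[order], axis=0)
    return np.linalg.lstsq(dz, dy, rcond=None)[0]

# Usage: same design as the Robinson sketch, true beta = (1.5, -2)
rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(-2, 2, n)
z = rng.standard_normal((n, 2))
y = z @ np.array([1.5, -2.0]) + np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)
print("beta-hat (differencing):", yatchew_beta(y, z, x))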

Partially Linear Models. Yatchew (1997, 1998) describes how higher-order optimal differencing can be applied to estimate \beta, approaching the efficiency of Robinson's estimator as the order of differencing grows. Note that this estimator requires only one nonparametric regression!

Smooth Coefficient Models (Li et al. (2002, JBES)). Consider the following smooth coefficient model:

y_i = \alpha(z_i) + x_i'\beta(z_i) + \epsilon_i = X_i'\delta(z_i) + \epsilon_i,

where X_i = [1 \; x_i']' and \delta(z_i) = [\alpha(z_i) \; \beta(z_i)']'. We can think of \beta(z_i) as a vector of (smooth) coefficients that depend on z. The standard partially linear model follows when \beta(z_i) = \beta. Let z be q \times 1 and x be p \times 1 (typically, think of both p and q as equal to 1). They suggest the following estimator:

\hat{\delta}(z_0) = \left[(nh^q)^{-1} \sum_{j=1}^{n} X_j X_j' K\left(\frac{z_j - z_0}{h}\right)\right]^{-1} (nh^q)^{-1} \sum_{j=1}^{n} X_j y_j K\left(\frac{z_j - z_0}{h}\right).

Smooth Coefficient Models (Li et al. (2002, JBES)). Intuition: this is like a weighted least squares rule. Suppose that z is a scalar, and assume that we are using a uniform kernel:

K\left(\frac{z_j - z_0}{h}\right) = \begin{cases} 1/2 & \text{if } |z_j - z_0| \le h \\ 0 & \text{otherwise.} \end{cases}

Under this rule, we can see that

\hat{\delta}(z_0) = \left[\sum_{j: |z_j - z_0| \le h} X_j X_j'\right]^{-1} \sum_{j: |z_j - z_0| \le h} X_j y_j.

This is the least squares estimator of the intercept and slopes, using only those data points for which z_j is close to z_0. Doing this over a grid of z_0 values will enable us to piece together the intercept function (as a function of z) and the slope coefficients (also as functions of z).
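A minimal sketch of this uniform-kernel smooth coefficient estimator: ordinary least squares on the observations with z_j near z_0, repeated over a grid. Function names and the data-generating process are assumptions.

import numpy as np

def smooth_coef(y, X, z, z0, h):
    """Uniform-kernel smooth coefficient estimate of delta(z0):
    plain OLS using only observations with |z_j - z0| <= h."""
    keep = np.abs(z - z0) <= h
    return np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]

# Usage: y = alpha(z) + x*beta(z) + e with alpha(z) = z^2, beta(z) = 1 + z
rng = np.random.default_rng(5)
n = 2000
z = rng.uniform(-1, 1, n)
x = rng.standard_normal(n)
y = z**2 + (1 + z) * x + 0.2 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])        # X_i = [1, x_i]'
for z0 in (-0.5, 0.0, 0.5):
    a, b = smooth_coef(y, X, z, z0, h=0.15)
    print(f"z0={z0:+.1f}: alpha-hat={a:+.3f} (true {z0**2:+.3f}), "
          f"beta-hat={b:+.3f} (true {1 + z0:+.3f})")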

Tests Against Parametric Alternatives. Li et al. also provide a way to test against parametric alternatives. They consider a parametric version of the model,

y_i = X_i'\delta_0(z_i) + \epsilon_i,

with \delta_0(z_i) a particular parametric function of z. For example,

y_i = \alpha_0 + \gamma_0 z_i + x_i'\beta_0 + \epsilon_i

implies that X_i = [1 \; x_i']' and \delta_0(z_i) = [(\alpha_0 + \gamma_0 z_i) \; \beta_0']'. We would like to test

H_0: \delta(z) - \delta_0(z) = 0 almost everywhere, versus
H_A: \delta(z) - \delta_0(z) \ne 0 on a set with positive measure.

Tests Against Parametric Alternatives. They propose the test statistic

\hat{I}_n = (n^2 h^q)^{-1} \sum_{i} \sum_{j \ne i} X_i'X_j \hat{\epsilon}_i \hat{\epsilon}_j K\left(\frac{z_i - z_j}{h}\right),

where \hat{\epsilon}_i = y_i - X_i'\hat{\delta}_0(z_i). They also show that

J_n = \frac{n h^{q/2} \hat{I}_n}{\hat{\sigma}_0} \to_d N(0, 1),

where

\hat{\sigma}_0^2 = 2(n^2 h^q)^{-1} \sum_{i} \sum_{j \ne i} \hat{\epsilon}_i^2 \hat{\epsilon}_j^2 K^2\left(\frac{z_i - z_j}{h}\right).
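A sketch of the computation of J_n for q = 1, following the formulas exactly as they appear on the slide (the published version of the variance estimator may differ in details, e.g., whether (X_i'X_j)^2 enters \hat{\sigma}_0^2); the kernel choice, data, and names are assumptions.

import numpy as np

def jn_statistic(X, z, eps, h):
    """Compute J_n for q = 1 from the slide's formulas.

    X: (n, k) regressors, z: (n,) smoothing variable, eps: (n,)
    residuals from the parametric null fit; Gaussian kernel assumed."""
    n = len(z)
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h)**2)
    np.fill_diagonal(K, 0.0)                # restrict sums to j != i
    XX = X @ X.T                            # X_i'X_j cross-products
    ee = np.outer(eps, eps)                 # eps_i * eps_j
    In = (XX * ee * K).sum() / (n**2 * h)
    sig2 = 2.0 * (np.outer(eps**2, eps**2) * K**2).sum() / (n**2 * h)
    return n * np.sqrt(h) * In / np.sqrt(sig2)

# Usage: data generated under the parametric null y = 1 + 0.5 z + 2 x + e
rng = np.random.default_rng(6)
n = 400
z = rng.uniform(0, 1, n)
x = rng.standard_normal(n)
y = 1.0 + 0.5 * z + 2.0 * x + 0.3 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])        # X_i = [1, x_i]'
D = np.column_stack([np.ones(n), z, x])     # null-model design
eps = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]
print("J_n:", jn_statistic(X, z, eps, h=z.std() * n**(-1/5)))  # vs. N(0,1)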

Tests Against Parametric Alternatives. Notes: A rule of thumb for the bandwidth choice is h_l = z_{l,sd} \, n^{-1/(4+q)}, where z_{l,sd} is the sample standard deviation of z_l.