
Härdle, Müller, Sperlich, Werwarz (1995): Nonparametric and Semiparametric Models, An Introduction

Introduction

Linear Regression

E(Y | X) = X_1 β_1 + ... + X_d β_d = X^T β

Example: wage equation. Y = log wages; X = schooling (measured in years), labor market experience (measured as AGE - SCHOOL - 6), and experience squared:

E(Y | SCHOOL, EXP) = β_1 + β_2 SCHOOL + β_3 EXP + β_4 EXP^2

Coefficient estimates for the wage equation (dependent variable: log wages):

Variable    Coefficient    S.E.      t-value
SCHOOL       0.0898        0.0083    10.788
EXP          0.0349        0.0056     6.185
EXP^2       -0.0005        0.0001    -4.307
constant     0.5202        0.1236     4.209

R^2 = 0.24, sample size n = 534

Table 1: Results from OLS estimation, CPS 1985 (n = 534), Berndt (1991)

Figure 1: Wage-schooling profile and wage-experience profile (R: SPMcps85lin)
Figure 2: Parametrically estimated regression function (R: SPMcps85lin)
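The OLS fit in Table 1 can be reproduced with any least squares routine. Below is a minimal Python sketch; since the CPS 1985 data are not included here, it simulates data from the fitted equation (variable names and the noise level are illustrative assumptions) and recovers the coefficients by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 534
school = rng.integers(6, 18, size=n).astype(float)   # illustrative schooling years
exp_ = rng.integers(0, 40, size=n).astype(float)     # illustrative experience

# Simulate log wages from the fitted wage equation plus noise (sd 0.4 assumed)
y = 0.5202 + 0.0898 * school + 0.0349 * exp_ - 0.0005 * exp_**2 \
    + rng.normal(0.0, 0.4, size=n)

# Design matrix: constant, SCHOOL, EXP, EXP^2
X = np.column_stack([np.ones(n), school, exp_, exp_**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to (0.5202, 0.0898, 0.0349, -0.0005)
```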

Wage <-- Schooling, Experience

Linear regression:

E(Y | SCHOOL, EXP) = const + β_1 SCHOOL + β_2 EXP

Nonparametric regression:

E(Y | SCHOOL, EXP) = m(SCHOOL, EXP), where m(.) is a smooth function.

Figure 3: Nonparametrically estimated regression function (R: SPMcps85reg)

Example: Engel curve. Engel (1857): "..., daß je ärmer eine Familie ist, einen desto größeren Anteil von der Gesammtausgabe muß zur Beschaffung der Nahrung aufgewendet werden." (The poorer a family, the bigger the share of total expenditures that has to be used for food.)

Figure 4: Engel curve, U.K. Family Expenditure Survey 1973 (R: SPMengelcurve2)

Example: wage equation.

Nonparametric regression: E(Y | SCHOOL, EXP) = m(SCHOOL, EXP)

Semiparametric regression: E(Y | SCHOOL, EXP) = α + g_1(SCHOOL) + g_2(EXP), where g_1(.) and g_2(.) are smooth functions.

Figure 8: Additive model fit vs. parametric fit, wage-schooling (left) and wage-experience (right) profiles (R: SPMcps85add)
Figure 9: Surface plot for the additive model (R: SPMcps85add)

Example: binary choice model. Y = 1 if a person considers moving to the west, Y = 0 otherwise:

E(Y | x) = P(Y = 1 | x) = G(β^T x)

typically with a logistic link function (logit model):

E(Y | x) = P(Y = 1 | x) = 1 / {1 + exp(-β^T x)}
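The logistic link above is a one-liner in code. A small Python sketch (the coefficient vector here is illustrative, not the migration-model estimates from the slides):

```python
import numpy as np

def logit_prob(x, beta):
    """P(Y = 1 | x) = 1 / (1 + exp(-beta' x)) for the logit model."""
    return 1.0 / (1.0 + np.exp(-np.dot(x, beta)))

beta = np.array([0.5, -1.0])   # illustrative coefficients
x = np.array([1.0, 0.0])       # constant plus one regressor
print(logit_prob(x, beta))     # 1/(1 + exp(-0.5)), roughly 0.622
```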

Logit Model

Figure 10: Logit model for migration (R: SPMlogit)

Heteroscedasticity problems: consider the binary choice model with

Var(ε) = (1/4) {1 + (β^T x)^2}^2 Var(u),

where u has a (standard) logistic distribution.

Wrong link consequences:

Figure 11: The link function of a homoscedastic logit model (thin) vs. a heteroscedastic model (solid) (R: SPMtruelogit)
Figure 12: Sampling distribution of the ratio of estimated coefficients and the ratio's true value (R: SPMsimulogit)

Single Index Model

Figure 13: Single index model for migration (R: SPMsim)

Summary: Introduction

- Parametric models are fully determined up to a parameter (vector). They lead to an easy interpretation of the resulting fits.
- Nonparametric models only impose qualitative restrictions, like a smooth density function or a smooth regression function m. They may support known parametric models, and their flexibility opens the way for new models.
- Semiparametric models combine parametric and nonparametric parts. This keeps the easy interpretation of the results but gives more flexibility in some aspects of the model.

Nonparametric Regression

Data: {(X_i, Y_i)}, i = 1, ..., n; X in R^d, Y in R.

Engel curve: X = net income, Y = expenditure; Y = m(X) + ε.

CHARN model: a time series of the form Y_t = m(Y_{t-1}) + σ(Y_{t-1}) ξ_t.

Univariate Kernel Regression

Model: Y_i = m(X_i) + ε_i, i = 1, ..., n, where m(.) is a smooth regression function and the ε_i are i.i.d. error terms with E(ε_i) = 0. We aim to estimate the conditional expectation of Y given X = x,

m(x) = E(Y | X = x) = ∫ y f(y | x) dy = ∫ y f(x, y) / f_X(x) dy,

where f(x, y) denotes the joint density of (X, Y) and f_X(x) the marginal density of X.

Example: for jointly normal variables (X, Y) ~ N(µ, Σ), m(x) is linear.

Nadaraya-Watson Estimator

Idea: (X_i, Y_i) have a joint pdf, so we can estimate it by a bivariate kernel density estimator with bandwidths h and h~:

f_{h,h~}(x, y) = (1/n) Σ_{i=1}^n (1/h) K((x - X_i)/h) (1/h~) K((y - Y_i)/h~)

Plugging this into the numerator of m(x) gives

∫ y f_{h,h~}(x, y) dy = (1/n) Σ_{i=1}^n K_h(x - X_i) Y_i,

so the resulting estimator is

m_h(x) = [ (1/n) Σ_{i=1}^n K_h(x - X_i) Y_i ] / [ (1/n) Σ_{j=1}^n K_h(x - X_j) ] = r_h(x) / f_h(x).

Figure 44: Nadaraya-Watson kernel regression, h = 0.2, U.K. Family Expenditure Survey 1973 (R: SPMengelcurve)

Statistical properties of the Nadaraya-Watson estimator: decompose

m_h(x) - m(x) = { r_h(x) - m(x) f_h(x) } / f_X(x) + { m_h(x) - m(x) } { f_X(x) - f_h(x) } / f_X(x)

and calculate bias and variance in the same way as for the KDE:

AMSE{ m_h(x) } = [ σ^2(x) ||K||_2^2 ] / [ n h f_X(x) ]   (variance part)
               + (h^4 / 4) { m''(x) + 2 m'(x) f_X'(x) / f_X(x) }^2 µ_2^2(K)   (bias part)

Figure 45: Four kernel regression estimates for the 1973 U.K. Family Expenditure data with bandwidths h = 0.05, 0.1, 0.2, and 0.5 (R: SPMregress)
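The Nadaraya-Watson formula is just a ratio of two kernel sums. A minimal Python sketch with a Gaussian kernel (function and data names are illustrative; the slides' own examples use the R SPM scripts):

```python
import numpy as np

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate: sum K_h(x-X_i) Y_i / sum K_h(x-X_j)."""
    x = np.atleast_1d(x).astype(float)
    u = (x[:, None] - X[None, :]) / h              # scaled distances, shape (len(x), n)
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
    w = K / K.sum(axis=1, keepdims=True)           # normalized weights per x
    return w @ Y

# Smoke test on a known smooth curve
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 400)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.1, 400)
m_hat = nadaraya_watson(np.array([0.25, 0.75]), X, Y, h=0.05)
print(m_hat)  # roughly (sin(pi/2), sin(3*pi/2)) = (1, -1)
```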

Asymptotic Normal Distribution

Under some regularity conditions and h = c n^{-1/5}:

n^{2/5} { m_h(x) - m(x) } -->_L N(b_x, v_x^2)

with bias b_x = (c^2 µ_2(K) / 2) { m''(x) + 2 m'(x) f_X'(x) / f_X(x) } and variance v_x^2 = σ^2(x) ||K||_2^2 / { c f_X(x) }.

Remarks: the bias is a function of m', m'' and f_X'; the variance is a function of σ^2 and f_X.

Pointwise Confidence Intervals

[ m_h(x) - z_{1-α/2} sqrt( ||K||_2^2 σ^2(x) / (n h f_h(x)) ),  m_h(x) + z_{1-α/2} sqrt( ||K||_2^2 σ^2(x) / (n h f_h(x)) ) ]

with σ^2(x) estimated by σ^2(x) = (1/n) Σ_i W_{hi}(x) { Y_i - m_h(x) }^2.

Both σ^2 and f influence the precision of the confidence interval. Correction for bias!? Analogous to the KDE: confidence bands (Bickel and Rosenblatt, 1973).

Figure 46: Nadaraya-Watson kernel regression and 95% confidence intervals, h = 0.2, U.K. Family Expenditure Survey 1973 (R: SPMengelconf)

Local Polynomial Estimation

Taylor expansion for sufficiently smooth functions:

m(t) ≈ m(x) + m'(x)(t - x) + m''(x)(t - x)^2 / 2! + ... + m^{(p)}(x)(t - x)^p / p!

Consider the weighted least squares problem

min_β Σ_{i=1}^n { Y_i - β_0 - β_1 (X_i - x) - β_2 (X_i - x)^2 - ... - β_p (X_i - x)^p }^2 K_h(x - X_i).

The resulting estimate of β provides estimates of m^{(ν)}(x), ν = 0, 1, ..., p.

Notation:

X is the n × (p+1) design matrix with rows (1, X_i - x, (X_i - x)^2, ..., (X_i - x)^p), i = 1, ..., n,
Y = (Y_1, ..., Y_n)^T,  W = diag( {K_h(x - X_i)}_{i=1}^n ).

Local polynomial estimator:

β(x) = (X^T W X)^{-1} X^T W Y

Local polynomial regression estimator: m_{h,p}(x) = β_0(x).

(Asymptotic) Statistical Properties

Under regularity conditions, h = c n^{-1/5} and nh --> infinity:

n^{2/5} { m_{h,1}(x) - m(x) } -->_L N( (c^2 µ_2(K) / 2) m''(x),  σ^2(x) ||K||_2^2 / { c f_X(x) } )

Remarks: asymptotically equivalent to higher order kernels; an analogous theorem can be stated for derivative estimation.

Figure 47: Local polynomial regression, p = 1, h = 0.2, U.K. Family Expenditure Survey 1973 (R: SPMlocpolyreg)
Figure 48: Local polynomial derivative estimation, p = 2, h by rule of thumb, U.K. Family Expenditure Survey 1973 (R: SPMderivest)
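The matrix formula β(x) = (X^T W X)^{-1} X^T W Y translates directly into code. A hedged Python sketch (names are illustrative; it solves the weighted least squares system via lstsq rather than forming the inverse explicitly):

```python
import numpy as np

def local_poly(x, X, Y, h, p=1):
    """Local polynomial fit at the point x; returns (m_hat, beta_hat).

    beta_hat[nu] estimates m^(nu)(x) / nu!  for nu = 0, ..., p.
    """
    t = X - x
    D = np.vander(t, N=p + 1, increasing=True)   # columns 1, t, t^2, ..., t^p
    w = np.exp(-0.5 * (t / h) ** 2)              # Gaussian kernel weights
    sw = np.sqrt(w)
    # weighted LS: minimize sum_i w_i (Y_i - D_i beta)^2
    beta, *_ = np.linalg.lstsq(D * sw[:, None], Y * sw, rcond=None)
    return beta[0], beta

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, 500)
Y = X**2 + rng.normal(0, 0.05, 500)
m0, beta = local_poly(0.5, X, Y, h=0.1, p=2)
print(m0, beta[1])  # roughly m(0.5) = 0.25 and m'(0.5) = 1.0
```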

Kernel Density Estimation

Different Kernel Functions

Required properties of kernels: K(.) is a density function, ∫ K(u) du = 1 and K(u) ≥ 0, and K(.) is symmetric, ∫ u K(u) du = 0.

Kernel          K(u)
Uniform         (1/2) I(|u| ≤ 1)
Triangle        (1 - |u|) I(|u| ≤ 1)
Epanechnikov    (3/4)(1 - u^2) I(|u| ≤ 1)
Quartic         (15/16)(1 - u^2)^2 I(|u| ≤ 1)
Triweight       (35/32)(1 - u^2)^3 I(|u| ≤ 1)
Gaussian        (1/sqrt(2π)) exp(-u^2/2)
Cosine          (π/4) cos(πu/2) I(|u| ≤ 1)

Table 2: Kernel functions

Figure 27: Some kernel functions: Uniform (top left), Triangle (bottom left), Epanechnikov (top right), Quartic (bottom right) (R: SPMkernel)

Example: construction of the KDE using a Gaussian kernel:

f_h(x) = (1/(nh)) Σ_{i=1}^n K( (x - X_i)/h ),  with K(u) = (1/sqrt(2π)) exp(-u^2/2) and u = (x - X_i)/h.
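The kernels of Table 2 and the KDE construction can be checked numerically. A Python sketch (illustrative, not the SPM code):

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def quartic(u):
    return (15 / 16) * (1 - u**2) ** 2 * (np.abs(u) <= 1)

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, X, h, kernel=gaussian):
    """f_h(x) = (1/(n h)) sum_i K((x - X_i)/h)."""
    x = np.atleast_1d(x).astype(float)
    return kernel((x[:, None] - X[None, :]) / h).mean(axis=1) / h

# Numerical check: each kernel integrates to 1 and has mean zero
u = np.linspace(-1, 1, 200001)
du = u[1] - u[0]
for K in (epanechnikov, quartic):
    print(K.__name__, round(K(u).sum() * du, 4), round((u * K(u)).sum() * du, 4))

# KDE of a standard normal sample at x = 0: true density is 1/sqrt(2*pi), about 0.399
rng = np.random.default_rng(3)
sample = rng.normal(0.0, 1.0, 2000)
f0 = kde(0.0, sample, h=0.3)
print(f0)
```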

How to Get a Multivariate Kernel?

For u = (u_1, ..., u_d)^T:

product kernel: K(u) = K(u_1) · ... · K(u_d)

radially symmetric or spherical kernel: K(u) = K(||u||) / ∫_{R^d} K(||u||) du, with ||u|| = sqrt(u^T u).

Example: bandwidth matrix H = (1 0; 0 1)
Figure 37: Contours from bivariate product (left) and bivariate radially symmetric (right) Epanechnikov kernel (R: SPMkernelcontours)

Example: bandwidth matrix H = (1 0; 0 0.5)
Figure 38: Contours from bivariate product (left) and bivariate radially symmetric (right) Epanechnikov kernel (R: SPMkernelcontours)

Example: bandwidth matrix H = (1 0.5; 0.5 1)^{1/2}
Figure 39: Contours from bivariate product (left) and bivariate radially symmetric (right) Epanechnikov kernel (R: SPMkernelcontours)
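A bivariate product kernel and its radially symmetric counterpart, as contrasted in Figures 37-39, can be sketched as follows. (Python; the 2/π normalizing constant of the spherical Epanechnikov kernel for d = 2 is computed by hand for this sketch):

```python
import numpy as np

def epan(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def product_kernel(u):
    """Product kernel K(u) = prod_j K(u_j), for u of shape (..., d)."""
    return np.prod(epan(u), axis=-1)

def spherical_epan(u):
    """Radially symmetric Epanechnikov kernel, normalized for d = 2:
    integral of c*(1 - r^2) over the unit disk is c*pi/2, hence c = 2/pi."""
    r = np.linalg.norm(u, axis=-1)
    return (2 / np.pi) * (1 - r**2) * (r <= 1)

# Both integrate to 1 over R^2 (numerical check on a grid)
g = np.linspace(-1, 1, 801)
xx, yy = np.meshgrid(g, g)
pts = np.stack([xx, yy], axis=-1)
dA = (g[1] - g[0]) ** 2
print(product_kernel(pts).sum() * dA, spherical_epan(pts).sum() * dA)
```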

Kernel Properties (multivariate)

K is a density function: ∫_{R^d} K(u) du = 1, K(u) ≥ 0.
K is symmetric: ∫_{R^d} u K(u) du = 0_d.
K has a second moment (matrix): ∫_{R^d} u u^T K(u) du = µ_2(K) I_d, where I_d denotes the d × d identity matrix.
K has a kernel norm: ||K||_2^2 = ∫ K^2(u) du.

Higher Order Kernels

A kernel is of order p if

∫ u^j K(u) du = 0 for j = 1, ..., p - 1,  and  0 ≠ ∫ u^p K(u) du < infinity.

Figure 49: Optimal p-th order kernels for ν-th derivative estimation (R: SPMhokernel)

Optimal and Gauss-type Higher Order Kernels

Order (ν, p)    K(u)
Opt (0, 4)      (15/32)(3 - 10u^2 + 7u^4) I(|u| ≤ 1)
Opt (0, 6)      (35/256)(15 - 105u^2 + 189u^4 - 99u^6) I(|u| ≤ 1)
Opt (1, 3)      (15/4)(u^3 - u) I(|u| ≤ 1)
Opt (1, 5)      (105/32)(-5u + 14u^3 - 9u^5) I(|u| ≤ 1)
Opt (2, 4)      (105/16)(-1 + 6u^2 - 5u^4) I(|u| ≤ 1)
Opt (2, 6)      (315/64)(-5 + 63u^2 - 135u^4 + 77u^6) I(|u| ≤ 1)
Gauss (0, 4)    (1/2)(3 - u^2) φ(u)
Gauss (0, 6)    (1/8)(15 - 10u^2 + u^4) φ(u)
Gauss (0, 8)    (1/48)(105 - 105u^2 + 21u^4 - u^6) φ(u)
Gauss (0, 10)   (1/384)(945 - 1260u^2 + 378u^4 - 36u^6 + u^8) φ(u)

Estimators Using Higher Order Kernels

Kernel estimators with higher order (p > 2) kernels achieve better bias rates (typically: K of order p gives bias ~ h^p). Therefore the asymptotic statistical properties are equivalent to those of local polynomials of corresponding order. As for local polynomials, the optimal choice is to have p - ν > 0 even. Note that this is asymptotic; in practice, performance (especially graphically) may be poor unless the sample size is huge. Higher order kernels can also be applied to density estimation, but they may give negative estimates for some x-values, especially where data are sparse or samples are small.
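The moment conditions defining kernel order can be verified numerically, e.g. for the Gauss-type (0, 4) kernel (1/2)(3 - u^2)φ(u) from the table; its moments of order 1, 2, 3 vanish while the 4th is nonzero:

```python
import numpy as np

def phi(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def gauss04(u):
    """Fourth-order Gauss-type kernel: (1/2)(3 - u^2) phi(u)."""
    return 0.5 * (3 - u**2) * phi(u)

# Riemann sums of u^j K(u) for j = 0, ..., 4 on a wide fine grid
u = np.linspace(-10, 10, 400001)
du = u[1] - u[0]
moments = [np.sum(u**j * gauss04(u)) * du for j in range(5)]
print([round(m, 6) for m in moments])
# approximately [1, 0, 0, 0, -3]: order 4, with nonzero 4th moment
```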

Bandwidth Choice in Kernel Regression

Figure 50: MASE (thick line), bias part (thin solid line) and variance part (thin dashed line) for simulated data (R: SPMsimulmase)
Figure 51: Simulated data with curve m(x) = {sin(2π x^3)}^3, Y_i = m(X_i) + ε_i, X_i ~ U[0, 1], ε_i ~ N(0, 0.1) (R: SPMsimulmase)

Convergence Rates (univariate case)

For a kernel K of order p:

AMSE(h) = (1/(nh)) C_1 + h^{2p} C_2

h_opt = argmin_h AMISE(h)  ==>  h_opt ~ n^{-1/(2p+1)},  AMISE(h_opt) ~ n^{-2p/(2p+1)}

Plug-in methods: calculate pre-estimators of all unknown parts in C_1 and C_2. Most practical: rough approximations should do, e.g. use some parametric higher order polynomials (at least of order two, to obtain also the second derivatives).

Cross Validation

ASE = (1/n) Σ_{j=1}^n { m_h(X_j) - m(X_j) }^2 w(X_j)

MASE = E{ ASE | X_1 = x_1, ..., X_n = x_n }

Estimate MASE by the resubstitution function (w is a weight function):

p(h) = (1/n) Σ_{i=1}^n { Y_i - m_h(X_i) }^2 w(X_i)

Separate estimation and validation by using leave-one-out estimators:

CV(h) = (1/n) Σ_{i=1}^n { Y_i - m_{h,-i}(X_i) }^2 w(X_i)

Minimizing gives h_CV.
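The leave-one-out estimator m_{h,-i} does not require n separate fits: for a kernel smoother it amounts to zeroing the i-th weight. A Python sketch of CV bandwidth selection for the Nadaraya-Watson estimator (w ≡ 1; simulated data and names are illustrative):

```python
import numpy as np

def cv_score(h, X, Y):
    """Leave-one-out CV(h) = (1/n) sum_i (Y_i - m_{h,-i}(X_i))^2, with w = 1."""
    u = (X[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)        # Gaussian kernel; constants cancel in the ratio
    np.fill_diagonal(K, 0.0)       # leave-one-out: drop the own observation
    m_loo = (K @ Y) / K.sum(axis=1)
    return np.mean((Y - m_loo) ** 2)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, 300)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, 300)
grid = np.linspace(0.01, 0.3, 30)
h_cv = grid[np.argmin([cv_score(h, X, Y) for h in grid])]
print(h_cv)  # the cross-validated bandwidth on the grid
```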

Generalized Cross Validation

Because CV easily breaks down or is computationally too intensive for large data sets, generalized cross validation (GCV) has been proposed. It minimizes the residual sum of squares corrected for the degrees of freedom (dof). Unfortunately, several definitions exist, e.g.

n σ^2 / (n - dof) = RSS / (n - dof)   or   n RSS / (n - dof)^2.

Also for the dof we find several definitions. Considering only estimators linear in Y, i.e. m(X) = A Y, typical proposals are trace(A), trace(A A^T), and n - trace(2A - A A^T). Any of these definitions makes sense only for symmetric A. Minimizing gives h_GCV.

Nadaraya-Watson Estimate

Figure 52: Nadaraya-Watson kernel regression, cross-validated bandwidth h_CV = 0.5, U.K. Family Expenditure Survey 1973 (R: SPMnadwaest)

Local Polynomial Estimate

Figure 53: Local polynomial regression, p = 1, cross-validated bandwidth h_CV = 0.56, U.K. Family Expenditure Survey 1973 (R: SPMlocpolyest)
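For a linear smoother m(X) = A Y, the GCV recipe with dof = trace(A) is cheap to evaluate. A Python sketch for Nadaraya-Watson weights (note that A is not symmetric here, so this only illustrates the recipe; data and names are mine):

```python
import numpy as np

def gcv_score(h, X, Y):
    """GCV(h) = n * RSS / (n - trace(A))^2 for the NW smoother matrix A."""
    n = len(X)
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    A = K / K.sum(axis=1, keepdims=True)   # rows of NW weights: m_hat = A @ Y
    resid = Y - A @ Y
    dof = np.trace(A)                      # one common definition of dof
    return n * np.sum(resid**2) / (n - dof) ** 2

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, 300)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, 300)
grid = np.linspace(0.02, 0.3, 29)
h_gcv = grid[np.argmin([gcv_score(h, X, Y) for h in grid])]
print(h_gcv)  # the GCV-selected bandwidth on the grid
```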