Lecture 9: Regression: Regressogram and Kernel Regression

STAT 425: Introduction to Nonparametric Statistics, Winter 2018
Instructor: Yen-Chi Chen
Reference: Chapter 5 of "All of Nonparametric Statistics"

9.1 Introduction

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a bivariate random sample. In regression analysis, we are often interested in the regression function
$$m(x) = \mathbb{E}(Y \mid X = x).$$
Sometimes we will write
$$Y_i = m(X_i) + \epsilon_i,$$
where $\epsilon_i$ is a mean-0 noise. The simple linear regression model assumes that $m(x) = \beta_0 + \beta_1 x$, where $\beta_0$ and $\beta_1$ are the intercept and slope parameters. In this lecture, we will talk about methods that directly estimate the regression function $m(x)$ without imposing any parametric form on $m(x)$.

9.2 Regressogram (Binning)

We start with a very simple but extremely popular method. This method is called the regressogram, but people often call it the binning approach. You can view it as: regressogram = regression + histogram. For simplicity, we assume that the covariates $X_i$ are from a distribution over $[0, 1]$. Similar to the histogram, we first choose $M$, the number of bins. Then we partition the interval $[0, 1]$ into $M$ equal-width bins:
$$B_1 = \left[0, \frac{1}{M}\right),\quad B_2 = \left[\frac{1}{M}, \frac{2}{M}\right),\quad \ldots,\quad B_{M-1} = \left[\frac{M-2}{M}, \frac{M-1}{M}\right),\quad B_M = \left[\frac{M-1}{M}, 1\right].$$
When $x \in B_\ell$, we estimate $m(x)$ by
$$\widehat{m}_M(x) = \frac{\sum_{i=1}^n Y_i\, I(X_i \in B_\ell)}{\sum_{i=1}^n I(X_i \in B_\ell)} = \text{average of the responses whose covariates are in the same bin as } x.$$

Bias. The bias of the regressogram estimator is
$$\mathrm{bias}(\widehat{m}_M(x)) = O\!\left(\frac{1}{M}\right).$$

Variance. The variance of the regressogram estimator is
$$\mathrm{Var}(\widehat{m}_M(x)) = O\!\left(\frac{M}{n}\right).$$

Therefore, the MSE and MISE are at rate
$$\mathrm{MSE} = O\!\left(\frac{1}{M^2}\right) + O\!\left(\frac{M}{n}\right), \qquad \mathrm{MISE} = O\!\left(\frac{1}{M^2}\right) + O\!\left(\frac{M}{n}\right),$$
leading to the optimal number of bins $M \asymp n^{1/3}$ and the optimal convergence rate $O(n^{-2/3})$, the same as the histogram. Similar to the histogram, the regressogram has a slower convergence rate than many other competitors (we will introduce several other candidates). However, both the histogram and the regressogram remain very popular because the construction of the estimator is very simple and intuitive; practitioners with little mathematical training can easily master these approaches.
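
As a concrete illustration of the binning construction, here is a minimal Python sketch of the regressogram; the toy data, the random seed, the bin count, and the helper name regressogram are illustrative choices, not prescriptions from these notes.

    import numpy as np

    def regressogram(x0, X, Y, M=10):
        """Regressogram estimate of m at the points x0 (all values in [0, 1], M equal-width bins)."""
        x0 = np.atleast_1d(x0)
        # Bin index of each query point and of each covariate; clip so that x = 1 lands in the last bin.
        bin_x0 = np.clip((x0 * M).astype(int), 0, M - 1)
        bin_X = np.clip((X * M).astype(int), 0, M - 1)
        mhat = np.full(x0.shape, np.nan)
        for j, b in enumerate(bin_x0):
            in_bin = bin_X == b
            if in_bin.any():
                # Average of the responses whose covariates share the bin with x0[j].
                mhat[j] = Y[in_bin].mean()
        return mhat

    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)  # mean-zero noise
    M = round(n ** (1 / 3))                                    # M of the optimal order n^{1/3}
    print(regressogram(np.linspace(0, 1, 11), X, Y, M=M))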

9.3 Kernel Regression

Given a point $x_0$, assume that we are interested in the value $m(x_0)$. Here is a simple method to estimate that value. When $m$ is smooth, an observation with $X_i \approx x_0$ implies $m(X_i) \approx m(x_0)$. Thus, the response value $Y_i = m(X_i) + \epsilon_i \approx m(x_0) + \epsilon_i$. Using this observation, to reduce the noise $\epsilon_i$ we can use the sample average. Thus, an estimator of $m(x_0)$ is the average of those responses whose covariates are close to $x_0$. To make this concrete, let $h > 0$ be a threshold. The above procedure suggests using
$$\widehat{m}_{\mathrm{loc}}(x_0) = \frac{\sum_{i:\,|X_i - x_0| \le h} Y_i}{n_h(x_0)} = \frac{\sum_{i=1}^n Y_i\, I(|X_i - x_0| \le h)}{\sum_{i=1}^n I(|X_i - x_0| \le h)}, \qquad (9.1)$$
where $n_h(x_0) = \sum_{i=1}^n I(|X_i - x_0| \le h)$ is the number of observations whose covariate satisfies $|X_i - x_0| \le h$. This estimator, $\widehat{m}_{\mathrm{loc}}$, is called the local average estimator. Indeed, to estimate $m(x)$ at any given point $x$, we are using a local average as an estimator.

The local average estimator can be rewritten as
$$\widehat{m}_{\mathrm{loc}}(x_0) = \sum_{i=1}^n \frac{I(|X_i - x_0| \le h)}{\sum_{\ell=1}^n I(|X_\ell - x_0| \le h)}\, Y_i = \sum_{i=1}^n W_i(x_0)\, Y_i, \qquad (9.2)$$
where
$$W_i(x_0) = \frac{I(|X_i - x_0| \le h)}{\sum_{\ell=1}^n I(|X_\ell - x_0| \le h)}$$
is a weight for each observation. Note that $\sum_{i=1}^n W_i(x_0) = 1$ and $W_i(x_0) \ge 0$ for all $i = 1, \ldots, n$; this implies that the $W_i(x_0)$'s are indeed weights. Equation (9.2) shows that the local average estimator can be written as a weighted average, so the $i$-th weight $W_i(x_0)$ determines the contribution of the response $Y_i$ to the estimator $\widehat{m}_{\mathrm{loc}}(x_0)$.

In constructing the local average estimator, we are placing a hard threshold on the neighboring points: those within a distance $h$ are given equal weight, while those outside the threshold are ignored completely. This hard thresholding leads to an estimator that is not continuous. To avoid this problem, we consider another construction of the weights. Ideally, we want to give more weight to the observations that are close to $x_0$, and we want the weights to vary smoothly. The Gaussian function $G(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ seems to be a good candidate. We now use the Gaussian function to construct an estimator. We first construct the weight
$$W_i^G(x_0) = \frac{G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}. \qquad (9.3)$$
The quantity $h > 0$ plays a role similar to the threshold $h$ in the local average, but now it acts as the smoothing bandwidth of the Gaussian. After constructing the weights, our new estimator is
$$\widehat{m}_G(x_0) = \sum_{i=1}^n W_i^G(x_0)\, Y_i = \frac{\sum_{i=1}^n Y_i\, G\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n G\!\left(\frac{x_0 - X_\ell}{h}\right)}. \qquad (9.4)$$
This new estimator has weights that change more smoothly than those of the local average, and it is smooth, as we desire.

Comparing equations (9.1) and (9.4), one may notice that these local estimators are all of a similar form:
$$\widehat{m}_h(x_0) = \frac{\sum_{i=1}^n Y_i\, K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)} = \sum_{i=1}^n W_i(x_0)\, Y_i, \qquad W_i(x_0) = \frac{K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)}, \qquad (9.5)$$
where $K$ is some function. When $K$ is Gaussian, we obtain estimator (9.4); when $K$ is uniform over $[-1, 1]$, we obtain the local average (9.1). The estimator in equation (9.5) is called the kernel regression estimator or Nadaraya-Watson estimator (see https://en.wikipedia.org/wiki/Kernel_regression). The function $K$ plays a role similar to the kernel function in the KDE and thus is also called the kernel function. The quantity $h > 0$ is similar to the smoothing bandwidth in the KDE, so it is also called the smoothing bandwidth.
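
The weighted-average form (9.5) translates directly into a few lines of code. Below is a minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the toy data, the seed, and the bandwidth value $h = 0.1$ are illustrative assumptions, not recommendations from these notes.

    import numpy as np

    def nadaraya_watson(x0, X, Y, h):
        """Nadaraya-Watson estimate (9.5) of m at the points x0, using a Gaussian kernel."""
        x0 = np.atleast_1d(x0)
        # K((x0 - X_i)/h) for every (query point, observation) pair: shape (len(x0), n).
        U = (x0[:, None] - X[None, :]) / h
        Kmat = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)
        W = Kmat / Kmat.sum(axis=1, keepdims=True)  # weights W_i(x0); each row sums to 1
        return W @ Y                                # weighted average of the responses

    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)
    print(nadaraya_watson(np.linspace(0, 1, 11), X, Y, h=0.1))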

9.3.1 Theory

Now we study some statistical properties of the estimator $\widehat{m}_h$. We skip the details of the derivations (if you are interested, check http://www.ssc.wisc.edu/~bhansen/718/NonParametrics2.pdf and http://www.maths.manchester.ac.uk/~peterf/MATH38011/NPR%20N-W%20Estimator.pdf).

Bias. The bias of the kernel regression at a point $x$ is
$$\mathrm{bias}(\widehat{m}_h(x)) = \frac{h^2}{2}\, \mu_K \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right) + o(h^2),$$
where $p(x)$ is the probability density function of the covariates $X_1, \ldots, X_n$ and $\mu_K = \int x^2 K(x)\, dx$ is the same constant of the kernel function as in the KDE. The bias has two components: a curvature component $m''(x)$ and a design component $2\, m'(x)\, p'(x)/p(x)$. The curvature component is similar to the one in the KDE; when the regression function curves a lot, kernel smoothing will smooth out the structure, introducing some bias. The second component, also known as the design bias, is a new component compared to the bias in the KDE. It depends on the density of the covariates, $p(x)$. Note that in some studies we can choose the values of the covariates, so the density $p(x)$ is also called the design (this is why this term is known as the design bias).

Variance. The variance of the estimator is
$$\mathrm{Var}(\widehat{m}_h(x)) = \frac{\sigma^2\, \sigma_K^2}{n\, h\, p(x)} + o\!\left(\frac{1}{nh}\right),$$
where $\sigma^2 = \mathrm{Var}(\epsilon_i)$ is the error variance of the regression model and $\sigma_K^2 = \int K^2(x)\, dx$ is a constant of the kernel function (the same as in the KDE). This expression tells us the possible sources of variance. First, the variance increases when $\sigma^2$ increases. This makes perfect sense because $\sigma^2$ is the noise level; when the noise level is large, we expect the estimation error to increase. Second, the density of the covariates $p(x)$ is inversely related to the variance. This is also very reasonable: when $p(x)$ is large, there tend to be more data points around $x$, increasing the size of the sample that we are averaging over. Last, the variance is of order $O\!\left(\frac{1}{nh}\right)$, the same rate as in the KDE.

MSE and MISE. Using the expressions for the bias and variance, the MSE at a point $x$ is
$$\mathrm{MSE}(\widehat{m}_h(x)) = \frac{h^4}{4}\, \mu_K^2 \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right)^2 + \frac{\sigma^2\, \sigma_K^2}{n\, h\, p(x)} + o(h^4) + o\!\left(\frac{1}{nh}\right)$$
and the MISE is
$$\mathrm{MISE}(\widehat{m}_h) = \frac{h^4}{4}\, \mu_K^2 \int \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right)^2 dx + \frac{\sigma^2\, \sigma_K^2}{nh} \int \frac{dx}{p(x)} + o(h^4) + o\!\left(\frac{1}{nh}\right). \qquad (9.6)$$
Optimizing the major components in equation (9.6) (the AMISE), we obtain the optimal value of the smoothing bandwidth
$$h_{\mathrm{opt}} = C\, n^{-1/5},$$
where $C$ is a constant that depends on the kernel $K$ and on the underlying functions $p$ and $m$ (through the AMISE constants).
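
To see where the $n^{-1/5}$ rate comes from, write the two leading AMISE terms as $A h^4 + \frac{B}{nh}$, with $A = \frac{\mu_K^2}{4} \int \left( m''(x) + 2\, \frac{m'(x)\, p'(x)}{p(x)} \right)^2 dx$ and $B = \sigma^2\, \sigma_K^2 \int \frac{dx}{p(x)}$. Minimizing over $h$,
$$\frac{d}{dh}\left( A h^4 + \frac{B}{nh} \right) = 4 A h^3 - \frac{B}{n h^2} = 0 \quad\Longrightarrow\quad h_{\mathrm{opt}} = \left( \frac{B}{4 A n} \right)^{1/5} = C\, n^{-1/5},$$
and plugging $h_{\mathrm{opt}}$ back in shows that the AMISE is of order $n^{-4/5}$, faster than the regressogram's $n^{-2/3}$ rate.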

9.3.2 Uncertainty and Confidence Intervals

How do we assess the quality of our estimator $\widehat{m}_h(x)$? We can use the bootstrap to do it. In this case, the empirical bootstrap, the residual bootstrap, and the wild bootstrap can all be applied, but note that each of them relies on slightly different assumptions. Let $(X_1^*, Y_1^*), \ldots, (X_n^*, Y_n^*)$ be a bootstrap sample. Applying the bootstrap sample to equation (9.5), we obtain a bootstrap kernel regression, denoted by $\widehat{m}_h^*$. Now repeat the bootstrap procedure $B$ times; this yields
$$\widehat{m}_h^{*(1)}, \ldots, \widehat{m}_h^{*(B)},$$
$B$ bootstrap kernel regression estimators. Then we can estimate the variance of $\widehat{m}_h(x)$ by the sample variance
$$\widehat{\mathrm{Var}}_B(\widehat{m}_h(x)) = \frac{1}{B-1} \sum_{\ell=1}^{B} \left( \widehat{m}_h^{*(\ell)}(x) - \bar{m}^*_{h,B}(x) \right)^2, \qquad \bar{m}^*_{h,B}(x) = \frac{1}{B} \sum_{\ell=1}^{B} \widehat{m}_h^{*(\ell)}(x).$$
Similarly, we can estimate the MSE as we did in Lectures 5 and 6. However, when using the bootstrap to estimate the uncertainty, one has to be very careful: when $h$ is either too small or too large, the bootstrap estimate may fail to converge to its target. When we choose $h = O(n^{-1/5})$, the bootstrap estimate of the variance is consistent, but the bootstrap estimate of the MSE might not be. The main reason is that it is easier for the bootstrap to estimate the variance than the bias. When we choose $h$ in this way, both the bias and the variance contribute substantially to the MSE, so we cannot ignore the bias; but in this case the bootstrap cannot estimate the bias consistently, so the estimate of the MSE is not consistent.

Confidence interval. To construct a confidence interval for $m(x)$, we will use the following property of the kernel regression:
$$\sqrt{nh}\left( \widehat{m}_h(x) - \mathbb{E}(\widehat{m}_h(x)) \right) \overset{D}{\to} N\!\left(0, \frac{\sigma^2\, \sigma_K^2}{p(x)}\right) \quad\Longrightarrow\quad \frac{\widehat{m}_h(x) - \mathbb{E}(\widehat{m}_h(x))}{\sqrt{\mathrm{Var}(\widehat{m}_h(x))}} \overset{D}{\to} N(0, 1).$$
The variance depends on three quantities: $\sigma_K^2$, $\sigma^2$, and $p(x)$. The quantity $\sigma_K^2$ is known because it is just a characteristic of the kernel function. The density of the covariates $p(x)$ can be estimated using a KDE. So what remains unknown is the noise level $\sigma^2$. The good news is that we can estimate it using the residuals. Recall that the residuals are
$$e_i = Y_i - \widehat{Y}_i = Y_i - \widehat{m}_h(X_i).$$
When $\widehat{m}_h \approx m$, the residual $e_i$ becomes an approximation to the noise $\epsilon_i$. The quantity $\sigma^2 = \mathrm{Var}(\epsilon_1)$, so we can use the sample variance of the residuals to estimate it (note that the average of the residuals is approximately 0):
$$\widehat{\sigma}^2 = \frac{1}{n - 2\nu + \tilde{\nu}} \sum_{i=1}^n e_i^2, \qquad (9.7)$$
where $\nu$ and $\tilde{\nu}$ are quantities acting as degrees of freedom, which we will explain later. Thus, a $1-\alpha$ confidence interval can be constructed as
$$\widehat{m}_h(x) \pm z_{1-\alpha/2}\, \frac{\widehat{\sigma}\, \sigma_K}{\sqrt{n\, h\, \widehat{p}_h(x)}},$$
where $\widehat{p}_h(x)$ is the KDE of the covariates.

Bias issue. Note that the limiting distribution above is centered at $\mathbb{E}(\widehat{m}_h(x))$ rather than at $m(x)$, so the interval accounts for the variance but not for the smoothing bias; the bias has to be handled separately (for instance, by undersmoothing).
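
As an illustration of the empirical bootstrap described in this section, here is a minimal sketch that resamples the pairs $(X_i, Y_i)$ with replacement, refits the Nadaraya-Watson estimator on each bootstrap sample, and takes the pointwise sample variance. The toy data, the seed, $B = 500$, and $h = 0.1$ are illustrative assumptions carried over from the earlier sketches.

    import numpy as np

    def nadaraya_watson(x0, X, Y, h):
        U = (np.atleast_1d(x0)[:, None] - X[None, :]) / h
        Kmat = np.exp(-0.5 * U ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
        return (Kmat / Kmat.sum(axis=1, keepdims=True)) @ Y

    rng = np.random.default_rng(0)
    n, h, B = 200, 0.1, 500
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)
    grid = np.linspace(0.05, 0.95, 19)

    # Empirical bootstrap: resample (X_i, Y_i) pairs, refit, and take the pointwise sample variance.
    boot = np.empty((B, grid.size))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        boot[b] = nadaraya_watson(grid, X[idx], Y[idx], h)
    var_hat = boot.var(axis=0, ddof=1)
    print(np.sqrt(var_hat))  # bootstrap standard error of m_hat(x) on the grid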

9.3.3 Resampling Techniques

Cross-validation
Bootstrap approach

Reference: http://faculty.washington.edu/yenchic/17Sp_403/Lec8-NPreg.pdf

9.3.4 Relation to the KDE

Many theoretical results for the KDE apply to nonparametric regression. For instance, we can generalize the MISE to other types of error measurement between $\widehat{m}_h$ and $m$. We can also use derivatives of $\widehat{m}_h$ as estimators of the corresponding derivatives of $m$. Moreover, when we have a multivariate covariate, we can use either a radial basis kernel or a product kernel to generalize the kernel regression to the multivariate case.

The KDE and the kernel regression have a very interesting relationship. Using the given bivariate random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, we can estimate the joint PDF $p(x, y)$ by
$$\widehat{p}_h(x, y) = \frac{1}{n h^2} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) K\!\left(\frac{Y_i - y}{h}\right).$$
This joint density estimator also leads to a marginal density estimator of $X$:
$$\widehat{p}_h(x) = \int \widehat{p}_h(x, y)\, dy = \frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right).$$
Now recall that the regression function is the conditional expectation
$$m(x) = \mathbb{E}(Y \mid X = x) = \int y\, p(y \mid x)\, dy = \int y\, \frac{p(x, y)}{p(x)}\, dy = \frac{\int y\, p(x, y)\, dy}{p(x)}.$$
Replacing $p(x, y)$ and $p(x)$ by their corresponding estimators $\widehat{p}_h(x, y)$ and $\widehat{p}_h(x)$, we obtain an estimate of $m(x)$:
$$\widehat{m}(x) = \frac{\int y\, \widehat{p}_h(x, y)\, dy}{\widehat{p}_h(x)} = \frac{\frac{1}{nh^2} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) \int y\, K\!\left(\frac{Y_i - y}{h}\right) dy}{\frac{1}{nh} \sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)} = \frac{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right) Y_i}{\sum_{i=1}^n K\!\left(\frac{X_i - x}{h}\right)} = \widehat{m}_h(x).$$
Note that the second-to-last equality uses the fact that when $K(x)$ is symmetric, $\frac{1}{h} \int y\, K\!\left(\frac{Y_i - y}{h}\right) dy = Y_i$. Namely, we may understand the kernel regression as an estimator obtained by inverting the KDE of the joint PDF into a regression estimator.
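
The identity above is easy to check numerically: integrating $y\, \widehat{p}_h(x_0, y)$ over a fine grid of $y$ values and dividing by $\widehat{p}_h(x_0)$ should reproduce the Nadaraya-Watson value at $x_0$. The sketch below does this check with a Gaussian kernel; the evaluation point, grid resolution, toy data, and bandwidth are illustrative assumptions.

    import numpy as np

    def gauss(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

    rng = np.random.default_rng(0)
    n, h = 200, 0.1
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

    x0 = 0.3                                           # evaluation point
    ygrid = np.linspace(Y.min() - 1, Y.max() + 1, 2001)
    dy = ygrid[1] - ygrid[0]

    # Joint KDE p_hat(x0, y) on the y-grid and marginal KDE p_hat(x0).
    p_joint = (gauss((X[:, None] - x0) / h) * gauss((Y[:, None] - ygrid[None, :]) / h)).sum(axis=0) / (n * h ** 2)
    p_marg = gauss((X - x0) / h).sum() / (n * h)

    # m_hat(x0) via the ratio of (numerically integrated) KDEs vs. the Nadaraya-Watson formula.
    m_via_kde = (ygrid * p_joint).sum() * dy / p_marg
    m_nw = (gauss((X - x0) / h) * Y).sum() / gauss((X - x0) / h).sum()
    print(m_via_kde, m_nw)  # the two numbers agree up to the numerical integration error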

9.4 Linear Smoother

Now we are going to introduce a very important notion called the linear smoother. Linear smoothers form a collection of regression estimators with nice properties. A linear smoother is an estimator of the regression function of the form
$$\widehat{m}(x) = \sum_{i=1}^n \ell_i(x)\, Y_i, \qquad (9.8)$$
where $\ell_i(x)$ is some function depending on $X_1, \ldots, X_n$ but not on any of $Y_1, \ldots, Y_n$. The residual of the $j$-th observation can be written as
$$e_j = Y_j - \widehat{m}(X_j) = Y_j - \sum_{i=1}^n \ell_i(X_j)\, Y_i.$$
Let $e = (e_1, \ldots, e_n)^T$ be the vector of residuals and define the matrix $L$ with entries $L_{ij} = \ell_j(X_i)$:
$$L = \begin{pmatrix} \ell_1(X_1) & \ell_2(X_1) & \ell_3(X_1) & \cdots & \ell_n(X_1) \\ \ell_1(X_2) & \ell_2(X_2) & \ell_3(X_2) & \cdots & \ell_n(X_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ell_1(X_n) & \ell_2(X_n) & \ell_3(X_n) & \cdots & \ell_n(X_n) \end{pmatrix}.$$
Then the vector of predicted values is $\widehat{Y} = (\widehat{Y}_1, \ldots, \widehat{Y}_n)^T = LY$, where $Y = (Y_1, \ldots, Y_n)^T$ is the vector of observed $Y_i$'s, and
$$e = Y - \widehat{Y} = Y - LY = (I - L)\, Y.$$

Example: Linear Regression. For linear regression, let $\mathbf{X}$ denote the data matrix (the first column is all 1's and the second column is $X_1, \ldots, X_n$). We know that $\widehat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T Y$ and $\widehat{Y} = \mathbf{X}\widehat{\beta} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T Y$. This implies that the matrix $L$ is
$$L = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T,$$
which is also the projection matrix in linear regression. Thus, linear regression is a linear smoother.

Example: Regressogram. The regressogram is also a linear smoother. Let $B_1, \ldots, B_M$ be the bins of the covariate and let $B(x)$ denote the bin that $x$ belongs to. Then
$$\ell_j(x) = \frac{I(X_j \in B(x))}{\sum_{i=1}^n I(X_i \in B(x))}.$$

Example: Kernel Regression. As you may expect, the kernel regression is also a linear smoother. Recall from equation (9.5) that
$$\widehat{m}_h(x_0) = \sum_{i=1}^n W_i(x_0)\, Y_i, \qquad W_i(x_0) = \frac{K\!\left(\frac{x_0 - X_i}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x_0 - X_\ell}{h}\right)},$$
so
$$\ell_j(x) = \frac{K\!\left(\frac{x - X_j}{h}\right)}{\sum_{\ell=1}^n K\!\left(\frac{x - X_\ell}{h}\right)}.$$
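
Here is a minimal sketch of the smoothing matrix $L$ for the Nadaraya-Watson smoother, checking that $\widehat{Y} = LY$ and that each row of $L$ sums to 1; the toy data, the seed, and the bandwidth $h = 0.1$ are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, h = 200, 0.1
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

    # Smoothing matrix: L[i, j] = l_j(X_i) = K((X_i - X_j)/h) / sum_l K((X_i - X_l)/h), Gaussian K.
    Kmat = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    L = Kmat / Kmat.sum(axis=1, keepdims=True)

    Y_hat = L @ Y                            # fitted values at the observed covariates
    residuals = Y - Y_hat                    # e = (I - L) Y
    print(np.allclose(L.sum(axis=1), 1.0))   # each row of L sums to 1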

9.4.1 Variance of a Linear Smoother

The linear smoother has an (approximately) unbiased estimator of the underlying noise level $\sigma^2$. Recall that the noise level is $\sigma^2 = \mathrm{Var}(\epsilon_i)$. We need two tricks about variances and covariance matrices. First, for a fixed matrix $A$ and a random vector $X$,
$$\mathrm{Cov}(AX) = A\, \mathrm{Cov}(X)\, A^T.$$
Thus, the covariance matrix of the residual vector is
$$\mathrm{Cov}(e) = \mathrm{Cov}\big((I - L)\, Y\big) = (I - L)\, \mathrm{Cov}(Y)\, (I - L)^T.$$
Second, because $Y_1, \ldots, Y_n$ are IID, $\mathrm{Cov}(Y) = \sigma^2 I$, where $I$ is the identity matrix. This implies
$$\mathrm{Cov}(e) = (I - L)\, \mathrm{Cov}(Y)\, (I - L)^T = \sigma^2 \left( I - L - L^T + LL^T \right).$$
Now, taking the matrix trace on both sides,
$$\mathrm{Tr}(\mathrm{Cov}(e)) = \sum_{i=1}^n \mathrm{Var}(e_i) = \sigma^2\, \mathrm{Tr}\!\left( I - L - L^T + LL^T \right) = \sigma^2 \left( n - 2\nu + \tilde{\nu} \right),$$
where $\nu = \mathrm{Tr}(L)$ and $\tilde{\nu} = \mathrm{Tr}(LL^T)$. Because the squared residual is approximately $\mathrm{Var}(e_i)$ (the residuals have mean approximately 0), we have
$$\sum_{i=1}^n e_i^2 \approx \sum_{i=1}^n \mathrm{Var}(e_i) = \sigma^2 \left( n - 2\nu + \tilde{\nu} \right).$$
Thus, we can estimate $\sigma^2$ by
$$\widehat{\sigma}^2 = \frac{1}{n - 2\nu + \tilde{\nu}} \sum_{i=1}^n e_i^2, \qquad (9.9)$$
which is what we did in equation (9.7). The quantity $\nu$ is called the degrees of freedom. In the linear regression case, $\nu = \tilde{\nu} = p + 1$, where $p$ is the number of covariates, so the variance estimator becomes $\widehat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^n e_i^2$. If you have learned the variance estimator of linear regression, you should be familiar with this estimator. The degrees of freedom $\nu$ is easy to interpret in linear regression, and the power of equation (9.9) is that it works for every linear smoother as long as the errors $\epsilon_i$ are IID. This shows how we can define an effective degrees of freedom for other, more complicated regression estimators.
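
Continuing the smoothing-matrix sketch from the previous section, the effective degrees of freedom and the residual-based variance estimate (9.9) take only a few more lines. The toy data and seed are the same illustrative assumptions as before; the bandwidth here is set to a smallish $h = 0.05$ so that the smoothing bias does not inflate the estimate too much, and since the simulated noise standard deviation is 0.3, $\widehat{\sigma}$ should come out near that value.

    import numpy as np

    rng = np.random.default_rng(0)
    n, h = 200, 0.05
    X = rng.uniform(0, 1, n)
    Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.3, size=n)

    Kmat = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2) / np.sqrt(2 * np.pi)
    L = Kmat / Kmat.sum(axis=1, keepdims=True)          # smoothing matrix of the kernel smoother

    e = Y - L @ Y                                       # residuals e = (I - L) Y
    nu = np.trace(L)                                    # nu = Tr(L)
    nu_tilde = np.trace(L @ L.T)                        # nu~ = Tr(L L^T)
    sigma2_hat = (e ** 2).sum() / (n - 2 * nu + nu_tilde)   # equation (9.9)
    print(nu, nu_tilde, np.sqrt(sigma2_hat))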