ECON 3150/4150, Spring term 2014. Lecture 3


Ragnar Nymoen, University of Oslo. 21 January 2014.

Outline: Introduction; Finding the best fit by regression; Residuals and R-squared; Regression and causality; Summary and next step.

References for Lectures 3 and 4

Stock and Watson (SW): Ch 3.7 and Ch 4 (main exposition) and Ch 17 (technical exposition; the level matches Ch 2 and Ch 3). Bårdsen and Nymoen (BN): Kap 2, 3 and Kap 5.1-5.8.

Introduction

It is customary to motivate regression, and in particular the estimation method Ordinary Least Squares (OLS), by setting the finding of the best-fitting line in a scatter plot as the purpose of econometric modelling. There is nothing wrong in this, but it should not be taken too far: goodness of fit is only one aspect of building a relevant econometric model. Model parsimony (explaining a phenomenon by simple models), theory consistency, and a relevant representation of counterfactuals to allow causal analysis are examples of model features that are just as important as goodness of fit. After this caveat, we start by presenting the main ideas behind OLS estimation in terms of finding the best-fitting line in a scatter plot of data points.

Finding the best fit by regression

Basic ideas: scatter plot and least squares fit

[Figure: scatter plot of Y against X together with the least squares fitted line.]

Which line is best? Idea: minimize the sum of squared errors! But which errors?

[Figure: the scatter plot of Y against X, with candidate lines to place through the points.]

Which squared error? For a data point $(X_i, Y_i)$ there are three candidates:

1. the least vertical distance to the line;
2. the least horizontal distance to the line;
3. the shortest distance to the line.

[Figure: a data point $(X_i, Y_i)$ and the three distances (1, 2, 3) to the line.]

We choose 1 when we want to minimize the squared errors from predicting $Y_i$ linearly from $X_i$. Residual: $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the predicted value.

[Figure: the vertical distance between the data point $(X_i, Y_i)$ and the fitted value $\hat{Y}_i$ on the line.]

[Figure: regression line and prediction errors (projections).]

Least squares algebra

Ordinary least squares (OLS) estimates I

The different lines that we considered placing in the scatter plot correspond to different values of the parameters $\beta_0$ and $\beta_1$ in the linear function that connects the given numbers $X_1, X_2, \ldots, X_n$ with $Y_1^{fitted}, Y_2^{fitted}, \ldots, Y_n^{fitted}$:

$Y_i^{fitted} = \beta_0 + \beta_1 X_i, \quad i = 1, 2, \ldots, n$

We obtain the best fit $Y_i^{fitted} \equiv \hat{Y}_i$ $(i = 1, 2, \ldots, n)$,

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \quad i = 1, 2, \ldots, n \quad (1)$

by finding the estimates of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals $\sum_{i=1}^{n} (Y_i - Y_i^{fitted})^2$:

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \quad (2)$

Ordinary least squares (OLS) estimates II

Consequently $\hat{\beta}_0$ and $\hat{\beta}_1$ are determined by the first order conditions:

$\bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X} = 0 \quad (3)$

$\sum_{i=1}^{n} X_i Y_i - \hat{\beta}_0 \sum_{i=1}^{n} X_i - \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 = 0 \quad (4)$

where

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad (5)$

is the sample mean (empirical mean) of X. It is expected that you can solve the simultaneous equation system (3)-(4). See Question C in the first exercise set!

A trick and a simplified derivation I

The trick is to note that

$\beta_0 + \beta_1 X_i \equiv \alpha + \beta_1 (X_i - \bar{X}) \quad (6)$

when the intercept parameter $\alpha$ is defined as

$\alpha \equiv \beta_0 + \beta_1 \bar{X} \quad (7)$

This means that the best prediction $\hat{Y}_i$ given $X_i$ can be written as

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \equiv \hat{\alpha} + \hat{\beta}_1 (X_i - \bar{X}), \quad \text{where } \hat{\alpha} \equiv \hat{\beta}_0 + \hat{\beta}_1 \bar{X} \quad (8)$

and we therefore choose the $\alpha$ and $\beta_1$ that minimize

$S(\alpha, \beta_1) = \sum_{i=1}^{n} [Y_i - \alpha - \beta_1 (X_i - \bar{X})]^2 \quad (9)$

A trick and a simplified derivation II

Calculate the two partial derivatives (the chain rule applied to each element in the sums):

$\frac{\partial S(\alpha, \beta_1)}{\partial \alpha} = 2 \sum_{i=1}^{n} [Y_i - \alpha - \beta_1 (X_i - \bar{X})] \, (-1)$

$\frac{\partial S(\alpha, \beta_1)}{\partial \beta_1} = 2 \sum_{i=1}^{n} [Y_i - \alpha - \beta_1 (X_i - \bar{X})] \, (-(X_i - \bar{X}))$

and choose $\hat{\alpha}$ and $\hat{\beta}_1$ as the solutions of

$\sum_{i=1}^{n} [Y_i - \hat{\alpha} - \hat{\beta}_1 (X_i - \bar{X})] \, (-1) = 0 \quad (10)$

$\sum_{i=1}^{n} [Y_i - \hat{\alpha} - \hat{\beta}_1 (X_i - \bar{X})] \, (X_i - \bar{X}) = 0 \quad (11)$

A trick and a simplified derivation III

These simplify to

$\hat{\alpha} - \bar{Y} = 0 \quad (12)$

$\sum_{i=1}^{n} (X_i - \bar{X}) Y_i - \hat{\beta}_1 \sum_{i=1}^{n} (X_i - \bar{X})^2 = 0 \quad (13)$

where

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \quad (14)$

is the empirical mean of Y. Another DIY exercise: show that (10) gives (12), that (11) gives (13), and that the solutions of (12) and (13) are

$\hat{\alpha} = \bar{Y} \quad (15)$

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad (16)$

A trick and a simplified derivation IV

Note that for (16) to make sense, we need to assume $\sum_{i=1}^{n} (X_i - \bar{X})^2 > 0$ (i.e., X is a variable, not a constant). A generalization of this will be important later, and is then called absence of perfect multicollinearity. To obtain $\hat{\beta}_0$ we simply use

$\hat{\beta}_0 = \hat{\alpha} - \hat{\beta}_1 \bar{X} \quad (17)$
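As a concrete illustration (not part of the original slides), here is a minimal Python sketch that computes (15)-(17) directly on simulated data and cross-checks the result against numpy's built-in least squares fit. The data-generating numbers are invented purely for the example.

    import numpy as np

    # Simulated data, invented purely for illustration
    rng = np.random.default_rng(0)
    n = 100
    X = rng.uniform(100, 800, size=n)
    Y = 50 + 0.7 * X + rng.normal(scale=40, size=n)

    Xbar, Ybar = X.mean(), Y.mean()
    beta1_hat = np.sum((X - Xbar) * Y) / np.sum((X - Xbar) ** 2)  # (16)
    alpha_hat = Ybar                                              # (15)
    beta0_hat = alpha_hat - beta1_hat * Xbar                      # (17)

    # Cross-check against numpy's least squares polynomial fit of degree 1
    slope, intercept = np.polyfit(X, Y, deg=1)
    assert np.allclose([beta1_hat, beta0_hat], [slope, intercept])
    print(beta0_hat, beta1_hat)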

Residuals and R-squared

Residuals and total sum of squares I

Definition of OLS residuals:

$\hat{\varepsilon}_i = Y_i - \hat{Y}_i, \quad i = 1, 2, \ldots, n \quad (18)$

where we deviate from the S&W notation, which uses $\hat{u}_i$ for the residual. Using this definition in the first order conditions (10) and (13) gives

$\sum_{i=1}^{n} \hat{\varepsilon}_i = 0 \quad (19)$

$\sum_{i=1}^{n} \hat{\varepsilon}_i (X_i - \bar{X}) = 0 \quad (20)$

Residuals and total sum of squares II

$\sum_{i=1}^{n} \hat{\varepsilon}_i = 0 \;\Rightarrow\; \bar{\hat{\varepsilon}} = \frac{1}{n} \sum_{i=1}^{n} \hat{\varepsilon}_i = 0 \quad (21)$

$\sum_{i=1}^{n} \hat{\varepsilon}_i (X_i - \bar{X}) = 0 \;\Rightarrow\; \hat{\sigma}_{\varepsilon X} = \frac{1}{n} \sum_{i=1}^{n} (\hat{\varepsilon}_i - \bar{\hat{\varepsilon}})(X_i - \bar{X}) = 0 \quad (22)$

where $\hat{\sigma}_{\varepsilon X}$ denotes the (empirical) covariance between the residuals and the explanatory variable. These properties always hold when we include the intercept ($\beta_0$ or $\alpha$) in the model. They generalize to the case of multiple regression, as we shall see later. (22) is an orthogonality condition: it says that the OLS residuals are uncorrelated with the explanatory variable.
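A quick numerical check of (19)-(22), again on simulated data (an illustration assuming numpy, not part of the slides):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    X = rng.normal(size=n)
    Y = 1.0 + 2.0 * X + rng.normal(size=n)

    # OLS fit via (15)-(17)
    b1 = np.sum((X - X.mean()) * Y) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)

    print(np.isclose(resid.sum(), 0.0))                     # (19)/(21): residuals sum to zero
    print(np.isclose(np.sum(resid * (X - X.mean())), 0.0))  # (20): orthogonality
    print(np.isclose(np.cov(resid, X)[0, 1], 0.0))          # (22): zero empirical covariance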

Residuals and total sum of squares III

$\hat{\sigma}_{\varepsilon X} = 0$ occurs because we have defined the OLS residuals in such a way that they measure what is left unexplained in Y when we have extracted all the explanatory power of X.

Total Sum of Squares and Residual Sum of Squares I

We define the Total Sum of Squares for Y as

$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \quad (23)$

We can guess that TSS can be split into the Explained Sum of Squares

$ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2 \quad (24)$

and the Residual Sum of Squares

$RSS = \sum_{i=1}^{n} (\hat{\varepsilon}_i - \bar{\hat{\varepsilon}})^2 = SSR \quad (25)$

Total Sum of Squares and Residual Sum of Squares II

SSR denotes Sum of Squared Residuals; RSS and SSR are both used.

$TSS = ESS + RSS \quad (26)$

To show this important decomposition, start with

$Y_i - \bar{Y} = \underbrace{(Y_i - \hat{Y}_i)}_{\hat{\varepsilon}_i} + (\hat{Y}_i - \bar{\hat{Y}})$

where we have used that $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \frac{1}{n} \sum_{i=1}^{n} (\hat{\varepsilon}_i + \hat{Y}_i) = \bar{\hat{Y}}$ because of (19). Squaring both sides and summing gives

Total Sum of Squares and Residual Sum of Squares III

$\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{TSS} = RSS + 2 \sum_{i=1}^{n} \hat{\varepsilon}_i (\hat{Y}_i - \bar{\hat{Y}}) + ESS$

Expand the middle term:

$\sum_{i=1}^{n} \hat{\varepsilon}_i (\hat{Y}_i - \bar{\hat{Y}}) = \sum_{i=1}^{n} \hat{\varepsilon}_i \left( \hat{\alpha} + \hat{\beta}_1 (X_i - \bar{X}) - \bar{\hat{Y}} \right) = \hat{\alpha} \underbrace{\sum_{i=1}^{n} \hat{\varepsilon}_i}_{=0 \text{ by } (19)} + \hat{\beta}_1 \underbrace{\sum_{i=1}^{n} \hat{\varepsilon}_i (X_i - \bar{X})}_{=0 \text{ by } (20)} - \bar{\hat{Y}} \underbrace{\sum_{i=1}^{n} \hat{\varepsilon}_i}_{=0 \text{ by } (19)}$

Total Sum of Squares and Residual Sum of Squares IV

Therefore

$\sum_{i=1}^{n} \hat{\varepsilon}_i (\hat{Y}_i - \bar{\hat{Y}}) = 0$

The residuals are uncorrelated with the predictions $\hat{Y}_i$. Could it be different? Hence we have the desired result:

$TSS = ESS + RSS \quad (27)$
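The decomposition (27) is easy to verify numerically; a small sketch with invented data (assuming numpy):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.uniform(0, 10, size=50)
    Y = 3.0 + 1.5 * X + rng.normal(size=50)

    b1 = np.sum((X - X.mean()) * Y) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Yhat = b0 + b1 * X

    TSS = np.sum((Y - Y.mean()) ** 2)
    ESS = np.sum((Yhat - Yhat.mean()) ** 2)
    RSS = np.sum((Y - Yhat) ** 2)
    print(np.isclose(TSS, ESS + RSS))  # True: the cross term vanishes by (19)-(20)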

The coefficient of determination I

To summarize the goodness of fit in the form of a single number, the coefficient of determination, almost everywhere denoted $R^2$, is used:

$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \text{share of unexplained variation in } Y \quad (28)$

If $\hat{\beta}_1 = 0$, then $RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\alpha})^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = TSS$, and $R^2 = 0$. If $RSS = 0$ (a perfect fit), then $R^2 = 1$.

The coefficient of determination II

Hence we have the property

$0 \leq R^2 \leq 1 \quad (29)$

These results depend on defining the regression function as in (1). If we instead use

$\hat{Y}_i^{o} = \hat{\beta}_1^{o} X_i$

which forces the regression line through the origin, then the corresponding residuals do not sum to zero, the decomposition of TSS breaks down, and $R^2$ (as defined above) can be negative! Work with Question D in the first exercise set!
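To see how a negative $R^2$ can arise, consider this small sketch (invented numbers, assuming numpy): the data have a large intercept, so a line forced through the origin fits far worse than the horizontal line at $\bar{Y}$.

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = 100.0 - X  # large intercept, mild negative slope

    b1_origin = np.sum(X * Y) / np.sum(X ** 2)  # OLS slope with no intercept term
    resid = Y - b1_origin * X

    RSS = np.sum(resid ** 2)
    TSS = np.sum((Y - Y.mean()) ** 2)
    print(resid.sum())    # far from zero: (19) fails without an intercept
    print(1 - RSS / TSS)  # about -908: "R^2" computed as 1 - RSS/TSS is negative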

Regression and correlation I

We define the empirical correlation coefficient between X and Y as

$r_{X,Y} \equiv \frac{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \equiv \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y} \quad (30)$

$\hat{\sigma}_{XY}$ denotes the empirical covariance between Y and X; SW uses $s_{XY}$. $\hat{\sigma}_X$ and $\hat{\sigma}_Y$ denote the two empirical standard deviations; SW uses $s_X$ and $s_Y$. They are square roots of the empirical variances, e.g., $\hat{\sigma}_X = \sqrt{\hat{\sigma}_X^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$.

Regression and correlation II

Note: dividing by $n$ or $n-1$ is not really important (but it is best to stick to one convention). $\hat{\sigma}_{X,Y}$ can be written in three equivalent ways:

$\hat{\sigma}_{X,Y} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y}) X_i = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}) Y_i$

Regression and correlation III

The regression coefficient can therefore be re-expressed as

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\hat{\sigma}_{X,Y}}{\hat{\sigma}_X^2} = \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} \cdot \frac{\hat{\sigma}_{X,Y}}{\hat{\sigma}_X \hat{\sigma}_Y} = \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} \, r_{X,Y} \quad (31)$

This shows that $r_{X,Y} \neq 0$ is necessary for $\hat{\beta}_1 \neq 0$: correlation is necessary for finding regression relationships. Still, $\hat{\beta}_1 \neq r_{X,Y}$ (in general), and regression analysis is different from correlation analysis.
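A numerical check of (31), on simulated data (assuming numpy; not part of the slides):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=500)
    Y = 0.5 + 1.2 * X + rng.normal(size=500)

    b1 = np.sum((X - X.mean()) * Y) / np.sum((X - X.mean()) ** 2)
    r = np.corrcoef(X, Y)[0, 1]
    # (31): the OLS slope equals (sigma_Y / sigma_X) * r_{X,Y}
    print(np.isclose(b1, (Y.std(ddof=1) / X.std(ddof=1)) * r))  # True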

Regression and causality

Regression and causality I

There are three possible theoretical causal relationships between X and Y. Our regression is causal if I is true, and II (joint causality) and III are not true. $r_{X,Y} \neq 0$ in all three cases. It can also be that a third variable (Z) causes both Y and X (spurious correlation).

[Figure: three causal diagrams, I, II and III, relating X and Y.]

Causal interpretation of regression analysis I

Regression analysis can refute a causal relationship, since correlation is necessary for causality. But we cannot confirm or discover a causal relationship by statistical analysis (such as regression) alone. We need to supplement the analysis by theory and by interpretation of natural experiments or quasi-experiments; see page 126 and the text box on page 131 in SW. We will see several examples later in the course.

Causal interpretation of regression analysis II

In time series analysis, the central concept is autonomy of regression parameters with respect to changes in policy variables. The concept is developed in ECON 4160, but for those interested, Kap. 2.4 in BN gives an introduction to this line of thinking about correlation and causality.

Summary and next step

In this lecture we have learnt about the method of ordinary least squares (OLS) to fit a straight line to a scatter plot of numbers (data points). The concepts of random variables and the statistical model, which were central in Lectures 1 and 2, have not even been mentioned! In Lecture 4 we start to bridge that gap by introducing the regression model. Note also the limitation of fitting the straight line: many scatter plots do not even resemble a linear relationship (see Figure 3.3 in BN, and the Phillips curve examples in Kap 3 in BN). Luckily, the OLS method can be used in many such cases; the point is that the conditional expectation function need not be linear. Hence: several reasons to bring the statistical model back into the story, and in particular the conditional expectation function!