3. Multiple Regression Analysis: Estimation. Read Wooldridge (2013), Chapter 3.


Outline. 3. Multiple Regression Analysis: Estimation. I. Motivation. II. Mechanics and Interpretation of OLS. III. Expected Values of the OLS. IV. Variances of the OLS. V. The Gauss-Markov Theorem. Read Wooldridge (2013), Chapter 3.

I. Motivation: Zero Conditional Mean

Example. A drawback of the SLR model: it may not be realistic to conclude that only x affects y when there are other factors that affect y. Example: y = wage and x = educ.

SLR: wage = β0 + β1 educ + u
MRA: wage = β0 + β1 educ + β2 exper + u, with E(u | educ, exper) = 0

Here u captures the other factors (e.g., innate ability, etc.).

Some good points of multiple regression analysis (MRA):
1) MRA can control for many factors.
2) More of the variation in y can be explained.
3) MRA can accommodate fairly general functional-form relationships.

How is u related to educ and exper? Given the values of educ and exper, the average value of innate ability is zero: E(u | educ, exper) = 0.

General Form: k Explanatory Variables

A model with k explanatory variables: y = β0 + β1x1 + β2x2 + … + βkxk + u. β0 is the intercept; β1, β2, …, βk are slope parameters. What is u?

Consider the model yi = β0 + β1xi1 + β2xi2 + ui. The first subscript i is the observation number; the second subscript indexes the independent variable (1 or 2).

Zero conditional mean assumption: E(u | x1, …, xk) = 0. This implies that u is uncorrelated with each of the independent variables x1, …, xk. We can find the OLS regression line, or SRF: ŷ = β̂0 + β̂1x1 + β̂2x2.

Interpretation: Regression Analysis

Totally differentiate the equation. Bivariate: wage = β0 + β1 educ + u; β1 is the marginal effect of education on wage. With ŷ = β̂0 + β̂1x1 + β̂2x2, the ceteris paribus interpretation of β̂1 is obtained by letting Δx2 = 0: when x1 increases by one unit, ŷ changes by β̂1 units, controlling for the variable x2.

Multiple: wage = β0 + β1 educ + β2 exper + u. β1: the ceteris paribus effect of education on average hourly wage. β2: the ceteris paribus effect of experience on average hourly wage. What is ceteris paribus? Holding all other factors fixed.

General Model: k Regressors

Consider the model with k regressors: y = β0 + β1x1 + β2x2 + … + βkxk + u. We can find the OLS regression line, or SRF: ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk.

Ceteris paribus interpretation of β̂1: let Δx2 = 0, …, Δxk = 0, so that Δŷ = β̂1 Δx1. When x1 increases by one unit, ŷ changes by β̂1 units, holding x2, x3, …, xk fixed.

Example: Regress log(wage) on educ, exper, and tenure. Consider the model log(wage) = β0 + β1 educ + β2 exper + β3 tenure + u. From estimation:

log(wage)-hat = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
                (0.104) (0.0073)    (0.0017)       (0.0031)
n = 526; R² = 0.316

Interpret β̂1 = 0.092. What if exper and tenure each increase by one year?

Sample: 1 526. Included observations: 526.

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.28436       0.10419      2.72923       0.0066
EDUC       0.092029      0.00733      12.5555       0.0000
EXPER      0.004121      0.001723     2.391437      0.0171
TENURE     0.022067      0.003094     7.133072      0.0000

R-squared 0.316013; Adjusted R-squared 0.312082; S.E. of regression 0.440862; Sum squared resid 101.4556; F-statistic 80.3909; Prob(F-statistic) 0.0000; Mean dependent var 1.62368; S.D. dependent var 0.531538; Akaike info criterion 1.207406; Schwarz criterion 1.23984; Log likelihood -313.5482; Durbin-Watson stat 1.768805.
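A quick arithmetic check of the question above, using the estimated coefficients from the wage equation (a small Python sketch; the original notes contain no code):

```python
# Coefficients taken from the estimated equation:
# log(wage)-hat = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
b_exper, b_tenure = 0.0041, 0.022

# exper and tenure each increase by one year, educ held fixed
dlog_wage = b_exper * 1 + b_tenure * 1
print(round(dlog_wage, 4))   # 0.0261
```

Since the dependent variable is log(wage), the predicted change of 0.0261 in log(wage) corresponds to roughly a 2.6% increase in the wage.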

Method of OLS Estimation

Consider the population model with k regressors: yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui. The first subscript i is the observation number or individual (i = 1, …, n); the second subscript indexes the regressor (j = 1, …, k).

Sample counterpart for observation i: yi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂kxik + ûi. We can find the OLS regression line, or SRF: ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk.

Methods to find the estimates: (1) the method of moments, and (2) the method of (ordinary) least squares. OLS chooses the estimates to minimize the sum of squared residuals:

Σi ûi² = Σi (yi − β̂0 − β̂1xi1 − β̂2xi2 − … − β̂kxik)²

The first-order conditions give k + 1 equations in k + 1 unknowns.

Algebraic Properties of OLS Statistics

If ûi > 0, the regression line underpredicts yi; if ûi < 0, it overpredicts yi.
1) The sample average of the residuals is zero (Σi ûi = 0).
2) The sample covariance between the residuals and each regressor is zero (Σi xij ûi = 0).
3) The sample averages of the dependent variable and the regressors (ȳ and the x̄j) lie on the OLS regression line.

Partialling-Out Interpretation

β̂1 is the effect on y of x1 holding x2, …, xk constant; equivalently, the effect on y of x1 after taking out the effects of x2, …, xk. Define r̂i1 as the residuals from regressing x1 on x2, …, xk. Then

β̂1 = Σi r̂i1 yi / Σi r̂i1²

Can you find β̂2, …, β̂k?
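The partialling-out formula can be checked numerically. The following sketch (NumPy, simulated data; not part of the original slides) regresses x1 on the other regressors, then applies the residual formula, and confirms the result matches the coefficient from the full multiple regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full OLS: regress y on an intercept, x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: residuals r1 from regressing x1 on the other regressors
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

beta1_fw = (r1 @ y) / (r1 @ r1)             # beta1-hat = sum(r1*y) / sum(r1^2)
print(beta[1], beta1_fw)                    # the two estimates coincide
```

This is the Frisch-Waugh result: the multiple-regression coefficient on x1 equals the simple-regression coefficient of y on the part of x1 not explained by the other regressors.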

Goodness-of-Fit: R-Squared Revisited

We can think of each observation as being made up of an explained part and an unexplained part: yi = ŷi + ûi. We then define:

SST = Σi (yi − ȳ)²    (the total sum of squares)
SSE = Σi (ŷi − ȳ)²    (the explained sum of squares)
SSR = Σi ûi²          (the residual sum of squares)

Then SST = SSE + SSR.

Definition: R² = SSE/SST = 1 − SSR/SST. Interpretation: R² is the proportion of the variation in y explained by the explanatory variables (the OLS regression line).

Words of caution. Algebraic fact: R² never decreases when any variable is added to a regression. Thus it is a poor tool for deciding whether a variable should be added to the model.

Goodness-of-Fit (continued): R² also equals the squared correlation coefficient between the actual yi and the fitted values ŷi:

R² = [Σi (yi − ȳ)(ŷi − ȳ)]² / [Σi (yi − ȳ)² · Σi (ŷi − ȳ)²]

Similarly, each xi1 can be thought of as made up of an explained part and an unexplained part, xi1 = x̂i1 + r̂i1. We then define SST1 = Σi (xi1 − x̄1)² (total sum of squares), SSE1 = Σi (x̂i1 − x̄1)² (explained sum of squares), and SSR1 = Σi r̂i1² (residual sum of squares), so that SST1 = SSE1 + SSR1.
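A numerical sketch of these identities (NumPy, simulated data; variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=(n, 2))
y = 0.5 + x @ np.array([1.0, -2.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
uhat = y - yhat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((yhat - y.mean()) ** 2)
SSR = np.sum(uhat ** 2)

R2 = SSE / SST
print(np.isclose(SST, SSE + SSR))                      # SST = SSE + SSR
print(np.isclose(R2, 1 - SSR / SST))                   # the two definitions agree
print(np.isclose(R2, np.corrcoef(y, yhat)[0, 1] ** 2)) # squared correlation
```

All three checks hold exactly (up to floating-point error) because the regression includes an intercept.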

Example: Regress log(wage) on educ, exper, and tenure. Consider the model log(wage) = β0 + β1 educ + β2 exper + β3 tenure + u. From estimation:

log(wage)-hat = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
                (0.104) (0.0073)    (0.0017)       (0.0031)
n = 526; R² = 0.316. Interpret R².

III. Expected Values of the OLS Estimators

Goal: to show that the OLS estimators are unbiased for the population parameters under these four assumptions:

MLR.1 linear in parameters
MLR.2 random sampling
MLR.3 no perfect collinearity
MLR.4 zero conditional mean

MLR.1 Linear in parameters: y = β0 + β1x1 + β2x2 + … + βkxk + u.

MLR.2 Random sampling: a random sample of n observations {(xi1, …, xik, yi): i = 1, 2, …, n}:

i = 1:  y1, x11, x12, …, x1k
i = 2:  y2, x21, x22, …, x2k
…
i = n:  yn, xn1, xn2, …, xnk

Random sampling definition: if Y1, Y2, …, Yn are independent random variables with a common pdf f(y; θ), then {Y1, Y2, …, Yn} is a random sample from the population represented by f(y; θ). We also say that the Yi are i.i.d. (independent, identically distributed) random variables from that distribution.

MLR.3 No Perfect Collinearity

There are no exact linear relationships among the independent variables (and no independent variable is constant). MLR.3 still allows the independent variables to be correlated.

Example: relate hourly wage to experience: wage = β0 + β1 exper + β2 expermonths + u, where exper is years of experience and expermonths = 12·exper (perfect collinearity).

Example: the effect of campaign expenditures on campaign outcomes. voteA: percent of the vote for candidate A; expendA: expenditures by candidate A; expendB: expenditures by candidate B; totalexpend = expendA + expendB, so all three expenditure variables cannot be included together.

Example: the effect of education expenditures and income on test scores: avgscore = β0 + β1 expend + β2 income + u (avgscore: test score; expend: government spending per student; income: family income).

Micronumerosity: MLR.3 also fails if the sample size n is too small (fewer than k + 1 observations).

MLR.4 Zero Conditional Mean

E(u | x1, x2, …, xk) = 0.

Example: correct specification: wage = β0 + β1 educ + β2 exper + u with E(u | educ, exper) = 0; educ and exper are exogenous explanatory variables. Incorrect model: wage = β0 + β1 educ + v (omitting exper), where v = β2 exper + u and E(v | educ) ≠ 0; educ is then an endogenous explanatory variable.

Theorem 3.1: Under assumptions MLR.1–MLR.4, the OLS estimators are unbiased estimators of the population parameters: E(β̂j) = βj, j = 0, 1, …, k, for any value of the population parameter βj.
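Theorem 3.1 can be illustrated by simulation: holding the regressors fixed and redrawing the errors many times, the average of the OLS estimates should be close to the true parameters. A sketch under the stated assumptions (NumPy, simulated data; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, 0.5, -0.3])     # true (beta0, beta1, beta2)
n, reps = 100, 5000

x = rng.normal(size=(n, 2))           # regressors fixed across repeated samples
X = np.column_stack([np.ones(n), x])
XtX_inv_Xt = np.linalg.inv(X.T @ X) @ X.T

estimates = np.empty((reps, 3))
for r in range(reps):
    u = rng.normal(size=n)            # E(u|X) = 0: MLR.4 holds by construction
    y = X @ beta + u
    estimates[r] = XtX_inv_Xt @ y     # OLS estimate for this sample

print(estimates.mean(axis=0))         # close to (1.0, 0.5, -0.3)
```

Any single estimate differs from the truth; it is the average over repeated samples that matches the population parameters, which is exactly what unbiasedness means.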

Unbiasedness

A single estimate cannot be said to be unbiased. Unbiasedness is a property of the procedure: over all possible repeated samples, the OLS estimates average out to the population parameters.

Irrelevant Variables and Unbiasedness

Suppose the correct model is y = β0 + β1x1 + β2x2 + u, but we misspecify the model as y = β0 + β1x1 + β2x2 + β3x3 + u. There is no effect on unbiasedness: E(β̂0) = β0, E(β̂1) = β1, E(β̂2) = β2, and E(β̂3) = 0. Overspecifying a model does not affect unbiasedness, but it does affect variances.

Omitted Variable Bias

Correct specification: y = β0 + β1x1 + β2x2 + u, e.g., log(wage) = β0 + β1 educ + β2 ability + u (y: wage; x1: education; x2: innate ability). Due to lack of data on innate ability, we omit x2 and estimate y = β0 + β1x1 + v, where v = β2x2 + u.

The simple regression slope is β̃1 = Σi (xi1 − x̄1) yi / Σi (xi1 − x̄1)². Substituting yi = β0 + β1xi1 + β2xi2 + ui gives

β̃1 = β1 + β2 · Σi (xi1 − x̄1) xi2 / Σi (xi1 − x̄1)² + Σi (xi1 − x̄1) ui / Σi (xi1 − x̄1)²

Since E(ui) = 0, taking expectations yields

E(β̃1) = β1 + β2 δ̃1

where δ̃1 is the slope coefficient of the simple regression of x2 on x1. What is the size of the bias due to the omitted relevant variable?
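The bias formula E(β̃1) = β1 + β2 δ̃1 can be verified by simulation (NumPy sketch with illustrative parameter values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
beta0, beta1, beta2 = 1.0, 0.7, 0.4

x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)    # x2 positively correlated with x1

# delta1: slope from the simple regression of x2 on x1
delta1 = np.sum((x1 - x1.mean()) * x2) / np.sum((x1 - x1.mean()) ** 2)

slopes = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # short regression of y on x1 only, omitting x2
    slopes[r] = np.sum((x1 - x1.mean()) * y) / np.sum((x1 - x1.mean()) ** 2)

print(slopes.mean(), beta1 + beta2 * delta1)   # approximately equal
```

Here β2 > 0 and corr(x1, x2) > 0, so the short-regression slope is biased upward, in line with the summary table that follows.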

β̃1 is biased: the omitted variable bias is E(β̃1) − β1 = β2 δ̃1.

Positive or negative bias: β2 δ̃1 > 0 gives a positive bias (when?); β2 δ̃1 < 0 gives a negative bias (when?). Note that δ̃1 > 0 if corr(x1, x2) > 0 and δ̃1 < 0 if corr(x1, x2) < 0. E(β̃1) > β1: β̃1 has an upward bias. E(β̃1) < β1: β̃1 has a downward bias.

Two cases where β̃1 is unbiased: 1) β2 = 0, or 2) δ̃1 = 0.

Summary of the direction of bias:

          corr(x1, x2) > 0    corr(x1, x2) < 0
β2 > 0    positive bias       negative bias
β2 < 0    negative bias       positive bias

Example: Regress log(wage) on educ: log(wage)-hat = 0.583773 + 0.082744 educ, n = 526, R² = 0.186. What can you say about the OLS slope estimate? (If innate ability is omitted and positively correlated with educ, the slope is biased upward.)

IV. The Variance of the OLS Estimators

MLR.5 Homoskedasticity assumption: Var(u | x1, …, xk) = σ². The variance in the error term u is the same for all combinations of outcomes of the explanatory variables.

Example: wage = β0 + β1 educ + β2 exper + β3 tenure + u; Var(u | educ, exper, tenure) = σ². If the variance is the same, homoskedasticity: e.g., Var(u | educ = 12) = Var(u | educ = 16). If the variances differ, heteroskedasticity: Var(u | educ = 12) ≠ Var(u | educ = 16) (a violation).

Gauss-Markov assumptions: assumptions MLR.1–MLR.5 are collectively known as the Gauss-Markov assumptions for cross-sectional regression.

Exercise: what are E(y | x) and Var(y | x)? For y = β0 + β1x1 + β2x2 + … + βkxk + u: E(y | x) = β0 + β1x1 + β2x2 + … + βkxk and Var(y | x) = σ². This implies that the variance of y does not depend on the combination of outcomes of the explanatory variables.

Theorem 3.2: Under the Gauss-Markov assumptions,

Var(β̂j) = σ² / [SSTj (1 − Rj²)],  j = 1, …, k

where SSTj = Σi (xij − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other explanatory variables (including an intercept).

Three components of Var(β̂j):
1st component, the error variance: high σ² → high Var(β̂j). Remedy: one way to reduce the error variance is to add more explanatory variables to the equation.
2nd component, SSTj: low SSTj → high Var(β̂j). Remedy: one way to increase SSTj is to increase the sample size.
3rd component, Rj²: high Rj² → high Var(β̂j). Remedy: one way to mitigate the problem of multicollinearity is to increase the sample size.

Example: two regressors, y = β0 + β1x1 + β2x2 + u, so Var(β̂1) = σ² / [SST1 (1 − R1²)], where R1² is the R-squared from the regression of x1 on x2. R1² → 1 means that x1 and x2 are highly correlated and Var(β̂1) → ∞; i.e., x2 explains much of the variation in x1.
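The variance formula in Theorem 3.2 agrees with the usual matrix expression σ²(X'X)⁻¹; the following sketch (NumPy, simulated data; not part of the slides) checks this numerically for β̂1:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
# Two correlated regressors, to make R1^2 nontrivial
x = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n)
y = 1.0 + 0.5 * x[:, 0] - 0.2 * x[:, 1] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta
sigma2_hat = uhat @ uhat / (n - 2 - 1)         # k = 2 regressors

# Var(beta1-hat) via Theorem 3.2: sigma^2 / (SST1 * (1 - R1^2))
x1 = x[:, 0]
Z = np.column_stack([np.ones(n), x[:, 1]])      # the other regressors, with intercept
x1hat = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
SST1 = np.sum((x1 - x1.mean()) ** 2)
R1sq = np.sum((x1hat - x1.mean()) ** 2) / SST1
var_b1_formula = sigma2_hat / (SST1 * (1 - R1sq))

# The same quantity from the matrix expression sigma^2 * [(X'X)^-1]_{11}
var_b1_matrix = sigma2_hat * np.linalg.inv(X.T @ X)[1, 1]
print(np.isclose(var_b1_formula, var_b1_matrix))   # True
```

The factor 1/(1 − R1²) is the variance inflation factor (VIF) for x1: the more of x1's variation the other regressors explain, the larger Var(β̂1).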

Rj² and Multicollinearity

Multicollinearity refers to high (but not perfect) correlation between two or more of the independent variables; it implies a high Rj². E.g., R1² = 0.9 means that 90% of the sample variation in x1 is explained by the remaining independent variables appearing in the equation.

Remedy: we may try to drop independent variables to reduce multicollinearity. Caution: dropping a variable that belongs in the population model leads to bias.

Is multicollinearity a problem? Not necessarily, but possibly: Var(β̂j) also depends on σ² and SSTj.

Ditch or Keep a Variable

Whether to ditch or keep a variable depends on the question we want to answer (see the next example).

Example: loan approval rates. Model: y = β0 + β1x1 + β2x2 + β3x3 + u, where y is the loan approval rate, x1 the poverty rate, x2 average income, and x3 average housing value. If housing value (x3) and income (x2) are highly correlated, then Var(β̂2) and Var(β̂3) are large. Goal: we want to know the effect of the poverty rate on the loan approval rate. If x1 (poverty rate) is uncorrelated with x2 and x3 (i.e., R1² = 0), then the variance of β̂1 is unaffected by their correlation.

Variance in Misspecified Models

Model I: ŷ = β̂0 + β̂1x1 + β̂2x2. Model II: ỹ = β̃0 + β̃1x1.

Tradeoff between bias and variance (β2 ≠ 0): using unbiasedness as the criterion, β̂1 is preferred to β̃1; using variance as the criterion, β̃1 is preferred to β̂1.

Two conclusions:
(1) When β2 = 0, β̂1 and β̃1 are both unbiased, and Var(β̃1) < Var(β̂1) (Model II is the better model).
(2) When β2 ≠ 0, β̃1 is biased while β̂1 is unbiased, but Var(β̃1) < Var(β̂1) if we ignore that the error variance increases in Model II (Model I could be the better model).

When β2 ≠ 0, there are two good reasons why x2 should be included: (1) the bias in β̃1 does not shrink as the sample size grows, whereas Var(β̂1) → 0 as n → ∞, so the multicollinearity induced by adding x2 becomes less important; (2) when x2 is excluded from the model, the error variance increases.

Estimating the Error Variance

σ² = E(u²). The average of the squared errors, Σi ui²/n, would be an unbiased estimator of σ², but we cannot observe the errors ui.

Theorem 3.3: Under the Gauss-Markov assumptions MLR.1–MLR.5, E(σ̂²) = σ², where σ̂² = Σi ûi² / (n − k − 1).

The unbiased estimator of σ² in the general multiple regression case is σ̂² = Σi ûi² / (n − k − 1). Degrees of freedom = n − (k + 1) = n − k − 1.

s.d. and s.e. of β̂j

The standard deviation, sd(β̂j) = σ / [SSTj (1 − Rj²)]^(1/2), depends on σ, which cannot be observed. The standard error, se(β̂j) = σ̂ / [SSTj (1 − Rj²)]^(1/2), depends on the estimated value σ̂. se(β̂j) hinges on the assumption of constant variance (homoskedasticity). It can either decrease or increase when another regressor is added to a regression (for a given sample); the change depends on the roles of SSR and the degrees of freedom. Note that heteroskedasticity violates MLR.5 and makes se(β̂j) invalid, but this does not imply that β̂j is biased.

V. Efficiency of OLS

Theorem 3.4 (Gauss-Markov Theorem): The OLS estimators β̂j (j = 1, …, k) are the best linear unbiased estimators (BLUE) of the population parameters βj (j = 1, …, k). (1) The OLS estimator is linear. (2) The OLS estimator is unbiased. (3) The OLS estimator has minimum variance within the class of linear unbiased estimators.

Summary: When MLR.1–MLR.4 hold, the OLS estimators are unbiased. Under assumptions MLR.1–MLR.5, the Gauss-Markov Theorem says that in the class of competing linear unbiased estimators, OLS is best: no linear unbiased estimator is better than OLS. In other words, the OLS estimator is BLUE.
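The estimator σ̂² and the standard errors can be computed together, as in this sketch (NumPy, simulated data; illustrative only, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 150, 3
x = rng.normal(size=(n, k))
y = 2.0 + x @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)   # true sigma^2 = 1

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

df = n - k - 1                         # degrees of freedom = n - (k+1)
sigma2_hat = uhat @ uhat / df          # unbiased estimator of sigma^2
# se(beta_j-hat): square roots of the diagonal of sigma2_hat * (X'X)^-1
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
print(se)                              # one standard error per coefficient
```

Note that dividing by n − k − 1 rather than n is what makes σ̂² unbiased: the k + 1 first-order conditions of OLS impose k + 1 restrictions on the residuals.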

Recap of Multiple Regression Analysis: Motivation; Mechanics and Interpretation of OLS; Expected Values of the OLS; Variances of the OLS; The Gauss-Markov Theorem.