
Review - Interpreting the Regression

If we estimate $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$, it can be shown that

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \hat{r}_{i1} y_i}{\sum_{i=1}^{n} \hat{r}_{i1}^2}$$

where the $\hat{r}_{i1}$ are the residuals obtained when we estimate the regression $\hat{x}_1 = \hat{\gamma}_0 + \hat{\gamma}_2 x_2$. The estimated effect of $x_1$ on $y$ equals the (simple regression) estimated effect of the part of $x_1$ that is not explained by $x_2$. Note that the average of the residuals is always 0, hence the expression for the simple linear regression estimator is simplified. This interpretation holds in general (with more variables).
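A minimal sketch of this partialling-out result in Python, using simulated data; the data-generating process and all variable names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)            # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Partialling out: residuals r1 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Simple-regression formula on the residuals (their mean is exactly zero,
# which is why no intercept or demeaning is needed here)
beta1_fwl = (r1 @ y) / (r1 @ r1)
print(beta_hat[1], beta1_fwl)                 # the two estimates coincide
```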

Review - Conditions under which exclusion of variables preserves unbiasedness of estimators

Estimate the following regressions:

$$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 \qquad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

If $\hat{\beta}_2 = 0$, then $\tilde{\beta}_1 = \hat{\beta}_1$ (check the first order conditions). If $x_1$ and $x_2$ are uncorrelated, then $\tilde{\beta}_1 = \hat{\beta}_1$. However, in general it will be the case that $\tilde{\beta}_1 \neq \hat{\beta}_1$.

Review - More or Less Variables?

In general, and assuming MLR.1 to MLR.4 hold for as many variables as those under consideration: If we do not include a variable and this variable is uncorrelated with the included regressors, then the OLS estimators will be unbiased. Remember, if the other factors (in $u$) are uncorrelated with the regressors, we can still interpret the estimated effects as ceteris paribus effects. If we do not include a variable and this variable is correlated with the included regressors, then the OLS estimators will be biased, except if the coefficient of the omitted variable is 0 in the full model.
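Both cases can be checked by simulation; a sketch under an illustrative data-generating process (all coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)            # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

full = np.column_stack([np.ones(n), x1, x2])
short = np.column_stack([np.ones(n), x1])     # omits x2
b_full = np.linalg.lstsq(full, y, rcond=None)[0]
b_short = np.linalg.lstsq(short, y, rcond=None)[0]

# b_short[1] is pulled away from the true 2.0 towards 2 + 3*delta1, where
# delta1 is the slope from regressing x2 on x1 (the omitted-variable bias
# term); if x2 were uncorrelated with x1, or beta2 = 0, the bias would vanish
print(b_full[1], b_short[1])
```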

So, always more variables? Even if they are irrelevant (or almost irrelevant) and therefore do not induce bias in the other estimators? No! Why? Variances of the estimators can become large! It can be shown, under MLR.1 to MLR.5, that:

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}, \qquad j = 1, 2, \ldots, k$$

where $SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ and $R_j^2$ is the coefficient of determination from regressing $x_j$ on all the other regressors. $R_j^2$ tells us how much the other regressors explain $x_j$.

Understanding OLS Variances

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}, \qquad j = 1, 2, \ldots, k$$

Strong linear relations among the independent variables are harmful: a larger $R_j^2$ implies a larger variance for the estimators (near multicollinearity). If some irrelevant variable is uncorrelated with the remaining regressors, then including it leaves the variances unchanged (not an interesting case). Typically, variables that you think would be useful but turn out to seem irrelevant are highly correlated with variables already included. This is undesirable, as the variances of the estimators become large. So, avoid including these variables, since the estimators for the other coefficients will be unbiased and display a smaller variance. A larger $\sigma^2$ implies a larger variance of the OLS estimators. A larger $SST_j$ implies a smaller variance of the estimators ($SST_j$ increases with the sample size, so in large samples we should not be too worried!).
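A numerical check of the variance formula against the usual matrix expression $\sigma^2 (X'X)^{-1}$; the data and the assumed value of $\sigma^2$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)        # strong collinearity inflates R_1^2
X = np.column_stack([np.ones(n), x1, x2])
sigma2 = 1.0                              # treat sigma^2 as known for the check

# R_1^2 and SST_1 from regressing x1 on the other regressors (constant and x2)
Z = np.column_stack([np.ones(n), x2])
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
sst_1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1 - np.sum((x1 - fitted) ** 2) / sst_1

var_formula = sigma2 / (sst_1 * (1 - r2_1))
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
print(var_formula, var_matrix)            # the two expressions agree
```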

Review - The Gauss-Markov Theorem

Under MLR.1 to MLR.5 (the so-called Gauss-Markov assumptions) it can be shown that OLS is BLUE: the Best Linear Unbiased Estimator. Thus, if the 5 assumptions are presumed to hold, use OLS. No other linear and unbiased estimator has a variance smaller than OLS. Variances here are matrices: we are saying that $\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta})$ is a positive semi-definite matrix for any other linear unbiased estimator $\tilde{\beta}$ (this implies that each individual OLS parameter estimator has a smaller variance than any other linear unbiased estimator of that parameter).

Inference in the Multiple Linear Regression Model

Inference in the multiple linear regression model

Suppose you want to test whether a variable is important in explaining variation in the dependent variable: E.g., is the effect of tenure on wages statistically significant (i.e., different from zero)? Is the effect of height on wages statistically significant? Or suppose you want to test whether a coefficient has a particular value: E.g., is the effect of one additional year of schooling on expected monthly wages equal to 200? We need to take into account the sampling distribution of our estimators. We will check whether, under the maintained hypothesis (or null hypothesis), the observed values of certain test statistics are likely. If they are not, we say we reject the null.

Inference in the multiple linear regression model

Assumption MLR.6 (Normality): The distribution of the population error $u$ is independent of $x_1, x_2, \ldots, x_k$, and $u$ is normally distributed with mean 0 and variance $\sigma^2$. We write: $u \sim \text{Normal}(0, \sigma^2)$. Independence is stronger than MLR.4 (zero conditional mean); it implies MLR.4. Also, normality and independence imply MLR.5, so all the results regarding unbiasedness and variance of the estimators remain valid. Normality is unrealistic in many cases (e.g., wages cannot be negative, but normality of $u$ could deliver negative wages). However, most results would hold in large samples without the normality assumption.

Classical Linear Model Assumptions

MLR.1 through MLR.6 are the Classical Linear Model assumptions. With these assumptions, one can prove that the OLS estimators are the minimum variance unbiased estimators: no other unbiased estimator has a variance smaller than OLS.

Distribution of OLS estimators

Under MLR.1 through MLR.6 it is straightforward to show that:

$$y \mid x \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,\ \sigma^2)$$

Also, since the OLS estimators are a linear function of the error term $u$, then (conditional on the x's):

$$\hat{\beta}_j \sim \text{Normal}(\beta_j, \operatorname{Var}(\hat{\beta}_j)), \quad \text{so that} \quad \frac{\hat{\beta}_j - \beta_j}{\operatorname{sd}(\hat{\beta}_j)} \sim \text{Normal}(0, 1)$$

where sd stands for standard deviation (the square root of the variance, derived in previous classes).

Distribution of OLS estimators

Now, the $\sigma^2$ that appears in the expression for the standard deviation of the estimators must be estimated. Also, conditional on the x's, $(n-k-1)\,\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-k-1}$, which implies:

$$\frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} = \frac{(\hat{\beta}_j - \beta_j)/\operatorname{sd}(\hat{\beta}_j)}{\operatorname{se}(\hat{\beta}_j)/\operatorname{sd}(\hat{\beta}_j)} = \frac{\text{Normal}(0,1)}{\sqrt{\chi^2_{n-k-1}/(n-k-1)}} \sim t_{n-k-1}$$

Therefore, conditional on the x's, we have:

$$\frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k-1}$$

Degrees of freedom: $n - k - 1$ (for large $n$ this is similar to a Normal(0,1)).
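A sketch of how the standard errors and t statistics are built from a sample, using simulated data (the data-generating process and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept + k regressors
beta = np.array([1.0, 2.0, 0.0])                             # illustrative true parameters
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k - 1)                     # unbiased estimator of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))   # se(beta_hat_j)

print(beta_hat / se)   # t statistics for H0: beta_j = 0, each ~ t_{n-k-1} under H0
```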

Performing a test on a coefficient

1 - Set the null hypothesis (and the alternative). E.g., $H_0: \beta = 0$ (coefficient on experience in our wage regression) and $H_1: \beta > 0$.

2 - Choose a significance level $\alpha$ (the probability of rejecting the null if the null is actually true). E.g., $\alpha = 0.05$.

3 - Look at the sampling distribution of the test statistic $t$ (a random variable) involving the parameter:

$$t = \frac{\hat{\beta} - \beta}{\operatorname{se}(\hat{\beta})} \sim t_{n-k-1}$$

Under the null hypothesis, the test statistic should be small across samples. Reject the null if the observed value of the test statistic is very unlikely (very large).

Performing a test on a coefficient

4 - For one-sided tests where the alternative is favoured if $t_{obs}$ is large and positive (e.g., $H_1: \beta > 0$), reject the null if the observed test statistic, $t_{obs}$, is larger than $c$, where $c$ is implicitly given by: $\text{Prob}[t > c \mid H_0 \text{ is true}] = \alpha$.

For one-sided tests where the alternative is favoured if $t_{obs}$ is large and negative (e.g., $H_1: \beta < 0$), reject the null if the observed test statistic, $t_{obs}$, is smaller than $-c$, where $c$ is implicitly given by: $\text{Prob}[t < -c \mid H_0 \text{ is true}] = \alpha$.

For two-sided tests, where the alternative is favoured if $t_{obs}$ is large in absolute value (e.g., $H_1: \beta \neq 0$), reject the null if the absolute value of the observed test statistic, $|t_{obs}|$, is larger than $c$, where $c$ is implicitly given by: $\text{Prob}[|t| > c \mid H_0 \text{ is true}] = \alpha$.
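Illustrative critical values for the three cases, computed with scipy; the degrees of freedom $n - k - 1 = 120$ are an arbitrary choice for the example:

```python
from scipy import stats

alpha, df = 0.05, 120                      # df = n - k - 1, arbitrary for the example
c_right = stats.t.ppf(1 - alpha, df)       # H1: beta > 0,  reject if t_obs > c
c_left = -stats.t.ppf(1 - alpha, df)       # H1: beta < 0,  reject if t_obs < -c
c_two = stats.t.ppf(1 - alpha / 2, df)     # H1: beta != 0, reject if |t_obs| > c
print(c_right, c_left, c_two)              # roughly 1.658, -1.658, 1.980
```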

One-Sided Alternative

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i, \qquad H_0: \beta = 0, \quad H_1: \beta > 0$$

Test statistic: $t = \dfrac{\hat{\beta} - \beta}{\operatorname{se}(\hat{\beta})} \sim t_{n-k-1}$

[Figure: distribution of the test statistic under the null. If $t_{obs}$ falls below the critical value $c$ (region of probability $1 - \alpha$), fail to reject the null; if $t_{obs} > c$ (right tail of probability $\alpha$), reject the null.]

Two-Sided Alternatives

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i, \qquad H_0: \beta = 0, \quad H_1: \beta \neq 0$$

Test statistic: $t = \dfrac{\hat{\beta} - \beta}{\operatorname{se}(\hat{\beta})} \sim t_{n-k-1}$

[Figure: distribution of the test statistic under the null. If $t_{obs}$ lies between $-c$ and $c$ (region of probability $1 - \alpha$), fail to reject the null; if $t_{obs} < -c$ or $t_{obs} > c$ (tails of probability $\alpha/2$ each), reject the null.]

Example: Hypothesis testing

Dependent variable: log of wages. [Regression output table omitted in the transcription.] The t ratios are the observed values of the test statistic for testing $\beta = 0$; e.g., $96.75 = 0.07614 / 0.00079$.

Example: Hypothesis testing

Choose $\alpha = 0.05$. Test $H_0: \beta = 0$ (coefficient on education in our wage regression) against $H_1: \beta \neq 0$. $t_{obs} = (0.07614 - 0)/0.00079 = 96.75$. Since $t_{obs} > 1.96$, we reject the null. We say the coefficient on education is significant at the 5% level. We use the Normal approximation since $n$ is large.

[Figure: distribution of the test statistic under the null, with two-sided rejection regions of probability 0.025 beyond $\pm c = \pm 1.96$.]

Example: Hypothesis testing

Choose $\alpha = 0.05$. Test $H_0: \beta = 0$ (coefficient on education in our wage regression) against $H_1: \beta > 0$ (clearly more reasonable). $t_{obs} = (0.07614 - 0)/0.00079 = 96.75$. Since $t_{obs} > 1.645$, we reject the null. We use the Normal approximation since $n$ is large.

[Figure: distribution of the test statistic under the null, with a one-sided rejection region of probability 0.05 beyond $c = 1.645$.]

Example: Hypothesis testing

Choose $\alpha = 0.05$. Test $H_0: \beta = 0.07$ (coefficient on education in our wage regression) against $H_1: \beta \neq 0.07$. $t_{obs} = (0.07614 - 0.07)/0.00079 = 7.772$. Since $t_{obs} > 1.96$, we reject the null. We use the Normal approximation since $n$ is large.

[Figure: distribution of the test statistic under the null, with two-sided rejection regions of probability 0.025 beyond $\pm c = \pm 1.96$.]
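The three example tests can be reproduced with a few lines of Python; the estimate 0.07614 and standard error 0.00079 are taken from the slides' wage regression, and the Normal approximation is used because $n$ is large:

```python
from scipy import stats

b_hat, se = 0.07614, 0.00079                      # from the slides' wage regression
c_two = stats.norm.ppf(0.975)                     # 1.96
c_one = stats.norm.ppf(0.95)                      # 1.645

t0 = (b_hat - 0) / se                             # H0: beta = 0    -> 96.75
t007 = (b_hat - 0.07) / se                        # H0: beta = 0.07 -> 7.772
print(t0 > c_two, t0 > c_one, abs(t007) > c_two)  # True, True, True: reject in all three tests
```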

P-Value

Given the observed value of the t statistic, what would be the smallest significance level at which the null $H_0: \beta = 0$ would be rejected against the alternative $H_1: \beta \neq 0$? This is the P-value. It is given by $\text{Prob}[|t| > t_{obs} \mid H_0 \text{ true}]$.

[Figure: distribution of the test statistic under the null, with areas of P-value/2 beyond $-t_{obs}$ and $t_{obs}$, and probability $1 - \text{P-value}$ between them.]

If $\alpha > \text{P-value}$, we would reject the null!
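A sketch of the two-sided P-value computation; the observed statistic and degrees of freedom are illustrative:

```python
from scipy import stats

t_obs, df = 2.3, 120                       # illustrative values
p_value = 2 * stats.t.sf(abs(t_obs), df)   # Prob[|t| > t_obs | H0]; sf(x) = 1 - cdf(x)
print(p_value)                             # reject H0 whenever alpha > p_value
```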

Confidence intervals

A $(1 - \alpha) \cdot 100\%$ confidence interval is defined as

$$\hat{\beta} \pm c \cdot \operatorname{se}(\hat{\beta})$$

where $c$ is the $1 - \frac{\alpha}{2}$ percentile in a $t_{n-k-1}$ distribution. If the hypothesized value of a parameter ($b$) is inside the confidence interval, we would not reject the null $H_0: \beta = b$ against $H_1: \beta \neq b$ at the significance level $\alpha$.
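A sketch of the confidence-interval computation, reusing the illustrative numbers from the wage example; the degrees of freedom are an arbitrary large value here, so $c \approx 1.98$:

```python
from scipy import stats

b_hat, se = 0.07614, 0.00079        # illustrative numbers from the wage example
df, alpha = 120, 0.05               # df is an arbitrary choice for the sketch
c = stats.t.ppf(1 - alpha / 2, df)  # 1 - alpha/2 percentile of t_{df}
ci = (b_hat - c * se, b_hat + c * se)
print(ci)                           # 0.07 lies outside: consistent with rejecting H0: beta = 0.07
```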

Testing multiple exclusion restrictions

Unrestricted model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u$.

Restricted model (imposing $H_0: \beta_{k-q+1} = \cdots = \beta_k = 0$): $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-q} x_{k-q} + u$.

$H_1$: not $H_0$. Under the null:

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\, n-k-1}$$

where $r$ stands for restricted, $ur$ for unrestricted, and $q$ is the number of restrictions. Does $SSR_{ur}$ decrease enough compared to $SSR_r$? If $F_{obs}$ is too large, we reject the null.

Testing multiple exclusion restrictions

$H_1$: not $H_0$. Under the null, the test statistic can equivalently be written as

$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$$

obtained by dividing the numerator and denominator above by SST. This is different from testing the significance of each coefficient individually!! It is a test of joint significance.
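A sketch of the F statistic in its SSR form, with made-up numbers (the slides' actual sums of squared residuals are not reproduced here):

```python
from scipy import stats

n, k, q = 500, 5, 2                  # k regressors in the unrestricted model, q restrictions
ssr_ur, ssr_r = 1200.0, 1265.0       # made-up sums of squared residuals
F_obs = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
c = stats.f.ppf(0.95, q, n - k - 1)  # critical value at alpha = 0.05
print(F_obs, c, F_obs > c)           # reject H0 if F_obs > c
```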

Testing multiple exclusion restrictions: F test

Reject the null if the observed test statistic, $F_{obs}$, is larger than $c$, where $c$ is implicitly given by: $\text{Prob}[F > c \mid H_0 \text{ is true}] = \alpha$.

[Figure: density $f(F)$ of the test statistic under the null, with a fail-to-reject region of probability $1 - \alpha$ to the left of $c$ and a rejection region of probability $\alpha$ to its right.]

Testing multiple exclusion restrictions: Example

Dependent variable: log of monthly wages, $n = 11064$. [Regression output table omitted in the transcription.]

Testing multiple exclusion restrictions: Example

$\alpha = 0.05$. [Computation of the test omitted in the transcription.]

Overall Significance of the model

Use: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$. $H_1$ is: not $H_0$. Under the null:

$$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\, n-k-1}$$

Testing general linear restrictions: in the practical sessions!
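A sketch of the overall-significance F statistic; $n = 11064$ matches the earlier wage example, while $k$ and $R^2$ are made-up values:

```python
from scipy import stats

n, k, r2 = 11064, 4, 0.30                    # n from the example; k and R^2 are made up
F_obs = (r2 / k) / ((1 - r2) / (n - k - 1))  # overall-significance F statistic
c = stats.f.ppf(0.95, k, n - k - 1)          # critical value at alpha = 0.05
print(F_obs, c, F_obs > c)                   # reject H0 if F_obs > c
```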