Finansiell Statistik, GN, 15 hp, VT2008 Lecture 15: Multiple Linear Regression & Correlation

Finansiell Statistik, GN, 15 hp, VT2008
Lecture 15: Multiple Linear Regression & Correlation
Gebrenegus Ghilagaber, PhD, Associate Professor
May 15, 2008

1 Introduction

In the simple linear regression model

Y_i = α + β X_i + ε_i    (1)

the least square estimates of the parameters were obtained from the normal equations:

Σ Y_i = n a + b Σ X_i    (2)

and

Σ X_i Y_i = a Σ X_i + b Σ X_i²    (3)

2 The Model and its Assumptions

Suppose we extend the simple linear regression model by including one more explanatory variable. Then the population multiple regression model becomes

Y_i = α + β_1 X_{1i} + β_2 X_{2i} + ε_i    (4)

and its sample estimate is given by

Ŷ_i = a + b_1 X_{1i} + b_2 X_{2i},    (5)

with residuals e_i = Y_i − Ŷ_i. The sum of squares of the error terms is then given by

Σ e_i² = Σ (Y_i − Ŷ_i)² = Σ (Y_i − a − b_1 X_{1i} − b_2 X_{2i})²    (6)

and the least square estimates of the parameters are obtained from the normal equations:

Σ Y_i = n a + b_1 Σ X_{1i} + b_2 Σ X_{2i}    (7)

Σ X_{1i} Y_i = a Σ X_{1i} + b_1 Σ X_{1i}² + b_2 Σ X_{1i} X_{2i}    (8)

Σ X_{2i} Y_i = a Σ X_{2i} + b_1 Σ X_{1i} X_{2i} + b_2 Σ X_{2i}²    (9)
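
Equations (7)-(9) are simply a linear system in (a, b_1, b_2), so they can be handed to any linear-equation solver. Below is a minimal sketch in Python/NumPy (not part of the original lecture), using the small data set that appears in the Example(s) section later in these notes:

```python
import numpy as np

# Data from the Example(s) section of this lecture.
Y  = np.array([2.0, 3.0, 5.0, 4.0, 7.0])
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([3.0, 4.0, 5.0, 1.0, 2.0])
n  = len(Y)

# Coefficient matrix and right-hand side of the normal equations (7)-(9).
A = np.array([
    [n,         X1.sum(),       X2.sum()     ],
    [X1.sum(),  (X1**2).sum(),  (X1*X2).sum()],
    [X2.sum(),  (X1*X2).sum(),  (X2**2).sum()],
])
rhs = np.array([Y.sum(), (X1*Y).sum(), (X2*Y).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)
print(a, b1, b2)   # approximately -1.2, 1.333, 0.467 for this data
```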

3 Standard Assumptions for Multiple Regression (with two explanatory variables)

Normality: the ε_i are normally distributed.

Zero mean: the ε_i have zero mean, E(ε_i) = 0.

Constant variance (homoscedasticity): the ε_i are normally distributed with mean 0 and constant variance, ε_i ~ N(0, σ²).

Independence: the ε_i are independent, Cov(ε_i, ε_j) = 0 for i ≠ j.

X_i and ε_i are uncorrelated: the X_i are either fixed, or random but uncorrelated with the ε_i, Cov(ε_i, X_i) = 0.

No multicollinearity: the explanatory variables X_1 and X_2 are not strongly correlated.

4 Estimating Multiple Regression Parameters

In obtaining the least square estimates of the parameters in multiple regression, it is easier to work with the deviations

y_i = Y_i − Ȳ,  x_{1i} = X_{1i} − X̄_1,  and  x_{2i} = X_{2i} − X̄_2

instead of Y_i, X_{1i}, and X_{2i}. In such a case,

Σ y_i = Σ (Y_i − Ȳ) = 0,  Σ x_{1i} = Σ (X_{1i} − X̄_1) = 0,  and  Σ x_{2i} = Σ (X_{2i} − X̄_2) = 0.

Thus, equations (8) and (9) may be rewritten as

Σ x_{1i} y_i = b_1 Σ x_{1i}² + b_2 Σ x_{1i} x_{2i}    (10)

Σ x_{2i} y_i = b_1 Σ x_{1i} x_{2i} + b_2 Σ x_{2i}²    (11)

From equations (10) and (11) we get

b_1 = [Σ x_{2i}² · Σ x_{1i} y_i − Σ x_{1i} x_{2i} · Σ x_{2i} y_i] / [Σ x_{1i}² · Σ x_{2i}² − (Σ x_{1i} x_{2i})²]

b_2 = [Σ x_{1i}² · Σ x_{2i} y_i − Σ x_{1i} x_{2i} · Σ x_{1i} y_i] / [Σ x_{1i}² · Σ x_{2i}² − (Σ x_{1i} x_{2i})²]

and (it can be shown that)

a = Ȳ − b_1 X̄_1 − b_2 X̄_2
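
As a sketch (not from the lecture, NumPy assumed), these closed-form expressions translate directly into code; the helper name is illustrative:

```python
import numpy as np

def two_regressor_ols(Y, X1, X2):
    """Least square estimates (a, b1, b2) via the deviation formulas above."""
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()
    d = (x1**2).sum() * (x2**2).sum() - (x1*x2).sum()**2   # common denominator
    b1 = ((x2**2).sum() * (x1*y).sum() - (x1*x2).sum() * (x2*y).sum()) / d
    b2 = ((x1**2).sum() * (x2*y).sum() - (x1*x2).sum() * (x1*y).sum()) / d
    a  = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
    return a, b1, b2

# With the data from the Example(s) section this returns approximately (-1.2, 1.333, 0.467).
```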

5 Decomposing the Total Variance - the ANOVA Table

As we did in simple linear regression, we can now decompose the total variance into its various sources and create an ANOVA table:

Y_i − Ȳ = (Y_i − Ŷ_i) + (Ŷ_i − Ȳ)

so that

Σ (Y_i − Ȳ)² = Σ (Y_i − Ŷ_i)² + Σ (Ŷ_i − Ȳ)² + 2 Σ (Y_i − Ŷ_i)(Ŷ_i − Ȳ)
             = Σ (Y_i − Ŷ_i)² + Σ (Ŷ_i − Ȳ)²,

since the cross-product term equals zero.

The corresponding ANOVA table will then be given by

Source of variation   Degrees of freedom   Sum of Squares          Mean Squares              F-ratio
Regression            k                    SSR = Σ (Ŷ_i − Ȳ)²      MSR = SSR / k             F = MSR / MSE
Error                 n − k − 1            SSE = Σ (Y_i − Ŷ_i)²    MSE = SSE / (n − k − 1)
Total                 n − 1                SST = Σ (Y_i − Ȳ)²      MST = SST / (n − 1)

Note that the degrees of freedom and the sums of squares are additive, but not the mean squares:

k + (n − k − 1) = n − 1,   SSR + SSE = SST,   MSR + MSE ≠ MST.
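
A small sketch (not part of the lecture, NumPy assumed) that fills in the table for any fitted model with k regressors:

```python
import numpy as np

def anova_table(Y, Y_hat, k):
    """SSR, SSE, SST, the mean squares and the F-ratio for a model with k regressors."""
    n = len(Y)
    SST = ((Y - Y.mean())**2).sum()        # total sum of squares
    SSE = ((Y - Y_hat)**2).sum()           # error (residual) sum of squares
    SSR = ((Y_hat - Y.mean())**2).sum()    # regression sum of squares
    MSR, MSE = SSR / k, SSE / (n - k - 1)
    return {"SSR": SSR, "SSE": SSE, "SST": SST, "MSR": MSR, "MSE": MSE, "F": MSR / MSE}
```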

6 The Residual Standard Error, S_e², & the Coefficient of Multiple Determination, R²

Both S_e² and R² may be used to evaluate the goodness-of-fit of our multiple regression model. The residual standard error, S_e (whose square S_e² is the residual variance), is just the standard deviation of the error terms:

S_e = √[ Σ (Y_i − Ŷ_i)² / (n − k − 1) ] = √[ SSE / (n − k − 1) ] = √[ SSE / (n − 3) ], when k = 2.

The coefficient of multiple determination, R², is given by

R² = SSR / SST = 1 − SSE / SST.

It gives the proportion (percentage) of the total variation in the dependent variable (Y) that is explained by the explanatory variables X_1 and X_2. The larger the value of R², the better the fit of the model. The adjusted R², R²_adj, which takes due account of the degrees of freedom, is given by

R²_adj = 1 − [SSE / (n − k − 1)] / [SST / (n − 1)]
       = 1 − (SSE / SST) · (n − 1) / (n − k − 1)
       = 1 − (1 − R²) · (n − 1) / (n − k − 1).

Again, note that R²_adj ≤ R², indicating that the unadjusted R² is an overestimate. Both S_e and R² measure the goodness-of-fit for a regression model, but S_e is an absolute measure while R² is a relative measure.
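
As a sketch (not from the lecture), R² and the adjusted R² follow directly from the ANOVA quantities:

```python
def r_squared(SSR, SSE, SST, n, k):
    """Coefficient of multiple determination and its degrees-of-freedom adjusted version."""
    r2 = SSR / SST                                    # equivalently 1 - SSE/SST
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # penalizes additional regressors
    return r2, r2_adj
```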

7 Testing for the Overall Model Significance

To test

H_0: β_1 = β_2 = ... = β_k = 0
H_1: β_i ≠ 0 for at least one i,

the appropriate test statistic is

F = MSR / MSE = [Σ (Ŷ_i − Ȳ)² / k] / [Σ (Y_i − Ŷ_i)² / (n − k − 1)],

which is to be compared with F(k, n − k − 1, α).

This is a global test in the sense that if the test is significant (H_0 is rejected), we do not yet know which of the β_i is (are) significantly different from 0. Note also that the test statistic may be related to the coefficient of multiple determination, R², as follows:

F = [Σ (Ŷ_i − Ȳ)² / k] / [Σ (Y_i − Ŷ_i)² / (n − k − 1)]
  = {[Σ (Ŷ_i − Ȳ)² / Σ (Y_i − Ȳ)²] / k} / {[Σ (Y_i − Ŷ_i)² / Σ (Y_i − Ȳ)²] / (n − k − 1)}
  = [R² / k] / [(1 − R²) / (n − k − 1)].
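
A sketch of the global test computed from R² (not part of the lecture; SciPy is assumed for the F distribution):

```python
from scipy import stats

def overall_f_test(r2, n, k, alpha=0.05):
    """Global F test from R-squared: statistic, critical value and p-value."""
    F = (r2 / k) / ((1 - r2) / (n - k - 1))
    f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)   # reject H0 if F > f_crit
    p_value = stats.f.sf(F, k, n - k - 1)
    return F, f_crit, p_value
```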

8 Tests on the Individual Regression Coefficients

To test, say,

H_0: β_i = 0
H_1: β_i ≠ 0

for the individual coefficients, we may use the t-statistic

t = b_i / S(b_i)

and compare the calculated value of t with that of t(n − k − 1, α/2). The standard errors of the individual estimates are given by

S(b_1) = √[ S_e² · Σ x_{2i}² / (Σ x_{1i}² · Σ x_{2i}² − (Σ x_{1i} x_{2i})²) ]

and

S(b_2) = √[ S_e² · Σ x_{1i}² / (Σ x_{1i}² · Σ x_{2i}² − (Σ x_{1i} x_{2i})²) ],

where x_{1i} = X_{1i} − X̄_1 and x_{2i} = X_{2i} − X̄_2.
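
A sketch (not from the lecture) that computes these standard errors and the corresponding t-statistics for the two-regressor case:

```python
import numpy as np

def coef_t_tests(Y, X1, X2, a, b1, b2):
    """Standard errors S(b1), S(b2) from the deviation sums, and the t-statistics."""
    n = len(Y)
    s2_e = ((Y - (a + b1*X1 + b2*X2))**2).sum() / (n - 3)   # residual variance, k = 2
    x1, x2 = X1 - X1.mean(), X2 - X2.mean()
    d = (x1**2).sum() * (x2**2).sum() - (x1*x2).sum()**2
    s_b1 = np.sqrt(s2_e * (x2**2).sum() / d)
    s_b2 = np.sqrt(s2_e * (x1**2).sum() / d)
    return s_b1, s_b2, b1 / s_b1, b2 / s_b2
```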

9 Confidence Interval for the Mean Response

Once we get the least square estimates of the model parameters, the estimated regression model is given by

Ŷ_i = a + b_1 X_{1i} + b_2 X_{2i}.

This model may be used, among other things, to predict values of Y for given values of X_1 and X_2. Thus, for new values X_{1,n+1} and X_{2,n+1}, the predicted value of Y is given by

Ŷ_{n+1} = a + b_1 X_{1,n+1} + b_2 X_{2,n+1}.

Since Ŷ_{n+1} is a statistic (computed from a sample), it is subject to variation. This variation is measured by its standard error, which is given by

S_{Ŷ(n+1)} = S_Ȳ = √(S_e² / n) = √(MSE / n) = √[ SSE / (n (n − k − 1)) ].

This may then be used to construct a (1 − α)·100% confidence interval for the predicted population mean response, E(Y_{n+1} | X_{1,n+1}, X_{2,n+1}), as

( Ŷ_{n+1} − t(n − 3, α/2) · S_e / √n ,   Ŷ_{n+1} + t(n − 3, α/2) · S_e / √n ).
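
A sketch of this interval in code (not part of the original lecture; SciPy assumed, and it uses the simplified S_e/√n standard error given above):

```python
import numpy as np
from scipy import stats

def mean_response_ci(y_pred, SSE, n, k=2, alpha=0.05):
    """(1 - alpha) confidence interval for the mean response at a new observation."""
    s_e = np.sqrt(SSE / (n - k - 1))                  # residual standard error
    t_crit = stats.t.ppf(1 - alpha / 2, n - k - 1)    # t(n-3, alpha/2) when k = 2
    half_width = t_crit * s_e / np.sqrt(n)
    return y_pred - half_width, y_pred + half_width
```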

Example(s)

 i      Y_i   X_{1i}   X_{2i}
 1       2      1        3
 2       3      2        4
 3       5      3        5
 4       4      4        1
 5       7      5        2
 Σ      21     15       15
Mean    4.2     3        3

(Worksheet columns to be computed: y_i, x_{1i}, x_{2i}, x_{1i}², x_{2i}², x_{1i} y_i, x_{2i} y_i, x_{1i} x_{2i}.)

(a) Fit a simple linear regression Ŷ_i = a_1 + b_1 X_{1i} and estimate all relevant quantities (ANOVA, S_e², R², etc.).

(b) Do the same with Ŷ_i = a_2 + b_2 X_{2i}.

(c) Fit a multiple linear regression Ŷ_i = a_3 + b_3 X_{1i} + b_4 X_{2i} (with ANOVA, S_e², R², etc.) and compare the results with those in (a) and (b).
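
A sketch of parts (a)-(c) with NumPy's least squares routine (not part of the lecture; the numbers in the comments are approximate values computed from the table above):

```python
import numpy as np

Y  = np.array([2.0, 3.0, 5.0, 4.0, 7.0])
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([3.0, 4.0, 5.0, 1.0, 2.0])

def fit(design, y):
    """Least squares coefficients and R-squared for a given design matrix."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coefs
    r2 = ((y_hat - y.mean())**2).sum() / ((y - y.mean())**2).sum()
    return coefs, r2

ones = np.ones_like(Y)
print(fit(np.column_stack([ones, X1]), Y))        # (a) roughly Y-hat = 0.9 + 1.1 X1,             R^2 about 0.82
print(fit(np.column_stack([ones, X2]), Y))        # (b) roughly Y-hat = 4.8 - 0.2 X2,             R^2 about 0.03
print(fit(np.column_stack([ones, X1, X2]), Y))    # (c) roughly Y-hat = -1.2 + 1.33 X1 + 0.47 X2, R^2 about 0.93
```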

12 Introduction to Matrix Algebra

(This section is Extra! It is not part of the course, but it may be helpful to know!!!)

12.1 Definition & Notation

A matrix A is a rectangular array of numbers. If A has n rows and p columns, we say it is of order n x p. For instance, n observations on p

variables give an n x p matrix A as follows:

A = [ a_11  a_12  ...  a_1p ]
    [ a_21  a_22  ...  a_2p ]
    [ ...   ...   ...  ...  ]
    [ a_n1  a_n2  ...  a_np ]

A vector is a matrix with only one row or column:

a = ( a_1  a_2  ...  a_c )

is a row-vector, while

b = [ b_1 ]
    [ b_2 ]
    [ ... ]
    [ b_r ]

is a column-vector.

12.2 Elementary Operations with Matrices

If A is an n x p matrix with elements a_ij and B is an n x p matrix with elements b_ij, then their sum is given by

A + B = [ a_11 + b_11  ...  a_1p + b_1p ]
        [ ...                           ]
        [ a_n1 + b_n1  ...  a_np + b_np ]

For a constant c,

cA = [ c·a_11  ...  c·a_1p ]
     [ ...                 ]
     [ c·a_n1  ...  c·a_np ]

Further, if the number of columns in A is equal to the number of rows in B (p = n), then their product is given by

A·B = [ a_11·b_11 + a_12·b_21 + ... + a_1p·b_p1   ...   a_11·b_1p + a_12·b_2p + ... + a_1p·b_pp ]
      [ ...                                                                                     ]
      [ a_n1·b_11 + a_n2·b_21 + ... + a_np·b_p1   ...   a_n1·b_1p + a_n2·b_2p + ... + a_np·b_pp ]

12.3 Row Exchanges, Inverse, Transpose

The transpose of an r x c matrix A is denoted by A′ and is the c x r matrix formed by interchanging the roles of rows and columns:

A′ = [ a_11  a_21  ...  a_n1 ]
     [ a_12  a_22  ...  a_n2 ]
     [ ...                   ]
     [ a_1p  a_2p  ...  a_np ]

The inverse of a matrix A is denoted by A⁻¹ and is such that

A A⁻¹ = A⁻¹ A = I,

where

I = [ 1  0  ...  0 ]
    [ 0  1  ...  0 ]
    [ ...          ]
    [ 0  0  ...  1 ]

is the identity matrix, whose elements are 1s on the main diagonal and 0s elsewhere.
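
A quick NumPy illustration of the operations in this appendix (the matrices are made-up examples, not from the lecture):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(A + B)                  # element-by-element sum
print(3 * A)                  # multiplication by a constant
print(A @ B)                  # matrix product (columns of A = rows of B)
print(A.T)                    # transpose A'
print(A @ np.linalg.inv(A))   # A times its inverse: approximately the identity I
print(np.linalg.det(A))       # determinant: 1*4 - 2*3 = -2
```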

12.4 Square Matrices, Symmetric Matrices, etc.

A matrix is said to be a square matrix if its numbers of rows and columns are equal.

A matrix A is said to be a symmetric matrix if A = A′ (if it is equal to its transpose).

12.5 Determinants

The determinant of a matrix A is denoted by det(A) or |A| and is defined only for square matrices. For a 2 x 2 matrix

A = [ a_11  a_12 ]
    [ a_21  a_22 ],

its determinant is given by

det(A) = |A| = a_11·a_22 − a_12·a_21,

while for a 3 x 3 matrix

A = [ a_11  a_12  a_13 ]
    [ a_21  a_22  a_23 ]
    [ a_31  a_32  a_33 ],

its determinant is given by

det(A) = |A| = a_11·a_22·a_33 + a_12·a_23·a_31 + a_13·a_21·a_32 − a_31·a_22·a_13 − a_11·a_23·a_32 − a_21·a_12·a_33.

Computation for larger matrices gets more complicated, but there are special methods.

12.6 Eigenvalues and Eigenvectors

12.7 Positive-definite Matrices

13 The Matrix Approach to Linear Regression

13.1 Model Formulation

Let

Y = ( y_1, y_2, ..., y_n )′

be a column-vector of n observations of the response variable (dependent variable),

X = [ 1  x_11  ...  x_1p ]
    [ 1  x_21  ...  x_2p ]
    [ ...                ]
    [ 1  x_n1  ...  x_np ]

an n x (p+1) matrix of explanatory variables (including a column of 1s for the intercept),

β = ( β_0, β_1, β_2, ..., β_p )′

a column-vector of regression coefficients (one intercept and

p slopes), and

ε = ( ε_1, ε_2, ..., ε_n )′

a column-vector of disturbance (error) terms. Then, the multiple regression model may be written in matrix form as

Y = Xβ + ε

13.2 Model Assumptions

Some of the standard assumptions are

E(ε) = 0 = ( 0, 0, ..., 0 )′,

and

Cov(ε) = E(εε′) = σ² I = σ² [ 1  0  ...  0 ]
                            [ 0  1  ...  0 ]
                            [ ...          ]
                            [ 0  0  ...  1 ]

Thus,

E(Y) = E(Xβ + ε) = E(Xβ) + E(ε) = E(Xβ) = Xβ

13.3 Estimation of Parameters

If

e = Y − Ŷ = Y − Xb = ( y_1 − ŷ_1, y_2 − ŷ_2, ..., y_n − ŷ_n )′

is the estimated vector of error terms, then the vector of coefficients is estimated by minimizing the sum of squares of these error terms (the least square method):

e′e = (Y − Xb)′(Y − Xb) = Y′Y − 2 b′X′Y + b′X′X b

This sum of squares is then minimized by differentiating e′e with respect to b, equating to 0, and solving for b:

∂(e′e)/∂b = 0  ⟹  −2 X′Y + 2 X′X b = 0  ⟹  b = (X′X)⁻¹ X′Y,

so that the fitted regression model is given by

Ŷ = X b = X (X′X)⁻¹ X′ Y,

and

E(b) = E[ (X′X)⁻¹ X′Y ] = (X′X)⁻¹ X′ E(Y) = (X′X)⁻¹ X′X β = β,

showing that the least square estimate b is an unbiased estimator of the true parameter β.
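
As a sketch (not from the lecture), the matrix formula maps directly to a few lines of NumPy:

```python
import numpy as np

def ols_matrix(X, Y):
    """b = (X'X)^(-1) X'Y and the fitted values X b."""
    b = np.linalg.inv(X.T @ X) @ (X.T @ Y)   # least squares coefficients
    return b, X @ b
```

(In practice np.linalg.solve or np.linalg.lstsq is numerically preferable to forming the explicit inverse, but the line above mirrors the derivation.)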

13.4 Numerical Examples

Let

Y = ( 5, 7, 9, 11, 13, 15, 17, 19, 21, 23 )′,   Y_2 = ( 28, 25, 22, 19, 16, 13, 10, 7, 4, 1 )′,

X = [ 1   1 ]
    [ 1   2 ]
    [ 1   3 ]
    [ ...   ]
    [ 1  10 ]

and β = ( β_0, β_1 )′. Then, for Y = Xβ + ε,

b = (X′X)⁻¹ X′Y

where

X′X = [ 10   55 ]
      [ 55  385 ]

and

(X′X)⁻¹ = [ 10   55 ]⁻¹ = [  0.466667  −0.066667 ]
          [ 55  385 ]     [ −0.066667   0.012121 ]

while

X′Y = [ 140 ]
      [ 935 ]

Thus,

b = (X′X)⁻¹ X′Y = [  0.466667  −0.066667 ] [ 140 ]
                  [ −0.066667   0.012121 ] [ 935 ]

  = [  0.466667·140 − 0.066667·935 ] = [ 3 ]
    [ −0.066667·140 + 0.012121·935 ]   [ 2 ]

⟹ b_0 = 3 and b_1 = 2.

Similarly, for Y_2 = Xβ + ε, b = (X′X)⁻¹ X′Y_2, where

X′Y_2 = [ 145 ]
        [ 550 ]

so that

b = (X′X)⁻¹ X′Y_2 = [  0.466667  −0.066667 ] [ 145 ]
                    [ −0.066667   0.012121 ] [ 550 ]

  = [  0.466667·145 − 0.066667·550 ] = [ 31 ]
    [ −0.066667·145 + 0.012121·550 ]   [ −3 ]

⟹ b_0 = 31 and b_1 = −3.

The results are intuitively appealing since...
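
A quick numerical check of both fits (a sketch, NumPy assumed; not part of the original lecture):

```python
import numpy as np

x  = np.arange(1, 11)                     # X values 1, 2, ..., 10
X  = np.column_stack([np.ones(10), x])    # design matrix with a column of 1s
Y1 = np.array([5, 7, 9, 11, 13, 15, 17, 19, 21, 23], dtype=float)
Y2 = np.array([28, 25, 22, 19, 16, 13, 10, 7, 4, 1], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
print(XtX_inv @ (X.T @ Y1))   # approximately [ 3.  2.]  ->  Y  =  3 + 2 X
print(XtX_inv @ (X.T @ Y2))   # approximately [31. -3.]  ->  Y2 = 31 - 3 X
```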