L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014
What we have learned last time... Population regression line Sample regression line The term u i We wished to find ˆβ 1 and ˆβ 2 so that û i can be minimal. Feng Li (SAM, CUFE) Econometrics 2 / 18
Today we are going to learn... 1 To find the best β 1 and β 2 2 The properties of ordinary least squares 3 The assumptions for the linear regression model 4 Standard errors of OLS 5 Determination of Goodness of fit Feng Li (SAM, CUFE) Econometrics 3 / 18
To find the best β 1 and β 2 ï The problem We knew the population regression function is not easy to have. Instead we estimate it from the sample regression function, i.e. Y i = ˆβ 1 + ˆβ 2 X i + û i We wish to have small û i for i = 1, 2,..., n It s difficult to have a fair solution: your regression line resulting some û i are very small, but others are big, which is unfair. Feng Li (SAM, CUFE) Econometrics 4 / 18
To find the best β 1 and β 2 ï Using the ordinary least squares method Recall that the difference between the population mean Y i and the estimated conditional mean Ŷ i One possible solutions it to let n ř û i =Y i Ŷ i =Y i ˆβ 1 ˆβ 2 X i û 2 i to be a minimal so that every observation is considered. Is this good and why not to minimize ř n This yields to minimize nÿ nÿ û 2 i = (Y i Ŷ i ) 2 = nÿ (Y i ˆβ 1 ˆβ 2 X i ) 2 u2 i? Feng Li (SAM, CUFE) Econometrics 5 / 18
To find the best β 1 and β 2 ï Using the ordinary least squares method This is straightforward by applying differential calculations (details in Appendix 3A), i.e. B ř n û2 i Bˆβ 1 = 2 B ř n u2 i Bˆβ 2 = 2 nÿ (Y i ˆβ 1 ˆβ 2 X i ) = 0 nÿ (Y i ˆβ 1 ˆβ 2 X i )X i = 0 Simplify these equations we have (how?) nÿ Y i =nˆβ 1 + ˆβ 2 nÿ Y i X i =ˆβ 1 Can you obtain ˆβ 1 and ˆβ 2 now? n ÿ ÿ n X i X i + ˆβ 2 ÿ n X 2 i Feng Li (SAM, CUFE) Econometrics 6 / 18
To find the best β 1 and β 2 ï Using the ordinary least squares method That is easy, from the first equation, we have ˆβ 1 = 1 nÿ 1 nÿ Y i ˆβ 2 X i = Ȳ ˆβ 2 X n n Plug this result into the second equation in previous slides nÿ nÿ nÿ Y i X i =(Ȳ ˆβ 2 X) X i + ˆβ 2 Solve ˆβ 2 nř ř Y i X i Ȳ n X i ř n n ř Y i X i nȳ n X i ř n n Y i X i nř Y i nř ˆβ 2 = nř X 2 i ř X n = ř X i n n X 2 i n ř X n = ř X i n n X 2 i ( ř n X i ) 2 = nř (X i X)(Y i Ȳ).Verify this! nř (X i X) 2 Feng Li (SAM, CUFE) Econometrics 7 / 18 X 2 i X i
To find the best β 1 and β 2 ï Using the ordinary least squares method If we let x i = X i X and y i = Y i Ȳ, then the previous result can be written as ˆβ 2 = ř n x iy i ř n x2 i Further more (homework!), ˆβ 2 = ř n x iy i ř n x2 i = ř n x ř n iy i ř n X2 i n X = X iy i ř 2 n X2 i n X 2 and n ˆβ 1 = Ȳ Xř ř x iy i n. x2 i Have you noticed that, the OLS does not depend on the assumption on u i? Feng Li (SAM, CUFE) Econometrics 8 / 18
The properties of ordinary least squares (OLS) The regression line finally can be expressed as Ŷ i = ˆβ 1 + ˆβ 2 X i where ˆβ 1 and ˆβ 2 are determined from previous slides. The regression line goes through the sample means of Y and X, i.e., Ȳ = ˆβ 1 + ˆβ 2 X holds. (Why?) The mean of our estimated Y, ( 1 n ř Ŷi ) is equal to the mean of Y, ( 1 n ř Yi ), because (verify this!) 1 ÿ Ŷi = 1 ÿ (ˆβ 1 + ˆβ 2 X i ) = 1 ÿ (Ȳ ˆβ 2 X + ˆβ 2 X i ) n n n = 1 ÿ 1 ÿ Ȳ ˆβ 2 ( X X i ) = 1 ÿ 1 ÿ Ȳ = Yi. n n n n The mean of the residuals û i is zero which is directly verified by an equation in slide 6. (which one?) Feng Li (SAM, CUFE) Econometrics 9 / 18
The properties of ordinary least squares (OLS) It is easy to have y i = ˆβ 2 x i + û i. Think about the equation in the first property and Ȳ = ˆβ 1 + ˆβ 2 X + û i. (How?) The residuals û i are uncorrelated with the predicted Y i. Just show that ř ûi ŷ i = 0. (How?) The residuals û i are uncorrelated with X i. Just show that ř û i X i = 0. (How?) Feng Li (SAM, CUFE) Econometrics 10 / 18
The assumptions for the linear regression model 1 The linear in linear regression model means linear in the parameters. 2 The regressor X is fixed (not random); X and the error term are independent, i.e., cov(x i, u i ) = 0. 3 Zero mean value of disturbance u i, i.e., E(u i X i ) = 0 4 Homoscedasticity (constant variance of u i ), i.e., var(u i ) = E(u i E(u i X i )) 2 = E(u 2 i X i) = σ 2. Feng Li (SAM, CUFE) Econometrics 11 / 18
The assumptions for the linear regression model 1 No autocorrelation between the disturbances, i.e., cov(u i, u j X i, X j ) = 0 for i j. 2 The number of observations n must be greater than the number of parameters. 3 The X values must not be all the same. (What will happen if all X i are the same? ) Feng Li (SAM, CUFE) Econometrics 12 / 18
Time to think about the assumptions again 1 Are these too realistic? 2 Can our data satisfy all of those assumptions? 3 What will happen if we break some of them? Feng Li (SAM, CUFE) Econometrics 13 / 18
Standard errors of OLS 1 Given the Gaussian assumptions, it is shown (Appendix 3A) that var(ˆβ 2 ) = σ2 ř x 2 i ñ se(ˆβ 2 ) = σ ař x 2 i ř d X 2 ř var(ˆβ 1 ) = i X n ř x 2 σ 2 2 i ñ se(ˆβ 1 ) = i n ř x 2 σ i ř û2 i n 2 2 The variance of u i, (σ 2 ) is estimated by ˆσ 2 =, where n 2 is known as the degrees of freedom, and ř û 2 i is called the residual sum of squares b ř û2 (RSS). Further more ˆσ = i n 2 is called the standard error (se) of the regression. 3 The parameters ˆβ 1 and ˆβ 2 are dependent on each other, that is (Section 3A.4) cov(ˆβ 1, ˆβ 2 ) = X var(ˆβ 2 ) = X σ2 ř x 2 i Feng Li (SAM, CUFE) Econometrics 14 / 18
Determination of Goodness of fit ï The idea 1 The total sum of squares (TSS) is the variation of Y about there sample mean, i.e., ÿ y 2 i = ÿ ÿ ŷ 2 i + û2 i (verify this!) ÿ (Yi Ȳ) 2 = ÿ (Ŷ i Ȳ) 2 + ÿ û2 i T SS =ESS + RSS 2 A good model should be ESS Ñ TSS, RSS Ñ 0 (but this is not the sufficient condition). Feng Li (SAM, CUFE) Econometrics 15 / 18
Determination of Goodness of fit ï The goodness of fit coefficient, r 2 1 Define the coefficient of determination of goodness of fit r 2 (0 ď r 2 ď 1) as 2 Properties of r 2 r 2 = ESS TSS = 1 RSS TSS 1 r 2 can be linked with ˆβ 2 : r 2 = ˆβ 2 2 ř x 2 ř i y 2 i 2 r 2 can be linked with sample variance of X and Y: r 2 = ˆβ 2 S 2 x 2 S 2 y 3 The coefficient of correlation for X and Y is actually r =?r 2? ř x 2 i ř y 2 i 1 Its traditional formula is r = ř x i y i 2 correlation can be positive and negative, 1 ď r ď 1 3 r xy = r yx. 4 Correlation coefficients can only determine linear correlation. Feng Li (SAM, CUFE) Econometrics 16 / 18
The correlation coefficient, r ï A visual example Feng Li (SAM, CUFE) Econometrics 17 / 18
Take home questions 1 Verify the properties in slides 9 and 10. 2 Do the numerical example in the end of Chapter 3 with Excel or a calculator. 3 Exercises (S1): 2.7, 2.13, 3.1, 3.6, 3.7, 3.14, 3.16, 3.19 4 How do you appliy maximum likelihood method to find the coefficients? Feng Li (SAM, CUFE) Econometrics 18 / 18