Lesson 11: Simple Linear Regression



ECON1003 Lesson 11: Simple Linear Regression
Ka-fu WONG
December 2, 2004

In previous lessons, we have mainly covered the estimation of the population mean (or expected value) and inference about it. Sometimes we are interested in allowing the expected value to vary with some variables. For instance, in the discussion of mean income, we may want to know how income is related to the level of education. Knowing how income is related to the level of education allows us to better predict a person's income given his or her educational background. In addition, Human Capital Theory in economics tells us there should be a positive relation between income and education, which is viewed as human capital. [1]

Recall that if we have several random variables described by a multivariate distribution, we can talk about conditional expectations. Recall the definition of conditional expectation for the discrete case with two random variables X and Y.

Definition 1 (Conditional Expectation): For two discrete random variables that are jointly distributed with a bivariate probability distribution, the conditional expectation or conditional mean $E(X \mid Y = y_j)$ is computed by the formula:

$$E(X \mid Y = y_j) = \sum_x x P_{X|Y}(x \mid y_j) = x_1 P_{X|Y}(x_1 \mid y_j) + x_2 P_{X|Y}(x_2 \mid y_j) + \cdots + x_N P_{X|Y}(x_N \mid y_j)$$

Sometimes, we write $\mu_{X|Y=y_j} = E(X \mid Y = y_j)$.

The unconditional expectation or mean of X is related to the conditional mean through the marginal distribution of Y:

$$E(X) = \sum_y E(X \mid Y = y) P_Y(y) = E[E(X \mid Y)]$$

[1] See Becker, Gary (1964): Human Capital, 1st edition (NBER).
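The relation $E(X) = E[E(X \mid Y)]$ can be checked numerically. The sketch below uses a small joint distribution that is purely hypothetical (it does not come from the lesson) and exact fractions, so the two sides can be compared without rounding error. The function names are our own.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); values chosen only for illustration.
joint = {
    (1, 0): F(1, 8), (2, 0): F(3, 8),
    (1, 1): F(1, 4), (2, 1): F(1, 4),
}

def marginal_y(joint):
    """Marginal pmf P_Y(y), obtained by summing the joint pmf over x."""
    p = {}
    for (x, y), pr in joint.items():
        p[y] = p.get(y, F(0)) + pr
    return p

def cond_mean_x(joint, y):
    """E(X | Y = y) = sum over x of x * P(x, y) / P_Y(y)."""
    p_y = marginal_y(joint)[y]
    return sum(x * pr / p_y for (x, yy), pr in joint.items() if yy == y)

p_y = marginal_y(joint)
e_x_direct = sum(x * pr for (x, _), pr in joint.items())          # E(X)
e_x_iterated = sum(cond_mean_x(joint, y) * p_y[y] for y in p_y)   # E[E(X|Y)]

print(e_x_direct, e_x_iterated)
```

Both numbers are computed from the same joint table, once directly and once by averaging the conditional means with weights $P_Y(y)$.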

For continuous random variables, the conditional expectation or conditional mean $E(X \mid Y = y)$ is computed by the formula:

$$E(X \mid Y = y) = \int_x x f(x \mid y)\,dx$$

In the human capital example, Y will be income and X will be years of schooling.

When our interest is the expected value (also called the population mean) of a random variable, we use the sample average as an estimator. When our interest is the conditional expected value (also called the conditional population mean), we can use the conditional sample average as an estimator.

Example 1 (conditional sample average I): Suppose we have the following sample of 10 observations with two variables, monthly earnings (Y, in dollars) and gender (X, male=1, female=2).

[Data table with columns Obs #, X, Y for the 10 observations; the values are missing in this copy.]

The sample average income conditional on male is $(y_1 + y_2 + \cdots + y_6)/6$. The sample average income conditional on female is $(y_7 + y_8 + \cdots + y_{10})/4$. Based on these sample averages, we take the first as what a typical male earns per month and the second as what a typical female earns per month.

Gender (X)    Conditional mean E(Y | X = x)    Conditional sample mean Ê(Y | X = x)
1             E(Y | X = 1)                     (value missing in this copy)
2             E(Y | X = 2)                     (value missing in this copy)

The general formula to compute the conditional sample average is

$$\hat{E}(Y \mid X = x) = \frac{\sum_{i:\, x_i = x} y_i}{\#(X = x)}$$

where $\#(X = x)$ is the number of observations with $x_i = x$.
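The general formula can be sketched in a few lines. The data below are hypothetical (the lesson's own sample values are not reproduced here), and the function name is our own.

```python
from collections import defaultdict

def conditional_sample_means(xs, ys):
    """Return {x: average of y_i over observations with x_i == x}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y       # numerator: sum of y_i with x_i = x
        counts[x] += 1     # denominator: #(X = x)
    return {x: sums[x] / counts[x] for x in sums}

# Hypothetical data: gender code (male=1, female=2) and monthly earnings.
gender = [1, 1, 1, 2, 2]
earnings = [1000.0, 1200.0, 1400.0, 900.0, 1100.0]

means = conditional_sample_means(gender, earnings)
print(means)
```

A value of x that never appears in the sample simply has no entry in the result, which is the missing-cell problem that arises when the conditioning variable takes many values.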

Example 2 (conditional sample average II): Suppose we have the following sample of 40 observations with two variables, monthly earnings (Y, in dollars) and years of schooling (X, from 0 to 20).

[Data table with columns Obs #, X, Y for the 40 observations; the values are missing in this copy.]

The sample average income conditional on 0 years of schooling is $(y_1 + y_2)/2$. Similarly, we can compute the sample average income conditional on different years of schooling.

[Table with columns X, Ê(Y | X = x), #obs; the values are missing in this copy.]

We make several observations about the above example.

1. Some of the conditional sample averages are based on only one observation. Using such an average as an estimate of the conditional expectation is very imprecise.
2. Some of the conditional sample averages are missing because no data are available. For instance, we have no observation with X = 7.

What can we do to improve our estimation of the conditional mean? It turns out that the estimation may be improved if we are willing to assume some relationship between X and Y. A linear relationship is commonly assumed between two variables:

$$E(Y \mid X) = \beta_0 + \beta_1 X \qquad (1)$$

In Example 2, we used 40 observations to produce 20 conditional means. On average, we have two observations to estimate each conditional mean, and we could not estimate the population mean conditional on X = 7 because there is no observation with X = 7. If we are willing to assume a linear relationship between $E(Y \mid X)$ and X as in equation (1), we only need to estimate two parameters, $\beta_0$ and $\beta_1$. On average, we will be using 20 observations to estimate one parameter. Once we have the estimates of $\beta_0$ and $\beta_1$, we can produce the conditional mean of Y for each X we are interested in. In addition, we would be able to estimate the population mean conditional on X = 7 even if we have no observation with X = 7. [2]

Simulation 1 (Linear expectation): We simulate 10 observations for each X with different variances of the error term.

$$E(Y \mid X) = 3 + 2X, \qquad \epsilon \sim N(0, \sigma^2), \qquad Y = E(Y \mid X) + \epsilon$$

X is assumed to take the discrete values 1, 2, ..., 9. The observations so generated are plotted below. Note that the expected values lie on the straight lines.

[2] Following this logic, the estimation of the conditional mean will be improved if we are willing to assume any relationship between X and Y such that the number of parameters is greatly reduced. For instance, we may assume $E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2$. However, note that if the true conditional expectation is not related to X as assumed, using the assumed relationship to estimate $E(Y \mid X)$ will be wrong. Thus, the choice of the functional form of $E(Y \mid X)$ is extremely important. That is why we often check the linearity assumption by doing a scatter plot of Y against X. When $E(Y \mid X)$ is assumed to have a specific functional form with a set of parameters, the regression is called parametric. When $E(Y \mid X)$ is not assumed to have any specific functional form (and hence has no assumed parameters), the regression is called nonparametric.
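Simulation 1 can be reproduced along the following lines. This is a minimal sketch: the seed, the function names and the choice $\sigma = 1$ for one of the two runs are our assumptions; $\sigma = 2$ corresponds to $\sigma^2 = 4$.

```python
import random

def simulate(sigma, n_per_x=10, seed=0):
    """Draw n_per_x observations of Y = 3 + 2X + eps for each X in 1..9."""
    rng = random.Random(seed)
    data = []
    for x in range(1, 10):
        for _ in range(n_per_x):
            eps = rng.gauss(0.0, sigma)      # eps ~ N(0, sigma^2)
            data.append((x, 3 + 2 * x + eps))
    return data

def conditional_means(data):
    """Conditional sample average of Y for each observed X."""
    groups = {}
    for x, y in data:
        groups.setdefault(x, []).append(y)
    return {x: sum(v) / len(v) for x, v in groups.items()}

for sigma in (1.0, 2.0):                     # sigma = 1.0 is an assumed value
    means = conditional_means(simulate(sigma))
    for x in sorted(means):
        print(f"sigma={sigma} x={x} sample mean={means[x]:.2f} E(Y|X)={3 + 2 * x}")
```

With only 10 draws per X, each conditional sample average scatters around $3 + 2x$, and the scatter widens as $\sigma$ grows.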

[Figure 1: Distribution of data from a linear regression model. Panel (a): $\sigma^2$ value missing in this copy. Panel (b): $\sigma^2 = 4$.]

Simulation 2 (Non-linear expectation): We simulate 10 observations for each X with different variances of the error term.

$$E(Y \mid X) = 3 + 2X \ [\text{non-linear term missing in this copy}], \qquad \epsilon \sim N(0, \sigma^2), \qquad Y = E(Y \mid X) + \epsilon$$

X is assumed to take the discrete values 1, 2, ..., 9. The observations so generated are plotted below. Note that the expected values lie on the curve.

[Figure 2: Distribution of data from a non-linear regression model. Panel (a): $\sigma^2$ value missing in this copy. Panel (b): $\sigma^2 = 100$.]

Example 3 (Which datasets are from a linear expectation model): Guess which datasets are likely to come from a linear expectation model.

[Figure 3: Datasets from four different models. Panels: (a) Dataset #1, (b) Dataset #2, (c) Dataset #3, (d) Dataset #4.]

It turns out that the datasets are drawn from the simulations reported above. Dataset #3 is unlikely to be from a linear model. However, one can easily be led to conclude that Dataset #4 is from a linear model, because its nonlinearity is mild relative to the dispersion of the data.

Given that we believe that the underlying model is linear, how do we estimate $\beta_0$ and $\beta_1$?

1 Estimation of the simple linear model

There are at least two approaches to estimating the linear model:

1. The method of moments
2. Ordinary least squares

It turns out that the two approaches yield the same estimators for the parameters $\beta_0$ and $\beta_1$.

1.1 The method of moments

Suppose we have observations of (X, Y) pairs. We can imagine that the observations of Y are random draws from a normal distribution with mean $E(Y \mid X)$ and some variance $\sigma^2$, i.e.,

$$Y \sim N(E(Y \mid X), \sigma^2)$$

Let $\epsilon = Y - E(Y \mid X)$. We have

$$\epsilon \sim N(0, \sigma^2)$$

Thus, a random draw of Y is like a random draw of $\epsilon$ plus $E(Y \mid X)$, and the assumed linear model (1) means

$$Y = \beta_0 + \beta_1 X + \epsilon \qquad (2)$$

Note that $\epsilon$ has zero mean, i.e., $E(\epsilon) = 0$. Thus, one can use the condition $E(\epsilon) = 0$ as one criterion to estimate the parameters. The problem is that one equation can be used to solve for only one coefficient (either $\beta_0$ or $\beta_1$). To solve for (get estimates of) two coefficients, we need another condition. One possibility is to assume that $\epsilon$ is drawn independently of X, so that $E(\epsilon \mid X) = 0$. $E(\epsilon \mid X) = 0$ implies $E(X\epsilon \mid X) = 0$ and $E(X\epsilon) = 0$. [3] Thus, we have two conditions, $E(\epsilon) = 0$ and $E(X\epsilon) = 0$:

$$E(\epsilon) = E(Y - \beta_0 - \beta_1 X) = E(Y) - \beta_0 - \beta_1 E(X) = 0$$

$$E(\epsilon X) = E[(Y - \beta_0 - \beta_1 X)X] = E(XY) - \beta_0 E(X) - \beta_1 E(X^2) = 0$$

Two equations are just enough to solve for the two coefficients $\beta_0$ and $\beta_1$.

[3] Note that, given $E(\epsilon) = 0$, $E(X\epsilon) = 0$ implies $Cov(X, \epsilon) = 0$.

If we have a data sample of n observations, how do we estimate $\beta_0$ and $\beta_1$? Note that $E(\cdot)$ is really the population average. In our estimation, we do not observe $\epsilon$, $\beta_0$ and $\beta_1$. What we have are only the observations $(x_i, y_i)$, $i = 1, \ldots, n$. We can use the sample analog (i.e., replace the population average with the sample average) to estimate the parameters. That is, we define $e_i = y_i - b_0 - b_1 x_i$, compute the corresponding sample averages and set them equal to zero. $b_0$, $b_1$ and $e_i$ are the sample analogs of $\beta_0$, $\beta_1$ and $\epsilon$ in the model. Our objective is to find $b_0$ and $b_1$, and hence $e_i$.

$$\hat{E}(e) = \frac{1}{n}\sum_{i=1}^{n} e_i = 0, \qquad \hat{E}(Xe) = \frac{1}{n}\sum_{i=1}^{n} x_i e_i = 0$$

Let's verify this method with something we are familiar with: the estimation of the population mean.

Example 4 (Estimation of population mean): Suppose $Y \sim N(\beta_0, \sigma^2)$. Thus, $\beta_0$ is the population mean of Y. We have observations $y_i$, $i = 1, 2, \ldots, n$. We want to estimate the population mean of Y. Fitting this into the linear model framework, we write

$$Y = \beta_0 + \epsilon \qquad (3)$$

Thus, we have only one parameter to estimate, i.e., $\beta_0$. Let $b_0$ be an estimate of $\beta_0$. First, we write $e_i = y_i - b_0$. Second, we compute the sample average of $e_i$ and set it to zero:

$$\frac{1}{n}\sum_{i=1}^{n} e_i = 0 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} (y_i - b_0) = 0 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} y_i - b_0 = 0 \;\Rightarrow\; b_0 = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Thus, the method yields the sample average as an estimator of $\beta_0$.

Example 5 (Estimation of the linear model): Suppose $Y \sim N(\beta_0 + \beta_1 X, \sigma^2)$. Thus, $\beta_0 + \beta_1 X$ is the population mean of Y conditional on X. We have observations $(x_i, y_i)$, $i = 1, \ldots, n$. We want to estimate the linear relationship of the conditional population mean. This is exactly the linear model framework in equation (1). Thus, we have two parameters to estimate, i.e., $\beta_0$ and $\beta_1$. Let $b_0$ and $b_1$ be estimates of $\beta_0$ and $\beta_1$. First, we write $e_i = y_i - b_0 - b_1 x_i$. Second, we compute the sample averages of $e_i$ and $x_i e_i$ and set them to zero:

$$\frac{1}{n}\sum_{i=1}^{n} e_i = 0 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \qquad (4)$$

$$\frac{1}{n}\sum_{i=1}^{n} e_i x_i = 0 \;\Rightarrow\; \frac{1}{n}\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)x_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)x_i = 0 \qquad (5)$$

Thus, the two equations (4) and (5) may be used to solve for the two unknowns $b_0$ and $b_1$. This approach is called the method of moments because the estimation is based on matching the sample moments (sample averages) with the population moments ($E(\cdot)$).

1.2 The method of ordinary least squares

Another view is to find the line that best fits the data. In the linear model (2), we would like to choose $b_0$ and $b_1$ so that the error e is minimized:

$$e = Y - b_0 - b_1 X$$

When we have n observations of $(x_i, y_i)$ (and hence $e_i$), naturally we will have some positive $e_i$ and some negative $e_i$. An operational procedure to minimize e is to choose $b_0$ and $b_1$ such that the sum of squared errors is minimized:

$$S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

Minimizing $S(b_0, b_1)$ with respect to $b_0$ and $b_1$ yields two first-order conditions. The condition with respect to $b_0$ is:

$$\frac{\partial S(b_0, b_1)}{\partial b_0} = \sum_{i=1}^{n} 2(y_i - b_0 - b_1 x_i)(-1) = 0 \;\Rightarrow\; \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \qquad (6)$$

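The first-order condition with respect to $b_1$ is:

$$\frac{\partial S(b_0, b_1)}{\partial b_1} = \sum_{i=1}^{n} 2(y_i - b_0 - b_1 x_i)(-x_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)x_i = 0 \qquad (7)$$

Note that the two least-squares conditions (6) and (7) are the same as the two conditions from the method of moments approach, (4) and (5).

2 The convenience of matrix notation

The use of matrices greatly simplifies our analysis. Our model $Y = \beta_0 + \beta_1 X + \epsilon$ may be rewritten in matrix notation:

$$Y = \begin{pmatrix} 1 & X \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \epsilon = Z\beta + \epsilon$$

Premultiplying by $Z'$, we have

$$Z'Y = Z'Z\beta + Z'\epsilon$$

The condition used to estimate $\beta$ is $E(Z'\epsilon) = 0$. Hence

$$E(Z'Y) = E(Z'Z)\beta + E(Z'\epsilon)$$
$$E(Z'Y) = E(Z'Z)\beta$$
$$\beta = [E(Z'Z)]^{-1} E(Z'Y)$$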
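Replacing each population average in $\beta = [E(Z'Z)]^{-1}E(Z'Y)$ with the corresponding sample average gives a computable estimate. The sketch below writes out the 2-by-2 inverse by hand and then checks that the resulting residuals satisfy the two sample conditions $\sum e_i = 0$ and $\sum x_i e_i = 0$. The data and the function name are hypothetical.

```python
def ols_by_moments(xs, ys):
    """Sample analog of [E(Z'Z)]^{-1} E(Z'Y) for Z = (1, X)."""
    n = len(xs)
    mx = sum(xs) / n                                  # sample analog of E(X)
    my = sum(ys) / n                                  # sample analog of E(Y)
    mxx = sum(x * x for x in xs) / n                  # sample analog of E(X^2)
    mxy = sum(x * y for x, y in zip(xs, ys)) / n      # sample analog of E(XY)
    # E(Z'Z) -> [[1, mx], [mx, mxx]],  E(Z'Y) -> [my, mxy]
    det = mxx - mx * mx
    if det == 0:
        raise ValueError("x has no variation, so b1 cannot be determined")
    b0 = (mxx * my - mx * mxy) / det
    b1 = (mxy - mx * my) / det
    return b0, b1

# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 6.9, 9.2, 10.8, 13.0]

b0, b1 = ols_by_moments(xs, ys)
e = [y - b0 - b1 * x for x, y in zip(xs, ys)]
print(b0, b1)
print(sum(e), sum(x * ei for x, ei in zip(xs, e)))    # both should be close to 0
```

The same numbers result from minimizing the sum of squared errors, since the first-order conditions of that problem are the same two equations.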

Suppose we have a sample of n observations $(y_i, x_i)$, $i = 1, \ldots, n$. We have

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

or, in compact form,

$$Y = Zb + e$$

Premultiplying by $Z'$, we have

$$Z'Y = Z'Zb + Z'e$$
$$Z'Y = Z'Zb$$
$$b = (Z'Z)^{-1} Z'Y \qquad (8)$$

where $Z'e = 0$ because $Z'e$ is the sample analog of $E(Z'\epsilon)$, which is assumed to equal zero in the model.

3 Properties of the OLS estimator

For convenience, we use matrix notation to discuss the properties of the OLS estimator.

3.1 Unbiasedness

Recall the definition of unbiasedness.

Definition 2 (Unbiasedness): An estimator $\theta = \theta(x_1, x_2, \ldots, x_n)$ for a population parameter $\beta$ is called unbiased if

$$E(\theta) = \beta$$

Thus, b is an unbiased estimator of $\beta$ if $E(b) = \beta$. In the following discussion, it is convenient to assume that the $x_i$ are fixed and known.

$$\begin{aligned}
E(b) &= E((Z'Z)^{-1} Z'Y) \\
&= E((Z'Z)^{-1} Z'(Z\beta + \epsilon)) \\
&= E((Z'Z)^{-1} Z'Z\beta) + E((Z'Z)^{-1} Z'\epsilon) \\
&= E((Z'Z)^{-1} Z'Z)\beta + E[(Z'Z)^{-1} Z' E(\epsilon \mid Z)] \\
&= \beta
\end{aligned}$$

Thus, b is unbiased if $E(\epsilon \mid Z) = 0$.

3.2 The estimators are normally distributed

Note that b is a ratio of sample means. If we assume that the $x_i$ are fixed and known, then b will be a weighted average of the $y_i$. Thus, for samples with more than 30 observations, the Central Limit Theorem may be applied to conclude that b will be approximately normally distributed. In showing unbiasedness, we have computed the mean of b. It remains to find the variance of b. Since $\beta$ is a constant, $V(\beta) = 0$ and its covariance with any random variable is zero, so

$$\begin{aligned}
V(b) &= V((Z'Z)^{-1} Z'Y) \\
&= V((Z'Z)^{-1} Z'(Z\beta + \epsilon)) \\
&= V((Z'Z)^{-1} Z'Z\beta) + V((Z'Z)^{-1} Z'\epsilon) + 2\,Cov((Z'Z)^{-1} Z'Z\beta,\ (Z'Z)^{-1} Z'\epsilon) \\
&= V(\beta) + V((Z'Z)^{-1} Z'\epsilon) + 2\,Cov(\beta,\ (Z'Z)^{-1} Z'\epsilon) \\
&= V((Z'Z)^{-1} Z'\epsilon) \\
&= E[((Z'Z)^{-1} Z'\epsilon)((Z'Z)^{-1} Z'\epsilon)'] \\
&= E[(Z'Z)^{-1} Z'\epsilon\epsilon' Z (Z'Z)^{-1}] \\
&= (Z'Z)^{-1} Z' E(\epsilon\epsilon') Z (Z'Z)^{-1} \\
&= (Z'Z)^{-1} Z' (\sigma^2 I) Z (Z'Z)^{-1} \\
&= \sigma^2 (Z'Z)^{-1} Z'Z (Z'Z)^{-1} \\
&= \sigma^2 (Z'Z)^{-1}
\end{aligned}$$
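Both results can be illustrated with a small Monte Carlo experiment: hold the $x_i$ fixed, redraw $\epsilon$ many times, and compare the average and the variance of the simulated $b_1$ with $\beta_1$ and with the corresponding element of $\sigma^2 (Z'Z)^{-1}$, which for the slope equals $\sigma^2 / \sum (x_i - \bar{x})^2$. The parameter values, seed, number of replications and function names below are our assumptions.

```python
import random

def ols(xs, ys):
    """Return (b0, b1) from the closed-form solution of the normal equations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def monte_carlo(beta0=3.0, beta1=2.0, sigma=2.0, reps=5000, seed=1):
    rng = random.Random(seed)
    # Fixed regressors: 10 observations at each of X = 1, ..., 9.
    xs = [float(x) for x in range(1, 10) for _ in range(10)]
    b1_draws = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b1_draws.append(ols(xs, ys)[1])
    mean_b1 = sum(b1_draws) / reps
    var_b1 = sum((b - mean_b1) ** 2 for b in b1_draws) / (reps - 1)
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    theory_var_b1 = sigma ** 2 / sxx   # slope element of sigma^2 (Z'Z)^{-1}
    return mean_b1, var_b1, theory_var_b1

mean_b1, var_b1, theory_var_b1 = monte_carlo()
print(mean_b1, var_b1, theory_var_b1)
```

The simulated mean should sit close to $\beta_1 = 2$ and the simulated variance close to the theoretical value, up to simulation noise.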

Thus,

$$b \overset{A}{\sim} N(\beta, \sigma^2 (Z'Z)^{-1})$$

Usually $\sigma^2$ is unknown and has to be estimated based on $e_i = y_i - b_0 - b_1 x_i$:

$$S^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}$$

Its square root, S, is called the standard error of estimate. Why do we have a denominator of $(n - 2)$ instead of $(n - 1)$ as in the usual estimate of the population variance? It is because $b_0$ and $b_1$ have to be estimated from the data. In the estimation of the population variance $\sigma^2$, these two numbers are treated as fixed. Hence, $(n - 2)$ reflects the loss of two degrees of freedom.

3.3 BLUE

The OLS estimator is also known to be the Best Linear Unbiased Estimator (BLUE). Best because the estimator results from minimizing the sum of squared errors and, under the assumptions above, V(b) is the smallest among all linear unbiased estimators of $\beta$. Linear because a linear model is assumed and b is a linear function of the $y_i$. Unbiased because $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$.

4 Inference

Knowledge of the distribution of b allows us to do various kinds of inference. Confidence intervals for $\beta$ and hypothesis tests about $\beta$ are straightforward. Let

$$b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \begin{pmatrix} V(b_0) & C(b_0, b_1) \\ C(b_0, b_1) & V(b_1) \end{pmatrix} \right) \qquad (9)$$

4.1 Testing individual parameters

Often, we are interested in testing whether an individual population parameter is different from zero at the 5% level of significance. That is, $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. The joint distribution of $b_0$ and $b_1$ shown in (9) suggests that $b_1 \sim N(\beta_1, V(b_1))$. Hence we will reject the null if

$$\frac{b_1 - 0}{\sqrt{V(b_1)}} > 1.96 \quad \text{or} \quad \frac{b_1 - 0}{\sqrt{V(b_1)}} < -1.96$$

Sometimes, we would like to test whether an individual population parameter is different from one at the 5% level of significance. That is, $H_0: \beta_1 = 1$ versus $H_1: \beta_1 \neq 1$. Again $b_1 \sim N(\beta_1, V(b_1))$, so we will reject the null if

$$\frac{b_1 - 1}{\sqrt{V(b_1)}} > 1.96 \quad \text{or} \quad \frac{b_1 - 1}{\sqrt{V(b_1)}} < -1.96$$

Testing $\beta_0$ is similar.

4.2 Testing a set of parameters

Suppose we are interested in testing whether the two population parameters are unequal at the 5% level of significance. That is, $H_0: \beta_1 - \beta_0 = 0$ versus $H_1: \beta_1 - \beta_0 \neq 0$. The joint distribution of $b_0$ and $b_1$ shown in (9) suggests that

$$b_1 - b_0 \sim N(\beta_1 - \beta_0,\ V(b_1) + V(b_0) - 2C(b_0, b_1))$$

Hence we will reject the null if

$$\frac{(b_1 - b_0) - 0}{\sqrt{V(b_1) + V(b_0) - 2C(b_0, b_1)}} > 1.96 \quad \text{or} \quad \frac{(b_1 - b_0) - 0}{\sqrt{V(b_1) + V(b_0) - 2C(b_0, b_1)}} < -1.96$$

5 How good is the model?

5.1 Goodness of fit

How well does the model fit the data? A model fits the data better when the implied $e_i$ are small. Because the OLS estimators result from minimizing the sum of squared errors given $x_i$ and $y_i$, the estimator b gives the best fit. However, there are alternative models using different x as explanatory variables. One would want a common measure to tell which explanatory variable yields the best fit.

Note that our aim is to predict y given x. Without x, we will be using the sample mean of y as a prediction of y. In this case we will have a sum of squared errors

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

With x, we will be using the sample conditional mean of y (i.e., $b_0 + b_1 x$) as a prediction of y. We will have a sum of squared errors

$$SSE = \sum_{i=1}^{n} (y_i - (b_0 + b_1 x_i))^2$$

It can be shown that

$$SST = SSE + SSR$$

where SSR is the regression sum of squares

$$SSR = \sum_{i=1}^{n} ((b_0 + b_1 x_i) - \bar{y})^2$$

A natural measure of goodness of fit is

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$R^2$ measures the percentage of the total sum of squared errors that can be explained by the explanatory variable(s) in a regression framework. Note that $R^2$ lies between 0 and 1. A higher $R^2$ means an explanatory variable (x) is better at predicting y.

$R^2 = 0$: the explanatory variable x is useless in predicting y.
$R^2 = 1$: the explanatory variable x predicts y perfectly.

$R^2$ is also known as the coefficient of determination. A relative of $R^2$ is the correlation coefficient r.

Definition 3 (Correlation coefficient): Suppose we have a sample of n observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$. The correlation coefficient (r) is a measure of the strength of the linear relationship between two variables x and y.

$$r = \frac{\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}\ \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
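The quantities defined in Sections 3.2 to 5.1 can be computed together for one sample. The sketch below uses five hypothetical data points and our own function name. It computes $S^2$, takes $V(b_1) = S^2 / \sum (x_i - \bar{x})^2$ (the slope element of $S^2 (Z'Z)^{-1}$), forms the statistic for $H_0: \beta_1 = 0$, and evaluates SST, SSE, SSR, $R^2$ and r. With so few points the 1.96 cutoff is only illustrative, since the normal approximation is meant for larger samples.

```python
import math

def fit_summary(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained
    ssr = sum((f - my) ** 2 for f in fitted)              # explained by x
    sst = syy                                             # total
    s2 = sse / (n - 2)                                    # estimate of sigma^2
    var_b1 = s2 / sxx                                     # estimated V(b1)
    stat = (b1 - 0.0) / math.sqrt(var_b1)                 # test of beta1 = 0
    return {
        "b0": b0, "b1": b1, "s2": s2, "var_b1": var_b1, "stat": stat,
        "reject_5pct": abs(stat) > 1.96,
        "sst": sst, "sse": sse, "ssr": ssr,
        "r2": ssr / sst,
        "r": sxy / math.sqrt(sxx * syy),
    }

# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 6.9, 9.2, 10.8, 13.0]

summary = fit_summary(xs, ys)
for name, value in summary.items():
    print(name, value)
```

Printing `r2` next to `r * r` for this sample is a quick way to compare the two measures of fit.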

The correlation coefficient r can range from −1.00 to 1.00. Values of −1.00 or 1.00 indicate perfect correlation. Values close to 0.0 indicate weak correlation. Negative values indicate an inverse relationship and positive values indicate a direct relationship. It can be shown that, in the simple linear model, the coefficient of determination ($R^2$) is the square of the correlation coefficient ($r^2$). Note that $R^2$ is more general and is valid for models with more than one explanatory variable, but the correlation coefficient applies only to two variables.

5.2 Validity of assumptions

5.2.1 The linearity assumption

We have assumed that $Y = \beta_0 + \beta_1 X + \epsilon$. Sometimes, economic theory or the data suggest that the model may not be linear. For example, in the human capital example, it is often assumed that $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$, where Y is log monthly earnings instead of monthly earnings. How do we know whether linearity is a satisfactory assumption? We often check by doing a scatter plot of the data y against x. If the plot suggests non-linearity, we will have to revise our model.

5.2.2 Same variance for all observations: homoskedasticity

The observations $(x_i, y_i)$, $i = 1, \ldots, n$, are assumed to be drawn from the same population

$$Y \sim N(E(Y \mid X), \sigma^2)$$

When this assumption is not correct, we will need to make some adjustments to our estimation and inference. The assumption may generally be checked by plotting the residuals ($e_i$) against x. If the residuals exhibit some pattern, we will try to transform the model or the data. To transform the data, we can define $y^* = \log(y)$ or $y^* = y^2$, etc. To adjust the model, one may add higher-order terms (i.e., squared terms, cubic terms) to allow for nonlinearity.
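Where a plot is not at hand, a rough numerical stand-in for the residual check is to compare the size of the residuals at small and at large values of x. The sketch below is only a crude heuristic of our own, not a formal test, and the data are constructed by hand so that the error grows with x.

```python
def fit_line(xs, ys):
    """OLS intercept and slope from the closed-form solution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residuals(xs, ys):
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def spread_ratio(xs, res):
    """Mean squared residual in the upper half of x over the lower half."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    half = len(order) // 2
    low = [res[i] ** 2 for i in order[:half]]
    high = [res[i] ** 2 for i in order[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Hypothetical data: errors alternate in sign and grow in proportion to x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3 + 2 * x + (0.1 * x if x % 2 else -0.1 * x) for x in xs]

res = residuals(xs, ys)
ratio = spread_ratio(xs, res)
for x, e in zip(xs, res):
    print(f"x={x:.0f} residual={e:+.3f}")
print("spread ratio:", ratio)
```

A ratio far from 1 suggests that the residual variance changes with x, in which case a transformation such as $y^* = \log(y)$ may be worth trying before refitting.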
