Chapter 2 - The Simple Linear Regression Model

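The linear regression equation is:

\[ y_i = \beta_1 + \beta_2 x_i + e_i \quad \text{for } i = 1, \ldots, N \]

where
\(y_i\) and \(x_i\) are observable variables
\(e_i\) is a random error

How can an estimation rule be constructed for the unknown parameters \(\beta_1\) and \(\beta_2\)?

The least squares principle, also known as the method of ordinary least squares (OLS), is to find a solution that minimizes the sum of squared errors:

\[ S = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \beta_1 - \beta_2 x_i)^2 \]

Note: negative and positive errors get equal weight.

This is a minimization problem. The solution is a calculus exercise. The minimizing points \(b_1\) and \(b_2\) are obtained as the solution to the first order conditions:

\[ \frac{\partial S}{\partial \beta_1} = 0 \quad \text{and} \quad \frac{\partial S}{\partial \beta_2} = 0 \]

Differentiation gives:

\[ \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{N} (y_i - \beta_1 - \beta_2 x_i) = -2 \left( \sum_{i=1}^{N} y_i - N \beta_1 - \beta_2 \sum_{i=1}^{N} x_i \right) \]

and

\[ \frac{\partial S}{\partial \beta_2} = -2 \sum_{i=1}^{N} x_i (y_i - \beta_1 - \beta_2 x_i) = -2 \left( \sum_{i=1}^{N} x_i y_i - \beta_1 \sum_{i=1}^{N} x_i - \beta_2 \sum_{i=1}^{N} x_i^2 \right) \]

Econ 36 - Chapter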

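To obtain the minimizing point, set the above equations to 0 and evaluate at \(b_1\) and \(b_2\). This gives:

\[ -2 \left( \sum_{i=1}^{N} y_i - N b_1 - b_2 \sum_{i=1}^{N} x_i \right) = 0 \]

and

\[ -2 \left( \sum_{i=1}^{N} x_i y_i - b_1 \sum_{i=1}^{N} x_i - b_2 \sum_{i=1}^{N} x_i^2 \right) = 0 \]

Now divide through by \(-2\) and rearrange terms to get the normal equations:

\[ \sum_{i=1}^{N} y_i = N b_1 + b_2 \sum_{i=1}^{N} x_i \qquad (1) \]

\[ \sum_{i=1}^{N} x_i y_i = b_1 \sum_{i=1}^{N} x_i + b_2 \sum_{i=1}^{N} x_i^2 \qquad (2) \]

The normal equations can be solved. From equation (1) rearrange terms to get:

\[ b_1 = \bar{y} - b_2 \bar{x} \qquad (3) \]

where \(\bar{y}\) and \(\bar{x}\) are the sample means. That is,

\[ \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i \quad \text{and} \quad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \]

Equation (3) is the intercept estimator.

Substitute (3) into (2) to get:

\[ \sum_{i=1}^{N} x_i y_i = (\bar{y} - b_2 \bar{x}) \sum_{i=1}^{N} x_i + b_2 \sum_{i=1}^{N} x_i^2 \]

Rearrange to get:

\[ \sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y} = b_2 \left( \sum_{i=1}^{N} x_i^2 - N \bar{x}^2 \right) \]

Note, the above used the result:

\[ \sum_{i=1}^{N} x_i = N \bar{x} \]

Now solve to get the slope estimator:

\[ b_2 = \frac{\sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y}}{\sum_{i=1}^{N} x_i^2 - N \bar{x}^2} \qquad (4) \]

Equations (3) and (4) are called the ordinary least squares (OLS) estimators for \(\beta_1\) and \(\beta_2\).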
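As a rough illustration of how equations (3) and (4) turn into a calculation, here is a minimal Python sketch. The data values and the function name `ols_estimates` are made up for this example and do not come from the notes.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ols_estimates(x, y):
    """Return (b1, b2) computed from equations (3) and (4)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Equation (4): slope estimator
    b2 = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # Equation (3): intercept estimator
    b1 = y_bar - b2 * x_bar
    return b1, b2


b1, b2 = ols_estimates(x, y)
print(b1, b2)  # approximately 2.2 and 0.6 for this data
```

For this small data set the sums can be checked by hand: the sample means are 3 and 4, the sum of cross products is 66 and the sum of squares of x is 55, so the slope is (66 - 60)/(55 - 45) = 0.6.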

The estimators \(b_1\) and \(b_2\) give an estimation rule. How are these results used? Collect a numeric data set for an application of interest. \(y_i\) and \(x_i\) are now the numeric observations. Do the calculations in equations (3) and (4). This gives the numeric estimates denoted by \(b_1\) and \(b_2\). These numbers are called ordinary least squares (OLS) estimates.

Statistical Note
A numeric data set can be viewed as one sample from the population. Suppose repeated sampling from the population was possible. A different sample will have a different set of numeric data. This means that different samples will yield different least squares point estimates.

It is useful to note that the slope estimator can be expressed in a number of equivalent ways. Equation (4) can be written as:

\[ b_2 = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2} \qquad (4a) \]

Another equivalent formula is:

\[ b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \qquad (4b) \]

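Equation (4b) can be used to get Equation (4) by using the results:

\[ \begin{aligned} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) &= \sum_{i=1}^{N} (x_i y_i - \bar{x} y_i - x_i \bar{y} + \bar{x} \bar{y}) \\ &= \sum_{i=1}^{N} x_i y_i - \bar{x} \sum_{i=1}^{N} y_i - \bar{y} \sum_{i=1}^{N} x_i + N \bar{x} \bar{y} \\ &= \sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y} \end{aligned} \]

and

\[ \begin{aligned} \sum_{i=1}^{N} (x_i - \bar{x})^2 &= \sum_{i=1}^{N} (x_i^2 - 2 \bar{x} x_i + \bar{x}^2) \\ &= \sum_{i=1}^{N} x_i^2 - 2 N \bar{x}^2 + N \bar{x}^2 \\ &= \sum_{i=1}^{N} x_i^2 - N \bar{x}^2 \end{aligned} \]

Another statement for the slope estimator is:

\[ b_2 = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)} \]

where the sample variance is:

\[ \operatorname{var}(x) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 \]

and the sample covariance is:

\[ \operatorname{cov}(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \]

Note the divisor is \(N - 1\).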
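The equivalence of the different slope formulas can be checked numerically. The sketch below uses a small invented data set (not from the notes) and computes the slope four ways: from raw sums with sample means, from raw sums only, from deviations from the means, and as a covariance over a variance. The variable names are illustrative.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
x_bar, y_bar = sum_x / n, sum_y / n
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

# Raw sums combined with the sample means
slope_means = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)

# Raw sums only
slope_sums = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

# Deviations from the sample means
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope_dev = sxy / sxx

# Sample covariance over sample variance (both with divisor N - 1)
cov_xy = sxy / (n - 1)
var_x = sxx / (n - 1)
slope_cov = cov_xy / var_x

print(slope_means, slope_sums, slope_dev, slope_cov)
```

The divisor N - 1 cancels in the last ratio, which is why the covariance form gives the same number as the deviation form.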

The least squares fitted values or predicted values are:

\[ \hat{y}_i = b_1 + b_2 x_i \quad \text{for } i = 1, \ldots, N \]

The least squares residuals are:

\[ \hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i \quad \text{for } i = 1, \ldots, N \]

Properties of least squares estimation
- The least squares fitted line passes through the sample means \((\bar{x}, \bar{y})\).
- The average value of \(\hat{y}_i\) is the sample mean \(\bar{y}\). That is, \(\frac{1}{N} \sum_{i=1}^{N} \hat{y}_i = \bar{y}\).
- The sum of the residuals is zero. That is, \(\sum_{i=1}^{N} \hat{e}_i = 0\).

Note: the above properties require that an equation intercept \(\beta_1\) is included in the linear regression equation.

Interpreting the Estimates

The intercept estimate \(b_1\)
The intercept estimate may not have a meaningful economic interpretation if the sample observations do not have \(x\) values around \(x = 0\).

Example
For the household expenditure function, \(y\) is weekly household expenditure on food (in dollars) and \(x\) is weekly income (in $100). The estimated linear regression equation is:

\[ \hat{y}_i = 83.42 + 10.21 x_i \quad \text{for } i = 1, \ldots, 40 \]

As a first guess, you may say that a household with zero income (\(x = 0\)) will spend about $83.42 each week on food. Think again. An inspection of the data set shows that, for the sample of 40 households, the minimum weekly household income is 3.69 ($369) and the maximum income is 33.40 ($3,340).

The fitted regression line may not be useful for predicting food expenditure at levels of income below the minimum observed value or exceeding the maximum level in the data set.
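The three properties of least squares estimation can be verified on any data set once an intercept is included. The following sketch uses invented data (not the food expenditure data) and illustrative variable names.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# OLS slope and intercept (deviation-from-means form of the slope)
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b1 = y_bar - b2 * x_bar

fitted = [b1 + b2 * xi for xi in x]               # fitted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # least squares residuals

# Property 1: the fitted line passes through the sample means
value_at_x_bar = b1 + b2 * x_bar
# Property 2: the average fitted value equals the sample mean of y
mean_fitted = sum(fitted) / n
# Property 3: the residuals sum to zero
sum_residuals = sum(residuals)

print(value_at_x_bar, mean_fitted, sum_residuals)
```

Up to floating point rounding, the first two printed values equal the sample mean of y and the third is zero.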

The slope estimate \(b_2\)
This gives a marginal impact: the estimated increase in the mean of the dependent variable \(y\) for a one unit increase in the explanatory variable \(x\).

Example
For the estimated food expenditure equation the slope estimate is 10.21. The economic interpretation is: for a typical household, weekly expenditure on food increases by about $10.21 for an additional $100 in income. For reporting purposes, values can be rescaled. For example, an equivalent statement is: a $20 increase in weekly household income leads to an increase in weekly expenditure on food of about $2.04.

Elasticity
Economists are familiar with the measure of elasticity, defined as the percentage increase in \(y\) for a one percent increase in \(x\). An elasticity can be obtained as:

\[ \varepsilon = \frac{d \ln(y)}{d \ln(x)} = \frac{dy}{dx} \cdot \frac{x}{y} = \beta_2 \frac{x}{y} \]

This will vary at every sample observation. How can a summary measure be found? A fast method is to evaluate the elasticity at the sample means \((\bar{x}, \bar{y})\). This gives an elasticity estimate calculated as:

\[ \hat{\varepsilon} = b_2 \frac{\bar{x}}{\bar{y}} \]

For the household food expenditure example:

\[ \hat{\varepsilon} = b_2 \frac{\bar{x}}{\bar{y}} = 10.21 \times \frac{19.60}{283.57} = 0.71 \]

This says that a 1% increase in household income will lead, on average, to a 0.71% increase in weekly food expenditure. An estimated income elasticity less than one suggests that food is a necessity rather than a luxury good.
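To see why a summary measure is needed, the sketch below evaluates the elasticity at every observation and then at the sample means. The data are invented for illustration (they are not the food expenditure data), and the helper name `elasticity_at` is hypothetical.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))


def elasticity_at(slope, x_value, y_value):
    """Elasticity slope * x / y evaluated at one (x, y) point."""
    return slope * x_value / y_value


# The elasticity differs from one observation to the next ...
point_elasticities = [elasticity_at(b2, xi, yi) for xi, yi in zip(x, y)]

# ... so a single summary is obtained by evaluating at the sample means.
elasticity_at_means = elasticity_at(b2, x_bar, y_bar)

print(point_elasticities)
print(elasticity_at_means)  # about 0.45 for this data
```

With this data the point elasticities range from about 0.3 to 0.6, while the value at the means is 0.6 x 3 / 4 = 0.45.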

There can be more than one method for getting results. Here is another method for estimating an elasticity from the least squares estimation results. Define the observations:

\[ z_i = \frac{x_i}{y_i} \quad \text{for } i = 1, \ldots, N \]

The sample mean is:

\[ \bar{z} = \frac{1}{N} \sum_{i=1}^{N} z_i \]

An elasticity estimate is:

\[ \hat{\varepsilon}_Z = \bar{z} \, b_2 \]

For the food expenditure estimation results:

\[ \hat{\varepsilon}_Z = \bar{z} \, b_2 = (0.074)(10.21) = 0.76 \]

This method gives a different numerical answer compared to the previous calculation.

Assessing the Least Squares Estimators

The model of economic behaviour is expressed as the linear regression equation:

\[ y_i = \beta_1 + \beta_2 x_i + e_i \quad \text{for } i = 1, 2, \ldots, N \]

where
\(y_i\) and \(x_i\) are observable variables
\(y_i\) is the dependent variable
\(x_i\) is the explanatory variable
\(\beta_1\) and \(\beta_2\) are unknown parameters (coefficients)
\(\beta_1\) is the intercept coefficient
\(\beta_2\) is the slope coefficient
\(e_i\) is a random error

The method of ordinary least squares (OLS) finds an estimation rule for \(\beta_1\) and \(\beta_2\) to minimize the sum of squared errors:

\[ S = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \beta_1 - \beta_2 x_i)^2 \]
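The two elasticity summaries generally differ because the mean of a ratio is not the ratio of the means. A small sketch on invented data (not the food expenditure data) makes this concrete; all names are illustrative.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))

# Method 1: evaluate the elasticity at the sample means
eps_means = b2 * x_bar / y_bar

# Method 2: average the ratios z_i = x_i / y_i, then multiply by the slope
z = [xi / yi for xi, yi in zip(x, y)]
z_bar = sum(z) / n
eps_z = z_bar * b2

print(eps_means, eps_z)  # about 0.45 and 0.432 for this data
```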

The solution gives the least squares (OLS) estimators \(b_1\) and \(b_2\).

The predicted or fitted values are:

\[ \hat{y}_i = b_1 + b_2 x_i \quad \text{for } i = 1, \ldots, N \]

The residuals are:

\[ \hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i \quad \text{for } i = 1, \ldots, N \]

The estimators \(b_1\) and \(b_2\) are functions of the \(y_i\) and \(x_i\). \(e_i\) is viewed as a random variable. Therefore \(y_i\) is a random variable, and \(b_1\) and \(b_2\) are also random variables whose statistical properties can be analyzed.

To establish some statistical results a number of assumptions are required. The standard assumptions are:

(1) The linear regression equation is correctly specified as:

\[ y_i = \beta_1 + \beta_2 x_i + e_i \]

(2) \(E(e_i) = 0\) for all \(i\) (\(i = 1, 2, \ldots, N\))
This says the random errors have zero mean. That is, any omitted variables that are captured in \(e_i\) do not systematically affect the mean value of \(y_i\).

(3) \(\operatorname{var}(e_i) = \sigma^2\) (sigma-squared) for all \(i\)
This says equal error variance for all observations. This is called homoskedasticity (equal spread). Note that assumption (2) implies

\[ \operatorname{var}(e_i) = E\left[ (e_i - E(e_i))^2 \right] = E(e_i^2) \]

(4) \(\operatorname{cov}(e_i, e_j) = 0\) for all \(i \neq j\)
This says the covariance between any two errors is zero. Note that assumption (2) implies

\[ \operatorname{cov}(e_i, e_j) = E\left[ (e_i - E(e_i))(e_j - E(e_j)) \right] = E(e_i e_j) \]

The correlation between two errors is defined as:

\[ \frac{\operatorname{cov}(e_i, e_j)}{\sqrt{\operatorname{var}(e_i) \operatorname{var}(e_j)}} \]

This shows zero covariance is equivalent to uncorrelated errors.

(5) \(x_i\) must have at least 2 different values. That is, \(\operatorname{var}(x) > 0\).

(5*) \(x_i\) is non-random or non-stochastic. That is, the \(x_i\) values are fixed in repeated sampling. This means

\[ \operatorname{cov}(e_i, x_i) = E\left[ (e_i - E(e_i))(x_i - E(x_i)) \right] = E(e_i)(x_i - E(x_i)) = 0 \]

This says the error is uncorrelated with the explanatory variable.

The above assumptions can now be used to establish the statistical properties of the least squares (OLS) estimator. The focus of the presentation will be the slope estimator \(b_2\). Similar results can be obtained for the intercept estimator \(b_1\).

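If the standard assumptions are satisfied then \(b_2\) is an unbiased estimator of \(\beta_2\). That is,

\[ E(b_2) = \beta_2 \]

This result can be shown. Introduce

\[ w_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{N} (x_j - \bar{x})^2} \quad \text{for } i = 1, 2, \ldots, N \]

A result is

\[ \sum_{i=1}^{N} (x_i - \bar{x}) = \sum_{i=1}^{N} x_i - N \bar{x} = 0 \]

Therefore

\[ \sum_{i=1}^{N} w_i = 0 \]

Also

\[ \sum_{i=1}^{N} w_i x_i = \sum_{i=1}^{N} w_i (x_i - \bar{x}) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = 1 \]

That is, the \(w_i\) have the properties

\[ \sum_{i=1}^{N} w_i = 0 \quad \text{and} \quad \sum_{i=1}^{N} w_i x_i = 1 \]

Now state the slope estimator as:

\[ b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = \sum_{i=1}^{N} w_i (y_i - \bar{y}) = \sum_{i=1}^{N} w_i y_i - \bar{y} \sum_{i=1}^{N} w_i = \sum_{i=1}^{N} w_i y_i \]

This shows that \(b_2\) is a linear function of the \(y_i\): a weighted average of the \(y_i\) with the \(w_i\) as weights.

Use assumption (1) to substitute for the \(y_i\) to get

\[ \begin{aligned} b_2 &= \sum_{i=1}^{N} w_i (\beta_1 + \beta_2 x_i + e_i) \\ &= \beta_1 \sum_{i=1}^{N} w_i + \beta_2 \sum_{i=1}^{N} w_i x_i + \sum_{i=1}^{N} w_i e_i \\ &= \beta_2 + \sum_{i=1}^{N} w_i e_i \end{aligned} \]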
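The two properties of the weights, and the fact that the weighted sum of the y values reproduces the slope estimate, can be confirmed numerically. The sketch below uses invented data and illustrative names.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Weights w_i = (x_i - x_bar) / sum_j (x_j - x_bar)^2
w = [(xi - x_bar) / sxx for xi in x]

sum_w = sum(w)                                   # should be 0
sum_wx = sum(wi * xi for wi, xi in zip(w, x))    # should be 1

# Slope as a weighted sum of the y values
b2_weighted = sum(wi * yi for wi, yi in zip(w, y))

# Slope from the deviation-from-means formula, for comparison
b2_direct = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

print(sum_w, sum_wx, b2_weighted, b2_direct)
```

Note that the weights depend only on the x values, which is what makes the estimator linear in the y values.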

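Take expectations to find

\[ \begin{aligned} E(b_2) &= \beta_2 + E\left( \sum_{i=1}^{N} w_i e_i \right) \\ &= \beta_2 + \sum_{i=1}^{N} w_i E(e_i) && \text{use assumption (5*)} \\ &= \beta_2 && \text{use assumption (2): } E(e_i) = 0 \text{ for all } i \end{aligned} \]

This says the slope estimator \(b_2\) is an unbiased estimator of \(\beta_2\).

What does this mean? With a sample of numeric data a slope estimate can be calculated. This estimate will be smaller or larger than the true unknown population value \(\beta_2\). Another sample of observations will yield a different slope estimate that again will be smaller or larger than the true parameter. In repeated sampling, the average of all the calculated slope estimates will equal \(\beta_2\).

The variance of the slope estimator can be found as follows.

\[ \begin{aligned} \operatorname{var}(b_2) &= \operatorname{var}\left( \beta_2 + \sum_{i=1}^{N} w_i e_i \right) \\ &= \operatorname{var}\left( \sum_{i=1}^{N} w_i e_i \right) && \beta_2 \text{ is a constant} \\ &= \sum_{i=1}^{N} w_i^2 \operatorname{var}(e_i) && \text{use assumptions (5*) and (4)} \\ &= \sigma^2 \sum_{i=1}^{N} w_i^2 && \text{use assumption (3): } \operatorname{var}(e_i) = \sigma^2 \\ &= \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} && \text{substitute for } w_i \end{aligned} \]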
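Repeated sampling cannot be done with real data, but it can be imitated by simulation. The sketch below is one possible illustration under assumed values: the true parameters, the error standard deviation, the fixed x values, the normal error distribution, and the number of replications are all choices made for this example, not values from the notes.

```python
import random

random.seed(12345)  # fixed seed so the run is reproducible

# Assumed "true" population values, chosen for this illustration only.
beta1, beta2, sigma = 1.0, 0.5, 2.0

# x values held fixed across samples, as in assumption (5*).
x = [float(i) for i in range(1, 21)]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)


def slope(y):
    """OLS slope estimate for the fixed x values and a sample of y."""
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


reps = 5000
estimates = []
for _ in range(reps):
    # Draw independent zero-mean errors with common variance sigma^2.
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    estimates.append(slope(y))

mean_b2 = sum(estimates) / reps
var_b2 = sum((b - mean_b2) ** 2 for b in estimates) / (reps - 1)
theory_var = sigma ** 2 / sxx  # variance formula for the slope estimator

print(mean_b2, beta2)       # simulated average versus true slope
print(var_b2, theory_var)   # simulated variance versus formula
```

Individual estimates scatter above and below the true slope, while their average sits close to it and their spread is close to the variance formula.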

The variance gives a measure of the precision of the estimator. Inspection of the variance formula shows the following:

- An increase in sample size \(N\) generally leads to lower variance. This holds since \(\sum_{i=1}^{N} (x_i - \bar{x})^2\) increases as \(N\) increases. This gives increased precision of the estimator.
- The greater the variability in \(x\), the more precise is the estimator. This holds since the variance of the slope estimator can be expressed as:

\[ \operatorname{var}(b_2) = \frac{\sigma^2}{(N-1) \operatorname{var}(x)} \]

- The smaller the variability in \(y\) (as reflected in \(\sigma^2\)), the more precise is the estimator.

The Gauss-Markov Theorem

It has been shown that the least squares (OLS) estimator \(b_2\) is a linear unbiased estimator of \(\beta_2\). Here, linear means that \(b_2\) is a weighted average of the \(y_i\).

The Gauss-Markov theorem says: If the standard set of assumptions is satisfied, then the least squares estimator has minimum variance in the class of linear unbiased estimators. That is, the least squares estimator is BLUE (Best Linear Unbiased Estimator). Best means minimum variance.
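One way to get a feel for the theorem is to compare OLS with a specific competing linear unbiased estimator. The competitor used below, the slope of the line through the first and last observations, is chosen for this illustration and is not taken from the notes. A linear estimator with weights c_i is unbiased when the weights sum to zero and the weighted sum of the x values equals one, and under assumptions (3) and (4) its variance is sigma-squared times the sum of squared weights.

```python
# Hypothetical fixed regressor values, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# OLS weights w_i
w = [(xi - x_bar) / sxx for xi in x]

# Competing weights k_i: use only the endpoints,
# i.e. the estimator (y_N - y_1) / (x_N - x_1).
k = [0.0] * n
k[0] = -1.0 / (x[-1] - x[0])
k[-1] = 1.0 / (x[-1] - x[0])


def is_unbiased(weights):
    """Check sum(weights) = 0 and sum(weights * x) = 1."""
    total = sum(weights)
    total_x = sum(c * xi for c, xi in zip(weights, x))
    return abs(total) < 1e-12 and abs(total_x - 1.0) < 1e-12


# Variances expressed in units of sigma^2 (sum of squared weights).
var_ols = sum(c ** 2 for c in w)
var_alt = sum(c ** 2 for c in k)

print(is_unbiased(w), is_unbiased(k))
print(var_ols, var_alt)  # about 0.1 versus 0.125
```

Both estimators are linear and unbiased, but the endpoint estimator discards the interior observations and ends up with the larger variance.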

What does this mean? Suppose \(b_2^*\) is another linear unbiased estimator of \(\beta_2\). That is,

\[ b_2^* = \sum_{i=1}^{N} k_i y_i \]

where the \(k_i\) are some weights that are different from the \(w_i\). Also

\[ E(b_2^*) = \beta_2 \]

The Gauss-Markov theorem says

\[ \operatorname{var}(b_2^*) > \operatorname{var}(b_2) \]

If any of the standard assumptions are violated then the least squares method may not be the best. There may be an estimator with lower variance, but it is either biased or not linear in \(y\).

Estimating the Variance of the Error Term

Now take another look at the variance of the slope estimator:

\[ \operatorname{var}(b_2) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \]

The error variance \(\sigma^2\) is unknown. An estimator for \(\sigma^2\) is needed.

The sum of squared residuals is:

\[ SSE = \sum_{i=1}^{N} \hat{e}_i^2 = \sum_{i=1}^{N} (y_i - b_1 - b_2 x_i)^2 \]

An unbiased estimator for \(\sigma^2\) is:

\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^{N} \hat{e}_i^2}{N - 2} = \frac{SSE}{N - 2} \]

The divisor \(N - 2\) is the number of degrees of freedom in the sum of squares. The degrees of freedom (df) is the number of independent pieces of information used to compile the sum of squares from \(N\) observations. For this application, two degrees of freedom are lost for the two estimated parameters (the intercept and the slope).

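Now replace the unknown error variance \(\sigma^2\) with \(\hat{\sigma}^2\) to get an estimator for \(\operatorname{var}(b_2)\) as:

\[ \widehat{\operatorname{var}}(b_2) = \frac{\hat{\sigma}^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \]

The standard error of the slope estimator \(b_2\) is:

\[ \operatorname{se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} \]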
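Putting the pieces together, the standard error of the slope can be computed from the residuals in a few lines. The sketch below uses invented data and illustrative variable names; it is one straightforward way to carry out the calculation, not a prescribed implementation.

```python
import math

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b1 = y_bar - b2 * x_bar

# Sum of squared residuals
residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Error variance estimate with N - 2 degrees of freedom
sigma2_hat = sse / (n - 2)

# Estimated variance and standard error of the slope estimator
var_b2_hat = sigma2_hat / sxx
se_b2 = math.sqrt(var_b2_hat)

print(sse, sigma2_hat, var_b2_hat, se_b2)
```

For this data the sum of squared residuals is 2.4, so the error variance estimate is 2.4 / 3 = 0.8 and the standard error of the slope is the square root of 0.08, about 0.283.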