Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos (Y X ), =,,.., o the relatoshp whch, because t s ot exact, we shall wrte as: y = α + β x +,..., I ths relatoshp α, β ad ε =,.., are fudametally uobservable ad we would lke to estmate α ad β. Approach: The dea s to select estmates of α ad β lets call them α* ad β* whch yeld a straght le (called the regresso le) Y = α* + β*x whch mmses a measure of the aggregate dstace of the pots (Y X ), =,,.., to that le X Y space, where Y s measured o the vertcal axs. The measure we use s the sum of squared vertcal dstaces whch we shall call the Error Sum of Squares (ESS) so that α* ad β* are solutos to the problem: ( ) t m ESS Y α β X α, β = + The multvarate calculus s employed to solve ths problem by settg the partal dervatves of ESS wth respect to α* ad β* to zero (called the frst order codtos) ad solvg thus: The solutos to whch are: ( Y ( α β X) ) ESS = + = 0 α ( ( α β ) ) ESS = Y + X X 0 = β

α = Y β X β = ( Y Y X ( X X X Note the formula for β * has may alteratve equvalet versos whch may be see by observg that for the umerator: Y Y X = Y Y X X = X X Y = X Y YX ad for the deomator: X X X = X X X X = X X so that varous combatos of umerator ad deomator formulae yeld alteratve represetatos of β *. Assumptos the Ordary Least Squares model. Note that whle α, β ad ε, =,.., are fudametally uobservable we oly cocer ourselves wth estmatg α ad β whch defe the relatoshp betwee Y ad X. The ε =,.., are cosdered errors whch accommodate all the other flueces o Y ot accouted for α +βx as such we assume them to be radom ad to obey the followg four assumptos: ) E(ε ) = 0 for all. Ths assumpto really says that the average or et effect of all the other flueces o Y ot accouted for α +βx s costat ad zero for each observato (,..,). ) V(ε ) = σ > 0 for all. Ths assumpto really says that the varablty of the et effect of all the other flueces o Y ot accouted for α +βx s costat ad o zero for each observato (,..,).

Ths s sometmes referred to as the homoskedastcty assumpto ad s ot as ocuous as t seems. For example f Y were the cosumpto behavour of a dvdual ad X was ther dsposable come t says that the scale of varablty of the uobserved effects would be the same for both rch ad poor dvduals. 3) E(ε ε j ) = 0 for all j. 4) E(ε X j ) = 0. For ad j These last two assumptos derve from the geeral assumpto that the et effects of all the other flueces o Y ot accouted for α +βx are depedet of the X s ad of each other. Aga these are ot ocuous assumptos, for example f the equato relates to the behavour of dvduals ad dvduals the sample are related to oe aother some fasho they are ulkely to be true. Smlarly f the equato relates to behavour through tme today s radom effects are ulkely to be depedet of yesterdays effects. Gve these assumptos certa propertes of the estmators follow. Ubasedess. Uder the above assumptos the ordary least squares estmators α* ad β* are ubased so that E(α*) = α ad E(β*) = β whch may be demostrated as follows. From the varous formulae for β* we may wrte: ( ( α β + ) β = = = = X X Y X X + X ( X X X X X X whch, after otg that the sum of devatos from mea s equal to 0, may be wrtte as: β = β + ( X ( X X X

whch ca readly be see to have a expectato β. For α* ote that: α = Y + β X = α + βx + β X = α + β β X + sce E(β - β*) = 0 ad gve assumpto E(α*) = α. The varaces of the estmators. The varace of the estmators α* ad β* facltate ferece ad cofdece terval estmato. The ca be derved, gve the above assumptos as follows. The varace of ay radom varable W s gve by E[(W-E(W)) ] so that for β* from the above we may wrte: X X X X E ( β E( β )) E β β E ( X X ( X X) = + = Note from our assumptos that: σ ( ) E X X X X = 0 for all j j j = X X for all = j whch meas that the varace may be wrtte as: E ( X σ σ ( β E( β )) = = ( X ) ( X X ) X By smlar argumets the varace of α* s gve by: X E ( α E( α )) = σ + ( X X The esduals.

The thgs that are ot explaed by the model, the ε s are fact mplctly estmated by the model by Y - α* - β*x. These estmates of the errors are frequetly referred to as the esduals. They too have propertes whch reflect the assumptos made regardg the true errors.. The resduals sum to 0. = α β = β β e Y X Y Y X X ( Y Y) β ( X X) = = 0 Ths mmcs the dea that the errors have a expectato 0.. The resduals are orthogoal to the X s. ex = Y α β X X = Y Y β X β X X β ( Y Y X = = = 0 Y Y X X X X Y Y X X X X ( X X Whch mmcs the dea that the errors are depedet of the X s. Iferece ad Cofdece Itervals. Notg that β* may be wrtte as: β = β + ( X X) ( X X X t ca readly be see that the estmator s a lear fucto of the errors ε, =,.., so that f they were ormally dstrbuted the so would β* be, fact based upo the foregog t s dstrbuted as:

β N β, σ ( X a smlar fasho α* wll be dstrbuted as: α N α σ +, X ( X so that ferece ad cofdece terval computatos ca proceed accordgly. A Estmator for σ. As usual t s extremely rare for the value of the varace to be kow so that geerally t has to be estmated. The estmator σ * s gve by problem: σ ( Y ( α + β X) ) ) ESS = = whch s most coveetly calculated by otg that: ( ) β ( ) ESS = Y Y X X employg ths estmate meas that: ( β β) ad smlarly: σ + σ ( X ( α α) X ( ) t ( X t ( so ferece ad cofdece terval calculatos ca be coducted accordgly.

ad all that. A very commo strumet for measurg the degree of explaatory power a equato s the statstc whch measures the proporto of the varablty the Y s that s attrbutable to the X s. Gve the amout of varablty the Y s attrbutable to the errors s gve by the error sum of squares (ESS), the proporto of the varablty of the Y s attrbutable to the X s s gve by: ( ) ( ( ) Y Y ESS β X X = = ( Y Y Y Y where the last equalty s obtaed by substtutg the formula for ESS. Notg that - s: t follows that: = ESS ( Y Y) ( X X) β β ( = = ESS st dev ( β ) whch s the square of the classc t statstc for β whch wll have a F(,-) dstrbuto. Multple egresso. Here we cosder the exteso of the smple regresso techque to aalysg the lear relatoshp betwee K+ varables Y ad X, X,..,X K. We have K+-tuples of observatos (Y X, X,..,X K ) =,,.., o the relatoshp whch, because t s ot exact, we shall wrte as: K Y = α + β X + =,..., k k k =

I ths relatoshp α, β k, k =,..,K ad ε =,.., are fudametally uobservable ad we would lke to estmate the α ad β k. Essetally ths deals wth the case where Y s descrbed by more tha oe varable X. We requre oe addto to our 4 assumptos, amely > K+. I the same way that estmators for α, β were developed the smple regresso case (by mmsg ESS) estmators for α, β k, k =,..,K (deoted α *, β * k, k =,..,K ) ca be developed by mmzg wth respect to α *, β * k, k =,..,K the multvarate verso of the error sum of squares gve by: ESS = Y + X K α β k k k= The formulae for these estmates ad smlarly formulae for Var(α * ) ad Var(β * k), k =,..,K are complcated ad wll ot be gve here however these estmates are readly calculated usg avalable software packages ad just as the smple regresso case ferece ad cofdece tervals may be pursued by otg that: ( βk βk) var ( βk ) ( K ) t ad smlarly: ( αk αk) var ( αk ) ( k ) t so that ferece ad cofdece terval calculatos ca be coducted accordgly. the Multvarate Case. I the multvarate case ca be calculated exactly the same fasho as: = ESS ( Y Y)

however ths case f we wsh to use t the cotext of a jot test of the sgfcace of all of the X k s we have to use: ( K ) K (, K ) F K whch provdes us wth a oe sded upper taled test statstc for the sgfcace of the K explaatory varables X k.