Simple Linear Regression

Statistical Methods I (EXST 7005)
James P. Geaghan Copyright 1

Simple regression applications are used to fit a model describing a linear relationship between two variables. The aspects of least squares regression and correlation were developed by Sir Francis Galton in the late 1800s. The application can be used to test for a statistically significant correlation between the variables. Finding a relationship does not prove a cause and effect relationship, but the model can be used to quantify a relationship where one is known to exist.

The model provides a measure of the rate of change of one variable relative to another variable: there is a potential change in the value of variable $Y$ as the value of variable $X$ changes. Variable values will always be paired, one termed an independent variable (often referred to as the $X$ variable) and the other a dependent variable (termed a $Y$ variable). For each value of $X$ there is assumed to be a normally distributed population of values for the variable $Y$.

The linear model which describes the relationship between two variables is given as

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

The $Y$ variable is called the dependent variable or response variable (vertical axis).

$$\mu_{y.x} = \beta_0 + \beta_1 X_i$$

is the population equation for a straight line. No error is needed in this equation because it describes the line itself. The term $\mu_{y.x}$ is estimated at each value of $X_i$ with $\hat{Y}_i$.

$\mu_{y.x}$ = the true population mean of $Y$ at each value of $X$.
The $X$ variable is called the independent variable or predictor variable (horizontal axis).
$\beta_0$ = the true value of the intercept (the value of $Y$ when $X = 0$).
$\beta_1$ = the true value of the slope, the amount of change in $Y$ for each unit change in $X$ (i.e. if $X$ changes by 1 unit, $Y$ changes by $\beta_1$ units).

The two population parameters to be estimated, $\beta_0$ and $\beta_1$, are also referred to as the regression coefficients.

All variability in the model is assumed to be due to $Y_i$, so variance is measured vertically. The variability is assumed to be normally distributed at each value of $X_i$. The $X_i$ variable is assumed to have no variance since all variability is in $Y_i$ (this is a new assumption).

The values $\beta_0$ and $\beta_1$ ($b_0$ and $b_1$ for a sample) are called the regression coefficients.

The $\beta_0$ value is the value of $Y$ at the point where the line crosses the $Y$ axis. This value is called the intercept. If this value is zero the line crosses at the origin of the $X$ and $Y$ axes, and the linear equation reduces from $\hat{Y}_i = b_0 + b_1 X_i$ to $\hat{Y}_i = b_1 X_i$ and is said to have no intercept, even though the regression line does cross the $Y$ axis. The units on $b_0$ are the same units as for $Y_i$.

The $\beta_1$ value is called the slope. It determines the incline or angle of the regression line. If the slope is 0, the line is horizontal. At this point the linear model reduces to $\hat{Y}_i = b_0$, and the regression is said to have no slope. The slope gives the change in $Y$ per unit of $X$. The units on the slope are then $Y$ units per $X$ unit.

The population equation for the line describes a perfect line with no variation. In practice there is always variation about the line. We include an additional term to represent this variation.

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad \text{for a population}$$
$$Y_i = b_0 + b_1 X_i + e_i \quad \text{for a sample}$$

When we put this term in the model, we are describing individual points as their position on the line plus or minus some deviation. The Sum of Squares of deviations from the line will form the basis of a variance for the regression line.

When we leave the $e_i$ off the sample model we are describing a point on the regression line, predicted from the sample estimates. To indicate this we put a hat on the $Y_i$ value, $\hat{Y}_i = b_0 + b_1 X_i$.

Characteristics of a Regression Line

The line will pass through the point $(\bar{X}, \bar{Y})$ (also the point $(0, b_0)$).
The sum of squared deviations (measured vertically) of the points from the regression line will be a minimum.
Values on the line for any value of $X_i$ can be described by the equation $\hat{Y}_i = b_0 + b_1 X_i$.

Common objectives in Regression

There are a number of possible objectives.

1. Determine if there is a relationship between $X$ and $Y$. This would be determined by some hypothesis test. The strength of the relationship is, to some extent, reflected in the correlation or $R^2$ value.
2. Determine the value of the rate of change of $Y$ relative to $X$. This is measured by the slope of the regression line. This objective would usually be accompanied by a test of the slope against 0 (or some other value) and/or a confidence interval on the slope.
3. Establish and employ a predictive equation for $Y$ from $X$.

   This objective would usually be preceded by Objective 1 above to show that a relationship exists. The predicted values would usually be given with their confidence interval, or the regression with its confidence band.

Assumptions in Regression Analysis

Independence. The best guarantee of this assumption is random sampling. This is a difficult assumption to check. This assumption is made for all tests we will see in this course.

Normality of the observations at each value of $X_i$ (or of the pooled deviations from the regression line). This is relatively easy to test if the appropriate values are tested (e.g. residuals in ANOVA or Regression, not the raw $Y_i$ values). This can be tested with the Shapiro-Wilk W statistic in PROC UNIVARIATE. This assumption is made for all tests we have seen this semester except the Chi square tests of Goodness of Fit and Independence.

Homogeneity of error (homogeneous variances or homoscedasticity). This is easy to check for and to test in analysis of variance (e.g. $S^2$ on the mean, or tests like Bartlett's in ANOVA). In Regression the simplest way to check is by examining the residual plot. This assumption is made for ANOVA (for pooled variance) and Regression. Recall that in 2 sample t-tests the equality of the variances need not be assumed; it can be readily tested.

$X_i$ measured without error. This must be assumed in ordinary least squares regressions, since all error is measured in a vertical direction and occurs in $Y_i$.

Assumptions: general assumptions

The $Y$ variable is normally distributed at each value of $X$.
The variance is homogeneous (across $X$).
Observations are independent of each other and $e_i$ is independent of the rest of the model.

Special assumption for regression

Assume that all of the variation is attributable to the dependent variable ($Y$), and that the variable $X$ is measured without error. Note that the deviations are measured vertically, not horizontally or perpendicular to the line.
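To make the last point concrete, here is a minimal Python sketch contrasting the vertical deviation that least squares works with and the perpendicular distance that it does not use. The line ($b_0 = 1$, $b_1 = 2$), the point, and the function names are made up for illustration and do not come from the notes.

```python
import math

def vertical_deviation(x, y, b0, b1):
    """Signed vertical deviation of the point (x, y) from the line b0 + b1*x."""
    return y - (b0 + b1 * x)

def perpendicular_distance(x, y, b0, b1):
    """Shortest (perpendicular) distance from the point (x, y) to the same line."""
    return abs(y - (b0 + b1 * x)) / math.sqrt(1.0 + b1 ** 2)

# Hypothetical line and observation, chosen only for illustration.
b0, b1 = 1.0, 2.0
x, y = 3.0, 9.0

e_vertical = vertical_deviation(x, y, b0, b1)   # the deviation used in regression
d_perp = perpendicular_distance(x, y, b0, b1)   # NOT used in ordinary least squares

print(e_vertical, d_perp)
```

The two quantities agree only when the line is horizontal; for any nonzero slope the perpendicular distance is smaller than the absolute vertical deviation.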

Fitting the line

Fitting the line starts with a corrected SSDeviation; this is the SSDeviation of the observations from a horizontal line through the mean. The line will pass through the point $(\bar{X}, \bar{Y})$. The fitted line is pivoted on this point until it has a minimum SSDeviations.

How do we know the SSDeviations are a minimum? Actually, we solve the equation for $e_i$, and use calculus to determine the solution that has a minimum of the sum of squared deviations.

$$Y_i = b_0 + b_1 X_i + e_i$$
$$e_i = Y_i - (b_0 + b_1 X_i) = Y_i - \hat{Y}_i$$
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 X_i) \right]^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

The line has some desirable properties:

$$E(b_0) = \beta_0, \qquad E(b_1) = \beta_1, \qquad E(\hat{Y}_i) = \mu_{y.x}$$

Therefore, the parameter estimates and predicted values are unbiased estimates.

Derivation of the formulas

You do not need to learn this derivation for this class! However you should be aware of the process and its objectives.

Any observation from a sample can be written as $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ is the deviation of the observed point from the regression line.

The idea of regression is to minimize the deviation of the observations from the regression line; this is called a Least Squares Fit. The simple sum of the deviations is zero, $\sum e_i = 0$, so minimizing will require a square or an absolute value to remove the sign.

The sum of the squared deviations is

$$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - b_0 - b_1 X_i)^2$$

The objective is to select $b_0$ and $b_1$ such that $\sum e_i^2$ is a minimum, by using some techniques from calculus.

We have previously defined the uncorrected sum of squares and corrected sum of squares of a variable $Y_i$.
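As a small numerical illustration of why the deviations have to be squared, the Python sketch below pivots three candidate lines on the point $(\bar{X}, \bar{Y})$ and reports both the plain sum and the sum of squares of the deviations. The five data points and the candidate slopes are made up for this illustration; the slope 0.6 is what the least squares formulas later in these notes give for these data.

```python
def deviations(xs, ys, b0, b1):
    """Vertical deviations e_i = Y_i - (b0 + b1 * X_i) for a candidate line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

results = {}
for b1 in (0.0, 0.6, 1.5):        # three candidate slopes
    b0 = y_bar - b1 * x_bar       # forces the line through (x_bar, y_bar)
    e = deviations(xs, ys, b0, b1)
    results[b1] = (sum(e), sum(d * d for d in e))
    print(f"b1={b1:.1f}  sum(e)={results[b1][0]:+.6f}  sum(e^2)={results[b1][1]:.3f}")
```

Every line through the mean point has deviations that sum to zero (up to rounding), so the plain sum cannot distinguish between candidate lines, while the sum of squares differs from slope to slope and is smallest for the least squares slope.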

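The corrected sum of squares of $Y_i$

The uncorrected SS is $\sum Y_i^2$.
The correction factor is $\left(\sum Y_i\right)^2 / n$.
The corrected SS is

$$CSS = S_{YY} = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n}$$

We will call this corrected sum of squares $S_{YY}$ and the correction factor $C_{YY}$.

The corrected sum of squares of $X_i$

We could define the exact same series of calculations for $X_i$, and call it $S_{XX}$.

The corrected cross products of $Y_i$ and $X_i$

We need a cross product for regression, and a corrected cross product.

The cross product is $X_i Y_i$.
The uncorrected sum of cross products is $\sum X_i Y_i$.
The correction factor for the cross products is $C_{XY} = \left(\sum X_i\right)\left(\sum Y_i\right) / n$.
The corrected cross product is

$$CCP = S_{XY} = \sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}$$

The formulas for calculating the slope and intercept can be derived as follows. Take the partial derivative with respect to each of the parameter estimates, $b_0$ and $b_1$.

For $b_0$:

$$\frac{\partial \left(\sum e_i^2\right)}{\partial b_0} = 2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)(-1)$$

which is set equal to 0 and solved for $b_0$:

$$\sum Y_i - n b_0 - b_1 \sum X_i = 0 \quad \text{(this is the first normal equation)}$$

Likewise, for $b_1$ we obtain the partial derivative, set it equal to 0 and solve for $b_1$:

$$\frac{\partial \left(\sum e_i^2\right)}{\partial b_1} = 2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)(-X_i)$$

$$\sum_{i=1}^{n} (X_i Y_i - b_0 X_i - b_1 X_i^2) = \sum X_i Y_i - b_0 \sum X_i - b_1 \sum X_i^2 = 0 \quad \text{(second normal equation)}$$

The normal equations can be written as

$$n b_0 + b_1 \sum X_i = \sum Y_i$$
$$b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$$

At this point we have two equations and two unknowns so we can solve for the unknown regression coefficient values $b_0$ and $b_1$.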
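The two normal equations form a 2 x 2 linear system, so they can also be solved numerically. The following is a minimal Python sketch that does so with Cramer's rule (a technique chosen here for brevity; the notes solve the system algebraically instead). The function name and the five data points are made up for illustration.

```python
def solve_normal_equations(xs, ys):
    """Solve  n*b0 + b1*sum(X) = sum(Y)  and  b0*sum(X) + b1*sum(X^2) = sum(XY)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the coefficient matrix [[n, sum_x], [sum_x, sum_x2]].
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    # Cramer's rule for the two unknowns.
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = solve_normal_equations(xs, ys)
print(b0, b1)
```

For these data the sums are $n = 5$, $\sum X_i = 15$, $\sum Y_i = 20$, $\sum X_i^2 = 55$ and $\sum X_i Y_i = 66$, so the system is $5 b_0 + 15 b_1 = 20$ and $15 b_0 + 55 b_1 = 66$.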

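Review

For $b_0$ the solution is:

$$n b_0 = \sum Y_i - b_1 \sum X_i$$

and

$$b_0 = \frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n} = \bar{Y} - b_1 \bar{X}$$

Note that estimating $\beta_0$ requires a prior estimate of $b_1$ and the means of the variables $X$ and $Y$.

For $b_1$, given that $b_0 = \bar{Y} - b_1 \bar{X}$ and $\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$, then

$$\sum X_i Y_i = \left( \frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n} \right) \sum X_i + b_1 \sum X_i^2 = \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n} - b_1 \frac{\left(\sum X_i\right)^2}{n} + b_1 \sum X_i^2$$

$$\sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n} = b_1 \left[ \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n} \right]$$

so

$$b_1 = \frac{S_{XY}}{S_{XX}}$$

That is, $b_1$ is the corrected cross products over the corrected sum of squares of $X$.

The intermediate statistics needed to solve all elements of a SLR are $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum Y_i^2$ and $\sum X_i Y_i$. We have not seen $\sum Y_i^2$ used in the calculations yet, but we will need it later to calculate variance.

We want to fit the best possible line through some observed data points. We define this as the line that minimizes the vertically measured distances from the observed values to the fitted line. The line that achieves this is defined by the equations

$$b_0 = \bar{Y} - b_1 \bar{X}$$

$$b_1 = \frac{\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sum X_i^2 - \left(\sum X_i\right)^2/n} = \frac{S_{XY}}{S_{XX}}$$

These calculations provide us with two parameter estimates that we can then use to get the equation for the fitted line, $\hat{Y}_i = b_0 + b_1 X_i$.

Testing hypotheses about regressions

The total variation about a regression is exactly the same calculation as the total for Analysis of Variance.

SSTotal = SSDeviations from the mean = Uncorrected SSTotal - Correction factor

The simple regression analysis will produce two sources of variation.

SSRegression: the variation explained by the regression.
SSError: the remaining, unexplained variation about the regression line.

These sources of variation are expressed in an ANOVA source table.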
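The computational formulas for the fitted line and the two sources of variation can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the five data points are made up, and SSRegression is obtained here by subtraction (SSTotal minus SSError), which follows from the description of SSError as the remaining variation.

```python
def fit_line(xs, ys):
    """Return (b0, b1) using b1 = S_XY / S_XX and b0 = Ybar - b1 * Xbar."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n                  # corrected SS of X
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n  # corrected cross products
    b1 = s_xy / s_xx
    b0 = sum(ys) / n - b1 * sum(xs) / n
    return b0, b1

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

b0, b1 = fit_line(xs, ys)

# SSTotal = uncorrected SS of Y minus the correction factor.
ss_total = sum(y * y for y in ys) - sum(ys) ** 2 / n
# SSError = sum of squared vertical deviations from the fitted line.
ss_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
# SSRegression by subtraction (an assumption of this sketch, see the lead-in).
ss_regression = ss_total - ss_error

print(b0, b1, ss_total, ss_regression, ss_error)
```

Only the five intermediate sums listed earlier are needed for the slope, the intercept and SSTotal; SSError is computed here directly from the deviations to keep the sketch close to its definition.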

The degrees of freedom in the ANOVA source table are partitioned as follows.

Source        d.f.
Regression    1          d.f. used to fit slope
Error         $n - 2$    error d.f.
Total         $n - 1$    one d.f. lost in adjusting for ("correcting for") the mean

Note that one degree of freedom is lost from the total for the correction for the mean, which actually fits the intercept. The single regression d.f. is for fitting the slope.

The correction fits a flat line through the mean.
The "regression" actually fits the slope.

The difference between these two models is that one has no slope, or a slope equal to zero ($b_1 = 0$), and the other has a slope fitted. Testing for a difference between these two cases is the common hypothesis test of interest in regression and it is expressed as $H_0: \beta_1 = 0$.

The results of a regression are expressed in an ANOVA table.

Source        df         SS              MS              F
Regression    1          SSRegression    MSRegression    MSRegression / MSError
Error         $n - 2$    SSError         MSError
Total         $n - 1$    SSTotal

The regression is tested with an F test, formed by dividing the MSRegression by the MSError. This is a one tailed F test, as it was with ANOVA, and it has 1 and $n - 2$ d.f. It tests the null hypothesis $H_0: \beta_1 = 0$ versus the alternative $H_1: \beta_1 \neq 0$.

The $R^2$ statistic

This is a popular statistic for interpretation. The concept is that we want to know what proportion of the corrected total sum of squares is explained by the regression line.

Source        d.f.       SS
Regression    1          SSReg
Error         $n - 2$    SSError
Total         $n - 1$    SSTotal

In the process of fitting the regression the SSTotal is divided into two parts, the sum of squares explained by the regression (SSRegression) and the remaining