REVIEW OF SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION

In linear regression, we consider the frequency distribution of one variable (Y) at each of several levels of a second variable (X).

Y is known as the dependent variable: the variable for which you collect data.

X is known as the independent variable: the variable for the treatments.

Determining the Regression Equation

One goal of regression is to draw the "best" line through the data points. The best line usually is obtained using means instead of individual observations.

Example

Effect of hours of mixing on temperature of wood pulp

Hours of mixing (X)    Temperature of wood pulp (Y)    XY
         2                         21                     42
         4                         27                    108
         6                         29                    174
         8                         64                    512
        10                         86                    860
        12                         92                  1,104

$\sum X = 42$    $\sum Y = 319$    $\sum XY = 2{,}800$    $\sum X^2 = 364$    $\sum Y^2 = 21{,}967$    $n = 6$
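[Figure: Effect of hours of mixing on temperature of wood pulp. Horizontal axis: Hours of mixing; vertical axis: Temperature.]

The equation for any straight line can be written as:

$$\hat{Y} = b_0 + b_1 X$$

where: $b_0$ = Y intercept, and
$b_1$ = regression coefficient = slope of the line.

The linear model can be written as:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where: $e_i$ = residual = $Y_i - \hat{Y}_i$, the sample estimate of $\varepsilon_i$.

With the data provided, our first goal is to determine the regression equation.

Step 1. Solve for $b_1$

$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{\text{SS Cross Products}}{\text{SS } X} = \frac{SSCP}{SS\,X}$$

for the data in this example: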
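$\sum X = 42$    $\sum Y = 319$    $\sum XY = 2{,}800$    $\sum X^2 = 364$    $\sum Y^2 = 21{,}967$

$$b_1 = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{2800 - \frac{(42 \times 319)}{6}}{364 - \frac{42^2}{6}} = \frac{567}{70} = 8.1$$

The number calculated for $b_1$, the regression coefficient, indicates that for each unit increase in X (i.e., hours of mixing), Y (i.e., wood pulp temperature) will increase 8.1 units (i.e., degrees).

The regression coefficient can be a positive or negative number.

To complete the regression equation, we need to calculate $b_0$.

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{319}{6} - 8.1\left(\frac{42}{6}\right) = -3.533$$

Therefore, the regression equation is:

$$\hat{Y}_i = -3.533 + 8.1X$$

[Figure: Fitted regression line. Horizontal axis: Hours of mixing; vertical axis: Temperature (°F).]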
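The hand calculation of the slope and intercept can be checked with a short script. This is a minimal sketch in Python using the six observations from the wood pulp example; the variable names (`sscp`, `ss_x`, and so on) are chosen here for illustration and are not part of the notes.

```python
# Wood pulp example: hours of mixing (X) and temperature (Y).
x = [2, 4, 6, 8, 10, 12]
y = [21, 27, 29, 64, 86, 92]
n = len(x)

# Raw sums used by the computational formulas.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Corrected sum of cross products and corrected sum of squares for X.
sscp = sum_xy - sum_x * sum_y / n
ss_x = sum_x2 - sum_x ** 2 / n

# Slope first, then the intercept from the two means.
b1 = sscp / ss_x
b0 = sum_y / n - b1 * sum_x / n

print(f"SSCP = {sscp:.1f}, SS X = {ss_x:.1f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

The printed slope and intercept should agree with the values worked out by hand (8.1 and -3.533).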
Assumptions of Regression

1. There is a linear relationship between X and Y.
2. The values of X are known constants and presumably are measured without error.
3. For each value of X, Y is independent and normally distributed about the regression line; that is, the errors satisfy $\varepsilon \sim N(0, \sigma^2)$.
4. The sum of deviations from the regression line equals zero: $\sum (Y_i - \hat{Y}_i) = 0$.
5. The sum of squares for error is a minimum.

[Figure: Effect of hours of mixing on temperature of wood pulp. Horizontal axis: Hours of mixing; vertical axis: Temperature.]

If you square the deviations and sum across all observations, you obtain the definition formulas for the following sums of squares:

$\sum (\hat{Y}_i - \bar{Y})^2$ = Sum of Squares Due to Regression

$\sum (Y_i - \hat{Y}_i)^2$ = Sum of Squares Due to Deviation from Regression (Residual)

$\sum (Y_i - \bar{Y})^2$ = Sum of Squares Total
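The definition formulas can be evaluated directly to confirm that the residuals sum to zero and that the regression and residual sums of squares add up to the total. The following is an illustrative Python sketch on the wood pulp data; the names `ss_regression`, `ss_residual`, and `ss_total` are labels introduced here, not notation from the notes.

```python
# Wood pulp example: hours of mixing (X) and temperature (Y).
x = [2, 4, 6, 8, 10, 12]
y = [21, 27, 29, 64, 86, 92]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept from the definition formulas.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Fitted values and residuals for every observation.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Definition formulas for the three sums of squares.
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - y_bar) ** 2 for yi in y)

print(f"Sum of residuals = {sum(residuals):.6f}")
print(f"SS regression = {ss_regression:.3f}")
print(f"SS residual   = {ss_residual:.3f}")
print(f"SS total      = {ss_total:.3f}")
```

Up to floating-point rounding, the sum of residuals is zero and the first two sums of squares add to the third.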
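Testing the hypothesis that a linear relationship between X and Y exists

The hypotheses to test that a linear relationship between X and Y exists are:

$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$

These hypotheses can be tested using three different methods:

1. F-test
2. t-test
3. Confidence interval

Method 1. F-test

The ANOVA to test $H_0: \beta_1 = 0$ can be done using the following sources of variation, degrees of freedom, and sums of squares:

SOV                  df       Sum of Squares
Due to regression    1        $\dfrac{\left(\sum XY - \frac{(\sum X)(\sum Y)}{n}\right)^2}{\sum X^2 - \frac{(\sum X)^2}{n}} = \dfrac{SSCP^2}{SS\,X}$
Residual             n - 2    Determined by subtraction
Total                n - 1    $\sum Y^2 - \dfrac{(\sum Y)^2}{n} = SS\,Y$

Using data from the example:

$\sum X = 42$    $\sum Y = 319$    $\sum XY = 2{,}800$    $\sum X^2 = 364$    $\sum Y^2 = 21{,}967$

Step 1. Calculate Total SS

$$\text{Total SS} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 21{,}967 - \frac{319^2}{6} = 5{,}006.833$$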
Step 2. Calculate SS Due to Regression

$$\text{SS Due to Regression} = \frac{\left(\sum XY - \frac{(\sum X)(\sum Y)}{n}\right)^2}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{\left(2800 - \frac{42 \times 319}{6}\right)^2}{364 - \frac{42^2}{6}} = \frac{321{,}489}{70} = 4{,}592.7$$

Step 3. Calculate Residual SS = SS Deviation from Regression

Residual SS = Total SS - SS Due to Regression = 5006.833 - 4592.7 = 414.133

Step 4. Complete ANOVA

SOV                  df    SS          MS         F
Due to Regression    1     4592.7      4592.7     Due to Reg. MS / Residual MS = 44.36**
Residual             4     414.133     103.533
Total                5     5006.833

The residual mean square is an estimate of $\sigma^2_{Y|X}$, read as the variance of Y given X. That is, the statistic $s^2_{Y|X}$ estimates the parameter $\sigma^2_{Y|X}$.

Step 5. Because the F-test on the Due to Regression SOV is significant, we reject $H_0: \beta_1 = 0$ at the 99% level of confidence and can conclude that there is a linear relationship between X and Y.

Coefficient of Determination - $r^2$

From the ANOVA table, the coefficient of determination can be calculated using the formula

$r^2$ = SS Due to Regression / SS Total

This value always will be positive and range from 0 to 1.0.

As $r^2$ approaches 1.0, the association between X and Y improves.

$r^2 \times 100$ is the percentage of the variation in Y that can be explained by having X in the model.

For our example: $r^2$ = 4592.7 / 5006.833 = 0.917.

We can conclude that 91.7% (i.e., 0.917 x 100) of the variation in wood pulp temperature can be explained by hours of mixing.
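The ANOVA table and the coefficient of determination follow from the five raw sums alone. The sketch below, written in Python with illustrative variable names, starts from those sums as given in the example and reproduces Steps 1 through 4 and the $r^2$ calculation.

```python
# Raw sums from the wood pulp example.
n = 6
sum_x, sum_y = 42, 319
sum_xy, sum_x2, sum_y2 = 2800, 364, 21967

# Step 1: total sum of squares (SS Y).
ss_total = sum_y2 - sum_y ** 2 / n

# Step 2: sum of squares due to regression = SSCP^2 / SS X.
sscp = sum_xy - sum_x * sum_y / n
ss_x = sum_x2 - sum_x ** 2 / n
ss_regression = sscp ** 2 / ss_x

# Step 3: residual sum of squares by subtraction.
ss_residual = ss_total - ss_regression

# Step 4: mean squares (regression has 1 df, residual has n - 2 df) and F.
ms_regression = ss_regression / 1
ms_residual = ss_residual / (n - 2)
f_value = ms_regression / ms_residual

# Coefficient of determination.
r_squared = ss_regression / ss_total

print(f"Due to regression  df=1  SS={ss_regression:.3f}  MS={ms_regression:.3f}  F={f_value:.2f}")
print(f"Residual           df={n - 2}  SS={ss_residual:.3f}  MS={ms_residual:.3f}")
print(f"Total              df={n - 1}  SS={ss_total:.3f}")
print(f"r^2 = {r_squared:.3f}")
```

The F value still has to be compared with a tabled F for 1 and n - 2 degrees of freedom; the script does not look up critical values.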
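Method 2. t-test

The formula for the t-test to test the hypothesis $H_0: \beta_1 = 0$ is:

$$t = \frac{b_1}{s_{b_1}}$$

where: $b_1$ = the regression coefficient, and

$$s_{b_1} = \sqrt{\frac{s^2_{Y|X}}{SS\,X}}$$

For our example:

Step 1. Calculate $s^2_{b_1}$

Remember that $s^2_{Y|X}$ = Residual MS = $[SS\,Y - (SSCP^2 / SS\,X)] / (n - 2)$

We know from previous parts of this example:

SS Y = 5006.833
SSCP = 567.0
SS X = 70.0

Therefore,

$$s^2_{b_1} = \frac{s^2_{Y|X}}{SS\,X} = \frac{\left(SS\,Y - \frac{SSCP^2}{SS\,X}\right) / (n - 2)}{SS\,X} = \frac{\left(5006.833 - \frac{567^2}{70}\right) / (6 - 2)}{70} = 1.479$$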
Step 2. Calculate t statistic

$$t = \frac{b_1}{\sqrt{s^2_{b_1}}} = \frac{8.1}{\sqrt{1.479}} = 6.66$$

Step 3. Look up table t value

Table $t_{(n-2)\,df} = t_{0.05/2,\,4\,df} = 2.776$

Step 4. Draw conclusions

Since the table t value (2.776) is less than the calculated t-value (6.66), we reject $H_0: \beta_1 = 0$ at the 95% level of confidence. Thus, we can conclude that there is a linear relationship between hours of mixing and wood pulp temperature at the 95% level of confidence.

Method 3. Confidence Interval

The hypothesis $H_0: \beta_1 = 0$ can be tested using the confidence interval:

$$CI = b_1 \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{b_1})$$

For this example:

$$CI = b_1 \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{b_1}) = 8.1 \pm 2.776\sqrt{1.479}$$

$$4.724 \le \beta_1 \le 11.476$$

We reject $H_0: \beta_1 = 0$ at the 95% level of confidence since the CI does not include 0.
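Methods 2 and 3 use the same standard error, so both can be checked together. The following Python sketch starts from the quantities already derived in the example (SS Y, SSCP, SS X, and the slope) and takes the tabled t value of 2.776 as an input rather than computing it; all variable names are illustrative.

```python
import math

# Quantities taken from the worked example.
n = 6
b1 = 8.1
ss_y = 5006.833
sscp = 567.0
ss_x = 70.0
t_table = 2.776  # tabled t for alpha/2 = 0.025 and n - 2 = 4 df

# Residual mean square, then the variance and standard error of the slope.
ms_residual = (ss_y - sscp ** 2 / ss_x) / (n - 2)
var_b1 = ms_residual / ss_x
se_b1 = math.sqrt(var_b1)

# Method 2: calculated t statistic.
t_calc = b1 / se_b1

# Method 3: 95% confidence interval for the slope.
lower = b1 - t_table * se_b1
upper = b1 + t_table * se_b1

print(f"s^2(b1) = {var_b1:.3f}, s(b1) = {se_b1:.4f}")
print(f"t = {t_calc:.2f} (table t = {t_table})")
print(f"{lower:.3f} <= beta1 <= {upper:.3f}")
print("Reject H0" if abs(t_calc) > t_table else "Fail to reject H0")
```

Both methods lead to the same decision here because the interval excludes zero exactly when the calculated t exceeds the tabled t in absolute value.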
Predicting Y Given X

Regression analysis also can be used to predict a value for Y given X.

Using the example, we can predict the temperature of one batch of wood pulp after mixing X hours.

In this case, we predict an individual outcome of $Y_X$ drawn from the distribution of Y.

This estimate is distinct from estimating the mean or average of a distribution of Y.

The value of an individual Y at a given X will take on the form of the confidence interval:

$$CI = \hat{Y} \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{Y|X=X_0})$$

where $s_{Y|X=X_0} = \sqrt{s^2_{Y|X=X_0}}$, and

$$s^2_{Y|X=X_0} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right]$$

Remember $s^2_{Y|X}$ is the Residual Mean Square.

Example 1

We wish to determine the temperature of one batch of wood pulp after mixing two hours (i.e., $Y_{X=2}$).

Step 1. Using the regression equation, solve for $\hat{Y}$ when X = 2.

Remember $\hat{Y} = -3.533 + 8.1X$

$\hat{Y} = -3.533 + 8.1(2) = 12.667$

Step 2. Solve for $s^2_{Y|X=2}$

$$s^2_{Y|X=X_0} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right] = 103.533\left[1 + \frac{1}{6} + \frac{(2 - 7)^2}{70}\right] = 157.765$$
Step 3. Calculate the confidence interval

$$CI = \hat{Y} \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{Y|X=X_0}) = 12.667 \pm 2.776\sqrt{157.765} = 12.667 \pm 34.868$$

Therefore: LCI = -22.201 and UCI = 47.535

Note: This CI is not used to test a hypothesis.

This CI states that if we mix the wood pulp for two hours, we would expect the temperature to fall within the range of -22.201 and 47.535 degrees 95% of the time.

We would expect the temperature to fall outside of this range 5% of the time due to random chance.

Example 2

We wish to determine the temperature of one batch of wood pulp after mixing seven hours (i.e., $Y_{X=7}$).

Step 1. Using the regression equation, solve for $\hat{Y}$ when X = 7.

Remember $\hat{Y} = -3.533 + 8.1X$

$\hat{Y} = -3.533 + 8.1(7) = 53.167$

Step 2. Solve for $s^2_{Y|X=7}$

$$s^2_{Y|X=X_0} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right] = 103.533\left[1 + \frac{1}{6} + \frac{(7 - 7)^2}{70}\right] = 120.789$$
Step 3. Calculate the confidence interval

$$CI = \hat{Y} \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{Y|X=X_0}) = 53.167 \pm 2.776\sqrt{120.789} = 53.167 \pm 30.509$$

Therefore: LCI = 22.658 and UCI = 83.676

Note: For X = 7 (i.e., at the mean of X), the variance $s^2_{Y|X=X_0}$ is at a minimum.

This CI states that if we mix the wood pulp for seven hours, we would expect the temperature to fall within the range of 22.658 and 83.676 degrees 95% of the time.

We would expect the temperature to fall outside of this range 5% of the time due to random chance.

Predicting $\bar{Y}$ Given X

Regression analysis also can be used to predict the mean value of Y given X.

Using the example, we can predict the average temperature of wood pulp after mixing X hours.

In this case, we estimate the mean of the distribution of Y at a given X rather than an individual outcome drawn from that distribution.

The value of the mean of Y at a given X will take on the form of the confidence interval:

$$CI = \hat{Y} \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{\bar{Y}|X=X_0})$$

where $s_{\bar{Y}|X=X_0} = \sqrt{s^2_{\bar{Y}|X=X_0}}$, and

$$s^2_{\bar{Y}|X=X_0} = s^2_{Y|X}\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right]$$
Example 1

We wish to determine the average temperature of the wood pulp after mixing two hours (i.e., $\bar{Y}_{X=2}$).

Step 1. Using the regression equation, solve for $\hat{Y}$ when X = 2.

Remember $\hat{Y} = -3.533 + 8.1X$

$\hat{Y} = -3.533 + 8.1(2) = 12.667$

Step 2. Solve for $s^2_{\bar{Y}|X=2}$

$$s^2_{\bar{Y}|X=2} = s^2_{Y|X}\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right] = 103.533\left[\frac{1}{6} + \frac{(2 - 7)^2}{70}\right] = 54.232$$

Step 3. Calculate the confidence interval

$$CI = \hat{Y} \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{\bar{Y}|X=X_0}) = 12.667 \pm 2.776\sqrt{54.232} = 12.667 \pm 20.443$$

Therefore: LCI = -7.776 and UCI = 33.110

Note: This CI is not used to test a hypothesis.

This CI states that if we mix the wood pulp for two hours any number of times, we would expect the average temperature to fall within the range of -7.776 and 33.110 degrees 95% of the time.

We would expect the average temperature to fall outside of this range 5% of the time due to random chance.
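Example 2

We wish to determine the average temperature of wood pulp after mixing seven hours (i.e., $\bar{Y}_{X=7}$).

Step 1. Using the regression equation, solve for $\hat{Y}$ when X = 7.

Remember $\hat{Y} = -3.533 + 8.1X$

$\hat{Y} = -3.533 + 8.1(7) = 53.167$

Step 2. Solve for $s^2_{\bar{Y}|X=7}$

$$s^2_{\bar{Y}|X=7} = s^2_{Y|X}\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right] = 103.533\left[\frac{1}{6} + \frac{(7 - 7)^2}{70}\right] = 17.256$$

Step 3. Calculate the confidence interval

$$CI = \hat{Y} \pm t_{\alpha/2,\,(n-2)\,df}\,(s_{\bar{Y}|X=X_0}) = 53.167 \pm 2.776\sqrt{17.256} = 53.167 \pm 11.532$$

Therefore: LCI = 41.635 and UCI = 64.699

Note: For X = 7 (i.e., at the mean of X), the variance $s^2_{\bar{Y}|X=X_0}$ is at a minimum.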
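Comparing $s^2_{Y|X=X_0}$ and $s^2_{\bar{Y}|X=X_0}$

$s^2_{Y|X=X_0}$ is always greater than $s^2_{\bar{Y}|X=X_0}$.

Comparing the formulas:

$$s^2_{Y|X=X_0} = s^2_{Y|X}\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right] \quad \text{and} \quad s^2_{\bar{Y}|X=X_0} = s^2_{Y|X}\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS\,X}\right]$$

Notice that in the formula for $s^2_{Y|X=X_0}$ you add one while in the formula for $s^2_{\bar{Y}|X=X_0}$ you do not.

Comparison of $s^2_{Y|X=X_0}$ and $s^2_{\bar{Y}|X=X_0}$

$X_0$    $s^2_{Y|X=X_0}$    $s^2_{\bar{Y}|X=X_0}$
2        157.765            54.232
7        120.789            17.256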
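The two interval types differ only by the extra "1" inside the bracket, which a small helper makes explicit. This is a hedged Python sketch: the function `interval` and its parameter names are hypothetical, and its defaults are the rounded values from the wood pulp example (Residual MS = 103.533, mean of X = 7, SS X = 70, table t = 2.776).

```python
import math

def interval(x0, individual, n=6, x_bar=7.0, ss_x=70.0,
             ms_residual=103.533, b0=-3.533, b1=8.1, t_table=2.776):
    """Return (y_hat, variance, lower, upper) at X = x0.

    individual=True  -> interval for one new observation of Y
    individual=False -> interval for the mean of Y
    """
    y_hat = b0 + b1 * x0
    # Shared part of both variance formulas.
    bracket = 1 / n + (x0 - x_bar) ** 2 / ss_x
    # The individual-observation formula adds one inside the bracket.
    if individual:
        bracket += 1
    variance = ms_residual * bracket
    half_width = t_table * math.sqrt(variance)
    return y_hat, variance, y_hat - half_width, y_hat + half_width

for x0 in (2, 7):
    for individual in (True, False):
        label = "individual Y" if individual else "mean of Y"
        y_hat, var, lo, hi = interval(x0, individual)
        print(f"X0={x0}  {label:12s}  Y-hat={y_hat:.3f}  "
              f"variance={var:.3f}  CI=({lo:.3f}, {hi:.3f})")
```

Because the inputs are rounded to three decimals, the last digit of each result can differ slightly from software output computed at full precision.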
We can draw the two confidence intervals as "confidence belts" about the regression line.

[Figure: Confidence belts for the effect of hours of mixing on temperature of wood pulp. Horizontal axis: Hours of mixing; vertical axis: Temperature. Series: LL - Ind. Y, UL - Ind. Y, LL - Y Bar, UL - Y Bar.]

Notice that:

1. The confidence belts are symmetrical about the regression line,
2. The confidence belts are narrowest at the mean of X, and
3. The confidence belts for the distribution based on means are narrower than those for the distribution based on an individual observation.
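Determining if Two Independent Regression Coefficients are Different

It may be desirable to test the homogeneity of two $b_1$'s to determine if they are estimates of the same $\beta_1$.

This can be done using a t-test to test the hypotheses:

$H_0: \beta_1 = \beta_1'$
$H_A: \beta_1 \neq \beta_1'$

where

$$t = \frac{b_1 - b_1'}{\sqrt{(\text{Residual MS}_1 + \text{Residual MS}_2)\left(\dfrac{1}{SS\,X_1} + \dfrac{1}{SS\,X_2}\right)}}$$

The table t-value has $(n_1 - 2) + (n_2 - 2)$ df.

Example

X    $Y_1$    $Y_2$
1    11       21
2    16       29
3    14       34
4    18       45
5    23       58

$\sum X = 15$    $\sum Y_1 = 82$    $\sum Y_2 = 187$
$\sum X^2 = 55$    $\sum Y_1^2 = 1{,}426$    $\sum Y_2^2 = 7{,}827$

Step 1. Determine the regression coefficient for each Y

for $Y_1$: $\sum XY = 272$. Thus $b_1 = [272 - (15 \times 82)/5] / [55 - 15^2/5] = 2.6$

for $Y_2$: $\sum XY = 651$. Thus $b_1' = [651 - (15 \times 187)/5] / [55 - 15^2/5] = 9.0$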
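Step 2. Calculate Residual MS for each Y

Remember Residual MS = $\left[SS\,Y - \dfrac{SSCP^2}{SS\,X}\right] / (n - 2)$

$$\text{Residual MS}_1 = \frac{\left(1426 - \frac{82^2}{5}\right) - \frac{26^2}{10}}{5 - 2} = 4.5$$

$$\text{Residual MS}_2 = \frac{\left(7827 - \frac{187^2}{5}\right) - \frac{90^2}{10}}{5 - 2} = 7.7$$

Step 3. Solve for t

$$t = \frac{2.6 - 9.0}{\sqrt{(4.5 + 7.7)\left(\frac{1}{10} + \frac{1}{10}\right)}} = \frac{-6.4}{\sqrt{12.2(0.2)}} = -4.10$$

Step 4. Look up table t-value with $(n_1 - 2) + (n_2 - 2)$ df

$t_{0.05/2,\,6\,df} = 2.447$

Step 5. Make conclusions

Because the absolute value of the calculated t-value (-4.10) is greater than the absolute value of the tabular t-value (2.447), we can conclude at the 95% level of confidence that the two regression coefficients are not estimating the same $\beta_1$.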
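The comparison of two slopes can be scripted from the raw data. The Python sketch below follows the standard-error expression exactly as these notes write it, (Residual MS1 + Residual MS2)(1/SS X1 + 1/SS X2); other texts combine the two residual mean squares differently, so treat this as an illustration of the arithmetic above rather than a general-purpose test. The helper `fit` and the variable names are hypothetical, and the table t value of 2.447 is supplied as an input.

```python
import math

def fit(x, y):
    """Return (slope, residual mean square, SS X) for one simple regression."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    sscp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    slope = sscp / ss_x
    residual_ms = (ss_y - sscp ** 2 / ss_x) / (n - 2)
    return slope, residual_ms, ss_x

# Data from the example: one X column shared by two response variables.
x = [1, 2, 3, 4, 5]
y1 = [11, 16, 14, 18, 23]
y2 = [21, 29, 34, 45, 58]

b1, ms1, ss_x1 = fit(x, y1)
b1_prime, ms2, ss_x2 = fit(x, y2)

# t statistic using the expression given in the notes.
t_calc = (b1 - b1_prime) / math.sqrt((ms1 + ms2) * (1 / ss_x1 + 1 / ss_x2))

df = (len(x) - 2) + (len(x) - 2)
t_table = 2.447  # tabled t for alpha/2 = 0.025 and 6 df

print(f"b1 = {b1:.1f}, b1' = {b1_prime:.1f}")
print(f"Residual MS1 = {ms1:.3f}, Residual MS2 = {ms2:.3f}")
print(f"t = {t_calc:.2f} with {df} df (table t = {t_table})")
print("Slopes differ" if abs(t_calc) > t_table else "No evidence slopes differ")
```

The script carries the residual mean squares at full precision (about 4.533 and 7.733), so its t value differs in the second decimal from the hand calculation, which rounds them to 4.5 and 7.7.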
Summary - Some Uses of Regression

1. Determine if there is a linear relationship between an independent and dependent variable.
2. Predict values of Y at a given X.
   - Most accurate near the mean of X.
   - Should avoid predicting values of Y outside the range of the independent variable values that were used.
3. Can adjust Y to a common base by removing the effect of the independent variable (Analysis of Covariance).
4. ANOVA (CRD, RCBD, and LS) can be done using regression.
5. Compare homogeneity of two regression coefficients.

SAS Commands

    options pageno=1;
    /* Wood pulp example: x = hours of mixing, y = temperature */
    data reg;
    input x y;
    datalines;
    2 21
    4 27
    6 29
    8 64
    10 86
    12 92
    ;
    /* cli = 95% limits for an individual Y; clm = 95% limits for the mean of Y */
    proc reg;
    model y=x/cli clm;
    title 'SAS Output for Linear Regression Example in Class';
    run;
SAS Output for Linear Regression Example in Class

The REG Procedure
Model: MODEL1
Dependent Variable: y

Number of Observations Read    6
Number of Observations Used    6

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1     4592.70000        4592.70000     44.36      0.0026
Error              4     414.13333         103.53333
Corrected Total    5     5006.83333

Root MSE          10.17513    R-Square    0.9173
Dependent Mean    53.16667    Adj R-Sq    0.8966
Coeff Var         19.13818

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     -3.53333              9.47253           -0.37      0.7281
x            1     8.10000               1.21616           6.66       0.0026
Output Statistics

Obs    Dependent    Predicted    Std Error          95% CL Mean              95% CL Predict         Residual
       Variable     Value        Mean Predict
1      21.0000      12.6667      7.3642          -7.7797     33.1130     -22.2068     47.5401       8.3333
2      27.0000      28.8667      5.5287          13.5164     44.2169      -3.2850     61.0184      -1.8667
3      29.0000      45.0667      4.3283          33.0492     57.0841      14.3662     75.7672     -16.0667
4      64.0000      61.2667      4.3283          49.2492     73.2841      30.5662     91.9672       2.7333
5      86.0000      77.4667      5.5287          62.1164     92.8169      45.3150    109.6184       8.5333
6      92.0000      93.6667      7.3642          73.2203    114.1130      58.7932    128.5401      -1.6667

Sum of Residuals                      0
Sum of Squared Residuals              414.13333
Predicted Residual SS (PRESS)         868.05699