Chapter 1 Simple Linear Regression

Introduction
Exam Score v. Hours Studied Scenario
Regression Analysis
  used to quantify the relation between two (or more) variables so you can predict the value of one variable based on the value of another
  develops an equation to predict the value of a dependent variable based on the value of one or more independent variables
Correlation Analysis
  measures the strength of the linear relation between a pair of variables
  if you plan to predict Y from X, they ought to be related!
Simple v. Multiple Regression
Simple Regression Analysis
  uses a single independent variable to predict the dependent variable
  estimated Score = 40.0816 + 1.4966(Hours), $r^2 = .743$
Multiple Regression Analysis
  uses multiple independent variables to predict the dependent variable
  the set of independent variables should be independent of one another, and each should be highly related to the dependent variable
  estimated Score = 33.914 + 3.47(GPA) - 1.698(Absences) + 1.395(Hours), $r^2 = .7654$

Characterizing Relationships
Direct Relation
  line of best fit has a positive slope
Inverse Relation
  line of best fit has a negative slope
Deterministic (Functional) Relation
  100% pure relation between the pair of variables
  there is no scatter with respect to the line of best fit, so the value of Y can be determined exactly (without error) based on the value of X
Stochastic (Statistical, Random) Relation
  a less than perfect relation between the pair of variables
  since variables other than X impact Y, there is scatter with respect to the line of best fit, and there will be error when using X to predict Y
How would you characterize the apparent relation between Exam Score and Hours Studied?
Simple Linear Regression Model
Population Linear Regression Equation
  $y = \beta_0 + \beta_1 x + \varepsilon$
  $\varepsilon$ represents the combined effect of other variables and is assumed to have a mean of 0 and a variance of $\sigma^2$
Sample Linear Regression Equation
  $\hat{y} = b_0 + b_1 x$

Least Squares Method: Line Of Best Fit
The sample regression line won't perfectly fit the sample points, so there will be error in fit. Why?
  error in fit = residual = $(y - \hat{y})$
Provides the best-fitting line in the sense that it has the minimum amount of squared deviation between each observed value and the corresponding point on the regression line
Minimize the sum of squared residuals in order to:
  prevent (+) and (-) errors from cancelling
  draw added attention to any large errors
  prefer making several small errors in order to avoid large errors
Least Squares Method: Line Of Best Fit
Properties of the Least Squares regression equation
  1) $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$
  2) the line passes through the point $(\bar{x}, \bar{y})$
  3) the sum of the residuals is zero: $\sum (y - \hat{y}) = 0$
  4) the sum of the squared residuals is minimized: $\sum (y - \hat{y})^2 = \text{minimum}$
slope: $b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
intercept: $b_0 = \bar{y} - b_1 \bar{x}$
Exam Score v. Hrs Studied
  the sample regression equation is:
  compute the predicted values
  compute the residuals and squared residuals

Conditional Distribution Of y
Figure 1.8 on page 511
Why is y variable at any given value of x?
The distribution of y is assumed Normal with mean $\hat{y}$
The regression equation is the line which connects the mean value of y at each value of x
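The slope and intercept formulas and properties 2) through 4) above can be checked numerically. The sketch below is a minimal illustration, not the class data set: the `hours` and `scores` lists are hypothetical values made up for this example, and the helper `sse_for` is a name introduced here.

```python
# Hypothetical (hours studied, exam score) pairs; NOT the data behind the slides.
hours = [5, 10, 12, 18, 20, 25, 30, 32]
scores = [50, 58, 62, 70, 68, 80, 84, 92]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
b1 = sxy / sxx

# Intercept: chosen so the line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

# Predicted values, residuals, and the sum of squared residuals.
predicted = [b0 + b1 * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]
sse = sum(e ** 2 for e in residuals)


def sse_for(intercept, slope):
    """Sum of squared residuals for an arbitrary candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, scores))


print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"line at x_bar = {b0 + b1 * x_bar:.4f}, y_bar = {y_bar:.4f}")  # property 2
print(f"sum of residuals = {sum(residuals):.10f}")                    # property 3
print(f"SSE (least squares) = {sse:.4f}")                             # property 4
print(f"SSE (slope nudged)  = {sse_for(b0, b1 + 0.05):.4f}")
```

Nudging either coefficient away from the least squares values raises the sum of squared residuals, which is what property 4) claims.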
Correlation Analysis Concepts
Measures the strength of the linear relation between two variables
If you intend to use X to predict Y, how strongly related are they?
  The slope of the sample regression equation was +1.4966, so these variables seem to move together
  The mean exam score was 76 and variation among student scores was 11.504
  some of the variation in scores can be explained by taking into account hours studied

Strength of Relationship
[Figure: scatter plot panels labeled $r = .98$, $r^2 = .96$; $r = .78$, $r^2 = .61$; $r = .34$, $r^2 = .1$; $r = .1$, $r^2 = .01$; $r = -.01$, $r^2 = .00$; $r = -.99$, $r^2 = .98$; $r = -.64$, $r^2 = .41$; $r = -.33$, $r^2 = .1$; $r = -.11$, $r^2 = .01$]
Correlation Analysis
TOTAL VARIATION IN SCORES = EXPLAINABLE BY HOURS STUDIED + UNEXPLAINABLE BY HOURS STUDIED
SST = SSR + SSE
$\sum (y - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2$
For a single observation: (92 - 76) = (88 - 76) + (92 - 88)
[Figure: exam scores plotted against hours studied, with $\bar{y} = 76$ marked]

Correlation Analysis
Exam Score v. Hours Studied: SST, SSR, SSE
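The decomposition SST = SSR + SSE can be verified on any least squares fit. The sketch below assumes a small hypothetical data set (the `hours` and `scores` values are invented for illustration and are not the class data), then repeats the single-observation arithmetic from the slide.

```python
# Hypothetical (hours studied, exam score) pairs; NOT the data behind the slides.
hours = [5, 10, 12, 18, 20, 25, 30, 32]
scores = [50, 58, 62, 70, 68, 80, 84, 92]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
      / sum((x - x_bar) ** 2 for x in hours))
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in hours]

# Total, explained, and unexplained variation.
sst = sum((y - y_bar) ** 2 for y in scores)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")

# Single-observation version, using the numbers shown on the slide:
# observed score 92, predicted score 88, mean score 76.
total_dev = 92 - 76
explained_dev = 88 - 76
unexplained_dev = 92 - 88
print(total_dev, "=", explained_dev, "+", unexplained_dev)
```

The sums of squares add up only because the line was fit by least squares; for an arbitrary line the cross-product term does not vanish.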
Coefficient Of Determination
Measures the proportion of variation in variable y that is explained by variable x
Indicates how well the sample regression line fits the sample data
$\rho^2$ is estimated by $r^2$
$0 \le r^2 \le 1$
$r^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{SSR}{SST} = \dfrac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$
Exam Score v. Hrs Studied

Coefficient Of Correlation
$\rho$ is estimated by $r$
$-1 \le r \le +1$
$r = (\text{sign of } b_1)\sqrt{r^2}$
Interpretation: There is a (strength) (direct or inverse) correlation between (variable X) and (variable Y)

  Value of r    Strength of correlation
  .9 to 1       very high
  .7 to .9      high
  .5 to .7      moderate
  .3 to .5      weak
  .0 to .3      little if any

Exam Score v. Hrs Studied
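As a rough illustration of the two coefficients above, the sketch below computes $r^2$ as SSR/SST and recovers $r$ by attaching the sign of $b_1$. The data are hypothetical, and the `strength` helper is introduced here; how it treats values that fall exactly on a band boundary (for example .7) is an assumption, since the table does not say.

```python
import math

# Hypothetical (hours studied, exam score) pairs; NOT the data behind the slides.
hours = [5, 10, 12, 18, 20, 25, 30, 32]
scores = [50, 58, 62, 70, 68, 80, 84, 92]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
sst = sum((y - y_bar) ** 2 for y in scores)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Coefficient of determination: explained variation over total variation.
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in hours)
r_squared = ssr / sst

# Coefficient of correlation: square root of r^2 carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

# Cross-check against the direct product-moment formula.
r_direct = sxy / math.sqrt(sxx * sst)


def strength(r_value):
    """Map |r| to the verbal labels in the table (boundary handling assumed)."""
    size = abs(r_value)
    if size >= 0.9:
        return "very high"
    if size >= 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "weak"
    return "little if any"


direction = "direct" if r > 0 else "inverse"
print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
print(f"There is a {strength(r)} {direction} correlation between hours and score")
```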
Coefficient Of Correlation
When working with multiple variables, it is common to obtain the correlation between each pair of variables as a triangular correlation matrix
Can investigate whether or not the potential independent variables are truly independent of one another

              Score     Hours     GPA
  Hours       0.86
  GPA         0.489     0.566
  Absences    -0.343    -0.34     0.08

Limitations Of Regression Analysis
Regression/Correlation cannot prove cause-and-effect relationships
  Brightman article
Don't use the regression model to predict beyond the range of observed X-values
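Mean Square Error & Standard Error of Estimate
Mean Square Error
  Measures the amount of scatter around the regression line
  Serves as an estimate of $\sigma^2$
  $MSE = \dfrac{SSE}{n - 2} = \dfrac{\sum (y - \hat{y})^2}{n - 2}$
Standard Error of Estimate
  Square root of MSE
  Serves as an estimate of $\sigma$
  Used for inference regarding the regression line
    hypothesis tests
    interval estimates
  $s_e = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
Exam Score v. Hrs Studied

t-test for Significance of the Slope
$b_1$ estimates $\beta_1$
$H_0: \beta_1 = 0$  there is no relation between the two variables
$H_A: \beta_1 \ne 0$  there is a relation between the two variables
test statistic: $t = \dfrac{b_1}{s_{b_1}}$, whose sampling distribution follows $t_{n-2}$
Standard Error of the Slope
  measures ROSE when using $b_1$ to estimate $\beta_1$
  $s_{b_1} = \sqrt{\dfrac{MSE}{\sum (x - \bar{x})^2}} = \dfrac{s_e}{\sqrt{\sum (x - \bar{x})^2}}$
Exam Score v. Hrs Studied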
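The quantities on these two slides chain together: SSE gives MSE, MSE gives $s_e$ and $s_{b_1}$, and $s_{b_1}$ gives the t statistic. The sketch below walks through that chain on hypothetical data (not the class data). The critical value 2.447 is the two-tailed 5% value for 6 degrees of freedom from a standard t table; the 5% significance level is an assumption made for this example.

```python
import math

# Hypothetical (hours studied, exam score) pairs; NOT the data behind the slides.
hours = [5, 10, 12, 18, 20, 25, 30, 32]
scores = [50, 58, 62, 70, 68, 80, 84, 92]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

sxx = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / sxx
b0 = y_bar - b1 * x_bar

# Scatter around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
mse = sse / (n - 2)          # estimates sigma squared
s_e = math.sqrt(mse)         # standard error of estimate, estimates sigma

# Standard error of the slope and the test statistic for H0: beta1 = 0.
s_b1 = math.sqrt(mse / sxx)
t_stat = b1 / s_b1

# Two-tailed critical value for alpha = .05 with n - 2 = 6 degrees of freedom,
# taken from a standard t table (assumed significance level).
T_CRIT = 2.447
reject_h0 = abs(t_stat) > T_CRIT

print(f"MSE = {mse:.4f}, s_e = {s_e:.4f}")
print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The degrees of freedom are n - 2 because two coefficients, $b_0$ and $b_1$, were estimated from the sample.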
Interval Estimation In Regression Analysis
What score would you predict for students who study 30 hours?
We've estimated that the mean score of all students studying 30 hours is 85. This is a point estimate based on a sample of n = 8.
The estimate could be in error due to two sources:
  1) sampling error
     since $b_0$ and $b_1$ are sample results, they may differ from $\beta_0$ and $\beta_1$
     we're not certain where the true population regression equation is
  2) stochastic relation
     wherever the true population regression equation actually is, there is scatter around it due to the combined effect of other variables

Confidence Interval Estimate of the Mean Value of y
Estimates the mean value of y at a given value of x
Standard Error of the Conditional Mean (nib)
  accounts for sampling error in estimating $b_0$ and $b_1$, which would affect our predicted value
  $s_{\hat{y}} = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}}$
CI for $\mu_y$: $\hat{y} \pm t_{n-2}\, s_{\hat{y}}$
Pg. 59: notice that the width of the confidence bands increases as you predict further away from $\bar{x}$
Exam Score v. Hours Studied
  95% CI for the mean score of students who study 30 hours
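The sketch below first reproduces the point estimate quoted above from the sample regression equation given earlier in the chapter (40.0816 + 1.4966 per hour), then illustrates the confidence interval formula. The interval part uses hypothetical data, since the class data are not reproduced here; the function name `mean_ci` and the t-table value 2.447 (95%, 6 degrees of freedom) are choices made for this example.

```python
import math

# Point estimate from the chapter's own sample regression equation.
slide_estimate = 40.0816 + 1.4966 * 30
print(f"point estimate at 30 hours = {slide_estimate:.4f}")  # about 85

# Hypothetical (hours studied, exam score) pairs; NOT the data behind the slides.
hours = [5, 10, 12, 18, 20, 25, 30, 32]
scores = [50, 58, 62, 70, 68, 80, 84, 92]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n
sxx = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
s_e = math.sqrt(sse / (n - 2))

# 95% two-tailed critical value for n - 2 = 6 degrees of freedom (t table).
T_CRIT = 2.447


def mean_ci(x_given):
    """Return (lower, fitted, upper) for the mean of y at x_given."""
    y_hat = b0 + b1 * x_given
    # Standard error of the conditional mean.
    s_y_hat = s_e * math.sqrt(1 / n + (x_given - x_bar) ** 2 / sxx)
    half_width = T_CRIT * s_y_hat
    return y_hat - half_width, y_hat, y_hat + half_width


lower, fitted, upper = mean_ci(30)
print(f"95% CI for mean score at 30 hours: {lower:.2f} to {upper:.2f}")

# The band is narrowest at x_bar and widens as x moves away from it.
lo_c, _, hi_c = mean_ci(x_bar)
print(f"width at x_bar = {hi_c - lo_c:.2f}, width at 30 = {upper - lower:.2f}")
```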
Prediction Interval Estimate of an Individual Value of y
Estimates an individual value of y at a given x
Standard Error of the Forecast (nib)
  accounts for sampling error and the fact that there is dispersion around the regression line
  $s_{ind} = s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}}$
PI for $y_{ind}$: $\hat{y} \pm t_{n-2}\, s_{ind}$
Pg. 531: notice that the PI bands are wider than the CI bands and that each is wider as you predict further away from $\bar{x}$
Exam Score v. Hours Studied
  95% PI for the individual score of a particular student who studies 30 hrs
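To illustrate how the extra "1 +" under the square root widens the interval, the sketch below computes both half-widths at the same x. It is a minimal example on hypothetical data (not the class data); `half_widths` is a name introduced here, and 2.447 is the 95% two-tailed t-table value for 6 degrees of freedom.

```python
import math

# Hypothetical (hours studied, exam score) pairs; NOT the data behind the slides.
hours = [5, 10, 12, 18, 20, 25, 30, 32]
scores = [50, 58, 62, 70, 68, 80, 84, 92]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n
sxx = sum((x - x_bar) ** 2 for x in hours)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
s_e = math.sqrt(sse / (n - 2))

# 95% two-tailed critical value for n - 2 = 6 degrees of freedom (t table).
T_CRIT = 2.447


def half_widths(x_given):
    """Return (CI half-width, PI half-width) at x_given."""
    leverage = 1 / n + (x_given - x_bar) ** 2 / sxx
    s_y_hat = s_e * math.sqrt(leverage)      # sampling error only
    s_ind = s_e * math.sqrt(1 + leverage)    # sampling error plus scatter
    return T_CRIT * s_y_hat, T_CRIT * s_ind


y_hat_30 = b0 + b1 * 30
ci_half, pi_half = half_widths(30)
print(f"fitted score at 30 hours = {y_hat_30:.2f}")
print(f"95% CI: {y_hat_30 - ci_half:.2f} to {y_hat_30 + ci_half:.2f}")
print(f"95% PI: {y_hat_30 - pi_half:.2f} to {y_hat_30 + pi_half:.2f}")

# Both bands are narrowest at x_bar, and the PI is wider than the CI everywhere.
ci_center, pi_center = half_widths(x_bar)
print(f"half-widths at x_bar: CI {ci_center:.2f}, PI {pi_center:.2f}")
```

Even with a very large sample the prediction interval does not shrink to zero, because the scatter of individual values around the line remains.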