Statistics 20080 MINITAB - Lab 2

1. Simple Linear Regression

In simple linear regression we attempt to model a linear relationship between two variables with a straight line and make statistical inferences concerning that linear model. Using the fire damage dataset from last week, we assume here that the variable on the x-axis (the distance from the fire station) will predict the amount of fire damage caused to the house. In this case, therefore, distance from the fire station is the predictor variable and the damage to the property is the response variable.

2. Fitting the Line

Construct a scatter plot of the data to determine the nature of the relationship between the two variables. Then calculate the correlation coefficient, which describes the relationship numerically. The next step is to calculate the equation of the least squares regression line, which is the data's line of best fit. This step is known as fitting the line. The line is constructed to help the researcher see any trends and make predictions. The line of best fit is needed because we want to predict the values of y from the values of x. In other words, we want to predict the damage in dollars using the distance from the fire station.

When fitting a straight-line model we fit what is called the least squares line. This is the straight line for which the sum of the squared vertical distances between the points and the line is as small as possible.

An equation for a straight-line model has two components, the intercept and the slope. Therefore the model behind the least squares regression line takes the form:

Response = intercept + slope * (predictor variable) + ε (the error or residual term)

Or more generally:

y = a + bx + ε

The fitted (predicted) value on the line is ŷ = a + bx.

Where
a is the intercept
b is the slope of the line
ε is the vertical distance between the fitted line and the data point (i.e. the residual)
x is the chosen value of the predictor variable
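The least squares criterion described above can be written out as a formula. This is a sketch in the handout's own symbols (a, b, x, y); the name SSE (sum of squared errors) is introduced here for illustration and is not used in the handout itself.

```latex
\[
  \mathrm{SSE}(a, b) \;=\; \sum \bigl(y - \hat{y}\bigr)^2
                     \;=\; \sum \bigl(y - a - bx\bigr)^2
\]
% The least squares line uses the values of a and b that make SSE as small as possible.
```

Each term in the sum is one squared residual, so a line that passes close to every point gives a small SSE.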
Summary from lecture notes

The formulae for the estimates of the slope and the intercept are:

Slope:
\[ b = \frac{SS_{xy}}{SS_{xx}} \]

Intercept:
\[ a = \bar{y} - b\bar{x} \]

Where
\[ SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n} \]
\[ SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} \]
n = sample size

3. Fitting a Regression Model in MINITAB

Using the drop-down menus in Minitab, go to Stat - Regression - Fitted Line Plot. In the dialog box that opens:

1. Select the response variable.
2. Select the predictor variable.
3. Ensure that the linear model is selected.

This command will give you a scatter plot of the response variable versus the predictor variable, with the least squares line shown in blue on the plot. The least squares regression equation will be displayed over the plot. If you look at the session window you will also see the ANOVA table for this model and the associated p-value, similar to the table below. We will cover what this ANOVA table means in the next class.
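The slope and intercept formulae from the lecture-notes summary can also be checked by hand-style calculation outside Minitab. The following is a minimal Python sketch, assuming a small made-up dataset (it is not the fire damage data); the function name `least_squares` and the variable names are chosen here for illustration only.

```python
# Hypothetical data for illustration only; NOT the lab's fire damage dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def least_squares(x, y):
    """Return (intercept a, slope b) using the SSxy / SSxx formulae."""
    n = len(x)                                    # sample size
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    ss_xy = sum_xy - (sum_x * sum_y) / n          # SSxy = Σxy - (Σx)(Σy)/n
    ss_xx = sum_xx - (sum_x ** 2) / n             # SSxx = Σx² - (Σx)²/n

    b = ss_xy / ss_xx                             # slope
    a = sum_y / n - b * (sum_x / n)               # intercept = ȳ - b·x̄
    return a, b


a, b = least_squares(x, y)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

For real data, the values printed this way should agree with the equation Minitab displays on the fitted line plot, up to rounding.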
Regression Analysis: Damage - $ versus Distance

The regression equation is
Damage - $ = 10.28 + 4.919 Distance          (Regression Line)

S = 2.31635   R-Sq = 92.3%   R-Sq(adj) = 91.8%

Analysis of Variance                          (ANOVA Table)

Source       DF        SS        MS        F       P
Regression    1   841.766   841.766   156.89   0.000
Error        13    69.751     5.365
Total        14   911.517

What is the least squares regression equation?

What is the slope?

What is the intercept?

What type of relationship is there between distance and damage?

Now that we know what the least squares regression equation is, we can use it to make predictions for the dependent variable.

If a building on fire is 10 miles from the nearest fire station, how much damage would we predict the fire to cause?
Hint: Substitute 10 for x in the regression line equation.

(a) For each of the datasets used in last week's lab, find the least squares regression equation, the slope and the intercept.

1.
2.
3.
4.
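Making a prediction from the fitted line is just substitution into the regression equation. The sketch below uses the intercept and slope from the Minitab output for the fire damage data; the function name `predict_damage` and the distance of 5 miles are hypothetical choices made for illustration, so the exercise value of 10 miles is left for you to work out.

```python
# Coefficients copied from the Minitab output: Damage - $ = 10.28 + 4.919 Distance
INTERCEPT = 10.28
SLOPE = 4.919


def predict_damage(distance):
    """Predicted damage for a given distance from the fire station."""
    return INTERCEPT + SLOPE * distance


# Hypothetical distance chosen for illustration (not the value in the exercise).
example_distance = 5.0
print(f"Predicted damage at {example_distance} miles: "
      f"{predict_damage(example_distance):.3f}")
```

The same pattern applies to the other datasets in part (a): replace the two coefficients with the ones Minitab reports for that dataset.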
Answer the following using your answers from (a).

If the three most recent volunteers in a blood donation clinic had blood pressures of 86, 91 and 101 respectively, what would your estimate of their platelet-calcium concentrations be?

Five people were randomly selected and their heights were measured to be 145, 150, 161, 165 and 177 cm. What are the estimated weights of these people?

If you look more closely at the figures you have just calculated and compare them to the actual values from the original dataset, you will notice that they are not exactly the same. This is because the calculated figures are fitted values taken from the regression line (indicated in blue on the scatter plot). The discrepancies between the observed and fitted values are the residuals. The coefficient of correlation is one way of quantifying how large this discrepancy is.

6. The Coefficient of Determination - R²

How much of the variation in y is explained by the linear relationship between x and y? The answer is given by the Coefficient of Determination, or R².

The Coefficient of Determination is the ratio of the variation 'explained' by the linear relationship between the predictor and response variables to the total variation in the data.

Coefficient of Determination - R²

R² = SS Regression / SS Total

What is R² for the regression model fitted to the fire damage dataset?

What does this figure mean?

Note that in the case of a simple linear regression model the coefficient of determination is the correlation coefficient squared. Calculate the square root of R² and compare it to the correlation coefficient.
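The ratio R² = SS Regression / SS Total can be checked directly from the sums of squares in the ANOVA table. This is a minimal Python sketch using the two SS values from the fire damage output shown earlier in this lab; the variable names are our own.

```python
import math

# Sums of squares taken from the ANOVA table for the fire damage model.
ss_regression = 841.766
ss_total = 911.517

r_squared = ss_regression / ss_total   # proportion of variation explained
r_magnitude = math.sqrt(r_squared)     # size of the correlation coefficient

print(f"R-squared = {r_squared:.1%}")
print(f"sqrt(R-squared) = {r_magnitude:.4f}")
```

The square root gives only the magnitude of the correlation coefficient. Its sign matches the sign of the slope, which is positive for the fire damage data.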
Calculate and interpret the coefficient of determination for each of the datasets from last week's lab.

1.
2.
3.
4.
Assignment: Due in 2 weeks' time.

From the Minitab class page download the file named TV. This contains the data for 15 students on their final year mark and the number of hours they spend watching TV.

1. Construct a simple linear regression line for this data and show the graph.
2. What is the correlation coefficient for this data?
3. Is there a negative or a positive correlation between the number of hours spent watching TV and the end of year grade?
4. What is the value of the intercept of the regression line with the y-axis?
5. What is the slope of this regression line?

Assignments should be handed in at the beginning of class two weeks from today. Late assignments will not be accepted.

REVISION SUMMARY

After this lab you should be able to:
- Calculate the correlation coefficient by hand and in Minitab
- Fit a simple linear regression line to data using Minitab
- Make predictions from the least squares regression line