Statistics 10010 MINITAB - Lab 5

PART I: The Correlation Coefficient

Quite often in statistics we are presented with data that suggest that a linear relationship exists between two variables. For example, the plot below is of the damage (in $1,000s) to residential properties and the distance (in miles) to the nearest fire station.

[Figure: Plot of Damage to Property by Distance from Fire Station. y-axis: Damage to Property - 1,000s $. x-axis: Distance from Fire Station - Miles.]

There is clearly an increasing trend in this plot: as the distance increases, the amount of damage in dollars also increases. Obviously there is a correlation between the damage in dollars and the distance from the fire station. It is also obvious that this correlation is positive, since the amount of damage increases as the distance increases.

The Pearson coefficient of correlation allows us to express the strength of this linear relationship as a scale-free measure ranging from -1 to 1. A Pearson correlation (often designated r) of 1 would signify a perfect positive correlation, -1 a perfect negative correlation and 0 no correlation at all. Note that a correlation coefficient close to zero does not necessarily mean that there is no relationship between two variables - merely that no (or very little) linear relationship exists between the two variables.
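The last point, that r close to zero rules out only a linear relationship, can be illustrated with a small sketch. The data here are made up for illustration (they are not the fire damage data), and `pearson_r` is a hypothetical helper written for this example rather than anything MINITAB provides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data, chosen only to make the point.
x = [-2, -1, 0, 1, 2]
y_linear = [2 * xi + 1 for xi in x]  # exact straight line with positive slope
y_curved = [xi ** 2 for xi in x]     # exact relationship, but not a linear one

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print("straight line: r =", r_linear)
print("parabola:      r =", r_curved)
```

The parabola is a perfect (deterministic) relationship, yet its correlation with x over this symmetric range is zero, which is why a scatterplot should always accompany r.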
Summary from Lecture Notes

The Pearson product moment coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables x and y. It is computed (for a sample of n measurements on x and y) as follows:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}, \qquad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

The data shown in the plot above are available on the class Blackboard as firedamage.mtw. Open this data set and get the correlation coefficient. Go to Stat - Basic Statistics - Correlation... and select 'Distance' and 'Damage - $s' as the two variables, then click OK.

What is the correlation coefficient?

In statistics we are interested in using a sample correlation coefficient to estimate the population correlation coefficient. This involves getting a standard error for the correlation coefficient estimate and perhaps conducting tests of hypotheses. However, as the correlation coefficient is essentially part of simple linear regression, we will do this in Part II of this lab.
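As a check on a hand calculation, the computing formula for r can be sketched in a few lines of Python. The five (x, y) pairs are hypothetical values invented for this example, not the contents of the lab data set, and the function name is an assumption of this sketch.

```python
import math

def pearson_r_shortcut(x, y):
    """r = SS_xy / sqrt(SS_xx * SS_yy), using the sums-based computing formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi ** 2 for yi in y) - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical sample of n = 5 measurements (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Here SS_xy = 66 - 15*20/5 = 6, SS_xx = 55 - 225/5 = 10, SS_yy = 86 - 400/5 = 6.
r = pearson_r_shortcut(x, y)
print(round(r, 4))  # 0.7746
```

Working the sums by hand first and then comparing against the printed value is a quick way to catch arithmetic slips before checking the MINITAB output.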
PART II:

1. Simple Linear Regression

In simple linear regression we attempt to model a linear relationship between two variables with a straight line and make statistical inferences concerning that linear model. We are assuming here that the variable on the x-axis (the distance from the fire station) will predict the amount of fire damage caused to the house. In this case, therefore, distance from the fire station is the predictor variable and the damage to the property is the response variable.

2. Fitting the Line

Still using the dataset 'firedamage.mtw', create a plot of the data. Go to Graph - Scatterplot... and select 'Damage - $' as the y variable and 'Distance' as the x variable, then click OK.

When fitting a straight-line model we fit what is called the least squares line. This is the straight line for which the sum of the squared vertical distances between the points and the line is kept at a minimum. An equation for a straight-line model has two components, the intercept and the slope. Therefore the equation of the least squares line takes the form

Intercept + slope(predictor variable) + ε (the error or residual term)

or more generally:

response variable = β_0 + β_1(predictor variable) + ε

where
β_0 is the intercept
β_1 is the slope of the line
ε is the vertical distance between the fitted line and the data point
(Note: it is the sum of the squares of this quantity that we minimise using the method of least squares.)
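The least squares idea can be checked numerically. The sketch below fits a line with the standard closed-form least squares estimates (slope = SS_xy / SS_xx, intercept = mean of y minus slope times mean of x) and then confirms that nudging either coefficient makes the sum of squared residuals larger. The data and the function names are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_sq_resid(x, y, intercept, slope):
    """Sum of squared vertical distances between the points and a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data (not the fire damage data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
best = sum_sq_resid(x, y, b0, b1)

# Any other line should do worse; try shifting the slope or the intercept a little.
worse_slope = sum_sq_resid(x, y, b0, b1 + 0.1)
worse_intercept = sum_sq_resid(x, y, b0 + 0.1, b1)

print("intercept:", round(b0, 4), "slope:", round(b1, 4))
print("SSE at the fitted line:", round(best, 4))
print("SSE after nudging slope / intercept:", round(worse_slope, 4), round(worse_intercept, 4))
```

For these values the fitted line is y = 2.2 + 0.6x, and both perturbed lines give a larger sum of squared residuals, as the least squares criterion requires.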
Summary From Lecture Notes

The formulae for the estimates of the slope and the intercept are:

Slope: $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$

Intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

where

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

n = sample size

3. Statistical Inference

The fitting of the least squares line is essentially mathematical and of itself does not have any stochastic (i.e. random) content. However, from a statistical point of view the fitting of the least squares line is a statistical modelling exercise. We are attempting to estimate the true linear relationship (i.e. in the population) from sample data. It is possible, therefore, that the apparent linear trend seen in the plot is a result of sampling variation and does not reflect an actual linear relationship between the two variables in the population. Therefore we must conduct a hypothesis test to compare the amount of variation in the data explained by the linear model with an estimate of background or sampling variation. The approach taken is broadly similar to that of ANOVA, and indeed an ANOVA table is constructed for this purpose.

The hypothesis being tested in this ANOVA is (in the case of simple linear regression) that the slope of the line is 0, versus an alternative that the slope of the line is not 0.

Ho: β_1 = 0
Ha: β_1 ≠ 0

4. Fitting A Regression Model in MINITAB

Go to Stat - Regression - Fitted Line Plot...

1. Select the response variable here
2. Select the predictor variable here
3. Ensure that the linear model is selected
This command will give you a plot of the response versus the predictor with the least squares line shown on the plot. The least squares regression equation will be displayed over the plot. If you look at the session window you will also see the ANOVA table for this model and the associated p value.

What is the least squares regression equation?

Is the relationship between distance and damage positive or negative?

Summarise the hypothesis that is being tested in the ANOVA table, including Ho, Ha, α (set to 0.05), the test statistic, the p value and state your conclusion.

5. Standard Error of the Slope

We can calculate a standard error of the slope using s, which is our estimate of σ. This will allow us to test hypotheses about the slope (a more general test than that contained within the ANOVA table) and also allow us to get a confidence interval for the slope.

Summary from Lecture Notes

The standard error of the slope is

$$\sigma_{\hat{\beta}_1} = \frac{\sigma}{\sqrt{SS_{xx}}}$$

which is estimated as

$$s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}$$

A hypothesis test for the slope

One-tailed test:
Ho: β_1 = β_10
Ha: β_1 < β_10 (or β_1 > β_10)

Two-tailed test:
Ho: β_1 = β_10
Ha: β_1 ≠ β_10

Test statistic:

$$t = \frac{\hat{\beta}_1 - \beta_{10}}{s_{\hat{\beta}_1}} = \frac{\hat{\beta}_1 - \beta_{10}}{s / \sqrt{SS_{xx}}}$$

Rejection region:
One-tailed test: $t < -t_{\alpha}$ (or $t > t_{\alpha}$ when Ha: β_1 > β_10)
Two-tailed test: $|t| > t_{\alpha/2}$

where $t_{\alpha}$ and $t_{\alpha/2}$ are based on (n - 2) degrees of freedom.

Assumptions: Same assumptions as previous summary box.
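The summary box can be followed step by step in code. This is a minimal sketch on hypothetical data (n = 5, not the lab data set): it estimates s from the residuals, computes the standard error of the slope, forms the t statistic for Ho: β_1 = 0, and builds a 95% confidence interval. Python's standard library has no t distribution, so the critical value 3.182 (t with α/2 = 0.025 and 3 degrees of freedom) is hard-coded from a standard t table; for any other sample size it would need to be looked up again.

```python
import math

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# s estimates sigma: sqrt(SSE / (n - 2)).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for Ho: beta_1 = beta_10.
se_slope = s / math.sqrt(ss_xx)
beta_10 = 0.0
t_stat = (slope - beta_10) / se_slope

# Two-tailed test at alpha = 0.05 with n - 2 = 3 degrees of freedom.
t_crit = 3.182  # table value, valid only for 3 degrees of freedom
reject_h0 = abs(t_stat) > t_crit

# 95% confidence interval for the slope.
ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)

print("slope:", round(slope, 4), "SE:", round(se_slope, 4))
print("t:", round(t_stat, 4), "reject Ho:", reject_h0)
print("95% CI:", tuple(round(v, 3) for v in ci))
```

With these made-up values the interval contains zero and Ho is not rejected, which shows that the two-tailed test and the confidence interval lead to the same conclusion.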
Go to Stat - Regression - Regression...

1. Select the response variable
2. Select the predictor variable

What is the standard error of the slope?

MINITAB by default tests the two-tailed null hypothesis that the slope is zero. Report the results of this hypothesis test in the usual way (use α = .05).

Calculate the square root of the F test statistic from the ANOVA table. What is the result? What do you notice when you compare this value to the value of the t test statistic for testing that the slope is zero?

6. The Coefficient of Determination - R²

How much of the total sample variability around $\bar{y}$ is explained by the linear relationship between x and y? The answer to this is given by the Coefficient of Determination, or R². The Coefficient of Determination is the ratio of the variation 'explained' by the linear relationship between the predictor and response variables to the total variation in the data.

Coefficient of Determination - R²

$$R^2 = \frac{SS_{\text{regression}}}{SS_{\text{Total}}}$$

What is R² for the regression model fitted above?

Note that in the case of a simple linear regression model the coefficient of determination is the correlation coefficient squared. Calculate the square root of R² and compare it to the correlation coefficient computed in Part I.
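The two comparisons asked for in this section can be previewed on a small example. The sketch below uses hypothetical data (not the lab data set) to build the regression sums of squares, the F statistic and R², and then compares the square root of F with the slope's t statistic and R² with r². It illustrates the relationships for simple linear regression only.

```python
import math

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# ANOVA decomposition: SS_Total = SS_regression + SS_error.
ss_total = ss_yy
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# F statistic: regression has 1 degree of freedom, error has n - 2.
f_stat = (ss_regression / 1) / (ss_error / (n - 2))

# Coefficient of determination and the correlation coefficient.
r_squared = ss_regression / ss_total
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# t statistic for Ho: slope = 0.
s = math.sqrt(ss_error / (n - 2))
t_stat = slope / (s / math.sqrt(ss_xx))

print("F:", round(f_stat, 4), "sqrt(F):", round(math.sqrt(f_stat), 4), "t:", round(t_stat, 4))
print("R^2:", round(r_squared, 4), "r^2:", round(r ** 2, 4))
```

In simple linear regression the ANOVA F test and the two-tailed t test of a zero slope are the same test, so sqrt(F) matches |t|, and R² matches r².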
Assignment:

Submit this assignment to room 550A in the Library building before the end of term. Include appropriate output from Minitab to support your answers. Remember to put your name and student number on your assignment.

From the Minitab class page download the file named TV. This contains the data for 15 students on their final year mark and the number of hours they spend watching TV.

1. Construct a simple linear regression line for this data and show the graph.
2. What is the correlation coefficient for this data?
3. Is there a negative or a positive correlation between the number of hours spent watching TV and the end of year grade?
4. What is the value of the intercept of the regression line with the y-axis?
5. What is the slope of this regression line?
6. Is there a significant decrease in the final year mark with the length of time each student spends watching TV every night?

REVISION SUMMARY

After this lab you should be able to:
- Calculate the correlation coefficient by hand and in Minitab
- Fit a simple linear regression line to data using Minitab
- Understand the hypothesis in the simple linear regression ANOVA table
- Test if the slope of the model is equal to zero or not
- Construct a confidence interval for the slope