Chapter 2 Supplemental Text Material

S2-1. Models for the Data and the t-test

The model presented in the text, equation (2-23), is more properly called a means model. Since the mean is a location parameter, this type of model is also sometimes called a location model. There are other ways to write the model for a t-test. One possibility is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2;\ j = 1, 2, \ldots, n_i$$

where $\mu$ is a parameter that is common to all observed responses (an overall mean) and $\tau_i$ is a parameter that is unique to the $i$th factor level. Sometimes we call $\tau_i$ the $i$th treatment effect. This model is usually called the effects model.

Since the means model is

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2;\ j = 1, 2, \ldots, n_i$$

we see that the $i$th treatment or factor level mean is $\mu_i = \mu + \tau_i$; that is, the mean response at factor level $i$ is equal to an overall mean plus the effect of the $i$th factor level. We will use both types of models to represent data from designed experiments. Most of the time we will work with effects models, because this is the traditional way to present much of this material. However, there are situations where the means model is useful, and even more natural.

S2-2. Estimating the Model Parameters

Because models arise naturally in examining data from designed experiments, we frequently need to estimate the model parameters. We often use the method of least squares for parameter estimation. This procedure chooses values for the model parameters that minimize the sum of the squares of the errors $\varepsilon_{ij}$. We will illustrate this procedure for the means model. For simplicity, assume that the sample sizes for the two factor levels are equal; that is, $n_1 = n_2 = n$. The least squares function that must be minimized is

$$L = \sum_{i=1}^{2}\sum_{j=1}^{n} \varepsilon_{ij}^2 = \sum_{i=1}^{2}\sum_{j=1}^{n} (y_{ij} - \mu_i)^2$$

Now

$$\frac{\partial L}{\partial \mu_1} = -2\sum_{j=1}^{n}(y_{1j} - \mu_1) \quad \text{and} \quad \frac{\partial L}{\partial \mu_2} = -2\sum_{j=1}^{n}(y_{2j} - \mu_2)$$

and equating these partial derivatives to zero yields the least squares normal equations

$$\begin{aligned} n\hat{\mu}_1 &= \sum_{j=1}^{n} y_{1j} \\ n\hat{\mu}_2 &= \sum_{j=1}^{n} y_{2j} \end{aligned}$$

The solution to these equations gives the least squares estimators of the factor level means. The solution is $\hat{\mu}_1 = \bar{y}_{1\cdot}$ and $\hat{\mu}_2 = \bar{y}_{2\cdot}$, where $\bar{y}_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} y_{ij}$; that is, the sample averages at each factor level are the estimators of the factor level means.

This result should be intuitive, as we learn early on in basic statistics courses that the sample average usually provides a reasonable estimate of the population mean. However, as we have just seen, this result can be derived easily from a simple location model using least squares. It also turns out that if we assume that the model errors are normally and independently distributed, the sample averages are the maximum likelihood estimators of the factor level means. That is, if the observations are normally distributed, least squares and maximum likelihood produce exactly the same estimators of the factor level means. Maximum likelihood is a more general method of parameter estimation that usually produces parameter estimates that have excellent statistical properties.

We can also apply the method of least squares to the effects model. Assuming equal sample sizes, the least squares function is

$$L = \sum_{i=1}^{2}\sum_{j=1}^{n} \varepsilon_{ij}^2 = \sum_{i=1}^{2}\sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2$$

and the partial derivatives of $L$ with respect to the parameters are

$$\frac{\partial L}{\partial \mu} = -2\sum_{i=1}^{2}\sum_{j=1}^{n}(y_{ij} - \mu - \tau_i), \quad \frac{\partial L}{\partial \tau_1} = -2\sum_{j=1}^{n}(y_{1j} - \mu - \tau_1), \quad \frac{\partial L}{\partial \tau_2} = -2\sum_{j=1}^{n}(y_{2j} - \mu - \tau_2)$$

Equating these partial derivatives to zero results in the following least squares normal equations:

$$\begin{aligned} 2n\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 &= \sum_{i=1}^{2}\sum_{j=1}^{n} y_{ij} \\ n\hat{\mu} + n\hat{\tau}_1 &= \sum_{j=1}^{n} y_{1j} \\ n\hat{\mu} + n\hat{\tau}_2 &= \sum_{j=1}^{n} y_{2j} \end{aligned}$$

Notice that if we add the last two of these normal equations we obtain the first one. That is, the normal equations are not linearly independent and so they do not have a unique solution. This has occurred because the effects model is overparameterized. This situation occurs frequently; that is, the effects model for an experiment will always be an overparameterized model.

One way to deal with this problem is to add another linearly independent equation to the normal equations. The most common way to do this is to use the equation $\hat{\tau}_1 + \hat{\tau}_2 = 0$. This is, in a sense, an intuitive choice as it essentially defines the factor effects as deviations from the overall mean $\mu$. If we impose this constraint, the solution to the normal equations is

$$\begin{aligned} \hat{\mu} &= \bar{y}_{\cdot\cdot} \\ \hat{\tau}_i &= \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}, \quad i = 1, 2 \end{aligned}$$

That is, the overall mean is estimated by the average of all $2n$ sample observations, $\bar{y}_{\cdot\cdot}$, while each individual factor effect is estimated by the difference between the sample average for that factor level and the average of all observations.

This is not the only possible choice for a linearly independent "constraint" for solving the normal equations. Another possibility is to simply set the overall mean equal to a constant, for example $\hat{\mu} = 0$. This results in the solution

$$\begin{aligned} \hat{\mu} &= 0 \\ \hat{\tau}_i &= \bar{y}_{i\cdot}, \quad i = 1, 2 \end{aligned}$$

Yet another possibility is $\hat{\tau}_2 = 0$, producing the solution

$$\begin{aligned} \hat{\mu} &= \bar{y}_{2\cdot} \\ \hat{\tau}_1 &= \bar{y}_{1\cdot} - \bar{y}_{2\cdot} \\ \hat{\tau}_2 &= 0 \end{aligned}$$

There are an infinite number of possible constraints that could be used to solve the normal equations. An obvious question is "which solution should we use?" It turns out that it really does not matter. For each of the three solutions above (indeed for any solution to the normal equations) we have

$$\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}, \quad i = 1, 2$$

That is, the least squares estimator of the mean of the $i$th factor level will always be the sample average of the observations at that factor level. So even if we cannot obtain unique estimates for the parameters in the effects model, we can obtain unique estimators of a function of these parameters that we are interested in. We say that the mean of the $i$th factor level is estimable. Any function of the model parameters that can be uniquely estimated regardless of the constraint selected to solve the normal equations is called an estimable function. This is discussed in more detail in Chapter 3.

S2-3. A Regression Model Approach to the t-test

The two-sample t-test can be presented from the viewpoint of a simple linear regression model. This is a very instructive way to think about the t-test, as it fits in nicely with the general notion of a factorial experiment with factors at two levels, such as the golf experiment described in Chapter 1. This type of experiment is very important in practice, and is discussed extensively in subsequent chapters.

In the t-test scenario, we have a factor $x$ with two levels, which we can arbitrarily call "low" and "high." We will use $x = -1$ to denote the low level of this factor and $x = +1$ to denote the high level of this factor. Figure S2-3.1 below is a scatter plot (from Minitab) of the portland cement mortar tension bond strength data from Chapter 2.

Figure S2-3.1. Scatter plot of tension bond strength data (vertical axis: Bond Strength; horizontal axis: Factor Level (x))

We will fit a simple linear regression model to these data, say

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij}$$

where $\beta_0$ and $\beta_1$ are the intercept and slope, respectively, of the regression line, and the regressor or predictor variable is $x_{1j} = -1$ and $x_{2j} = +1$. The method of least squares can be used to estimate the slope and intercept in this model. Assuming that we have equal sample sizes $n$ for each factor level, and noting that $\sum_{i}\sum_{j} x_{ij} = 0$ and $\sum_{i}\sum_{j} x_{ij}^2 = 2n$, the least squares normal equations are:

$$\begin{aligned} 2n\hat{\beta}_0 &= \sum_{i=1}^{2}\sum_{j=1}^{n} y_{ij} \\ 2n\hat{\beta}_1 &= \sum_{j=1}^{n} y_{2j} - \sum_{j=1}^{n} y_{1j} \end{aligned}$$

The solution to these equations is

$$\begin{aligned} \hat{\beta}_0 &= \bar{y}_{\cdot\cdot} \\ \hat{\beta}_1 &= \tfrac{1}{2}(\bar{y}_{2\cdot} - \bar{y}_{1\cdot}) \end{aligned}$$

Note that the least squares estimator of the intercept is the average of all the observations from both samples, while the estimator of the slope is one-half of the difference between the sample averages at the "high" and "low" levels of the factor $x$. Below is the output from the linear regression procedure in Minitab for the tension bond strength data.

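Predictor        Coef     StDev        T      P
Constant      17.3430    0.0636   272.86  0.000
Factor L       0.5790   0.06356     9.11  0.000

S = 0.2843      R-Sq = 82.2%      R-Sq(adj) = 81.2%

Analysis of Variance

Source          DF       SS       MS       F      P
Regression       1   6.7048   6.7048   82.98  0.000
Residual Error  18   1.4544   0.0808
Total           19   8.1592

Notice that the estimate of the slope (given in the column labeled "Coef" and the row labeled "Factor L" above) is $0.5790 \approx \tfrac{1}{2}(\bar{y}_{2\cdot} - \bar{y}_{1\cdot}) = \tfrac{1}{2}(17.92 - 16.76)$ and the estimate of the intercept is $17.3430 \approx \tfrac{1}{2}(\bar{y}_{2\cdot} + \bar{y}_{1\cdot}) = \tfrac{1}{2}(17.92 + 16.76)$. (The difference is due to rounding the manual calculations for the sample averages to two decimal places.) Furthermore, notice that the t-statistic associated with the slope is equal to 9.11, the same value (in absolute value) reported for the two-sample t-test in the text.

Now in simple linear regression, the t-test on the slope is actually testing the hypotheses

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

and this is equivalent to testing $H_0: \mu_1 = \mu_2$. It is easy to show that the t-test statistic used for testing that the slope equals zero in simple linear regression is identical to the usual two-sample t-test. Recall that to test the above hypotheses in simple linear regression the t-statistic is

$$t = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$

where $S_{xx} = \sum_{i=1}^{2}\sum_{j=1}^{n} (x_{ij} - \bar{x})^2$ is the "corrected" sum of squares of the $x$'s. Now in our specific problem, $\bar{x} = 0$, $x_{1j} = -1$ and $x_{2j} = +1$, so $S_{xx} = 2n$. Therefore, since we have already observed that the estimate of $\sigma$ is just $S_p$ (the residual mean square, 0.0808, is the pooled variance $S_p^2$, so $S = 0.2843 = S_p$),

$$t = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{\tfrac{1}{2}(\bar{y}_{2\cdot} - \bar{y}_{1\cdot})}{\sqrt{S_p^2 / (2n)}} = \frac{\bar{y}_{2\cdot} - \bar{y}_{1\cdot}}{S_p\sqrt{2/n}}$$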
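As a numerical check on Sections S2-2 and S2-3, the following Python sketch (standard library only) uses the tension bond strength observations listed in the residual tables of Section S2-5. The helper names are hypothetical, written only for this illustration. The sketch computes three solutions of the effects-model normal equations, one for each constraint discussed in Section S2-2, fits the regression model with $x = -1$ for the modified mortar and $x = +1$ for the unmodified mortar, and compares the t-statistic for the slope with the equal-sample-size two-sample t-statistic.

```python
import math
from statistics import mean, variance

# Tension bond strength data, n = 10 per factor level (from Section S2-5)
modified = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
unmodified = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def effects_model_solutions(y1, y2):
    """Return (mu, tau1, tau2) under three different constraints.

    Each tuple solves the effects-model normal equations; only the
    extra constraint used to pick a single solution differs.
    """
    ybar1, ybar2 = mean(y1), mean(y2)
    ybar = mean(y1 + y2)  # grand average of all 2n observations
    return {
        "tau1 + tau2 = 0": (ybar, ybar1 - ybar, ybar2 - ybar),
        "mu = 0": (0.0, ybar1, ybar2),
        "tau2 = 0": (ybar2, ybar1 - ybar2, 0.0),
    }


def regression_t(y1, y2):
    """Fit y = b0 + b1*x with x = -1 for y1 and x = +1 for y2.

    Returns (b0, b1, t), where t = b1 / sqrt(sigma2_hat / Sxx).
    """
    x = [-1.0] * len(y1) + [1.0] * len(y2)
    y = y1 + y2
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (len(y) - 2)  # residual mean square
    return b0, b1, b1 / math.sqrt(sigma2_hat / sxx)


def pooled_t(y1, y2):
    """Equal-n two-sample statistic (ybar2 - ybar1) / (Sp * sqrt(2/n))."""
    n = len(y1)
    sp = math.sqrt((variance(y1) + variance(y2)) / 2)  # pooled SD for equal n
    return (mean(y2) - mean(y1)) / (sp * math.sqrt(2 / n))


if __name__ == "__main__":
    for name, (mu, tau1, tau2) in effects_model_solutions(modified, unmodified).items():
        # mu + tau_i does not depend on the constraint: it is estimable
        print(f"{name:16s} mu+tau1 = {mu + tau1:.3f}  mu+tau2 = {mu + tau2:.3f}")
    b0, b1, t_reg = regression_t(modified, unmodified)
    print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, regression t = {t_reg:.2f}")
    print(f"pooled two-sample t = {pooled_t(modified, unmodified):.2f}")
```

Run as a script, it should report $\hat{\mu} + \hat{\tau}_1 = 16.764$ and $\hat{\mu} + \hat{\tau}_2 = 17.922$ for every constraint, $\hat{\beta}_0 = 17.3430$, $\hat{\beta}_1 = 0.5790$, and a t-statistic of 9.11 by both routes, in line with the Minitab output above.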

The statistic $(\bar{y}_{2\cdot} - \bar{y}_{1\cdot})/(S_p\sqrt{2/n})$ is the usual two-sample t-test statistic for the case of equal sample sizes.

S2-4. Constructing Normal Probability Plots

While we usually generate normal probability plots using a computer software program, occasionally we have to construct them by hand. Fortunately, it is relatively easy to do, since specialized normal probability plotting paper is widely available. This is just graph paper with the vertical (or probability) scale arranged so that if we plot the cumulative normal probabilities $(j - 0.5)/n$ on that scale versus the rank-ordered observations $y_{(j)}$, a graph equivalent to the computer-generated normal probability plot will result. The table below shows the calculations for the unmodified portland cement mortar bond strength data.

 j    y(j)    (j - 0.5)/10    z(j)
 1   17.50        0.05       -1.64
 2   17.63        0.15       -1.04
 3   17.75        0.25       -0.67
 4   17.86        0.35       -0.39
 5   17.90        0.45       -0.13
 6   17.96        0.55        0.13
 7   18.00        0.65        0.39
 8   18.15        0.75        0.67
 9   18.22        0.85        1.04
10   18.25        0.95        1.64

Now if we plot the cumulative probabilities from the next-to-last column of this table versus the rank-ordered observations from the second column on normal probability paper, we will produce a graph that is identical to part (a) of the normal probability plot figure in the text.

A normal probability plot can also be constructed on ordinary graph paper by plotting the standardized normal z-scores $z_{(j)}$ against the ranked observations, where the standardized normal z-scores are obtained from

$$P(Z \le z_{(j)}) = \Phi(z_{(j)}) = \frac{j - 0.5}{n}$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function. For example, if $(j - 0.5)/n = 0.05$, then $\Phi(z_{(j)}) = 0.05$ implies that $z_{(j)} = -1.64$. The last column of the above table displays the values of the normal z-scores. Plotting these values against the ranked observations on ordinary graph paper will produce a normal probability plot equivalent to part (a) of that figure. As noted in the text, many statistics computer packages present the normal probability plot this way.
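The z-score construction can be sketched in Python as follows, assuming Python 3.8 or later for `statistics.NormalDist`, whose `inv_cdf` method inverts $\Phi$. The function name `normal_plot_points` is hypothetical. It sorts the data, attaches the plotting probability $(j - 0.5)/n$ to each rank $j$, and converts that probability to $z_{(j)}$; plotting the z-scores against the sorted observations gives the ordinary-graph-paper version of the plot.

```python
from statistics import NormalDist

# Unmodified mortar bond strengths (unsorted; the function ranks them)
unmodified = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def normal_plot_points(data):
    """Return (j, y_(j), (j - 0.5)/n, z_(j)) for each rank j = 1..n."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for j, y in enumerate(sorted(data), start=1):
        p = (j - 0.5) / n            # cumulative plotting probability
        z = std_normal.inv_cdf(p)    # solves Phi(z) = p
        points.append((j, y, p, z))
    return points


if __name__ == "__main__":
    for j, y, p, z in normal_plot_points(unmodified):
        print(f"{j:2d}  {y:6.2f}  {p:4.2f}  {z:6.2f}")
```

Rounded to two decimals, the printed rows should reproduce the table above.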

S2-5. More About Checking Assumptions in the t-test

We noted in the text that a normal probability plot of the observations was an excellent way to check the normality assumption in the t-test. Instead of plotting the observations, an alternative is to plot the residuals from the statistical model.

Recall that the means model is

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2;\ j = 1, 2, \ldots, n_i$$

and that the estimates of the parameters (the factor level means) in this model are the sample averages. Therefore, we could say that the fitted model is

$$\hat{y}_{ij} = \bar{y}_{i\cdot}, \qquad i = 1, 2;\ j = 1, 2, \ldots, n_i$$

That is, an estimate of the $ij$th observation is just the average of the observations in the $i$th factor level. The difference between the observed value of the response and the predicted (or fitted) value is called a residual, say

$$e_{ij} = y_{ij} - \bar{y}_{i\cdot}, \qquad i = 1, 2;\ j = 1, 2, \ldots, n_i$$

The table below computes the values of the residuals from the portland cement mortar tension bond strength data.

Observation j    y1j     e1j = y1j - 16.76    y2j     e2j = y2j - 17.92
 1              16.85         0.09           17.50          -0.42
 2              16.40        -0.36           17.63          -0.29
 3              17.21         0.45           18.25           0.33
 4              16.35        -0.41           18.00           0.08
 5              16.52        -0.24           17.86          -0.06
 6              17.04         0.28           17.75          -0.17
 7              16.96         0.20           18.22           0.30
 8              17.15         0.39           17.90          -0.02
 9              16.59        -0.17           17.96           0.04
10              16.57        -0.19           18.15           0.23

The figure below is a normal probability plot of these residuals from Minitab.

Figure: Normal Probability Plot of the Residuals (response is Bond Str); vertical axis: Normal Score; horizontal axis: Residual

As noted in Section S2-3 above, we can compute the t-test statistic using a simple linear regression model approach. Most regression software packages will also compute a table or listing of the residuals from the model. The residuals from the Minitab regression model fit obtained previously are as follows:

Obs  Factor Level  Bond Str      Fit  StDev Fit  Residual  St Resid
  1         -1.00   16.8500  16.7640     0.0899    0.0860      0.32
  2         -1.00   16.4000  16.7640     0.0899   -0.3640     -1.35
  3         -1.00   17.2100  16.7640     0.0899    0.4460      1.65
  4         -1.00   16.3500  16.7640     0.0899   -0.4140     -1.54
  5         -1.00   16.5200  16.7640     0.0899   -0.2440     -0.90
  6         -1.00   17.0400  16.7640     0.0899    0.2760      1.02
  7         -1.00   16.9600  16.7640     0.0899    0.1960      0.73
  8         -1.00   17.1500  16.7640     0.0899    0.3860      1.43
  9         -1.00   16.5900  16.7640     0.0899   -0.1740     -0.65
 10         -1.00   16.5700  16.7640     0.0899   -0.1940     -0.72
 11          1.00   17.5000  17.9220     0.0899   -0.4220     -1.56
 12          1.00   17.6300  17.9220     0.0899   -0.2920     -1.08
 13          1.00   18.2500  17.9220     0.0899    0.3280      1.22
 14          1.00   18.0000  17.9220     0.0899    0.0780      0.29
 15          1.00   17.8600  17.9220     0.0899   -0.0620     -0.23
 16          1.00   17.7500  17.9220     0.0899   -0.1720     -0.64
 17          1.00   18.2200  17.9220     0.0899    0.2980      1.11
 18          1.00   17.9000  17.9220     0.0899   -0.0220     -0.08
 19          1.00   17.9600  17.9220     0.0899    0.0380      0.14
 20          1.00   18.1500  17.9220     0.0899    0.2280      0.85

The column labeled "Fit" contains the averages of the two samples, computed to four decimal places. The residuals in the sixth column of this table are the same (apart from rounding) as those we computed manually.
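The residual listing can be reproduced with a short Python sketch. The helper `residual_listing` is hypothetical, and the standardized residual it computes, $e_{ij}/(S\sqrt{1-h})$ with leverage $h = 1/n$, is an assumption about what the "St Resid" column reports; with $n = 10$ per group this formula does reproduce the listed values. The "StDev Fit" column is computed as $S\sqrt{h} = S/\sqrt{n}$. Factor level 1 in the output corresponds to $x = -1$ (modified mortar) and level 2 to $x = +1$ (unmodified mortar).

```python
import math
from statistics import mean

# Tension bond strength data, in the same order as the tables above
modified = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
unmodified = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]


def residual_listing(groups):
    """Return rows (level, y, fit, stdev_fit, residual, st_resid) and S.

    Assumes every group has the same size n, so each observation has
    leverage h = 1/n, and uses the internally studentized residual
    e / (S * sqrt(1 - h)), where S^2 is the residual mean square.
    """
    n = len(groups[0])
    fits = [mean(g) for g in groups]  # fitted value = factor level average
    resid = [[y - f for y in g] for g, f in zip(groups, fits)]
    df = sum(len(g) for g in groups) - len(groups)  # 2n - 2 for two groups
    s = math.sqrt(sum(e * e for row in resid for e in row) / df)
    h = 1 / n
    rows = []
    for level, (g, f, r) in enumerate(zip(groups, fits, resid), start=1):
        for y, e in zip(g, r):
            rows.append((level, y, f, s * math.sqrt(h), e, e / (s * math.sqrt(1 - h))))
    return rows, s


if __name__ == "__main__":
    rows, s = residual_listing([modified, unmodified])
    print(f"S = {s:.4f}")
    for obs, (level, y, fit, sd_fit, e, st) in enumerate(rows, start=1):
        print(f"{obs:2d}  {level}  {y:8.2f}  {fit:8.4f}  {sd_fit:7.4f}  {e:8.4f}  {st:6.2f}")
```

Because $S^2$ is the pooled variance, the same function applied to the two samples gives $S = 0.2843$, matching the regression output in Section S2-3.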

S2-6. Some More Information About the Paired t-test

The paired t-test examines the difference between two variables and tests whether the mean of those differences differs from zero. In the text we show that the mean of the differences $\mu_d$ is identical to the difference of the means in two independent samples, $\mu_1 - \mu_2$. However, the variance of the differences is not the same as would be observed if there were two independent samples. Let $\bar{d}$ be the sample average of the differences. Then

$$V(\bar{d}) = V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2) - 2\,\mathrm{Cov}(\bar{y}_1, \bar{y}_2) = \frac{2\sigma^2(1 - \rho)}{n}$$

assuming that both populations have the same variance $\sigma^2$ and that $\rho$ is the correlation between the two random variables $y_1$ and $y_2$. For two independent samples of size $n$ the covariance term is zero, and the variance is $2\sigma^2/n$. The quantity $S_d^2/n$ estimates the variance of the average difference $\bar{d}$. In many paired experiments a strong positive correlation is expected to exist between $y_1$ and $y_2$ because both factor levels have been applied to the same experimental unit. When there is positive correlation within the pairs, the denominator for the paired t-test will be smaller than the denominator for the two-sample or independent t-test. If the two-sample test is applied incorrectly to paired samples, the procedure will generally understate the significance of the data.

Note also that while for convenience we have assumed that both populations have the same variance, the assumption is really unnecessary. The paired t-test is valid when the variances of the two populations are different.
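The effect of the correlation $\rho$ on $V(\bar{d})$ can be illustrated with a small Python sketch, assuming bivariate normal pairs with a common variance (the same assumption used in the derivation above) and $\sigma^2 = 1$ in the simulation. The helper names are hypothetical. `simulate_var_dbar` builds correlated pairs from two independent standard normal draws $z_1, z_2$ via $y_1 = z_1$, $y_2 = \rho z_1 + \sqrt{1-\rho^2}\,z_2$, and its Monte Carlo estimate can be compared with $2\sigma^2(1-\rho)/n$.

```python
import math
import random


def var_mean_difference(sigma2, rho, n):
    """V(dbar) = 2 * sigma^2 * (1 - rho) / n for n correlated pairs."""
    return 2 * sigma2 * (1 - rho) / n


def simulate_var_dbar(rho, n, reps=10000, seed=1):
    """Monte Carlo estimate of V(dbar) for unit-variance pairs with correlation rho."""
    rng = random.Random(seed)
    dbars = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            y1 = z1
            y2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # corr(y1, y2) = rho
            total += y1 - y2
        dbars.append(total / n)
    m = sum(dbars) / reps
    return sum((d - m) ** 2 for d in dbars) / (reps - 1)


if __name__ == "__main__":
    n, sigma2 = 10, 1.0
    independent = var_mean_difference(sigma2, 0.0, n)  # rho = 0: 2 * sigma^2 / n
    for rho in (0.0, 0.5, 0.9):
        paired = var_mean_difference(sigma2, rho, n)
        ratio = math.sqrt(paired / independent)  # paired SE / independent SE
        print(f"rho = {rho:.1f}  V(dbar) = {paired:.3f}  SE ratio = {ratio:.3f}")
    print("simulated V(dbar), rho = 0.6:", round(simulate_var_dbar(0.6, n), 4))
```

The ratio of standard errors is $\sqrt{1-\rho}$; with $\rho = 0.9$, for example, it is $\sqrt{0.1} \approx 0.32$, which is why the paired denominator is smaller when the within-pair correlation is positive.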