The Randomized Block Design

Statistical Methods I (EXST 7005)

A factorial is a way of entering two or more treatments into an analysis. The description of a factorial usually includes a measure of size: a 2 by 2, 3 by 4, 6 by 3 by 4, 2 by 2 by 2, etc.

Interactions were discussed.
Interactions test additivity of the main effects.
Interactions are a measure of inconsistency in the behavior of the cells relative to the main effects.
Interactions are tested along with the main effects.
Interactions should not be ignored if significant.

Factorial analyses can be done as two-way ANOVAs in SAS, or they can be done as contrasts.

The Randomized Block Design

This analysis is similar in many ways to a two-way ANOVA.

The CRD is defined by the linear model $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$. The simplest version of the CRD has one treatment and one error term.

The factorial treatment arrangement discussed previously occurred within a CRD, and it had several different treatments: $Y_{ijk} = \mu + \tau_{1i} + \tau_{2j} + \tau_1\tau_{2ij} + \varepsilon_{ijk}$. This model has two treatments and one error. It could have many more treatments, and it would still be a factorial design. Designs having a single treatment or multiple treatments can all occur within a CRD and are referred to as different treatment arrangements.

There are other modifications of a CRD that could be done. Instead of multiple treatments we may find it necessary to subdivide the error term. Why would we do this? Perhaps there is some variation that is not of interest. If we ignore it, that variation will go to the error term.

For example, suppose we had a large agricultural experiment and had to do our experiment in 8 different fields, or due to space limitations in a greenhouse experiment we had to separate our experiment into 3 different greenhouses or 5 different incubators. Now there is a source of variation that is due to different fields, or different greenhouses or incubators! If we do it as a CRD, we put our treatments in the model, but if there is some variation due to field, greenhouse or incubator it will go to the error term. This would inflate our error term and make it more difficult to detect a difference (we would lose power).

How do we prevent this? First, make sure each treatment occurs in each field, greenhouse or incubator (preferably balanced).
Then we would factor the new variation out of the error term by putting it in the model:

$Y_{ijk} = \mu + \beta_i + \tau_j + \beta\tau_{ij} + \varepsilon_{ijk}$

This is not a new treatment. We will call it a BLOCK. This looks like a factorial, but it is not, because the blocks are not a source of variation that we are interested in discussing. Also, in a factorial the interaction term is likely to be something of interest. In a block design the interaction is an error term, representing random variation of experimental units across treatments.

James P. Geaghan, Copyright 2010

Another difference: treatments can be either fixed or random. If both treatments are fixed, the interaction is fixed. However, blocks are usually random, and the block interaction is always random.

So why are we blocking? It is usually used to add replication to an experiment. Additional replicates are added in another field or another greenhouse. On the one hand, the larger experiment should add power. On the other hand, if we do not take measures to keep the new variation out of the error term, we may lose power due to the larger error.

So, how does this affect our analysis? We still have treatments, with the test of treatments in the ANOVA (an F test). We can still do post-hoc tests on the treatments. There is only one new issue, the error term. To examine this we will need to look at the expected mean squares (EMS) for the Randomized Block Design.

RBD EMS

We will examine two possible types of models. In the first model we have treatments and blocks and nothing else. Each treatment occurs in each block ONCE. The experiment is similar to a factorial in some regards, but not many. The model is

$Y_{ij} = \mu + \beta_i + \tau_j + \beta\tau_{ij}$

In this model the error term ($\varepsilon_{ij}$) actually comes from the block by treatment interactions ($\beta\tau_{ij}$). This is the only error available, but that is OK. It is usually a good error term because it represents random variation among the experimental units.

Blocks \ Treatments    A1      A2      A3
Block 1                a1b1    a2b1    a3b1
Block 2                a1b2    a2b2    a3b2
Block 3                a1b3    a2b3    a3b3

This looks like a factorial. The analysis is the same as the factorial: we get marginal sums or means and proceed to calculate the SS for blocks, treatments and interaction as before.

However, there is one big difference. If this was a factorial we would have Treatment A, Treatment B and the A*B interaction. What would you use as an error term? We would not have one. A factorial ANOVA must have an error term for testing treatments and interaction. However, since the interaction in a block design is assumed to be random variation among experimental units, it serves as an error term. So the model $Y_{ij} = \mu + \beta_i + \tau_j + \beta\tau_{ij}$ works for block designs. The interaction term is a useful and respectable error term.
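The sums of squares for an unreplicated block design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the yields in `data` are hypothetical numbers made up for the example, and the function name `rbd_anova` is not from the notes. Treatments are tested against the block by treatment term, as described above.

```python
# Hypothetical yields: rows are blocks, columns are treatments,
# one observation per block-by-treatment cell (no replication within blocks).
data = [
    [10.0, 12.0, 15.0],  # block 1
    [11.0, 14.0, 16.0],  # block 2
    [8.0, 11.0, 13.0],   # block 3
]

def rbd_anova(y):
    """Sums of squares and the treatment F ratio for an unreplicated RBD."""
    b, t = len(y), len(y[0])
    grand_total = sum(sum(row) for row in y)
    cf = grand_total ** 2 / (b * t)                      # correction factor
    ss_total = sum(v * v for row in y for v in row) - cf
    ss_block = sum(sum(row) ** 2 for row in y) / t - cf
    ss_trt = sum(sum(row[j] for row in y) ** 2 for j in range(t)) / b - cf
    ss_error = ss_total - ss_block - ss_trt              # block x treatment term
    df_trt, df_error = t - 1, (b - 1) * (t - 1)
    f_trt = (ss_trt / df_trt) / (ss_error / df_error)
    return {"ss_block": ss_block, "ss_trt": ss_trt,
            "ss_error": ss_error, "ss_total": ss_total, "f_trt": f_trt}

result = rbd_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because there is only one observation per cell, the error sum of squares is whatever is left after blocks and treatments are removed from the total.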

We do however, in this case, have one additional assumption. Assume that there is no interaction between the treatments and blocks. By interaction here we mean that the treatment patterns are the same in each block. We do not have a treatment behaving one way in one block and behaving differently in another block. So the $\beta\tau_{ij}$ term represents random variation in experimental units and not some interaction in the same sense as interaction in a factorial design.

So what about those EMS? For the CRD we had two cases.

Source       d.f.       EMS Random                       EMS Fixed
Treatment    t - 1      $\sigma^2 + n\sigma^2_\tau$      $\sigma^2 + n\sum\tau_i^2/(t-1)$
Error        t(n - 1)   $\sigma^2$                       $\sigma^2$
Total        tn - 1

For the block design we have two cases, one with just blocks and treatments, and one with replicate observations within the cells. Here $\sigma^2_\gamma$ denotes the experimental error (block by treatment) variance component.

Source         d.f.              EMS (no reps)                     EMS (with reps)
Treatment      t - 1             $\sigma^2 + b\sigma^2_\tau$       $\sigma^2 + n\sigma^2_\gamma + nb\sigma^2_\tau$
Block          b - 1             $\sigma^2 + t\sigma^2_\beta$      $\sigma^2 + n\sigma^2_\gamma + nt\sigma^2_\beta$
Exptl Error    (b - 1)(t - 1)    $\sigma^2$                        $\sigma^2 + n\sigma^2_\gamma$
Rep Error      tb(n - 1)                                           $\sigma^2$
Total          tb - 1 (no reps), tbn - 1 (with reps)

What is the nature of the replicates within the block by treatment cells? Suppose the experimental unit is a plot in a field. We are evaluating plant height. The treatment to be compared is 6 varieties of soybeans. The experiment is done in 3 fields (blocks). The error term is the field by variety combinations. This experiment is unreplicated within blocks.

[Figure: layout of varieties t1 to t6 in Field 1, Field 2 and Field 3, one plot per variety in each field]

Additional replication is usually done in one of two ways. If we have only one plot (experimental unit) for each treatment in each field, we could sample several times within each plot, sampling plant height at several places in the plot. Our sampling unit is a smaller unit than the experimental unit (a plot), so we have sampling error.

[Figure: layout of varieties t1 to t6 in Field 1, Field 2 and Field 3. Replicated within blocks as multiple samples in an experimental unit. Error is sampling error.]

Another type of error comes from having several plots with a given soybean variety in each field. Here each variety of soybean has several experimental units in each field. In this case the additional replication represents a second experimental error: one for block by treatment combinations and one for replicate plots within a block. In this case we have replicated experimental units in each block.

[Figure: layout of Field 1, Field 2 and Field 3 with each of varieties t1 to t6 appearing in several plots per field]

Factorial EMS

I haven't mentioned EMS for factorials. Developing EMS can be pretty simple. Start with the lowest unit, and move up the source table adding additional variance components for each new term. Here $\sigma^2_\gamma$ denotes the experimental error (block by treatment) variance component.

Source         EMS with Reps
Treatment      $\sigma^2 + n\sigma^2_\gamma + nb\sigma^2_\tau$
Block          $\sigma^2 + n\sigma^2_\gamma + nt\sigma^2_\beta$
Exptl Error    $\sigma^2 + n\sigma^2_\gamma$
Rep Error      $\sigma^2$

Interaction components occur on their own line, and on the source line for each higher effect contained in the interaction. Each main effect gets its own source.

Now consider whether the effects are fixed or random. Modify fixed effects to show SSEffects instead of variance components.

Source         EMS with Reps
Treatment      $\sigma^2 + n\sigma^2_\gamma + nb\sum\tau_i^2/(t-1)$
Block          $\sigma^2 + n\sigma^2_\gamma + nt\sigma^2_\beta$
Exptl Error    $\sigma^2 + n\sigma^2_\gamma$
Rep Error      $\sigma^2$

If the model is an RBD we're done, because the interaction is always a random variable. For factorials that are random models or mixed models we're done.

Consider what the F test should be for the treatment. Surprise, SAS always uses the residual error term!

But for factorials there is one last detail. It is perfectly possible in factorial designs that both effects are fixed, and if both effects are fixed the interaction is also fixed!

Source             EMS with Reps
Treatment A        $\sigma^2 + nb\sum A_i^2/(a-1)$
Treatment B        $\sigma^2 + na\sum B_j^2/(b-1)$
Interaction A*B    $\sigma^2 + n\sum\sum (AB)_{ij}^2/[(a-1)(b-1)]$
Error              $\sigma^2$

And a FIXED effect occurs only on its own line, no other!! The fixed interaction disappears from the main effects!!!

Now what is the error term for testing treatments and interaction? Maybe SAS is right? Or maybe SAS just doesn't know what is fixed and what is random.

Testing ANOVAs in SAS

So tell SAS what is random and what is fixed. Look for the following additions to the SAS program.
How do we tell SAS which terms to test with what error term?
How do we get SAS to output EMS?
How do we get SAS to automagically test the right treatment terms with the right error terms?

Summary

Randomized Block Designs modify the model by factoring a source of variation out of the error term in order to reduce the error variance and increase power.
If the basis for blocking is good, this will be effective. If the basis for blocking is not good, we lose a few degrees of freedom from the error term and may actually lose power.
The block by treatment combinations (interaction?) provide a measure of variation in the experimental units and provide an adequate error term.
We have an additional assumption that this error term represents ONLY experimental error, and not some real interaction between the treatments and blocks.
Expected mean squares for the RBD indicate that the experimental error term is the correct error term, whether there is a sampling unit or not.
Factorial designs where effects are random or mixed are similar to RBD EMS. THE TREATMENT INTERACTION IS ACTUALLY USED AS AN ERROR TERM!
When the treatments are fixed, the main effects do not contain the interaction term, and the residual error term is the appropriate error term.

Sample size in ANOVA

Some textbooks use a slightly different expression for the equation, but it is the same as the equation discussed previously. One minor change is in the expression

$n \ge \frac{(t_{\alpha/2} + t_\beta)^2 S^2}{d^2}$

An alternative to the use of d is the expressing of the difference as a percentage of the mean. For example, if we wanted to test for a difference that was 10% of the mean we could use the expression

$n \ge \frac{(t_{\alpha/2} + t_\beta)^2 S^2}{(0.1\bar{Y})^2}$

This expression can further be altered to express the difference in terms of the coefficient of variation, $CV = S/\bar{Y}$. Calculating the sample size needed to detect a 10% change in the mean then becomes

$n \ge (t_{\alpha/2} + t_\beta)^2 (10\,CV)^2$

In analysis of variance we may also want to be able to detect a certain difference between two means ($\mu_1$ and $\mu_2$) out of the treatment means we are studying, so our difference will be $d = \mu_1 - \mu_2$. A prior analysis, or a pilot study, may provide us with an estimate of the variance (MSE in ANOVA). From here we can use a formula pretty much the same as for the t-test discussed earlier.

There is one other little detail, however. We are basically testing $H_0: \mu_1 = \mu_2$, from the 2-sample t-test. Recall from our linear combinations that we have a variance for this linear combination that is the sum of the individual variances of the means. Therefore, the variance will be $S_1^2/n_1 + S_2^2/n_2$. Since we are usually pooling variances (in ANOVA), the formula simplifies to $MSE(1/n_1 + 1/n_2)$. Furthermore, since we usually attempt to have balanced experiments (equal sample size in each group) for analysis of variance, the formula further simplifies to an expression similar to one seen previously, except for the addition of a 2: $2\,MSE/n$. The additional 2 occurs when we are testing for a difference in two means ($H_0: \mu_1 = \mu_2$) as opposed to testing a mean against a hypothesized value ($H_0: \mu = \mu_0$).

Note one very important thing here. In this formula n represents each group or population being studied, that is, each treatment level in an analysis of variance! So for ANOVA or a two-sample t-test with equal variance and equal n, the expression for sample size is

$n \ge \frac{2(t_{\alpha/2} + t_\beta)^2 MSE}{d^2}$

Note that this is for each treatment. In a two-sample t-test, each population would have a sample size of n, so the total number of observations would be 2n. In ANOVA we have t treatments; each would have a sample size of n, so the total number of observations would be tn.

How often are we likely to have situations with equal variance and equal n? Is this realistic? Actually, yes it is. First, ANOVA traditionally required equal variances, though more modern analytical techniques can address the lack of homogeneity. If necessary, equal variances may be achieved by a transformation or some other fix. If variance is nonhomogeneous you could use the larger estimates and get a conservative estimate of n.
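The per-treatment sample size expression can be written as a small helper. This is a minimal sketch, assuming the caller supplies the tabular t values for the chosen type I and type II error rates; the function name `n_per_treatment` and the numbers in the example call are hypothetical. Because the t values depend on the error degrees of freedom, which depend on n, in practice the calculation would be repeated until n stops changing.

```python
import math

def n_per_treatment(t_alpha, t_beta, mse, d):
    """Smallest whole n satisfying n >= 2 * (t_alpha + t_beta)^2 * MSE / d^2.

    t_alpha : tabular t for the chosen type I error rate (two-tailed)
    t_beta  : tabular t for the chosen type II error rate
    mse     : variance estimate (MSE from a prior ANOVA or a pilot study)
    d       : difference between two treatment means to be detected
    """
    return math.ceil(2 * (t_alpha + t_beta) ** 2 * mse / d ** 2)

# Hypothetical inputs, for illustration only.
n = n_per_treatment(t_alpha=2.0, t_beta=1.0, mse=4.0, d=3.0)
treatments = 5
print("n per treatment:", n)
print("total observations:", treatments * n)
```

The result is the sample size for each treatment, so the total number of observations is the number of treatments times n.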

Second, the most common application for sample size calculation is in planning NEW studies, and of course in planning new studies you usually do not PLAN on unbalanced designs and nonhomogeneous variance. So these situations are realistic.

Summary

Finally, we saw that this formula is applicable to two-sample t-tests and ANOVA, with some modifications in the estimate of the variance. These modifications are the same ones needed for the 2-sample t-test as dictated by our study of linear combinations. However, the calculations are simplified by the common ANOVA assumption of equal variance and the prevalence of balanced experiments.

Review of Analysis of Variance procedures

1) $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \dots = \mu_t = \mu$
2) $H_1$: some $\mu_i$ is different
3a) Assume that the observations are normally distributed about each mean, or that the residuals (i.e. deviations) are normally distributed.
 b) Assume that the observations are independent.
 c) Assume that the variances are homogeneous.
4) Set the level of type I error. Usually $\alpha = 0.05$.
5) Determine the critical value. For a balanced CRD with a single factor treatment the test is an F test with t - 1 and t(n - 1) degrees of freedom ($F_{\alpha = 0.05;\ t-1,\ t(n-1)\ d.f.}$).
6) Obtain data and evaluate.

The treatment sum of squares, as developed by Fisher, is converted to a variance and tested with an F test against the pooled error variance. In practice, the sums of squares are usually calculated and presented with the degrees of freedom in a table called an ANOVA table. For a balanced design (all $n_i$ equal) the calculations are as follows.

The uncorrected SS for treatments is $USS_{Treatments} = \sum_{i=1}^{t} \left(\sum_{j=1}^{n} Y_{ij}\right)^2 / n$

The uncorrected SS for the total is $USS_{Total} = \sum_{i=1}^{t} \sum_{j=1}^{n} Y_{ij}^2$

The correction factor for both terms is $CF = \left(\sum_{i=1}^{t} \sum_{j=1}^{n} Y_{ij}\right)^2 / (tn)$

Our ANOVA analyses will be done with PROC MIXED and PROC GLM. There is a PROC ANOVA, but it is a subset of PROC GLM.

LSMeans calculation

The calculations of LSMeans are different. For a balanced design, the results will be the same. However, for unbalanced designs the results will often differ.
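The balanced CRD calculations (uncorrected sums of squares minus the correction factor) can be sketched as follows. The observations in `groups` are hypothetical values chosen for the illustration, and the variable names are not from the notes.

```python
# Hypothetical balanced CRD: t = 3 treatments with n = 3 observations each.
groups = [
    [3.0, 5.0, 7.0],
    [6.0, 8.0, 10.0],
    [9.0, 11.0, 13.0],
]

t = len(groups)
n = len(groups[0])

grand_total = sum(sum(g) for g in groups)
cf = grand_total ** 2 / (t * n)                      # correction factor

uss_treatments = sum(sum(g) ** 2 for g in groups) / n
uss_total = sum(y * y for g in groups for y in g)

ss_treatments = uss_treatments - cf                  # t - 1 d.f.
ss_total = uss_total - cf                            # tn - 1 d.f.
ss_error = ss_total - ss_treatments                  # t(n - 1) d.f.

ms_treatments = ss_treatments / (t - 1)
ms_error = ss_error / (t * (n - 1))
f_value = ms_treatments / ms_error

print(f"SSTreatments = {ss_treatments}, SSError = {ss_error}, SSTotal = {ss_total}")
print(f"F = {f_value} with {t - 1} and {t * (n - 1)} d.f.")
```

The F value would then be compared with the tabular F for t - 1 and t(n - 1) degrees of freedom.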

The MEANS statement in SAS calculates a simple mean of all available observations in the treatment cells. The LSMeans statement will calculate the mean of the treatment cell means.

Example: The MEAN of 4 treatments, where the observations are 3,4,8 for a1, 3,5,6,7,9 for a2, 7,8,6,7 for a3 and 3,5,7 for a4, is 88/15 = 5.8667. The individual cell means are 5, 6, 7 and 5 for a1, a2, a3 and a4 respectively. The mean of these 4 values is 5.75. This would be the LSMean.

[Table: raw means for treatments a1, a2, a3 by b1 and b2, with marginal means]

[Table: LSMeans for treatments a1, a2, a3 by b1 and b2, with marginal means]

Confidence Intervals on Treatments

Like all confidence intervals on normally distributed estimates, this will employ a t value and will be of the form $Mean \pm t_{\alpha/2} S_{\bar{Y}}$.

The treatment mean can be obtained from a MEANS (or LSMeans) statement, but the standard deviation provided is not the correct standard error for the interval. The standard error in a simple CRD with fixed effects is the square root of MSE/n, where n is the number of observations used in calculating the mean.

The calculation requires other considerations when random components are involved. For example, in PROC MIXED the use of the Satterthwaite and Kenward-Roger approximations, the use of various estimation methods (the default is REML) and specifications of covariance structure are all things that can affect degrees of freedom.

The use of MSE in the numerator is the default in PROC GLM, and if a different error is desired it must be specified by the user. PROC MIXED is capable of detecting and using an error other than the MSE where appropriate.

If there are several error terms (e.g. experimental error and sampling error) use the one that is appropriate for testing the treatments.

When an error term other than the residual is appropriate for testing the treatments, the degrees of freedom for the tabular t value are the d.f. from the error term used for testing. This variance term would also be used to calculate the standard error for treatment means.
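The difference between the two kinds of mean in the four-treatment example can be checked directly. This is a plain Python sketch of the arithmetic using the observations listed in the example, not of how SAS computes its MEANS or LSMeans output; the variable names are illustrative.

```python
# Observations from the example: an unbalanced set of four treatment cells.
cells = {
    "a1": [3, 4, 8],
    "a2": [3, 5, 6, 7, 9],
    "a3": [7, 8, 6, 7],
    "a4": [3, 5, 7],
}

# Simple mean of all available observations.
all_obs = [y for obs in cells.values() for y in obs]
raw_mean = sum(all_obs) / len(all_obs)

# Mean of the cell means: every cell gets equal weight regardless of its size.
cell_means = {name: sum(obs) / len(obs) for name, obs in cells.items()}
mean_of_cell_means = sum(cell_means.values()) / len(cell_means)

print("cell means:", cell_means)
print("mean of all observations:", round(raw_mean, 4))
print("mean of cell means:", mean_of_cell_means)
```

With equal cell sizes the two values would coincide, which is why the distinction only matters for unbalanced designs.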

Simple Linear Regression

Simple regression applications are used to fit a model describing a linear relationship between two variables. The aspects of least squares regression and correlation were developed by Sir Francis Galton in the late 1800s.

The application can be used to test for a statistically significant correlation between the variables. Finding a relationship does not prove a cause and effect relationship, but the model can be used to quantify a relationship where one is known to exist. The model provides a measure of the rate of change of one variable relative to another variable.

There is a potential change in the value of variable Y as the value of variable X changes. Variable values will always be paired, one termed an independent variable (often referred to as the X variable) and a dependent variable (termed a Y variable). For each value of X there is assumed to be a normally distributed population of values for the variable Y.

The linear model which describes the relationship between two variables is given as

$\mu_{Y.X} = \beta_0 + \beta_1 X_i$

The Y variable is called the dependent variable or response variable (vertical axis).
$\mu_{Y.X} = \beta_0 + \beta_1 X_i$ is the population equation for a straight line. No error is needed in this equation because it describes the line itself. The term $\mu_{Y.X}$ is estimated at each value of $X_i$ with $\hat{Y}$.
$\mu_{Y.X}$ = the true population mean of Y at each value of X.
The X variable is called the independent variable or predictor variable (horizontal axis).
$\beta_0$ = the true value of the intercept (the value of Y when X = 0).
$\beta_1$ = the true value of the slope, the amount of change in Y for each unit change in X (i.e. if X changes by 1 unit, Y changes by $\beta_1$ units).

The two population parameters to be estimated, $\beta_0$ and $\beta_1$, are also referred to as the regression coefficients.
All variability in the model is assumed to be due to $Y_i$, so variance is measured vertically.
The variability is assumed to be normally distributed at each value of $X_i$.
The $X_i$ variable is assumed to have no variance since all variability is in $Y_i$ (this is a new assumption).

The values $\beta_0$ and $\beta_1$ ($b_0$ and $b_1$ for a sample) are called the regression coefficients.

The $\beta_0$ value is the value of Y at the point where the line crosses the Y axis. This value is called the intercept. If this value is zero the line crosses at the origin of the X and Y axes, and the linear equation reduces from $Y = b_0 + b_1 X$ to $Y = b_1 X$ and is said to have no intercept, even though the regression line does cross the Y axis. The units on $b_0$ are the same units as for Y.

The $\beta_1$ value is called the slope. It determines the incline or angle of the regression line. If the slope is 0, the line is horizontal. At this point the linear model reduces to $Y = b_0$, and the regression is said to have no slope. The slope gives the change in Y per unit of X. The units on the slope are the Y units per X unit.

The population equation for the line describes a perfect line with no variation. In practice there is always variation about the line. We include an additional term to represent this variation.

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for a population
$Y_i = b_0 + b_1 X_i + e_i$ for a sample

When we put this term in the model, we are describing individual points as their position on the line plus or minus some deviation. The Sum of Squares of deviations from the line will form the basis of a variance for the regression line.

When we leave the $e_i$ off the sample model we are describing a point on the regression line, predicted from the sample estimates. To indicate this we put a hat on the $Y_i$ value: $\hat{Y}_i = b_0 + b_1 X_i$.

Characteristics of a Regression Line

The line will pass through the point $(\bar{X}, \bar{Y})$ (also the point $(0, b_0)$).
The sum of squared deviations (measured vertically) of the points from the regression line will be a minimum.
Values on the line for any value of X can be described by the equation $\hat{Y}_i = b_0 + b_1 X_i$.

Common objectives in Regression: there are a number of possible objectives.

Determine if there is a relationship between Y and X. This would be determined by some hypothesis test. The strength of the relationship is, to some extent, reflected in the correlation or $R^2$ value.

Determine the value of the rate of change of Y relative to X. This is measured by the slope of the regression line. This objective would usually be accompanied by a test of the slope against 0 (or some other value) and/or a confidence interval on the slope.

Establish and employ a predictive equation for Y from X.

This objective would usually be preceded by Objective 1 above to show that a relationship exists. The predicted values would usually be given with their confidence interval, or the regression with its confidence band.

Assumptions in Regression Analysis

Independence
The best guarantee of this assumption is random sampling. This is a difficult assumption to check. This assumption is made for all tests we will see in this course.

Normality of the observations at each value of X (or the pooled deviations from the regression line)
This is relatively easy to test if the appropriate values are tested (e.g. residuals in ANOVA or Regression, not the raw Y values). This can be tested with the Shapiro-Wilk W statistic in PROC UNIVARIATE. This assumption is made for all tests we have seen this semester except the Chi square tests of Goodness of Fit and Independence.

Homogeneity of error (homogeneous variances or homoscedasticity)
This is easy to check for and to test in analysis of variance ($S^2$ on the mean, or tests like Bartlett's in ANOVA). In Regression the simplest way to check is by examining the residual plot. This assumption is made for ANOVA (for pooled variance) and Regression. Recall that in 2-sample t-tests the equality of the variances need not be assumed; it can be readily tested.

X measured without error
This must be assumed in ordinary least squares regressions, since all error is measured in a vertical direction and occurs in Y.

Assumptions: general assumptions
The Y variable is normally distributed at each value of X.
The variance is homogeneous (across X).
Observations are independent of each other and $e_i$ is independent of the rest of the model.

Special assumption for regression
Assume that all of the variation is attributable to the dependent variable (Y), and that the variable X is measured without error. Note that the deviations are measured vertically, not horizontally or perpendicular to the line.

Fitting the line

Fitting the line starts with a corrected SSDeviation; this is the SSDeviation of the observations from a horizontal line through the mean. The line will pass through the point $(\bar{X}, \bar{Y})$. The fitted line is pivoted on this point until it has a minimum SSDeviations.

How do we know the SSDeviations are a minimum? Actually, we solve the equation for $e_i$, and use calculus to determine the solution that has a minimum of the sum of squared deviations.

$Y_i = b_0 + b_1 X_i + e_i$
$e_i = Y_i - (b_0 + b_1 X_i) = Y_i - \hat{Y}_i$
$\sum e_i^2 = \sum [Y_i - (b_0 + b_1 X_i)]^2 = \sum (Y_i - \hat{Y}_i)^2$

The line has some desirable properties:
$E(b_0) = \beta_0$
$E(b_1) = \beta_1$
$E(\hat{Y}_X) = \mu_{Y.X}$
Therefore, the parameter estimates and predicted values are unbiased estimates.

Derivation of the formulas

You do not need to learn this derivation for this class! However, you should be aware of the process and its objectives.

Any observation from a sample can be written as $Y_i = b_0 + b_1 X_i + e_i$, where $e_i$ = a deviation of the observed point from the regression line.

The idea of regression is to minimize the deviation of the observations from the regression line; this is called a Least Squares Fit. The simple sum of the deviations is zero, $\sum e_i = 0$, so minimizing will require a square or an absolute value to remove the sign.

The sum of the squared deviations is $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - b_0 - b_1 X_i)^2$.

The objective is to select $b_0$ and $b_1$ such that $\sum e_i^2$ is a minimum, by using some techniques from calculus. We have previously defined the uncorrected sum of squares and corrected sum of squares of a variable $Y_i$.

The corrected sum of squares of Y

The uncorrected SS is $\sum Y_i^2$.
The correction factor is $(\sum Y_i)^2 / n$.
The corrected SS is $CSS = S_{YY} = \sum Y_i^2 - (\sum Y_i)^2 / n$.
We will call this corrected sum of squares $S_{YY}$ and the correction factor $C_{YY}$.

The corrected sum of squares of X

We could define the exact same series of calculations for $X_i$, and call it $S_{XX}$.

The corrected cross products of Y and X

We need a cross product for regression, and a corrected cross product. The cross product is $X_i Y_i$.
The uncorrected sum of cross products is $\sum X_i Y_i$.
The correction factor for the cross products is $C_{XY} = (\sum X_i)(\sum Y_i) / n$.
The corrected cross product is $CCP = S_{XY} = \sum X_i Y_i - (\sum X_i)(\sum Y_i) / n$.

The formulas for calculating the slope and intercept can be derived as follows. Take the partial derivative with respect to each of the parameter estimates, $b_0$ and $b_1$.

For $b_0$: $\frac{\partial (\sum e_i^2)}{\partial b_0} = 2 \sum (Y_i - b_0 - b_1 X_i)(-1)$, which is set equal to 0 and solved for $b_0$.

$\sum Y_i - n b_0 - b_1 \sum X_i = 0$ (this is the first "normal equation")

Likewise, for $b_1$ we obtain the partial derivative, set it equal to 0 and solve for $b_1$.

$\frac{\partial (\sum e_i^2)}{\partial b_1} = 2 \sum (Y_i - b_0 - b_1 X_i)(-X_i)$

$-\sum (X_i Y_i - b_0 X_i - b_1 X_i^2) = 0$

$\sum X_i Y_i - b_0 \sum X_i - b_1 \sum X_i^2 = 0$ (second "normal equation")

The normal equations can be written as

$n b_0 + b_1 \sum X_i = \sum Y_i$
$b_0 \sum X_i + b_1 \sum X_i^2 = \sum X_i Y_i$

At this point we have two equations and two unknowns, so we can solve for the unknown regression coefficient values $b_0$ and $b_1$.

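For $b_0$ the solution is $n b_0 = \sum Y_i - b_1 \sum X_i$ and

$b_0 = \frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n} = \bar{Y} - b_1 \bar{X}$

Note that estimating $b_0$ requires a prior estimate of $b_1$ and the means of the variables X and Y.

For $b_1$, given that $b_0 = \frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n}$ and $\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$, then

$\sum X_i Y_i = \left(\frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n}\right) \sum X_i + b_1 \sum X_i^2 = \frac{\sum X_i \sum Y_i}{n} - b_1 \frac{(\sum X_i)^2}{n} + b_1 \sum X_i^2$

$\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = b_1 \left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right)$

$b_1 = \frac{\sum X_i Y_i - \sum X_i \sum Y_i / n}{\sum X_i^2 - (\sum X_i)^2 / n} = \frac{S_{XY}}{S_{XX}}$

so $b_1$ is the corrected cross products over the corrected sum of squares of X.

The intermediate statistics needed to solve all elements of a SLR are $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum Y_i^2$, $\sum X_i Y_i$ and n. We have not seen $\sum Y_i^2$ used in the calculations yet, but we will need it later to calculate variance.

We want to fit the best possible line through some observed data points. We define this as the line that minimizes the vertically measured distances from the observed values to the fitted line. The line that achieves this is defined by the equations

$b_0 = \bar{Y} - b_1 \bar{X}$

$b_1 = \frac{\sum X_i Y_i - \sum X_i \sum Y_i / n}{\sum X_i^2 - (\sum X_i)^2 / n} = \frac{S_{XY}}{S_{XX}}$

These calculations provide us with two parameter estimates that we can then use to get the equation for the fitted line, $\hat{Y}_i = b_0 + b_1 X_i$.

Testing hypotheses about regressions

The total variation about a regression is exactly the same calculation as the total for Analysis of Variance.

SSTotal = SSDeviations from the mean = Uncorrected SSTotal - Correction factor

The simple regression analysis will produce two sources of variation.
SSRegression: the variation explained by the regression.
SSError: the remaining, unexplained variation about the regression line.

These sources of variation are expressed in an ANOVA source table.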
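The slope and intercept formulas can be sketched from the intermediate sums alone. The five (X, Y) pairs below are hypothetical values used only to exercise the formulas, and the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope from the intermediate sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    s_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of X
    s_xy = sum_xy - sum_x * sum_y / n     # corrected cross products

    b1 = s_xy / s_xx                      # slope
    b0 = sum_y / n - b1 * sum_x / n       # intercept: mean(Y) - b1 * mean(X)
    return b0, b1

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"fitted line: Y-hat = {b0:.4f} + {b1:.4f} * X")

# The fitted line passes through the point (mean of X, mean of Y).
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
print(b0 + b1 * mean_x, mean_y)
```

Note that the intercept is computed after the slope, since it requires the prior estimate of the slope and the two means.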

Source        d.f.
Regression    1        d.f. used to fit slope
Error         n - 2    error d.f.
Total         n - 1    d.f. lost in adjusting for ("correcting for") the mean

Note that one degree of freedom is lost from the total for the "correction for the mean", which actually fits the intercept. The single regression d.f. is for fitting the slope.

The correction fits a flat line through the mean.
The "regression" actually fits the slope.

The difference between these two models is that one has no slope, or a slope equal to zero ($b_1 = 0$), and the other has a slope fitted. Testing for a difference between these two cases is the common hypothesis test of interest in regression and it is expressed as $H_0: \beta_1 = 0$.

The results of a regression are expressed in an ANOVA table. The regression is tested with an F test, formed by dividing the MSRegression by the MSError. This is a one-tailed F test, as it was with ANOVA, and it has 1 and n - 2 d.f. It tests the null hypothesis $H_0: \beta_1 = 0$ versus the alternative $H_1: \beta_1 \ne 0$.

Source        d.f.     SS              MS              F
Regression    1        SSRegression    MSRegression    MSRegression / MSError
Error         n - 2    SSError         MSError
Total         n - 1    SSTotal

The $R^2$ statistic

This is a popular statistic for interpretation. The concept is that we want to know what proportion of the corrected total sum of squares is explained by the regression line.

Source        d.f.     SS
Regression    1        SSReg
Error         n - 2    SSError
Total         n - 1    SSTotal

In the process of fitting the regression, the SSTotal is divided into two parts, the sum of squares explained by the regression (SSRegression) and the remaining unexplained variation (SSError). Since these sum to the SSTotal, we can calculate what fraction of the total was fitted or explained in the regression. This is often expressed as a percentage of the total sum of squares explained by the model, and is given by

$R^2$ = SSRegression / SSTotal

This is often multiplied by 100% and expressed as a percent. We might state that the regression explains 75% of the total variation. This is a very popular statistic, but it can be very misleading.

For some studies an $R^2$ value of 25% or 35% can be pretty good, for example if you are trying to relate the abundance of an organism to environmental variables. On the other hand, if you are doing morphometric relationships, like relating a crab's width to its length, an $R^2$ value of less than 90% is pretty bad.

A note on regression models applied to transformed variables

Studies of morphometric relationships, including relationships of lengths to weights, should be done with logarithmic values of both X and Y. The log(Y) on log(X) model, called a power model, is a very flexible model used for many purposes. Many other models involving logs, powers and inverses are possible. These will fit curves of one shape or another.

When using transformed variables in regression, all tests and confidence intervals are placed on the transformed values. Otherwise, they are used like any other simple linear regression.

Numerical Example

Some freshwater-fish ectoparasites accumulate on the fish as it grows. Once the parasite is on the fish, it does not leave. The parasite completes its life cycle after the fish is consumed by a bird and finds its way again into the water. Since the parasite attaches and does not leave, older fish should accumulate more parasites. We want to test this hypothesis.

[Table: raw data with squares and crossproducts; columns Observation, Age, Parasites, Age², Parasites², Age*Parasites]

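Summary data

Intermediate Calculations
$\sum X$ = 83          $\sum Y$ = 228
$\sum X^2$ = 521       $\sum Y^2$ = 3628
Mean of X = $\bar{X}$ = 5.1875     Mean of Y = $\bar{Y}$ = 14.25
$\sum XY$ = 1348       n = 16

Correction factors and Corrected values (Sums of squares and crossproducts)
CF for X: $C_{xx}$ = 430.5625      Corrected SS X: $S_{xx}$ = 90.4375
CF for Y: $C_{yy}$ = 3249          Corrected SS Y: $S_{yy}$ = 379
CF for XY: $C_{xy}$ = 1182.75      Corrected CP XY: $S_{xy}$ = 165.25

ANOVA Table (values needed):
SSTotal = 379
SSRegression = $165.25^2 / 90.4375$ = 301.9496
SSError = 379 - 301.9496 = 77.0504

Source        df    SS          MS          F
Regression    1     301.9496    301.9496    54.864
Error         14    77.0504     5.5036
Total         15    379

Tabular $F_{0.05;\ 1,\ 14}$ = 4.60
Tabular $F_{0.01;\ 1,\ 14}$ = 8.86

Model Parameter Estimates
Slope = $b_1 = S_{xy} / S_{xx}$ = 165.25 / 90.4375 = 1.8272
Intercept = $b_0 = \bar{Y} - b_1 \bar{X}$ = 14.25 - 1.8272 * 5.1875 = 4.7713

Regression Equation: $Y_i = b_0 + b_1 X_i + e_i$ = 4.7713 + 1.8272 * $X_i$ + $e_i$
Regression Line: $\hat{Y}_i = b_0 + b_1 X_i$ = 4.7713 + 1.8272 * $X_i$

Standard error of $b_1$: $S_{b_1} = \sqrt{MSE / S_{xx}}$, so $S_{b_1} = \sqrt{5.5036 / 90.4375}$ = 0.2467

Confidence interval on $b_1$, where $b_1$ = 1.8272 and $t_{(0.05/2,\ 14\ df)}$ = 2.145
P(1.8272 - 2.145 * 0.2467 $\le \beta_1 \le$ 1.8272 + 2.145 * 0.2467) = 0.95
P(1.2981 $\le \beta_1 \le$ 2.3564) = 0.95

Testing $b_1$ against a specified value: e.g. $H_0: \beta_1 = 5$ versus $H_1: \beta_1 \ne 5$, where $b_1$ = 1.8272, $S_{b_1}$ = 0.2467 and $t_{(0.05/2,\ 14\ df)}$ = 2.145
t = (1.8272 - 5) / 0.2467 = -12.861

Standard error of the regression line (i.e. $\hat{Y}$):
$S_{\hat{Y}|X} = \sqrt{MSE \left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{S_{xx}}\right]}$

Standard error of the individual points (i.e. $Y_i$): This is a linear combination of $\hat{Y}$ and $e_i$, so the variance is the sum of the variances of these two, where the variance of $e_i$ is MSE. The standard error is then
$S_{Y|X} = \sqrt{MSE \left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{S_{xx}}\right] + MSE} = \sqrt{MSE \left[1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{S_{xx}}\right]}$

Standard error of $b_0$ is the same as the standard error of the regression line where $X_i$ = 0:
Square Root of [5.5036 * (0.0625 + 26.9102 / 90.4375)] = 1.4077

Confidence interval on $b_0$, where $b_0$ = 4.7713 and $t_{(0.05/2,\ 14\ df)}$ = 2.145
P(4.7713 - 2.145 * 1.4077 $\le \beta_0 \le$ 4.7713 + 2.145 * 1.4077) = 0.95
P(1.7517 $\le \beta_0 \le$ 7.7908) = 0.95

Estimate the standard error of an individual observation for number of parasites for a ten-year-old fish:
$\hat{Y} = b_0 + b_1 X$ = 4.7713 + 1.8272 * 10 = 23.0435
Square Root of [5.5036 * (1 + 0.0625 + (10 - 5.1875)² / 90.4375)] = Square Root of [5.5036 * (1 + 0.0625 + 23.1602 / 90.4375)] = 2.6939

Confidence interval on $Y_{X=10}$
P(23.0435 - 2.145 * 2.6939 $\le Y_{X=10} \le$ 23.0435 + 2.145 * 2.6939) = 0.95
P(17.2652 $\le Y_{X=10} \le$ 28.8219) = 0.95

Calculate the coefficient of determination and correlation
$R^2$ = 0.7967 or 79.67%
r = 0.8926

See SAS output
Overview of results and findings from the SAS program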
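The chain of calculations in the numerical example can be retraced from the summary statistics. This is a sketch under assumptions: the sums (n = 16, ΣX = 83, ΣY = 228, ΣX² = 521, ΣY² = 3628, ΣXY = 1348) are taken as read from the example's summary data, the tabular t of 2.145 is the value quoted for 14 d.f., and the variable names are illustrative rather than anything from the SAS program.

```python
import math

# Summary statistics as read from the worked example (assumed, not re-derived
# from raw data, which is not available here).
n = 16
sum_x, sum_y = 83.0, 228.0
sum_x2, sum_y2, sum_xy = 521.0, 3628.0, 1348.0
t_crit = 2.145  # tabular t, alpha = 0.05 two-tailed, 14 d.f., as quoted in the text

mean_x, mean_y = sum_x / n, sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n          # corrected SS of X
s_yy = sum_y2 - sum_y ** 2 / n          # corrected SS of Y (SSTotal)
s_xy = sum_xy - sum_x * sum_y / n       # corrected cross products

b1 = s_xy / s_xx                        # slope
b0 = mean_y - b1 * mean_x               # intercept

ss_reg = s_xy ** 2 / s_xx
ss_err = s_yy - ss_reg
mse = ss_err / (n - 2)
f_stat = ss_reg / mse
r_squared = ss_reg / s_yy

se_b1 = math.sqrt(mse / s_xx)
se_b0 = math.sqrt(mse * (1 / n + mean_x ** 2 / s_xx))

def se_individual(x):
    """Standard error of an individual observation at X = x."""
    return math.sqrt(mse * (1 + 1 / n + (x - mean_x) ** 2 / s_xx))

ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)

x_new = 10
y_hat = b0 + b1 * x_new
interval_new = (y_hat - t_crit * se_individual(x_new),
                y_hat + t_crit * se_individual(x_new))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, MSE = {mse:.4f}, F = {f_stat:.3f}")
print(f"R^2 = {r_squared:.4f}, r = {math.sqrt(r_squared):.4f}")
print("95% CI on slope:", ci_b1)
print("95% CI on intercept:", ci_b0)
print(f"Y-hat at X = {x_new}: {y_hat:.4f}, interval: {interval_new}")
```

The F value can be compared with the tabular values quoted in the example, and the square root of F equals the t statistic for testing the slope against zero.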
