Lecture 4 Topic 3: General linear models (GLMs), the fundamentals of the analysis of variance (ANOVA), and completely randomized designs (CRDs)

The general linear model

One population: An observation is explained as a mean plus a random deviation $\varepsilon_i$ (error):

$Y_i = \mu + \varepsilon_i$

The $\varepsilon_i$'s are assumed to be from a population of uncorrelated $\varepsilon$'s with mean zero. Independence among the $\varepsilon$'s is assured by random sampling.

Two populations: Each observation is explained as a grand mean plus an effect of its group (i.e. treatment) plus a random deviation $\varepsilon_{ij}$ (error):

$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$

An equivalent expression of this model: $\mu + \tau_1 = \mu_1$ and $\mu + \tau_2 = \mu_2$, with $\tau_1 + \tau_2 = 0$. In terms of sample estimates:

$Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$

Example: Imagine an experiment with 10 tomato plants (yum!), each in its own pot. 5 of the pots receive fertilizer and the other 5 do not. The total yield of each plant (in kg) is recorded and the data are presented below (a "?" marks a digit or value that is illegible in the source):

Plant               Fertilized    Not fertilized
1                   ?.05          ?
2                   ?.65          0.95
3                   1.76          ?
4                   ?.08          ?.33
5                   ?.84          ?
Treatment mean      1.876         1.192
General mean               1.534
Treatment effect    0.342         -0.342

Under the assumptions of a general linear model, the yield of each tomato plant in the above experiment has the following general form (here $i$ indexes the plant and $j$ the treatment):

$y_{ij} = \mu + \tau_j + \varepsilon_{ij}$

For Plant 3 receiving fertilizer, the equation looks like this:

$y_{3,Fert} = \mu + \tau_{Fert} + \varepsilon_{3,Fert}$

$y_{3,Fert} = 1.534 + 0.342 - 0.116 = 1.76$

More than two populations (the Model I or fixed model ANOVA)

1. Treatment effects are additive and fixed by the researcher. Because $\sum \tau_i = 0$, the hypotheses are $H_0: \tau_1 = \dots = \tau_t = 0$ versus $H_1$: some $\tau_i \neq 0$.
2. Errors are random, independent, and normally distributed with a common variance about a zero mean.
3. In the case of a false $H_0$ (i.e. some $\tau_i \neq 0$), there will be an additional component of variation due to treatment effects equal to:

$r \sum \tau_i^2 / (t - 1)$

"Significant relative to what?" Significant relative to error. If the effect due to treatment (i.e. signal) is found to be significantly larger than the fluctuations among observations due to error (i.e. noise), the treatment effect is said to be real and significant.

The F distribution

From a normally distributed population (or from two populations with equal variance $\sigma^2$):

1. Sample $n_1$ items and calculate their variance $s_1^2$.
2. Sample $n_2$ items and calculate their variance $s_2^2$.
3. Construct a ratio of these two sample variances, $s_1^2 / s_2^2$.

This ratio will be close to 1, and its expected distribution is called the F-distribution, characterized by two values for df ($df_1 = n_1 - 1$, $df_2 = n_2 - 1$).

[Figure 1: Three example F-distributions]

Values in an F table (Table A.6) represent the area under the curve to the right of the given F-value, with $df_1$ and $df_2$.

$(n - 1) s^2 / \sigma^2$ is distributed according to $\chi^2_{n-1}$.

$s_1^2 / s_2^2$ is distributed according to $F_{n_1 - 1,\, n_2 - 1}$.

$F_{(1,9),\, \alpha = 0.05} = (t_{9,\, \alpha/2})^2$, that is, $5.12 = 2.262^2$. This is analogous to the relationship between $\chi^2$ and $Z$.

Example: Pull 10 observations at random from each of two populations. Now test $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$ (a two-tailed test):

$F_{\alpha/2 = 0.025,\, [df_1 = 9,\, df_2 = 9]} = 4.03$

Interpretation: The ratio $s_1^2 / s_2^2$, taken from samples of 10 individuals from normally distributed populations with equal variance, is expected to be larger than 4.03 ($F_{\alpha/2 = 0.025,\, [9,9]}$) or lower than about 0.25 ($F_{1 - \alpha/2 = 0.975,\, [9,9]} = 1/4.03$) by chance only 5% of the time.

Testing the hypothesis of equality of two means

The ratio between two estimates of $\sigma^2$ can also be used to test differences between means:

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$

How can we use variances to test the difference between means? By being creative in how we obtain estimates of $\sigma^2$:

$F = \dfrac{\text{estimate of } \sigma^2 \text{ from sample means}}{\text{estimate of } \sigma^2 \text{ from individuals}}$

The denominator is an estimate of $\sigma^2$ provided by the individuals within a sample. If there are multiple samples, it is a weighted average of those sample variances.

The numerator is an estimate of $\sigma^2$ provided by the means among samples. Recall that $s_{\bar{Y}}^2 = s^2 / n$, so $n\, s_{\bar{Y}}^2 = s^2$. Therefore:

$F = \dfrac{s_{among}^2}{s_{within}^2} = \dfrac{n\, s_{\bar{Y}}^2}{s^2}$

The fundamental premise underlying ANOVA: When two populations have different means (but the same variance), the estimate of $\sigma^2$ based on sample means will include a contribution attributable to the difference among population means, and F will be higher than expected by chance.
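The two-tailed variance-ratio example above (two samples of 10, critical values 4.03 and 1/4.03) can be checked by simulation. The following is a minimal sketch rather than part of the original notes: the function names, the seed, the number of trials, and the choice of a standard normal population are all illustrative assumptions. It draws both samples from the same population, so $H_0$ is true, and counts how often the ratio of sample variances lands outside the two critical values.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def fraction_outside(n=10, upper=4.03, trials=20000, seed=1):
    """Share of simulated variance ratios above `upper` or below 1/upper."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lower = 1.0 / upper
    outside = 0
    for _ in range(trials):
        # Both samples come from the same N(0, 1) population, so H0 is true.
        s1 = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        s2 = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
        ratio = s1 / s2
        if ratio > upper or ratio < lower:
            outside += 1
    return outside / trials


fraction = fraction_outside()
print(f"Fraction of ratios outside the critical values: {fraction:.3f}")
```

With enough trials the printed fraction should settle near 0.05, matching the "by chance only 5% of the time" interpretation.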

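Example: Yields (100 lbs/acre) of wheat varieties 1 and 2 from plots to which the varieties were randomly assigned:

Variety    Replications          $Y_{i\cdot}$    $\bar{Y}_{i\cdot}$              $s_i^2$
1          19  14  15  17  20    85              $\bar{Y}_{1\cdot} = 17$         6.5
2          23  19  19  18  21    100             $\bar{Y}_{2\cdot} = 20$         4.0
                                 $Y_{\cdot\cdot} = 185$    $\bar{Y}_{\cdot\cdot} = 18.5$

Treatments $t = 2$; replications $n = 5$.

Begin by assuming that the two populations have the same (unknown) variance $\sigma^2$.

1. Estimate the average variance within samples (the experimental error):

$s_1^2 = \dfrac{\sum_j (Y_{1j} - \bar{Y}_{1\cdot})^2}{n_1 - 1}, \qquad s_2^2 = \dfrac{\sum_j (Y_{2j} - \bar{Y}_{2\cdot})^2}{n_2 - 1}$

$s_{pooled}^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)} = \dfrac{4 \times 6.5 + 4 \times 4.0}{4 + 4} = 5.25 = s_{within}^2$

2. Estimate the variance between (or among) samples:

$s_{\bar{Y}}^2 = \dfrac{\sum_{i=1}^{t} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}{t - 1} = \dfrac{(17 - 18.5)^2 + (20 - 18.5)^2}{2 - 1} = 4.5$

$n\, s_{\bar{Y}}^2 = 5 \times 4.5 = 22.5 = s_{between}^2$

To test $H_0$, we form a ratio of these two estimates:

$F = s_b^2 / s_w^2 = 22.5 / 5.25 = 4.29$

Under our assumptions (normality, equal variance), this ratio is distributed according to an $F_{(t-1,\, t(n-1))} = F_{(1,8)}$ distribution. From Table A.6, we find $F_{0.05,(1,8)} = 5.32$.

$F_{calc} = 4.29 < 5.32 = F_{crit}$, so we fail to reject $H_0$ at $\alpha = 0.05$. An F value of 4.29 or larger happens just by chance about 7% of the time for these degrees of freedom.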
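The arithmetic of the wheat example can be reproduced in a few lines. This is a sketch for checking the numbers, not code from the original notes; the variable names are illustrative, and the yields are those of the two varieties (19, 14, 15, 17, 20 and 23, 19, 19, 18, 21).

```python
from statistics import mean, variance

# Yields (100 lbs/acre) from the wheat example: t = 2 varieties, n = 5 replications.
yields = {
    "variety 1": [19, 14, 15, 17, 20],
    "variety 2": [23, 19, 19, 18, 21],
}
n = 5

# Within-sample estimate of sigma^2: the pooled (df-weighted) sample variance.
df_within = sum(len(y) - 1 for y in yields.values())
s2_within = sum((len(y) - 1) * variance(y) for y in yields.values()) / df_within

# Between-sample estimate of sigma^2: n times the variance of the sample means.
sample_means = [mean(y) for y in yields.values()]
s2_between = n * variance(sample_means)

f_calc = s2_between / s2_within
print(f"s2_within = {s2_within:.2f}, s2_between = {s2_between:.2f}, F = {f_calc:.2f}")
```

The two estimates come out as 5.25 and 22.5, giving F = 4.29; the comparison against the tabulated critical value is still done by hand here, since the standard library has no F quantile function.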

Modern statistics began in the mind of Ronald Fisher, the first to recognize that variation is not just noise drowning signal, at best a nuisance to be ignored. Variance itself is a valid object of study, a fingerprint that provides great insight into the mechanisms of natural phenomena. In his words:

"The populations which are the object of statistical study always display variation in one or more respects. To speak of statistics as the study of variation also serves to emphasize the contrast between the aims of modern statisticians and those of their predecessors. For until comparatively recent times, the vast majority of workers in this field appear to have had no other aim than to ascertain aggregate, or average, values. The variation itself was not an object of study, but was recognized rather as a troublesome circumstance which detracted from the value of the average. ... from the modern point of view, the study of the causes of variation of any variable phenomenon, from the yield of wheat to the intellect of [people], should be begun by the examination and measurement of the variation which presents itself."

R.A. Fisher, Statistical Methods for Research Workers (1925)

ANOVA: Single factor designs

The Completely Randomized Design (CRD)

CRD is the basic ANOVA design.
A single factor is varied to form the different treatments.
These treatments are applied randomly to experimental units.
There are a total of $n = rt$ independent experimental units in the experiment.

$H_0: \mu_1 = \mu_2 = \dots = \mu_t$ versus $H_1$: Not all $\mu_i$ are equal.

The results of the analysis are usually summarized in an ANOVA table:

Source       df                    Definition of SS                                              SS           MS             F
Total        $n - 1$               $\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$                TSS
Treatment    $t - 1$               $r \sum_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$        SST          SST/($t - 1$)  MST/MSE
Error        $t(r - 1) = n - t$    $\sum_{i,j} (Y_{ij} - \bar{Y}_{i\cdot})^2$                    TSS - SST    SSE/($n - t$)

The mean square for error (MSE): The average dispersion of the observations around their respective group means. It is a valid estimate of a common $\sigma^2$, the experimental error, if the assumption of equal variances is true.

The mean square for treatment (MST): An independent estimate of $\sigma^2$, when the null hypothesis is true.

The F test: If there are differences among treatment means, there will be an additional source of variation in the experiment due to treatment effects equal to $r \sum \tau_i^2 / (t - 1)$.

$F = MST / MSE$

$\dfrac{\text{Expected } MST}{\text{Expected } MSE} = \dfrac{\sigma^2 + r \sum \tau_i^2 / (t - 1)}{\sigma^2}$

The F-test is sensitive to the presence of the added component of variation due to treatment effects. In other words, ANOVA permits us to test whether there are any nonzero treatment effects.

Example: Inoculation of clover with Rhizobium strains [ST&D Table 7.1]

Treatment    3DOK1    3DOK5    3DOK4    3DOK7    3DOK13    Composite
Rep 1        19.4     17.7     17.0     20.7     14.3      17.3
Rep 2        32.6     24.8     19.4     21.0     14.4      19.4
Rep 3        27.0     27.9     9.1      20.5     11.8      19.1
Rep 4        32.1     25.2     11.9     18.8     11.6      16.9
Rep 5        33.0     24.3     15.8     18.6     14.2      20.8
Mean         28.8     24.0     14.6     19.9     13.3      18.7
Variance     33.64    14.27    16.94    1.28     2.04      2.56

$t = 6$, $r = 5$, overall mean = 19.88

The ANOVA table for this experiment:

Source       df    SS         MS        F
Treatment    5     847.05     169.41    14.37**
Error        24    282.93     11.79
Total        29    1129.98

1. The mean square error (MSE = 11.79) is just the pooled variance or the average of variances within each treatment (i.e. $MSE = \sum s_i^2 / t$).
2. The F value (14.37) indicates that the variation among treatments is over 14 times larger than the mean variation within treatments.

$14.37 > F_{crit} = F_{(5,24),\, 0.05} = 2.62$, so we reject $H_0$.
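The sums of squares in the clover ANOVA table follow directly from the definitions of SST and SSE for a CRD. The sketch below recomputes them from the raw data; it is an illustration, not the method used in the original notes, and the function name `one_way_anova` and the dictionary layout are assumptions made for this example.

```python
from statistics import mean

# Response values from the clover inoculation example (5 replications per strain).
clover = {
    "3DOK1": [19.4, 32.6, 27.0, 32.1, 33.0],
    "3DOK5": [17.7, 24.8, 27.9, 25.2, 24.3],
    "3DOK4": [17.0, 19.4, 9.1, 11.9, 15.8],
    "3DOK7": [20.7, 21.0, 20.5, 18.8, 18.6],
    "3DOK13": [14.3, 14.4, 11.8, 11.6, 14.2],
    "Composite": [17.3, 19.4, 19.1, 16.9, 20.8],
}


def one_way_anova(groups):
    """Return df, SS, MS and F for a single-factor CRD."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)
    t, n = len(groups), len(values)
    sst = 0.0  # treatment SS: group size times squared deviation of group mean
    sse = 0.0  # error SS: squared deviations of observations from their group mean
    for ys in groups.values():
        group_mean = mean(ys)
        sst += len(ys) * (group_mean - grand_mean) ** 2
        sse += sum((y - group_mean) ** 2 for y in ys)
    mst = sst / (t - 1)
    mse = sse / (n - t)
    return {
        "df_trt": t - 1, "df_err": n - t,
        "SST": sst, "SSE": sse, "TSS": sst + sse,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }


table = one_way_anova(clover)
for key, value in table.items():
    print(f"{key}: {value:.2f}")
```

Because each group contributes its own size to SST, the same function also handles unequal replication, although the example data are balanced.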

Expected mean squares and F tests

EMS: Algebraic expressions which specify the underlying model parameters estimated by the calculated mean squares and which are used to determine the appropriate error term for F tests.

EMS table for this one-way (CRD) classification experiment, featuring $t$ treatments and $r$ replications:

Source    df            MS     EMS
Trtmt     $t - 1$       MST    $\sigma_\varepsilon^2 + r \sum \tau_i^2 / (t - 1)$
Error     $t(r - 1)$    MSE    $\sigma_\varepsilon^2$

The appropriate test statistic (F) is a ratio of mean squares that is chosen such that the expected value of the numerator differs from the expected value of the denominator only by the specific factor being tested.

Testing the assumptions associated with ANOVA

1. Independence of errors: Guaranteed by the random allocation of experimental units.
2. Normal distribution of errors: Shapiro-Wilk test.
3. Homogeneity of variances: Several methods are available to test the assumption that variance is the same within each of the groups defined by the independent factor.

Levene's Test: An ANOVA of the absolute values of the residuals. In the model $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, the residuals ($\varepsilon_{ij}$) are the deviations from the treatment means.

Original data                      Residuals
Treatment    A    B    C           Treatment    A     B     C
Rep 1        8    7    6           Rep 1        1     2     2
Rep 2        9    5    3           Rep 2        2     0     -1
Rep 3        5    3    5           Rep 3        -2    -2    1
Rep 4        6    5    2           Rep 4        -1    0     -2
Average      7    5    4           Average      0     0     0

Absolute values of residuals
Treatment    A      B    C
Rep 1        1      2    2
Rep 2        2      0    1
Rep 3        2      2    1
Rep 4        1      0    2
Average      1.5    1    1.5
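Levene's test as described here (an ANOVA on the absolute residuals about the treatment means) can be sketched for the three-treatment example with data A = 8, 9, 5, 6; B = 7, 5, 3, 5; C = 6, 3, 5, 2. This is a minimal illustration under that definition; the helper name `f_ratio` is an assumption, and the comparison with a tabulated critical F value is left to the reader.

```python
from statistics import mean

data = {"A": [8, 9, 5, 6], "B": [7, 5, 3, 5], "C": [6, 3, 5, 2]}

# Residuals are deviations from each treatment mean; Levene's test uses |residual|.
abs_residuals = {
    trt: [abs(y - mean(ys)) for y in ys] for trt, ys in data.items()
}


def f_ratio(groups):
    """F = MST / MSE for a single-factor CRD."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)
    t, n = len(groups), len(values)
    sst = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    sse = 0.0
    for ys in groups.values():
        group_mean = mean(ys)
        sse += sum((y - group_mean) ** 2 for y in ys)
    return (sst / (t - 1)) / (sse / (n - t))


abs_means = {trt: mean(vals) for trt, vals in abs_residuals.items()}
levene_f = f_ratio(abs_residuals)
print("Mean absolute residuals:", abs_means)
print(f"Levene F on (2, 9) df: {levene_f:.2f}")
```

The mean absolute residuals reproduce the averages in the table (1.5, 1, 1.5), and the resulting F ratio is below 1, which gives no indication of unequal variances in this small example.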

Advantages of the CRD

1. Simple design.
2. Can easily accommodate unequal replications per treatment.
3. Loss of information due to missing data is small.
4. Maximum d.f. for estimating the experimental error.
5. Can accommodate unequal variances, using a Welch's variance-weighted ANOVA.

The disadvantage

The experimental error includes all the variation in the system except for the component due exclusively to the treatments.

Power

The power of a test is the probability of detecting a nonzero treatment effect. To calculate the power of the F test in an ANOVA, use Pearson and Hartley's power function charts (1951, Biometrika 38:112-130). To begin, calculate $\phi$:

$\phi = \sqrt{\dfrac{r}{MSE} \cdot \dfrac{\sum \tau_i^2}{t}}$

Things to notice:

1. More replications lead to higher $\phi$ (and higher power).
2. Less error in the model (MSE) leads to higher $\phi$.
3. Larger treatment effects ($\tau_i$, our "detection distance") lead to higher $\phi$.

Example: Suppose an experiment has $t = 6$ treatments with $r = 2$ replications each. Given the MSE and the required $\alpha = 5\%$, you calculate $\phi = 1.75$.

$\phi = \sqrt{\dfrac{r}{MSE} \cdot \dfrac{\sum \tau_i^2}{t}}$

To find the power associated with this $\phi$, use the chart for $\nu_1 = t - 1 = 5$ and the set of curves corresponding to $\alpha = 5\%$. Select the curve for $\nu_2 = t(r - 1) = 6$. The height of this curve corresponding to the abscissa of $\phi = 1.75$ is the power of the test. In this case, the power is slightly greater than 0.55.

Sample size

To calculate the number of replications for a given $\alpha$ and desired power:

1) Specify the constants.
2) Start with an arbitrary $r$ to compute $\phi$.
3) Use the appropriate chart to find the power.
4) Iterate the process until a minimum $r$ value is found which satisfies the required power for a given $\alpha$ level.

We can simplify the general power formula if we assume all $\tau_i$ are zero except the two extreme treatment effects (let's call them $\tau_K$ and $\tau_L$, so that $d = |\mu_K - \mu_L|$). Since the effects sum to zero, $\tau_K = d/2$ and $\tau_L = -d/2$, so $\sum \tau_i^2 = d^2/2$ and:

$\phi = \sqrt{\dfrac{d^2 \cdot r}{2t \cdot MSE}}$

Example: Suppose that 6 treatments will be involved in a study and the anticipated difference between the extreme means is 15 units. What is the required sample size so that this difference will be detected at $\alpha = 1\%$ and power = 90%, knowing that $\sigma^2 = 12$? (Note: $t = 6$, $\alpha = 0.01$, $\beta = 0.10$, $d = 15$, and $MSE = 12$.)

r    df = t(r - 1)      $\phi$    $(1 - \beta)$ for $\alpha = 1\%$
2    6(2 - 1) = 6       1.77      0.22
3    6(3 - 1) = 12      2.17      0.71
4    6(4 - 1) = 18      2.50      0.93

Thus 4 replications are required for each treatment to satisfy the required conditions.
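The $\phi$ column of the sample-size example can be generated with the simplified formula $\phi = \sqrt{d^2 r / (2t \cdot MSE)}$, using $d = 15$, $t = 6$, and $MSE = 12$. The sketch below only automates the $\phi$ and error-df calculation for each candidate $r$; the function name is an illustrative assumption, and the power for each row still has to be read from the Pearson and Hartley charts.

```python
import math


def phi_extreme_difference(d, r, t, mse):
    """phi when only the two extreme treatment means differ, by d units."""
    return math.sqrt(d ** 2 * r / (2 * t * mse))


d, t, mse = 15, 6, 12  # constants from the example

rows = []
for r in (2, 3, 4):
    df_error = t * (r - 1)  # error degrees of freedom, nu_2 on the charts
    rows.append((r, df_error, phi_extreme_difference(d, r, t, mse)))

for r, df_error, phi in rows:
    print(f"r = {r}, error df = {df_error}, phi = {phi:.2f}")
```

Each pass through the loop corresponds to one iteration of steps 2 and 3 of the procedure: pick $r$, compute $\phi$, then look up the power for $\nu_1 = t - 1$ and $\nu_2 = t(r - 1)$.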