1 One-way Analysis of Variance
Suppose that a random sample of $q$ individuals receives treatment $T_i$, $i = 1, 2, \ldots, p$. Let $Y_{ij}$ be the response from the $j$th individual to be treated with the $i$th treatment $T_i$. A mathematical model that relates the expected response to the effectiveness of the treatment administered is

$$Y_{ij} = t_i + \epsilon_{(i)j}, \qquad j = 1, 2, \ldots, q; \; i = 1, 2, \ldots, p,$$

where

$t_i$ = mean influence of the $i$th treatment, $i = 1, 2, \ldots, p$
$\epsilon_{(i)j}$ = random error

(this is in fact a linear model). We make the further assumption that the $\epsilon_{(i)j}$'s are independent $N(0, \sigma^2)$ random variables.

The average response $\bar{Y}_{i+} = \frac{1}{q}\sum_{j=1}^{q} Y_{ij}$ from the individuals which received the $i$th treatment is a good estimator of the mean influence $t_i$ of the $i$th treatment $T_i$ (it is in fact the least squares estimator of $t_i$). This follows from the fact that $\bar{Y}_{i+} \sim N(t_i, \sigma^2/q)$. Consequently the variation between the $p$ $\bar{Y}_{i+}$'s, i.e.

$$\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2, \quad \text{where } \bar{Y}_{++} = \frac{1}{p}\sum_{i=1}^{p} \bar{Y}_{i+} = \frac{1}{N}\sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij}, \; N = pq,$$

a variation with $(p-1)$ degrees of freedom, will be entirely due to randomness if the treatments are identical, i.e. if $t_1 = t_2 = \cdots = t_p$. In particular $\frac{1}{p-1}\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2$ estimates $\sigma^2/q$ if the treatments are identical, i.e. if $t_1 = t_2 = \cdots = t_p$, or more specifically

$$\frac{SS_{Tr}}{p-1} = \frac{q}{p-1}\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2 \quad \text{estimates } \sigma^2 \text{ if } t_1 = t_2 = \cdots = t_p.$$

However, if the treatments are not identical, $\frac{SS_{Tr}}{p-1}$ must measure (random variability per
source + treatment variability per source). Thus

$$\frac{SS_{Tr}}{p-1} \text{ estimates } \sigma^2 \quad \text{if } t_1 = t_2 = \cdots = t_p \qquad (1)$$

$$\frac{SS_{Tr}}{p-1} \text{ estimates } \sigma^2 + \frac{q}{p-1}\sum_{i=1}^{p}(t_i - \bar{t})^2 \quad \text{if the } t_i \text{ are not all equal} \qquad (2)$$

where $\bar{t} = \frac{1}{p}\sum_{i=1}^{p} t_i$.

If, for a fixed $i$, we now consider the responses $Y_{i1}, Y_{i2}, \ldots, Y_{iq}$ from the individuals which received the $i$th treatment only, any differences or variability between them will be exclusively due to error (randomness). Thus

$$\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2, \qquad i = 1, 2, \ldots, p,$$

is part of the pure random variation, with $(q-1)$ degrees of freedom for each $i = 1, 2, \ldots, p$, irrespective of whether the treatments are the same or not. Thus

$$SS_R = \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2$$

is variation due to the random error, unaffected by the presence of treatment differences, and has $p(q-1) = N - p$ degrees of freedom. Thus

$$\frac{SS_R}{N-p} \text{ estimates } \sigma^2. \qquad (3)$$

$SS_R$ is in fact the residual sum of squares of the linear model we have assumed, and therefore $\frac{SS_R}{N-p}$ is the least squares estimator of $\sigma^2$. From (1)-(3) we see that

$$E\left(\frac{SS_{Tr}}{p-1}\right) = \begin{cases} \sigma^2 & \text{if } t_1 = t_2 = \cdots = t_p \\ \sigma^2 + \frac{q}{p-1}\sum_{i=1}^{p}(t_i - \bar{t})^2 & \text{otherwise} \end{cases} \qquad (4)$$

and

$$E\left(\frac{SS_R}{N-p}\right) = \sigma^2. \qquad (5)$$

From (4) and (5) we see that if one wishes to test the hypothesis of no differences between the treatments, i.e. $H_0: t_1 = t_2 = \cdots = t_p$ ($= t$ say), against the alternative that at least one of the treatments differs from the rest, i.e. $H_1:$ not all the $t_i$ are equal, the appropriate test procedure at $\alpha$-level of significance is
if $F = \dfrac{SS_{Tr}/(p-1)}{SS_R/(N-p)} \le k$, accept $H_0$; if $F > k$, accept $H_1$,

where $k$ is determined by the level of significance and will of course depend on the distribution of the statistic $F$ under the null hypothesis. From our knowledge of basic statistics we know that $SS_{Tr}/\sigma^2 \sim \chi^2_{p-1}$ when $H_0$ is true, and $SS_R/\sigma^2 \sim \chi^2_{N-p}$. (Recall from your first elementary statistics course that when sampling from the Normal distribution the sample variance is an unbiased estimator of the population variance, and that

$$(\text{sample size} - 1) \cdot \frac{\text{sample variance}}{\text{population variance}}$$

is chi-squared distributed with degrees of freedom equal to (sample size $-$ 1).) Since, further, $SS_{Tr}$ is a function of the l.s.e.'s of the $t_i$'s, which in turn are independent of $SS_R$, the model's residual sum of squares (see your notes on Linear Models), it follows that $F \sim F_{p-1,N-p}$ when $H_0$ is true. Consequently $k = F_{p-1,N-p;\alpha}$ and the appropriate test procedure is:

if $F = \dfrac{SS_{Tr}/(p-1)}{SS_R/(N-p)} \le F_{p-1,N-p;\alpha}$, do not reject $H_0$; if $F > F_{p-1,N-p;\alpha}$, reject $H_0$.

Note that since

$$\sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{++})^2 = q\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2 + \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2 \qquad (6)$$

we have the identity

$$SS_{Tot} = SS_{Tr} + SS_R \qquad (7)$$

with $SS_{Tot} = \sum_{i=1}^{p}\sum_{j=1}^{q}(Y_{ij} - \bar{Y}_{++})^2$.

It is customary to present the analysis of the data from such a model in the form of a table, the Analysis of Variance (ANOVA) table, and, when calculating the sums of squares, to express them first in terms of totals rather than averages. In particular, if

$$T_{i+} = \sum_{j=1}^{q} Y_{ij} = q\bar{Y}_{i+} = \text{total response from all individuals treated with treatment } T_i$$
and

$$T_{++} = \sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij} = N\bar{Y}_{++} = \text{total yield from all plots in the experiment,}$$

then

$$SS_{Tr} = q\sum_{i=1}^{p} (\bar{Y}_{i+} - \bar{Y}_{++})^2 = \sum_{i=1}^{p} \frac{T_{i+}^2}{q} - \frac{T_{++}^2}{N} \qquad (8)$$

$$SS_R = \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{i+})^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij}^2 - \sum_{i=1}^{p} \frac{T_{i+}^2}{q} \qquad (9)$$

$$\phantom{SS_R} = SS_{Tot} - SS_{Tr} \qquad (10)$$

$$SS_{Tot} = \sum_{i=1}^{p}\sum_{j=1}^{q} (Y_{ij} - \bar{Y}_{++})^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} Y_{ij}^2 - \frac{T_{++}^2}{N} \qquad (11)$$

The term $T_{++}^2/N$ is called the correction factor (C.F.).

ANOVA table

Source       Sum of Squares                                  d.f.    Mean Squares                 F                 EMS
Treatments   $\sum_i T_{i+}^2/q - T_{++}^2/N = SS_{Tr}$      $p-1$   $SS_{Tr}/(p-1) = MS_{Tr}$    $MS_{Tr}/MS_R$    $\sigma^2 + \frac{q}{p-1}\sum_i (t_i - \bar{t})^2$
Residual     $SS_{Tot} - SS_{Tr} = SS_R$                     $N-p$   $SS_R/(N-p) = MS_R$                            $\sigma^2$
Total        $\sum_{i,j} Y_{ij}^2 - T_{++}^2/N = SS_{Tot}$   $N-1$

If the number of individuals treated with each treatment is not the same for all treatments but varies from treatment to treatment, with $n_i$ individuals treated with the $i$th treatment, $i = 1, 2, \ldots, p$, $\sum_{i=1}^{p} n_i = N$, then the above Analysis of Variance table is still valid with the slight modification that $q$ is replaced by $n_i$ and, if need be, taken inside the summation with respect to $i$, as shown below.
ANOVA table

Source       Sum of Squares                                    d.f.    Mean Squares                 F                 EMS
Treatments   $\sum_i T_{i+}^2/n_i - T_{++}^2/N = SS_{Tr}$      $p-1$   $SS_{Tr}/(p-1) = MS_{Tr}$    $MS_{Tr}/MS_R$    $\sigma^2 + \frac{1}{p-1}\sum_i n_i(t_i - \bar{t})^2$
Residual     $SS_{Tot} - SS_{Tr} = SS_R$                       $N-p$   $SS_R/(N-p) = MS_R$                            $\sigma^2$
Total        $\sum_{i,j} Y_{ij}^2 - T_{++}^2/N = SS_{Tot}$     $N-1$

where now $\bar{t} = \frac{1}{N}\sum_{i=1}^{p} n_i t_i$.

2 Some practical considerations

As the analysis of variance of the one-way classification model is based on the assumptions of

- Normality of the observations
- constant variance of the observations

these assumptions need to be checked.

2.1 Checking Normality

The assumption of Normality can be checked graphically by producing separate box-plots and/or histograms and probability plots of the response values for each individual treatment, or by producing a probability plot of the standardised residuals of the one-way classification model. Each of these probability plots can be backed by a formal statistical test for Normality, such as the Anderson-Darling test.

The conclusions of the Analysis of Variance of the one-way classification model are fairly robust to failure of the assumption of Normality as long as the true distribution of the
responses is not too skewed, i.e. the results of the analysis of variance are not affected by small deviations from Normality. The data will need to be fairly skewed before the conclusions of the analysis of variance are invalidated. Strongly skewed data will need to be transformed to near-symmetric data before an analysis of variance can be reliably applied. If you need to transform the data, one method you can use is the Box-Cox method. However, since the response values come from $p$ different populations, the Minitab menu facility for finding the optimal Box-Cox transformation cannot be used; it only works when the data come from a single population. You will need to resort to the boxcox function in R, which allows for the response variable following a linear model. Recall that before you use the boxcox function you need to load the MASS package with library(MASS).

2.2 Checking for constant variance

The assumption of constant variance needs to be checked more diligently, since the analysis of variance of the one-way classification model is not robust against failure of equality of variances.

Bartlett's test of equality of variances

To test for equality of variances you can use Bartlett's test, provided you are fairly confident that the response values are Normally distributed, i.e. that $Y_{ij} \sim N(t_i, \sigma_i^2)$, $j = 1, 2, \ldots, n_i$, for each treatment $i = 1, 2, \ldots, p$. Bartlett's test is essentially the generalised likelihood ratio test for testing the hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ against $H_1:$ not all the $\sigma_i^2$ are equal, using the data $Y_{ij}$, $j = 1, 2, \ldots, n_i$, from the $N(t_i, \sigma_i^2)$ distribution, $i = 1, 2, \ldots, p$. The test statistic is

$$X^2 = \frac{Q}{c} \quad \text{with} \quad Q = (N-p)\log s^2_{pooled} - \sum_{i=1}^{p} (n_i - 1)\log s_i^2$$
where

$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i+})^2, \qquad s^2_{pooled} = \frac{1}{N-p}\sum_{i=1}^{p} (n_i - 1)s_i^2$$

and

$$c = 1 + \frac{1}{3(p-1)}\left(\sum_{i=1}^{p} \frac{1}{n_i - 1} - \frac{1}{N-p}\right).$$

Here $N = \sum_{i=1}^{p} n_i$. When $H_0$ is true, $X^2 \sim \chi^2_{p-1}$, so that the null hypothesis is rejected at the $100\alpha\%$ level of significance if $X^2 > \chi^2_{p-1;\alpha}$.

The Bartlett test is very sensitive to deviations from Normality and can give unreliable results if the response variable is not Normally distributed. It should therefore be avoided if there is any doubt about the Normality of the response variable. The alternative to Bartlett's test is the Levene test.

Levene's test of equality of variances

Levene's test for equality of variances, which is fairly robust to deviations from Normality, uses as its working data the absolute differences

$$D_{ij} = |Y_{ij} - \tilde{Y}_i|, \qquad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, p,$$

where the $Y_{ij}$'s are the response variable values and $\tilde{Y}_i$ is the median of these values for the $i$th treatment. It can be shown that under the assumption of Normality of the response values, i.e. under the assumption that $Y_{ij} \sim N(t_i, \sigma_i^2)$, the mean value of the working data $D_{ij}$ is proportional to $\sigma_i$:

$$\mu_{D_i} = E(D_{ij}) \propto \sigma_i, \qquad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, p.$$

Consequently, to test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ against $H_1:$ not all the $\sigma_i^2$ are equal, one tests the equivalent hypothesis $H_0: \mu_{D_1} = \mu_{D_2} = \cdots = \mu_{D_p}$ against $H_1:$ not all the $\mu_{D_i}$ are equal, using the $D_{ij}$'s as the working data. This can be achieved by performing a one-way analysis of variance on the $D_{ij}$'s, which is exactly what Levene's test
does.

Remark: If you have strong evidence that the response values are Normally distributed, or near Normally distributed, then the Bartlett test is the more powerful of the two, i.e. it has the higher probability of correctly detecting differences between the variances. However, the Levene test should be preferred if there are doubts about the Normality of the data.

To perform either of the two tests in Minitab, use Stat > ANOVA > Test for Equal Variances; to perform the Levene test in SPSS, choose Analyze > Compare Means > One-Way ANOVA > Options > Homogeneity of variance test.

If the test for homogeneity of variance indicates that the variance is not constant, you will need to transform the response variable to stabilise the variance (make it have constant variance). Usually lack of Normality and lack of constant variance go hand in hand, and the transformation that rectifies Normality may also stabilise the variance. Thus a Box-Cox transformation may stabilise the variance. Alternatively, you may exploit the fact that the data are grouped into samples from different treatments, as follows. Suppose that the variance is not the same for all treatments, i.e.

$$E(Y_{ij}) = t_i, \quad Var(Y_{ij}) = \sigma_i^2, \qquad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, p,$$

and that for some $\lambda$

$$\sigma_i \propto t_i^{\lambda} \qquad (13)$$

i.e.

$$\log \sigma_i = \alpha + \lambda \log t_i \qquad (14)$$

$i = 1, 2, \ldots, p$. This can be verified by plotting $\log s_i$ against $\log \bar{Y}_{i+}$, $i = 1, 2, \ldots, p$, where $s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i+})^2$ and $\bar{Y}_{i+} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$. If the plotted points lie about a straight line, that provides visual evidence that the assumption (13) is valid. In fact the slope of the best-fitting line through the plotted points provides an estimate $\hat{\lambda}$ of the power $\lambda$ in (13). The transformation of the responses $Y_{ij}$ that stabilises the variance can be found using the delta
method. In particular, if $Z = g(Y)$ is the required transformation, then to a first-order approximation

$$g(Y) \approx g(t) + (Y - t)g'(t)$$

where $t$ is the mean of $Y$. Taking the variance of both sides in the above expansion, we have

$$Var(g(Y)) \approx Var(Y)\,[g'(t)]^2$$

where $g'(\cdot)$ denotes the first derivative of $g(\cdot)$. Thus if $g(\cdot)$ is the right transformation for stabilising the variance, then $Var(g(Y)) = $ constant and

$$g'(t) \propto \frac{\text{constant}}{\text{s.d.}(Y)} = \frac{\text{constant}}{t^{\lambda}}$$

i.e.

$$g(t) \propto \int \frac{1}{t^{\lambda}}\,dt \propto t^{1-\lambda}.$$

Thus the suggested variance-stabilising transformation can be taken as $Y^{1-\hat{\lambda}}$, with $\hat{\lambda}$ the slope of the best-fitting line to the points of the scatterplot of $\log s_i$ against $\log \bar{Y}_{i+}$. (When $\hat{\lambda} = 1$ the integral gives $g(t) \propto \log t$, i.e. the log transformation.)
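This recipe is easy to carry out numerically. The Python sketch below is an illustration only (the sample data are made up, and the document itself works with R and Minitab): it estimates $\hat{\lambda}$ as the least-squares slope of $\log s_i$ against $\log \bar{Y}_{i+}$ and then applies $Y^{1-\hat{\lambda}}$, falling back to the log transformation when $\hat{\lambda} \approx 1$.

```python
import math

def estimate_lambda(groups):
    """Slope of log s_i against log Ybar_i+ over the treatment groups."""
    xs, ys = [], []
    for g in groups:
        n = len(g)
        m = sum(g) / n
        s2 = sum((y - m) ** 2 for y in g) / (n - 1)  # sample variance s_i^2
        xs.append(math.log(m))
        ys.append(0.5 * math.log(s2))                # log s_i = (1/2) log s_i^2
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

def stabilise(groups, lam):
    """Apply the variance-stabilising transformation Y^(1 - lambda)."""
    if abs(1.0 - lam) < 1e-8:   # lambda = 1: the integral gives log Y instead
        return [[math.log(y) for y in g] for g in groups]
    return [[y ** (1.0 - lam) for y in g] for g in groups]

# Made-up groups whose standard deviation grows proportionally to the mean,
# so the fitted slope should be close to 1 and a log transformation results.
groups = [[9, 10, 11], [18, 20, 22], [27, 30, 33]]
lam_hat = estimate_lambda(groups)
stabilised = stabilise(groups, lam_hat)
```

With responses whose standard deviation is proportional to the mean, as here, the fitted slope is essentially 1 and the transformed groups end up with (near) equal variances.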
3 Pairwise Comparisons of treatments in a one-way analysis of variance model

Consider the Pulse data on 68 workers after they performed one of six different tasks.

[Table of task totals $T_{i+}$, $T_{i+}^2$, sample sizes $n_i$, $T_{i+}^2/n_i$ and task averages $\bar{Y}_{i+}$, together with the correction factor and the resulting ANOVA table (Tasks, Residuals, Total, with F ratio and p-value): numerical entries not recoverable in this copy.]
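The totals formulas (8)-(11) translate directly into code. This Python sketch (with made-up responses, not the Pulse data) computes the sums of squares from the treatment totals $T_{i+}$, the grand total $T_{++}$ and the correction factor, allowing unequal sample sizes by using $n_i$ in place of $q$.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares via the totals formulas.

    groups: list of lists of responses, one inner list per treatment.
    Returns (SS_Tr, SS_R, SS_Tot, F).
    """
    p = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    Ti = [sum(g) for g in groups]            # treatment totals T_i+
    T = sum(Ti)                              # grand total T_++
    CF = T * T / N                           # correction factor T_++^2 / N
    ss_tot = sum(y * y for g in groups for y in g) - CF
    ss_tr = sum(t * t / ni for t, ni in zip(Ti, n)) - CF
    ss_r = ss_tot - ss_tr
    F = (ss_tr / (p - 1)) / (ss_r / (N - p))
    return ss_tr, ss_r, ss_tot, F

# Illustrative (made-up) responses for three treatments of unequal size.
groups = [[61, 60, 63, 59], [63, 67, 71, 64, 65], [68, 66, 71, 67]]
ss_tr, ss_r, ss_tot, F = one_way_anova(groups)
```

The identity $SS_{Tot} = SS_{Tr} + SS_R$ of (7) holds by construction here, and the totals form agrees with the deviation form $\sum_i n_i(\bar{Y}_{i+} - \bar{Y}_{++})^2$.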
The analysis indicates that there is significant evidence of differences in the effects of the tasks on the pulse rate. One would want to know which pairs of treatments (tasks) differ.

3.1 Fisher's Least Significant Difference (LSD) procedure

Under this procedure one performs $\frac{1}{2}p(p-1)$ different hypothesis tests, each at $\alpha$-level of significance, in order to test the $\frac{1}{2}p(p-1)$ different hypotheses

$$H_{0(i,j)}: t_i - t_j = 0 \quad \text{vs} \quad H_{1(i,j)}: t_i - t_j \ne 0 \quad \text{for ALL } i \ne j.$$

For the $(i,j)$th test, the test statistic is

$$t_{(ij)} = \frac{\bar{Y}_{i+} - \bar{Y}_{j+}}{\hat{\sigma}\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}$$

and if $|t_{(ij)}| > t_{\nu;\alpha/2}$ we conclude that the $i$th and $j$th treatments differ. Here $\hat{\sigma} = \sqrt{MS_R}$ and $\nu$ = degrees of freedom of the Residual, both obtainable from the ANOVA table. Thus the $(i,j)$th pair of treatments are judged to differ if their sample averages $\bar{Y}_{i+}$ and $\bar{Y}_{j+}$ differ by more than

$$LSD_\alpha = \hat{\sigma}\, t_{\nu;\alpha/2}\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}.$$

If of course all sample sizes are equal (to $q$ say), then the $\alpha$-level Least Significant Differences are the same for all samples and equal to

$$LSD_\alpha = \hat{\sigma}\, t_{\nu;\alpha/2}\sqrt{\frac{2}{q}}.$$

Example. For the Pulse data $\hat{\sigma} = \sqrt{MS_R} = 5.56$, $\nu = 62$ and $t_{62;0.025} = 2.00$. Hence the $LSD_{0.05}$'s are
[Table of $LSD_{0.05}$ values for the pairs of sample sizes $(n_i, n_j)$: (13,12), (13,11), (13,10), (12,11), (12,10), (12,12), (11,10), (10,10); numerical LSD entries not recoverable in this copy.]

[Table of pairwise comparisons using Fisher's LSD procedure: the fifteen differences $|\bar{Y}_{i+} - \bar{Y}_{j+}|$ with the significant pairs starred; most entries not recoverable in this copy.]

It is customary to present the averages in ascending order and underscore the pairs which do NOT differ significantly, as shown below:

(6) (5) (2) (1) (3) (4)

It must be realised that since we are performing $\binom{p}{2}$ tests, a lot of tests, each at 5% level of significance, we would expect $0.05\binom{p}{2}$ of these tests to show significant even when there are no differences between the treatments. In fact the probability of finding at least one significant difference amongst these $\frac{1}{2}p(p-1)$ tests, when the treatments are equal in their effects, increases dramatically with $p$ (= # of treatments), as shown in the table below.
[Table of the probability of at least one significant pair using the $LSD_{0.05}$ procedure, for increasing $p$: numerical entries not recoverable in this copy.]

It is therefore strongly recommended NOT to use the $LSD_\alpha$ method unless you have first performed an F test of the hypothesis

$$H_0: t_1 = t_2 = \cdots = t_p \quad \text{vs} \quad H_1: \text{not all the } t_i \text{ are equal}$$

which produced significant results. Note that $(\bar{Y}_{i+} - \bar{Y}_{j+}) \pm LSD_\alpha$ constitutes a $100(1-\alpha)\%$ confidence interval for $t_i - t_j$.

3.2 Tukey's Honest Significant Difference (HSD)

To get round this difficulty associated with Fisher's LSD method, of having a considerable risk of finding significant differences in pairs of treatments when they do not exist, Tukey suggested a single test at $\alpha$-level for testing whether all possible differences of pairs are non-significant or not. His approach also provides simultaneous $100(1-\alpha)\%$ confidence intervals for all pair differences $t_i - t_j$. Tukey's method assumes that the treatment samples are all of equal size (say $q$). From the theory of linear models we know that the statistics

$$T_i = \frac{\bar{Y}_{i+} - t_i}{\hat{\sigma}/\sqrt{q}}, \qquad i = 1, 2, \ldots, p,$$

where $\hat{\sigma} = \sqrt{MS_R}$, have distribution independent of $t_1, t_2, \ldots, t_p$, and consequently the distribution of

$$D = \max_{1 \le i \le p} T_i - \min_{1 \le i \le p} T_i$$

is independent of the $t_i$'s. Tukey computed the distribution of $D$ and found its critical values $q_{p,\nu;\alpha}$, defined by

$$Pr(D > q_{p,\nu;\alpha}) = \alpha.$$
Here $\nu$ = degrees of freedom of the residual. These critical values are tabulated and are called the upper percentage points of the studentised range. We therefore have that

$$Pr\left(\max_i T_i - \min_i T_i \le q_{p,\nu;\alpha}\right) = 1 - \alpha$$

i.e.

$$Pr\left((\bar{Y}_{i+} - \bar{Y}_{j+}) - \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha} \le t_i - t_j \le (\bar{Y}_{i+} - \bar{Y}_{j+}) + \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha} \;\text{ for ALL } i, j\right) = 1 - \alpha$$

i.e.

$$(\bar{Y}_{i+} - \bar{Y}_{j+}) \pm \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}, \qquad i, j = 1, 2, \ldots, p, \qquad (15)$$

are $100(1-\alpha)\%$ simultaneous confidence intervals for all pair differences $t_i - t_j$. Consequently, if

$$\max_{1 \le i \le p} \bar{Y}_{i+} - \min_{1 \le i \le p} \bar{Y}_{i+} \ge \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}$$

we reject the hypothesis $H_0: t_i - t_j = 0$ for all $i, j$, in favour of the alternative that for at least one pair the difference $t_i - t_j$ is non-zero. Any pair of treatments for which

$$|\bar{Y}_{i+} - \bar{Y}_{j+}| > \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha} \qquad (16)$$

are then declared to be significantly different.

If the treatment samples are not of equal size, it is suggested that Tukey's method is adjusted so that, in the set of simultaneous confidence intervals, the confidence interval for $t_i - t_j$ is replaced by

$$(\bar{Y}_{i+} - \bar{Y}_{j+}) \pm \frac{\hat{\sigma}\, q_{p,\nu;\alpha}}{\min(\sqrt{n_i}, \sqrt{n_j})} \qquad (17)$$

and that the difference $t_i - t_j$ is declared to be significant if

$$|\bar{Y}_{i+} - \bar{Y}_{j+}| > \frac{\hat{\sigma}\, q_{p,\nu;\alpha}}{\min(\sqrt{n_i}, \sqrt{n_j})}.$$

Tukey's method is very conservative (very reluctant to find pairs significant); in fact the Type I error is smaller than the nominal $\alpha$, and in consequence it has a rather low power of detecting true differences between pairs.
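Rule (16) is straightforward to mechanise once the studentised-range value $q_{p,\nu;\alpha}$ has been read from tables or software. In this Python sketch the critical value and the treatment averages are made-up illustrative inputs, not values from the Pulse data.

```python
import math
from itertools import combinations

def tukey_hsd_pairs(means, q, ms_r, q_crit):
    """Index pairs (i, j) whose averages differ by more than Tukey's HSD.

    means : treatment averages Ybar_i+ (equal group sizes q assumed)
    ms_r  : residual mean square, so sigma_hat = sqrt(ms_r)
    q_crit: tabulated studentised-range value q_{p,nu;alpha}
    """
    hsd = math.sqrt(ms_r / q) * q_crit  # (sigma_hat / sqrt(q)) * q_{p,nu;alpha}
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > hsd]

# Hypothetical averages for p = 6 treatments with q = 11 observations each;
# q_crit = 4.0 stands in for a value looked up in studentised-range tables.
significant = tukey_hsd_pairs(
    [96.6, 92.5, 99.0, 101.8, 95.6, 92.8], q=11, ms_r=30.9, q_crit=4.0)
```

Any pair returned violates the simultaneous confidence interval (15), so those treatments are declared significantly different.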
3.3 The Newman-Keuls test for pair comparisons

This test attempts to reduce the conservatism of Tukey's HSD test by making the HSD value on the r.h.s. of (16) dependent on the rank range of the treatment averages involved in the comparison. It is natural that when the treatment averages are ranked in ascending order of magnitude, the further apart the ranks of two such averages are, the bigger the difference in their values should be before we declare them as representing differing treatments. The Newman-Keuls test proceeds as follows. Order the treatment averages in ascending order of magnitude from smallest to largest: $\bar{Y}_{(1)}, \bar{Y}_{(2)}, \ldots, \bar{Y}_{(p)}$.

Step 1. If the range $\bar{Y}_{(p)} - \bar{Y}_{(1)} \le \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}$, accept the hypothesis that all pair differences are zero and stop. If $\bar{Y}_{(p)} - \bar{Y}_{(1)} > \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p,\nu;\alpha}$, conclude that $t_{(p)} \ne t_{(1)}$ and go to Step 2.

Step 2. (i) If the range $\bar{Y}_{(p-1)} - \bar{Y}_{(1)} \le \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-1,\nu;\alpha}$, accept the hypothesis that all the pair differences not involving $t_{(p)}$ are zero. (ii) If the range $\bar{Y}_{(p)} - \bar{Y}_{(2)} \le \frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-1,\nu;\alpha}$, accept the hypothesis that all pair differences not involving $t_{(1)}$ are zero. If both (i) and (ii) are true, then stop; otherwise go to Step 3.

Step 3. If (i) of Step 2 is not true, compare the range of the set $(\bar{Y}_{(1)}, \bar{Y}_{(2)}, \ldots, \bar{Y}_{(p-2)})$ and the range of the set $(\bar{Y}_{(2)}, \bar{Y}_{(3)}, \ldots, \bar{Y}_{(p-1)})$ with $\frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-2,\nu;\alpha}$. If (ii) of Step 2 is not true, compare the range of the set $(\bar{Y}_{(2)}, \bar{Y}_{(3)}, \ldots, \bar{Y}_{(p-1)})$ and the range of the set $(\bar{Y}_{(3)}, \bar{Y}_{(4)}, \ldots, \bar{Y}_{(p)})$ with $\frac{\hat{\sigma}}{\sqrt{q}}\,q_{p-2,\nu;\alpha}$.

One continues to examine the ranges of smaller subsets as long as the previous subset has a significant range. Each time a range proves non-significant,
the averages involved in it are included in a single group, and no subgroups of it can later be deemed to have significant ranges, so no further range tests are conducted on subgroups of this group.

The Newman-Keuls test requires treatment samples of equal size, but it can be used with samples of unequal sizes if $q$ is replaced by the harmonic mean $\tilde{q}$ of the $n_i$'s:

$$\tilde{q} = \frac{p}{\dfrac{1}{n_1} + \dfrac{1}{n_2} + \cdots + \dfrac{1}{n_p}}.$$

The Newman-Keuls procedure, although less conservative than Tukey's procedure, does nonetheless contain a certain amount of conservatism.

3.4 Duncan's Multiple Range test

Duncan's Multiple Range test is similar to the Newman-Keuls test except that it attempts to remove some of the conservatism by changing, in each step, the nominal significance level $\alpha_r$ of the studentised range coefficient $q_{r,\nu;\alpha_r}$, i.e. $\alpha_r$ changes with $r$ ($\alpha_r = 1 - (1-\alpha)^{r-1}$). The Duncan test therefore has its own table of coefficients $Q_\alpha(r, \nu)$ of the studentised range for an overall nominal $\alpha$-level of significance. Like the Newman-Keuls test, it can be used with unequal sample sizes if $q$ is replaced by the harmonic mean $\tilde{q}$ of the sample sizes $n_i$.

Further Reading: See D. C. Montgomery, Design and Analysis of Experiments.

Example. Consider the pulse rate data obtained on 68 workers after performing one of six tasks. Here

$$\tilde{q} = 6\left(\sum_{i=1}^{6} \frac{1}{n_i}\right)^{-1} = 11.3, \qquad \nu = 62, \qquad \hat{\sigma} = 5.56.$$

[Table of $r$, $Q_{0.05}(r, 62)$ and $Q_{0.05}(r, 62)\,\hat{\sigma}/\sqrt{\tilde{q}}$: numerical entries not recoverable in this copy.]
[Table of differences between the ordered task averages, with significant differences starred (among them 8.500, 6.917, 6.077 and 6.300) and non-significant differences in brackets: most entries and their task labels not recoverable in this copy.]

Although the Duncan Multiple Range test produced the same results as the Fisher LSD test on this occasion, this is not the case in general.

Reference: Douglas Montgomery, Design and Analysis of Experiments, John Wiley.
Faculty of Health Sciences Categorical covariate, Quantitative outcome Regression models Categorical covariate, Quantitative outcome Lene Theil Skovgaard April 29, 2013 PKA & LTS, Sect. 3.2, 3.2.1 ANOVA
More informationChap The McGraw-Hill Companies, Inc. All rights reserved.
11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationChapter 12. Analysis of variance
Serik Sagitov, Chalmers and GU, January 9, 016 Chapter 1. Analysis of variance Chapter 11: I = samples independent samples paired samples Chapter 1: I 3 samples of equal size J one-way layout two-way layout
More informationTwo-Sample Inferential Statistics
The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is
More informationStatistics for Engineers Lecture 9 Linear Regression
Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April
More informationTentative solutions TMA4255 Applied Statistics 16 May, 2015
Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent
More informationSummary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)
Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ
More informationInferences About the Difference Between Two Means
7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationWeek 14 Comparing k(> 2) Populations
Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.
More information2.4.3 Estimatingσ Coefficient of Determination 2.4. ASSESSING THE MODEL 23
2.4. ASSESSING THE MODEL 23 2.4.3 Estimatingσ 2 Note that the sums of squares are functions of the conditional random variables Y i = (Y X = x i ). Hence, the sums of squares are random variables as well.
More informationThese are all actually contrasts (the coef sum to zero). What are these contrasts representing? What would make them large?
Lecture 12 Comparing treatment effects Orthogonal Contrasts What use are contrasts? Recall the Cotton data In this case, the treatment levels have an ordering to them this is not always the case) Consider
More informationBattery Life. Factory
Statistics 354 (Fall 2018) Analysis of Variance: Comparing Several Means Remark. These notes are from an elementary statistics class and introduce the Analysis of Variance technique for comparing several
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationMultiple Testing. Gary W. Oehlert. January 28, School of Statistics University of Minnesota
Multiple Testing Gary W. Oehlert School of Statistics University of Minnesota January 28, 2016 Background Suppose that you had a 20-sided die. Nineteen of the sides are labeled 0 and one of the sides is
More informationResearch Article A Nonparametric Two-Sample Wald Test of Equality of Variances
Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner
More informationMAT2377. Rafa l Kulik. Version 2015/November/26. Rafa l Kulik
MAT2377 Rafa l Kulik Version 2015/November/26 Rafa l Kulik Bivariate data and scatterplot Data: Hydrocarbon level (x) and Oxygen level (y): x: 0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More information3. (a) (8 points) There is more than one way to correctly express the null hypothesis in matrix form. One way to state the null hypothesis is
Stat 501 Solutions and Comments on Exam 1 Spring 005-4 0-4 1. (a) (5 points) Y ~ N, -1-4 34 (b) (5 points) X (X,X ) = (5,8) ~ N ( 11.5, 0.9375 ) 3 1 (c) (10 points, for each part) (i), (ii), and (v) are
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationOutline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013
Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent
More informationStatistics for EES Factorial analysis of variance
Statistics for EES Factorial analysis of variance Dirk Metzler June 12, 2015 Contents 1 ANOVA and F -Test 1 2 Pairwise comparisons and multiple testing 6 3 Non-parametric: The Kruskal-Wallis Test 9 1 ANOVA
More informationBasic Business Statistics 6 th Edition
Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based
More informationMultiple comparisons The problem with the one-pair-at-a-time approach is its error rate.
Multiple comparisons The problem with the one-pair-at-a-time approach is its error rate. Each confidence interval has a 95% probability of making a correct statement, and hence a 5% probability of making
More informationDiagnostics and Remedial Measures: An Overview
Diagnostics and Remedial Measures: An Overview Residuals Model diagnostics Graphical techniques Hypothesis testing Remedial measures Transformation Later: more about all this for multiple regression W.
More informationLecture 4. Checking Model Adequacy
Lecture 4. Checking Model Adequacy Montgomery: 3-4, 15-1.1 Page 1 Model Checking and Diagnostics Model Assumptions 1 Model is correct 2 Independent observations 3 Errors normally distributed 4 Constant
More informationThe Distribution of F
The Distribution of F It can be shown that F = SS Treat/(t 1) SS E /(N t) F t 1,N t,λ a noncentral F-distribution with t 1 and N t degrees of freedom and noncentrality parameter λ = t i=1 n i(µ i µ) 2
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationAnalysis of Variance (ANOVA)
Analysis of Variance ANOVA) Compare several means Radu Trîmbiţaş 1 Analysis of Variance for a One-Way Layout 1.1 One-way ANOVA Analysis of Variance for a One-Way Layout procedure for one-way layout Suppose
More informationMultiple Comparison Procedures Cohen Chapter 13. For EDUC/PSY 6600
Multiple Comparison Procedures Cohen Chapter 13 For EDUC/PSY 6600 1 We have to go to the deductions and the inferences, said Lestrade, winking at me. I find it hard enough to tackle facts, Holmes, without
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More informationChapte The McGraw-Hill Companies, Inc. All rights reserved.
er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations
More information2 Hand-out 2. Dr. M. P. M. M. M c Loughlin Revised 2018
Math 403 - P. & S. III - Dr. McLoughlin - 1 2018 2 Hand-out 2 Dr. M. P. M. M. M c Loughlin Revised 2018 3. Fundamentals 3.1. Preliminaries. Suppose we can produce a random sample of weights of 10 year-olds
More informationA nonparametric two-sample wald test of equality of variances
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David
More informationLecture 5: Hypothesis tests for more than one sample
1/23 Lecture 5: Hypothesis tests for more than one sample Måns Thulin Department of Mathematics, Uppsala University thulin@math.uu.se Multivariate Methods 8/4 2011 2/23 Outline Paired comparisons Repeated
More informationAnalysis of Variance. Read Chapter 14 and Sections to review one-way ANOVA.
Analysis of Variance Read Chapter 14 and Sections 15.1-15.2 to review one-way ANOVA. Design of an experiment the process of planning an experiment to insure that an appropriate analysis is possible. Some
More informationMore about Single Factor Experiments
More about Single Factor Experiments 1 2 3 0 / 23 1 2 3 1 / 23 Parameter estimation Effect Model (1): Y ij = µ + A i + ɛ ij, Ji A i = 0 Estimation: µ + A i = y i. ˆµ = y..  i = y i. y.. Effect Modell
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationLinear Combinations of Group Means
Linear Combinations of Group Means Look at the handicap example on p. 150 of the text. proc means data=mth567.disability; class handicap; var score; proc sort data=mth567.disability; by handicap; proc
More informationOne-way ANOVA (Single-Factor CRD)
One-way ANOVA (Single-Factor CRD) STAT:5201 Week 3: Lecture 3 1 / 23 One-way ANOVA We have already described a completed randomized design (CRD) where treatments are randomly assigned to EUs. There is
More information1 Introduction to One-way ANOVA
Review Source: Chapter 10 - Analysis of Variance (ANOVA). Example Data Source: Example problem 10.1 (dataset: exp10-1.mtw) Link to Data: http://www.auburn.edu/~carpedm/courses/stat3610/textbookdata/minitab/
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationContrasts and Multiple Comparisons Supplement for Pages
Contrasts and Multiple Comparisons Supplement for Pages 302-323 Brian Habing University of South Carolina Last Updated: July 20, 2001 The F-test from the ANOVA table allows us to test the null hypothesis
More informationTA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM
STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th
More informationDensity Temp vs Ratio. temp
Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,
More informationAnalysis of variance
Analysis of variance 1 Method If the null hypothesis is true, then the populations are the same: they are normal, and they have the same mean and the same variance. We will estimate the numerical value
More information