AN EMPIRICAL INVESTIGATION OF TUKEY'S HONESTLY SIGNIFICANT DIFFERENCE TEST WITH VARIANCE HETEROGENEITY AND EQUAL SAMPLE SIZES, UTILIZING BOX'S COEFFICIENT OF VARIANCE VARIATION
AN EMPIRICAL INVESTIGATION OF TUKEY'S HONESTLY SIGNIFICANT DIFFERENCE TEST WITH VARIANCE HETEROGENEITY AND EQUAL SAMPLE SIZES, UTILIZING BOX'S COEFFICIENT OF VARIANCE VARIATION

DISSERTATION

Presented to the Graduate Council of the North Texas State University in Partial Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

By

Michael W. Strozeski, B.S., M.Ed.

Denton, Texas

May, 1980
© 1980 MICHAEL WAYNE STROZESKI ALL RIGHTS RESERVED
Strozeski, Michael Wayne, An Empirical Investigation of Tukey's Honestly Significant Difference Test with Variance Heterogeneity and Equal Sample Sizes, Utilizing Box's Coefficient of Variance Variation. Doctor of Philosophy (Educational Research), May, 1980, 145 pp., 50 tables, bibliography, 50 titles.

This study sought to determine boundary conditions for robustness of the Tukey HSD statistic when the assumption of homogeneity of variance was violated. Box's coefficient of variance variation, C, was utilized to index the degree of variance heterogeneity. Selected numbers of comparison groups and equal sample sizes were evaluated. Tukey's HSD statistic was declared robust if the actual significance level fell within the 95 per cent confidence limits around the corresponding nominal significance level. A Monte Carlo computer simulation technique was employed to generate data under controlled violation of the homogeneity of variance assumption. For each sample size and number of treatment groups condition, an analysis of variance F-test was computed, and Tukey's multiple comparison technique was calculated. This procedure was repeated 4,000 times; the actual level of significance was determined and compared to the nominal significance level of 0.05. The index of variance variation was systematically adjusted, and this procedure
was repeated until the C value was reached such that any increase in its value would produce an FWI error rate that exceeded the upper limit of the 95 per cent confidence interval about the 0.05 level of significance, thereby establishing a boundary for C.

On the basis of the synthesis and analysis of the generated data, the following conclusions were drawn. First, the Tukey HSD statistic was found to be generally robust when the violations of homogeneity of variances were of small magnitude. In all cases, however, as the value of C was increased from zero, a point was reached at which the Tukey HSD statistic was no longer robust and too many FWI errors were produced. Second, when either the violation of the homogeneity of variance assumption was more pronounced (C values were larger) or the number of treatment groups increased, discrepancies between the actual and nominal significance levels occurred. With larger numbers of treatment groups, Tukey's HSD was less robust. The boundary value for C decreased as the number of treatment groups increased. As C values were increased, FWI errors increased. Third, Tukey's HSD was found to be more robust with larger sample sizes. This trend was generally supported in all of the sample size groups proposed for this study. This conclusion was further supported by the addition of forty-eight and seventy-two sample size groups to the five treatment groups experiment. In both of these additional sample size
cases, the C value was greatly increased by the larger sample sizes. A fourth and final conclusion was reached. When the two additional sample size cases were added to investigate the large sample sizes, the Tukey test was found to be conservative when C was set at zero. The actual significance level fell below the lower limit of the 95 per cent confidence interval around the 0.05 nominal significance level. Apparently, large sample sizes decrease the likelihood of an FWI error but may increase the likelihood of a Type II error.
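The simulation procedure described above can be sketched in modern terms. The helper below is an illustrative reconstruction, not the dissertation's FORTRAN program: it estimates the familywise Type I (FWI) error rate of Tukey's HSD for k equal-mean groups whose population variances may differ, and omits the preliminary ANOVA F-test step for brevity. The function name and the use of SciPy's studentized-range distribution are my own choices.

```python
import numpy as np
from scipy.stats import studentized_range

def fwi_error_rate(k, n, sigmas, alpha=0.05, reps=2000, seed=1):
    """Estimate the familywise Type I error rate of Tukey's HSD when all
    k population means are equal but standard deviations `sigmas` may differ."""
    rng = np.random.default_rng(seed)
    df_error = k * (n - 1)
    # Percentage point of the studentized range for k groups, df_error df.
    q_crit = studentized_range.ppf(1 - alpha, k, df_error)
    false_rejections = 0
    for _ in range(reps):
        # k groups of size n, equal means (0), possibly heterogeneous variances.
        groups = [rng.normal(0.0, s, n) for s in sigmas]
        means = np.array([g.mean() for g in groups])
        ms_error = np.mean([g.var(ddof=1) for g in groups])  # balanced-design MSE
        hsd = q_crit * np.sqrt(ms_error / n)
        # Any pairwise mean difference exceeding HSD is a familywise Type I error.
        if means.max() - means.min() > hsd:
            false_rejections += 1
    return false_rejections / reps
```

With homogeneous variances the estimated rate should hover near the nominal 0.05; raising the spread of `sigmas` (i.e., raising C) is the manipulation the study tracks.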
TABLE OF CONTENTS

LIST OF TABLES

Chapter

I. INTRODUCTION
   Statement of the Problem
   Purpose of the Study
   Hypothesis
   Mathematical Model of Tukey's HSD Statistic
   Definition of Terms
   Delimitations
   Chapter Bibliography

II. SURVEY OF RELATED RESEARCH
   Chapter Bibliography

III. PROCEDURE FOR DATA COLLECTION
   Procedures for Producing Data
   Model Validation
   Statistical Tests of Pseudorandom Numbers
   Experiment Simulation Procedure
   Summary of Procedures
   Chapter Bibliography

IV. ANALYSIS OF DATA AND FINDINGS
   Part 1. k = Three Treatment Groups
   Part 2. k = Four Treatment Groups
   Part 3. k = Five Treatment Groups
   Part 4. k = Six Treatment Groups
   Part 5. k = Seven Treatment Groups
   Part 6. Larger Samples

V. SUMMARY, CONCLUSIONS, IMPLICATIONS, AND RECOMMENDATIONS
   Summary
   Conclusions
   Implications
   Recommendations
   Chapter Bibliography

APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
APPENDIX F

BIBLIOGRAPHY
LIST OF TABLES

Table

1. Actual Levels of Significance Under Conditions of Non-Violation of the Assumptions Underlying the Use of Tukey's HSD Statistic
2. Number of Treatment Groups and Size of Sample Per Experiment Condition
3. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=3 Groups with n=3 Observations in Each Group for Varying Degrees of Variance Variation, C
4. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=3 Groups with n=6 Observations in Each Group for Varying Degrees of Variance Variation, C
5. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=3 Groups with n=12 Observations in Each Group for Varying Degrees of Variance Variation, C
6. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=3 Groups with n=24 Observations in Each Group for Varying Degrees of Variance Variation, C
7. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=4 Groups with n=3 Observations in Each Group for Varying Degrees of Variance Variation, C
8. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=4 Groups with n=6 Observations in Each Group for Varying Degrees of Variance Variation, C
9. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=4 Groups with n=12 Observations in Each Group for Varying Degrees of Variance Variation, C
10. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=4 Groups with n=24 Observations in Each Group for Varying Degrees of Variance Variation, C
11. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=5 Groups with n=3 Observations in Each Group for Varying Degrees of Variance Variation, C
12. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=5 Groups with n=6 Observations in Each Group for Varying Degrees of Variance Variation, C
13. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=5 Groups with n=12 Observations in Each Group for Varying Degrees of Variance Variation, C
14. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=5 Groups with n=24 Observations in Each Group for Varying Degrees of Variance Variation, C
15. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=6 Groups with n=3 Observations in Each Group for Varying Degrees of Variance Variation, C
16. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=6 Groups with n=6 Observations in Each Group for Varying Degrees of Variance Variation, C
17. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=6 Groups with n=12 Observations in Each Group for Varying Degrees of Variance Variation, C
18. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=6 Groups with n=24 Observations in Each Group for Varying Degrees of Variance Variation, C
19. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=7 Groups with n=3 Observations in Each Group for Varying Degrees of Variance Variation, C
20. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=7 Groups with n=6 Observations in Each Group for Varying Degrees of Variance Variation, C
21. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=7 Groups with n=12 Observations in Each Group for Varying Degrees of Variance Variation, C
22. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=7 Groups with n=24 Observations in Each Group for Varying Degrees of Variance Variation, C
23. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=5 Groups with n=48 Observations in Each Group for Varying Degrees of Variance Variation, C
24. Comparison of Actual Significance Levels for Familywise Type I Error Rates to the Nominal 0.05 Significance Level in Simulated Experiments on the Tukey HSD Test for k=5 Groups with n=72 Observations in Each Group for Varying Degrees of Variance Variation, C
25. Degree of Variance Variation, C, Above which the Actual Significance Level Significantly Differed from the Nominal 0.05 Significance Level
26. Ninety-Five Per Cent Confidence Limits for a Proportion Corresponding to a Nominal Significance Level
27. Obtained Versus Expected Means and Variances for k=3 Treatment Groups with n=3 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
28. Obtained Versus Expected Means and Variances for k=3 Treatment Groups with n=6 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
29. Obtained Versus Expected Means and Variances for k=3 Treatment Groups with n=12 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
30. Obtained Versus Expected Means and Variances for k=3 Treatment Groups with n=24 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
31. Obtained Versus Expected Means and Variances for k=4 Treatment Groups with n=3 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
32. Obtained Versus Expected Means and Variances for k=4 Treatment Groups with n=6 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
33. Obtained Versus Expected Means and Variances for k=4 Treatment Groups with n=12 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
34. Obtained Versus Expected Means and Variances for k=4 Treatment Groups with n=24 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
35. Obtained Versus Expected Means and Variances for k=5 Treatment Groups with n=3 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
36. Obtained Versus Expected Means and Variances for k=5 Treatment Groups with n=6 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
37. Obtained Versus Expected Means and Variances for k=5 Treatment Groups with n=12 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
38. Obtained Versus Expected Means and Variances for k=5 Treatment Groups with n=24 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
39. Obtained Versus Expected Means and Variances for k=6 Treatment Groups with n=3 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
40. Obtained Versus Expected Means and Variances for k=6 Treatment Groups with n=6 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
41. Obtained Versus Expected Means and Variances for k=6 Treatment Groups with n=12 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
42. Obtained Versus Expected Means and Variances for k=6 Treatment Groups with n=24 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
43. Obtained Versus Expected Means and Variances for k=7 Treatment Groups with n=3 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
44. Obtained Versus Expected Means and Variances for k=7 Treatment Groups with n=6 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
45. Obtained Versus Expected Means and Variances for k=7 Treatment Groups with n=12 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
46. Obtained Versus Expected Means and Variances for k=7 Treatment Groups with n=24 Observations in Each Group and the Expected Value of α for Each Computer Run of 4,000 Experiments
47. Ten Per Cent Intervals for the Normal Distribution with a Mean of Zero and a Standard Deviation of One
48. Expected and Observed Frequencies of One Hundred Numbers in Ten Per Cent Intervals Corresponding to a Normal Distribution
49. A Summary of Variance Heterogeneity as Indexed by C and the Corresponding Ratio of Variances which Resulted in Familywise Type I Error Rates in Excess of the Nominal 0.05 Significance Level
50. Critical F max Values for Corresponding Degrees of Variance Variation
CHAPTER I

INTRODUCTION

A very frequent concern of educational researchers is determining whether or not k group means differ from one another. The analysis of variance (ANOVA) is often used to test whether or not sample means are indicative of experimental treatment effects or of merely chance variation. Experimenters usually follow a significant F-test in analysis of variance with a multiple comparison statistic when k is greater than two, because ANOVA indicates only the presence of overall treatment effects. Multiple comparison statistics enable the researcher to locate the specific mean differences which have caused the ANOVA F-test to be significant. Tukey's multiple comparison test is a frequently cited procedure when the researcher's multiple comparison hypotheses are for pairwise differences (Games, 1971; Keselman and Toothaker, 1974; Marascuilo, 1971). Tukey's multiple comparison test specifies the familywise Type I error rate at α for a family of tests on all possible pairs of means, allowing the error rate per comparison to decrease as k increases. According to Petrinovich and Hardyck (1969), little has been published on the characteristics and properties of Tukey's Honestly Significant
Difference (HSD) procedure. Petrinovich and Hardyck provided more information about Tukey's HSD procedure, but Games (1971) indicated that they provided only limited evidence and that further study is needed. Glass, Peckham, and Sanders (1972) indicated that the role of unequal variances in combination with equal sample sizes appears to have boundary conditions which have not been sufficiently probed. Agreeing with Glass, Peckham, and Sanders, authors Rogan, Keselman, and Breen (1977) state that data from their investigations indicate that the degree of variance heterogeneity may play some part in determining those boundary conditions. This study was designed to provide further evidence about the robustness of Tukey's HSD procedure.

Whenever populations differ with respect to variances and the means are equal, statistical tests designed to detect a mean difference can be influenced by the difference in variances. The statistical test may yield more or fewer significant results by chance than would be expected. Evidence for the results being influenced by the unequal variances is obtained when significant departures from expected results are found based on the familywise Type I (FWI) error rate at α when means are equal. Any study of robustness of a statistical procedure involves creating differences in parameters other than the parameter for which the statistical procedure was designed
to test a difference. The variable manipulated in this study was the population variance. This research was performed to determine the robustness of Tukey's HSD procedure in the presence of variance heterogeneity. Variance heterogeneity was indexed by use of Box's (1954) coefficient of variance variation.

Statement of the Problem

The problem of this study was the effect of violating the assumption of homogeneity of variance with equal sample sizes upon Tukey's Honestly Significant Difference (HSD) multiple comparison procedure, utilizing Box's coefficient to index the degree of variance variation.

Purpose of the Study

The purpose of this study was to empirically evaluate the effects of varying degrees of heterogeneity and equal sample sizes when the degree of variance variation was within a range of 0.00 to k-1, where k equals the number of treatment groups.

Hypothesis

The following hypothesis was formulated to carry out the purpose of this study [C = coefficient of variance variation, which indexes the degree of heterogeneity; n = sample size; k = number of samples]:
Using Tukey's (HSD) procedure, actual significance levels will not differ significantly from nominal significance levels at the 0.05 level of significance when C has a value from 0.00 to k-1 for experimental conditions of n = 3, 6, 12, and 24, and k = 3, 4, 5, 6, and 7.

Mathematical Model of Tukey's HSD Statistic

Tukey's HSD statistic was mathematically defined by Kirk (1968, p. 88) as

    HSD = q(α,v) * sqrt(MS_error / n)     (1)

where

HSD = the value to be exceeded for a comparison involving two means to be declared significant;

q(α,v) = the value determined by entering a table for the percentage points of the studentized range with v degrees of freedom corresponding to the MS error term degrees of freedom, α level of significance, and the number of treatment groups in the experiment or range of levels in the experiment;

MS_error = an estimate taken from the one-way analysis of variance mean square within groups of the experiment;

n = the sample size of each group.
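Equation (1) can be evaluated numerically. The sketch below is mine, not the dissertation's: it assumes a balanced one-way design with v = k(n-1) error degrees of freedom and uses SciPy's studentized-range distribution in place of the printed percentage-point tables the author would have consulted.

```python
import math
from scipy.stats import studentized_range

def tukey_hsd(ms_error, n, k, alpha=0.05):
    """Critical difference from equation (1): HSD = q(alpha, v) * sqrt(MS_error / n),
    with v = k*(n - 1) error degrees of freedom for a balanced one-way ANOVA."""
    v = k * (n - 1)
    # Percentage point of the studentized range for k means and v df.
    q = studentized_range.ppf(1 - alpha, k, v)
    return q * math.sqrt(ms_error / n)
```

For example, `tukey_hsd(ms_error=2.0, n=6, k=3)` gives the difference two group means must exceed to be declared significant at the 0.05 level; holding n and MS_error fixed, the critical difference grows with k, which is how the procedure keeps the familywise error rate at α.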
If the difference between two groups exceeded the HSD value, then the results were declared significant at the given α level.

Definition of Terms

Actual Significance Level. The percentage of computed statistical values which exceed the tabled value of the statistic in an empirical investigation.

Coefficient of Variance Variation, C. The degree of heterogeneity present in an experimental paradigm as indexed by a coefficient of variance variation, C, in the formula

    C = sqrt[ (1/k) * Σ(t=1 to k) (σ_t² − σ̄²)² ] / σ̄²     (2)

where σ_t² is the variance of the t-th population and σ̄² is the mean of the k population variances.

Familywise Error Rate. An error rate that is the ratio of the number of families with at least one statement (comparison) falsely declared significant to the total number of families.

Monte Carlo Simulation. A procedure in which random samples are drawn from populations having specified parameters, and then a given statistic is calculated.

Nominal Significance Level. The percentage of computed statistical values which exceed the tabled value of the statistic for the theoretical distribution.
Pseudorandom Numbers. Pseudorandom numbers are "pseudo" since once the generating sequence is begun, each number is precisely determined by the preceding number. Pseudorandom numbers have the basic properties of randomness, which makes them quite usable in simulation studies (Lehman and Bailey, 1968). Hereafter in this study, pseudorandom numbers are referred to as random numbers.

Robust. When a violation of an assumption underlying a statistical model does not seriously affect the result, then that statistical model is said to be robust.

Significant Difference between Nominal and Actual Significance Levels. An actual significance level which fails to fall within a 95 per cent confidence interval about the nominal significance level is said to be statistically different from the nominal significance level.

Limitations

This study was subject to experimental limitations due to experimental conditions simulated with the following conditions:

1. A selected number (3 to 7) of treatment groups was considered.
2. Selected equal sample sizes were employed, varying from three to twenty-four.
3. Degrees of variance heterogeneity were selected, ranging from 0.00 to a possible maximum of k-1.
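The coefficient C defined above can be computed directly from a set of population variances. The following is a minimal sketch based on the formula as reconstructed in equation (2); the function name is mine, not the dissertation's.

```python
import math

def variance_variation(variances):
    """Box's coefficient of variance variation, per equation (2):
    C = sqrt((1/k) * sum((var_t - mean_var)**2)) / mean_var."""
    k = len(variances)
    mean_var = sum(variances) / k
    spread = sum((v - mean_var) ** 2 for v in variances) / k
    return math.sqrt(spread) / mean_var

# Homogeneous variances give C = 0; heterogeneity raises C.
print(variance_variation([1.0, 1.0, 1.0]))  # 0.0
print(variance_variation([1.0, 1.0, 4.0]))
```

C is the coefficient of variation of the k population variances, so it is zero exactly when the homogeneity assumption holds and grows as the variances spread apart, which is why it serves as the study's index of violation severity.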
CHAPTER BIBLIOGRAPHY

Box, G. E. P. Some theorems on quadratic forms applied in the study of analysis of variance problems. I. Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics, 1954, 25.

Games, Paul A. Multiple comparisons of means. American Educational Research Journal, 1971, 8(3).

Glass, G. V., Peckham, P. D., and Sanders, J. R. Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance. Review of Educational Research, 1972, 42(3).

Keselman, H. J., and Toothaker, L. E. Comparison of Tukey's T-method and Scheffé's S-method for various numbers of all possible differences of averages contrast under violation of assumptions. Educational and Psychological Measurement, 1974, 34.

Kirk, Roger E. Experimental design: procedures for the behavioral sciences. Belmont, California: Brooks/Cole Publishing Company, 1968.

Lehman, R., and Bailey, D. E. Digital computing: FORTRAN IV and its applications to the behavioral sciences. New York: John Wiley and Sons, 1968.
Marascuilo, L. A. Statistical methods for behavioral science research. New York: McGraw-Hill, 1971.

Petrinovich, L. R., and Hardyck, C. D. Error rates for multiple comparison methods: some evidence concerning the frequency of erroneous conclusions. Psychological Bulletin, 1969, 71.

Rogan, J. C., Keselman, H. J., and Breen, L. J. Assumption violations and rates of Type I error for the Tukey multiple comparison test: a review and empirical investigation via a coefficient of variance variation. Journal of Experimental Education, 1977, 46(1), 20-25.
CHAPTER II

SURVEY OF RELATED RESEARCH

The effects of violating the assumptions underlying the fixed-effects analysis of variance (ANOVA) on Type I error rate have been of great concern to educational researchers and statisticians since before 1930 (Pearson, 1929). For the most part, the major effects of violation of assumptions underlying ANOVA are now quite well known. Concern about whether or not ANOVA assumptions are satisfied is not unfounded. Assumptions of most mathematical models are almost always false to some extent. The important question to be asked is not whether these assumptions have been exactly met, but whether violations of these assumptions have had any serious effects on the probability statements that have been formulated based on the standard assumptions.

Applied statistics in education and the social sciences experienced a largely unnecessary hegira to non-parametric statistics during the 1950s. Increasingly during the 1950s and early 1960s the fixed effects, normal theory ANOVA was replaced by such comparable nonparametric techniques as the Wilcoxon test, Mann-Whitney U-test, Kruskal-Wallis one-way ANOVA, and the Friedman two-way ANOVA for ranks
[Siegel, 1956]. The change to non-parametrics was unnecessary primarily because researchers asked, 'Are normal theory ANOVA assumptions met?' instead of 'How important are the inevitable violations of normal theory ANOVA assumptions?' (Glass, Peckham, and Sanders, 1972, p. 237).

The following assumptions were made for the simple one-way fixed effects model ANOVA in this study:

    1. X_ij = μ + τ_j + e_ij     (3)
    2. e_ij ~ NID(0, σ²)         (4)
    3. Σ_j τ_j = 0               (5)

The first assumption was that of additivity. Any observation was taken to be the simple sum of three components: first, μ, the population mean; second, τ_j, the effect of treatment j on the dependent variable for all of the observations in group j; and third, e_ij, the error of the i,jth observation. The second assumption was that the e_ij's have a normal distribution with a population mean of zero and a variance of σ² and that they were independent. According to Glass, Peckham, and Sanders (1972), the third assumption need be of little concern; it is merely a consequence of choosing to express X_ij in three terms (μ, τ_j, e_ij) instead of two, for example, μ_j = μ + τ_j and e_ij.
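The model in assumptions (3)-(5) can be made concrete by generating data from it. The sketch below is illustrative only (names and parameter values are mine): it draws an n-by-k matrix of observations X_ij = μ + τ_j + e_ij, with normally distributed, independent errors and treatment effects that sum to zero.

```python
import numpy as np

def simulate_oneway(mu, taus, sigma, n, seed=0):
    """Generate an n-by-k matrix under the fixed-effects model
    X_ij = mu + tau_j + e_ij, with e_ij ~ NID(0, sigma^2).
    `taus` must sum to zero, per assumption (5)."""
    assert abs(sum(taus)) < 1e-12  # enforce the side condition on treatment effects
    rng = np.random.default_rng(seed)
    k = len(taus)
    errors = rng.normal(0.0, sigma, size=(n, k))  # the e_ij terms
    return mu + np.asarray(taus) + errors         # broadcast tau_j across rows

data = simulate_oneway(mu=50.0, taus=[-2.0, 0.0, 2.0], sigma=1.0, n=1000)
```

With a large n, each column mean approaches μ + τ_j and the grand mean approaches μ, which is the additivity assumption at work; the heterogeneity studied in this dissertation amounts to replacing the single σ with a different σ_j per group.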
Three different violations of assumptions have been considered in the past: (a) non-normality, (b) different variances for different groups, and (c) non-independence. The thrust of this study was to investigate the (b) violation, i.e., heterogeneity of variances.

Hsu (1938) was one of the first to obtain concise mathematical results in the study of the effects of heterogeneous variances. Hsu determined the actual significance level of a result tested at the 0.05 level for different values of the ratio of σ₁² to σ₂² in a two-tailed t test. Scheffé (1959) and Pratt (1964) addressed the same problem. Box (1954) studied the effect on alpha level of heterogeneous variances in the one-way ANOVA. One example of Box's findings was that if three treatments were compared with n₁ = 9, n₂ = 5, and n₃ = 1, and the population variances were in the ratio of 1:1:3, the probability of a Type I error was actually 0.17 when the experimenter would expect it to be 0.05. Of particular interest for this study was that Box's results agreed quite closely with those of Hsu. When n's were equal, the actual and the nominal significance levels agreed quite closely. Also of special interest for this study was the finding that with seven groups of n=3 (equal n's) and a variance ratio of 1:1:1:1:1:1:7, Box found an actual significance level of 0.12 when the nominal significance level of 0.05 was expected.
One of the most significant and comprehensive studies was made by Dee W. Norton at the State University of Iowa in 1952 (Lindquist, 1956). From Norton's investigations, it appeared that marked heterogeneity of variance has a small but real effect on the form of the F-distribution. Kohr and Games (1974) indicated that the F-test was robust with regard to heterogeneity of variance with equal sample sizes but more susceptible to error when sample sizes are unequal. The F-test and the analysis of variance have been investigated (Atiqullah, 1962; Norton, 1952 [found in Lindquist]; Pearson, 1931; and Scheffé, 1959), with the conclusion that they have a high degree of robustness. The result of robustness for the analysis of variance has precipitated similar questions concerning assumptions underlying multiple comparison procedures.

Hypotheses about mean differences from a set of k means (k > 2) may provide a situation that requires the use of some multiple comparison technique. According to Kirk (1968), the analysis of variance is equivalent to a simultaneous test of the hypothesis that all possible comparisons among means are equal to zero.... If an over-all test of significance using an F-ratio is significant, an experimenter can be certain that some set of orthogonal comparisons contains at least one significant comparison among means.... It remains for an experimenter to carry out follow-up tests [multiple comparisons] to determine what has happened (p. 73).
One solution for determining the location of a significant difference was to use multiple t tests; but according to Games (1971), although this procedure has been found to be powerful, it allowed the familywise (FWI) error rate to increase as the number of t tests increased, sometimes resulting in an unacceptable error rate. In order to locate significant differences in means without producing a high FWI rate, other multiple comparison procedures have been developed. Games (1971) reported twelve different multiple comparison procedures. Games discussed and compared the multiple t test, Scheffé's test, the least significant difference test, Bonferroni's t statistic, Tukey's procedure, and Dunnett's test. Games also reviewed sequential multiple comparison techniques, including the Newman-Keuls test and Duncan's multiple range test.

Evidence that multiple comparison procedures are controversial topics in statistics was presented by Petrinovich and Hardyck (1969) when they stated that

Textbook authors at least in the area of psychological statistics have not been particularly helpful. Authors such as Edwards [1960], Federer [1955], Hays [1963], McNemar [1952], and Winer [1962] either offer no evaluation as to which method is preferable, or preface their remarks with a cautionary statement to the effect that mathematical statisticians are not entirely in agreement concerning the preferred
method. Similarly, disagreement exists as to when these methods may be used. Some discussions state that a significant F ratio over all conditions must be obtained before multiple comparison methods can be used; other discussions make no mention of such a requirement, or deny that it is necessary at all (p. 44).

Hopkins and Chadbourn (1967) [found in Games, 1971, p. 559] suggested that the overall F-test be routinely run first; then, if it is found to be significant, a multiple comparison procedure should follow. According to them, this second stage should be the Bonferroni t procedure, the Newman-Keuls, the Tukey wholly significant difference test (WSD), or the Scheffé, depending on certain factors. According to Games (1971),

There seems to be little point in applying the overall F-test prior to running C contrasts by procedures that set P(EI>0) ≤ alpha (method 3 and the Bonferroni t's). If the C contrasts express the experimental interests directly, they are justified whether the overall F is significant or not and P(EI>0) is still controlled. The Newman-Keuls and WSD also control P(EI>0), so do not need a significant F to justify them (p. 560).

Here, Games used the symbols P(EI>0) to represent the familywise risk of Type I error. The familywise rate was the risk of making one or more Type I errors in the entire set of contrasts that comprise a family.
Tukey's multiple comparison test has been a frequently cited procedure when the researcher's multiple comparison hypotheses are for pairwise differences (Games, 1971; Keselman and Toothaker, 1974; Marascuilo, 1971). Evidence of interest in Tukey's HSD procedure has been presented in papers published in education, psychology, and statistics journals (Howell and Games, 1973a, 1973b; Keselman, Murray, and Rogan, 1976; Keselman, Toothaker, and Shooter, 1975; Petrinovich and Hardyck, 1969; Steel and Torrie, 1966). For the most part, these papers have investigated the effects of the violation of the assumptions under which the Tukey test was derived. The importance of these studies has been related to the validity of the use of the Tukey test in actual educational situations, because these actual educational situations seldom, if ever, meet the assumptions under which the Tukey test was developed.

Just as in the case of the ANOVA F-test, Tukey's HSD test was derived under the assumptions that the observations of each of the populations under study are independently and normally distributed with equal variances. Further, Tukey's HSD method was derived under the restriction that the variances of the sample means be equal; hence, each sample mean must be based on an equal number of observations. When the requirement of equal sample sizes cannot be met, several unequal n forms of the Tukey procedure have been suggested. Winer (1962, p. 101) suggested that the estimated variance, S²/n, should be replaced with the average of the
variances of the means when sample sizes do not differ a great deal. Steel and Torrie (1966, p. 114) suggested the use of the Kramer method with the Tukey test. Kramer's method employs only the sample sizes of the means actually involved in the simple contrast. Miller (1966, p. 48) suggested the use of an average or median value of the group sizes as an approximate value of n. Smith (1971) compared Kramer's method, Winer's harmonic mean, and Miller's unequal n forms of the Tukey test for unequal sample sizes under conditions of homogeneous population variances. Smith recommended the use of the Kramer method. Keselman, Murray, and Rogan (1976) reported that the Tukey test did not have to be restricted to comparisons having equal n's. They recommended Kramer's unequal n procedure. Howell and Games (1973) investigated the robustness of the harmonic mean form of the Tukey test under conditions of unequal sample sizes coupled with various patterns of population variance heterogeneity. They found that when the smallest sample size was selected from the population with the smallest variance, and the largest sample size was selected from the population with the largest variance, the Tukey test was conservative; i.e., the empirical significance level was less than the nominal significance level. When the smallest sample size was sampled from the population with the largest variance, and the largest sample size was sampled from the population with the smallest variance, the Tukey test was found to be liberal; i.e., the empirical significance level was found to
be greater than the nominal significance level. Petrinovich and Hardyck (1969) and Keselman and Toothaker (1974) examined the robustness of the harmonic mean form of the Tukey test and reported results similar to Howell and Games (1973). Also, the Tukey test was found to be robust to conditions of non-normality. Ramseyer and Tcheng (1973) investigated three different multiple comparison procedures that make use of the studentized range statistic, q. The procedures they studied were the Tukey HSD test, the Newman-Keuls test, and the Duncan multiple range test. They studied the effect of assumption violations on the Type I error rate of these three procedures. In Ramseyer and Tcheng's investigation, homogeneity of variance was violated with variance ratios of (a) 1:1:2 [k=3], (b) 1:1:4 [k=3], (c) 1:1:1:2:2 [k=5], and (d) 1:1:1:4:4 [k=5]. Normality was violated with populations that were positively and negatively exponentially skewed and rectangularly distributed. A combination of the violation of the normality assumption and the homogeneous variance assumption was also studied. Ramseyer and Tcheng concluded that q is robust to the violation of homogeneity of variance and normality. They also reported that violation of normality produced Type I error rates lower than nominal levels. Carmer and Swanson (1978) used computer simulation techniques to study the Type I and Type III error rates for ten pairwise multiple comparison procedures, including the Tukey
statistic. Their results indicated that Scheffe's test, Tukey's test, and Newman-Keuls' test were less appropriate than a restricted least-significant-difference (LSD) test, some Bayesian modifications of the LSD, and Duncan's multiple range test. Carmer and Swanson (1978) stated that the inferiority of Scheffe's test, Tukey's test, and Student-Newman-Keuls' test was even more apparent with sets of ten and twenty treatments. This was, according to them, due to the critical values of these procedures being dependent on the number of treatments. Keselman, Toothaker, and Shooter (1975) studied the harmonic mean and the Kramer unequal n forms of the Tukey HSD statistic. In their study, unequal sample sizes and unequal variances were combined in varied patterns that included normal and skewed population shapes and population variances in the ratios of (a) 1:1:4:4, (b) 1:1:1:2, (c) 1:.5:.5:4, and (d) 1:2:3:4. Their findings indicated a close agreement between the two unequal n forms of the Tukey statistic. Both methods were adversely affected when unequal sample sizes were combined with unequal variances in the way reported by Howell and Games (1973), Petrinovich and Hardyck (1969), and Keselman and Toothaker (1974). Keselman and Rogan (1978) investigated five modifications of Tukey's statistic and compared them with Scheffe's test in controlling Type I errors and sensitivity to unequal sample sizes, variance heterogeneity, and sampling from non-normal
populations. They utilized a coefficient of variance variation to index the degree of variance heterogeneity. All of their investigations used k=4 groups, and sample sizes varied from a low of sixteen to a high of eighty-nine. Keselman and Rogan reported that a Games and Howell (1976) modification of Tukey's test controlled the Type I error rate at or below the nominal level for all conditions they investigated. Keselman and Rogan selected values of 0.0, 0.40, 0.80, and 1.00 for values of C (Keselman and Rogan's index of variance variation was C, not C²), since they felt this selection of C values represented those likely to be encountered in actual research. Based on the results of their investigation, Keselman and Rogan recommended the Games and Howell modification of the Tukey multiple comparison test for pairwise comparisons of means. According to Winer (1971, p. 198), there are two popular versions of the Tukey multiple comparison procedure. Winer labeled the more popular of the two procedures as Tukey A. The Tukey A procedure has frequently been labeled Tukey's Honestly Significant Difference test (Winer, 1971; Kirk, 1968; Games, 1971). Tukey A has also been known as the T-Method (Glass and Stanley, 1970; Scheffe, 1959) and the WSD test (Games, 1971). Apparently, Games and Kirk do not agree that the WSD test and the HSD test are one and the same, because Kirk states "The WSD test merits consideration but is more complex than the HSD test" (1968, p. 90). Therefore, Kirk has indicated
that the HSD and the WSD are two different procedures. The form of the statistic utilized in this investigation is that found in Kirk (1968, p. 88):

HSD = q(α; k, v) √(MS_error / n)     (6)

HSD is the value that must be exceeded in order for a comparison involving two means to be declared significant. The value of q(α; k, v) is determined by entering a table of the studentized range distribution with v degrees of freedom, corresponding to the MS_error degrees of freedom, and a level of significance. Another factor that determines q is the number of treatment levels in the experiment. MS_error is an estimate taken from the one-way analysis of variance mean square within groups of the experiment. Group sample size is designated by n, and the number of treatment levels is designated by k. Tukey's Honestly Significant Difference (HSD) test was designed to make all pairwise comparisons among means (Kirk, 1968). According to Winer (1971), in 1953 Tukey extended an approach originally suggested by Fisher to control the familywise Type I error rate. It was this procedure that has been called the HSD test. The basic assumptions of the HSD test are normality, homogeneity of variance, randomization, and equal sample sizes (Kirk, 1968, p. 88). Ryan (1959) introduced two general issues involving multiple comparisons. These were a priori versus a posteriori
comparisons and the concept of error rate. According to Ryan, an a priori test is one in which "the experimenter states in advance all possible conclusions and the rules by which these conclusions will be drawn" (p. 38). A posteriori tests are those which are suggested by the data. These types of tests have been known as data snooping or as post-mortem comparisons. Ryan indicated that there were several types of error rates, but Kirk (1968) has defined six kinds of error rates: (a) error rate per comparison, (b) error rate per hypothesis, (c) error rate per experiment, (d) error rate experimentwise, (e) error rate per family, and (f) error rate familywise. "It should be noted that the various error rates are all identical for an experiment involving a single comparison. The error rates become more divergent as the number of comparisons and hypotheses evaluated in an experiment are increased" (Kirk, 1968, p. 83). The error rate conceptualized for the HSD test was "familywise." In the one-dimensional case, "per family" and "per experiment," and "familywise" and "experimentwise," are equivalent terms (Ryan, 1959). Therefore, in the one-way analysis of variance, Tukey's terms "family" and "familywise" took on the simpler definitions of "experiment" and "experimentwise." According to Kirk (1968), error rate per experiment (i.e., per family in the one-dimensional case) was defined as (p. 84)

(number of comparisons falsely declared significant) / (total number of experiments).
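As a minimal sketch of how the HSD criterion of equation (6) is applied, the fragment below computes HSD from a tabled studentized range value and flags every pairwise difference that exceeds it. The function names and the numerical inputs (including the tabled q of 3.51 for alpha = .05, k = 3, and v = 27) are illustrative assumptions, not values from this study.

```python
import math
from itertools import combinations

def tukey_hsd(q_crit, ms_error, n):
    """Equation (6): HSD = q * sqrt(MS_error / n).  q_crit must be read
    from a studentized range table for the chosen alpha, the number of
    treatment levels k, and the MS_error degrees of freedom v."""
    return q_crit * math.sqrt(ms_error / n)

def pairwise_decisions(means, q_crit, ms_error, n):
    """Declare a pair of group means significantly different when the
    absolute difference exceeds the single HSD criterion."""
    hsd = tukey_hsd(q_crit, ms_error, n)
    return {(i, j): abs(means[i] - means[j]) > hsd
            for i, j in combinations(range(len(means)), 2)}

# Hypothetical numbers: k = 3 groups of n = 10, MS_error = 4.0, and
# q_crit = 3.51 (approximate tabled value for alpha = .05, k = 3, v = 27).
crit = tukey_hsd(3.51, 4.0, 10)                       # about 2.22
flags = pairwise_decisions([5.0, 6.0, 8.1], 3.51, 4.0, 10)
```

Note that one criterion serves every pair, which is what makes the HSD procedure control the familywise rather than the per-comparison error rate.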
Error rate experimentwise (familywise in the one-dimensional case) was defined as (p. 84)

(number of experiments with at least one statement falsely declared significant) / (total number of experiments).

Kirk concluded that... it should be observed that once an experimenter has specified an error rate and has decided on an appropriate conceptual unit for error rate, he can compute the corresponding rate for any other conceptual unit. Basically, the problem facing an experimenter is that of choosing, prior to the conduct of an experiment, a test statistic that provides the kind of protection desired (p. 86). Much research has been conducted on the robustness of the F-test and multiple comparison procedures under violation of the assumption of homogeneity of variance. For the most part, the research supports the theory that when sample sizes are equal, the F-test and multiple comparison procedures are robust. According to Box (1954), "It appears that if the groups are equal, moderate inequality of variance does not seriously affect the test" (p. 98). However, "moderate inequality of variance" was not specifically defined. Box's results under extreme conditions (k=7, equal n's, variance ratio = 1:1:1:1:1:1:7, and nominal alpha = 0.05, empirical alpha = 0.12) indicated that the question has not been
fully investigated. Therefore, the focus of this study was to investigate the robustness of the Tukey HSD procedure under conditions of equal sample size and heterogeneous variances. In 1972, Glass, Peckham, and Sanders stated the following: Whatever the cause, we find it significant to note that subsequent investigators have not extended Box's work in the direction of this curious finding. The conventional conclusion that heterogeneous variances are not important when n's are equal seems to have boundary conditions like all other conclusions in this area, and the boundary conditions may have not been sufficiently probed (p. 45). In 1977, Rogan, Keselman, and Breen reported: Of special interest was the finding that large degrees of variance heterogeneity produced liberal Type I error rates even in the presence of equal sample sizes. Although Box found serious distortions in the Type I error of the ANOVA F-test under similar conditions, this finding is contrary to the conventional conclusion that heterogeneous variances are not important when sample sizes are equal. The authors agree with Glass, Peckham, and Sanders in that this conclusion regarding the role of unequal variances in combination with equal sample sizes appears to have boundary conditions which have not been sufficiently probed. The data
from this investigation suggests that the degree of variance heterogeneity may play a role in determining these boundary conditions (p. 5). Box (1954) developed a method for indexing the degree of heterogeneity by a coefficient of variance variation, symbolized by C, where

C = Σ_{t=1}^{k} v_t (σ_t² − σ̄²)² / [(N − k)(σ̄²)²]     (7)

and σ̄² = Σ v_t σ_t² / (N − k) is the weighted mean of the k variances, v_t = n_t − 1 represents the degrees of freedom associated with each of the k variances, k represents the number of treatment groups, and N represents the total number of observations. Rogan, Keselman, and Breen (1977) demonstrated that very different ratios of unequal variances and unequal sample sizes may be identical with respect to their degree of heterogeneity, or the C value. For example, consider the two sets of sample sizes and variances presented on the following page.
                 Case A                          Case B
n_k              24, 32, 36, 40, 48, 60          6, 12, 14, 16, 24, 48
σ_t²             .05, .20, .35, .50, .60, .80    .64, .64, .64, .64, .64, 2.79
σ² ratios        1:4:7:10:12:16                  1:1:1:1:1:4.35

Both of the above cases involve very different ratios of unequal variances yet are similar with respect to their degree of heterogeneity as indexed by Box's coefficient of variance variation. Though different ratios of variances have been manipulated in other studies, the degree of variance heterogeneity may in some cases not have been varied. Also, a simpler form of the C equation for equal n's was derived by Box (1954):

C = (1/k) Σ_{t=1}^{k} (σ_t² − σ̄²)² / (σ̄²)²

C is the variance of the variances divided by the square of the mean variance. If the variances range from a lower value σ² to an upper value aσ² (where a is a coefficient of σ² and
a>1), then the largest possible value for C is attained when k−1 of the variances are equal to σ² and the remaining variance is equal to aσ². In this case,

C = (k−1)(a−1)² / (a−1+k)²     (8)

Values of C greater than one, or at most two, probably would be extremely rare in reality (Box, 1954). This study was limited to values of C less than k−1. Tamhane (1979) used Box's coefficient of variation as a measure of unbalance in the values of

var(x̄_i) = τ_i² = σ_i²/n_i     (i = 1, ..., k)     (9)

Tamhane indicated that although Keselman and Rogan (1978) had used this index for measuring variance variation, he believed that τ² was a more relevant parameter in his study than was σ². The purpose of this study was to further investigate the question regarding the effects of variance heterogeneity and equal sample sizes by utilizing Box's coefficient of variance variation to index heterogeneity, in order to determine whether boundary conditions existed where the Tukey HSD procedure was no longer robust. Results of this investigation should provide researchers in the behavioral sciences with additional information regarding the proper use of the Tukey HSD statistic.
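Box's equal-n coefficient and its maximum value can be checked numerically. The sketch below is illustrative and not from the original study; equation (8) is implemented in the squared form implied by the equal-n formula. For Box's extreme case of k=7 groups with variance ratio 1:1:1:1:1:1:7 (so a=7), the direct computation and the maximum-value formula agree at C = 216/169, about 1.28.

```python
def box_c_equal_n(variances):
    """Equal-n form of Box's coefficient C: the variance of the k group
    variances divided by the square of their mean."""
    k = len(variances)
    mean_var = sum(variances) / k
    return sum((v - mean_var) ** 2 for v in variances) / k / mean_var ** 2

def box_c_max(k, a):
    """Largest possible C when k-1 variances equal s2 and one equals a*s2,
    reconstructed from equation (8) as (k-1)(a-1)^2 / (a-1+k)^2."""
    return (k - 1) * (a - 1) ** 2 / (a - 1 + k) ** 2

# Box's extreme case cited above: k = 7, variance ratio 1:1:1:1:1:1:7.
c = box_c_equal_n([1, 1, 1, 1, 1, 1, 7])   # 216/169, about 1.28
```

Since this C exceeds one, the pattern sits at the outer edge of the range Box considered realistic, which is consistent with the text's remark that values of C above one or two would be rare in practice.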
CHAPTER BIBLIOGRAPHY
Atiqullah, M. The robustness of the covariance analysis of a one-way classification. Biometrika, 1964, 51.
Box, G. E. P. Some theorems on quadratic forms applied in the study of variance problems. I. Effect of inequality of variances in the one-way classification. Annals of Mathematical Statistics, 1954, 25.
Carmer, S. G., and Swanson, M. R. An evaluation of ten pairwise multiple comparison procedures by Monte Carlo methods. Journal of the American Statistical Association, 1978, 73.
Games, P. A. Multiple comparisons of means. American Educational Research Journal, 1971, 8(3).
Games, P. A. Inverse relation between the risks of Type I and Type II errors and suggestions for the unequal n case in multiple comparisons. Psychological Bulletin, 1971, 75(2).
Games, P. A., and Howell, J. F. Pairwise multiple comparison procedures with unequal n's and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1976, 1.
Glass, G. V., Peckham, P. D., and Sanders, J. R. Consequences of failure to meet assumptions underlying the fixed
effects analysis of variance and covariance. Review of Educational Research, 1972, 42(3).
Glass, G. V., and Stanley, J. C. Statistical methods in education and psychology. Englewood Cliffs, N. J.: Prentice-Hall, 1970.
Howell, J. F., and Games, P. A. The effects of variance heterogeneity on simultaneous multiple comparison procedures with equal sample size. Paper presented at the American Educational Research Association Convention, February, 1973 (ERIC document ED ). (a)
Howell, J. F., and Games, P. A. The robustness of the analysis of variance and the Tukey WSD test under various patterns of heterogeneous variances. Journal of Experimental Education, 1973, 41(4). (b)
Hsu, P. L. Contributions to the theory of Student's t-test as applied to the problem of two samples. Statistical Research Memoirs, II, 1938, 1-24.
Keselman, H. J., Murray, R., and Rogan, J. Effect of very unequal group sizes on Tukey's multiple comparison test. Educational and Psychological Measurement, 1976, 36.
Keselman, H. J., and Rogan, J. C. A comparison of the modified Tukey and Scheffe methods of multiple comparisons for pairwise contrasts. Journal of the American Statistical Association, 1978, 73(361), 47-52.
More informationIntroduction. Chapter 8
Chapter 8 Introduction In general, a researcher wants to compare one treatment against another. The analysis of variance (ANOVA) is a general test for comparing treatment means. When the null hypothesis
More informationChapter 6 Planned Contrasts and Post-hoc Tests for one-way ANOVA
Chapter 6 Planned Contrasts and Post-hoc Tests for one-way NOV Page. The Problem of Multiple Comparisons 6-. Types of Type Error Rates 6-. Planned contrasts vs. Post hoc Contrasts 6-7 4. Planned Contrasts
More informationCOMPARING SEVERAL MEANS: ANOVA
LAST UPDATED: November 15, 2012 COMPARING SEVERAL MEANS: ANOVA Objectives 2 Basic principles of ANOVA Equations underlying one-way ANOVA Doing a one-way ANOVA in R Following up an ANOVA: Planned contrasts/comparisons
More informationINTRODUCTION TO INTERSECTION-UNION TESTS
INTRODUCTION TO INTERSECTION-UNION TESTS Jimmy A. Doi, Cal Poly State University San Luis Obispo Department of Statistics (jdoi@calpoly.edu Key Words: Intersection-Union Tests; Multiple Comparisons; Acceptance
More information3 Joint Distributions 71
2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationMultiple Comparison Procedures for Trimmed Means. H.J. Keselman, Lisa M. Lix and Rhonda K. Kowalchuk. University of Manitoba
1 Multiple Comparison Procedures for Trimmed Means by H.J. Keselman, Lisa M. Lix and Rhonda K. Kowalchuk University of Manitoba Abstract Stepwise multiple comparison procedures (MCPs) based on least squares
More information1 One-way Analysis of Variance
1 One-way Analysis of Variance Suppose that a random sample of q individuals receives treatment T i, i = 1,,... p. Let Y ij be the response from the jth individual to be treated with the ith treatment
More informationAnalysis of variance (ANOVA) Comparing the means of more than two groups
Analysis of variance (ANOVA) Comparing the means of more than two groups Example: Cost of mating in male fruit flies Drosophila Treatments: place males with and without unmated (virgin) females Five treatments
More informationUnit 14: Nonparametric Statistical Methods
Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based
More informationIncreasing Power in Paired-Samples Designs. by Correcting the Student t Statistic for Correlation. Donald W. Zimmerman. Carleton University
Power in Paired-Samples Designs Running head: POWER IN PAIRED-SAMPLES DESIGNS Increasing Power in Paired-Samples Designs by Correcting the Student t Statistic for Correlation Donald W. Zimmerman Carleton
More informationChapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics
Section 15.1: An Overview of Nonparametric Statistics Understand Difference between Parametric and Nonparametric Statistical Procedures Parametric statistical procedures inferential procedures that rely
More information4/6/16. Non-parametric Test. Overview. Stephen Opiyo. Distinguish Parametric and Nonparametric Test Procedures
Non-parametric Test Stephen Opiyo Overview Distinguish Parametric and Nonparametric Test Procedures Explain commonly used Nonparametric Test Procedures Perform Hypothesis Tests Using Nonparametric Procedures
More informationBasic Business Statistics, 10/e
Chapter 1 1-1 Basic Business Statistics 11 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Basic Business Statistics, 11e 009 Prentice-Hall, Inc. Chap 1-1 Learning Objectives In this chapter,
More informationAn Overview of the Performance of Four Alternatives to Hotelling's T Square
fi~hjf~~ G 1992, m-t~, 11o-114 Educational Research Journal 1992, Vol.7, pp. 110-114 An Overview of the Performance of Four Alternatives to Hotelling's T Square LIN Wen-ying The Chinese University of Hong
More informationStatistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data
Statistics for Managers Using Microsoft Excel Chapter 10 ANOVA and Other C-Sample Tests With Numerical Data 1999 Prentice-Hall, Inc. Chap. 10-1 Chapter Topics The Completely Randomized Model: One-Factor
More informationOne-Way Analysis of Covariance (ANCOVA)
Chapter 225 One-Way Analysis of Covariance (ANCOVA) Introduction This procedure performs analysis of covariance (ANCOVA) with one group variable and one covariate. This procedure uses multiple regression
More information3. Nonparametric methods
3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests
More informationNAG Library Chapter Introduction. G08 Nonparametric Statistics
NAG Library Chapter Introduction G08 Nonparametric Statistics Contents 1 Scope of the Chapter.... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric Hypothesis Testing... 2 2.2 Types
More informationStatistics and Measurement Concepts with OpenStat
Statistics and Measurement Concepts with OpenStat William Miller Statistics and Measurement Concepts with OpenStat William Miller Urbandale, Iowa USA ISBN 978-1-4614-5742-8 ISBN 978-1-4614-5743-5 (ebook)
More informationAPPLICATION AND POWER OF PARAMETRIC CRITERIA FOR TESTING THE HOMOGENEITY OF VARIANCES. PART IV
DOI 10.1007/s11018-017-1213-4 Measurement Techniques, Vol. 60, No. 5, August, 2017 APPLICATION AND POWER OF PARAMETRIC CRITERIA FOR TESTING THE HOMOGENEITY OF VARIANCES. PART IV B. Yu. Lemeshko and T.
More informationhttp://www.statsoft.it/out.php?loc=http://www.statsoft.com/textbook/ Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences
More informationIntuitive Biostatistics: Choosing a statistical test
pagina 1 van 5 < BACK Intuitive Biostatistics: Choosing a statistical This is chapter 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc.
More informationMATH Notebook 3 Spring 2018
MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................
More informationINFLUENCE OF USING ALTERNATIVE MEANS ON TYPE-I ERROR RATE IN THE COMPARISON OF INDEPENDENT GROUPS ABSTRACT
Mirtagioğlu et al., The Journal of Animal & Plant Sciences, 4(): 04, Page: J. 344-349 Anim. Plant Sci. 4():04 ISSN: 08-708 INFLUENCE OF USING ALTERNATIVE MEANS ON TYPE-I ERROR RATE IN THE COMPARISON OF
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationTwo-Sample Inferential Statistics
The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is
More informationTHE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook
BIOMETRY THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH THIRD E D I T I O N Robert R. SOKAL and F. James ROHLF State University of New York at Stony Brook W. H. FREEMAN AND COMPANY New
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationHypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true
Hypothesis esting Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true Statistical Hypothesis: conjecture about a population parameter
More informationTEST POWER IN COMPARISON DIFFERENCE BETWEEN TWO INDEPENDENT PROPORTIONS
TEST POWER IN COMPARISON DIFFERENCE BETWEEN TWO INDEPENDENT PROPORTIONS Mehmet MENDES PhD, Associate Professor, Canakkale Onsekiz Mart University, Agriculture Faculty, Animal Science Department, Biometry
More informationChapter Fifteen. Frequency Distribution, Cross-Tabulation, and Hypothesis Testing
Chapter Fifteen Frequency Distribution, Cross-Tabulation, and Hypothesis Testing Copyright 2010 Pearson Education, Inc. publishing as Prentice Hall 15-1 Internet Usage Data Table 15.1 Respondent Sex Familiarity
More informationPsicológica ISSN: Universitat de València España
Psicológica ISSN: 0211-2159 psicologica@uv.es Universitat de València España Zimmerman, Donald W.; Zumbo, Bruno D. Hazards in Choosing Between Pooled and Separate- Variances t Tests Psicológica, vol. 30,
More informationAnalysis of Variance (ANOVA)
Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA
More informationPreface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of
Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures
More informationComparison of Two Samples
2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation
More informationNon-parametric (Distribution-free) approaches p188 CN
Week 1: Introduction to some nonparametric and computer intensive (re-sampling) approaches: the sign test, Wilcoxon tests and multi-sample extensions, Spearman s rank correlation; the Bootstrap. (ch14
More informationOutline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013
Topic 19 - Inference - Fall 2013 Outline Inference for Means Differences in cell means Contrasts Multiplicity Topic 19 2 The Cell Means Model Expressed numerically Y ij = µ i + ε ij where µ i is the theoretical
More informationContents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47
Contents 1 Non-parametric Tests 3 1.1 Introduction....................................... 3 1.2 Advantages of Non-parametric Tests......................... 4 1.3 Disadvantages of Non-parametric Tests........................
More informationDESIGN AND ANALYSIS OF EXPERIMENTS Third Edition
DESIGN AND ANALYSIS OF EXPERIMENTS Third Edition Douglas C. Montgomery ARIZONA STATE UNIVERSITY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore Contents Chapter 1. Introduction 1-1 What
More informationInferential Statistics
Inferential Statistics Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G Distribution free hypothesis tests 1. Classical and distribution-free
More information