Multiple Comparison Procedures for Trimmed Means. H.J. Keselman, Lisa M. Lix and Rhonda K. Kowalchuk. University of Manitoba


Abstract

Stepwise multiple comparison procedures (MCPs) based on least squares and trimmed estimators were compared for their rates of Type I error and their ability to detect true pairwise group differences. The MCPs were compared in unbalanced one-way completely randomized designs when normality and homogeneity of variance assumptions were violated. Results indicated that MCPs based on trimmed means and Winsorized variances controlled rates of Type I error, whereas MCPs based on least squares estimators typically could not, particularly when the data were highly skewed. However, MCPs based on least squares estimators were substantially more powerful than their counterparts based on trimmed means and Winsorized variances when the data were only moderately skewed, a finding which qualifies recommendations on the use of trimmed estimators offered in the literature.

Pairwise multiple comparison procedures (MCPs) are frequently adopted by educational and psychological researchers to detect specific differences among treatment groups. Not surprisingly, therefore, a substantial body of literature examines the validity of these procedures when their derivational assumptions are not satisfied (e.g., see Hochberg & Tamhane, 1987; Shaffer, 1994; Toothaker, 1991; Wilcox, 1993). The validity studies have predominantly examined the MCPs under conditions of variance heterogeneity, since it is well known that they are sensitive to departures from the homogeneity of variance assumption (e.g., see Toothaker, 1991). However, MCPs that use an F, t, or q statistic also depend on the normality assumption, which, unfortunately, is unlikely to be met with educational and psychological data. Recent surveys indicate that the data collected by educational and psychological researchers rarely, if ever, come from populations characterized by the normal density function (Micceri, 1989; Wilcox, 1990). Hence, the validity of statistical procedures that assume this underlying structure is seriously in question. Specifically, using the classical procedures (i.e., the analysis of variance [ANOVA] F-test, Student's t-test, the Studentized range q-test) when the data are nonnormal distorts the rates of Type I and Type II errors (or, equivalently, the power of the test), particularly when other derivational assumptions are also not satisfied. These results are predictable on theoretical grounds. Cressie and Whitford (1986) have shown that, unless population variances or group sizes are equal, Student's two-sample t test is not asymptotically correct when the group distributions have unequal third cumulants; Type I error inflation is therefore expected.
Reductions in the power to detect differences between groups occur because the usual population standard deviation ($\sigma$) is greatly influenced by the presence of extreme observations (outliers) in a distribution of scores. Consequently, the standard error of the mean, $\sigma/\sqrt{n}$ (where $n$ is the sample size), can become seriously inflated when the underlying distribution has heavy tails (Tukey, 1960; Wilcox, 1995a, 1995b). Thus, the values of the t and F statistics are smaller than they should be and power, accordingly, is depressed. The literature contains numerous potential solutions to circumvent the effects that these assumption violations have on the classical procedures for assessing mean equality (e.g., see Glass, Peckham & Sanders, 1972; Harwell, 1992; Lix, Keselman & Keselman, 1996; Wilcox, 1995a, 1995b). Of greatest interest are robust methods, which, as the name implies, are intended to provide valid tests of mean equality when assumptions are not satisfied. For example, Welch's (1951) test for mean equality in one-way completely randomized designs is intended to be insensitive to variance heterogeneity because it does not rely on a pooled estimate of error variability and because sample variances and sizes are used in estimating the error degrees of freedom (df). Welch's test, and others like it, however, cannot always effectively control the Type I and II error rates when variance heterogeneity occurs in combination with nonnormality, particularly when group sizes are unequal, that is, when the design is unbalanced (Coombs, Algina & Oltman, 1996; Lix et al., 1996; Oshima & Algina, 1992; Wilcox, 1990, 1995a, 1995b). Wilcox (1995a, 1995b), Yuen (1974), and Yuen and Dixon (1973), among others, have discussed another class of estimation and testing methods that are intended to be robust to the combined effects of nonnormality and variance heterogeneity. Specifically, these authors indicate that by using trimmed means and Winsorized variances with the usual robust tests, one can obtain tests that are insensitive to the combined effects of variance heterogeneity and nonnormality.
Consequently, a number of recent papers have discussed and examined methods involving trimmed means and Winsorized variances for testing omnibus hypotheses of treatment group equality (Lix & Keselman, 1996; Wilcox, 1995a, 1995b). While these methods have been shown to be highly effective, we caution that they should be adopted only if one is interested in testing for treatment effects across groups using a measure of location that more accurately reflects the typical score within a group when working with heavy-tailed distributions. As an illustration of how a trimmed mean may provide a better estimate of the typical score than the usual mean, consider the example given by Wilcox (1995a, p. 57), in which a single score in a chi-square distribution with four df (hence $\mu = 4$) is multiplied by 10 (with probability .1). Because this single deviant score shifts the distribution, the usual mean now equals 7.6, a value closer to the upper tail of the distribution. A trimmed mean based on censoring 20% of the data in each tail of the distribution, however, equals 4.2, a value that is closer to the bulk of scores and hence closer to the typical score in the distribution. Nonetheless, readers should note that the hypothesis tested when the usual mean is used as the estimate of location is not the same as that tested when the trimmed mean is employed, unless the distributions are symmetric (in the nonnormal conditions we investigated they were not symmetric). Consequently, we stress that researchers need to be clear about the goals of data analysis before choosing a particular method of statistical inference.

The number of papers discussing and examining MCPs for treatment group equality using trimmed means and Winsorized variances is, however, extremely limited (Dunnett, 1982; Wilcox, in press). The Dunnett paper compared a number of robust procedures, including a test of trimmed means, but did not examine the effects of sampling from skewed distributions, which, according to Micceri (1989) and Wilcox (1990), characterize behavioral science data. The Wilcox paper, on the other hand, did examine the combined effects of nonnormality and variance heterogeneity in a limited fashion, showing that tests using trimmed means and Winsorized variances can provide improved Type I error control and greater power when the normality and homogeneity assumptions are jointly violated.
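The contrast between the usual mean and the 20% trimmed mean discussed above can be seen on a small hypothetical data set. The sketch below is not Wilcox's example: the scores are invented, with one value (6) inflated tenfold to 60. It assumes $g = [\gamma n]$ observations are dropped from each tail, as defined later in the Robust Estimation section.

```python
import math
import statistics

def trimmed_mean(x, gamma=0.2):
    """gamma-trimmed mean: drop floor(gamma * n) order statistics from each tail."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(gamma * n + 1e-9)  # small guard against float round-off in gamma * n
    if n - 2 * g < 1:
        raise ValueError("trimming removes every observation")
    return sum(xs[g:n - g]) / (n - 2 * g)

# Hypothetical scores: nine moderate values plus one score inflated tenfold (6 -> 60).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 60]
print("mean:", statistics.mean(scores))                    # pulled toward the outlier
print("20% trimmed mean:", round(trimmed_mean(scores), 3))  # stays with the bulk
```

Here the mean is 9.6 while the trimmed mean is 25/6 (about 4.17), because the two smallest and two largest scores, including the inflated one, are discarded before averaging.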
In his investigation, Wilcox (in press) compared Dunnett's (1980) T3 method, a simultaneous MCP that uses one critical value (CV) to assess the statistical significance of a set of pairwise comparisons, with two stepwise MCPs for pairwise comparisons that use a succession of CVs in assessing statistical significance. One stepwise procedure was an analogue of Hayter's (1986) two-stage test and the second was a multiple range procedure that begins with an omnibus test (see Shaffer, 1979). The Type I error rates of the MCPs that used trimmed means and Winsorized variances were less affected by nonnormality and variance heterogeneity than those of the MCPs that did not incorporate trimming; in addition, these procedures were more powerful for detecting true differences in the population trimmed means. Wilcox's (in press) investigation was limited in scope, however: he examined only a few MCPs under a small number of variance heterogeneity conditions, and only in a four-group design. His work can be extended to the many other MCPs that can be used with trimmed means and Winsorized variances. Furthermore, additional conditions of variance heterogeneity need to be investigated, particularly for one-way layouts with varying numbers of groups, in order to gain a greater understanding of the effects of this method of robust estimation. Consequently, the purpose of this investigation was to compare numerous MCPs for pairwise comparisons of trimmed and nontrimmed means in one-way completely randomized designs under conditions of nonnormality and variance heterogeneity deemed characteristic of behavioral science research.

Definition of the MCPs

Suppose $n_j$ independent random observations $X_{1j}, X_{2j}, \ldots, X_{n_j j}$ are sampled from population $j$ ($j = 1, \ldots, J$). The classical procedures for testing the omnibus null hypothesis, $H_0: \mu_1 = \mu_2 = \cdots = \mu_J$, assume that the $X_{ij}$s are obtained from a normal population with mean $\mu_j$ and unknown variance $\sigma_j^2$, with $\sigma_j^2 = \sigma_{j'}^2$ ($j \neq j'$). Let $\bar{X}_j = \sum_i X_{ij}/n_j$ and $s_j^2 = \sum_i (X_{ij} - \bar{X}_j)^2/(n_j - 1)$, where $\bar{X}_j$ is the estimate of $\mu_j$ and $s_j^2$ is the usual unbiased estimate of the variance for population $j$. Further, let the standard error of the mean be denoted as $S_j = (s_j^2/n_j)^{1/2}$, and let $a_j = (1/S_j^2)/\left(\sum_j 1/S_j^2\right)$.
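As an illustration of how the quantities just defined feed the omnibus tests, the following is a minimal standard-library Python sketch of the Welch (1951) statistic and its df, given in Equations 1 and 2 of the next section. Note that $w_j = n_j/s_j^2 = 1/S_j^2$, so the weights $a_j$ are simply $w_j/\sum_j w_j$. The function name `welch_omnibus` is illustrative; the sketch assumes every group has at least two observations and a nonzero sample variance, and it stops at the statistic, since the critical value $F[(1-\alpha); \nu_1, \nu_W]$ would come from an F table or a routine such as SciPy's `scipy.stats.f.ppf`.

```python
import statistics

def welch_omnibus(groups):
    """Welch (1951) heteroscedastic F statistic with numerator and denominator df."""
    J = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    var = [statistics.variance(g) for g in groups]          # unbiased s_j^2 (divisor n_j - 1)
    w = [nj / vj for nj, vj in zip(n, var)]                 # w_j = n_j / s_j^2 = 1 / S_j^2
    U = sum(w)
    a = [wj / U for wj in w]                                # a_j, as defined above
    grand = sum(aj * mj for aj, mj in zip(a, means))        # variance-weighted grand mean
    numerator = sum(wj * (mj - grand) ** 2 for wj, mj in zip(w, means)) / (J - 1)
    lam = sum((1 - wj / U) ** 2 / (nj - 1) for wj, nj in zip(w, n))
    denominator = 1 + 2 * (J - 2) / (J ** 2 - 1) * lam
    return numerator / denominator, J - 1, (J ** 2 - 1) / (3 * lam)

# Three small hypothetical groups (not data from this study).
F, df1, df2 = welch_omnibus([[1, 2, 3], [2, 4, 6], [3, 4, 5]])
print(F, df1, df2)
```

For these hypothetical groups the statistic works out exactly to $F_W = 360/127 \approx 2.83$ on 2 and $72/19 \approx 3.79$ df.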

Omnibus Tests

The MCPs investigated in this study were stepwise procedures for pairwise comparisons; for some of these MCPs an omnibus test statistic is used as the first-stage test. The omnibus procedures used in the present investigation were the Welch (1951), James (1951) second-order, Box (1954), and Alexander and Govern (1994) tests, which are subsequently denoted by W, J, BOX, and AG, respectively. These tests were used in the current investigation because Lix and Keselman (1997) found that they typically provided good Type I error protection when computed with trimmed means and Winsorized variances, even when populations were nonnormal and variances were heterogeneous. The W omnibus test may be defined as

$$F_W = \frac{\sum_{j=1}^{J} w_j (\bar{X}_j - \tilde{X})^2/(J-1)}{1 + \dfrac{2(J-2)}{J^2-1}\sum_{j=1}^{J}\dfrac{(1 - w_j/\sum_j w_j)^2}{n_j - 1}}, \qquad (1)$$

where $w_j = n_j/s_j^2$ and $\tilde{X} = \sum_j w_j \bar{X}_j / \sum_j w_j$. The statistic is approximately distributed as an F variate and is referred to the CV $F[(1-\alpha); \nu_1, \nu_W]$, the $(1-\alpha)$ quantile of the F distribution with $\nu_1$ and $\nu_W$ df, where $\nu_1 = J - 1$ and

$$\nu_W = \frac{J^2 - 1}{3\sum_{j=1}^{J} (1 - w_j/\sum_j w_j)^2/(n_j - 1)}. \qquad (2)$$

The J second-order statistic can be defined as

$$J = \sum_{j=1}^{J} w_j (\bar{X}_j - \tilde{X})^2. \qquad (3)$$

This statistic is approximately distributed as a chi-square variate; see Wilcox (1996, p. 183) for an enumeration of the CV. The BOX test was used in this investigation because Rubin (1983) found that, compared to other robust omnibus tests, it provided better Type I error protection. That is, in addition to demonstrating that the robust procedure given by Brown and Forsythe (1974) is not asymptotically correct, Rubin found that a better test for mean equality could be obtained by incorporating Box's procedure of adopting a corrected numerator df, as well as the usual denominator df correction given by Brown and Forsythe. This statistic ($F^*$) is defined as

$$F^* = \frac{\sum_{j=1}^{J} n_j (\bar{X}_j - \bar{X})^2}{\sum_{j=1}^{J} (1 - n_j/N) s_j^2}, \qquad (4)$$

where $\bar{X} = \sum_j n_j \bar{X}_j / N$ and $N = \sum_j n_j$. According to Box (1954), $F^*$ is approximately distributed as an F variable with $\nu_1'$ and $\nu_2'$ df, where

$$\nu_1' = \frac{\left[\sum_{j=1}^{J} (1 - f_j) s_j^2\right]^2}{\left(\sum_{j=1}^{J} s_j^2 f_j\right)^2 + \sum_{j=1}^{J} s_j^4 (1 - 2 f_j)}, \qquad (5)$$

$$\nu_2' = \frac{\left[\sum_{j=1}^{J} (1 - f_j) s_j^2\right]^2}{\sum_{j=1}^{J} s_j^4 (1 - f_j)^2/(n_j - 1)}, \qquad (6)$$

and $f_j = n_j/N$. The AG procedure [as well as the procedures given by James (1951) and Welch (1951); see Alexander & Govern, 1994] for testing the omnibus null hypothesis in the presence of variance heterogeneity may be obtained from a single general result. That is, a one-sample statistic can be computed for each group as

$$t_j = \frac{\bar{X}_j - \hat{\mu}_+}{S_j}, \qquad (7)$$

where $\hat{\mu}_+ = \sum_j a_j \bar{X}_j$ is the variance-weighted grand mean. In Alexander and Govern's solution, a normalizing transformation is first applied to each $t_j$. These normalized values (i.e., $z$ scores) are then used to derive a statistic ($A = \sum_j z_j^2$) that is approximately distributed as a $\chi^2$ variate with $\nu = J - 1$ df and is referred to the CV $\chi^2[(1-\alpha); \nu]$, the $(1-\alpha)$ quantile of that distribution.

Pairwise Comparison and Range Tests

The pairwise tests and the range tests were conducted with the nonpooled approximate df statistic given by Welch (1938), which can be expressed as

$$t_W = \frac{\bar{X}_j - \bar{X}_{j'}}{\sqrt{s_j^2/n_j + s_{j'}^2/n_{j'}}}. \qquad (8)$$

This statistic is approximated as a t variate and is referred to the CV $t[(1-\alpha/2); \nu_W]$, the $(1-\alpha/2)$ quantile of Student's t distribution with df

$$\nu_W = \frac{\left(s_j^2/n_j + s_{j'}^2/n_{j'}\right)^2}{\dfrac{(s_j^2/n_j)^2}{n_j - 1} + \dfrac{(s_{j'}^2/n_{j'})^2}{n_{j'} - 1}}. \qquad (9)$$

Multiple Comparison Methods

The MCPs investigated in this study combined pairwise and omnibus test statistics that do not depend on the assumption of variance homogeneity with various stepwise methods for controlling the overall (familywise) rate of Type I error. In particular, the MCPs examined were: (a) Welsch's (1977) step-up procedure, (b) the Ryan (1960)-Welsch (1977) multiple range procedure, (c) the Peritz (1970) procedure, (d) Shaffer's (1986) sequentially rejective Bonferroni procedure, (e) Shaffer's (1986) sequentially rejective Bonferroni procedure that begins with an omnibus test, (f) Hochberg's (1988) step-up sequentially acceptive Bonferroni procedure, (g) a multiple range procedure that begins with an omnibus test (see Shaffer, 1979, 1986), and (h) Hayter's (1986) two-stage modified Least Significant Difference (LSD) procedure.

Range procedures were conducted in the usual manner and with the modification given by Duncan (1957). With the usual range procedure, the means are rank ordered from smallest to largest and the difference between the smallest and largest means is first subjected to a statistical test using a J-range criterion. If this difference is not significant, testing stops and all pairwise differences are regarded as null. If, on the other hand, this first range test is statistically significant, one "steps down" to examine the two subsets of J − 1 ordered means, that is, the smallest mean versus the next-to-largest mean and the largest mean versus the next-to-smallest mean. At each stage of testing, only subsets of ordered means that are statistically significant are subjected to further testing. With Duncan's procedure (see Hochberg & Tamhane, 1987; Shaffer, 1974), the maximally significant pairwise statistic in a set of J means is first evaluated for significance. If this stage-one test is significant, one steps down to compare the maximally significant statistic in each of the two resulting subsets of J − 1 means to the appropriate (J − 1)-mean CV, where the subsets are obtained by separating the two significantly different means into two distinct sets. Continuing in this fashion (i.e., J − 2, J − 3, etc.), the maximally significant statistic is tested only if it is contained within a subset of means previously declared to be nonnull. MCPs incorporating Duncan's (1957) method never have less power than the usual range MCPs.

The Welsch (1977) procedure, a step-up MCP, begins by examining the adjacent 2-range comparisons and steps up to examine larger range comparisons only when the smaller range tests fail to reach statistical significance. The Welsch MCP is designated WELSCH. The next MCP investigated was the usual Ryan (1960)-Welsch (1977) multiple range procedure, which begins by examining the J range and steps down to examine successively smaller ranges only when a larger range test is declared significant. According to Ryan and Welsch, the overall rate of Type I error is controlled at $\alpha$ (when assumptions are satisfied) for a set of $p$ ($p = 2, \ldots, J$) means if each test is assessed for significance at the level

$$\alpha_p = 1 - (1 - \alpha)^{p/J} \quad (2 \le p \le J - 2), \qquad \alpha_{J-1} = \alpha_J = \alpha.$$

The designations q and q(D), where D is an abbreviation for the Duncan (1957) version of the range test, are used to denote the two forms of this MCP. The MCP of Peritz (1970) follows the same step-down logic as the usual range procedure, but assesses the significance of the pairwise contrasts with Newman (1939)-Keuls (1952) and/or Ryan-Welsch CVs. This MCP is designated as PERITZ.
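The Ryan-Welsch per-stage levels can be tabulated with a few lines of Python. This is a sketch of the formula only (the name `ryan_welsch_levels` is hypothetical); it does not carry out the step-down testing itself.

```python
def ryan_welsch_levels(J, alpha=0.05):
    """Per-test level alpha_p for a stretch of p ordered means, p = 2, ..., J."""
    levels = {}
    for p in range(2, J + 1):
        if p >= J - 1:
            levels[p] = alpha                          # alpha_{J-1} = alpha_J = alpha
        else:
            levels[p] = 1 - (1 - alpha) ** (p / J)     # 2 <= p <= J - 2
    return levels

for p, level in ryan_welsch_levels(6).items():
    print(f"p = {p}: alpha_p = {level:.4f}")
```

With $J = 3$ every stretch is tested at $\alpha$ itself; with $J = 6$ only stretches of two, three, and four means are tested at levels below $\alpha$ (for example, $\alpha_3 = 1 - .95^{1/2} \approx .0253$).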
Shaffer's (1986) sequentially rejective Bonferroni procedure uses probability (p) values in assessing the pairwise hypotheses, taking into account the number of hypotheses rejected at earlier stages in the sequence in arriving at decisions regarding significance. The abbreviation for this MCP is SRB. Shaffer's (1986) modified Bonferroni procedure begins with an omnibus test and, if the omnibus hypothesis is rejected, assesses the significance of the pairwise contrasts using Bonferroni levels of significance that reflect the number of true pairwise hypotheses remaining given previous rejections. These MCPs are designated as W/SRB, BOX/SRB, J/SRB, and AG/SRB. Hochberg's (1988) sequentially acceptive Bonferroni procedure uses the p values associated with the pairwise tests to arrive at accept/reject decisions; these are determined sequentially, and hypotheses can be rejected by implication. Hochberg's MCP is designated as HOCHBERG. Another group of MCPs was based on the modified range procedure due to Shaffer (1979, 1986), which starts with a test of the omnibus hypothesis and, only upon rejection, moves on to test range hypotheses with Ryan-Welsch CVs, modifying the J-range CV to one based on J − 1 means. The abbreviations for the eight Shaffer MCPs investigated are W/q, W/q(D), BOX/q, BOX/q(D), J/q, J/q(D), AG/q, and AG/q(D). Finally, Hayter's (1986) modified LSD begins with a test of the omnibus hypothesis which, if rejected, leads to the stage-two tests of the pairwise contrasts using a Studentized range CV for J − 1 means. Four MCPs based on Hayter's method were examined: W/HAYTER, BOX/HAYTER, J/HAYTER, and AG/HAYTER. Detailed descriptions of all procedures can be found in Keselman (1994) and in the original authors' papers.

Robust Estimation

Another consideration in this paper was the application of robust estimates of the group means and variances to these various test procedures, specifically the use of trimmed means and Winsorized variances. When trimmed means are being compared, the null hypotheses pertain to the equality of population trimmed means, i.e., the $\mu_t$s.

That is, the omnibus null hypothesis is $H_0: \mu_{t1} = \mu_{t2} = \cdots = \mu_{tJ}$, while the null hypothesis for each of the pairwise comparisons is $H_0: \mu_{tj} = \mu_{tj'}$. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ represent the ordered observations associated with a group. Let $g = [\gamma n]$, where $\gamma$ represents the proportion of observations that are to be trimmed in each tail of the distribution and $[x]$ is notation for the largest integer not exceeding $x$. Wilcox (1995a, 1995b) suggests that 20% trimming should be used. The effective sample size becomes $h = n - 2g$. Then the sample trimmed mean is

$$\bar{X}_t = \frac{1}{n - 2g}\sum_{i=g+1}^{n-g} X_{(i)}. \qquad (10)$$

An estimate of the standard error of the trimmed mean is based on the Winsorized mean and Winsorized sum of squares. The sample Winsorized mean is

$$\bar{X}_w = \frac{1}{n}\left[(g+1)X_{(g+1)} + X_{(g+2)} + \cdots + X_{(n-g-1)} + (g+1)X_{(n-g)}\right],$$

and the sample Winsorized sum of squared deviations is

$$SSD_w = (g+1)(X_{(g+1)} - \bar{X}_w)^2 + (X_{(g+2)} - \bar{X}_w)^2 + \cdots + (X_{(n-g-1)} - \bar{X}_w)^2 + (g+1)(X_{(n-g)} - \bar{X}_w)^2.$$

Accordingly, the squared standard error of the mean is estimated as (Staudte & Sheather, 1990)

$$d = \frac{SSD_w}{h(h-1)}. \qquad (11)$$
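The steps in Equations 10 and 11 can be sketched in Python as follows, together with the robust pairwise statistic and df given next in Equations 14 and 15. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the data are invented, and Winsorizing is done by clamping each score to the interval $[X_{(g+1)}, X_{(n-g)}]$, which gives the same values as the positional replacement in the formulas above.

```python
import math

def trim_stats(x, gamma=0.2):
    """Trimmed mean (Eq. 10), squared standard error d (Eq. 11), and effective n h."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(gamma * n + 1e-9)   # g = [gamma * n], guarded against float round-off
    h = n - 2 * g                      # effective sample size
    if h < 2:
        raise ValueError("need at least two observations after trimming")
    trimmed_mean = sum(xs[g:n - g]) / h
    # Winsorize: the g smallest values become X_(g+1), the g largest become X_(n-g).
    lo, hi = xs[g], xs[n - g - 1]
    wins = [min(max(v, lo), hi) for v in xs]
    w_mean = sum(wins) / n                             # Winsorized mean
    ssd_w = sum((v - w_mean) ** 2 for v in wins)       # Winsorized sum of squares
    d = ssd_w / (h * (h - 1))
    return trimmed_mean, d, h

def yuen_welch(x, y, gamma=0.2):
    """Robust pairwise statistic and df for trimmed means (Eqs. 14-15)."""
    mx, dx, hx = trim_stats(x, gamma)
    my, dy, hy = trim_stats(y, gamma)
    t = (mx - my) / math.sqrt(dx + dy)
    df = (dx + dy) ** 2 / (dx ** 2 / (hx - 1) + dy ** 2 / (hy - 1))
    return t, df

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical group with one outlier
y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   # hypothetical comparison group
print(trim_stats(x))
print(yuen_welch(x, y))
```

For `x`, $g = 2$ and $h = 6$, so the outlier 100 is trimmed away ($\bar{X}_t = 5.5$) and Winsorized to 8 ($d = 42.5/30$); the pairwise statistic for `x` versus `y` has exactly 10 df, because the two groups happen to have equal $d$ and $h$.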

Under robust estimation, the trimmed group means, Winsorized group variances, Winsorized group standard errors of the means, and effective sample sizes were substituted in the appropriate equation for a particular test statistic. For example, the W test would be obtained as follows. Let $u_j = 1/d_j$, $U = \sum_j u_j$, and $\tilde{X}_t = \frac{1}{U}\sum_j u_j \bar{X}_{tj}$. Accordingly, the test is defined as

$$F_{Wt} = \frac{\sum_{j=1}^{J} u_j (\bar{X}_{tj} - \tilde{X}_t)^2/(J-1)}{1 + \dfrac{2(J-2)}{J^2-1}\sum_{j=1}^{J}\dfrac{(1 - u_j/U)^2}{h_j - 1}}, \qquad (12)$$

and $\nu_W$ is estimated by

$$\nu_W = \frac{J^2 - 1}{3\sum_{j=1}^{J} (1 - u_j/U)^2/(h_j - 1)}. \qquad (13)$$

To test $H_0: \mu_{tj} = \mu_{tj'}$, compute $\bar{X}_t$ and $d$ for each group, labeling the results for the $j$th group $\bar{X}_{tj}$ and $d_j$. The robust pairwise test becomes

$$t_{Wt} = \frac{\bar{X}_{tj} - \bar{X}_{tj'}}{\sqrt{d_j + d_{j'}}}, \qquad (14)$$

with estimated df

$$\nu_W = \frac{(d_j + d_{j'})^2}{d_j^2/(h_j - 1) + d_{j'}^2/(h_{j'} - 1)}. \qquad (15)$$

Method

A total of 22 MCPs for examining pairwise comparisons of means were compared for their rates of Type I error and statistical power under conditions of nonnormality and variance heterogeneity in one-way independent groups designs. Eight variables were manipulated in the study: (a) the number of groups (3 and 6), (b) group size equality/inequality ($C = 0$, $C = .163$, and $C = .327$, where $C$ denotes a coefficient of group size variation, defined as $\left[\sum_j (n_j - \bar{n})^2/J\right]^{1/2}/\bar{n}$, where $\bar{n}$ is the average group size), (c) the degree/pattern of variance heterogeneity (moderate and large; all (most) unequal and all but one equal), (d) the pairing of variances and group sizes (seven cases), (e) the population distribution (normal and nonnormal), (f) the type of null hypothesis (complete and partial), (g) the configuration of the nonnull means (minimum, maximum, and equal variability of treatment means), and (h) the effect size (ranging from 0.75 to 5.00). We chose to investigate completely randomized designs containing three and six groups in order to see whether the Type I error and power rates of the MCPs were affected by the size of the set of pairwise comparisons. In the former case, only three comparisons are made, while in the latter, 15 pairs of means are tested. The MCPs were investigated when the numbers of observations in each group were equal or unequal. When equal (i.e., $C = 0$), group sizes were set at 20 in both the $J = 3$ and $J = 6$ designs. Wilcox (in press) recommended this number in order to obtain robust Type I error rates for MCPs that use trimmed means when distributions are nonnormal. For the $J = 3$ design, two cases of group size inequality were examined: $n_j$ = 16, 20, and 24 ($C = .163$) and $n_j$ = 12, 20, and 28 ($C = .327$). The corresponding $J = 6$ sample sizes were $n_j$ = 16, 16, 20, 20, 24, and 24 ($C = .163$) and $n_j$ = 12, 12, 20, 20, 28, and 28 ($C = .327$). The third variable manipulated was the degree/pattern of variance heterogeneity.
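The coefficient of group size variation can be checked directly against the design values. The short sketch below (the function name is hypothetical) uses the population form of the standard deviation, dividing by $J$ as in the definition of $C$, and reads the unequal designs as 16, 20, 24 and 12, 20, 28 (and their six-group duplicates).

```python
import math

def group_size_cv(sizes):
    """Coefficient of group size variation: sqrt(sum((n_j - nbar)^2) / J) / nbar."""
    J = len(sizes)
    nbar = sum(sizes) / J
    return math.sqrt(sum((nj - nbar) ** 2 for nj in sizes) / J) / nbar

designs = {
    "J=3, equal": [20, 20, 20],
    "J=3, moderate": [16, 20, 24],
    "J=3, large": [12, 20, 28],
    "J=6, moderate": [16, 16, 20, 20, 24, 24],
    "J=6, large": [12, 12, 20, 20, 28, 28],
}
for label, sizes in designs.items():
    print(f"{label}: C = {group_size_cv(sizes):.3f}")
```

Rounded to three decimals, this reproduces $C = 0$, $.163$, and $.327$. Duplicating each group size in the six-group designs leaves $C$ unchanged, which is why the same two values serve for $J = 3$ and $J = 6$.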
Specifically, two patterns were examined: (a) all (most) variances unequal (Pattern 1) and (b) all variances equal but one (Pattern 2). Thus, when $J = 3$, Pattern 1 was $\sigma_j^2$ = 1, 9, 16 and Pattern 2 was $\sigma_j^2$ = 1, 1, 16. The corresponding $J = 6$ values for Patterns 1 and 2 were, respectively, $\sigma_j^2$ = 1, 1, 4, 9, 9, 16 and $\sigma_j^2$ = 1, 1, 1, 1, 1, 16. We chose to investigate these patterns because others have found that they affect rates of Type I error (see Wilcox, Charlin & Thompson, 1986). In addition, according to Wilcox et al. and Fenstad (1983), it is not uncommon to have populations with standard deviations that are in a 4:1 ratio.

The fourth variable manipulated was the nature of the pairing of the group sizes and variances. We chose to investigate cases in which group sizes were both equal and unequal and paired with equal or unequal variances. Specifically, the combinations that were investigated were: (a) equal $n_j$, equal $\sigma_j^2$; (b/b′) equal $n_j$, unequal $\sigma_j^2$; (c/c′) unequal $n_j$, unequal $\sigma_j^2$ (positively paired); and (d/d′) unequal $n_j$, unequal $\sigma_j^2$ (negatively paired). The b/c/d conditions designate the combinations incorporating Pattern 1 variances, while the b′/c′/d′ conditions refer to the combinations incorporating Pattern 2 variances.

Insert Figure 1 About Here

With respect to the effects of distributional shape, we chose to investigate the normal distribution as well as conditions in which the data were obtained from two skewed distributions. In addition to generating data from the $\chi^2_3$ distribution, we also used the method described in Hoaglin (1985) to generate a distribution with a more extreme degree of skewness ($\gamma_1$) and kurtosis ($\gamma_2$). Specifically, we chose to investigate a g- and h-distribution with $g = 1$ and $h = 0$ (notated throughout the paper as g = 1/h = 0). These particular types of nonnormal distributions were selected since educational and psychological research data typically have skewed distributions (Micceri, 1989; Wilcox, 1994). Furthermore, Sawilowsky and Blair (1992) investigated the effects of eight nonnormal distributions identified by Micceri on the robustness of Student's t test and found that only the distributions with the most extreme degree of skewness investigated (e.g., $\gamma_1 = 1.64$) affected the Type I error control of the independent samples t statistic. Thus, since the statistics we investigated have operating characteristics similar to those reported for the t statistic, we felt that our approach to modeling skewed data would adequately reflect conditions in which those statistics might not perform optimally. For the $\chi^2_3$ distribution, skewness and kurtosis values are $\gamma_1 = 1.63$ and $\gamma_2 = 4.00$, respectively. Corresponding values for the g = 1/h = 0 distribution are 6.20 and 114, respectively. Finally, it should be noted that although the g = 1/h = 0 distribution is extremely skewed, it is, according to Wilcox, representative of psychometric measures (see Figure 1 for a graph of this distribution).

With respect to rates of Type I error, we investigated the MCPs under a complete and a partial null hypothesis. The partial null hypotheses for $J = 3$ and $J = 6$ were $\mu_1 = \mu_2 \neq \mu_3$ (or $\mu_{t1} = \mu_{t2} \neq \mu_{t3}$) and $\mu_1 = \mu_2 = \mu_3 \neq \mu_4 = \mu_5 = \mu_6$ (or $\mu_{t1} = \mu_{t2} = \mu_{t3} \neq \mu_{t4} = \mu_{t5} = \mu_{t6}$), respectively. To investigate the statistical power of the MCPs, different configurations of nonnull means were examined. That is, we selected means such that there was (a) a minimum range configuration, which occurs when half the means in the range are equal and the remaining half are also equal but different from the first half, (b) a maximum range configuration, which occurs when there is one mean at each extreme of the range and the remaining means equal the average of these two extreme values, and (c) an equally spaced range configuration, which occurs when the means are equally spaced across the range. For each mean configuration, different effect sizes were investigated.
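The g = 1/h = 0 distribution used in this study is produced by Hoaglin's (1985) transform of a standard normal deviate, with the centering and scaling steps given in the data-generation details of this section. The sketch below follows those steps under stated assumptions: it uses Python's standard `random` module in place of the SAS RANNOR generator, the function names are hypothetical, the mean formula is valid for $0 \le h < 1$, and `sigma` is simply the multiplier applied to each centered score.

```python
import math
import random

def g_and_h(z, g, h):
    """Hoaglin's g-and-h transform of a standard normal deviate z."""
    core = z if g == 0 else (math.exp(g * z) - 1.0) / g   # g = 0 is the limiting case
    return core * math.exp(h * z * z / 2.0)

def g_and_h_mean(g, h):
    """Population mean of a g-and-h variate (requires 0 <= h < 1)."""
    if g == 0:
        return 0.0
    return (math.exp(g * g / (2.0 * (1.0 - h))) - 1.0) / (g * math.sqrt(1.0 - h))

def draw_group(n, sigma, g=1.0, h=0.0, rng=random):
    """Centered g-and-h scores multiplied by sigma, following the steps described here."""
    mu = g_and_h_mean(g, h)
    return [sigma * (g_and_h(rng.gauss(0.0, 1.0), g, h) - mu) for _ in range(n)]

scores = draw_group(20, sigma=4.0, rng=random.Random(1))  # one hypothetical group
print([round(s, 2) for s in scores[:5]])
```

For g = 1/h = 0 the transform reduces to $e^{Z} - 1$, a shifted lognormal variate whose mean is $e^{1/2} - 1 \approx 0.649$. Subtracting that mean before scaling keeps the population mean at zero regardless of the multiplier, which is what makes the least squares null hypothesis hold when $g > 0$.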
Effect sizes were chosen for each distribution such that floor and ceiling effects were avoided across the seven combinations of group sizes and variance conditions investigated; thus, the effect sizes investigated were not constant across the distributions

18 18 nor the mean configurations. Accordingly, the MCPs can not be compared across distributions. Two definitions of power were used in this investigation: (a) the average per-pair power rate, where per-pair power is the probability of detecting a true pairwise difference, and (b) the all-pairs power rate, or the probability of detecting all true pairwise differences. The any-pairs power rate, or the probability of detecting at least one true pairwise difference was not collected since we agree with Ramsey (1978) and Keselman (1994) that for this type of power, an omnibus test rather than a MCP would be best. To generate pseudo-random normal variates, we used the SAS generator RANNOR (SAS Institute, 1989). If Z is a standard normal variate, then i X œ. 5 Z is a normal variate with mean equal to. and variance equal to 5. i i To generate pseudo-random variates having a ; distribution with three df, three standard normal variates were squared and summed. The variates were standardized, and 3 then transformed to ; variates having mean. (when comparing the tests based on the least squares estimates) or. t (when comparing the tests based on trimed means) and variance 5. (see Hastings & Peacock, 1975, pp , for further details on the generation of data from these distributions). To generate data from a g- and h-distribution, standard unit normal variables (Z) were converted to the random variable X i œ exp (g Z i) 1 g expœ h Zi, according to the values of g and h selected for investigation. To obtain a distribution with standard deviation 5, each X ( œ 1, á,j) was multiplied by a value of 5. It is i important to note that this does not affect the value of the null hypothesis when g œ 0 (see Wilcox, 1994, p. 97). However, when g 0, the population mean for a g- and h-

19 distributed variable is 1. gh œ g(1 h) ( /(1 h) 1) " eg # Tests of Trimmed Means 19 (see Hoaglin, 1985, p. 503). Thus, for those conditions where g 0,. gh was first subtracted from X before multiplying by 5. When working with trimmed means,. i t was first subtracted from each observation. Lastly, it should be noted that the standard deviation of a g- and h-distribution is not equal to one, and thus the values previously enumerated reflect only the amount that each random variable is multiplied by and not the actual values of the standard deviations (see Wilcox, 1994, p. 98). As Wilcox notes, the values for the variances (standard deviations) more aptly reflect the ratio of the variances (standard deviations) between the groups. We always applied symmetric trimming, removing 0% of the observations from each tail of a groups' set of scores, since this degree of trimming has been studied by other investigators and is believed to have desirable properties (see Wilcox, 1995a, 1995b). Five thousand replications of each condition were performed using a.05 significance level. Results Type I Error Rates Preliminary analysis of the data indicated that the effects due to the degree of sample size imbalance were extremely small. That is, for each value of J, the rates of Type I error of the MCPs did not differ by more than one percent for the two unbalanced designs which were investigated. Thus, rates were initially averaged over the two cases in which C 0, for both J œ 3 and J œ 6. Furthermore, due to the large number of MCPs investigated and the variety of assumption violation conditions under which each was assessed, we have only tabled the average (Mean) and minimum and maximum (M-M) Type I error rates across the seven

combinations of group sizes and variances (i.e., conditions a through d′). Throughout the following section, we discuss the noteworthy findings related to these combinations. We used Bradley's (1978) liberal criterion of robustness to assess the MCPs. According to this criterion, in order for a test to be considered robust, its empirical rate of Type I error ($\hat{\alpha}$) must be contained in the interval $0.5\alpha \le \hat{\alpha} \le 1.5\alpha$. Therefore, for the five percent level of significance used in this study, a test was considered robust in a particular condition if its empirical rate of Type I error fell within the interval $.025 \le \hat{\alpha} \le .075$. Correspondingly, a test was considered nonrobust if, for a particular condition, its Type I error rate was not contained in this interval. In the tables, bolded entries denote liberal values (those that exceed Bradley's upper limit), while conservative values (those less than the lower limit) are underlined. Since the mean does not convey information concerning the extreme values that occur in the data, the bolding and underlining have been applied to the M-M values. Finally, it should be noted that only Type I error rates associated with the complete null hypothesis are tabled. However, shading has been employed in the tables to indicate those conditions in which a test was liberal under the partial null hypothesis but not under the complete null hypothesis.

J = 3

When J = 3, a number of the investigated MCPs are identical for the complete null hypothesis. Specifically, the Hayter (1986) two-stage procedure and the Shaffer (1986) SRB procedure that begins with an omnibus test are identical. (The reader should note that Hayter's MCP is the usual Fisher (1935) LSD test when J = 3.) Consequently, the results for these MCPs have been combined into one table entry with the following abbreviations: (a) W/SRB and W/HAYTER = W/*, (b) BOX/SRB and BOX/HAYTER = BOX/*, (c) J/SRB and J/HAYTER = J/*, and (d) AG/SRB and
AG/HAYTER = AG/*. Finally, for J = 3, the Ryan (1960)-Welsch (1977) range and Peritz (1970) procedures are identical. These MCPs are therefore also tabled as one procedure, with the following new notation: q and PERITZ = q/peritz.

Least Squares Estimation. The Type I error rates (%) for MCPs based on least squares estimators are contained in Table 1. When the simulated data were obtained from the normal distribution, all but the WELSCH procedure provided effective Type I error control over all variance heterogeneity conditions for both the complete and partial null hypotheses.

Insert Table 1 About Here

When the simulated data were obtained from the χ²₃ distribution, only the q/peritz MCP had rates consistently below Bradley's (1978) upper limit, with an average rate of 5.3% and M-M values of 3.90% and 6.01%. Four MCPs (SRB, W/q, J/q, and AG/q) could not control their Type I error rates under the partial null hypothesis, although they effectively controlled their rates under the complete null hypothesis. When the data were obtained from the g = 1/h = 0 distribution, the MCPs that used least squares estimators always produced values outside of Bradley's (1978) interval; note that all mean rates of Type I error were liberal. Values were very conservative or very liberal depending on the particular combination of group size/variance equality/inequality investigated. Specifically, values were frequently conservative when group sizes and variances were equal (condition a); the most conservative value was .16% (SRB). Similar to the omnibus test results reported by Lix and Keselman (1997), the MCPs typically produced liberal rates of error for all remaining combinations of unequal group sizes and unequal variances; that is, the rates
were typically liberal even for the positive pairings cases (c/c′), and in one instance exceeded 18%.

Robust Estimation. Using trimmed means and Winsorized variances with the MCPs resulted in very similar Type I error rates when the data were normally distributed (see Table 2). That is, once again only the WELSCH MCP resulted in a liberal rate.

Insert Table 2 About Here

In contrast to their least squares counterparts, all but one of the MCPs (WELSCH) provided good Type I error control when the data were obtained from the χ²₃ distribution. The average Type I error rate for the 16 robust MCPs was 5.4%. That is, with the noted exception, the rates of Type I error for the trimmed means MCPs were quite close to the .05 significance level. When sampling from the g = 1/h = 0 distribution, the results for the MCPs based on trimmed means and Winsorized variances differed markedly from those obtained with least squares estimators. With robust estimators, many MCPs were insensitive to the combined effects of variance heterogeneity and nonnormality. Specifically, the q/peritz, SRB, HOCHBERG, W/q, BOX/q, BOX/q(D), J/q, and AG/q MCPs had rates of error which never exceeded Bradley's (1978) upper bound. On the other hand, the WELSCH, q(d), W/q(D), W/*, BOX/*, J/q(D), J/*, AG/q(D), and AG/* MCPs had liberal rates of error for the complete null hypothesis. The average rate of Type I error for the robust MCPs was 5.00%.

J = 6

For J = 6, the rates of Type I error for MCPs based on least squares estimators are contained in Table 3, while the rates for MCPs based on robust estimators are contained
in Table 4. The SRB and HOCHBERG values were equivalent across the seven combinations of group sizes/variances equality/inequality investigated and hence have been combined into one row, designated as SRB/HOCHBERG.

Insert Tables 3 and 4 About Here

Least Squares Estimation. The pattern of results for normally distributed data was basically equivalent to that found with the J = 3 data, although rates for J = 6 were generally smaller in magnitude. Nonetheless, the WELSCH MCP was still liberal for the partial null hypothesis. When J = 6, the combined effects of nonnormality (χ²₃) and variance heterogeneity resulted in many MCPs having rates of error above Bradley's (1978) upper limit of 7.50%. The largest complete null empirical rates occurred in condition d (unequal n; unequal σ [negatively paired/pattern 1 variances]), frequently attaining values above 8.00%. However, more MCPs had rates within the interval than was the case when J = 3. Specifically, for J = 6, the PERITZ, q, W/q, BOX/q, BOX/q(D), BOX/SRB, BOX/HAYTER, J/q, AG/q, AG/q(D), and AG/SRB MCPs had rates which never exceeded Bradley's upper limit over the seven combinations of group sizes/variances equality/inequality investigated, for both the complete and partial null hypotheses; the BOX MCPs, however, resulted in conservative minimum values. The rates obtained when sampling from the g = 1/h = 0 distribution were similar to their J = 3 counterparts. That is, when sampling from this skewed distribution, all of the MCPs using least squares estimators resulted in nonrobust rates of Type I error. The liberal values for the complete null hypothesis ranged from approximately 8% to 12%, occurring predominantly in conditions b, c, d, and d′; the liberal partial null rates fluctuated around 8% and were always obtained from condition d′.
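
The robust MCPs examined next substitute a 20% trimmed mean and a Winsorized variance for each group's least squares mean and variance. The Python sketch below shows one common way to compute these quantities and to combine them into a heteroscedastic, Welch-type pairwise statistic. It assumes Yuen's (1974) standard error for a trimmed mean; the function names `trimmed_stats` and `yuen_welch` are illustrative, and the exact statistics and critical values used by each MCP in this study are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def trimmed_stats(x: Sequence[float], gamma: float = 0.20) -> Tuple[float, float, int]:
    """Return (trimmed mean, Winsorized variance, effective sample size h).

    g = floor(gamma * n) observations are removed from each tail.
    """
    xs: List[float] = sorted(x)
    n = len(xs)
    g = int(math.floor(gamma * n))
    h = n - 2 * g                      # observations left after trimming
    kept = xs[g:n - g]
    t_mean = sum(kept) / h
    # Winsorize: the g smallest become xs[g], the g largest become xs[n - g - 1]
    wins = [xs[g]] * g + kept + [xs[n - g - 1]] * g
    w_mean = sum(wins) / n
    w_var = sum((v - w_mean) ** 2 for v in wins) / (n - 1)
    return t_mean, w_var, h


def yuen_welch(x1: Sequence[float], x2: Sequence[float], gamma: float = 0.20) -> Tuple[float, float]:
    """Welch-type trimmed-means statistic and approximate df (assumes Yuen, 1974)."""
    m1, v1, h1 = trimmed_stats(x1, gamma)
    m2, v2, h2 = trimmed_stats(x2, gamma)
    # squared standard errors of the two trimmed means
    d1 = (len(x1) - 1) * v1 / (h1 * (h1 - 1))
    d2 = (len(x2) - 1) * v2 / (h2 * (h2 - 1))
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical scores; the extreme value 95 is trimmed away in group a.
    a = [12, 15, 14, 10, 13, 16, 11, 14, 95, 12]
    b = [9, 11, 8, 10, 12, 7, 9, 10, 11, 8]
    print(trimmed_stats(a))
    print(yuen_welch(a, b))
```

Changing the most extreme scores leaves both trimmed statistics unchanged as long as the altered values remain among the g most extreme in their tail, which is the resistance to outliers that motivates these estimators.
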

Robust Estimation. The MCPs based on trimmed means and Winsorized variances resulted in rates of Type I error that were typically similar to, though smaller than, their J = 3 counterparts (see Table 4). The following similarities between the J = 3 and J = 6 rates should be noted: (a) the WELSCH test was liberal across all investigated distributions; and (b) for χ²₃ data, all procedures except WELSCH resulted in nonliberal values. However, differences between the J = 3 and J = 6 rates did exist. When J = 6, the q(d), W/q(D), BOX/SRB, BOX/HAYTER, J/q(D), J/SRB, AG/q(D), and AG/SRB MCPs did not result in liberal rates of error when data were obtained from the g = 1/h = 0 distribution.

Type I Error Results Summary

Results from Tables 1-4 indicate that the shape of the parent population strongly affected whether a particular MCP adequately controlled its Type I error rate. When the data were normally distributed, MCPs based on the usual least squares estimates of treatment means and variances generally performed well under conditions of variance heterogeneity. For skewed data, however, the number of robust MCPs based on least squares estimators declined dramatically. For χ²₃ data, which exhibit only a moderate degree of skewness, only one MCP (q/peritz) maintained its rate below Bradley's (1978) upper bound of 7.50% when J = 3, whereas when J = 6, eleven MCPs (PERITZ, q, W/q, BOX/q, BOX/q(D), BOX/SRB, BOX/HAYTER, J/q, AG/q, AG/q(D), AG/SRB) provided acceptable Type I error protection. When the data were g = 1/h = 0 distributed, and thus more highly skewed, all MCPs based on least squares estimators resulted in liberal rates of Type I error for both the J = 3 and J = 6 data. It is also important to note that MCPs based on least squares estimators could not consistently control their Type I error rates within the Bradley (1978) interval across the investigated conditions when group sizes were equal (i.e., combinations a, b, and b′).
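
Bradley's (1978) liberal criterion, which determines every "liberal," "conservative," and "robust" label in these results, is a simple interval check on the empirical rate $\hat{\alpha}$. The Python sketch below labels a rate against $0.5\alpha \le \hat{\alpha} \le 1.5\alpha$ and summarizes a set of rates by their mean and labelled M-M values, mirroring the bolding and underlining convention of the tables. The function names are hypothetical; the example rates are values quoted in the J = 3 results (3.90%, 6.01%, .16%, and 18%), written as proportions.

```python
from typing import Dict, Sequence, Tuple


def bradley_label(rate: float, alpha: float = 0.05) -> str:
    """Label an empirical Type I error rate with Bradley's (1978) liberal criterion."""
    lower, upper = 0.5 * alpha, 1.5 * alpha
    if rate > upper:
        return "liberal"        # bolded in the tables
    if rate < lower:
        return "conservative"   # underlined in the tables
    return "robust"


def mm_summary(rates: Sequence[float], alpha: float = 0.05) -> Dict[str, object]:
    """Mean rate plus the labelled minimum and maximum (the M-M values)."""
    lo, hi = min(rates), max(rates)
    labelled_lo: Tuple[float, str] = (lo, bradley_label(lo, alpha))
    labelled_hi: Tuple[float, str] = (hi, bradley_label(hi, alpha))
    return {"mean": sum(rates) / len(rates), "min": labelled_lo, "max": labelled_hi}


if __name__ == "__main__":
    for r in (0.0390, 0.0601, 0.0016, 0.18):
        print(f"{r:.4f} -> {bradley_label(r)}")
    print(mm_summary([0.0016, 0.05, 0.18]))
```

Labelling only the extremes follows the paper's reasoning that a mean rate can look acceptable while individual conditions are badly liberal or conservative.
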
Although designs in behavioral science research are typically unbalanced according to a recent survey (Lix, Cribbie & Keselman, 1997), so that our findings apply to the majority of one-way investigations, researchers working with balanced designs should still be wary of adopting MCPs based on least squares estimators.

On the other hand, MCPs that used trimmed means and Winsorized variances provided much better Type I error control when distributions were skewed and variances were heterogeneous. That is, for the J = 3 data that were distributed as χ²₃, 16 MCPs provided acceptable Type I error protection, while the number for g = 1/h = 0 data was 8. For the J = 6 design, the corresponding values were 20 and 16, respectively.

Power Rates

The stepwise MCPs that did not limit the rate of Type I error below Bradley's (1978) upper value of 7.50% within each examined distribution were not included in the power comparison phase of the study. Conservative procedures, on the other hand, were included in this phase, since such conservativeness may not have substantial effects on relative power comparisons. Since experimenters are not likely to know the population state of affairs with regard to effect size, mean configuration, and pairing of group sizes and variances, recommendations based upon specific combinations of these factors are of little use. Accordingly, the tabled power rates (per-pair, all-pairs) have been averaged over these factors. To evaluate the magnitude of differences in these power values, the guidelines offered by Einot and Gabriel (1975) were used. That is, these authors have suggested that power differences greater than 20% be considered substantial while those less than 10% be regarded as negligible.

J = 3

Per-Pair Power. Table 5 contains the power rates (%) for the MCPs that were able to limit their Type I error rates below 7.50%. Most apparent from Table 5 is that
within each investigated distribution, there were negligible differences among the MCPs with regard to their sensitivity for detecting pairwise differences between the groups. The difference between the most and least sensitive MCP for each distribution investigated was 7 percentage points (Normal-Least Squares [LS]), 8 percentage points (Normal-Robust Estimators [RE]), 4 percentage points (χ²₃), and 5 percentage points (g = 1/h = 0). Though the differences between the MCPs were negligible, it is worth noting that the q/peritz MCP was always the least powerful procedure (or one among several least powerful MCPs), while the W/q(D), W/*, J/q(D), J/*, AG/q(D), and AG/* MCPs were always the most powerful, and the SRB MCP, which was always robust, was always close in value to the two-stage procedures. One additional finding regarding the results presented in Table 5 is worth noting: the power differences between the MCPs based on least squares estimators and those based on trimmed estimators for the normal distribution were negligible, ranging from 4 to 6 percentage points with an average difference of 4.7 percentage points. Finally, as the note to Table 5 indicates, when data were χ²₃ distributed, the q/peritz MCP based on least squares estimators, which was robust to the moderate degree of nonnormality that characterized this distribution, had a power value of 51%; this was substantially larger than the rates obtained with any of the MCPs based on trimmed means and Winsorized variances.

Insert Table 5 About Here

All-Pairs Power. Not surprisingly, the power differences between the MCPs were considerably smaller when power was defined as the probability of detecting all true pairwise differences (see Table 5). The difference between the most and least sensitive MCP, within each distribution investigated, equalled 4 percentage points (Normal-LS), 4
percentage points (Normal-RE), 2 percentage points (χ²₃), and 3 percentage points (g = 1/h = 0). Once again, the differences between the MCPs based on trimmed estimators and those based on least squares estimators were small (3-4 percentage points) when the data were normally distributed. Lastly, substantially more power was obtained with the q/peritz procedure (30%) than with the MCPs using trimmed means and Winsorized variances when data were χ²₃-distributed.

J = 6

Per-Pair Power. Table 6 contains the per-pair J = 6 power rates. The power rates for the MCPs based on least squares estimators were quite similar when the data were normally distributed. The difference between the most (q(d), W/q(D), BOX/q(D), J/q(D), AG/q(D)) and least (q, BOX/q) powerful procedures was 7 percentage points. When the MCPs were based on robust estimators, the range (W/q(D), J/q(D), and AG/q(D) versus q) was 11 percentage points. The power values of the MCPs based on robust estimators were also smaller than those of their least squares counterparts, though not too dissimilar; the power differences between the least squares and robust MCPs ranged from 5 to 9 percentage points, with an average difference of 6.7 percentage points. When the simulated data were obtained from the χ²₃ distribution, a similar pattern was identified. That is, there was very little variability among the rates for the MCPs based on least squares estimators and greater variability when the MCPs were based on trimmed means and Winsorized variances. Furthermore, the MCPs based on least squares estimators were substantially more powerful than their counterparts based on trimmed means and Winsorized variances when the data were sampled from the χ²₃ distribution: power values for the MCPs employing least squares estimators were larger than those for the trimmed procedures by as much as 58 percentage points. When sampling from the g = 1/h = 0 distribution, power comparisons between the least squares and robust MCPs were not performed, since none of the least squares MCPs controlled their Type I error rate.
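
The two power definitions used in this section can be made concrete. Per-pair power is the proportion of truly different pairs that are detected, averaged over replications; all-pairs power is the proportion of replications in which every truly different pair is detected. The Python sketch below computes both from simulated rejection decisions. The function name and the toy replications are hypothetical, and rejections of truly equal pairs (Type I errors) are deliberately ignored.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def power_rates(rejections: List[Set[Pair]], true_diffs: Set[Pair]) -> Tuple[float, float]:
    """Return (per-pair power, all-pairs power) from simulated rejection sets.

    rejections: for each replication, the set of (i, j) pairs the MCP rejected.
    true_diffs: the pairs whose population locations truly differ.
    """
    k = len(true_diffs)
    # number of true differences detected in each replication
    detected = [len(rej & true_diffs) for rej in rejections]
    n_rep = len(detected)
    per_pair = sum(detected) / (n_rep * k)
    all_pairs = sum(1 for d in detected if d == k) / n_rep
    return per_pair, all_pairs


if __name__ == "__main__":
    # Hypothetical J = 3 configuration: groups 1 and 2 share a location, group 3 differs.
    truth = {(1, 3), (2, 3)}
    reps = [{(1, 3), (2, 3)}, {(1, 3)}, set(), {(2, 3), (1, 2)}]  # (1, 2) is a Type I error
    print(power_rates(reps, truth))  # per-pair 4/8 = 0.5, all-pairs 1/4 = 0.25
```

Because all-pairs power requires every true difference to be found in the same replication, it can never exceed per-pair power, which is consistent with the smaller all-pairs values reported above.
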

Insert Table 6 About Here

All-Pairs Power. The all-pairs power rates for the J = 6 design are contained in Table 6. As was reported for the J = 3 data, the differences among the rates were generally small. However, the difference between the most and least powerful procedures averaged approximately 7.5 percentage points; that is, the range for the five conditions reported in Table 6 was 8, 6, 13, 4, and 6 percentage points, respectively. In all of the tabled cases, the PERITZ MCP was the most powerful procedure, while a version of the HAYTER MCP was the least powerful. For normally and χ²₃-distributed data, MCPs based on the usual least squares estimators were more powerful than their counterparts based on trimmed means and Winsorized variances. For normally distributed data the power advantage was small; for χ²₃-distributed data, however, the differences were quite substantial. For example, the most powerful least squares MCP (PERITZ) was 655% more powerful than its corresponding MCP based on trimmed means and Winsorized variances. When the data were more highly skewed (g = 1/h = 0), the MCPs based on trimmed means and Winsorized variances could not be compared to their least squares counterparts, since the latter again failed to control their Type I error rates.

Discussion

The results from our investigation clearly indicate that researchers should become very familiar with their data if they are to make the best decisions regarding the use of stepwise MCPs for pairwise comparisons of treatment groups in one-way completely randomized designs. Specifically, researchers should attempt to know both the shape and the degree/pattern of variability of the populations that describe their treatment groups. Such information could determine the pairwise multiple comparison strategy to adopt. That is, when data appear to be normally distributed, researchers may use stepwise MCPs that
rely on the usual least squares estimators of treatment group means and variances to test hypotheses regarding pairwise equality. These tests will not only control the rate of Type I error but will also be more powerful than their counterparts based on trimmed means and Winsorized variances. Furthermore, our data indicate that by knowing the degree of skewness present in the population(s), researchers may select an MCP that will provide substantially more power, instead of relying on previous recommendations that advocate the use of MCPs with trimmed means and Winsorized variances when data are nonnormal. Specifically, our data indicate that for χ²₃-type skewed data (i.e., γ₁ = 1.64), MCPs based on least squares estimators of group means and variances (i.e., q/peritz) will provide substantially more power to detect pairwise differences than any of the MCPs based on trimmed means and Winsorized variances. Indeed, our results indicate increases of about 100% in per-pair power and 700% in all-pairs power compared to the MCPs based on trimmed estimators. Finally, by knowing that populations are characterized by distributions with substantial degrees of skewness (e.g., the g = 1/h = 0 distribution), researchers would avoid using MCPs based on least squares estimators and would instead use MCPs based on trimmed means and Winsorized variances, thereby avoiding spurious rejections. The reader should remember, however, that the tests based on trimmed means and Winsorized variances do not generally test the same hypothesis as those based on the usual least squares estimates of central tendency and variability.
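
Why the two kinds of tests address different hypotheses can be seen by comparing a skewed population's mean with its 20% trimmed mean. The Python sketch below does this for the g- and h-distribution used in the simulations, evaluating the closed-form mean given by Hoaglin (1985) and computing the population trimmed mean by numerical integration (Simpson's rule). It relies on the fact that the g-and-h transform is increasing in Z when h ≥ 0, so trimming 20% from each tail of X is the same as trimming 20% from each tail of Z; the function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def g_and_h(z: float, g: float, h: float) -> float:
    """Tukey g-and-h transform of a standard normal deviate z."""
    tail = math.exp(h * z * z / 2.0)
    if g == 0.0:
        return z * tail
    return (math.exp(g * z) - 1.0) / g * tail


def gh_mean(g: float, h: float) -> float:
    """Closed-form mean (Hoaglin, 1985); requires g != 0 and 0 <= h < 1."""
    return (math.exp(g * g / (2.0 * (1.0 - h))) - 1.0) / (g * math.sqrt(1.0 - h))


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule with an even number n of subintervals."""
    n += n % 2
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def gh_trimmed_mean(g: float, h: float, gamma: float = 0.20) -> float:
    """Population gamma-trimmed mean of a g-and-h variable (assumes h >= 0).

    The transform is increasing in z, so trimming gamma from each tail of X
    is equivalent to trimming gamma from each tail of Z.
    """
    lo, hi = Z.inv_cdf(gamma), Z.inv_cdf(1.0 - gamma)
    area = simpson(lambda z: g_and_h(z, g, h) * Z.pdf(z), lo, hi)
    return area / (1.0 - 2.0 * gamma)


if __name__ == "__main__":
    g, h = 1.0, 0.0  # the highly skewed g = 1 / h = 0 condition
    print(f"mean (formula):    {gh_mean(g, h):.4f}")
    numeric = simpson(lambda z: g_and_h(z, g, h) * Z.pdf(z), -12.0, 12.0, 20000)
    print(f"mean (numerical):  {numeric:.4f}")
    print(f"20% trimmed mean:  {gh_trimmed_mean(g, h):.4f}")
```

For g = 1 and h = 0 the mean is $e^{1/2} - 1 \approx 0.649$, whereas the 20% trimmed mean is about 0.11. Because the simulated groups differ in scale, a population centered at its mean and then multiplied by different $\sigma_j$ values has unequal trimmed means (and vice versa), which appears to be why $\mu_{gh}$ or $\mu_t$, as appropriate, was subtracted before scaling.
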
Although these statistics test a null hypothesis which stipulates that the population trimmed means are equal, we believe this is a reasonable hypothesis to examine, since trimmed means, as opposed to the usual least squares means, provide better estimates of the typical individual in distributions that contain outliers or are skewed in shape (see Wilcox, 1995a, 1995b). Thus, information regarding the shape of the treatment populations will lead researchers to adopt MCPs with either the usual least squares estimators or those based on trimmed means and Winsorized variances. Notwithstanding the preceding caveat, we
averaged the power rates over the conditions presented in Tables 5 and 6 and rank ordered these average power values in order to offer a simplified recommendation regarding the "best" MCP. Table 7 enumerates the rank ordering of the MCPs for each definition of power for each value of J investigated. It is important to note that only MCPs that controlled their Type I error rate across all conditions investigated were rank ordered. For each definition of power, MCPs that appeared in both conditions of J are demarcated with shaded entries in the table. An examination of these shaded entries indicates that a number of MCPs were consistently ranked across the four conditions enumerated in this table. Our recommendation of "best" is based on the consistency of results in Table 7. Accordingly, we recommend that researchers adopt any one of the Shaffer (1979) modified range procedures. In particular, these modified range procedures could be based on the omnibus Welch (1951), James (1951), Box (1954), or Alexander and Govern (1994) tests. The modified range procedure beginning with the Box (1954) omnibus test, however, should be based on the Duncan (1957) version of the range procedure. Depending on what the researcher knows about the population shape, the MCP should use either least squares or robust estimates of the group means and variances.

Insert Table 7 About Here

In summary, MCPs based on trimmed means and Winsorized variances may be preferable to the usual MCPs based on least squares estimators under certain conditions. Our results and those reported in the literature (e.g., Wilcox, in press) clearly indicate, however, that this method of testing cannot be uniformly recommended. First, the findings reported by Wilcox indicate that MCPs based on trimmed means and Winsorized variances are prone to inflated rates of Type I error in light-tailed distributions (e.g., uniform, exponential) unless sample sizes are even larger than those


Chapter 14: Repeated-measures designs Chapter 14: Repeated-measures designs Oliver Twisted Please, Sir, can I have some more sphericity? The following article is adapted from: Field, A. P. (1998). A bluffer s guide to sphericity. Newsletter

More information

Assessing Normality: Applications in Multi-Group Designs

Assessing Normality: Applications in Multi-Group Designs Malaysian Journal of Mathematical Sciences 9(1): 53-65 (2015) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal 1* Abdul R. Othman, 2 H. J. Keselman and 3 Rand

More information

ABSTRACT. Between-Subjects Design under Variance. Heterogeneity and Nonnormality. Evaluation

ABSTRACT. Between-Subjects Design under Variance. Heterogeneity and Nonnormality. Evaluation ABSTRACT Title of dissertation: Robust Means Modeling: An Alternative to Hypothesis Testing Of Mean Equality in the Between-Subjects Design under Variance Heterogeneity and Nonnormality Weihua Fan, Doctor

More information

Lec 1: An Introduction to ANOVA

Lec 1: An Introduction to ANOVA Ying Li Stockholm University October 31, 2011 Three end-aisle displays Which is the best? Design of the Experiment Identify the stores of the similar size and type. The displays are randomly assigned to

More information

Multiple t Tests. Introduction to Analysis of Variance. Experiments with More than 2 Conditions

Multiple t Tests. Introduction to Analysis of Variance. Experiments with More than 2 Conditions Introduction to Analysis of Variance 1 Experiments with More than 2 Conditions Often the research that psychologists perform has more conditions than just the control and experimental conditions You might

More information

On Selecting Tests for Equality of Two Normal Mean Vectors

On Selecting Tests for Equality of Two Normal Mean Vectors MULTIVARIATE BEHAVIORAL RESEARCH, 41(4), 533 548 Copyright 006, Lawrence Erlbaum Associates, Inc. On Selecting Tests for Equality of Two Normal Mean Vectors K. Krishnamoorthy and Yanping Xia Department

More information

10/31/2012. One-Way ANOVA F-test

10/31/2012. One-Way ANOVA F-test PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 1. Situation/hypotheses 2. Test statistic 3.Distribution 4. Assumptions One-Way ANOVA F-test One factor J>2 independent samples

More information

COMPARING SEVERAL MEANS: ANOVA

COMPARING SEVERAL MEANS: ANOVA LAST UPDATED: November 15, 2012 COMPARING SEVERAL MEANS: ANOVA Objectives 2 Basic principles of ANOVA Equations underlying one-way ANOVA Doing a one-way ANOVA in R Following up an ANOVA: Planned contrasts/comparisons

More information

Preliminary Testing for Normality: Is This a Good Practice?

Preliminary Testing for Normality: Is This a Good Practice? Journal of Modern Applied Statistical Methods Volume 12 Issue 2 Article 2 11-1-2013 Preliminary Testing for Normality: Is This a Good Practice? H. J. Keselman University of Manitoba, Winnipeg, Manitoba,

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Post hoc Comparisons In the prior chapter we used ANOVA

More information

The Analysis of Repeated Measures Designs: A Review. H.J. Keselman. University of Manitoba. James Algina. University of Florida.

The Analysis of Repeated Measures Designs: A Review. H.J. Keselman. University of Manitoba. James Algina. University of Florida. Repeated Measures Analyses 1 The Analysis of Repeated Measures Designs: A Review by H.J. Keselman University of Manitoba James Algina University of Florida and Rhonda K. Kowalchuk University of Manitoba

More information

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Alisa A. Gorbunova and Boris Yu. Lemeshko Novosibirsk State Technical University Department of Applied Mathematics,

More information

STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons (Ch. 4-5)

STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons (Ch. 4-5) STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons Ch. 4-5) Recall CRD means and effects models: Y ij = µ i + ϵ ij = µ + α i + ϵ ij i = 1,..., g ; j = 1,..., n ; ϵ ij s iid N0, σ 2 ) If we reject

More information

Unit 27 One-Way Analysis of Variance

Unit 27 One-Way Analysis of Variance Unit 27 One-Way Analysis of Variance Objectives: To perform the hypothesis test in a one-way analysis of variance for comparing more than two population means Recall that a two sample t test is applied

More information

Multiple Testing. Gary W. Oehlert. January 28, School of Statistics University of Minnesota

Multiple Testing. Gary W. Oehlert. January 28, School of Statistics University of Minnesota Multiple Testing Gary W. Oehlert School of Statistics University of Minnesota January 28, 2016 Background Suppose that you had a 20-sided die. Nineteen of the sides are labeled 0 and one of the sides is

More information

An Overview of the Performance of Four Alternatives to Hotelling's T Square

An Overview of the Performance of Four Alternatives to Hotelling's T Square fi~hjf~~ G 1992, m-t~, 11o-114 Educational Research Journal 1992, Vol.7, pp. 110-114 An Overview of the Performance of Four Alternatives to Hotelling's T Square LIN Wen-ying The Chinese University of Hong

More information

Introduction to the Analysis of Variance (ANOVA)

Introduction to the Analysis of Variance (ANOVA) Introduction to the Analysis of Variance (ANOVA) The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique for testing for differences between the means of multiple (more

More information

Chapter 13 Section D. F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons

Chapter 13 Section D. F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons Explaining Psychological Statistics (2 nd Ed.) by Barry H. Cohen Chapter 13 Section D F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons In section B of this chapter,

More information

A Monte-Carlo study of asymptotically robust tests for correlation coefficients

A Monte-Carlo study of asymptotically robust tests for correlation coefficients Biometrika (1973), 6, 3, p. 661 551 Printed in Great Britain A Monte-Carlo study of asymptotically robust tests for correlation coefficients BY G. T. DUNCAN AND M. W. J. LAYAKD University of California,

More information

PLSC PRACTICE TEST ONE

PLSC PRACTICE TEST ONE PLSC 724 - PRACTICE TEST ONE 1. Discuss briefly the relationship between the shape of the normal curve and the variance. 2. What is the relationship between a statistic and a parameter? 3. How is the α

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Robustness. James H. Steiger. Department of Psychology and Human Development Vanderbilt University. James H. Steiger (Vanderbilt University) 1 / 37

Robustness. James H. Steiger. Department of Psychology and Human Development Vanderbilt University. James H. Steiger (Vanderbilt University) 1 / 37 Robustness James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 37 Robustness 1 Introduction 2 Robust Parameters and Robust

More information

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions

Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions Power Comparison of Exact Unconditional Tests for Comparing Two Binomial Proportions Roger L. Berger Department of Statistics North Carolina State University Raleigh, NC 27695-8203 June 29, 1994 Institute

More information

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and

More information

One-way Analysis of Variance. Major Points. T-test. Ψ320 Ainsworth

One-way Analysis of Variance. Major Points. T-test. Ψ320 Ainsworth One-way Analysis of Variance Ψ30 Ainsworth Major Points Problem with t-tests and multiple groups The logic behind ANOVA Calculations Multiple comparisons Assumptions of analysis of variance Effect Size

More information

Introduction to Analysis of Variance (ANOVA) Part 2

Introduction to Analysis of Variance (ANOVA) Part 2 Introduction to Analysis of Variance (ANOVA) Part 2 Single factor Serpulid recruitment and biofilms Effect of biofilm type on number of recruiting serpulid worms in Port Phillip Bay Response variable:

More information

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs)

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs) The One-Way Repeated-Measures ANOVA (For Within-Subjects Designs) Logic of the Repeated-Measures ANOVA The repeated-measures ANOVA extends the analysis of variance to research situations using repeated-measures

More information

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Chap The McGraw-Hill Companies, Inc. All rights reserved. 11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview

More information

8/23/2018. One-Way ANOVA F-test. 1. Situation/hypotheses. 2. Test statistic. 3.Distribution. 4. Assumptions

8/23/2018. One-Way ANOVA F-test. 1. Situation/hypotheses. 2. Test statistic. 3.Distribution. 4. Assumptions PSY 5101: Advanced Statistics for Psychological and Behavioral Research 1 1. Situation/hypotheses 2. Test statistic One-Way ANOVA F-test One factor J>2 independent samples H o :µ 1 µ 2 µ J F 3.Distribution

More information

B. Weaver (18-Oct-2006) MC Procedures Chapter 1: Multiple Comparison Procedures ) C (1.1)

B. Weaver (18-Oct-2006) MC Procedures Chapter 1: Multiple Comparison Procedures ) C (1.1) B. Weaver (18-Oct-2006) MC Procedures... 1 Chapter 1: Multiple Comparison Procedures 1.1 Introduction The omnibus F-test in a one-way ANOVA is a test of the null hypothesis that the population means of

More information

Aligned Rank Tests As Robust Alternatives For Testing Interactions In Multiple Group Repeated Measures Designs With Heterogeneous Covariances

Aligned Rank Tests As Robust Alternatives For Testing Interactions In Multiple Group Repeated Measures Designs With Heterogeneous Covariances Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 17 11-1-2004 Aligned Rank Tests As Robust Alternatives For Testing Interactions In Multiple Group Repeated Measures Designs With Heterogeneous

More information

Multiple Comparisons

Multiple Comparisons Multiple Comparisons Error Rates, A Priori Tests, and Post-Hoc Tests Multiple Comparisons: A Rationale Multiple comparison tests function to tease apart differences between the groups within our IV when

More information

Contrasts and Multiple Comparisons Supplement for Pages

Contrasts and Multiple Comparisons Supplement for Pages Contrasts and Multiple Comparisons Supplement for Pages 302-323 Brian Habing University of South Carolina Last Updated: July 20, 2001 The F-test from the ANOVA table allows us to test the null hypothesis

More information

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1 Linear Combinations Comparison of treatment means Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 6 1 Linear Combinations of Means y ij = µ + τ i + ǫ ij = µ i + ǫ ij Often study

More information

Analysis of 2x2 Cross-Over Designs using T-Tests

Analysis of 2x2 Cross-Over Designs using T-Tests Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

More information

Analysis of Variance and Co-variance. By Manza Ramesh

Analysis of Variance and Co-variance. By Manza Ramesh Analysis of Variance and Co-variance By Manza Ramesh Contents Analysis of Variance (ANOVA) What is ANOVA? The Basic Principle of ANOVA ANOVA Technique Setting up Analysis of Variance Table Short-cut Method

More information

Type I Error Rates of the Kenward-Roger Adjusted Degree of Freedom F-test for a Split-Plot Design with Missing Values

Type I Error Rates of the Kenward-Roger Adjusted Degree of Freedom F-test for a Split-Plot Design with Missing Values Journal of Modern Applied Statistical Methods Volume 6 Issue 1 Article 8 5-1-2007 Type I Error Rates of the Kenward-Roger Adjusted Degree of Freedom F-test for a Split-Plot Design with Missing Values Miguel

More information

AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC

AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC Journal of Applied Statistical Science ISSN 1067-5817 Volume 14, Number 3/4, pp. 225-235 2005 Nova Science Publishers, Inc. AN IMPROVEMENT TO THE ALIGNED RANK STATISTIC FOR TWO-FACTOR ANALYSIS OF VARIANCE

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Introduction. Chapter 8

Introduction. Chapter 8 Chapter 8 Introduction In general, a researcher wants to compare one treatment against another. The analysis of variance (ANOVA) is a general test for comparing treatment means. When the null hypothesis

More information

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,

More information

1 One-way Analysis of Variance

1 One-way Analysis of Variance 1 One-way Analysis of Variance Suppose that a random sample of q individuals receives treatment T i, i = 1,,... p. Let Y ij be the response from the jth individual to be treated with the ith treatment

More information

9 One-Way Analysis of Variance

9 One-Way Analysis of Variance 9 One-Way Analysis of Variance SW Chapter 11 - all sections except 6. The one-way analysis of variance (ANOVA) is a generalization of the two sample t test to k 2 groups. Assume that the populations of

More information

Group comparison test for independent samples

Group comparison test for independent samples Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences between means. Supposing that: samples come from normal populations

More information

Increasing Power in Paired-Samples Designs. by Correcting the Student t Statistic for Correlation. Donald W. Zimmerman. Carleton University

Increasing Power in Paired-Samples Designs. by Correcting the Student t Statistic for Correlation. Donald W. Zimmerman. Carleton University Power in Paired-Samples Designs Running head: POWER IN PAIRED-SAMPLES DESIGNS Increasing Power in Paired-Samples Designs by Correcting the Student t Statistic for Correlation Donald W. Zimmerman Carleton

More information

INTRODUCTION TO ANALYSIS OF VARIANCE

INTRODUCTION TO ANALYSIS OF VARIANCE CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two

More information

The Components of a Statistical Hypothesis Testing Problem

The Components of a Statistical Hypothesis Testing Problem Statistical Inference: Recall from chapter 5 that statistical inference is the use of a subset of a population (the sample) to draw conclusions about the entire population. In chapter 5 we studied one

More information

A NEW ALTERNATIVE IN TESTING FOR HOMOGENEITY OF VARIANCES

A NEW ALTERNATIVE IN TESTING FOR HOMOGENEITY OF VARIANCES Journal of Statistical Research 0, Vol. 40, No. 2, pp. 5-3 Bangladesh ISSN 025-422 X A NEW ALTERNATIVE IN TESTING FOR HOMOGENEITY OF VARIANCES Mehmet MENDEŞ Departmanet of Animal Science, University of

More information

Regression models. Categorical covariate, Quantitative outcome. Examples of categorical covariates. Group characteristics. Faculty of Health Sciences

Regression models. Categorical covariate, Quantitative outcome. Examples of categorical covariates. Group characteristics. Faculty of Health Sciences Faculty of Health Sciences Categorical covariate, Quantitative outcome Regression models Categorical covariate, Quantitative outcome Lene Theil Skovgaard April 29, 2013 PKA & LTS, Sect. 3.2, 3.2.1 ANOVA

More information

The analysis of repeated measures designs: A review

The analysis of repeated measures designs: A review British Journal of Mathematical and Statistical Psychology (2001), 54, 1±20 2001 The British Psychological Society Printed in Great Britain 1 The analysis of repeated measures designs: A review H. J. Keselman*

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

October 1, Keywords: Conditional Testing Procedures, Non-normal Data, Nonparametric Statistics, Simulation study

October 1, Keywords: Conditional Testing Procedures, Non-normal Data, Nonparametric Statistics, Simulation study A comparison of efficient permutation tests for unbalanced ANOVA in two by two designs and their behavior under heteroscedasticity arxiv:1309.7781v1 [stat.me] 30 Sep 2013 Sonja Hahn Department of Psychology,

More information

Orthogonal, Planned and Unplanned Comparisons

Orthogonal, Planned and Unplanned Comparisons This is a chapter excerpt from Guilford Publications. Data Analysis for Experimental Design, by Richard Gonzalez Copyright 2008. 8 Orthogonal, Planned and Unplanned Comparisons 8.1 Introduction In this

More information

One-way between-subjects ANOVA. Comparing three or more independent means

One-way between-subjects ANOVA. Comparing three or more independent means One-way between-subjects ANOVA Comparing three or more independent means Data files SpiderBG.sav Attractiveness.sav Homework: sourcesofself-esteem.sav ANOVA: A Framework Understand the basic principles

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Multiple comparisons - subsequent inferences for two-way ANOVA

Multiple comparisons - subsequent inferences for two-way ANOVA 1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of

More information

One-Way ANOVA Cohen Chapter 12 EDUC/PSY 6600

One-Way ANOVA Cohen Chapter 12 EDUC/PSY 6600 One-Way ANOVA Cohen Chapter 1 EDUC/PSY 6600 1 It is easy to lie with statistics. It is hard to tell the truth without statistics. -Andrejs Dunkels Motivating examples Dr. Vito randomly assigns 30 individuals

More information

Comparison of nonparametric analysis of variance methods a Monte Carlo study Part A: Between subjects designs - A Vote for van der Waerden

Comparison of nonparametric analysis of variance methods a Monte Carlo study Part A: Between subjects designs - A Vote for van der Waerden Comparison of nonparametric analysis of variance methods a Monte Carlo study Part A: Between subjects designs - A Vote for van der Waerden Version 5 completely revised and extended (13.7.2017) Haiko Lüpsen

More information

http://www.statsoft.it/out.php?loc=http://www.statsoft.com/textbook/ Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences

More information

Examining Multiple Comparison Procedures According to Error Rate, Power Type and False Discovery Rate

Examining Multiple Comparison Procedures According to Error Rate, Power Type and False Discovery Rate Journal of Modern Applied Statistical Methods Volume 11 Issue 2 Article 7 11-1-2012 Examining Multiple Comparison Procedures According to Error Rate, Power Type and False Discovery Rate Guven Ozkaya Uludag

More information

PSYC 331 STATISTICS FOR PSYCHOLOGISTS

PSYC 331 STATISTICS FOR PSYCHOLOGISTS PSYC 331 STATISTICS FOR PSYCHOLOGISTS Session 4 A PARAMETRIC STATISTICAL TEST FOR MORE THAN TWO POPULATIONS Lecturer: Dr. Paul Narh Doku, Dept of Psychology, UG Contact Information: pndoku@ug.edu.gh College

More information

DISTRIBUTIONS USED IN STATISTICAL WORK

DISTRIBUTIONS USED IN STATISTICAL WORK DISTRIBUTIONS USED IN STATISTICAL WORK In one of the classic introductory statistics books used in Education and Psychology (Glass and Stanley, 1970, Prentice-Hall) there was an excellent chapter on different

More information