A nonparametric two-sample wald test of equality of variances

Similar documents
Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article The Laplace Likelihood Ratio Test for Heteroscedasticity

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Empirical Power of Four Statistical Tests in One Way Layout

Testing equality of variances for multiple univariate normal populations

Research Article Tests of Fit for the Logarithmic Distribution

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Nonparametric analysis of blocked ordered categories data: some examples revisited

NAG Library Chapter Introduction. G08 Nonparametric Statistics

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Research Article Statistical Tests for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

A Review of the Behrens-Fisher Problem and Some of Its Analogs: Does the Same Size Fit All?

Simulated Sample Behaviour of a Dissimilarity Index When Sampling from Populations Differing by a Location Parameter Only

GENERAL PROBLEMS OF METROLOGY AND MEASUREMENT TECHNIQUE

Research Article Improved Estimators of the Mean of a Normal Distribution with a Known Coefficient of Variation

Distribution Theory. Comparison Between Two Quantiles: The Normal and Exponential Cases

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

arxiv:math/ v1 [math.pr] 9 Sep 2003

ROBUSTNESS OF TWO-PHASE REGRESSION TESTS

Inferences About the Difference Between Two Means

1 One-way Analysis of Variance

Testing homogeneity of variances with unequal sample sizes

Construction of Combined Charts Based on Combining Functions

On Selecting Tests for Equality of Two Normal Mean Vectors

POWER AND TYPE I ERROR RATE COMPARISON OF MULTIVARIATE ANALYSIS OF VARIANCE

Review. December 4 th, Review

Statistical Procedures for Testing Homogeneity of Water Quality Parameters

A Monte-Carlo study of asymptotically robust tests for correlation coefficients

Published online: 17 May 2012.

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true

Sample size determination for logistic regression: A simulation study

1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments

NAG Library Chapter Introduction. g08 Nonparametric Statistics

Confidence Intervals for the Process Capability Index C p Based on Confidence Intervals for Variance under Non-Normality

Finite-sample quantiles of the Jarque-Bera test

Frequency Distribution Cross-Tabulation

Two-by-two ANOVA: Global and Graphical Comparisons Based on an Extension of the Shift Function

Bootstrap Procedures for Testing Homogeneity Hypotheses

HANDBOOK OF APPLICABLE MATHEMATICS

11. Bootstrap Methods

3 Joint Distributions 71

INFLUENCE OF USING ALTERNATIVE MEANS ON TYPE-I ERROR RATE IN THE COMPARISON OF INDEPENDENT GROUPS ABSTRACT

Lec 3: Model Adequacy Checking

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

Specification Tests for Families of Discrete Distributions with Applications to Insurance Claims Data

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Stat 427/527: Advanced Data Analysis I

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

Maximum-Likelihood Estimation: Basic Ideas

A Note on Cochran Test for Homogeneity in Two Ways ANOVA and Meta-Analysis

Simulating Uniform- and Triangular- Based Double Power Method Distributions

A NEW ALTERNATIVE IN TESTING FOR HOMOGENEITY OF VARIANCES

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

A Simulation Comparison Study for Estimating the Process Capability Index C pm with Asymmetric Tolerances

Central Limit Theorem ( 5.3)

Contextual Effects in Modeling for Small Domains

Does k-th Moment Exist?

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49

Confidence intervals for kernel density estimation

Theory and Methods of Statistical Inference. PART I Frequentist theory and methods

Characterizing Log-Logistic (L L ) Distributions through Methods of Percentiles and L-Moments

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

IENG581 Design and Analysis of Experiments INTRODUCTION

Compare several group means: ANOVA

Testing Statistical Hypotheses

Discovery significance with statistical uncertainty in the background estimate

Inferential Statistics

Research Article Inference for the Sharpe Ratio Using a Likelihood-Based Approach

The Nonparametric Bootstrap

A Signed-Rank Test Based on the Score Function

GARCH Models Estimation and Inference

BOOTSTRAP METHODS FOR TESTING HOMOGENEITY OF VARIANCES. Dennis D. Boos and Cavell Brownie. 1&/'0 Institute of Statistics Mimeo Series No.

Lecture 5: Hypothesis tests for more than one sample

Theory and Methods of Statistical Inference

NEW APPROXIMATE INFERENTIAL METHODS FOR THE RELIABILITY PARAMETER IN A STRESS-STRENGTH MODEL: THE NORMAL CASE

Contents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

The Robustness of the Multivariate EWMA Control Chart

Confidence Intervals, Testing and ANOVA Summary

Hypothesis Testing Problem. TMS-062: Lecture 5 Hypotheses Testing. Alternative Hypotheses. Test Statistic

BTRY 4090: Spring 2009 Theory of Statistics

NONPARAMETRIC TESTS. LALMOHAN BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-12

Introduction to General and Generalized Linear Models

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

A process capability index for discrete processes

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

Theory and Methods of Statistical Inference. PART I Frequentist likelihood methods

Precision of maximum likelihood estimation in adaptive designs

Two-Sample Inferential Statistics

Recall the Basics of Hypothesis Testing

Testing for Regime Switching in Singaporean Business Cycles

Chapter 1 Statistical Inference

6 Single Sample Methods for a Location Parameter

Lectures on Statistics. William G. Faris

Transcription:

University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David Allingham University of Newcastle J. C. W Rayner University of Wollongong, reklaw@uow.edu.au Publication Details Allingham, D. & Rayner, J. (211). A nonparametric two-sample wald test of equality of variances. Advances in Decision Sciences, 211 (N/A), 1-8. Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

A nonparametric two-sample wald test of equality of variances Abstract We develop a test for equality of variances given two independent random samples of observations. The test can be expected to perform well when both sample sizes are at least moderate and the sample variances are asymptotically equivalent to the maximum likelihood estimators of the population variances. The test is motivated by and is here assessed for the case when both populations sampled are assumed to be normal. Popular choices of test would be the twosample F test if normality can be assumed and Levene s test if this assumption is dubious. Another competitor is thewald test for the difference in the population variances.we give a nonparametric analogue of this test and call it the R test. In an indicative empirical study when both populations are normal, we find that when both sample sizes are at least 25 the R test is nearly as robust as Levene s test and nearly as powerful as the F test. Keywords wald, test, nonparametric, sample, equality, two, variances Disciplines Physical Sciences and Mathematics Publication Details Allingham, D. & Rayner, J. (211). A nonparametric two-sample wald test of equality of variances. Advances in Decision Sciences, 211 (N/A), 1-8. This journal article is available at Research Online: http://ro.uow.edu.au/infopapers/1264

Hindawi Publishing Corporation Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner 2, 3 1 Centre for Computer-Assisted Research Computation and Its Applications, School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, NSW 238, Australia 2 Centre for Statistical and Survey Methodology, School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW 2522, Australia 3 School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, NSW 238, Australia Correspondence should be addressed to David Allingham, david.allingham@newcastle.edu.au Received 31 August 211; Accepted 17 December 211 Academic Editor: YanXia Lin Copyright q 211 D. Allingham and J. C. W. Rayner. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We develop a test for equality of variances given two independent random samples of observations. The test can be expected to perform well when both sample sizes are at least moderate and the sample variances are asymptotically equivalent to the maximum likelihood estimators of the population variances. The test is motivated by and is here assessed for the case when both populations sampled are assumed to be normal. Popular choices of test would be the twosample F test if normality can be assumed and Levene s test if this assumption is dubious. Another competitor is the Wald test for the difference in the population variances. We give a nonparametric analogue of this test and call it the R test. In an indicative empirical study when both populations are normal, we find that when both sample sizes are at least 25 the R test is nearly as robust as Levene s test and nearly as powerful as the F test. 1. Introduction: Testing Equality of Variances for Two Independent Samples In the two-sample problem, we are given two independent random samples X 11,...,X 1n1 and X 21,...,X 2n2. The location problem attracts most attention. Assuming that the samples are from normal populations, the pooled t-test is used to test equality of means assuming equal variances and Welch s test can be used when equality of variances is suspect but normality is not. When normality is in doubt, the Wilcoxon test is often used. The corresponding dispersion problem is of interest to confirm the validity of, for example, the pooled t-test, and when dispersion differences are of direct interest. For example, testing for reduced variability is of interest in confirming natural selection see, e.g., 1, Section 5.5 and if some processes are in control. In exploratory data analysis, it is

2 Advances in Decision Sciences sensible to assess if one population is more variable than another. If it is, the cause may be that one population is bimodal and the other is not; the consequences of this in both the scenario and the model can then be explored in depth. The study here introduces a new test of equality of variances. The asymptotic null distribution of the test statistic is χ 2 1, but, depending on the populations sampled from, this may or may not be satisfactory for small to moderate sample sizes. To assess this, and other aspects of the proposed test, an indicative empirical study is undertaken. We give comparisons when both populations sampled are normal, when both samples have the same sample size, and for 5% level tests. First, we derive a finite sample correction to the critical values based on the asymptotic null distribution when sampling from normal populations. Different corrections would be needed when sampling from different distributions. Next, we show that in moderate samples the new test is nearly as powerful as the F test when normality may be assumed and, finally, that it is nearly as robust as Levene s test when normality is in doubt. Moore and McCabe 2, page 519 claim that the F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice. The new test gives a counterexample to that proposition. We acknowledge that the new test is the most effective for at least moderate sample sizes. In the normal case, each random sample should have at least 25 observations, which is what we would expect of a serious study aiming at reasonable power that cannot be hoped for with samples of size 1 or so. See Section 4. We are aware of more expansive comparative studies such as 3, 4. Our goal here is not to emulate these studies but to show that the new test is competitive in terms of test size, robustness, and power. In Section 2, the new test is introduced. Sections 3, 4, and 5 give the results of an empirical investigation when the populations sampled are assumed to be normal. In Section 3, we investigate test size, showing that the asymptotic χ 2 critical values should only be used for moderate to large sample sizes. For smaller sample sizes, a finite sample correction to the asymptotic 5% χ 2 critical value is given. This results in actual test sizes between 4.6% and 5.3%. In Section 4, we show that when normality holds, the new test is not as powerful as the Levene test for smaller sample sizes but overtakes it for moderate samples of about 25. The new test is always inferior to the optimal F test. However, the R test has power that approaches that of the F test, being at least 95% that of the F test throughout most of the parameter space in samples of at least 8. In Section 5, we show that if we sample from t-distributions with varying degrees of freedom, the F test is highly nonrobust for small degrees of freedom, as is well known for fat-tailed distributions. When both populations sampled are gamma distributions, chosen to model skewness differences from normality, or t-distributions, chosen to model kurtosis differences, the new test performs far better than the F test, and is competitive with the Levene test. That the new test gives a good compromise between power and robustness and is valid when normality does not hold are strong reasons for preferring the new test for sample sizes that are at least moderate, and normality is dubious. 2. Competitor Tests for Equality of Variance Initially we assume that we have two independent random samples X i1,...,x ini from normal populations, N μ i,σ 2 i for i 1 and 2. We wish to test H : σ 2 1 σ 2 2 against the

Advances in Decision Sciences 3 alternative K : σ 2 1 / σ 2 2, with the population means being unknown nuisance parameters. If S 2 i n i j 1 X ij X i. 2 / n i 1, in which X i. n i j 1 X ij/n i, i 1, 2, then the S 2 i are the unbiased sample variances, and the so-called F test is based on their quotient, S 2 2 /S2 1 F, say. It is well known and will be confirmed yet again in Section 5 that the null distribution of F, F n1 1,n 2 1,is sensitive to departures from normality. If F n1 1,n 2 1 x is the cumulative distribution function of the F n1 1,n 2 1 distribution, and if c p is such that F n1 1,n 2 1 c p p, then the F test rejects H at the 1 α% level when F c α/2 and when F c 1 α/2. Common practice when normality is in doubt is to use Levene s test or a nonparametric test such as Mood s test. Levene s test is based on the ANOVA F test applied to the residuals. There are different versions of Levene s test using different definitions of residual. The two most common versions use residuals based on the group means, X ij X i.,andthe group medians, X ij X i., in which X i. is the median of the ith sample. The latter is called the Brown-Forsythe test. Again it is well known that these tests are robust in that when the population variances are equal but the populations themselves are not normal, they achieve levels close to nominal. However, this happens at the expense of some power. As the empirical study in this paper is intended to be indicative rather than exhaustive, we will henceforth make comparisons only with the Levene test, based on a test statistic we denote by L. We now construct a new test that we call the R test. For univariate parameters θ, a Wald test statistic of H : θ θ against the alternative K : θ / θ is based on θ, the maximum likelihood estimator of θ, usually via the test statistic θ θ 2 /est var θ, where est var θ is the asymptotic variance of θ evaluated when θ θ. Under the null hypothesis, this test statistic has an asymptotic χ 2 1 distribution. As well as being equivalent to the likelihood ratio test, the F test is also a Wald test for testing H : θ σ 2 2 /σ2 1 1 against K : θ / 1. Rayner 5 derived the Wald test for testing H : θ σ 2 2 σ2 1 against K : θ /. The test statistic is ( S 2 1 ) 2 S2 2 2S 4 1 / n 1 1 2S 4 2 / n 2 1 W, say. 2.1 Being a Wald test, the asymptotic distribution of W is χ 2 1, while its exact distribution is not immediately obvious. However, W is a 1-1 function of F, so the two tests are equivalent. Since the exact distribution of F is available, the F test is the obvious test to use. In W, the variances var S 2 j are estimated optimally using the Rao-Blackwell theorem. This depends very strongly on the assumption of normality. If normality is in doubt, then we can estimate var S 2 1 S2 2 using results given, for example, in 6. For a random sample, Y 1,...,Y n with population and sample central moments μ r and m r n j 1 Y j Y r /n, r 2, 3,..., 6 gives that ( E m r μ r O n 1), ( μ4 μ 2 2) var m 2 n O n 2. 2.2 Applying 6, 1.5 to the numerator of var m 2, μ 2 2 may be estimated to O n 1 by m 2 2,or, equivalently, by nm 2 2 / n 1 S4, where S 2 is the unbiased sample variance. It follows

4 Advances in Decision Sciences that, to order O n 2, var m 2 may be estimated by m 4 m 2 2 /n. We thus propose a robust alternative to W, given by ) 2 ( S 2 1 S2 2 ( m14 S 4 ) 1 /n1 ( R, say, 2.3 m 24 S 4 2) /n2 in which m i4 are the fourth central sample moments for the ith sample, i 1, 2. We call the test based on R the R test. As the sample sizes increase, the distributions of the standardised sample variances approach standard normality, the denominator in R will approximate var S 2 1 S2 2,andRwill have asymptotic distribution χ2 1.Thus,ifc α is the point for which the χ 2 1 distribution has weight α in the right hand tail, then the R test rejects H at approximately the 1 α% level when R c α. We emphasise that although the motivation for the derivation of R is under the assumption of sampling from normal populations, it is a valid test statistic for testing equality of variances no matter what the populations sampled. If the sample variances are equal to or asymptotically equivalent to the maximum likelihood estimators of the population variances, as is the case when sampling from normal populations, then the R test is a Wald test for equality of variances in the sense described above. Since it does not depend on any distributional assumptions about the data, it can be thought of as a nonparametric Wald test. As such, it can be expected to have good properties in large samples. We note that all the above test statistics are invariant under transformations a X ij b i, for constants a, b 1,andb 2 and for j 1,...,n i and i 1, 2. The next three sections report an empirical study when the distributions sampled are assumed to be normal. As this is an indicative study, we fix the samples sizes to be equal, n 1 n 2 n, say, and the significance level to be 5% throughout. 3. Test Size under Normality Under the null hypothesis, the distribution of F is known exactly, the distribution of L is known approximately, and, as mentioned above, the distribution of R is known asymptotically. In analysing data, these distributions are used to determine P values and critical values. We now investigate their use in determining test size, the probability of rejecting the null hypothesis when it is true. Two empirical assessments of test size will now be undertaken. Since the test statistics are scale invariant, it is sufficient under the null hypothesis to take both population variances to be one. In the first assessment, we assume normality. For various values of the common sample size n, we estimate the 5% critical points for each test by generating 1, pairs of random samples of size n, calculating the test statistics, ordering them, and hence identifying the.95th percentile. The estimated critical points of R approach the χ 2 1 5% critical point 3.841. These estimated critical points will subsequently be used in the power study to give tests with test the size of exactly 5%. To see the extent of the error caused by using the asymptotic critical point 3.841, Figure 1 gives the proportion of rejections in 1, pairs of random samples for sample sizes up to 1. For n 1, the proportion of rejections is nearly 2% and although for n 4 this has dropped to nearly 7%, most users would hope for observed test sizes closer to 5%.

Advances in Decision Sciences 5.4.35 Proportion of rejections.3.25.2 5.5 1 2 3 4 5 6 7 8 9 1 Sample size Figure 1: Proportion of rejections of the R test using the χ 2 1 5% critical point 3.841 for sample sizes up to 1. The application of the test is improved by the use a Bartlett-type correction. This was found by plotting the estimated 5% critical points against n and using standard curve fitting techniques to find that the exact critical points were well approximated between n 1 and 1 by c n,.5 3.84146 1.339 4.953/ n 24.171/n. For larger n,itissufficient to use the asymptotic 5% value 3.84146, the error being at most.8%. We checked the exact probabilities of rejection under the null hypothesis when applying the test with critical value c n,.5, and all were between 4.6% and 5.3%. For levels other than 5%, and when sample sizes are unequal, further empirical work needs to be done to find critical values. However, as this study was intended to be indicative, we leave extensive tabulation of critical values for another time. 4. Power under Normality For the F, Levene, and R tests, we estimate the power as the proportion of rejections from 1, pairs of random samples of size n when the first sample is from an N, 1 population and the second is from an N,σ 2 population with σ 2 1. To compare like with like, we use estimated critical values that give exact 5% level tests. It is apparent that for sample sizes less than about 2 the Levene test is more powerful than the R test and that for a sample size between approximately 2 and 3 the R test takes over from the Levene test; thereafter the R test is always more powerful than the Levene test. This is shown in Figure 2. Both the Levene and R tests are always less powerful than the F test. This is explored in Figure 3 that compares the Levene test to the F test in the left hand panel and the R test to the F test in the right hand panel. What is given is a contour plot of the regions in which the ratios of the power of the stated test to the F test are less than 95%, between 95% and 99.99%, and greater than 99.99%. Generally, for any given n and σ 2, it is clear that the power of the Levene test is at most that of the R test. For example, it appears that for n 1 n 2 6 approximately the power of the R test is always at least 95% of that of the F test, whereas there is a considerable region where the power of the Levene test are less than 95% of that of the F test.

6 Advances in Decision Sciences Proportion rejected 1.9.8.7.6.5.4.3.2 n = 8 n = 5 n = 3 n = 25 n = 2 1 1.5 2 2.5 3 3.5 4 4.5 5 Variance of second sample, σ 2 n = 15 n = 1 Figure 2: Power of the 5% level L test solid line and R test dashed line for various sample sizes. n = 5 2 2 18 18 16 16 Sample size 14 12 1 >.9999 Sample size 14 12 1 >.9999 8 8 6 4 >.95 6 4 >.95 2 <.95 2 <.95 1 2 3 4 5 6 7 8 9 1 1 2 3 4 5 6 7 8 9 1 σ 2 σ 2 a b Figure 3: Contour plots of the L test a and R test b relative to the F test showing regions in which the power ratios are less than 95%, between 95% and 99.99%, and greater than 99.99%. 5. Robustness Even if the R test has good power, the test is of little value unless it is robust in the sense that when the distributions from which we are sampling are not from the nominated population here, the normal distribution, thep values are reasonably accurate. It is thus of interest to estimate the proportion of rejections when the null hypothesis is true and both the populations from which we sample are not normal. We have looked at variable kurtosis and skewness. First, variable kurtosis was considered via t-distributions with varying degrees of freedom ν, say. Second, variable skewness was considered through gamma distributions

Advances in Decision Sciences 7 Test Size 5.5 n = 1 Test Size 5.5 n = 2 5 5 1 15 a ν n = 1 2 5 1 15 2 ν b 5 n = 2 Test Size.5 Test Size.5 1 2 3 c c 1 2 3 d c Figure 4: Test sizes for the F dots, L dashes, andr solid line tests for sample sizes of 1 and 2 for a, b t-distributions with varying degrees of freedom, ν, from 1 to 2, and c, d gamma distributions with scale parameter 1 and varying shape parameter, c, from to 3. with probability density functions b c 1 exp x /Γ c for x >. Thus, the scale parameter is set at one and the scale parameter c was varied. As ν and c increase, the distributions become increasingly more normal, the distribution sampled will be close enough to normal that we could expect the proportion of rejections to be close to the nominal. In Figure 4, we plot the proportion of rejections for the Levene, F, andr tests when sampling from t ν distributions, for ν 1,...,2, and from gamma distributions with scale parameter 1 and shape parameter c, forc,...,3. We show curves for each test with common sample sizes n 1 and 2. The critical values used for the R test are the c n,.5 from Section 3; the critical values used for the Levene test are those estimated by simulation to give exactly the nominal level when normality holds. Thus, this comparison favours the Levene test. For these distributions, samples are increasingly nonnormal as ν and c decrease. It is apparent that the F test performs increasingly poorly as these parameters decrease. When sampling from the t-distribution, the Levene test generally has exact level closer to the nominal level than the R test except for very nonnormal samples small ν. However, the level of the R test is almost always reasonable, and while for very small ν, the level is not as close to the exact level as perhaps we may prefer, the same is the case for the Levene test. When sampling from the gamma distribution when n 1, the R test outperforms the Levene test at small values of c and is slightly inferior for larger values of c, although both

8 Advances in Decision Sciences tests have exact test sizes acceptably close to the nominal test size. When n 2, the R test is uniformly closer to the nominal test size than the Levene test. Although we only display results for n 1 and n 2, sample sizes of more than 2 yield very similar conclusions. 6. Conclusion First, we reflect on testing for equality of variances when it is assumed that the populations sampled are normal. The F test is both the likelihood ratio test and a Wald test, and is the appropriate test to apply. When normality does not hold, the F test is no longer an asymptotically optimal test, and its well-known nonrobustness means that tests such as the Levene are more appropriate for small to moderate sample sizes. However, for sample sizes of about 25 or more, the R test is more powerful than the Levene, and with the small sample corrected critical values, it holds its nominal significance level well. For these sample sizes, it can be preferred to the Levene test. Second, consider testing for equality of variances when both samples are drawn from the same population. If that population is nominated, then the R test may be applied after determining critical values or using P values calculated by Monte Carlo methods. When the sample variances are asymptotically equivalent to the maximum likelihood estimators of the population variances as, e.g., is the case when sampling from normal populations but not Poisson populations,thertest is a nonparametric Wald test and hence will have good power in sufficiently large samples. If the population is not specified, the R test can be confidently applied when the sample sizes are large, using the asymptotic χ 2 1 null distribution to calculate P values or critical values. References 1 B. F. J. Manly, The Statistics of Natural Selection on Animal Populations, Chapman and Hall, London, UK, 1985. 2 D. S. Moore and G. P. McCabe, Introduction to the Practice of Statistics, W.H.Freeman,NewYork,NY, USA, 5th edition, 26. 3 W. J. Conover, M. E. Johnson, and M. M. Johnson, A comparative study of tests of homogeneity of variances, with applications to the outer continental shelf bidding data, Technometrics, vol.23,no.4, pp. 351 361, 1981. 4 D. D. Boos and C. Brownie, Bootstrap methods for testing homogeneity of variances, Technometrics, vol. 31, no. 1, pp. 69 82, 1989. 5 J. C. Rayner, The asymptotically optimal tests, Journal of the Royal Statistical Society Series D, vol. 46, no. 3, pp. 337 346, 1997. 6 A. Stuart and J. K. Ord, Kendall s Advanced Theory of Statistics, vol. 1, Edward Arnold, London, UK, 6th edition, 1994.