Distribution-free test for symmetry based on Bonferroni's measure

Antonietta Mira
University of Pavia, Italy

Abstract. We propose a test based on Bonferroni's measure of skewness. The test detects the asymmetry of a distribution function about an unknown median. We study the asymptotic distribution of the given test statistic and provide a consistent estimate of its variance. The asymptotic relative efficiency of the proposed test is computed, along with Monte Carlo estimates of its power. This allows us to compare the test based on Bonferroni's measure with other tests for symmetry. Two sets of data are tested using our method.

1 Introduction

We are interested in studying the skewness of a distribution function $F(x)$ with unknown mean $\mu_F$ and median $Me_F$. Bonferroni (1933)

introduced in the literature the use of the quantities
$$\delta_p(F) = F^{-1}(p) + F^{-1}(1-p) - 2\,Me_F, \qquad \forall\, p \in (0, 1/2), \qquad (1)$$
to measure the skewness of the distribution $F$. Since then, several authors have proposed ways to synthesize the quantities in (1) into a single measure of skewness: MacGillivray (1986), Zenga (1986) and Basak et al. (1992). Bonferroni himself (1933) first defined the measure which is the object of this paper:
$$\gamma_1(F) = 2\int_0^{1/2} \delta_p(F)\, dp = 2\int_0^1 F^{-1}(p)\, dp - 2\,Me_F = 2(\mu_F - Me_F). \qquad (2)$$
Given a measure of skewness, we can propose a symmetry test after meeting two requirements. First, we need to derive the distribution of the measure of interest for finite or infinite samples. Finite sample results are quite difficult to obtain; we therefore focus on asymptotic results. Second, we must ensure that the derived distribution does not depend on the distribution function under study, so that we can specify the critical values of the test. The usual approach is to find consistent estimates of the parameters of the (asymptotic) distribution. In this paper we show that the asymptotic distribution of the sample version of $\gamma_1(F)$ is Gaussian and propose consistent estimates of the limiting mean and variance. This allows us to construct a test for symmetry based on (2) and to compare it, in terms of asymptotic relative efficiency, with other tests known in the literature.
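As a quick illustration, the sample version of (2) is simply twice the difference between the sample mean and the sample median; a minimal sketch (the function name is ours, not the paper's):

```python
# Sample version of Bonferroni's measure gamma_1(F) = 2*(mu_F - Me_F),
# estimated by 2 * (sample mean - sample median).
import statistics

def bonferroni_skewness(sample):
    """Sample Bonferroni skewness: 2 * (mean - median)."""
    return 2.0 * (statistics.fmean(sample) - statistics.median(sample))

# A right-skewed sample has mean > median, so the measure is positive;
# a symmetric sample gives (essentially) zero.
print(bonferroni_skewness([1, 1, 2, 2, 3, 10]))  # positive
print(bonferroni_skewness([-2, -1, 0, 1, 2]))    # zero
```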

2 Asymptotic distribution of Bonferroni's measure of skewness

In this section we derive the asymptotic distribution of the sample version of the measure $\gamma_1(F)$, say $\gamma_1(F_n)$, where $F_n$ is the empirical distribution function of a random sample, $X_1, X_2, \ldots, X_n$, from $F$.

Theorem 1. Let $F$ be a distribution function with finite variance $\sigma_F^2 < \infty$ and first derivative $f$ such that $f(Me_F) > 0$. Then
$$\sqrt{n}\,\big[\gamma_1(F_n) - \gamma_1(F)\big] \xrightarrow{d} N\big(0, \sigma^2(\gamma_1, F)\big),$$
where
$$\sigma^2(\gamma_1, F) = 4\sigma_F^2 + \frac{1}{f(Me_F)^2} - \frac{4\, S_{Me_F}}{f(Me_F)}. \qquad (3)$$

Proof 1. Following the literature on L-statistics (Serfling (1980)), to study the large sample distribution of $\gamma_1(F_n)$ we analyze the asymptotic behavior of the remainder term of a generalization of the Taylor expansion to functionals:
$$R_{mn} = \gamma_1(F_n) - \gamma_1(F) - \sum_{k=1}^{m} \frac{1}{k!}\, d^k \gamma_1(F; F_n - F),$$
where
$$d^k T(F; F_n - F) = \frac{d^k}{d\lambda^k}\, T\big(F + \lambda(F_n - F)\big)\Big|_{\lambda = 0}$$
is the $k$-th Gâteaux differential of a functional $T$ at $F$ in the direction of $F_n$. To prove the asymptotic normality of $\gamma_1(F_n)$ it is sufficient to show that
$$n^{1/2} R_{1n} \xrightarrow{P} 0 \qquad (4)$$

as $n \to \infty$. We have
$$d^1 \gamma_1(F; F_n - F) = \frac{1}{n} \sum_{i=1}^{n} h(\gamma_1, F, X_i),$$
where
$$h(\gamma_1, F, x) = 2(x - \mu_F) - \frac{2\big[1/2 - I_{(x \le Me_F)}\big]}{f(Me_F)}$$
and $I_A$ denotes the indicator function of the set $A$. It follows that a sufficient condition for (4) to hold is that $f(Me_F) > 0$. This proves the asymptotic normality. We now find an expression for the asymptotic mean and variance. We have $E_F\{\gamma_1(F_n) - \gamma_1(F)\} = E_F\{h(\gamma_1, F, X)\} = 0$; therefore $\sigma^2(\gamma_1, F) = E_F\{h(\gamma_1, F, X)^2\}$. Note that
$$E_F\{h(\gamma_1, F, X)^2\} = E_F\left\{2(X - \mu_F) - \frac{2\big[1/2 - I_{(X \le Me_F)}\big]}{f(Me_F)}\right\}^2 = 4\sigma_F^2 + \frac{1}{f(Me_F)^2} - \frac{8}{f(Me_F)}\, E_F\Big[(X - \mu_F)\big(1/2 - I_{(X \le Me_F)}\big)\Big].$$
We now need only express the quantity
$$E_F\Big[(X - \mu_F)\big(1/2 - I_{(X \le Me_F)}\big)\Big] \qquad (5)$$
in terms of $S_{Me_F}$. It is easy to show that (5) equals
$$\frac{1}{2}\,\mu_F - E_F\big[X\, I_{(X \le Me_F)}\big]. \qquad (6)$$
From the definition of the absolute median deviation,
$$S_{Me_F} = E|X - Me_F| = \mu_F - 2\int_{-\infty}^{Me_F} x f(x)\, dx, \qquad (7)$$
we conclude that (6) equals $S_{Me_F}/2$.
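Identity (7) is easy to check numerically; the following sketch (ours, not part of the paper) verifies it by Monte Carlo for the Exp(1) distribution, where $\mu_F = 1$, $Me_F = \ln 2$, and in fact $S_{Me_F} = \ln 2$:

```python
# Monte Carlo sanity check of identity (7): S_{Me_F} = mu_F - 2*E[X I(X <= Me_F)],
# for Exp(1), where mu = 1, Me = ln 2, and S_{Me} works out to ln 2.
import math
import random

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(200000)]
me = math.log(2.0)
lhs = sum(abs(x - me) for x in xs) / len(xs)               # E|X - Me|
rhs = 1.0 - 2.0 * sum(x for x in xs if x <= me) / len(xs)  # mu - 2 E[X I(X <= Me)]
print(lhs, rhs)  # both close to ln 2 (about 0.693)
```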

3 Consistent estimate of the asymptotic variance of Bonferroni's measure

The statistic $\gamma_1(F_n)$ is not asymptotically distribution-free, but it can be made distribution-free by finding a consistent estimate of the asymptotic variance that does not depend on $F$. Given $X_1, X_2, \ldots, X_n$, an i.i.d. sample from $F$, let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the corresponding order statistics. Let $X_{s:n}$ be the sample median, defined as follows: for $n$ odd, $X_{s:n} = X_{\frac{n+1}{2}:n}$; for $n$ even, $X_{s:n} = \frac{1}{2}\big(X_{\frac{n}{2}:n} + X_{\frac{n}{2}+1:n}\big)$.

Theorem 2. A consistent estimate of (3) is
$$S_c^2(\gamma_1; F_n) = 4\hat{\sigma}^2 + \big[D_{n,c}(Me_F)\big]^2 - 4\, D_{n,c}(Me_F)\, \hat{S}_{Me_F},$$
where
$$\hat{S}_{Me_F} = \bar{X}_n - \frac{2}{n} \sum_{i=1}^{n} X_i\, I_{(X_i \le X_{s:n})},$$
$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2,$$
$$D_{n,c}(Me_F) = \frac{n^{1/5}}{2c}\, \Big(X_{[(n/2) + c n^{4/5}]:n} - X_{[(n/2) - c n^{4/5} + 1]:n}\Big).$$

Proof 2. The proof of the theorem follows from the next two lemmas, where we show that $\hat{S}_{Me_F}$ and $D_{n,c}(Me_F)$ are consistent estimates of $S_{Me_F}$ and $1/f(Me_F)$, respectively. The Slutsky-Frechet theorem will

then allow us to conclude that $S_c^2(\gamma_1; F_n)$ is a consistent estimate of $\sigma^2(\gamma_1, F)$.

Lemma 1. A consistent estimate of the absolute median deviation is $\hat{S}_{Me_F}$.

Proof of Lemma 1. We rewrite the absolute median deviation as in (7) and estimate $\mu_F$ with the sample mean $\bar{X}_n$. We now show that the statistic
$$K_n(F_n) = \frac{1}{n} \sum_{i=1}^{n} X_i\, I_{(X_i \le X_{s:n})}$$
is a weakly consistent estimate of
$$K(F) = E\big[X\, I_{(X \le Me_F)}\big] = \int_{-\infty}^{Me_F} x\, dF(x).$$
Let $K_n'$ be the quantity
$$K_n' = \frac{1}{n} \sum_{i=1}^{n} X_i\, I_{(X_i \le Me_F)}.$$
From the Strong Law of Large Numbers it follows that $K_n'$ converges almost surely to $K$. It is therefore sufficient to show that $K_n$ converges in probability to $K_n'$. We have
$$|K_n - K_n'| \le \frac{1}{n} \sum_{i=1}^{n} |X_i|\, I_{(X_i \text{ between } Me_F \text{ and } X_{s:n})} \le \big|\max(X_{s:n}, Me_F)\big|\, \frac{1}{n} \sum_{i=1}^{n} I_{(X_i \text{ between } Me_F \text{ and } X_{s:n})}.$$
The quantity $\max(X_{s:n}, Me_F)$ converges in probability to the constant $Me_F$ and can thus be omitted. We only need to show that
$$\frac{1}{n} \sum_{i=1}^{n} I_{(X_i \text{ between } Me_F \text{ and } X_{s:n})} \qquad (8)$$

converges in probability to zero. As proved in Mira (1995), the expectation of (8) tends to zero as $n$ goes to infinity, and this implies convergence in probability of (8) to zero.

Lemma 2. A consistent estimate of the inverse of the density function evaluated at the median is $D_{n,c}(Me_F)$.

Proof of Lemma 2. Siddiqui (1960) considers
$$\tilde{D}_{n,m}(v_p) = \frac{n}{2m}\, \big(X_{[np+m]:n} - X_{[np-m+1]:n}\big)$$
as an estimate of $g(p) = 1/f(v_p)$, where $v_p$ is the quantile of order $p$. Bloch and Gastwirth (1968) show that if $m$ is proportional to $n^{4/5}$, the corresponding estimate minimizes the asymptotic expected mean square error (AMSE). In the same paper it is proved that if $m = o(n)$ and $m \to \infty$ as $n \to \infty$, then the statistic $\tilde{D}_{n,m}(v_p)$ is a consistent estimate of $g(p)$. Following Bloch and Gastwirth (1968) we choose $m = c\, n^{4/5}$. The conditions for consistency are met and we obtain
$$\tilde{D}_{n,\, c n^{4/5}}(v_{1/2}) = D_{n,c}(Me_F),$$
the proposed estimate of $1/f(Me_F)$.

Remark 1. The AMSE of $\tilde{D}_{n,m}(v_p)$ is
$$\lim_{n \to \infty} E\big[\tilde{D}_{n,m}(v_p) - g(p)\big]^2 = \frac{g^2(p)}{2m} + \frac{[g''(p)]^2}{36} \left(\frac{m}{n}\right)^4.$$

Letting $m = c\, n^{4/5}$, the value of $c$ that minimizes the AMSE is given by
$$c = \left(\frac{9\, g^2(p)}{2\, [g''(p)]^2}\right)^{1/5} = \left(\frac{9\, f^8(v_p)}{2\, \big[3(f'(v_p))^2 - f(v_p) f''(v_p)\big]^2}\right)^{1/5},$$
assuming $g''(p) \ne 0$. The previous formula shows that the optimal choice of $c$ depends on the values of $f(v_p)$, $f'(v_p)$ and $f''(v_p)$. For example, when $p = 1/2$, the best choice of $c$ is $c = 0.5$ for a Gaussian, $c = 0.4$ for a Cauchy and $c = 0.58$ for a Logistic density. Given that we assume complete ignorance about the form of the density function, this dependence could cause problems. To examine the effect of an incorrect choice of $c$ on the precision of our estimate, we performed some Monte Carlo simulations with varying values of $c$. The results are presented in Mira (1995) and show that we can use $c = 0.5$ with no significant loss in terms of mean square error of our estimate of the asymptotic variance, regardless of the distribution under consideration.

Remark 2. In Mira (1995) the statistic $S_c^2(\gamma_1; F_n)$ is compared with another consistent estimate of the asymptotic variance obtained using the delete-d Jackknife, a modification of the Jackknife introduced by Shao (1989). Monte Carlo simulations show that $S_c^2(\gamma_1; F_n)$ performs better for most of the underlying distributions $F$, both in terms of expected mean square error and bias.
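Putting Theorem 2 and the two lemmas together, the variance estimate $S_c^2(\gamma_1; F_n)$ can be sketched as follows; the helper names and the integer rounding of the order-statistic indices are our assumptions:

```python
# Sketch of S_c^2(gamma_1; F_n) = 4*sigma^2 + D^2 - 4*D*S_hat (Theorem 2).
import statistics

def var_estimate(sample, c=0.5):
    n = len(sample)
    xs = sorted(sample)
    xbar = statistics.fmean(xs)
    med = statistics.median(xs)
    # S_hat: the sample absolute median deviation (Lemma 1)
    s_me = xbar - (2.0 / n) * sum(x for x in xs if x <= med)
    # sigma^2: the usual unbiased sample variance
    sigma2 = statistics.variance(xs)
    # D_{n,c}: Bloch-Gastwirth estimate of 1/f(Me_F) with m = c * n^(4/5) (Lemma 2)
    m = c * n ** 0.8
    upper = xs[min(n, int(n / 2 + m)) - 1]
    lower = xs[max(1, int(n / 2 - m + 1)) - 1]
    d = (n ** 0.2 / (2.0 * c)) * (upper - lower)
    return 4.0 * sigma2 + d * d - 4.0 * d * s_me

print(var_estimate(list(range(1, 101))))  # positive, as a variance must be
```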

4 A test based on Bonferroni's measure of skewness and its asymptotic relative efficiency

Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample from $F(x) = F_0(x - Me_F)$, an absolutely continuous distribution on $\mathbb{R}$. Assume the hypotheses of Theorem 1 hold. We are interested in testing the hypothesis of symmetry:
$$H_0: F_0(x) = 1 - F_0(-x).$$
The asymptotic test statistic we introduce is
$$\gamma_1(F_n) = 2(\bar{X}_n - X_{s:n}).$$
Intuitively, large values of $|\gamma_1(F_n)|$ will induce the researcher to reject the null hypothesis of symmetry in favor of the two-sided alternative:
$$H_1: F_0(x) \ne 1 - F_0(-x).$$
If the alternative hypothesis is one-sided, the rejection region is modified accordingly. As a consequence of Theorem 1, and given that $S_c^2(\gamma_1; F_n)$ is a weakly consistent estimate of the asymptotic variance, we conclude that
$$\sqrt{n}\, \frac{\gamma_1(F_n) - \gamma_1(F)}{S_c(\gamma_1; F_n)} \xrightarrow{d} N(0, 1).$$
Hence the sequence of rejection regions to test $H_0$ against $H_1$ is given by: reject $H_0$ if
$$|\gamma_1(F_n)| \ge \frac{a_n}{\sqrt{n}}\, S_c(\gamma_1; F_n),$$

where $a_n \to z_{1-\alpha/2}$ as $n \to \infty$. We denote the $\alpha$-th quantile of the standard normal distribution by $z_\alpha$, i.e. $\Phi^{-1}(\alpha) = z_\alpha$. By construction the test has, asymptotically, a significance level of $\alpha$. We now compute the asymptotic relative efficiency (ARE) of our test statistic according to the definition of Pitman (1948). Consider a sequence of alternatives that gets closer and closer to the distribution under the null hypothesis as the sample size $n$ increases. Suppose we are interested in comparing the performance of the test statistics $T$ and $T^*$. Let $n$ and $n^*$ be the sample sizes needed by the two tests, of some size $\alpha$, to obtain the same power against the same sequence of alternatives. Suppose that the ratio $n/n^*$ tends to a limit independent of the level of significance and of the power as $n \to \infty$. Then that limit is called the ARE of the test $T$ relative to $T^*$ and will be denoted by
$$ARE(T, T^*) = \frac{EFF(T)}{EFF(T^*)},$$
where $EFF(T)$ is the efficacy of $T$. If $ARE(T, T^*) > 1$ we can conclude that, relative to the sequence of alternative hypotheses considered, $T$ performs better than $T^*$. Consider the class of density functions given by
$$f_\theta(x) = g(x)\, I_{(x \le 0)} + \frac{1}{\theta}\, g\!\left(\frac{x}{\theta}\right) I_{(x > 0)},$$
where $g(x)$ is symmetric around zero. We are interested in testing the hypothesis of symmetry
$$H_0: \theta = \theta_0 = 1$$
against the alternative
$$H_1: \theta > 1. \qquad (9)$$
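The testing procedure described above (statistic $2(\bar{X}_n - X_{s:n})$, variance estimate of Theorem 2 with $c = 0.5$, standard normal critical value) can be sketched end to end as follows; function names are ours, and the Exp(1) usage example is an illustration we add:

```python
# End-to-end sketch of the two-sided symmetry test at alpha = 0.05.
import math
import random
import statistics

def bonferroni_symmetry_test(sample, c=0.5, z_crit=1.9599):
    """Return (standardized statistic, reject H0?)."""
    n = len(sample)
    xs = sorted(sample)
    xbar = statistics.fmean(xs)
    med = statistics.median(xs)
    gamma1 = 2.0 * (xbar - med)                   # gamma_1(F_n)
    sigma2 = statistics.variance(xs)
    s_me = xbar - (2.0 / n) * sum(x for x in xs if x <= med)
    m = c * n ** 0.8
    d = (n ** 0.2 / (2.0 * c)) * (xs[min(n, int(n / 2 + m)) - 1]
                                  - xs[max(1, int(n / 2 - m + 1)) - 1])
    s2 = 4.0 * sigma2 + d * d - 4.0 * d * s_me    # S_c^2 from Theorem 2
    t = math.sqrt(n) * gamma1 / math.sqrt(s2)
    return t, abs(t) > z_crit

random.seed(1)
skewed = [random.expovariate(1.0) for _ in range(500)]  # right-skewed Exp(1) data
t, reject = bonferroni_symmetry_test(skewed)
print(t, reject)  # large positive t; H0 of symmetry is rejected
```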

Having stated the problem in this form allows us to consider the sequence of alternative hypotheses
$$H_{1,n}: \theta = \theta_0 + \frac{k}{\sqrt{n}},$$
with $k$ an arbitrary positive constant. This sequence of hypotheses tends to $H_0$ as $n$ goes to infinity, as required when defining the ARE. A tedious computation (Mira (1995)) shows that the efficacy of our test statistic is given by
$$EFF(\gamma_1(F_n)) = \frac{4n \left(\int_{-\infty}^{0} x\, g(x)\, dx\right)^2}{4\sigma_g^2 + \dfrac{1}{g(0)^2} - \dfrac{4\, S_{Me_g}}{g(0)}}.$$

5 Comparison with other tests for symmetry

In this section we compare the test statistic based on Bonferroni's measure of skewness with other tests for symmetry known in the literature. We consider two ways to compare hypothesis tests. The first is an exact procedure based on the asymptotic relative efficiency. Unfortunately, this procedure can only be performed if the efficacy is known for each of the test statistics we wish to compare. If this is not the case, we can still perform a comparison using a Monte Carlo simulation.
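For such Monte Carlo comparisons, samples from the alternative family $f_\theta$ of Section 4 can be drawn by taking $|Z|$ with $Z \sim g$, keeping the negative half as $-|Z|$ and stretching the positive half by $\theta$; a sketch with $g$ taken as the standard normal (our choice, and our function names):

```python
# Sampling from f_theta(x) = g(x) I(x<=0) + (1/theta) g(x/theta) I(x>0):
# each half of the symmetric g carries mass 1/2; only the positive half is stretched.
import random
import statistics

def sample_f_theta(n, theta, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = abs(rng.gauss(0.0, 1.0))  # |Z| with Z ~ g, g standard normal here
        out.append(theta * z if rng.random() < 0.5 else -z)
    return out

xs = sample_f_theta(50000, 1.5)
# theta > 1 skews the distribution to the right, so 2*(mean - median) > 0
print(2.0 * (statistics.fmean(xs) - statistics.median(xs)))
```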

5.1 Comparison based on the asymptotic relative efficiency

Bowley (1920) introduced the measure of skewness
$$\gamma_2(F) = \frac{\mu_{3F}}{\sigma_F^3},$$
where $\mu_{3F}$ is the third central moment about the mean. The sample version of this measure, $\gamma_2(F_n)$, is often used to test the hypothesis of symmetry: large values of $\gamma_2(F_n)$ induce a rejection of the null hypothesis in a two-tailed test. Gupta (1967) shows that if the central moments up to sixth order are finite, then the efficacy of this test, for the hypotheses in (9), is given by
$$EFF(\gamma_2(F_n)) = \frac{n \left(\dfrac{\partial \mu_3}{\partial \theta}\right)^2 \Big|_{\theta_0}}{\mu_6 - 6\mu_2\mu_4 + 9\mu_2^3},$$
where $\mu_i$ denotes the central moment of order $i$, computed under the alternative hypothesis in the numerator and under the null in the denominator. Once we are given the efficacies of two test statistics we can compute their asymptotic relative efficiency. In our case we have the following results. When the distribution $g(x)$ is standard normal:
$$ARE(\gamma_1, \gamma_2) = \lim_{n \to \infty} \frac{EFF(\gamma_1)}{EFF(\gamma_2)} = \lim_{n \to \infty} \frac{n \cdot 0.278}{n \cdot 0.238} = 1.164;$$

when $g(x)$ is uniform on $[-1, +1]$:
$$ARE(\gamma_1, \gamma_2) = \lim_{n \to \infty} \frac{n \cdot 0.187}{n \cdot 0.205} = 0.914;$$
when $g(x)$ is triangular on $[-1, +1]$:
$$ARE(\gamma_1, \gamma_2) = \lim_{n \to \infty} \frac{n \cdot 0.333}{n \cdot 0.0259} = 12.855;$$
when $g(x)$ is the Laplace distribution:
$$ARE(\gamma_1, \gamma_2) = \lim_{n \to \infty} \frac{n \cdot 0.25}{n \cdot 0.071} = 3.488.$$
We note that the efficacy of $\gamma_1(F_n)$ is either higher than that of $\gamma_2(F_n)$ or there is no significant difference between the efficacies of the two tests.

5.2 Empirical power: family of $\chi^2$ distributions

If the efficacy of the test statistic we want to compare with $\gamma_1(F_n)$ is not available or hard to obtain analytically, we can still perform a comparison using simulations. By sampling from skewed distributions such as the $\chi^2$ with varying degrees of freedom, we can estimate the power of the test for both one-sided and two-sided alternative hypotheses. As an example simulation, consider $\chi^2_1$. We generate 10 000 samples of varying size ($n = 20, 40, 50, 100$) from a $\chi^2_1$. On each sample we compute the value of $\sqrt{n}\, \gamma_1(F_n) / S_{0.5}(\gamma_1; F_n) = \hat{T}(F_n)$ and check how many times $\hat{T}(F_n)$ exceeds the critical values given by the proper quantiles of a standard normal distribution. The significance level of the test is fixed

at $\alpha = 0.05$. The number we obtain is a Monte Carlo estimate of the probability of rejecting the null hypothesis when it is false, i.e. of the power of the test. The results of the simulation are presented in Tables 1 and 2.

Tables 1-2 should be placed here.

As expected, the power of the test decreases as the degrees of freedom of the $\chi^2$ distribution we are sampling from increase. This is due to the fact that a $\chi^2_k$ converges to a normal distribution as $k \to \infty$; it therefore becomes harder to discriminate between the null and the alternative hypothesis. The power also increases as the sample size $n$ increases, since we have more information on which to base a decision. For purposes of comparison, Tables 3 and 4 present the empirical power of Bowley's test estimated under the same conditions. Gupta (1967) proves that $\gamma_2(F_n)$, properly normalized, has an asymptotic normal distribution. Therefore the critical values of the test are provided by the proper quantiles of the normal distribution.

Tables 3-4 should be placed here.

The comparison of Tables 1-2 with Tables 3-4 shows that the test based on $\gamma_1(F_n)$ performs better than that based on $\gamma_2(F_n)$ for all sample sizes considered when the degrees of freedom of the underlying $\chi^2$ distribution are less than 3.
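The $\chi^2_1$ experiment can be reproduced in miniature as follows; this is our sketch (reduced to 1 000 replications for speed, where the paper uses 10 000), with the statistic restated for self-containment and our own function names:

```python
# Monte Carlo power sketch for the two-sided test, sampling from chi^2 with 1 df.
import math
import random
import statistics

def t_hat(sample, c=0.5):
    """sqrt(n) * gamma_1(F_n) / S_c(gamma_1; F_n), as in Theorems 1-2."""
    n = len(sample)
    xs = sorted(sample)
    xbar = statistics.fmean(xs)
    med = statistics.median(xs)
    sigma2 = statistics.variance(xs)
    s_me = xbar - (2.0 / n) * sum(x for x in xs if x <= med)
    m = c * n ** 0.8
    d = (n ** 0.2 / (2.0 * c)) * (xs[min(n, int(n / 2 + m)) - 1]
                                  - xs[max(1, int(n / 2 - m + 1)) - 1])
    s2 = 4.0 * sigma2 + d * d - 4.0 * d * s_me
    return math.sqrt(n) * 2.0 * (xbar - med) / math.sqrt(s2)

random.seed(0)
reps, n, rejections = 1000, 50, 0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]  # chi^2_1 draw
    if abs(t_hat(sample)) > 1.9599:                           # two-sided, alpha = 0.05
        rejections += 1
print(rejections / reps)  # Table 1 reports 9835/10000 rejections at n = 50
```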

5.3 Empirical power: Lambda family of distributions

We continue our simulation study on the basis of a set of 10 000 random samples of sizes $n = 30$, 50 and 100 from nine members of the generalized lambda distribution (GLD, Ramberg & Schmeiser (1974)). The inverse distribution function of the GLD is given by
$$F^{-1}(u) = \lambda_1 + \frac{u^{\lambda_3} - (1-u)^{\lambda_4}}{\lambda_2}.$$
The GLD provides a very convenient method of generating random samples from a wide range of distributions. The values of $\lambda_i$, $i = 1, 2, 3, 4$, we consider are taken from Tajauddin (1994) and McWilliams (1990). For each random sample of size $n$ we compute the test statistic $\hat{T}(F_n)$ and use the normal approximation to count the number of rejections at $\alpha = 0.05$. Tables 5 and 6 present the number of rejections among 10 000 random samples for two-sided and one-sided alternative hypotheses, respectively. The GLD in each table is described by its parameter values. Case 1 is the GLD approximating the standard normal distribution; the corresponding number of rejections thus gives the empirical significance level of our test statistic.

Tables 5-6 should be placed here.

From Tables 5 and 6 we observe the following:

1. For both one-sided and two-sided alternatives, the empirical $\alpha$ slightly underestimates the nominal $\alpha = 0.05$.

2. Our test statistic performs better in the case of exponential-like distributions (cases 3, 8, 9) than in the case of unimodal distributions.

Tables 7, 8 and 9 give the relative efficiency (RE) of the test based on $\gamma_1(F_n)$ compared with, respectively, test S (Tajauddin (1994)), test R (McWilliams (1990)) and the test based on $\gamma_2(F_n)$ (Gupta (1967)).

Tables 7-8-9 should be placed here.

McWilliams (1990) compared the test R, based on a runs statistic, with the tests proposed by Butler (1969), Rothman and Woodroofe (1972) and Hill and Rao (1977). On the basis of his simulation study he recommended the use of test R over all considered competitors. Later, Tajauddin (1994) introduced a test based on the Wilcoxon two-sample test and showed that, when sampling from unimodal distributions (cases 4, 5, 6 and 7), the power of the S test is either significantly higher than that of R or there is no significant difference between the powers of the two tests. Tables 7 and 8 show that the test based on $\gamma_1(F_n)$ performs better than S when sampling from unimodal distributions and better than R when sampling from exponential-like distributions. In other words, the test we propose performs better than both R and S in almost all cases under study. The only exception is case 2, where the RE is less than one for sample sizes bigger than 30, indicating a better performance of the other two tests for both one-sided and two-sided alternatives. In this case test R presents the highest estimated power.
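Samples from the GLD cases of Tables 5-9 are obtained by inverse-transform sampling; a sketch (function name ours), illustrated on case 3 of Table 5, a right-skewed, exponential-like member of the family:

```python
# Inverse-transform sampling from the generalized lambda distribution:
# F^{-1}(u) = lambda_1 + (u**lambda_3 - (1-u)**lambda_4) / lambda_2.
import random
import statistics

def gld_inverse(u, l1, l2, l3, l4):
    """Percentile function of the GLD."""
    return l1 + (u ** l3 - (1.0 - u) ** l4) / l2

random.seed(0)
# Case 3 of Table 5: lambda = (0, 1, 0.00007, 0.19)
xs = [gld_inverse(random.random(), 0.0, 1.0, 0.00007, 0.19) for _ in range(50000)]
# Bonferroni's sample measure 2*(mean - median) is positive for this skewed case
print(2.0 * (statistics.fmean(xs) - statistics.median(xs)))
```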

Finally, Table 9 shows that the test based on $\gamma_1(F_n)$ performs better than that based on $\gamma_2(F_n)$ in all but cases 2 and 4. In both these cases $\gamma_2(F_n)$ is the best test, in terms of empirical power, among the ones under study. Comparing the density function of a GLD as in case 2 with the influence curve of $\gamma_2(F_n)$ (Groeneveld (1991)) explains the high empirical power of the test based on Bowley's measure of skewness in this particular case. The influence function of $\gamma_2$, given by the cubic
$$h(\gamma_2, F, x) = x^3 - 3x,$$
tells us that this measure is particularly sensitive to contaminations towards asymmetry occurring in the tails of a distribution. A GLD with $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 1.4$, $\lambda_4 = 0.25$ has quite heavy tails. This explains why $\gamma_2(F_n)$ is extremely powerful in detecting its skewness. Similar reasoning explains why $\gamma_2(F_n)$ performs better than $\gamma_1(F_n)$ when detecting the asymmetry of $\chi^2$ distributions with more than three degrees of freedom (Tables 1-4). We thus recommend the test based on the standardized third central moment in situations where we suspect that the asymmetry occurs far in the tails of the distribution.

6 Numerical examples

We apply the test for symmetry developed in this paper to the data sets presented in Basak et al. (1992). The data refer to the final examination scores and completion times for 134 individuals. Note that, since the sample size is 134, the maximum value for

$c = n^{1/5}/2$ is 1.33. We therefore consider values of the parameter $c$ ranging from 0.1 to 1; values of $c$ greater than 1 are considered too extreme. The values of our test statistic computed with different choices of the test parameter $c$ are as follows:

Table 10 should be placed here.

We fix $\alpha = 0.05$ and test the null hypothesis of symmetry of the distribution against a two-tailed alternative. The critical values of the test, provided by the proper quantiles of the standard normal distribution, are $\pm 1.9599$. For the final scores data we fail to reject the null hypothesis, thus reaching the same conclusion as Basak et al. (1992), for all the values of $c$ considered. Regarding the completion times, with the same hypotheses and type one error probability as before, we reject the null hypothesis of symmetry, again reaching the same conclusions as Basak et al. (1992), for all values of the parameter $c$ ranging from 0.1 to 0.8. Furthermore, if we are interested in the one-sided alternative hypothesis that the completion times are skewed to the right, then the critical value is 1.6448. Based on Bonferroni's measure, we would conclude that the completion times are positively skewed regardless of the value of $c$ considered.

7 Conclusions

We have introduced a test for symmetry which is easy to perform and has good properties both in terms of power and of asymptotic relative efficiency.

References

Basak, I., Balch, W. & Basack, P. (1992) Skewness: asymptotic critical values for a test related to Pearson's measure, Journal of Applied Statistics, 19, pp. 479-487.

Bloch, D. & Gastwirth, J. (1968) On a simple estimate of the reciprocal of the density function, Annals of Mathematical Statistics, 39, pp. 1083-1085.

Bonferroni, C. E. (1933) Elementi di Statistica Generale, E. Gili (Ed.), Torino, Italy.

Bowley, R. (1920) Elements of Statistics, Staples Press Ltd, London.

Groeneveld, R. (1991) An influence function approach to describing the skewness of a distribution, The American Statistician, 45, pp. 97-102.

Gupta, M. (1967) An asymptotic nonparametric test of symmetry, Annals of Mathematical Statistics, 38, pp. 849-866.

MacGillivray, H. (1986) Skewness and asymmetry: measures and orderings, Annals of Statistics, 17, pp. 789-802.

McWilliams, T. P. (1990) A distribution-free test for symmetry based on a runs statistic, Journal of the American Statistical Association, 85, pp. 1130-1133.

Mira, A. (1995) Misure di asimmetria: convergenza asintotica e problemi di robustezza [Measures of asymmetry: asymptotic convergence and robustness problems], PhD thesis, Universita degli Studi di Trento, Italy.

Pitman, E. (1948) Non-parametric Statistical Inference, University of North Carolina Institute of Statistics (mimeographed lecture notes).

Ramberg, J. S. & Schmeiser, B. W. (1974) An approximate method for generating asymmetric random variables, Communications of the ACM, 17, pp. 78-82.

Serfling, R. (1980) Approximation Theorems of Mathematical Statistics, Wiley, New York.

Shao, J. (1989) A general theory for Jackknife variance estimation, Annals of Statistics, 17, pp. 1176-1197.

Siddiqui, M. M. (1960) Distribution of quantiles in samples from a bivariate population, Journal of Research of the National Bureau of Standards, 64B, pp. 145-150.

Tajauddin, I. H. (1994) A distribution-free test for symmetry based on the Wilcoxon two-sample test, Journal of Applied Statistics, 21, pp. 409-415.

Zenga, M. (1986) Argomenti di Statistica, Vita e Pensiero, Milano, Italy.

Table 1.

          chi2_1  chi2_2  chi2_3  chi2_4  chi2_5  chi2_6  chi2_7  chi2_8
n = 20      6046    3545    2380    1805    1487    1284    1102    1017
n = 40      9327    6712    4776    3647    2887    2359    2024    1676
n = 50      9835    8354    6660    5278    4455    3844    3314    2903
n = 100    10000    9849    9351    8363    7382    6693    5893    5361

Table 2.

          chi2_1  chi2_2  chi2_3  chi2_4  chi2_5  chi2_6  chi2_7  chi2_8
n = 20      7850    5362    3995    3244    2704    2400    2099    1914
n = 40      9740    8052    6439    5206    4404    3698    3292    2919
n = 50      9949    9120    7822    6756    5933    5207    4710    4294
n = 100    10000    9938    9616    9148    8473    7884    7264    6662

Table 3.

          chi2_1  chi2_2  chi2_3  chi2_4  chi2_5  chi2_6  chi2_7  chi2_8
n = 20      3453    2958    2288    1842    1602    1330    1224    1101
n = 40      4624    5176    4901    4468    4010    3672    3290    2998
n = 50      5121    5838    5782    5478    5158    4718    4351    3947
n = 100     6731    7674    7974    8070    8087    7974    7842    7557

Table 4.

          chi2_1  chi2_2  chi2_3  chi2_4  chi2_5  chi2_6  chi2_7  chi2_8
n = 20      5428    4960    4219    3589    3323    3026    2719    2509
n = 40      6718    7330    7131    6908    6431    6030    5519    5126
n = 50      7176    7672    7776    7587    7334    6951    6607    6349
n = 100     8339    8920    9184    9184    9172    9168    9113    9046

Table 5.

                                                                            n = 30   n = 50   n = 100
Case (1): lambda_1 = 0, lambda_2 = 0.197454, lambda_3 = lambda_4 = 0.134915    365      362       389
Case (2): lambda_1 = 0, lambda_2 = 1, lambda_3 = 1.4, lambda_4 = 0.25         3362     3395      4490
Case (3): lambda_1 = 0, lambda_2 = 1, lambda_3 = 0.00007, lambda_4 = 0.19     8737     8693      9651
Case (4): lambda_1 = 3.586508, lambda_2 = 0.04306, lambda_3 = 0.025213, lambda_4 = 0.094029   3815   3741   5316
Case (5): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.0075, lambda_4 = -0.03   5885     5964      7769
Case (6): lambda_1 = -0.116734, lambda_2 = -0.351663, lambda_3 = -0.13, lambda_4 = -0.16       616    662    871
Case (7): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.1, lambda_4 = -0.18      2608     2683      3959
Case (8): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.001, lambda_4 = -0.13    9655     9681      9972
Case (9): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.0001, lambda_4 = -0.17   9753     9750      9981

Table 6.

                                                                            n = 30   n = 50   n = 100
Case (1): lambda_1 = 0, lambda_2 = 0.197454, lambda_3 = lambda_4 = 0.134915    338      347       397
Case (2): lambda_1 = 0, lambda_2 = 1, lambda_3 = 1.4, lambda_4 = 0.25         4634     4656      5783
Case (3): lambda_1 = 0, lambda_2 = 1, lambda_3 = 0.00007, lambda_4 = 0.19     9295     9296      9828
Case (4): lambda_1 = 3.586508, lambda_2 = 0.04306, lambda_3 = 0.025213, lambda_4 = 0.094029   5276   5253   6717
Case (5): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.0075, lambda_4 = -0.03   7303     7291      8722
Case (6): lambda_1 = -0.116734, lambda_2 = -0.351663, lambda_3 = -0.13, lambda_4 = -0.16      1093   1125   1472
Case (7): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.1, lambda_4 = -0.18      3998     4103      5447
Case (8): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.001, lambda_4 = -0.13    9867     9869      9995
Case (9): lambda_1 = 0, lambda_2 = -1, lambda_3 = -0.0001, lambda_4 = -0.17   9933     9941      9996

Table 7.

Case       (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
n = 30    1.44   2.28   2.74   3.23   1.11   2.75   1.99   1.90
n = 50    0.91   1.45   1.72   2.01   1.11   1.87   1.39   1.33
n = 100   0.71   1.10   1.39   1.50   1.21   1.72   1.06   1.05

Table 8.

Case       (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
n = 30    1.08   1.96   3.41   4.02   1.15   3.45   1.77   1.67
n = 50    0.70   1.28   2.61   3.04   1.24   3.01   1.25   1.21
n = 100   0.70   1.03   2.44   2.42   1.42   3.45   1.03   1.02

Table 9.

Case       (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
n = 30    0.46   1.10   0.73   1.09   1.16   1.39   1.79   1.98
n = 50    0.47   1.10   0.71   1.12   1.15   1.38   1.82   2.00
n = 100   0.51   1.09   0.75   1.13   1.38   1.52   1.67   1.80

Table 10.

  c      final scores   compl. times
 0.1        -0.695          2.457
 0.2        -0.695          2.465
 0.3        -0.695          2.423
 0.4        -0.695          2.365
 0.5        -0.695          2.372
 0.6        -0.695          2.305
 0.7        -0.695          2.190
 0.8        -0.647          2.090
 0.9        -0.611          1.889
 1.0        -0.585          1.709
 1.1        -0.538          1.571
 1.2        -0.501          1.285
 1.3        -0.410          1.109

Captions to tables

Caption to Table 1: Empirical power of $\gamma_1(F_n)$ for $\alpha = 0.05$, $H_1$ two-sided, sampling from $\chi^2$ distributions.

Caption to Table 2: Empirical power of $\gamma_1(F_n)$ for $\alpha = 0.05$, $H_1$ one-sided, sampling from $\chi^2$ distributions.

Caption to Table 3: Empirical power of $\gamma_2(F_n)$ for $\alpha = 0.05$, $H_1$ two-sided, sampling from $\chi^2$ distributions.

Caption to Table 4: Empirical power of $\gamma_2(F_n)$ for $\alpha = 0.05$, $H_1$ one-sided, sampling from $\chi^2$ distributions.

Caption to Table 5: Empirical power of $\gamma_1(F_n)$ for $\alpha = 0.05$, $H_1$ two-sided, sampling from GLD.

Caption to Table 6: Empirical power of $\gamma_1(F_n)$ for $\alpha = 0.05$, $H_1$ one-sided, sampling from GLD.

Caption to Table 7: Relative efficiency with respect to test S, $H_1$ two-sided, sampling from GLD.

Caption to Table 8: Relative efficiency with respect to test R, $H_1$ two-sided, sampling from GLD.

Caption to Table 9: Relative efficiency with respect to $\gamma_2(F_n)$, $H_1$ two-sided, sampling from GLD.

Caption to Table 10: Values of $\hat{T}(F_n)$ computed with different choices of the parameter $c$, for the final scores and the completion times data.