A goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality

Similar documents
The performance of univariate goodness-of-fit tests for normality based on the empirical characteristic function in large samples

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract

Power Comparison of Some Goodness-of-fit Tests

A Note on Box-Cox Quantile Regression Estimation of the Parameters of the Generalized Pareto Distribution

Applying least absolute deviation regression to regressiontype estimation of the index of a stable distribution using the characteristic function

Approximate Confidence Interval for the Reciprocal of a Normal Mean with a Known Coefficient of Variation

NCSS Statistical Software. Tolerance Intervals

Parameter, Statistic and Random Samples

Expectation and Variance of a random variable

Stat 200 -Testing Summary Page 1

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

A new distribution-free quantile estimator

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

Confidence Interval for Standard Deviation of Normal Distribution with Known Coefficients of Variation

Properties and Hypothesis Testing

Stat 319 Theory of Statistics (2) Exercises

Access to the published version may require journal subscription. Published with permission from: Elsevier.

1 Inferential Methods for Correlation and Regression Analysis

Common Large/Small Sample Tests 1/55

Math 152. Rumbos Fall Solutions to Review Problems for Exam #2. Number of Heads Frequency

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Topic 9: Sampling Distributions of Estimators

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

Lecture 19: Convergence

arxiv: v3 [math.st] 2 Aug 2011

The Sampling Distribution of the Maximum. Likelihood Estimators for the Parameters of. Beta-Binomial Distribution

Lecture 2: Monte Carlo Simulation

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

32 estimating the cumulative distribution function

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.


Lecture 3. Properties of Summary Statistics: Sampling Distribution

Modified Lilliefors Test

TMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.

ANOTHER WEIGHTED WEIBULL DISTRIBUTION FROM AZZALINI S FAMILY

Published online: 12 Jul 2013.

Bootstrap Intervals of the Parameters of Lognormal Distribution Using Power Rule Model and Accelerated Life Tests

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.

Exam II Review. CEE 3710 November 15, /16/2017. EXAM II Friday, November 17, in class. Open book and open notes.

Kolmogorov-Smirnov type Tests for Local Gaussianity in High-Frequency Data

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

Parameter, Statistic and Random Samples

S Y Y = ΣY 2 n. Using the above expressions, the correlation coefficient is. r = SXX S Y Y

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Data Analysis and Statistical Methods Statistics 651

Statistical inference: example 1. Inferential Statistics

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

Investigating a new estimator of the serial correlation coefficient

Chapter 6 Principles of Data Reduction

A NEW METHOD FOR CONSTRUCTING APPROXIMATE CONFIDENCE INTERVALS FOR M-ESTU1ATES. Dennis D. Boos

Topic 9: Sampling Distributions of Estimators

Confidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M.

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

Testing Statistical Hypotheses for Compare. Means with Vague Data

The standard deviation of the mean

Problem Set 4 Due Oct, 12

Introducing a Novel Bivariate Generalized Skew-Symmetric Normal Distribution

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

Department of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution

Unit 5: Assumptions and Robustness of t-based Inference. Chapter 3 in the Text

Direction: This test is worth 150 points. You are required to complete this test within 55 minutes.

Maximum likelihood estimation from record-breaking data for the generalized Pareto distribution

Bayesian Methods: Introduction to Multi-parameter Models

Topic 10: Introduction to Estimation

Lecture 33: Bootstrap

GG313 GEOLOGICAL DATA ANALYSIS

Simulation. Two Rule For Inverting A Distribution Function

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

Topic 9: Sampling Distributions of Estimators

POWER COMPARISON OF EMPIRICAL LIKELIHOOD RATIO TESTS: SMALL SAMPLE PROPERTIES THROUGH MONTE CARLO STUDIES*

Chapter 13: Tests of Hypothesis Section 13.1 Introduction

Probability and statistics: basic terms

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

Section 14. Simple linear regression.

Lecture 7: Properties of Random Samples

Estimation for Complete Data

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State

Statistical Analysis on Uncertainty for Autocorrelated Measurements and its Applications to Key Comparisons

An Alternative Goodness-of-fit Test for Normality with Unknown Parameters

Modied moment estimation for the two-parameter Birnbaum Saunders distribution

A statistical method to determine sample size to estimate characteristic value of soil parameters

Module 1 Fundamentals in statistics

Monte Carlo Integration

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

A CONSISTENT TEST FOR EXPONENTIALITY BASED ON THE EMPIRICAL MOMENT PROCESS

4.5 Multiple Imputation

Descriptive Statistics

LECTURE 8: ASYMPTOTICS I

Transcription:

A goodess-of-fit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State, South Africa Abstract The ormal distributio has the uique property that the cumulat geeratig fuctio has oly two terms, amely those ivolvig the mea ad the variace. Various tests based o the empirical characteristic fuctio were proposed. I this work a simple ormality test based o the studetized observatios ad the modulus of the empirical characteristic fuctio is show to outperform six of the most recogized test for ormality i large samples. Keywords ormality test, empirical characteristic fuctio, cumulats, goodess-of-fit 1. Itroductio A goodess-of-fit test statistic which is a fuctio of the empirical characteristic fuctio (ecf) ad with a asymptotic stadard ormal distributio was derived by Murota ad Takeuchi (1981). They derived a locatio ad scale ivariat test usig 1

studetized observatios ad showed that the use of a sigle value whe calculatig the ecf is sufficiet to get good results with respect to power whe testig hypotheses. Murota ad Takeuchi (1981) compared their test agaist a test based o the sample kurtosis, ad coducted a small simulatio study, but did ot compare the performace of their test agaist other tests for ormality. I this work a test is proposed ad the asymptotic distributioal results derived from the results by Murota ad Takeuchi (1981). A simulatio study is coducted to compare the power of this test ad six of the most recogized goodess-of-fit tests for ormality. The test performs reasoably i small sample, but excellet i large samples with respect to power, ad the test statistic is a simple ormal test which will perform better as the sample icreases. Various overview simulatio studies were coducted to ivestigate the performace of tests for ormality. Oe of the most cited papers is the work by Yap ad Sim (011), but they did ot iclude a goodess-of-fit test based o the empirical characteristic fuctio. Their work is used as a guidelie to decide which tests to iclude whe lookig at the performace of the test proposed i this work. The tests icluded are the Jarque Bera, Shapiro-Wilk, Lilliefors, Aderso-Darlig ad D Agostio ad Pearso tests. A paper which icluded a very large selectio of tests for ormality is the work by Romao et al. (013), but the test of Murota ad Takeuchi (1981) was ot icluded i this study. Murota ad Takeuchi (1981) proved that the square of the modulus of the empirical characteristic fuctio coverges weakly to a complex Gaussia process where the observatios are stadardized usig affie ivariat estimators of locatio ad scale ad they derived a expressio for the asymptotic variace. Let x,..., 1 x be a i.i.d. sample

itx of size, from a distributio F. The characteristic fuctio is E( e ) = φ( t) ad it is estimated by the ecf ˆ 1 itx ( ) j φf t = e, (1) j= 1 The studetized sample is z,..., 1 z, where z ( ˆ ) / ˆ j = x j µ σ, j = 1,...,, with ˆ µ = x ad σ = S. Deote the ecf, based o the stadardized data, by ˆ itz ( ) (1/ ) j φ t = e. ˆ The statistic proposed to test ormality is S j= 1 ν (1) = log( ˆ φ (1)/exp( 1/ ) ), () S where ( ν (1)) N(0,0.0431) asymptotically. I the simulatio study it was foud that the asymptotic ormality approximatio yields good results with respect to power for samples larger tha =50. A motivatio for the ratio form of the statistic ca be give i terms of cumulats. Thus reject ormality if ν (1) / ( 0.0431/ = 4.8168 ν (1) > z 1 α /. (3) The test ca also be writte as, reject ormality if (log( ˆ φ (1) ) ( 1/ )) / 0.076 = (log( ˆ φ (1) ) + 1/ ) / 0.076 S > z1 α /. S 3

A discussio of some of the tests where the characteristic is used ca be foud i the book by Ushakov (1999). A test based o the ecf which attracted attetio ad yielded good results is the test derived by Epps ad Pulley (1983). This test is based o the expected value of the squared modulus of the differece betwee the ecf ad the theoretical characteristic fuctio uder ormality, with respect to a weight fuctio which has a similar form as a ormal desity. Heze (1990) derived a large sample approximatio for this test, but eve the approximatio is much more complicated tha the test of Murota ad Takeuchi (1981). Methodology.1 Motivatio for the proposed test statistic A motivatio will be give i terms of the cumulat geeratig fuctio. The ormal distributio has the uique property that the cumulat geeratig fuctio caot be a fiite-order polyomial of degree larger tha two, ad the ormal distributio is the oly distributio for which all cumulats of order larger tha 3 are zero (Cramér 1946, Lukacs 197). The motivatio for the test will be show by usig the momet geeratig fuctio, but experimetatio showed that the use of the characteristic fuctio rather tha the 4

momet geeratig fuctio gives much better results whe used to test for ormality. Cosider a radom variable X with distributio F, mea µ ad variace σ. The cumulat geeratig fuctio Κ F ( t ) of F ca be writte as r Κ F ( t) = κ rt / r!, where r= 1 κ r is the r-th cumulat. The first two cumulats are κ µ κ σ 1 = E( X ) =, = Var( X ) =. Sice Κ ( t ) is the logarithm of the momet geeratig fuctio, the momet geeratig F fuctio ca be writte as F M ( t) E( e ) e Κ t F tx ( ) = =. It follows that Κ ( t ) = tx F log( E ( e )) = r= 1 κ t r r / r! =( 1 ) r κ t + κ t / + ( κrt / r!) = ( ) r= 3 r µ t + σ t / + ( κ t / r!). r= 3 r Let F N deote a ormal distributio with a mea µ ad variace σ. M N ( ) t deotes the momet geeratig fuctio of the ormal distributio with cumulat geeratig fuctio 1 Κ ( t N ) = µ t + σ t. The logarithm of the ratio of the momet geeratig fuctios of F ad F N is give by log( M ( t)/ M ( t)) = log(exp( Κ ( t) Κ ( t))) F N F N = [( µ t + 1 σ t ) + κ t r / r!] ( µ t + 1 σ t ) r r = 3 = Κ ( t) + κ t r / r! Κ ( t)) N r N r= 3 5

r = κrt / r!. (4) r= 3 If F is a ormal distributio, the sum give i (1) is zero. Replacig Κ ( t) Κ ( t) by Κ ( it) Κ ( it) it follows that F N F N log( log( φ ( t)) log( φ ( t)) ) = log( Κ ( it) Κ ( it) ) F N F N 4+ r = κ4+ rt /(4 + r)!, r= 0 which would be equal to zero whe the distributio F is a ormal distributio ad this expressio ca be used to test for ormality. Murota ad Takeuchi (1981) used the fact that the square root of the log of the modulus of the characteristic fuctio of a ormal distributio is liear i terms of t, i other words ( log( φ ( t N ) )) 1/ is a liear fuctio of t.. Asymptotic variace Let ˆ φ ( t ) deote the ecf calculated i the poit t usig studetized ormally NS distributed observatios. A asymptotic variace of ν ( t) = log( ˆ φ ( t) / exp( t / ) ) ca be foud by usig the delta method ad the results of Murota ad Takeuchi (1981). They showed that the process defied by NS Z ɶ t ˆ t t, (5) ( ) = ( φns ( ) exp( )) 6

coverges weakly to a zero mea Gaussia process ad variace 4 E( Zɶ ( t)) = 4exp( t )(cosh( t ) 1 t / ). (6) Note that ˆ φ t / NS ( t ) e = ad by applyig the delta method it follows that Var( ˆ φ ( t) ) Var( ˆ φ ( t) )( ˆ φ ( t) ), NS NS NS thus Var( ˆ φ ( t) ) Var( ˆ φ ( t) ) /( ˆ φ ( t) ). NS NS NS By applyig the delta method agai it follows that Var ν t Var ˆ φ t e 1/ ( ( )) = (log( NS ( ) / )) (1/ ˆ φ ( t) ) Var( ˆ φ ( t) ) NS NS = Var( ˆ φ ( t) ) / 4( ˆ φ ( t) ) NS 4 NS 4 = (cosh( t ) 1 t / ) /. (7) The statistic ν ( t ) coverges weakly to a Gaussia distributio with mea zero ad 4 variace Var( ν ( t)) = (cosh( t ) 1 t / ) /, ad Var( ν (1)) = 0.0431/, t = 1. (8) 7

The test used is: ν (1) /( 0.0431/ > z 1 α /. I the followig figure the average of the log the modulus calculated i the poit, t = 1, usig stadard ormally distributed samples, for various samples sizes is show. The solid lie is where studetized observatios were used ad the dashed lie where the ecf is calculated usig the origial sample. It ca be see that there is a large bias i small samples ad the studetized ecf has less variatio. -0.488-0.49 average log of the modulus -0.49-0.494-0.496-0.498-0.5-0.50 0 100 00 300 400 500 600 700 800 900 1000 Fig. 1 Plot of the average log of the modulus of the ecf for various sample sizes calculated usig m = 5000 calculated usig samples form a stadard ormal distributio. The solid lie is where studetized observatios are used ad the dashed lie usig the origial sample. Calculated i the poit t=1, ad the expected value is -0.5. 8

I the followig histogram 5000 simulated values of ν (1) /( 0.0431/ = 4.8168 ν (1) are show, where the ν (1)' s are calculated usig simulated samples of size = 1000 from a stadard ormal distributio. 100 1000 800 frequecy 600 400 00 0-4 -3 - -1 0 1 3 4 5 ν Fig. Histogram of 5000 m = simulated values of v (1) / ( 0.0431/ ), with = 1000. Calculated usig ormally distributed samples, data stadardized usig estimated parameters. I Figure 3 the variace of ν (1) is estimated for various sample sizes, based o 1000 estimated values of ν (1) for each sample size cosidered. The estimated variaces is plotted agaist the asymptotic variace Var( v ) = 0.0431/. 9

1.8 x 10-3 1.6 asymptotic ad estimated variace 1.4 1. 1 0.8 0.6 0.4 0. 0 0 100 00 300 400 500 600 700 800 900 1000 - sample size Fig. 3 Estimated variace of v ad asymptotic variace for various values of. Dashed lie the estimated variace. Estimated variace calculated usig 1000 ormally distributed samples, data stadardized usig estimated parameters. 3. Simulatio study The paper of Yap ad Sim (011) is used as a guidelie to compare the proposed test agaist other tests for ormality. The proposed test will be deoted by ECFT i the tables. The power of the test will be compared agaist several tests for ormality: The Lilliefors test (LL), Lilliefors (1967) which is a slight modificatio of the Kolmogorov-Smirov test for where parameters are estimated. The Jarque-Bera test (JB), Jarque ad Bera (1987), where the skewess ad kurtosis is combied to form a test statistics. 10

The Shapiro-Wilks test (SW), Shapiro ad Wilk (1965). This test makes use of properties of order statistics ad were later developed to be used for large samples too by Roysto (199). The Aderso-Darlig test (AD), Aderso ad Darlig. The D Agostio ad Pearso test (DP), D Agostio ad Pearso (1973). This statistic combies the skewess ad kurtosis to check for deviatios from ormality. Samples are geerated from a few symmetric uimodal distributios with sizes = 30,50,100, 50,500, 750,1000. The proportio rejectios are reported based o m=5000 repetitios. The test are coducted at the 5% level ad for the ecf, the ormal approximatio is used. The followig symmetric distributios are cosidered, uiform o the iterval [0,1], the logistic distributio with mea zero the stadard t-distributio ad the Laplace distributio with mea zero ad scale parameter oe. The stadard t -distributio with 4, 10 ad 15 degrees of freedom. Skewed distributios ad multimodal distributios would ot be ivestigated, sice i large samples such samples ca be already excluded with certaity as beig ot from a ormal distributio by lookig at the histograms. All the tests performed for the ecf test were coducted usig the ormal approximatio, but percetiles ca also easily be simulated. Simulated estimates of the Type I error for = 30,50,100, 50,500, 750,1000, are give i Table 1 based o m = 5000 simulated samples. The simulated samples are stadard ormally distributed ad studetized to calculate the Type I error. 11

Type I error Normality assumptio LL JB SW AD DP 30 0.0438 50 0.0466 100 0.044 50 0.044 500 0.0488 750 0.0498 1000 0.0496 0.0534 0.051 0.0580 0.0506 0.055 0.0480 0.0514 0.053 0.0494 0.0570 0.0504 0.0460 0.0444 0.0468 0.0500 0.054 0.0506 0.045 0.0516 0.055 0.0498 0.0490 0.0430 0.0480 0.0540 0.0460 0.0480 0.0414 0.0488 0.056 0.0500 0.0510 0.0404 0.0530 0.0544 Table 1 Simulated percetiles to test for ormality at the 5% level. Calculated from m = 5000 simulated samples of size each i the poit t=1. I Table the rejectio rates, whe testig at the 5% level ad symmetric distributios, are show for various sample sizes based o 10000 samples each time. I table 1 the t-distributio where ot all momets exist is cosidered. The JB, DP ad ECFT tests performs best, ad i large samples the ECFT test performs the best. 1

ECFT LL JB SW AD DP 50 0.5798 0.3066 0.5510 0.4778 0.430 0.514 100 0.8144 0.4896 0.7810 0.715 0.6514 0.7340 t(4) 50 0.9836 0.8354 0.9754 0.9606 0.9406 0.9648 500 0.9998 0.9866 0.9998 0.999 0.9974 0.9990 750 1.0000 0.9990 1.0000 1.0000 1.0000 1.0000 1000 1.0000 0.9998 1.0000 1.0000 1.0000 1.0000 50 0.1896 0.0886 0.1900 0.1446 0.1184 0.1768 100 0.3194 0.1100 0.3098 0.5 0.1614 0.7 t(10) 50 0.5604 0.1766 0.540 0.405 0.80 0.4644 500 0.8078 0.85 0.757 0.6376 0.4838 0.703 750 0.9110 0.398 0.869 0.7866 0.6396 0.8336 1000 0.963 0.5136 0.9400 0.884 0.7688 0.9170 50 0.1376 0.061 0.1418 0.1050 0.076 0.1358 100 0.114 0.0794 0.08 0.1408 0.1078 0.1778 t(15) 50 0.3344 0.0934 0.3110 0.160 0.1384 0.694 500 0.518 0.140 0.4834 0.36 0.318 0.41 750 0.6558 0.1694 0.6038 0.4704 0.31 0.5460 1000 0.771 0.180 0.7184 0.5868 0.4004 0.667 Table Simulated power of ormality tests. Rejectio proportios whe testig for ormality at the 5% level. 13

1 0.9 0.8 power - ECFT, JB, DP - t(10) 0.7 0.6 0.5 0.4 0.3 0. 0.1 0 100 00 300 400 500 600 700 800 900 1000 Fig. 4 Plot of the three best performig tests with respect to power, testig for ormality, data t- distributed with 10 degrees of freedom. Solid lie, ecf test, dashed lie JB test ad dash-dot the DP test. 14

ECFT LL JB SW AD DP 50 0.1360 0.66 0.0096 0.577 0.5778 0.795 U(0,1) 100 0.9510 0.5910 0.7410 0.9870 0.95 0.9976 50 1.0000 0.9866 1.0000 1.0000 1.0000 1.0000 500 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 0.6196 0.4398 0.5660 0.5164 0.548 0.5138 Laplace 100 0.8686 0.7008 0.806 0.7888 0.838 0.7448 50 0.9958 0.976 0.9848 0.9884 0.994 0.978 500 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 50 0.758 0.119 0.656 0.1976 0.1716 0.416 100 0.440 0.154 0.393 0.960 0.416 0.348 Logistic 50 0.7150 0.748 0.643 0.5416 0.4436 0.5770 500 0.9338 0.5068 0.8930 0.80 0.7480 0.8506 750 0.984 0.6958 0.9670 0.9370 0.908 0.9504 1000 0.9978 0.880 0.9904 0.980 0.966 0.987 Table 3 Simulated power of ormality tests. Rejectio proportios whe testig for ormality at the 5% level. It ca be see the ECFT outperforms the other tests with respect to power i large samples, especially whe testig data from a logistic distributio. Samples were simulated from a mixture of two ormal distributios, with a proportio α from a stadard ormal ad a proportio 1 α from a ormal with variace σ. This ca also be cosidered as a cotamiated distributio. The results are show i Table 4. The proposed test yielded good results. 15

( σ, α ) ECFT LL JB SW AD DP 30 0.518 0.1056 0.580 0.195 0.1596 0.478 50 0.383 0.1378 0.3676 0.79 0.134 0.3398 100 0.5986 0.016 0.5640 0.4484 0.3344 0.5016 (,0.) 50 0.8834 0.3630 0.8506 0.7646 0.600 0.8016 500 0.987 0.6378 0.9784 0.9546 0.8790 0.9684 750 0.9980 0.838 0.9978 0.9934 0.9746 0.9958 1000 1.0000 0.94 1.0000 0.9998 0.9948 1.0000 30 0.0884 0.0676 0.0940 0.0816 0.075 0.0918 50 0.110 0.0710 0.1074 0.0868 0.0814 0.0984 100 0.1634 0.0894 0.1406 0.0994 0.1100 0.1166 (0.5,0.) 50 0.784 0.1446 0.084 0.149 0.050 0.165 500 0.4406 0.448 0.30 0.380 0.353 0.55 750 0.5766 0.3600 0.48 0.36 0.5004 0.3566 1000 0.7040 0.4646 0.5398 0.4678 0.6346 0.460 30 0.034 0.1066 0.197 0.156 0.148 0.1896 50 0.98 0.136 0.634 0.1970 0.1946 0.358 100 0.4954 0.136 0.4178 0.33 0.39 0.3538 (.0,0.5) 50 0.844 0.457 0.7160 0.6416 0.6696 0.600 500 0.9788 0.7760 0.940 0.96 0.937 0.906 750 0.9988 0.930 0.9888 0.9846 0.9876 0.9796 1000 1.0000 0.9838 0.999 0.9986 0.9990 0.997 Table 4 Simulated power of ormality tests. Rejectio proportios whe testig for ormality at the 5% level. Mixture of two ormal distributios (cotamiated data) 16

1 0.9 power - ECFT,JB,AD - mixture ormal 0.8 0.7 0.6 0.5 0.4 0.3 0. 0.1 0 100 00 300 400 500 600 700 800 900 1000 Fig 5 Plot of the three best performig tests with respect to power, testig for ormality, data mixture of ormal distributios with two compoets. Mixture.5N(0,1)+0.5N(0,0,16). Solid lie, ecf test, dashed lie JB test ad dash-dot the AD test. 4 Coclusios The proposed test performs better with respect to power i large samples tha the other tests for ormality for the distributios cosidered i the simulatio study. I small samples of say less tha = 50, it was foud that the test of D Agostio ad Pearso (1973) was ofte either the best performig or close to the best performig test. I practice oe would ot test data from a skewed distributio for ormality i large samples. The simple ormal test approximatio will perform better, the larger the 17

sample is. The asymptotic ormality ad variace properties, which is of a very simple form, ca be used i large samples. This test ca be recommeded as probably the test of choice i terms of power ad easy of applicatio i large samples. Refereces Aderso, T.W., Darlig, D.W. (1954). A test of goodess of fit. J. Amer. Statist. Assoc., 49 (68), 765 769. Cramér, H. (1946). Mathematical Methods of Statistics. NJ: Priceto Uiversity Press, Priceto, NJ. D Agostio, R., Pearso, E. S. (1973). Testig for departures from ormality. Biometrika, 60, 613-6. Epps, T.W. ad Pulley, L.B. (1983). A test for ormality based o the empirical characteristic fuctio. Biometrika, 70, 73-76. Heze, N. (1990), A Approximatio to the Limit Distributio of the Epps-Pulley Test Statistic for Normality. Metrika, 37, 7 18. Jarque, C.M., Bera, A.K. (1987). A test for ormality of observatios ad regressio residuals. It. Stat. Rev., 55 (), 163 17. Lilliefors, H.W. (1967). O the Kolmogorov-Smirov test for ormality with mea ad variace ukow. J. America. Statist. Assoc., 6, 399 40. Lukacs, E. (197). A Survey of the Theory of Characteristic Fuctios. Advaces i Applied Probability, 4 (1), 1 38. Murota, K ad Takeuchi, K. (1981). The studetized empirical characteristic fuctio ad its applicatio to test for the shape of distributio. Biometrika; 68, 55-65. 18

Romão, X, Delgado, R, Costa, A. (010). A empirical power compariso of uivariate goodess-of-fit tests for ormality. J. of Statistical Computatio ad Simulatio.; 80:5, 545 591. Roysto, J.P. (199). Approximatig the Shapiro-Wilk W-test for o-ormality. Stat. Comput.,, 117 119. Shapiro, S.S., Wilk, M.B. (1965). A aalysis of variace test for ormality (complete samples). Biometrika 5, 591-611. Ushakov, N.G. (1999). Selectic Topics i Characteristic Fuctios. VSP, Utrecht. Yap, B.W., Sim, C.H. (011). Comparisos of various types of ormality tests. J. of Statistical Computatio ad Simulatio, 81 (1), 141-155. 19