Nonparametric Goodness-of-Fit Tests for Discrete, Grouped or Censored Data


Boris Yu. Lemeshko, Ekaterina V. Chimitova and Stepan S. Kolesnikov
Novosibirsk State Technical University, Department of Applied Mathematics, Karl Marx 20, 630092 Novosibirsk, Russia (e-mail: headrd@fpm.ami.nstu.ru)

This research was supported by the Russian Foundation for Basic Research, project no. 06-01-00059.

Abstract. This paper considers the application of the nonparametric Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling goodness-of-fit tests to discrete, grouped and censored data. The use of these tests for grouped and censored data, as well as for samples of discrete random variables, is based on the Smirnov transformation. The convergence of the statistic distributions to the corresponding limiting laws under a true null hypothesis, and the test power against close competing hypotheses, have been investigated by statistical simulation. For discrete and grouped data the criteria have been compared in power with the Pearson chi-square test. For censored samples the criteria have also been compared in power with the modified nonparametric tests.

Keywords: Goodness-of-fit tests; discrete, grouped, censored data; Smirnov transformation; Kolmogorov test, Cramer-von Mises-Smirnov test, Anderson-Darling test.

1 Introduction

In the case of discrete or grouped data, testing simple hypotheses about goodness-of-fit of an empirical distribution to a theoretical law raises no evident problems only if $\chi^2$ goodness-of-fit tests are used. Direct application of the Kolmogorov, $\omega^2$ Cramer-von Mises-Smirnov or $\Omega^2$ Anderson-Darling tests is impossible, because the limiting statistic distributions for these criteria are obtained on the assumption that the random variable is continuous.

For testing simple goodness-of-fit hypotheses from right and/or left censored samples one can use the Renyi test [Renyi, 1953] or the modified Kolmogorov-Smirnov [Barr and Davidson, 1973], $\omega^2$ Cramer-von Mises-Smirnov or $\Omega^2$ Anderson-Darling [Pettitt and Stephens, 1976] tests. However, in the case of censored data these criteria have a number of disadvantages that hamper their application in practice. In particular, the distribution of the Renyi statistic converges to the limiting law very slowly, especially for a high or, on the contrary, a low censoring degree [Lemeshko and Chimitova, 2004]. The distributions of the modified Kolmogorov-Smirnov, Cramer-von Mises-Smirnov and Anderson-Darling statistics converge rather quickly to the corresponding limiting laws only for a small censoring degree [Lemeshko and Chimitova, 2004]. The criteria for censored data are implemented in almost none of the statistical software systems known to us, and hence they are hardly available to a large number of specialists.

M. Nikulin drew our attention to the possibility of effectively applying nonparametric goodness-of-fit tests to grouped and censored data and to samples of discrete random variables by means of the Smirnov transformation and randomization, which make it possible to move from a staircase, discontinuous distribution function to a continuous one [Greenwood and Nikulin, 1996]. The advantages of such an approach are evident, as the problem is reduced to testing goodness-of-fit of the empirical distribution obtained after the transformation to the continuous (uniform) distribution law.

The Smirnov transformation is used rather often in statistical analysis. Let us test whether the random sample $X_1, X_2, \ldots, X_n$ corresponds to the law with distribution function $F(x)$. The transformation $U_i = F(X_i)$ converts the observed sample $X_1, X_2, \ldots, X_n$ into a sample of values uniformly distributed on the interval $[0, 1]$. Then the hypothesis that $U_1, U_2, \ldots, U_n$ belong to the uniform law can be tested, for example, using the Kolmogorov criterion with the statistic

$$D_n = \sup_{0 \le u \le 1} |F_n(u) - u|, \tag{1}$$

where $F_n(u)$ is the empirical distribution function.

Randomization, as a technique for converting grouped and censored data and observations of discrete variables into observations of a continuous variable, is practical only in computer analysis. The purpose of the paper is to investigate some practical aspects of applying classical goodness-of-fit tests to grouped and censored data and to observations of discrete variables when the Smirnov transformation with randomization is used. The paper studies the convergence of the statistic distributions to the corresponding limiting laws, as well as the power of the considered criteria for testing close competing hypotheses.
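The Smirnov transformation and the Kolmogorov distance (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exponential law stands in for the hypothesized $F(x)$, and the names `exponential_cdf`, `smirnov_transform` and `kolmogorov_distance` are hypothetical, not taken from the paper.

```python
import math
import random


def exponential_cdf(x, rate=1.0):
    """Illustrative choice of F(x): CDF of the exponential law (an assumption)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def smirnov_transform(sample, cdf):
    """Map every observation X_i to U_i = F(X_i)."""
    return [cdf(x) for x in sample]


def kolmogorov_distance(u):
    """D_n = sup |F_n(u) - u| for a sample u on [0, 1], as in statistic (1)."""
    n = len(u)
    ordered = sorted(u)
    # The supremum is attained at a jump of F_n, so it suffices to check
    # both sides of every order statistic.
    d_plus = max((i + 1) / n - v for i, v in enumerate(ordered))
    d_minus = max(v - i / n for i, v in enumerate(ordered))
    return max(d_plus, d_minus)


rng = random.Random(0)                                # fixed seed, illustration only
sample = [rng.expovariate(1.0) for _ in range(200)]   # data drawn under H0
u = smirnov_transform(sample, exponential_cdf)
d_n = kolmogorov_distance(u)
```

Under a true hypothesis the values in `u` behave like a uniform sample, so `d_n` is typically small; the same two functions apply unchanged to any continuous `cdf`.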
2 Grouped and discrete data

Let us test a simple hypothesis about goodness-of-fit of a grouped sample to the theoretical distribution law $F(x)$. A grouped sample of size $n$ is given by the boundary points of the intervals,

$$x_0 < x_1 < \ldots < x_{k-1} < x_k,$$

where $k$ is the number of intervals, $x_0$ and $x_k$ are the left and right boundaries of the random variable domain respectively, and by the numbers of observations $n_i$ that fell into the $i$-th interval, $\sum_{i=1}^{k} n_i = n$. Assume that $Y_{ij}$ ($i = 1, \ldots, k$, $j = 1, \ldots, n_i$) are independent realizations of a random variable uniformly distributed on $[0, 1]$. Then the random variables obtained by randomization on the grouping intervals $(x_{i-1}, x_i]$,

$$U_{ij} = F(x_{i-1}) + Y_{ij}\,[F(x_i) - F(x_{i-1})], \quad i = 1, \ldots, k, \quad j = 1, \ldots, n_i, \tag{2}$$

are independent and uniformly distributed on $[0, 1]$.

Statement (2) makes it possible [Greenwood and Nikulin, 1996] to move from a grouped sample to a complete sample of individual observations uniformly distributed on $[0, 1]$. After that one can test the simple hypothesis about goodness-of-fit of the empirical distribution, built from the sample of values $U_{ij}$, $i = 1, \ldots, k$, $j = 1, \ldots, n_i$, to the uniform distribution using any nonparametric goodness-of-fit test.

A sample of observations $X_1, X_2, \ldots, X_n$ of a discrete random variable can be transformed, similarly to the grouped case, into a sample of uniformly distributed observations

$$U_i = F(X_i - 0) + Y_i\,[F(X_i) - F(X_i - 0)], \quad i = 1, \ldots, n, \tag{3}$$

where $F(x - 0) = \lim_{z \downarrow 0} F(x - z)$ and $Y_1, Y_2, \ldots, Y_n$ are independent realizations of a random variable uniformly distributed on $[0, 1]$. In randomization the values $Y_{ij}$ and $Y_i$ in statements (2) and (3) have to be simulated in accordance with the uniform distribution on $[0, 1]$.

In [Lemeshko and Postovalov, 2001] it was shown that, for continuous distribution laws and complete samples, the distributions of nonparametric goodness-of-fit test statistics converge to the corresponding limiting laws very quickly. The limiting laws can already be used with $n \ge \ldots$ without risk of a large error.

The distributions of nonparametric goodness-of-fit test statistics have been investigated for discrete random variables and for grouped samples of continuous values using the considered approach. It has been shown that the empirical distributions of the nonparametric test statistics also converge very fast to the corresponding limiting laws as the sample size grows. For example, Figure 1 shows the limiting Kolmogorov law $K(S)$ and the empirical distribution $G(K \mid H_0)$ of the Kolmogorov test statistic obtained by simulation.
The true hypothesis $H_0$ under test is about goodness-of-fit to the normal law. The empirical distribution is built from $N = \ldots$ grouped samples of size $n = \ldots$ with $k = \ldots$ grouping intervals, using the asymptotically optimal grouping method.
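The randomization in statements (2) and (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the standard normal law and a fair die stand in for the hypothesized distributions, the interval counts are made up, and the function names are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """CDF of the standard normal law (illustrative choice of F)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def randomize_grouped(boundaries, counts, cdf, rng):
    """Statement (2): U_ij = F(x_{i-1}) + Y_ij * [F(x_i) - F(x_{i-1})]."""
    u = []
    for i, n_i in enumerate(counts):
        lo, hi = cdf(boundaries[i]), cdf(boundaries[i + 1])
        # n_i independent uniform draws Y_ij spread over [F(x_{i-1}), F(x_i)]
        u.extend(lo + rng.random() * (hi - lo) for _ in range(n_i))
    return u


def randomize_discrete(sample, cdf, cdf_left, rng):
    """Statement (3): U_i = F(X_i - 0) + Y_i * [F(X_i) - F(X_i - 0)]."""
    return [cdf_left(x) + rng.random() * (cdf(x) - cdf_left(x)) for x in sample]


rng = random.Random(1)  # fixed seed, illustration only

# Grouped data: boundaries x_0 < x_1 < ... < x_k and hypothetical counts n_i.
boundaries = [-math.inf, -1.0, 0.0, 1.0, math.inf]
counts = [16, 34, 34, 16]
u_grouped = randomize_grouped(boundaries, counts, normal_cdf, rng)

# Discrete data: rolls of a fair die, F(x) = x/6 and F(x - 0) = (x - 1)/6.
die_rolls = [rng.randint(1, 6) for _ in range(60)]
u_discrete = randomize_discrete(
    die_rolls, lambda x: x / 6.0, lambda x: (x - 1) / 6.0, rng
)
```

Both functions return a complete sample on $[0, 1]$ that can be passed to any test of uniformity. Because the $Y$ values are simulated, repeated runs with different seeds give different transformed samples and therefore slightly different statistic values.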

As the Kolmogorov test statistic we have used the statistic with Bolshev's correction

$$K = \frac{6 n D_n + 1}{6 \sqrt{n}}, \tag{4}$$

where $D_n = \max\{D_n^+, D_n^-\}$, $D_n^+ = \max_{1 \le i \le n} \left\{ \frac{i}{n} - F(X_{(i)}) \right\}$, $D_n^- = \max_{1 \le i \le n} \left\{ F(X_{(i)}) - \frac{i-1}{n} \right\}$.

Fig. 1. The empirical distribution function of statistic (4) and the limiting Kolmogorov distribution law

The empirical distribution of the Kolmogorov statistic fits the Kolmogorov law $K(S)$ very closely even for $n = \ldots$. This fact is also confirmed by the values of the achieved significance level $P\{S > S^*\}$ obtained when testing the hypothesis about goodness-of-fit of the sample of values of statistic (4) to the Kolmogorov distribution $K(S)$ with the $\chi^2$ Pearson, $\omega^2$ Cramer-von Mises-Smirnov, $\Omega^2$ Anderson-Darling and Kolmogorov criteria. Here $S^*$ is the value of the corresponding goodness-of-fit test statistic.

Similar results on the convergence of statistic distributions to the limiting laws for grouped data and discrete random variables have been obtained for the Cramer-von Mises-Smirnov and Anderson-Darling criteria. It has also been shown that the rate of convergence of $G(S \mid H_0)$ to the limiting law of the statistic $S$ does not depend on the grouping method or on the number of grouping intervals $k$.

3 Censored data

Let $X_1, X_2, \ldots, X_n$ be a sample of independent identically distributed random variables. A set of values $X_{(1)} \le X_{(2)} \le \ldots \le X_{(r)}$ ($X_{(n-r+1)} \le X_{(n-r+2)} \le \ldots \le X_{(n)}$) is called a right (left) censored sample, where $r < n$ is the number of complete observations and the remaining $n - r$ observations are censored.

Modifications of the nonparametric Kolmogorov-Smirnov, Cramer-von Mises-Smirnov and Anderson-Darling tests for testing goodness-of-fit from censored samples are introduced in [Barr and Davidson, 1973] and [Pettitt and Stephens, 1976]. In particular, the Kolmogorov statistic for censored data is defined by

$$K_n^a = \sqrt{n} \sup_{x \in M} |F(x) - F_n(x)|,$$

where $M = \{x : F(x) \ge a\}$ for left censoring and $M = \{x : F(x) \le 1 - a\}$ for right censoring, and $a \in (0, 1)$ is the censoring degree. The limiting distribution of the Kolmogorov statistic $K_n^a$ for censored data is given as [Barr and Davidson, 1973]

$$P\{K_n^a < S\} = \sum_{i=-\infty}^{+\infty} (-1)^i \exp(-2 i^2 S^2)\, P\left\{ \left| X - 2 i S \sqrt{\frac{a}{1-a}} \right| < \frac{S}{\sqrt{a(1-a)}} \right\} = K_a(S),$$

where $X$ is the standard normal random variable. When $a = 0$ the limiting distribution of the statistic $K_n^a$ coincides with the Kolmogorov distribution $K(S)$.

As before, it is possible to move from a censored sample to a sample of random variables $U_1, U_2, \ldots, U_n$ uniformly distributed on $[0, 1]$. In the case of right censoring we have $U_1 = F(X_{(1)})$, $U_2 = F(X_{(2)})$, ..., $U_r = F(X_{(r)})$, and the values $U_{r+1}, \ldots, U_n$ are simulated uniformly on the interval $[F(x_c), 1]$, where $x_c$ is the censoring point. In first type censoring the point $x_c$ is fixed and the number of complete observations $r$ is random. In second type censoring the last (first) observed value in the sample is taken as $x_c$. The classical Kolmogorov, Cramer-von Mises-Smirnov and Anderson-Darling tests can then be applied to the transformed sample.

The empirical distributions of statistic (4) and of the modified Kolmogorov statistic computed from the censored sample are shown in Figure 2. The corresponding limiting distributions are given in the figure for comparison. The statistic values are calculated from right censored samples from the exponential distribution with sample size $n = \ldots$ and censoring degree 80% (the right part of the random variable domain, into which an observation falls with probability $a = 0.8$, is inaccessible for observation). The empirical distribution of the Kolmogorov statistic $K$, calculated from the transformed samples, agrees very closely with the limiting law $K(S)$ already for $n = \ldots$. At the same time the empirical distribution of the modified Kolmogorov statistic $K_n^a$, applied directly to censored samples of the same size, can differ essentially from the limiting law $K_a(S)$.

Fig. 2. The distributions of the Kolmogorov test statistic in testing goodness-of-fit to the exponential law in the case of $n = \ldots$ and censoring degree 80%

The distributions of the statistic $K$ (with Smirnov transformation and randomization) have been investigated for different types and degrees of censoring. It has been shown that the rate of convergence of the empirical distributions $G(K \mid H_0)$ to $K(S)$ does not depend on the type or degree of censoring. Similar results have been obtained for the $\omega^2$ Cramer-von Mises-Smirnov and $\Omega^2$ Anderson-Darling tests.

The empirical distributions $G(K_n^a \mid H_0)$ agree with the limiting law $K_a(S)$ rather well beginning with $n = 3$ only when the censoring degree is less than 50% ($a < 0.5$). If the censoring degree increases up to 95%, sufficient closeness of $G(K_n^a \mid H_0)$ to $K_a(S)$ takes place if $n \ge 5$ [Lemeshko and Chimitova, 2004].

4 Some remarks on the test power

There is no doubt that the conclusions obtained in [Lemeshko et al., 2007], concerning the comparative analysis of the test power for close competing hypotheses, also hold for grouped samples.

For censored data it is worth comparing the power of the classical criteria applied to the transformed data with the power of the tests modified for censored samples [Barr and Davidson, 1973], [Pettitt and Stephens, 1976]. For example, the power of the modified Kolmogorov test depends essentially on the censoring degree. By means of statistical modeling we have shown that the higher the censoring degree, the more the modified Kolmogorov test exceeds in power the Kolmogorov test with Smirnov transformation and randomization. For small censoring degrees (approximately up to 3%) these criteria are close in power.

Fig. 3. The distributions of the Kolmogorov statistic for the true hypotheses $H_0$ and $H_1$

The illustration (Fig. 3) shows two cases of distributions of the modified Kolmogorov test statistic applied to the censored sample and two cases of distributions of the test statistic calculated from the transformed sample $U_1, U_2, \ldots, U_n$. In the first case the hypothesis $H_0$, the Weibull distribution with form parameter 3, is true; in the second case the competing hypothesis $H_1$, the Weibull distribution with form parameter 3.5, is true. The sample size is $n = 3$, with second type right censoring and censoring degree $a = 0.5$.

5 Conclusion

The results of the investigation show that the considered approach (Smirnov transformation with randomization) can well be used for the correct application of classical nonparametric goodness-of-fit tests to grouped and censored data and to samples of discrete random variables. In the case of simple hypothesis testing, the distributions of the nonparametric statistics converge to the limiting distributions very quickly. For sample sizes $n \ge \ldots$ one can use the limiting laws without risk of a large error.

The influence of grouping methods on the power of nonparametric goodness-of-fit tests should be investigated in more detail.

The Smirnov transformation with randomization is straightforward to implement in statistical software systems. It extends the applicability of the classical nonparametric goodness-of-fit tests to grouped data and discrete random variables.

References

[Barr and Davidson, 1973] Barr D.M., Davidson T. A Kolmogorov-Smirnov test for censored samples // Technometrics. 1973. V. 15. N. 4.
[Greenwood and Nikulin, 1996] Greenwood P.E., Nikulin M.S. A Guide to Chi-Squared Testing. John Wiley & Sons, Inc. 1996. 280 p.
[Lemeshko and Chimitova, 2004] Lemeshko B.Yu., Chimitova E.V. Investigation of the estimates properties and goodness-of-fit test statistics from censored samples with computer modeling technique // Proceedings of the Seventh International Conference "Computer Data Analysis and Modeling: Robustness and Computer Intensive Methods", September 6-10, 2004, Minsk. Vol. 1. P. 143-146.
[Pettitt and Stephens, 1976] Pettitt A.N., Stephens M.A. Modified Cramer-von Mises statistics for censored data // Biometrika. 1976. V. 63. N. 2.
[Renyi, 1953] Renyi A. On the theory of order statistics // Acta Mathem. Acad. Sci. Hung. 1953. Vol. 4. P. 191-231.
[Lemeshko and Postovalov, 2001] Lemeshko B.Yu., Postovalov S.N. On the dependence of nonparametric test statistic distributions and the test power on parameter estimation method // Zavodskaya Laboratoriya. Diagnostika materialov. 2001. Vol. 67. No. 7. P. 62-71. (in Russian)
[Lemeshko et al., 2007] Lemeshko B.Yu., Lemeshko S.B., Postovalov S.N. Power of goodness-of-fit tests at close alternatives // Izmeritelnaya Technika. 2007. No. 2. P. 22-27. (in Russian)