Florida International University
FIU Digital Commons
FIU Electronic Theses and Dissertations, University Graduate School
-4-04
An Alternative Goodness-of-fit Test for Normality with Unknown Parameters
Weiling Shi
DOI: 0.548/etd.FI40735
Part of the Applied Statistics Commons, Statistical Models Commons, and the Statistical Theory Commons
Recommended Citation: Shi, Weiling, "An Alternative Goodness-of-fit Test for Normality with Unknown Parameters" (2014). FIU Electronic Theses and Dissertations. 63. ttp://digitalcommos.fiu.edu/etd/63
FLORIDA INTERNATIONAL UNIVERSITY
Miami, Florida
AN ALTERNATIVE GOODNESS-OF-FIT TEST FOR NORMALITY WITH UNKNOWN PARAMETERS
A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in STATISTICS by Weiling Shi, 2014
To: Interim Dean Michael R. Heithaus, College of Arts and Sciences
This thesis, written by Weiling Shi, and entitled An Alternative Goodness-of-Fit Test for Normality with Unknown Parameters, having been approved in respect to style and intellectual content, is referred to you for judgment. We have read this thesis and recommend that it be approved.
Gauri Ghai
Florence George
Zhenmin Chen, Major Professor
Date of Defense: November 4, 2014
The thesis of Weiling Shi is approved.
Interim Dean Michael R. Heithaus, College of Arts and Sciences
Dean Lakshmi N. Reddi, University Graduate School
Florida International University, 2014
Copyright 2014 by Weiling Shi. All rights reserved.
DEDICATION
I dedicate this thesis to my parents. The completion of this thesis would not be possible without their love, support and encouragement.
ACKNOWLEDGMENTS
I am deeply grateful to my major professor and mentor, Dr. Zhenmin Chen. With Dr. Chen's consistent support, guidance and help I finished my master's study at Florida International University. Dr. Chen's professionalism, responsibility and caring exert an important influence in my life. I benefited from Dr. Chen's wisdom and approach to research. All his virtues will be a valuable treasure for my whole life in the future. I would also like to thank the members of my thesis committee, Dr. Gauri Ghai and Dr. Florence George, for their time, advice and reviewing of my thesis. In the end, I would like to thank my parents for their understanding, encouragement and love throughout the years. They gave me the warmest family one has ever wished for.
ABSTRACT OF THE THESIS
AN ALTERNATIVE GOODNESS-OF-FIT TEST FOR NORMALITY WITH UNKNOWN PARAMETERS
by Weiling Shi
Florida International University, 2014
Miami, Florida
Zhenmin Chen, Major Professor
Goodness-of-fit tests have been studied by many researchers. Among them, an alternative statistical test for uniformity was proposed by Chen and Ye (2009). The test was used by Xiong (2010) to test normality for the case that both the location parameter and the scale parameter of the normal distribution are known. The purpose of the present thesis is to extend the result to the case that the parameters are unknown. A table of critical values of the test statistic is obtained using Monte Carlo simulation. The performance of the proposed test is compared with the Shapiro-Wilk test and the Kolmogorov-Smirnov test. Monte Carlo simulation results show that the proposed test performs better than the Kolmogorov-Smirnov test in many cases. The Shapiro-Wilk test is still the most powerful test, although in some cases the test proposed in the present research performs better.
TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION
   Introduction
   Basic Idea
II. METHODOLOGY
   Methodology of Three Tests
   Power Study
III. POWER COMPARISON
   Alternative Distributions
   Summary of Power Study
IV. CONCLUSION AND DISCUSSION
V. REFERENCES

LIST OF TABLES
TABLE
1. G Test Critical Value Table
2. Power Comparison: V-Shape triangle (h = 0.25)
3. Power Comparison: V-Shape triangle (h = 0.5)
4. Power Comparison: V-Shape triangle (h = 0.75)
5. Power Comparison: Beta (α = 4, β = 2)
6. Power Comparison: Beta (α = 0.5, β = 0.5)
7. Power Comparison: Beta (α = 2, β = 4)
8. Power Comparison: Beta (α = 1, β = 1)
9. Power Comparison: Triangle (h = 0.25)
10. Power Comparison: Triangle (h = 0.5)
11. Power Comparison: Triangle (h = 0.75)

LIST OF FIGURES
FIGURE
1. Power Comparison: V-Shape triangle (h = 0.25)
2. Power Comparison: V-Shape triangle (h = 0.5)
3. Power Comparison: V-Shape triangle (h = 0.75)
4. Power Comparison: Beta (α = 4, β = 2)
5. Power Comparison: Beta (α = 0.5, β = 0.5)
6. Power Comparison: Beta (α = 2, β = 4)
7. Power Comparison: Beta (α = 1, β = 1)
8. Power Comparison: Triangle (h = 0.25)
9. Power Comparison: Triangle (h = 0.5)
10. Power Comparison: Triangle (h = 0.75)
CHAPTER I
INTRODUCTION
1.1 Introduction
The goodness-of-fit test is a particularly useful statistical procedure for testing whether observed data are representative of a particular distribution. A goodness-of-fit test can summarize the discrepancy between observed values and the values expected under any given model. Numerous research papers have been published by scientists concerning these tests. There are many existing test statistics, including some commonly used goodness-of-fit tests such as the Chi-squared test (Pearson, 1900), the Kolmogorov-Smirnov test (Kolmogorov, 1933 and Smirnov, 1939), the Cramer-von Mises test (Cramer, 1928 and von Mises), and the Anderson-Darling test (Anderson and Darling, 1952). All these commonly used statistical tests can be used to test normality.
The Chi-squared test is the most important member of the nonparametric family of statistical tests because it has some attractive features, including the fact that it can be applied to any univariate distribution and is calculated much more easily than other test statistics. It is used for quantitative and binned data. For non-binned data, a histogram or frequency table should be constructed to put the data into categories before the Chi-squared test is used. However, the values of the Chi-squared test are affected by skewness and kurtosis. In addition, it is sensitive to the sample size. The Chi-squared test has reduced power, especially for small sample sizes under 50.
The Kolmogorov-Smirnov test (K-S test) is also a nonparametric test for the equality of continuous, one-dimensional probability distributions that can be used to
compare a sample with a reference probability distribution, or to compare two samples. The K-S test relies on the fact that the value of the sample cumulative distribution function is asymptotically normally distributed. The Kolmogorov-Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. However, the K-S test tends to be more sensitive near the center of the distribution than at the tails. Additionally, its most serious limitation is that the distribution must be fully specified. If location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid, and the critical values typically must be determined by simulation. Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro-Wilk test or the Anderson-Darling test.
The Anderson-Darling test is a modification of the K-S test which gives more weight to the tails of the distribution than the K-S test. The K-S test is distribution free in the sense that the critical values do not depend on the specific distribution being tested, while the Anderson-Darling test makes use of the specific distribution in calculating critical values. The Anderson-Darling test has the advantage of allowing a more sensitive test than the K-S test and the disadvantage that critical values must be calculated for each distribution.
The Shapiro-Wilk test, proposed by Samuel Sanford Shapiro and Martin Wilk in 1965, is used for testing normality and lognormal distributions. It compares the observed cumulative frequency distribution curve with the expected cumulative frequency curve. The Shapiro-Wilk test is based on the ratio of the best estimator of
the variance to the usual corrected sum of squares estimator of the variance. The Shapiro-Wilk test is not as affected by ties as the Anderson-Darling test, but is still biased by sample size.
Power studies of the most commonly used goodness-of-fit tests have been conducted by many researchers, including Shapiro, Wilk and Chen (1968), and Aly and Shayib (1992). Some recent research papers concluded that, when comparing the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, and Anderson-Darling tests, the Shapiro-Wilk test has the best power for a given significance level, followed closely by the Anderson-Darling test. Although it was mentioned by Steele and Chaseling (2009) that none of the existing test statistics can be regarded as the best test statistic, maximizing the power of test statistics for checking normality is still being explored by many statistics researchers.
The purpose of this thesis is to compare these tests. In the present research, the statistical test proposed in Chen and Ye (2009) and Xiong (2010) will be adopted and extended to test normality for the case that the location parameter and the scale parameter of the normal distribution are both unknown. A table of critical values of the test statistic is provided using Monte Carlo simulation. The performance of the newly updated statistic is compared with the Shapiro-Wilk test and the Kolmogorov-Smirnov test.
1.2 Basic Idea
Chen and Ye (2009) proposed a new test statistic (the G statistic) for testing uniformity. The test statistic can be used to test if the underlying population distribution is a uniform distribution. On the basis of the probability integral
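transformation (see F.N. David and N.L. Johnson, 1948), the underlying population distribution can be any distribution. Suppose $x_1, x_2, \ldots, x_n$ are the observations of a random sample from a population distribution with distribution function $F(x)$. Suppose also that $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the corresponding order statistics. The purpose is to test
$$H_0: F(x) = F_0(x) \quad \text{versus} \quad H_1: F(x) \neq F_0(x).$$
It can be seen that, under $H_0$, $F_0(x_{(1)}), F_0(x_{(2)}), \ldots, F_0(x_{(n)})$ are the ordered observations of a random sample from the $U(0, 1)$ distribution. The G test statistic can be used to conduct the following test procedure. The test statistic is defined as
$$G(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \frac{n+1}{n} \sum_{i=1}^{n+1} \left( F_0(x_{(i)}) - F_0(x_{(i-1)}) - \frac{1}{n+1} \right)^2, \qquad (1)$$
where $F_0(x_{(0)}) = 0$ and $F_0(x_{(n+1)}) = 1$. $H_0$ should be rejected at significance level $\alpha$ if $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) > G_\alpha$. Here $G_\alpha$ is the upper critical value of the G statistic. The value of $G_\alpha$ is calculated by Monte Carlo simulation. For simplicity, $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ can also be expressed as
$$G(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \frac{n+1}{n} \sum_{i=1}^{n+1} \left( F_0(x_{(i)}) - F_0(x_{(i-1)}) \right)^2 - \frac{1}{n}. \qquad (2)$$
Expression (2) will be used in the Monte Carlo simulation. The test can be used for testing any hypothesized distribution. The normal distribution is merely a special case.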
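As an illustration of Equations (1) and (2), the statistic can be computed directly from the transformed values $F_0(x_i)$. The following Python sketch is not the SAS/IML code used in this research. The function names `g_statistic` and `g_statistic_short` are hypothetical, and the sketch assumes the boundary convention $F_0(x_{(0)}) = 0$ and $F_0(x_{(n+1)}) = 1$.

```python
def g_statistic(u):
    """G statistic in the form of Equation (1).

    u holds the transformed values F0(x_1), ..., F0(x_n), each in [0, 1].
    """
    n = len(u)
    # Pad the ordered values with F0(x_(0)) = 0 and F0(x_(n+1)) = 1 (assumed convention).
    padded = [0.0] + sorted(u) + [1.0]
    target = 1.0 / (n + 1)  # expected spacing under uniformity
    total = sum((padded[i] - padded[i - 1] - target) ** 2
                for i in range(1, n + 2))
    return (n + 1) / n * total


def g_statistic_short(u):
    """Algebraically equivalent form, Equation (2)."""
    n = len(u)
    padded = [0.0] + sorted(u) + [1.0]
    total = sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, n + 2))
    return (n + 1) / n * total - 1.0 / n
```

Equally spaced values such as 0.2, 0.4, 0.6, 0.8 give a value of 0, while values that all coincide at an endpoint give a value of 1, so the two forms can be checked against each other at both extremes.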
The range of the function $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is from 0 to 1, and the mathematical expectation and variance of the test statistic have been given in Chen and Ye (2009). In Xiong's research, the parameters of the normal distribution, namely the expected value and the standard deviation, are assumed to be known. When the parameters of the distribution are unknown, the test is no longer valid. To solve this problem, Lilliefors' idea is adopted here to treat the case with unknown parameters. The population mean and population variance are estimated from the sample data before the G test statistic is calculated. The procedure in this research for simulating the critical values of the G test statistic is summarized as follows:
1. Generate a pseudo random sample $x_1, x_2, \ldots, x_n$ of size $n$ from the standard normal distribution;
2. Calculate the sample mean ($\bar{x}$) and variance ($s^2$);
3. Find the ordered values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ and define $F(x_{(0)}) = 0$ and $F(x_{(n+1)}) = 1$;
4. Calculate $F(x_{(1)}), F(x_{(2)}), \ldots, F(x_{(n)})$. Here $F(x)$ is the cumulative distribution function of the $N(\bar{x}, s^2)$ distribution;
5. Calculate the value of $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ using Equation (2);
6. Repeat steps 1 to 5 $k$ times ($k$ = 1,000,000 in this research);
7. Sort all the values of G in ascending order;
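8. Find the critical values with $\alpha$ = 0.1, 0.05, 0.01, 0.005, 0.001, that is, calculate the 90th, 95th, 99th, 99.5th and 99.9th percentiles.
The procedure shown above uses the standard normal distribution. This does not affect the simulation result. In fact, it can be shown that the value of the G statistic remains invariant when the parameters of the normal distribution change. Suppose X has a normal distribution with mean $\mu$ and standard deviation $\sigma$, and let $F_0$ be the normal distribution function with the estimated parameters $\bar{x}$ and $s$. Then
$$G(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) = \frac{n+1}{n} \sum_{i=1}^{n+1} \left( F_0(x_{(i)}) - F_0(x_{(i-1)}) \right)^2 - \frac{1}{n} = \frac{n+1}{n} \sum_{i=1}^{n+1} \left( \int_{-\infty}^{x_{(i)}} \frac{1}{\sqrt{2\pi}\, s} e^{-\frac{(t-\bar{x})^2}{2s^2}}\, dt - \int_{-\infty}^{x_{(i-1)}} \frac{1}{\sqrt{2\pi}\, s} e^{-\frac{(t-\bar{x})^2}{2s^2}}\, dt \right)^2 - \frac{1}{n}.$$
Let $z = (t - \bar{x})/s$ and $z_{(i)} = (x_{(i)} - \bar{x})/s$. The value of $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ becomes
$$\frac{n+1}{n} \sum_{i=1}^{n+1} \left( \int_{-\infty}^{z_{(i)}} \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\, dz - \int_{-\infty}^{z_{(i-1)}} \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\, dz \right)^2 - \frac{1}{n}.$$
Replacing each observation $x_i$ by $\mu + \sigma x_i$ leaves the standardized values $z_{(i)}$ unchanged, so this is the same as in the case that the standard normal distribution is picked.
The performance of the G test statistic is compared with the Shapiro-Wilk test and the K-S test for testing normality. Since the Shapiro-Wilk test can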
only be used for testing the normal distribution and the lognormal distribution, the normality test is used for the power comparison in this research.
Chapter 2 outlines the method for calculating these three test statistics. The power study results of these three tests are analyzed in Chapter 3. Chapter 4 concludes the performance comparison of the G test, the Shapiro-Wilk test and the Kolmogorov-Smirnov test. Monte Carlo simulation was used to conduct the power study for these three tests. The computer programming languages used in this research are SAS/IML and SAS/Base.
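The simulation procedure for the critical values described in Section 1.2 can be sketched as follows. This is a minimal Python illustration rather than the SAS/IML program used in this research. The function names are hypothetical, the sample variance is assumed to use the divisor $n - 1$, the percentile is taken as a simple order statistic of the simulated values, and the default number of repetitions is far smaller than the one used in the thesis.

```python
import random
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def g_statistic(u):
    """G statistic, Equation (2), for transformed values u_i = F(x_i)."""
    n = len(u)
    padded = [0.0] + sorted(u) + [1.0]
    total = sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, n + 2))
    return (n + 1) / n * total - 1.0 / n


def g_normal_unknown(sample):
    """G statistic for normality with mean and variance estimated from the sample."""
    n = len(sample)
    xbar = sum(sample) / n
    # Assumption: s^2 uses the divisor n - 1.
    s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return g_statistic([normal_cdf(x, xbar, s) for x in sample])


def simulate_critical_values(n, alphas, reps=20000, seed=1):
    """Upper critical values of G for sample size n, one per alpha."""
    rng = random.Random(seed)
    values = sorted(
        g_normal_unknown([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    result = {}
    for alpha in alphas:
        # Index of the empirical (1 - alpha) quantile of the sorted values.
        index = min(reps - 1, int((1.0 - alpha) * reps))
        result[alpha] = values[index]
    return result
```

Because the statistic depends on the data only through the standardized values, calling `g_normal_unknown` on a sample and on a shifted and rescaled copy of it gives the same value up to rounding, which is the invariance property used to justify simulating from the standard normal distribution.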
CHAPTER II
METHODOLOGY
The performance of a test statistic can be evaluated by a power study. To evaluate the performance of the test statistic $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ proposed in this thesis, various alternative distributions are used to find the power of the G test statistic. The test power is compared with the Shapiro-Wilk test and the Kolmogorov-Smirnov test in the present research. The null hypothesis assumes that the underlying distribution is a normal distribution, while the alternative hypothesis assumes a distribution that is not a normal distribution. The alternative distributions used here include the triangle distributions, V-shaped triangle distributions, and Beta distributions.
2.1 Methodology of Three Tests
2.1.1 G test
Suppose $x_1, x_2, \ldots, x_n$ are the observations of a random sample from a population distribution with a distribution function $F(x)$. Suppose also that $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the corresponding order statistics. To test whether or not the underlying distribution is a normal distribution, the null and alternative hypotheses are
$H_0$: The population distribution is a normal distribution,
$H_1$: The population distribution is not a normal distribution.
As discussed in Chapter 1, when the test statistic $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is used, $H_0$ should be rejected at significance level $\alpha$ if $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)}) > G_\alpha$, where $G_\alpha$ is the
critical value of the G test statistic. The value of $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ can be calculated using Equation (2) for convenience.
2.1.2 Shapiro-Wilk Test
The Shapiro-Wilk test utilizes the null hypothesis principle to determine whether a sample $x_1, x_2, \ldots, x_n$ comes from a normally distributed population. The W test statistic is the ratio of the best estimator of the variance (derived from the square of a linear combination of the order statistics) to the usual corrected sum of squares estimator of the variance (Shapiro and Wilk, 1965). When $n$ is greater than three, the coefficients used to compute the linear combination of the order statistics can be approximated by the method of Royston (1992). The statistic W is always greater than zero and less than or equal to one ($0 < W \le 1$).
Small values of W lead to the rejection of the null hypothesis of normality. The distribution of W is highly skewed. Seemingly large values of W (such as 0.90) may be considered small and will result in rejecting the null hypothesis. Research papers show that the Shapiro-Wilk test has better performance compared with the Anderson-Darling test and the Kolmogorov-Smirnov test.
The test statistic is
$$W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where $\bar{x} = (x_1 + \cdots + x_n)/n$
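is the sample mean; $x_{(i)}$ is the $i$-th order statistic; the constants $a_i$ are given by
$$(a_1, a_2, \ldots, a_n) = \frac{m^T V^{-1}}{\left( m^T V^{-1} V^{-1} m \right)^{1/2}}.$$
Here $m = (m_1, m_2, \ldots, m_n)^T$, where $m_1, m_2, \ldots, m_n$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and $V$ is the covariance matrix of those order statistics. Reject $H_0$ if W is too small.
To compute the value of the test statistic W for a given complete random sample $x_1, x_2, \ldots, x_n$, the procedure proposed in Shapiro and Wilk (1965) is as follows:
1) Order the observations to obtain an ordered sample $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
2) Compute $\sum_{i=1}^{n} (x_{(i)} - \bar{x})^2$, where $\bar{x}$ is the sample mean.
3) a) If $n$ is even, $n = 2m$, compute $b = \sum_{i=1}^{m} a_{n-i+1} \left( x_{(n-i+1)} - x_{(i)} \right)$, where the values of $a_{n-i+1}$ are given in Shapiro and Wilk (1965).
b) If $n$ is odd, $n = 2m + 1$, the computation is the same as the one in 3) a), since $a_{m+1} = 0$. Thus $b = a_n \left( x_{(n)} - x_{(1)} \right) + \cdots + a_{m+2} \left( x_{(m+2)} - x_{(m)} \right)$, where the value of $x_{(m+1)}$, the sample median, does not enter the computation of $b$.
4) Compute $W = b^2 / \sum_{i=1}^{n} (x_{(i)} - \bar{x})^2$.
5) Compare W with the critical values from the quantiles of the Shapiro-Wilk test for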
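normality table. If the calculated value of the test statistic W is smaller than $W_\alpha$, $H_0$ is rejected at the significance level $\alpha$.
2.1.3 Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test statistic is defined as
$$D = \sup_x \left| F_n(x) - F_0(x) \right| = \max(D^+, D^-), \quad D^+ = \sup_x \left( F_n(x) - F_0(x) \right), \quad D^- = \sup_x \left( F_0(x) - F_n(x) \right).$$
Here $F_0$ is the cumulative distribution function specified by the null hypothesis. Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ be the order statistics of $x_1, x_2, \ldots, x_n$, with $x_{(0)} = -\infty$ and $x_{(n+1)} = \infty$. Then the empirical distribution function is
$$F_n(x) = \frac{i}{n} \quad \text{for } x_{(i)} \le x < x_{(i+1)}, \quad i = 0, 1, \ldots, n.$$
Therefore
$$D^+ = \max_{0 \le i \le n} \sup_{x_{(i)} \le x < x_{(i+1)}} \left( \frac{i}{n} - F_0(x) \right) = \max\left\{ \max_{1 \le i \le n} \left( \frac{i}{n} - F_0(x_{(i)}) \right), 0 \right\}.$$
Similarly,
$$D^- = \max\left\{ \max_{1 \le i \le n} \left( F_0(x_{(i)}) - \frac{i-1}{n} \right), 0 \right\}.$$
If the calculated value of the test statistic is greater than $D_\alpha$, $H_0$ is rejected at the stated significance level. Here $D_\alpha$ is the critical value of the Kolmogorov-Smirnov test statistic.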
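The computing formulas for $D^+$ and $D^-$ translate directly into a short routine. The following Python sketch is only an illustration; the names `ks_statistic` and `uniform_cdf` are hypothetical, and the hypothesized distribution function is passed in as an ordinary function.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D = max(D+, D-) for a hypothesized cdf."""
    x = sorted(sample)
    n = len(x)
    # D+ = max over i of i/n - F0(x_(i)), floored at 0.
    d_plus = max(0.0, max(i / n - cdf(x[i - 1]) for i in range(1, n + 1)))
    # D- = max over i of F0(x_(i)) - (i - 1)/n, floored at 0.
    d_minus = max(0.0, max(cdf(x[i - 1]) - (i - 1) / n for i in range(1, n + 1)))
    return max(d_plus, d_minus)


def uniform_cdf(x):
    """Distribution function of U(0, 1), used here only as a simple example."""
    return min(1.0, max(0.0, x))
```

For the two observations 0.25 and 0.75 tested against the $U(0, 1)$ distribution, both $D^+$ and $D^-$ equal 0.25, so `ks_statistic([0.25, 0.75], uniform_cdf)` is 0.25.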
The popularity of the Kolmogorov-Smirnov test relates to the fact that the distribution of the test statistic does not depend on the underlying cumulative distribution function being tested. However, its disadvantages also limit its application, because it only applies to continuous distributions. The test statistic is more sensitive near the center of the distribution than at both tails. All the parameters of the distribution, such as location, scale and shape, must be fully specified. If not, the K-S test is no longer valid, and its critical values must be estimated by simulation.
2.1.4 Monte Carlo Simulation Method
Monte Carlo simulation is applied to generate pseudo random samples from a variety of alternative distributions to compare the power of the G test, the Shapiro-Wilk test and the K-S test. First, $k$ pseudo random samples of size $n$ are generated from a specified distribution. In this research, the V-shape triangle distribution, the triangle distribution and the Beta distribution are chosen. For each pseudo random sample, the observations $x_1, x_2, \ldots, x_n$ are sorted and become the order statistics $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Then the order statistics are put into the formulas of the G test, the Shapiro-Wilk test and the K-S test, and the values of the test statistics are computed. By comparing the calculated values with the critical values of the G test, the Shapiro-Wilk test and the K-S test, the rejection rates can be found. In the present research, the sample sizes $n$ = 5, 10, 20, 30, 40, 50 are selected to conduct the Monte Carlo simulation. The number of repetitions is selected to be $k$ = 1,000,000 to ensure the accuracy of the power. The procedure for calculating the power is summarized as follows:
1. Generate a random sample $x_1, x_2, \ldots, x_n$ from the specified alternative distribution listed above;
2. Find the ordered values $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ and define $F_0(x_{(0)}) = 0$ and $F_0(x_{(n+1)}) = 1$;
3. Calculate the corresponding $F_0(x_{(1)}), F_0(x_{(2)}), \ldots, F_0(x_{(n)})$, where $F_0$ is the cumulative distribution function of the normal distribution and the expected value and standard deviation of the normal distribution are calculated from the pseudo random sample;
4. Calculate the value of $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ using Equation (2);
5. Compare the value of $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ with the indicated critical value at significance level α = 0.05, and determine whether $H_0$ is rejected;
6. Use the method mentioned above to calculate the value of the Shapiro-Wilk test statistic W;
7. Compare the value of W with $W_\alpha$ at the same significance level as in step 5, and determine whether $H_0$ is rejected;
8. Use the method mentioned above to calculate the value of the K-S test statistic D;
9. Compare the value of D with $D_\alpha$ at the same significance level, and determine whether $H_0$ is rejected;
10. Repeat steps 1 to 9 1,000,000 times;
11. Calculate the rejection rates for the G test, the Shapiro-Wilk test and the K-S test.
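The G test part of the procedure above (steps 1 to 5, 10 and 11) can be sketched as follows. This Python illustration is not the SAS program used in this research: the function names are hypothetical, the critical value is simulated on the spot instead of being read from Table 1, the sample variance is assumed to use the divisor $n - 1$, and the Shapiro-Wilk and K-S steps are omitted.

```python
import random
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def g_normal_unknown(sample):
    """G statistic, Equation (2), with mean and variance estimated from the sample."""
    n = len(sample)
    xbar = sum(sample) / n
    s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # assumed divisor n - 1
    u = [0.0] + sorted(normal_cdf(x, xbar, s) for x in sample) + [1.0]
    total = sum((u[i] - u[i - 1]) ** 2 for i in range(1, n + 2))
    return (n + 1) / n * total - 1.0 / n


def critical_value(n, alpha, reps, rng):
    """Simulated upper critical value of G under the normal null hypothesis."""
    values = sorted(
        g_normal_unknown([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return values[min(reps - 1, int((1.0 - alpha) * reps))]


def rejection_rate(n, draw, crit, reps, rng):
    """Fraction of samples of size n, drawn by draw(rng), for which G exceeds crit."""
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if g_normal_unknown(sample) > crit:
            rejections += 1
    return rejections / reps
```

For example, with `rng = random.Random(2014)` and `crit = critical_value(20, 0.05, 5000, rng)`, the call `rejection_rate(20, lambda r: r.random(), crit, 5000, rng)` estimates the power against the uniform alternative, while passing `lambda r: r.gauss(5.0, 2.0)` instead should return a rate close to 0.05, which is a useful check that the critical value is calibrated.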
2.2 Power Study
The power of a statistical test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false. That is,
Power = P(reject null hypothesis | null hypothesis is false),
which can be denoted as $1 - \beta$, where $\beta$ is the probability of committing a type II error. The power of the test statistics in this research can be presented as
$P_{H_1}(G > G_\alpha) = P(G > G_\alpha \mid H_1)$ for the G test;
$P_{H_1}(W < W_\alpha) = P(W < W_\alpha \mid H_1)$ for the Shapiro-Wilk test;
$P_{H_1}(D > D_\alpha) = P(D > D_\alpha \mid H_1)$ for the Kolmogorov-Smirnov test.
A high power estimate means that the test performs well.
Statistical power may depend on a number of factors. Some of these factors may be particular to a specific testing situation, but at a minimum, power always depends on the following three factors: the sample size, the significance level, and the sensitivity of the data. The rejection rates of the three test statistics are used as estimates of their power in the thesis. A high rejection rate means that the power of the test statistic is high.
To evaluate the performance of the test statistic $G(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ proposed in this research, various alternative distributions, including V-shape triangle distributions, Beta distributions and triangle distributions, are used to study the power of this test statistic.
To find the powers of the G test, the Shapiro-Wilk test and the Kolmogorov-Smirnov test, Monte Carlo simulation was used to generate pseudo random samples from the various alternative distributions. To accomplish this, $k$ pseudo random samples of size $n$ are generated from a specified distribution. For each pseudo random sample, the observations $x_1, x_2, \ldots, x_n$ are sorted and the sorted observations become $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Then the values of the test statistics can be computed for all three tests. Finally, the rejection rates, or powers, for the G test, the Kolmogorov-Smirnov test and the Shapiro-Wilk test are calculated. In the present research, the sample sizes $n$ = 5 to 50 are selected to conduct the Monte Carlo simulation. In order to ensure the accuracy of the power study, the number of repetitions is selected to be $k$ = 1,000,000.
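The accuracy gained from a large number of repetitions can be quantified. A rejection rate based on $k$ independent repetitions is a binomial proportion, so its standard error is $\sqrt{p(1-p)/k}$, which is largest at $p = 0.5$. The short Python sketch below evaluates this bound; the function name is hypothetical, and the value $k$ = 1,000,000 is taken as the number of repetitions stated above.

```python
from math import sqrt


def power_standard_error(p, k):
    """Standard error of a rejection rate estimated from k independent repetitions."""
    return sqrt(p * (1.0 - p) / k)


# Worst case (p = 0.5) for the number of repetitions assumed from the text.
worst_case = power_standard_error(0.5, 1_000_000)
```

Under this assumption the worst-case standard error is 0.0005, so differences between estimated powers that appear only in the fourth decimal place should be read with caution.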
CHAPTER III
POWER COMPARISON
3.1 Alternative Distributions
3.1.1 V-shaped Triangle Alternative Distributions
The probability density function of the V-shaped triangle distribution is
$$f(x) = \begin{cases} \dfrac{2(h - x)}{h}, & 0 \le x \le h, \\ \dfrac{2(x - h)}{1 - h}, & h < x \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
Here $h$ is a constant between 0 and 1. The following V-shaped triangle distributions are used in this research:
Alternative Distribution 1
Consider $h$ = 0.25. This is a left-skewed V-shaped triangle distribution. The power comparison result under this V-shaped triangle distribution is shown in Table 2. It can be found that the G test performs better than the Shapiro-Wilk test when the sample size is 5. When the sample size increases, the power of all three tests also increases. The G test outperforms the Kolmogorov-Smirnov test in all cases. The Shapiro-Wilk test performs better than the other two tests when the sample size becomes large.
Alternative Distribution 2
Consider $h$ = 0.5. This is a symmetric V-shaped triangle distribution. The power comparison result under this V-shaped triangle distribution is shown in Table 3. The result is similar to the previous case. The G test performs better than the Shapiro-Wilk test when the sample size is 5 and is more powerful than the Kolmogorov-Smirnov test for all sample sizes. When the sample size increases, the power of the Shapiro-Wilk test increases faster than that of the other two test statistics.
Alternative Distribution 3
Consider $h$ = 0.75. This is a right-skewed V-shaped triangle distribution. The power comparison result under this V-shaped triangle distribution is shown in Table 4. It shows that the G test again performs better than the Shapiro-Wilk test when the sample size is 5. The Shapiro-Wilk test performs very well when the sample size increases. The G test still outperforms the Kolmogorov-Smirnov test in all cases.
Since there is no function call for the V-shape triangle distribution in SAS, the following proposition is needed.
Proposition: Suppose U is a random variable with the uniform distribution on the interval (0, 1). Then H(U) has a V-shaped triangle distribution with parameter $h$. Here H(u) is defined as
$$H(u) = \begin{cases} h - \sqrt{h(h - u)}, & 0 \le u \le h, \\ h + \sqrt{(1 - h)(u - h)}, & h < u \le 1. \end{cases}$$
Let X = H(U). Then the cumulative distribution of X is
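$$F_X(x) = P(X \le x) = P(H(U) \le x) = \begin{cases} P\left( U \le h - \dfrac{(h - x)^2}{h} \right), & 0 \le x \le h, \\ P\left( U \le h + \dfrac{(x - h)^2}{1 - h} \right), & h < x \le 1. \end{cases}$$
That is,
$$F_X(x) = \begin{cases} 0, & x < 0, \\ h - \dfrac{(h - x)^2}{h}, & 0 \le x \le h, \\ h + \dfrac{(x - h)^2}{1 - h}, & h < x \le 1, \\ 1, & x > 1. \end{cases}$$
Then the pdf of X, denoted as $f(x)$, is
$$f(x) = \begin{cases} \dfrac{2(h - x)}{h}, & 0 \le x \le h, \\ \dfrac{2(x - h)}{1 - h}, & h < x \le 1, \\ 0, & \text{elsewhere,} \end{cases}$$
which is the probability density function of the V-shape triangle distribution with parameter $h$.
3.1.2 Beta Alternative Distributions
The probability density function of the Beta distribution is
$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, & 0 < x < 1, \\ 0, & \text{elsewhere} \end{cases} \qquad (\alpha > 0, \ \beta > 0).$$
The following special cases of the Beta distribution are used in the power study: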
Alternative Distribution 4
B(4, 2) distribution. This is a left-skewed Beta distribution. The power comparison result under this Beta alternative distribution is shown in Table 5. It can be found that the G test performs better than the Shapiro-Wilk test in the small sample size case when $n$ = 5. When the sample size increases, the power of all three tests increases too, and the power of the Shapiro-Wilk test increases more than that of the G test. The Shapiro-Wilk test still performs well in most cases. The Kolmogorov-Smirnov test outperforms the G test under this distribution.
Alternative Distribution 5
B(0.5, 0.5) distribution. This is a symmetric bathtub-shaped Beta distribution. The power comparison result under this Beta alternative distribution is shown in Table 6. It can be found from the table that the G test performs better than the Shapiro-Wilk test for the small sample size $n$ = 5. The G test is still more powerful than the Kolmogorov-Smirnov test in all cases.
Alternative Distribution 6
B(2, 4) distribution. This is a right-skewed Beta distribution. The power comparison result under this Beta alternative distribution is shown in Table 7. The power comparison result shows that the G test performs better than the Shapiro-Wilk test for the small sample size $n$ = 5. The G test is more powerful than the Kolmogorov-Smirnov test when the sample size is 10, 20, 30, 40, 50, except the case of sample size 5.
Alternative Distribution 7
B(1, 1) distribution. This is actually a uniform distribution. The power comparison result under this Beta alternative distribution is shown in Table 8. It can be seen from the table that the G test performs better than the Shapiro-Wilk test in the small sample size cases $n$ = 5 and 10. The G test is also more powerful than the Kolmogorov-Smirnov test when the sample size is 10, 20, 30, 40, 50.
3.1.3 Triangle Alternative Distribution
The probability density function of the triangle distribution is
$$f(x) = \begin{cases} \dfrac{2x}{h}, & 0 \le x \le h, \\ \dfrac{2(1 - x)}{1 - h}, & h < x \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
Here $h$ is a constant between 0 and 1. The following triangle distributions are used in the power study.
Alternative Distribution 8
Consider $h$ = 0.75. This is a left-skewed triangle distribution. The power comparison result of this alternative distribution is shown in Table 9. It can be found from the figure that the G test performs better than the Shapiro-Wilk test when the sample size is $n$ = 5 and 10. The Shapiro-Wilk test performs well when the sample size increases. The Kolmogorov-Smirnov test performs better than the G test in this case.
Alternative Distribution 9
Consider $h$ = 0.5. This is a symmetric triangle distribution. The power comparison result of this alternative distribution is shown in Table 10. The G test statistic performs the best in this case. The G test is more powerful than the Shapiro-Wilk test when the sample size is $n$ = 5, 10, 20, 30.
Alternative Distribution 10
Consider $h$ = 0.25. This is a right-skewed triangle distribution. The power comparison result of this alternative distribution is shown in Table 11. It can be found from the figure that the G test performs better than the Shapiro-Wilk test when the sample size is $n$ = 5. The Kolmogorov-Smirnov test performs better than the G test in this case.
3.2 Summary of Power Comparison
From the above analysis, we can conclude the following:
For all the above alternative distributions, the G test statistic performs better than the Shapiro-Wilk test for small sample size;
For all the V-shaped alternative distributions, including the three V-shaped triangle distributions, and for the left-skewed, bathtub-shaped and right-skewed Beta distributions, the G test statistic performs better than the Shapiro-Wilk test for small sample size;
For all the V-shaped alternative distributions and the bathtub-shaped Beta distribution, the G test statistic performs better than the Kolmogorov-Smirnov test for all cases;
For the symmetric triangle alternative distribution, the G test statistic performs the best among all the cases considered. It performs better than the Shapiro-Wilk test when the sample sizes are 5, 10, 20 and 30;
For the left-skewed and right-skewed triangle alternative distributions, the G test performs better than the Kolmogorov-Smirnov test for small sample sizes;
For the uniform alternative distribution, the G test statistic performs better than the Kolmogorov-Smirnov test and shows similar power to the Kolmogorov-Smirnov test when the sample size increases;
For the uniform alternative distribution, the G test statistic performs better than the Shapiro-Wilk test when the sample size is less than 20.
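The proposition in Section 3.1.1, which generates V-shaped triangle variates from uniform variates, can be sketched as follows. This Python illustration is not the SAS code used in this research; the function names are hypothetical, and the formulas for H(u) and for the distribution function are the ones obtained by inverting the V-shaped triangle density with parameter $h$.

```python
import random
from math import sqrt


def v_triangle_quantile(u, h):
    """H(u): maps a U(0, 1) value to a V-shaped triangle value with parameter h."""
    if u <= h:
        return h - sqrt(h * (h - u))
    return h + sqrt((1.0 - h) * (u - h))


def v_triangle_cdf(x, h):
    """Distribution function of the V-shaped triangle distribution with parameter h."""
    if x < 0.0:
        return 0.0
    if x <= h:
        return h - (h - x) ** 2 / h
    if x <= 1.0:
        return h + (x - h) ** 2 / (1.0 - h)
    return 1.0


def v_triangle_sample(n, h, rng):
    """Pseudo random sample of size n obtained by applying H to uniform variates."""
    return [v_triangle_quantile(rng.random(), h) for _ in range(n)]
```

A quick consistency check is that `v_triangle_cdf(v_triangle_quantile(u, h), h)` returns `u` for any `u` between 0 and 1, which is exactly the statement that H(U) has the required distribution function.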
CHAPTER IV
CONCLUSION AND DISCUSSION
The goodness-of-fit test is a statistical procedure to measure the discrepancy between observed values and the values expected under a specific distribution. The goal of the goodness-of-fit test is to check whether the underlying probability distribution differs from a hypothesized distribution. There are many existing test statistics, including some commonly used goodness-of-fit tests such as the Shapiro-Wilk test, the Kolmogorov-Smirnov test, the Anderson-Darling test and the Cramer-von Mises test. All these commonly used statistical tests can be used to test normality. Among them, an alternative statistical test (the G test) for uniformity was proposed by Chen and Ye (2009). The test was used by Xiong (2010) to test normality for the case that both the location parameter and the scale parameter of the normal distribution are known. The purpose of this thesis is to extend the result to the case that the parameters are unknown. A power study is conducted to compare the performance of this proposed test with the Shapiro-Wilk test and the Kolmogorov-Smirnov test.
The result of the Monte Carlo simulation shows that the G test performs better than the Shapiro-Wilk test for small sample cases under all the alternative distributions used in this research. The G test also outperforms the Kolmogorov-Smirnov test in most cases. It can also be found that the Shapiro-Wilk test performs better when the sample size increases. Since the computation of the G test statistic is less complicated than that of the Shapiro-Wilk test, the G test statistic in this thesis is worth recommending as an alternative approach for testing normality, especially when the sample size is
small. However, when the sample size increases, its power does not increase as fast as that of the Shapiro-Wilk test.
Since the Kolmogorov-Smirnov test and the G test can be used to test any hypothesized distribution, while the Shapiro-Wilk test can only be used for checking normality and lognormality, the power comparison among the three tests is conducted for the normality test in this research. Extending the usage of the G test to the case that the parameters of the distribution are unknown is useful. When normality is tested, the mean and the variance of the distribution are usually unknown. For testing whether or not the underlying distribution of a data set belongs to a specified distribution family, such as the normal distribution family, the exponential distribution family and so on, the G test can still be used even if the parameters of the distribution are unknown.
Table 1 Critical Value of G test Statistic
n    G_0.100  G_0.050  G_0.010  G_0.005  G_0.001
5    0.865    0.90     0.83     0.306    0.344
6    0.664    0.958    0.548    0.769    0.390
7    0.49     0.757    0.36     0.54     0.955
8    0.349    0.583    0.00     0.307    0.743
9    0.7      0.440    0.99     0.5      0.508
10   0.7      0.30     0.757    0.94     0.340
11   0.040    0.5      0.66     0.784    0.64
12   0.0965   0.5      0.495    0.655    0.04
13   0.0900   0.047    0.393    0.54     0.883
14   0.084    0.0976   0.96     0.433    0.759
15   0.079    0.097    0.0      0.340    0.649
16   0.0705   0.08     0.065    0.78     0.445
17   0.0705   0.08     0.065    0.78     0.445
18   0.0669   0.0770   0.008    0.4      0.370
19   0.0635   0.079    0.0953   0.05     0.95
20   0.0605   0.069    0.0904   0.000    0.5
21   0.0577   0.0660   0.0858   0.0948   0.59
22   0.055    0.063    0.087    0.090    0.09
23   0.059    0.0603   0.078    0.0860   0.049
24   0.0507   0.0578   0.0745   0.08     0.004
25   0.0487   0.0554   0.07     0.0784   0.096
26   0.0469   0.053    0.068    0.0750   0.096
27   0.0453   0.056    0.0657   0.07     0.0878
28   0.0437   0.0494   0.063    0.0693   0.0845
29   0.04     0.0476   0.0606   0.0664   0.087
30   0.0408   0.0460   0.0586   0.064    0.0783
31   0.0395   0.0445   0.0564   0.067    0.0745
32   0.038    0.0430   0.0544   0.0595   0.078
33   0.037    0.046    0.055    0.0575   0.0694
34   0.0360   0.0404   0.0509   0.0555   0.0673
35   0.0349   0.039    0.0493   0.0537   0.065
36   0.0340   0.039    0.0493   0.0537   0.064
37   0.0330   0.0370   0.0463   0.0506   0.0606
38   0.03     0.0360   0.0450   0.0490   0.059
39   0.034    0.0350   0.0435   0.047    0.0573
40   0.0305   0.034    0.044    0.046    0.055
41   0.0300   0.033    0.04     0.0448   0.0540
42   0.09     0.033    0.040    0.0435   0.05
43   0.084    0.035    0.0390   0.044    0.0508
44   0.077    0.0308   0.0380   0.043    0.0490
45   0.07     0.030    0.0370   0.040    0.0478
46   0.065    0.094    0.036    0.039    0.0465
47   0.059    0.087    0.0353   0.038    0.0456
48   0.054    0.08     0.0344   0.037    0.0446
49   0.048    0.075    0.0336   0.0363   0.043
50   0.043    0.069    0.039    0.0355   0.04
Table 2 Power Comparison: V-Shape triangle (h = 0.25)
n    G-Test   SW-Test  KS-Test
5    0.680    0.45     0.4
10   0.88     0.3954   0.37
20   0.5390   0.869    0.5330
30   0.750    0.9884   0.745
40   0.895    0.9996   0.88
50   0.9635 0.9554

Table 3 Power Comparison: V-Shape triangle (h = 0.5)
n    G-Test   SW-Test  KS-Test
5    0.46     0.68     0.576
10   0.5004   0.5307   0.349
20   0.775    0.9586   0.3443
30   0.973    0.999    0.7393
40   0.9770 0.989
50   0.9956 0.9974

Table 4 Power Comparison: V-Shape triangle (h = 0.75)
n    G-Test   SW-Test  KS-Test
5    0.690    0.46     0.09
10   0.873    0.3947   0.366
20   0.5378   0.8697   0.5335
30   0.7504   0.9884   0.743
40   0.896    0.9996   0.8804
50   0.963 0.9556

Table 5 Power Comparison: Beta (α = 4, β = 2)
n    G-Test   SW-Test  KS-Test
5    0.0575   0.0408   0.058
10   0.0639   0.0650   0.063
20   0.078    0.       0.0979
30   0.0783   0.856    0.08
40   0.0869   0.90     0.5
50   0.0980   0.3935   0.884

Table 6 Power Comparison: Beta (α = 0.5, β = 0.5)
n    G-Test   SW-Test  KS-Test
5    0.436    0.045    0.0964
10   0.35     0.89     0.56
20   0.3466   0.730    0.3376
30   0.5043   0.959    0.506
40   0.6766   0.9969   0.6643
50   0.87     0.9999   0.797

Table 7 Power Comparison: Beta (α = 2, β = 4)
n    G-Test   SW-Test  KS-Test
5    0.0577   0.0409   0.057
10   0.0635   0.0650   0.0636
20   0.074    0.       0.098
30   0.0780   0.848    0.06
40   0.0869   0.909    0.59
50   0.0975   0.394    0.889

Table 8 Power Comparison: Beta (α = 1, β = 1)
n    G-Test   SW-Test  KS-Test
5    0.07     0.045    0.053
10   0.09     0.0763   0.066
20   0.6      0.033    0.07
30   0.394    0.4      0.434
40   0.73     0.6869   0.945
50   0.       0.860    0.574

Table 9 Power Comparison: Triangle (h = 0.25)
n    G-Test   SW-Test  KS-Test
5    0.064    0.0433   0.055
10   0.076    0.07     0.070
20   0.0837   0.37     0.0
30   0.093    0.47     0.608
40   0.06     0.347    0.090
50   0.43     0.4604   0.663

Table 10 Power Comparison: Triangle (h = 0.5)
n    G-Test   SW-Test  KS-Test
5    0.0495   0.0334   0.045
10   0.05     0.034    0.047
20   0.0537   0.0336   0.044
30   0.0538   0.0364   0.040
40   0.0539   0.0559   0.040
50   0.0550   0.0753   0.046

Table 11 Power Comparison: Triangle (h = 0.75)
n    G-Test   SW-Test  KS-Test
5    0.06     0.0433   0.0556
10   0.073    0.0708   0.0707
20   0.0833   0.376    0.9
30   0.093    0.54     0.604
40   0.030    0.3409   0.097
50   0.46     0.4598   0.655
Figure 1 Power Comparison: V-Shape triangle (h = 0.25)
Figure 2 Power Comparison: V-Shape triangle (h = 0.5)
Figure 3 Power Comparison: V-Shape triangle (h = 0.75)
Figure 4 Power Comparison: Beta (4, 2)
Figure 5 Power Comparison: Beta (0.5, 0.5)
Figure 6 Power Comparison: Beta (2, 4)
Figure 7 Power Comparison: Beta (1, 1)
Figure 8 Power Comparison: Triangle (h = 0.25)
Figure 9 Power Comparison: Triangle (h = 0.5)
Figure 10 Power Comparison: Triangle (h = 0.75)
REFERENCES
Chen, Z. and Ye, C. (2009) An alternative test for uniformity, International Journal of Reliability, Quality and Safety Engineering, 16, 343-356.
Anderson, T. W. and Darling, D. A. (1952) Asymptotic Theory of Certain Goodness of Fit Criteria Based on Stochastic Processes, Annals of Mathematical Statistics, 23:193-212.
Kolmogorov, A. N. (1933) Sulla Determinazione Empirica di Una Legge di Distribuzione, Giornale dell'Istituto Italiano degli Attuari, 4:83-91.
Steele, M., Smart, N., Hurst, C. and Chaseling, J. (2009) Evaluating the statistical power of goodness-of-fit tests for health and medicine survey data, 18th IMACS World Congress MODSIM09: International Congress on Modelling and Simulation: Interfacing modelling and simulation with mathematical and computational sciences. Cairns, Qld, Australia.
David, F. N. and Johnson, N. L. (1948) The probability integral transformation when parameters are estimated from the sample, Biometrika, 35(Pts 1-2):182-190.
Lilliefors, H. W. (1967) On the Kolmogorov-Smirnov test for normality with mean and variance unknown, Journal of the American Statistical Association, 62:399-402.
Lilliefors, H. W. (1969) On the Kolmogorov-Smirnov test for the exponential distribution with unknown mean, Journal of the American Statistical Association, 64:387-389.
Shapiro, S. S. and Francia, R. S. (1972) Approximate Analysis of Variance Tests for Normality, Journal of the American Statistical Association, 67:215-216.
Shapiro, S. S. and Wilk, M. B. (1965) An Analysis of Variance Test for Normality (Complete Samples), Biometrika, 52:591-611.
Shapiro, S. S., Wilk, M. B. and Chen, H. (1968) A Comparative Study of Various Tests for Normality, Journal of the American Statistical Association, 63:1343-1372.
Ju, X. (2010). An alternative statistic for testing normality. Unpublished master's thesis, Florida International University, Miami, Florida.