Journal of Mathematics and Computer Science 14 (2015) 274-283
Article history: Received December 2014; Accepted January 2015; Available online January 2015

Goodness of Fit Test for the Skew-t Distribution

M. Maghami*, M. Bahrami+
Department of Statistics, University of Isfahan, Isfahan, Iran
* magham8@gmal.com
+ m.bahram@sc.u.ac.r

Abstract
In this manuscript, a goodness-of-fit test is proposed for the skew-t distribution. It is based on properties of this family of distributions and on the sample correlation coefficient. Critical values for the test are obtained by Monte Carlo simulation for several sample sizes and significance levels. The power of the proposed test is assessed for different sample sizes against diverse alternatives.

Keywords: Sample correlation coefficient; Skew-t; Goodness-of-fit test.

1. Introduction
Let $Z$ be a random variable. We say that $Z$ has the skew-normal distribution, denoted by $Z \sim SN(\lambda)$, if its probability density function is

$$f(z;\lambda) = 2\,\phi(z)\,\Phi(\lambda z), \qquad z \in \mathbb{R},\ \lambda \in \mathbb{R}, \tag{1}$$

where $\phi$ and $\Phi$ denote the density and the cumulative distribution function of the standard normal distribution, respectively. The skew-normal distribution was introduced by Azzalini (1985) as a family with the appealing property of strictly including the normal law as well as a wide variety of skewed densities.

We say that a random variable $W$ has the skew-t distribution with parameters $\lambda \in \mathbb{R}$ and $\nu \in \mathbb{R}^{+}$ if

$$W \overset{d}{=} \frac{Z}{\sqrt{V/\nu}},$$

where $Z$ is the skew-normal variable with pdf (1), $V \sim \chi^2_\nu$, and $Z$ and $V$ are independent. This variable is denoted by $W \sim St(\lambda, \nu)$. If a random variable $Y$ is defined as

$$Y = \xi + \omega W, \qquad \xi \in \mathbb{R},\ \omega \in \mathbb{R}^{+}, \tag{2}$$

then $Y \sim St(\xi, \omega, \lambda, \nu)$. The skew-Cauchy distribution is obtained as the special case of the skew-t with $\nu = 1$ and is denoted by $SC(\xi, \omega, \lambda)$.
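The stochastic representation above translates directly into a sampler. The Python sketch below uses only the standard library and is an illustration, not the authors' code (their computations were done in R with the sn package). It assumes the standard construction $Z = \delta|U_0| + \sqrt{1-\delta^2}\,U_1$ with $\delta = \lambda/\sqrt{1+\lambda^2}$ and $U_0, U_1$ independent $N(0,1)$ for generating $SN(\lambda)$ variates; that construction is not stated in the paper, and the function names `dsn`, `rsn` and `rst` are hypothetical.

```python
import math
import random


def dsn(z, lam):
    """Skew-normal density f(z; lam) = 2 * phi(z) * Phi(lam * z), Eq. (1)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 * phi * big_phi


def rsn(lam, rng):
    """One SN(lam) draw: delta*|U0| + sqrt(1 - delta^2)*U1 (assumed construction)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def rst(n, xi, omega, lam, nu, rng):
    """n draws of Y = xi + omega * W with W = Z / sqrt(V / nu), i.e. St(xi, omega, lam, nu)."""
    sample = []
    for _ in range(n):
        z = rsn(lam, rng)
        v = rng.gammavariate(nu / 2.0, 2.0)  # V ~ chi-square with nu degrees of freedom
        sample.append(xi + omega * z / math.sqrt(v / nu))
    return sample


rng = random.Random(2015)
w = rst(10_000, 0.0, 1.0, 5.0, 4.0, rng)
# W has the sign of Z, so P(W > 0) = 1/2 + arctan(lam)/pi, about 0.937 for lam = 5.
share = sum(v > 0 for v in w) / len(w)
print(share)
```

Drawing $V$ as a gamma variate with shape $\nu/2$ and scale 2 gives a $\chi^2_\nu$ variable for non-integer $\nu$ as well, which matters because the fitted $\hat\nu$ is generally not an integer.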
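Some well-known properties of skew-t variables, which will be useful for constructing the goodness-of-fit test, are the following (see [3] for details):
(a) If $W \sim St(\lambda, \nu)$, then $-W \sim St(-\lambda, \nu)$.
(b) If $W \sim St(\lambda, \nu)$, then $W^2 \sim F(1, \nu)$.

2. EDF-Based Tests
Pérez Rodríguez and Villaseñor (2010) developed a goodness-of-fit test for the skew-normal family based on the sample correlation coefficient. They showed that their test has greater power than tests based on the empirical distribution function (EDF) against some alternative distributions. We are interested in testing the null hypothesis

$$H_0:\ Y \sim St(\xi, \omega, \lambda, \nu)\ \text{for some}\ \xi \in \mathbb{R},\ \omega \in \mathbb{R}^{+},\ \lambda \in \mathbb{R},\ \nu \in \mathbb{R}^{+}, \tag{3}$$

against general alternatives. In this section we discuss general EDF-based goodness-of-fit statistics designed to test $H_0$. EDF-based test statistics measure the difference between the distribution function $F(\cdot)$ stated in the null hypothesis and the EDF. The EDF is a step function denoted by $F_n(\cdot)$ and given by

$$F_n(y) = \frac{\#\{i : y_{(i)} \le y\}}{n},$$

where $y_{(1)} \le \dots \le y_{(n)}$ are the order statistics of the $y_i$'s. Several statistics can be used to compare the two distribution functions, and Stephens (1986) divides them into two families. The Cramér-von Mises family contains the Cramér-von Mises statistic $W^2$, Watson's statistic $U^2$ and the Anderson-Darling statistic $A^2$, defined as

$$W^2 = n\int_{-\infty}^{\infty}\{F_n(y) - F(y)\}^2\,dF(y),$$

$$U^2 = n\int_{-\infty}^{\infty}\Big[F_n(y) - F(y) - \int_{-\infty}^{\infty}\{F_n(t) - F(t)\}\,dF(t)\Big]^2 dF(y),$$

$$A^2 = n\int_{-\infty}^{\infty}\frac{\{F_n(y) - F(y)\}^2}{F(y)\{1 - F(y)\}}\,dF(y).$$

The Kolmogorov-Smirnov family contains the Kolmogorov-Smirnov statistic $D$ and the Kuiper statistic $V$, defined as

$$D^+ = \sup_y\{F_n(y) - F(y)\},\quad D^- = \sup_y\{F(y) - F_n(y)\},\quad D = \sup_y|F_n(y) - F(y)| = \max(D^+, D^-),\quad V = D^+ + D^-.$$

Stephens (1986) provides the following simple formulae for calculating these statistics:

$$W^2 = \sum_{i=1}^{n}\Big\{p_{(i)} - \frac{2i-1}{2n}\Big\}^2 + \frac{1}{12n}, \tag{4}$$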
$$U^2 = W^2 - n\Big(\bar{p} - \frac{1}{2}\Big)^2, \tag{5}$$

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\big[\log p_{(i)} + \log\{1 - p_{(n+1-i)}\}\big], \tag{6}$$

$$D^+ = \max_{1 \le i \le n}\Big\{\frac{i}{n} - p_{(i)}\Big\}, \tag{7}$$

$$D^- = \max_{1 \le i \le n}\Big\{p_{(i)} - \frac{i-1}{n}\Big\}, \tag{8}$$

$$D = \max(D^+, D^-), \tag{9}$$

$$V = D^+ + D^-, \tag{10}$$

where $p_{(i)} = F(y_{(i)})$ and $\bar{p} = \sum_{i=1}^{n} p_{(i)}/n$.

Large values of a given statistic indicate significant differences between the empirical and hypothesized distribution functions, and thus that the null hypothesis should be rejected. In general, when the parameter values of the hypothesized distribution are completely specified, the sampling distribution of any of these EDF statistics is known exactly, and tables of percentage points are available (see Stephens (1986)). However, the parameters of the distribution may be unknown and have to be estimated from the sample. In that case the sampling distribution of any EDF statistic depends on the distribution being tested, the sample size, the true values of the unknown parameters and the method used to estimate them.

We now describe the parametric bootstrap technique used to estimate the quantiles of the test statistic $T$ when the hypothesized distribution is skew-t with parameter values estimated from the data. Maximum likelihood methods can be employed to estimate the parameters of the skew-t distribution. Since analytic expressions do not exist for these estimators, numerical methods must be used to compute them. Suppose the unknown parameters are location or scale parameters and are estimated using location- and scale-equivariant estimators, as maximum likelihood estimators are. Then the sampling distributions of the EDF statistics do not depend on the true values of those parameters (see Eastman and Bain (1973)). Because the sampling distributions of the statistics are invariant to changes in the location and scale parameters, the values $\xi = 0$ and $\omega = 1$ were used for simplicity. However, the asymptotic null distribution of the test statistic depends upon the unknown values of $\lambda$ and $\nu$, so a parametric bootstrap version of the test is performed:

1. Given the sample $y_1, \dots, y_n$, compute the maximum likelihood estimators $\hat{\lambda}$ and $\hat{\nu}$ of $\lambda$ and $\nu$.
2. Calculate the value of the chosen test statistic $T$ using the appropriate formula(e) from Eqs. (4)-(10), where $F(\cdot)$ denotes the distribution function of $St(\hat{\lambda}, \hat{\nu})$.
(a) Generate a bootstrap sample of size $n$ from $St(\hat{\lambda}, \hat{\nu})$.
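(b) Given the bootstrap sample generated previously, compute the ML estimators of $\lambda$ and $\nu$, say $\hat{\lambda}^*$ and $\hat{\nu}^*$.
(c) Compute the value of the test statistic, say $T^*$, using $St(\hat{\lambda}^*, \hat{\nu}^*)$ and the bootstrap sample.
3. Repeat steps (a), (b) and (c) 1000 times to get $T^*_j$, $j = 1, \dots, 1000$.
4. Obtain $T_{(0.95)}$ as $T^*_{(950)}$, where $T^*_{(j)}$, $j = 1, \dots, 1000$, denotes the ordered $T^*_j$ values.

The null hypothesis is rejected at the 5% level if $T > T_{(0.95)}$.

3. Correlation Goodness-of-Fit Test
In this section we introduce a goodness-of-fit test for the skew-t distribution based on the sample correlation coefficient. The test procedure is based on property (b). From Eq. (2), $Y = \xi + \omega W$, where $Y \sim St(\xi, \omega, \lambda, \nu)$ and $W \sim St(\lambda, \nu)$. Then

$$X := \Big(\frac{Y - \xi}{\omega}\Big)^2 = W^2. \tag{11}$$

By property (b),

$$X = \Big(\frac{Y - \xi}{\omega}\Big)^2 = W^2 \sim F(1, \nu). \tag{12}$$

From (12), the parameter $\lambda$ has been eliminated from the problem.

For fixed $\xi$ and $\omega$, say $\xi_0$ and $\omega_0$, $X$ has distribution function $P(X \le x) = G(x)$, where $G$ is the distribution function of an $F(1, \nu)$ random variable. So, given the sample $y_1, \dots, y_n$, calculate $x_1, \dots, x_n$ by using (11). A consistent estimator of $P(X \le x)$ is the empirical distribution function $F_n(x)$. Therefore $G(x_{(i)}) \approx F_n(x_{(i)})$, and

$$u_i := G^{-1}\big(F_n(x_{(i)})\big) \approx x_{(i)}, \qquad i = 1, \dots, n. \tag{13}$$

Since (13) holds, we should expect a strong linear relationship between the $x_{(i)}$'s and the $u_i$'s under the null hypothesis stated in (3). If $\xi$ and $\omega$ are estimated by consistent estimators, say $\hat{\xi}$ and $\hat{\omega}$, then this linear relationship is expected to still hold. To test whether there is a strong linear relationship between the $x_{(i)}$'s and the $u_i$'s, we use the sample correlation coefficient statistic

$$C = \operatorname{Corr}(X, U) = \frac{\sum_{i=1}^{n}(x_{(i)} - \bar{x})(u_i - \bar{u})}{\sqrt{\sum_{i=1}^{n}(x_{(i)} - \bar{x})^2\,\sum_{i=1}^{n}(u_i - \bar{u})^2}}. \tag{14}$$

The null hypothesis (3) is rejected at the significance level $\alpha$ if

$$C < C_\alpha(\nu), \tag{15}$$

where $C_\alpha(\nu)$ is such that

$$\max_{\lambda} P(\text{Reject } H_0 \mid H_0) = \max_{\lambda} P\big(C < C_\alpha(\nu)\big) = \alpha. \tag{16}$$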
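Computing $C$ in (14) requires $G^{-1}$, the $F(1,\nu)$ quantile function. The Python sketch below uses only the standard library. It evaluates $G(x) = I_{x/(x+\nu)}(1/2, \nu/2)$ through the regularized incomplete beta function, using the continued-fraction method described in Numerical Recipes, and inverts $G$ by bisection. Taking $F_n(x_{(n)}) = 1$ literally would give $u_n = G^{-1}(1) = \infty$. The sketch therefore substitutes the plotting positions $(i - 0.5)/n$ for $F_n(x_{(i)})$; this is an assumption of the sketch, not taken from the paper. Here $\xi$ and $\omega$ are treated as given, whereas in practice estimates are plugged in. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f1_cdf(x, nu):
    """G(x) = I_{x/(x+nu)}(1/2, nu/2), the CDF of the F(1, nu) distribution."""
    if x <= 0.0:
        return 0.0
    a, b, z = 0.5, nu / 2.0, x / (x + nu)
    if z >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(z) + b * math.log1p(-z))
    if z < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, z) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - z) / b


def f1_quantile(p, nu):
    """G^{-1}(p) for 0 < p < 1, by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while f1_cdf(hi, nu) < p:
        lo, hi = hi, 2.0 * hi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f1_cdf(mid, nu) < p else (lo, mid)
    return 0.5 * (lo + hi)


def pearson(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    return sab / math.sqrt(sum((s - ma) ** 2 for s in a) * sum((t - mb) ** 2 for t in b))


def statistic_c(y, xi, omega, nu):
    """Correlation statistic C of Eq. (14) for the sample y, given xi, omega and nu."""
    x = sorted(((v - xi) / omega) ** 2 for v in y)  # x_i from Eq. (11), ordered
    n = len(x)
    # Assumed plotting positions (i - 0.5)/n stand in for F_n(x_(i)) = i/n.
    u = [f1_quantile((i - 0.5) / n, nu) for i in range(1, n + 1)]  # u_i, cf. Eq. (13)
    return pearson(x, u)


nu, n = 5.0, 20
# A sample lying exactly on the F(1, nu) quantiles gives C = 1 up to rounding.
ideal = [math.sqrt(f1_quantile((i - 0.5) / n, nu)) for i in range(1, n + 1)]
print(statistic_c(ideal, 0.0, 1.0, nu))
print(statistic_c(list(range(1, 11)), 0.0, 1.0, nu))  # below 1
```

Because the $u_i$ depend only on $n$ and $\nu$, a production version would compute them once and reuse them across samples.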
The distribution of $C$ under the null hypothesis, for each fixed value of $\lambda$ and $\nu$, can be obtained by Monte Carlo simulation. Note that $C$ is scale invariant and that the distribution of $X$ in (11) does not depend on $\omega$; therefore we fix $\xi = 0$ and $\omega = 1$. If the random sample comes from a distribution different from the skew-t, for which property (b) does not hold, then (13) does not hold either. In that case the sample correlation coefficient (14) cannot be near 1, and $C$ should fall below the critical value, since under $H_0$ the distribution of $C$ is concentrated close to 1. We therefore use the following procedure to obtain the critical values:

1. Fix $\lambda$ and $\nu$.
2. Simulate a sample of size $n$ from $St(\lambda, \nu)$.
3. Calculate the maximum likelihood estimator of the location parameter $\xi$.
4. Calculate $x_1, \dots, x_n$ using Eq. (11).
5. Sort the $x_i$'s in ascending order.
6. Calculate $u_i := G^{-1}(F_n(x_{(i)}))$, $i = 1, \dots, n$, where $G^{-1}$ is the quantile function of the $F(1, \nu)$ distribution.
7. Calculate $C$ using Eq. (14) and the data $x_{(i)}$ and $u_i$ generated in steps 5 and 6.
8. Repeat steps 2-7 $B$ times.

Upon finishing the simulation process, we have $B$ realizations of $C$ for the given values of $\lambda$ and $\nu$. The critical constant $C_\alpha(\nu)$ is then determined from the $\alpha$ quantile of the empirical distribution of $C$. For example, Fig. 1 presents a graph of $C_\alpha$ as a function of $\lambda$ for several significance levels $\alpha$. It shows that the distribution of the test statistic $C$ under $H_0$ does not depend on the value of the unknown parameter $\lambda$. Our simulations indicate that this holds for arbitrary $\nu$.

Note that we have limited our attention to $Y \sim St(\xi, \omega, \lambda, \nu)$ with $\lambda \ge 0$, since $-Y \sim St(-\xi, \omega, -\lambda, \nu)$ by property (a). Therefore the distribution of $C$ does not depend on the sign of $\lambda$, and hence the critical constant $C_\alpha(\nu)$ in (15) is such that

$$\max_{\lambda \in \mathbb{R}} P\big(C < C_\alpha(\nu)\big) = \max_{\lambda \ge 0} P\big(C < C_\alpha(\nu)\big).$$

For arbitrary $\nu$, simulations show that the critical constant $C_\alpha(\nu)$ can be determined from the $\alpha$ quantile of the empirical distribution of $C$ obtained by simulation with an arbitrary value of $\lambda$. Fig. 1 illustrates this fact.
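Steps 1-8 can be mimicked with the self-contained Python sketch below, which uses only the standard library. To stay short it skips step 3 and uses the true values $\xi = 0$ and $\omega = 1$ instead of maximum likelihood estimates, a simplification that is not part of the paper's procedure. With $\xi$ and $\omega$ known, $X = W^2$ is exactly $F(1,\nu)$ whatever $\lambda$ is, so in this simplified setting the simulated distribution of $C$ cannot depend on $\lambda$. The sketch also replaces $F_n(x_{(i)})$ by the plotting positions $(i - 0.5)/n$, an assumption made because $G^{-1}(1)$ is infinite. It generates $SN(\lambda)$ variates with the construction $\delta|U_0| + \sqrt{1-\delta^2}\,U_1$, $\delta = \lambda/\sqrt{1+\lambda^2}$. All function names are hypothetical.

```python
import math
import random


def _betacf(a, b, x, max_iter=500, eps=1e-15):
    # Modified Lentz evaluation of the incomplete-beta continued fraction.
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f1_cdf(x, nu):
    # CDF of F(1, nu): I_{x/(x+nu)}(1/2, nu/2).
    if x <= 0.0:
        return 0.0
    a, b, z = 0.5, nu / 2.0, x / (x + nu)
    if z >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(z) + b * math.log1p(-z))
    if z < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, z) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - z) / b


def f1_quantile(p, nu):
    # Inverse of f1_cdf by bracketing and bisection.
    lo, hi = 0.0, 1.0
    while f1_cdf(hi, nu) < p:
        lo, hi = hi, 2.0 * hi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f1_cdf(mid, nu) < p else (lo, mid)
    return 0.5 * (lo + hi)


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    return sab / math.sqrt(sum((s - ma) ** 2 for s in a) * sum((t - mb) ** 2 for t in b))


def rskew_t(lam, nu, rng):
    # W = Z / sqrt(V / nu), with Z ~ SN(lam) generated as delta*|U0| + sqrt(1 - delta^2)*U1.
    delta = lam / math.sqrt(1.0 + lam * lam)
    z = delta * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
    return z / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)


def simulate_c(n, lam, nu, b_reps, rng):
    # Steps 2-8 with xi = 0 and omega = 1 taken as known (step 3 is skipped).
    u = [f1_quantile((i - 0.5) / n, nu) for i in range(1, n + 1)]  # step 6; same for every sample
    values = []
    for _ in range(b_reps):
        x = sorted(rskew_t(lam, nu, rng) ** 2 for _ in range(n))  # steps 2, 4 and 5
        values.append(pearson(x, u))  # step 7
    return values


def critical_value(values, alpha):
    # Empirical alpha-quantile of the simulated C values.
    s = sorted(values)
    return s[max(0, math.ceil(alpha * len(s)) - 1)]


rng = random.Random(1)
c_sim = simulate_c(n=30, lam=2.0, nu=5.0, b_reps=500, rng=rng)
c01, c05, c10 = (critical_value(c_sim, a) for a in (0.01, 0.05, 0.10))
print(c01, c05, c10)
```

Since both $x_{(i)}$ and $u_i$ are sorted in ascending order, every simulated $C$ is non-negative, and the empirical quantiles are ordered in $\alpha$ by construction.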
Figure 1. Critical values as a function of $\lambda$ for the statistic $C$.

Given a random sample of $n$ data values, the steps necessary to carry out the test can be summarized as follows:
1. Calculate the MLEs of $\xi$ and $\nu$ using the library 'sn' (Azzalini (2008)) in R (R Development Core Team (2008)), and denote them by $\hat{\xi}$ and $\hat{\nu}$.
2. Calculate the value of the test statistic $C$ using Eq. (14).
3. For a given significance level $\alpha$, identify the quantile $C_\alpha(\hat{\nu})$ of the test statistic corresponding to $\hat{\nu}$ and $\alpha$.
4. If $C < C_\alpha(\hat{\nu})$, the null hypothesis is rejected at the significance level $\alpha$.

4. Simulation Studies
4-1. Test Size
Tables 1 and 2 present the estimated sizes of the tests, obtained by simulation for $\alpha = 0.05$ for two sample sizes and several values of the parameter $\nu$. The tables show that the estimated test sizes are very close to the nominal significance level.

Table 1: Test size estimates for the statistics, obtained by simulation with $B$ Monte Carlo samples ($\alpha = 0.05$).
Statistics: $A^2$, $W^2$, $U^2$, $D$, $V$, $C$
4 5 3 7 5 7.5 3.4.57.4.43..47.47.54.57.39.8.53.35.3.43.37.54.33.53.5.5..37.3 .4...36.4.....43.37.5.4.9.38.3.54.56.56.7.4.54.48.7
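Tables 1 and 2 compare $C$ with the EDF statistics of Section 2. Given the values $p_{(i)} = F(y_{(i)})$ under the hypothesized (fitted) distribution, formulas (4)-(10) take only a few lines. In the Python sketch below the caller must supply these probabilities. Evaluating the skew-t distribution function itself, which the paper does with the R package sn, is outside the sketch's scope. The function name `edf_statistics` is hypothetical.

```python
import math


def edf_statistics(p):
    """EDF statistics of Eqs. (4)-(10) from probabilities p_i = F(y_i), 0 < p_i < 1."""
    p = sorted(p)  # p_(1) <= ... <= p_(n)
    n = len(p)
    w2 = sum((pi - (2 * i - 1) / (2 * n)) ** 2 for i, pi in enumerate(p, start=1)) + 1 / (12 * n)
    p_bar = sum(p) / n
    u2 = w2 - n * (p_bar - 0.5) ** 2
    # p[i - 1] is p_(i) and p[n - i] is p_(n+1-i) in 0-based indexing.
    a2 = -n - sum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                  for i in range(1, n + 1)) / n
    d_plus = max(i / n - pi for i, pi in enumerate(p, start=1))
    d_minus = max(pi - (i - 1) / n for i, pi in enumerate(p, start=1))
    return {"W2": w2, "U2": u2, "A2": a2, "D+": d_plus, "D-": d_minus,
            "D": max(d_plus, d_minus), "V": d_plus + d_minus}


stats = edf_statistics([0.1, 0.5, 0.9])
print(stats)
```

For a fully specified $F$ these values can be compared with Stephens' tables. With estimated skew-t parameters, the parametric bootstrap of Section 2 supplies the critical values instead.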
Table 2: Test size estimates for the statistics, obtained by simulation with $B$ Monte Carlo samples ($\alpha = 0.05$).
Statistics: $A^2$, $W^2$, $U^2$, $D$, $V$, $C$
4 5 37 5 7.5 3.4..7.34.54.7.37...3..8.4.34..34.3.4.53.49.33.4.9.7.45.5.37.47.4.5.38.4.35.7.53.4.6...3.4.54.3.5...49.9

4-2. Test Power
To analyze the behavior of the proposed tests, alternatives different from the skew-t were considered. The distributions selected were: skew-slash (SSL), logistic, exponential, chi-squared, Weibull, Gumbel, log-normal and stable (see Nolan (1999)). We also considered some bimodal distributions. The results are shown in Tables 3 and 4. They show that the proposed test $C$ has the highest power for several of the considered alternatives.

Table 3: Power estimates of the $A^2$, $W^2$, $U^2$, $D$, $V$ and $C$ statistics for some alternatives ($\alpha = 0.05$).
Alternative: $A^2$ $W^2$ $U^2$ $D$ $V$ $C$
SSL(): .54 .64 .43 .34 .7 .745
Standard logistic: .4 .34 .7 .48 .3 .47
Standard exp.: .3 .64 . .54 .347 .65
Chi-squared(4): .78 .95 .74 .3 .47 .9
Weibull(.75): .3 .7 .4 .4 .37 .84
Standard Gumbel: .59 .47 .47 .5 .3 .537
Log-Normal(.5): .7 .56 .54 .3 .7 .694
Sta(.6.5;): .34 .4 .4 .3 .9 .45
.5N(4.5.5)+.5N(-4.5.5): .87 .9 .54 .67 .49 .93
.9N(4.5.5)+.N(-4.5.5): .95 .54 .48 .5 .64 .934
.5N((/3))+.5N(-(/3)): .5 .6 .7 .74 .45 .664
.9N((/3))+.N(-(/3)): .4 .574 .3 .7 .36 .9
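Power entries such as those in Tables 3 and 4 are rejection frequencies under an alternative. The Python sketch below estimates the size and the power of $C$ against a standard logistic alternative under strong simplifying assumptions that are not part of the paper:
- $\xi = 0$, $\omega = 1$ and $\nu = 5$ are treated as known, so no maximum likelihood fitting is done;
- the critical value is simulated from $W^2 \sim F(1,\nu)$, which by property (b) is the null distribution of $X$ for every $\lambda$, so a Student t draw ($\lambda = 0$) suffices;
- the plotting positions $(i - 0.5)/n$ replace $F_n(x_{(i)})$.

The resulting numbers are illustrative only and are not expected to reproduce the tables. All names are hypothetical.

```python
import math
import random


def _betacf(a, b, x, max_iter=500, eps=1e-15):
    # Modified Lentz evaluation of the incomplete-beta continued fraction.
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f1_cdf(x, nu):
    # CDF of F(1, nu): I_{x/(x+nu)}(1/2, nu/2).
    if x <= 0.0:
        return 0.0
    a, b, z = 0.5, nu / 2.0, x / (x + nu)
    if z >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(z) + b * math.log1p(-z))
    if z < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, z) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - z) / b


def f1_quantile(p, nu):
    # Inverse of f1_cdf by bracketing and bisection.
    lo, hi = 0.0, 1.0
    while f1_cdf(hi, nu) < p:
        lo, hi = hi, 2.0 * hi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f1_cdf(mid, nu) < p else (lo, mid)
    return 0.5 * (lo + hi)


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    return sab / math.sqrt(sum((s - ma) ** 2 for s in a) * sum((t - mb) ** 2 for t in b))


def simulate_c(draw, u, reps, rng):
    # C for `reps` samples of size len(u); xi = 0 and omega = 1 are treated as known.
    n = len(u)
    return [pearson(sorted(draw(rng) ** 2 for _ in range(n)), u) for _ in range(reps)]


def student_t_draw(rng, nu):
    # W^2 ~ F(1, nu) for every lambda, so lambda = 0 (a Student t variate) suffices.
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)


def logistic_draw(rng):
    # Standard logistic variate by inversion of its CDF.
    v = rng.random()
    while v == 0.0:
        v = rng.random()
    return math.log(v / (1.0 - v))


rng = random.Random(7)
nu, n, alpha, reps = 5.0, 50, 0.05, 400
u = [f1_quantile((i - 0.5) / n, nu) for i in range(1, n + 1)]  # assumed plotting positions


def null_draw(r):
    return student_t_draw(r, nu)


c_alpha = sorted(simulate_c(null_draw, u, reps, rng))[math.ceil(alpha * reps) - 1]
size = sum(c < c_alpha for c in simulate_c(null_draw, u, reps, rng)) / reps
power = sum(c < c_alpha for c in simulate_c(logistic_draw, u, reps, rng)) / reps
print(f"critical value {c_alpha:.4f}, estimated size {size:.3f}, estimated power {power:.3f}")
```

Estimating size and power from fresh samples, independent of those used for the critical value, avoids reusing the same simulated values twice.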
Table 4: Power estimates of the $A^2$, $W^2$, $U^2$, $D$, $V$ and $C$ statistics for some alternatives ($\alpha = 0.05$).
Alternative: $A^2$ $W^2$ $U^2$ $D$ $V$ $C$
SSL(): .67 .74 .3 .4 .39 .753
Standard logistic: .5 .4 .3 .3 .345 .58
Standard exp.: .345 .75 .43 .64 .457 .76
Chi-squared(4): .93 .7 .8 .5 .7 .3
Weibull(.75): .47 .69 .396 .48 .67 .867
Standard Gumbel: .746 .64 .68 .69 .47 .77
Log-Normal(.5): .875 .64 .58 .44 .35 .793
Sta(.6.5;): .44 .57 .9 .487 .47 .673
.5N(4.5.5)+.5N(-4.5.5): .83 .945 .673 .74 .576 .944
.9N(4.5.5)+.N(-4.5.5): .96 .6 .33 .549 .673 .967
.5N((/3))+.5N(-(/3)): .574 .73 .35 .47 .59 .8
.9N((/3))+.N(-(/3)): .5 .67 .4 .3 .3 .384

5. Numerical Example
To illustrate how the test procedure works with real data, we use data collected at the Australian Institute of Sport (AIS) (Cook & Weisberg (1994)), containing the body mass index (BMI) of 102 male athletes. Table 5 reports maximum likelihood estimates for some skew models, considering the full $St(\xi, \omega, \lambda, \nu)$ model and two special cases: the skew-normal and the skew-Cauchy. The Akaike information criterion (AIC) is used to compare the estimated models (Leroux (1992)). As is well known, a model with a minimum AIC value is to be preferred; therefore, the St fit appears preferable. These points are further illustrated in Figure 2, where a histogram of the data is plotted together with the fitted densities.

Table 5: MLE estimates and log-likelihood values.
Model: SN, SC, St
$\xi$: .7978, .7597, .37376
$\omega$: 4.37343, .38549, .9735
$\lambda$: 3.69, .54, .446
$\nu$: -, -, 5.6387
Log-likelihood: -37.8347, -47.553, -35.933
AIC: 48.6694, 5.46, 479.866
$\mathrm{AIC} = -2\log(L) + 2k$, where $L$ is the maximized likelihood and $k$ is the number of parameters.

Figure 2: Histogram of BMI of Australian athletes. The lines represent distributions fitted using maximum likelihood estimation (legend: SN, SGN, SCN, SC, S-t).

However, the goodness-of-fit tests for these data reject the skew-normal and skew-Cauchy models, whereas we cannot reject the hypothesis of an underlying skew-t population for the data set (see [8] for details). The critical points, the corresponding values of the test statistics and the range of the P-value are summarized in Table 6.

Table 6: Critical points and values of the test statistics for the BMI data
Model: SN SC St
* r r.9754747 R =.76834 C =.97686 Test statistics.94698 5 3 35
%: .9455 .9738395 .73456 .7389985
.5%: .937466 .979894 .775845 .783385
5%: .9586 .9834587 .8549 .87583
%: .96663 .986867 .864546 .8893
5%: .97494 .9885898 .899 .9979
.5%: .98959 .9936363 .97463 .97486
P-value: (.5 .5) (.5 .5) (.5 .5) (.5)
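The model comparison in Table 5 rests on $\mathrm{AIC} = -2\log(L) + 2k$. The minimal Python sketch below makes the arithmetic explicit; the function names are hypothetical. For the St fit with $k = 4$ parameters, the reported AIC of 479.866 corresponds to a maximized log-likelihood of $-235.933$. The second entry is a made-up model, used only to show the selection rule.

```python
def aic(loglik, k):
    """Akaike information criterion, AIC = -2 log L + 2k (smaller is better)."""
    return -2.0 * loglik + 2.0 * k


def preferred_model(fits):
    """Name of the model with minimum AIC; fits maps name -> (max log-likelihood, k)."""
    return min(fits, key=lambda name: aic(*fits[name]))


fits = {
    "St": (-235.933, 4),          # log-likelihood implied by the reported AIC 479.866
    "hypothetical": (-238.0, 3),  # made-up competitor, for illustration only
}
print(aic(*fits["St"]))       # approximately 479.866
print(preferred_model(fits))  # St
```

The $2k$ penalty means a richer model must raise the log-likelihood by more than one unit per extra parameter to be preferred.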
It is important to mention that all the calculations shown in this work were obtained using routines written in R. These routines use the sn package and are freely available upon request.

References
[1] A. Azzalini, A class of distributions which includes the normal ones, Scandinavian Journal of Statistics 12 (1985) 171-178.
[2] A. Azzalini, R package sn: the skew-normal and skew-t distributions (version 0.4-6), Università di Padova (2008).
[3] A. Azzalini, A. Capitanio, Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65 (2) (2003) 367-389.
[4] R.D. Cook, S. Weisberg, An Introduction to Regression Graphics, New York: Wiley (1994).
[5] J. Eastman, L.J. Bain, A property of maximum likelihood estimators in the presence of location-scale nuisance parameters, Commun. Statist. 2 (1973) 23-28.
[6] B.G. Leroux, Consistent estimation of a mixing distribution, Annals of Statistics 20 (3) (1992) 1350-1360.
[7] J.P. Nolan, Stable distributions, University, Washington DC (1999).
[8] P. Pérez Rodríguez, J.A. Villaseñor, On testing the skew normal hypothesis, Journal of Statistical Planning and Inference 140 (2010) 3148-3159.
[9] R Development Core Team, R: a language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0 (2008).
[10] M.A. Stephens, Tests based on EDF statistics, in: R.B. D'Agostino, M.A. Stephens (Eds.), Goodness-of-Fit Techniques, New York: Marcel Dekker (1986).