TESTING MULTIVARIATE UNIFORMITY AND ITS APPLICATIONS


MATHEMATICS OF COMPUTATION
Volume 70, Number 233, Pages 337-355
Article electronically published on February 17, 2000

JIA-JUAN LIANG, KAI-TAI FANG, FRED J. HICKERNELL, AND RUNZE LI

Abstract. Some new statistics are proposed to test the uniformity of random samples in the multidimensional unit cube $[0,1]^d$ ($d \ge 2$). These statistics are derived from number-theoretic or quasi-Monte Carlo methods for measuring the discrepancy of points in $[0,1]^d$. Under the null hypothesis that the samples are independent and identically distributed with a uniform distribution in $[0,1]^d$, we obtain some asymptotic properties of the new statistics. By Monte Carlo simulation, it is found that the finite-sample distributions of the new statistics are well approximated by the standard normal distribution, $N(0,1)$, or the chi-squared distribution, $\chi^2(2)$. A power study is performed, and possible applications of the new statistics to testing general multivariate goodness-of-fit problems are discussed.

Received by the editor August 14, 1998 and, in revised form, February 11.
Mathematics Subject Classification. Primary 65C05, 62H10, 65D30.
Key words and phrases. Goodness-of-fit, discrepancy, quasi-Monte Carlo methods, testing uniformity.
This work was partially supported by a Hong Kong Research Grants Council grant RGC/97-98/
© 2000 American Mathematical Society

1. Introduction

Testing uniformity in the unit interval $[0,1]$ has been studied by many authors. Some early work in this area is [Ney37], [Pea39] and [AD54]. Quesenberry and Miller [QM77, MQ79] made a thorough Monte Carlo simulation to compare a number of existing statistics for testing uniformity in $[0,1]$ and recommended Watson's $U^2$-test [Wat62] and Neyman's smooth test [Ney37] as general choices for testing uniformity in $[0,1]$. D'Agostino and Stephens [DS86, Chapter 6] gave a comprehensive review of tests for uniformity in $[0,1]$.

Testing uniformity of random samples in the multidimensional unit cube ($d \ge 2$),

\[ C^d = [0,1]^d = \{ x = (x_1, \dots, x_d)^\top \in R^d : 0 \le x_i \le 1,\ i = 1, \dots, d \}, \]

seems to have received less attention in the literature. Two well-known quantities are the Kolmogorov-Smirnov type statistic,

\[ KS = \sup_{x \in R^d} | F_n(x) - F(x) |, \tag{1.1} \]

and the Cramér-von Mises type statistic,

\[ CM = \int_{R^d} [F_n(x) - F(x)]^2 \, \psi(x) \, dx. \tag{1.2} \]

Here, $F(x)$ is the null distribution function (d.f.), $F_n(x)$ denotes the empirical distribution function (e.d.f.) based on independent and identically distributed (i.i.d.) samples, and $\psi(x) \ge 0$ in (1.2) is a suitable weight function. Unfortunately, the Kolmogorov-Smirnov type statistic is difficult to compute for large $d$.

In the literature of number-theoretic methods or quasi-Monte Carlo methods [Nie92, FW94, SJ94], there are a number of criteria for measuring whether a set of points is uniformly scattered in the unit cube $C^d$. These criteria are called discrepancies, and they arise in the error analysis of quasi-Monte Carlo methods for evaluating multiple integrals. Given an integral over the unit cube,

\[ I(f) = \int_{C^d} f(x) \, dx, \tag{1.3} \]

quasi-Monte Carlo methods approximate this integral by the sample mean,

\[ Q(f) = \frac{1}{n} \sum_{i=1}^{n} f(z_i), \tag{1.4} \]

over a set of uniformly scattered sample points, $P = \{z_1, \dots, z_n\} \subset C^d$. Examples of good sets for quasi-Monte Carlo integration are given in [HW81, Nie92, SJ94, Tez95] and the references therein. The worst-case quadrature error of a quasi-Monte Carlo method is bounded by a generalized Koksma-Hlawka inequality,

\[ | I(f) - Q(f) | \le D(P) \, V(f), \tag{1.5} \]

where $V(f)$ is a measure of the variation or fluctuation of the integrand, and the discrepancy, $D(P)$, is a measure of the quality of the quadrature rule, or equivalently, of the set of points, $P$. A smaller discrepancy implies a better set of points. The precise definitions of the discrepancy and the variation depend on the space of integrands. In the original Koksma-Hlawka inequality [Nie92, Chap. 2], $V(f)$ is the variation of the integrand in the sense of Hardy and Krause, and the discrepancy is the star discrepancy, defined as follows:

\[ D^{*}(P) = \sup_{x \in C^d} \left| \frac{| P \cap [0, x] |}{n} - \mathrm{Vol}([0, x]) \right|, \tag{1.6} \]

where $|\cdot|$ denotes the number of points in a set, and $\mathrm{Vol}([0, x])$ denotes the volume of the hyperrectangle $[0, x]$ ($x \in R^d$). Taking the null distribution in (1.1) to be the uniform distribution, $F(x) = \mathrm{Vol}([0, x])$, and noting that the e.d.f. is $F_n(x) = | P \cap [0, x] | / n$, the star discrepancy (1.6) is a special case of the Kolmogorov-Smirnov type statistic (1.1).
This relationship between discrepancies arising in quasi-Monte Carlo quadrature error bounds and goodness-of-fit statistics is rather general and has been discussed in [Hic98b, Hic99]. If the space of integrands is a reproducing kernel Hilbert space, then one may obtain a computationally simple form for the discrepancy. A special case is the $L_2$-star discrepancy, which corresponds to the Cramér-von Mises statistic (1.2) for the uniform distribution with weight function $\psi(x) = 1$.

Hickernell [Hic98a] proposed a generalized discrepancy based on reproducing kernels. This discrepancy has a computationally simple formula. Three special cases, the symmetric discrepancy, the centered discrepancy and the star discrepancy, have interesting geometrical interpretations. Thus, all three of them may give useful statistics for testing multivariate uniformity of a set of points. However, to perform a statistical test, one must know the probability distribution of the test statistic under the assumption of i.i.d. uniform random points. This problem is the focus of this article. Section 2 defines the new test statistics and describes their asymptotic behavior. For statistical reasons the discrepancy itself is not the best goodness-of-fit statistic, but useful statistics are derived from the discrepancy. Section 3 presents simulation results on how well the distributions of the statistics are approximated by their asymptotic limits and on the power performance of the new statistics. Some applications are also discussed.

2. The new statistics and their asymptotic properties

2.1. Some $L_2$-type discrepancies. The generalized $L_2$-type discrepancy proposed in [Hic98a] is as follows:

\[ [D(P)]^2 = M^d - \frac{2}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left[ M + \beta^2 \mu(z_{kj}) \right] + \frac{1}{n^2} \sum_{k,l=1}^{n} \prod_{j=1}^{d} \left( M + \beta^2 \left[ \mu(z_{kj}) + \mu(z_{lj}) + \tfrac{1}{2} B_2(\{ z_{kj} - z_{lj} \}) + B_1(z_{kj}) B_1(z_{lj}) \right] \right), \tag{2.1} \]

where $\{\cdot\}$ denotes the fractional part of a real number or vector, $\beta$ is an arbitrarily given positive constant, and $\mu(\cdot)$ is an arbitrary function satisfying

\[ \mu \in \left\{ f : \frac{df}{dx} \in L^{\infty}([0,1]) \ \text{and} \ \int_0^1 f(x) \, dx = 0 \right\}. \]

The constant $M$ is determined in terms of $\beta$ and $\mu$ as follows:

\[ M = 1 + \beta^2 \int_0^1 \left( \frac{d\mu}{dx} \right)^2 dx. \tag{2.2} \]

The $B_1(\cdot)$ and $B_2(\cdot)$ in (2.1) are the first and the second degree Bernoulli polynomials, respectively:

\[ B_1(x) = x - \tfrac{1}{2} \quad \text{and} \quad B_2(x) = x^2 - x + \tfrac{1}{6}. \]

For any $z_1, z_2 \in [0,1]$, it is true that

\[ B_2(\{ z_1 - z_2 \}) = B_2(| z_1 - z_2 |) = | z_1 - z_2 |^2 - | z_1 - z_2 | + \tfrac{1}{6}. \]

The three special cases of $[D(P)]^2$ (denoted by $D_s(P)^2$, $D_c(P)^2$ and $D_{*}(P)^2$, respectively) given in [Hic98a] are derived by taking three different choices of the function $\mu(\cdot)$ and the constant $\beta$ in (2.2):

1) the symmetric discrepancy:

\[ \mu(x) = -\tfrac{1}{2} B_2(x) = -\tfrac{1}{2} \left( x^2 - x + \tfrac{1}{6} \right), \quad \beta^{-1} = \tfrac{1}{2}, \quad M = \tfrac{4}{3}, \]

\[ D_s(P)^2 = \left( \tfrac{4}{3} \right)^d - \frac{2}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( 1 + 2 z_{kj} - 2 z_{kj}^2 \right) + \frac{2^d}{n^2} \sum_{k,l=1}^{n} \prod_{j=1}^{d} \left( 1 - | z_{kj} - z_{lj} | \right); \tag{2.3} \]

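2) the centered discrepancy:

\[ \mu(x) = -\tfrac{1}{2} B_2\left( \left\{ x - \tfrac{1}{2} \right\} \right) = -\tfrac{1}{2} \left( \left| x - \tfrac{1}{2} \right|^2 - \left| x - \tfrac{1}{2} \right| + \tfrac{1}{6} \right), \quad \beta^{-1} = 1, \quad M = \tfrac{13}{12}, \]

\[ D_c(P)^2 = \left( \tfrac{13}{12} \right)^d - \frac{2}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( 1 + \tfrac{1}{2} \left| z_{kj} - \tfrac{1}{2} \right| - \tfrac{1}{2} \left| z_{kj} - \tfrac{1}{2} \right|^2 \right) + \frac{1}{n^2} \sum_{k,l=1}^{n} \prod_{j=1}^{d} \left[ 1 + \tfrac{1}{2} \left| z_{kj} - \tfrac{1}{2} \right| + \tfrac{1}{2} \left| z_{lj} - \tfrac{1}{2} \right| - \tfrac{1}{2} | z_{kj} - z_{lj} | \right]; \tag{2.4} \]

3) the star discrepancy:

\[ \mu(x) = \tfrac{1}{6} - \tfrac{x^2}{2}, \quad \beta^{-1} = 1, \quad M = \tfrac{4}{3}, \]

\[ D_{*}(P)^2 = \left( \tfrac{4}{3} \right)^d - \frac{2}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( \frac{3 - z_{kj}^2}{2} \right) + \frac{1}{n^2} \sum_{k,l=1}^{n} \prod_{j=1}^{d} \left[ 2 - \max(z_{kj}, z_{lj}) \right]. \tag{2.5} \]

2.2. Asymptotic properties of the discrepancies. The null hypothesis for testing the uniformity of random samples $P = \{z_1, \dots, z_n\} \subset C^d$ can be stated as

\[ H_0 : z_1, \dots, z_n \ \text{are uniformly distributed in} \ C^d. \tag{2.6} \]

The alternative hypothesis $H_1$ implies rejection of $H_0$ in (2.6). A test for (2.6), that is, a test of multivariate uniformity, can be performed by determining whether the value of a test statistic is unlikely under the null hypothesis. If one wishes to use one of the discrepancies described above as a test statistic, then its probability distribution under the null hypothesis must be calculated. Although this distribution is too complicated to describe for finite sample size, it can be characterized rather simply in the limit of infinite sample size. The main results on the asymptotic properties of the discrepancies are contained in Theorems 2.1 and 2.3 below. Their proofs rely on the theory of U-statistics [Ser80, Chapter 5].

Theorem 2.1. Under the null hypothesis (2.6), the statistic $[D(P)]^2$ given by (2.1) has the asymptotic property

\[ [D(P)]^2 \xrightarrow{a.s.} 0 \quad (n \to \infty), \tag{2.7} \]

for an arbitrary function $\mu(\cdot)$ and an arbitrary constant $\beta$, where "$\xrightarrow{a.s.}$" means "converges almost surely".

Proof. For $P = \{z_1, \dots, z_n\}$ with $z_k = (z_{k1}, \dots, z_{kd})^\top$ ($k = 1, \dots, n$), under the null hypothesis (2.6), the random variables $z_{kj}$ ($k = 1, \dots, n$, $j = 1, \dots, d$) are i.i.d. with a uniform distribution $U[0,1]$. Let

\[ g_1(z_k) = \prod_{j=1}^{d} \left[ M + \beta^2 \mu(z_{kj}) \right], \tag{2.8} \]

and

\[ h(z_k, z_l) = \prod_{j=1}^{d} \left( M + \beta^2 \left[ \mu(z_{kj}) + \mu(z_{lj}) + \tfrac{1}{2} B_2(| z_{kj} - z_{lj} |) + B_1(z_{kj}) B_1(z_{lj}) \right] \right) \tag{2.9} \]

for $k, l = 1, \dots, n$. Then the squared discrepancy given by (2.1) can be written as

\[ [D(P)]^2 = M^d - \frac{2}{n} \sum_{k=1}^{n} g_1(z_k) + \frac{1}{n^2} \sum_{k,l=1}^{n} h(z_k, z_l) \]
\[ = M^d - \frac{2}{n} \sum_{k=1}^{n} g_1(z_k) + \frac{1}{n^2} \left[ 2 \sum_{k<l} h(z_k, z_l) + \sum_{k=1}^{n} h(z_k, z_k) \right] \]
\[ = M^d - \frac{2}{n} \sum_{k=1}^{n} g_1(z_k) + \frac{2}{n^2} \sum_{k<l} h(z_k, z_l) + \frac{1}{n^2} \sum_{k=1}^{n} g_2(z_k), \tag{2.10} \]

where

\[ g_2(z_k) = h(z_k, z_k) = \prod_{j=1}^{d} \left( M + \beta^2 \left[ 2 \mu(z_{kj}) + \tfrac{1}{12} + B_1(z_{kj})^2 \right] \right). \tag{2.11} \]

Note that both $\{g_1(z_k)\}_{k=1}^{n}$ and $\{g_2(z_k)\}_{k=1}^{n}$ are sequences of i.i.d. random variables. By the strong law of large numbers,

\[ U_1 = \frac{1}{n} \sum_{k=1}^{n} g_1(z_k) \xrightarrow{a.s.} E[g_1(z_1)] = M^d, \tag{2.12} \]

and

\[ \frac{1}{n} \sum_{k=1}^{n} g_2(z_k) \xrightarrow{a.s.} E[g_2(z_1)] < \infty. \tag{2.13} \]

It follows that $\frac{1}{n^2} \sum_{k=1}^{n} g_2(z_k) \xrightarrow{a.s.} 0$. By the theory of U-statistics [Ser80, Chapter 5],

\[ U_2 = \frac{2}{n(n-1)} \sum_{k<l} h(z_k, z_l) \tag{2.14} \]

is a second-order U-statistic. By the strong law of large numbers for general U-statistics, $U_2 \xrightarrow{a.s.} E[h(z_1, z_2)] = M^d$ ($n \to \infty$). Since $\frac{2}{n^2} \sum_{k<l} h(z_k, z_l) = \frac{n-1}{n} U_2$, it follows from (2.10) that, as $n \to \infty$, $[D(P)]^2 \xrightarrow{a.s.} M^d - 2M^d + M^d = 0$.

Corollary 2.2. Under the null hypothesis (2.6), it is true that

\[ D_s(P)^2 \xrightarrow{a.s.} 0, \quad D_c(P)^2 \xrightarrow{a.s.} 0, \quad D_{*}(P)^2 \xrightarrow{a.s.} 0, \]

as $n \to \infty$, where $D_s(P)^2$, $D_c(P)^2$ and $D_{*}(P)^2$ are given by (2.3)-(2.5), respectively.

The following theorem and corollaries consider pieces of the discrepancy defined in (2.1). These pieces are then re-combined to give new statistics for testing multivariate uniformity.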
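The closed forms (2.3)-(2.5) translate almost directly into array code. The following is a minimal sketch, assuming NumPy and an `(n, d)` array `Z` of points in the unit cube; the function names are illustrative and not from the paper. It can be used to observe the decay toward zero stated in Corollary 2.2.

```python
import numpy as np

def symmetric_sq(Z):
    """Squared symmetric discrepancy, formula (2.3)."""
    n, d = Z.shape
    single = np.prod(1 + 2 * Z - 2 * Z**2, axis=1).sum()
    diff = np.abs(Z[:, None, :] - Z[None, :, :])          # |z_kj - z_lj|
    double = np.prod(1 - diff, axis=2).sum()
    return (4 / 3) ** d - 2 / n * single + 2**d / n**2 * double

def centered_sq(Z):
    """Squared centered discrepancy, formula (2.4)."""
    n, d = Z.shape
    a = np.abs(Z - 0.5)                                    # |z_kj - 1/2|
    single = np.prod(1 + 0.5 * a - 0.5 * a**2, axis=1).sum()
    diff = np.abs(Z[:, None, :] - Z[None, :, :])
    kernel = 1 + 0.5 * a[:, None, :] + 0.5 * a[None, :, :] - 0.5 * diff
    double = np.prod(kernel, axis=2).sum()
    return (13 / 12) ** d - 2 / n * single + double / n**2

def star_sq(Z):
    """Squared L2-star discrepancy, formula (2.5)."""
    n, d = Z.shape
    single = np.prod((3 - Z**2) / 2, axis=1).sum()
    pair_max = np.maximum(Z[:, None, :], Z[None, :, :])    # max(z_kj, z_lj)
    double = np.prod(2 - pair_max, axis=2).sum()
    return (4 / 3) ** d - 2 / n * single + double / n**2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (20, 100, 500):
        Z = rng.random((n, 2))
        print(n, symmetric_sq(Z), centered_sq(Z), star_sq(Z))
```

The pairwise terms are built with broadcasting, which costs O(n² d) memory; for large n a blocked loop over rows would be the more economical choice.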

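Theorem 2.3. Let $U_1$ and $U_2$ be given by (2.12) and (2.14), respectively. Then, under the null hypothesis (2.6),

\[ \sqrt{n} \begin{pmatrix} U_1 - M^d \\ U_2 - M^d \end{pmatrix} \xrightarrow{D} N_2(0, \Sigma) \quad (n \to \infty), \]

where "$\xrightarrow{D}$" means "converges in distribution", and $\Sigma$ is a singular covariance matrix:

\[ \Sigma = \zeta_1 \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \tag{2.15} \]

where $\zeta_1 = (M^2 + \beta^4 c_2)^d - M^{2d}$ and $c_2 = \int_0^1 \mu(x)^2 \, dx$.

Proof. Since the random variable $U_1$ given by (2.12) is a first-order U-statistic and the random variable $U_2$ given by (2.14) is a second-order U-statistic, by the central limit theorem for U-statistics, we have

\[ \sqrt{n} \, (U_1 - E U_1) \Big/ \sqrt{\mathrm{var}(\sqrt{n} \, U_1)} \xrightarrow{D} N(0, 1), \tag{2.16} \]

where $E U_1 = M^d$ by the proof of Theorem 2.1. It is easy to calculate $\mathrm{var}(\sqrt{n} \, U_1) = (M^2 + \beta^4 c_2)^d - M^{2d}$. By Lemma A of [Ser80, p. 183], we obtain the variance of $U_2$:

\[ \mathrm{var}(U_2) = \frac{4(n-2)}{n(n-1)} \zeta_1 + \frac{2}{n(n-1)} \zeta_2, \tag{2.17} \]

where $\zeta_1$ is given by (2.15) and $\zeta_2 = [M^2 + \beta^4 (2 c_2 + 1/90)]^d - M^{2d}$. By the theorem in [Ser80, p. 189], $U_2$ can be written as

\[ U_2(n) = \hat{U}_2(n) + R_n, \tag{2.18} \]

where $\hat{U}_2(n)$ is a random variable that can be written as a sum of i.i.d. random variables as follows:

\[ \hat{U}_2(n) - E U_2(n) = \frac{2}{n} \sum_{i=1}^{n} h_1(z_i), \tag{2.19} \]

for some function $h_1(\cdot)$ [Ser80, p. 188, equation (2)] and $E U_2(n) = M^d$. Formula (2.19) can be written as

\[ \hat{U}_2(n) = \frac{1}{n} \sum_{i=1}^{n} h_2(z_i), \tag{2.20} \]

where $h_2(\cdot) = 2 h_1(\cdot) + M^d$. In (2.18), $R_n$ is a residual term, $R_n = o_p\left( n^{-1} (\log n)^{\delta} \right)$ ($n \to \infty$), which implies that $\sqrt{n} \, R_n$ tends to zero in probability, where $\delta > 1/v$ ($v > 0$) if $E | h(z_1, z_2) |^{v} < \infty$, with $h(\cdot, \cdot)$ defined by (2.9). Under the null hypothesis (2.6), $E | h(z_1, z_2) |^{v} < \infty$ for any $v > 0$. Combining (2.18) and (2.20), we can write

\[ U_2 = \frac{1}{n} \sum_{i=1}^{n} h_2(z_i) + R_n. \]

Then

\[ \sqrt{n} \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \sqrt{n} \begin{pmatrix} \frac{1}{n} \sum_{k=1}^{n} g_1(z_k) \\ \frac{1}{n} \sum_{k=1}^{n} h_2(z_k) + R_n \end{pmatrix} = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \begin{pmatrix} g_1(z_k) \\ h_2(z_k) \end{pmatrix} + \begin{pmatrix} 0 \\ \sqrt{n} \, R_n \end{pmatrix}. \tag{2.21} \]

Under the null hypothesis (2.6), by the multivariate central limit theorem, the centered vector $\sqrt{n} \, (U_1 - M^d, \ U_2 - M^d)^\top$ converges in distribution to a 2-dimensional normal distribution, because of the independence of the $z_k$ ($k = 1, \dots, n$) and the fact that

\[ \begin{pmatrix} 0 \\ \sqrt{n} \, R_n \end{pmatrix} \xrightarrow{P} 0 \quad (n \to \infty), \]

where "$\xrightarrow{P}$" means "converges in probability". It is easy to obtain the covariance between $\sqrt{n} \, U_1$ and $\sqrt{n} \, U_2$:

\[ \mathrm{cov}(\sqrt{n} \, U_1, \sqrt{n} \, U_2) = 2 \left[ (M^2 + \beta^4 c_2)^d - M^{2d} \right] = 2 \zeta_1, \]

where $\zeta_1$ is the same as in (2.15). Then we have

\[ \sqrt{n} \begin{pmatrix} U_1 - M^d \\ U_2 - M^d \end{pmatrix} \xrightarrow{D} N_2(0, \Sigma), \]

with $\Sigma$ given by (2.15). This completes the proof.

Theorem 2.3 implies that the asymptotic distribution of $(\sqrt{n} \, U_1, \sqrt{n} \, U_2)$ is a singular (degenerate) normal distribution. This property results in the following corollary, which shows why the discrepancy itself is not necessarily a suitable goodness-of-fit statistic.

Corollary 2.4. Under the null hypothesis (2.6), the generalized $L_2$-type discrepancy $[D(P)]^2$ given by (2.1) has a further asymptotic property:

\[ \sqrt{n} \, [D(P)]^2 \xrightarrow{P} 0 \quad (n \to \infty). \]

Proof. By the proof of Theorem 2.1, $[D(P)]^2$ can be written as

\[ [D(P)]^2 = M^d - 2 U_1 + \frac{n-1}{n} U_2 + \frac{1}{n^2} \sum_{k=1}^{n} g_2(z_k), \]

where $g_2(\cdot)$ is given by (2.11). By (2.13) and (2.14), we can write

\[ \sqrt{n} \cdot \frac{1}{n^2} \sum_{k=1}^{n} g_2(z_k) = O_p\left( \frac{1}{\sqrt{n}} \right), \]

where the notation $f(n) = O_p(1/\sqrt{n})$ means $\sqrt{n} \, f(n) \xrightarrow{P}$ a constant. Then we have

\[ \sqrt{n} \, [D(P)]^2 = -2 \sqrt{n} \, (U_1 - M^d) + \sqrt{n} \, (U_2 - M^d) - \frac{1}{\sqrt{n}} U_2 + O_p\left( \frac{1}{\sqrt{n}} \right). \tag{2.22} \]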

Since $U_2 \xrightarrow{a.s.} M^d < \infty$ ($n \to \infty$) by the proof of Theorem 2.1, we have $(1/\sqrt{n}) \, U_2 = O_p(1/\sqrt{n})$. Then we can write (2.22) as

\[ \sqrt{n} \, [D(P)]^2 = -2 \sqrt{n} \, (U_1 - M^d) + \sqrt{n} \, (U_2 - M^d) + O_p\left( \frac{1}{\sqrt{n}} \right) = (-2, 1) \, \sqrt{n} \begin{pmatrix} U_1 - M^d \\ U_2 - M^d \end{pmatrix} + O_p\left( \frac{1}{\sqrt{n}} \right). \tag{2.23} \]

By Theorem 2.3, we obtain $\sqrt{n} \, [D(P)]^2 \xrightarrow{D} N(0, a^\top \Sigma a)$, where $a = (-2, 1)^\top$ and $\Sigma$ is given by (2.15). It turns out that $a^\top \Sigma a = 0$. Therefore, $\sqrt{n} \, [D(P)]^2 \xrightarrow{D} 0$. This implies $\sqrt{n} \, [D(P)]^2 \xrightarrow{P} 0$.

A better goodness-of-fit statistic than the discrepancy can be obtained by a linear combination of $U_1$ and $U_2$ whose limit is not a degenerate normal distribution. This is the idea behind the statistic $A_n$ defined below.

Corollary 2.5. Under the null hypothesis (2.6), the statistic

\[ A_n = \sqrt{n} \, \left[ (U_1 - M^d) + 2 (U_2 - M^d) \right] \big/ \left( 5 \sqrt{\zeta_1} \right) \xrightarrow{D} N(0, 1) \quad (n \to \infty), \tag{2.24} \]

where $U_1$ and $U_2$ are defined by (2.12) and (2.14), respectively, and $\zeta_1$ is given by (2.15).

Proof. It was noted in the proof of Corollary 2.4 that $a = (-2, 1)^\top$ is an eigenvector associated with the zero eigenvalue of $\Sigma$ given by (2.15). On the other hand, $b = (1, 2)^\top$ ($a^\top b = 0$) is an eigenvector associated with the eigenvalue $5 \zeta_1$ of $\Sigma$. By Theorem 2.3, $E(A_n) = 0$ and the variance $\mathrm{var}(A_n) = b^\top \Sigma_n b / (25 \zeta_1) \to b^\top \Sigma b / (25 \zeta_1) = 1$ ($n \to \infty$) under the null hypothesis (2.6), where $\Sigma_n$ is the covariance matrix of $(\sqrt{n} \, U_1, \sqrt{n} \, U_2)$, which turns out to be

\[ \Sigma_n = \begin{pmatrix} \zeta_1 & 2 \zeta_1 \\ 2 \zeta_1 & \frac{4(n-2)}{n-1} \zeta_1 + \frac{2}{n-1} \zeta_2 \end{pmatrix}, \tag{2.25} \]

and $\zeta_2$ is the same as in (2.17). Assertion (2.24) holds as a result of Theorem 2.3.

The statistic $A_n$ can be employed for testing hypothesis (2.6). Larger values of $| A_n |$ imply rejection of the null hypothesis (2.6).

It can be verified that $\Sigma_n$ given by (2.25) tends to singularity very slowly. For example, under the symmetric discrepancy, when $n = 1000$, the condition number of $\Sigma_n$ is 1467 ($d = 2$), 1196 ($d = 5$) and 841 ($d = 10$). Therefore, for finite sample size (e.g., $n \le 1000$), we recommend using the normal distribution $N_2(0, \Sigma_n)$ as the approximate joint distribution of $(\sqrt{n} \, U_1, \sqrt{n} \, U_2)$.
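The condition numbers quoted for $\Sigma_n$ can be checked numerically from (2.15), (2.17) and (2.25). The sketch below assumes NumPy and the symmetric-discrepancy parameters $M = 4/3$, $\beta = 2$; the value $c_2 = 1/720$ is not stated in the text at this point and is derived here as an assumption from $c_2 = \int_0^1 \mu(x)^2 dx$ with $\mu = -B_2/2$ and $\int_0^1 B_2(x)^2 dx = 1/180$. The function name `sigma_n` is illustrative.

```python
import numpy as np

def sigma_n(n, d, M=4 / 3, beta=2.0, c2=1 / 720):
    """Finite-sample covariance matrix (2.25) of (sqrt(n) U1, sqrt(n) U2).

    Defaults are the symmetric-discrepancy parameters; c2 = 1/720 is a
    derived value (assumption), not quoted from the paper.
    """
    M2d = M ** (2 * d)
    zeta1 = (M**2 + beta**4 * c2) ** d - M2d                   # (2.15)
    zeta2 = (M**2 + beta**4 * (2 * c2 + 1 / 90)) ** d - M2d    # (2.17)
    var22 = 4 * (n - 2) / (n - 1) * zeta1 + 2 / (n - 1) * zeta2
    return np.array([[zeta1, 2 * zeta1],
                     [2 * zeta1, var22]])

if __name__ == "__main__":
    for d in (2, 5, 10):
        print(d, np.linalg.cond(sigma_n(1000, d)))
```

Because the determinant of $\Sigma_n$ is $2 \zeta_1 (\zeta_2 - 2 \zeta_1)/(n-1)$, the condition number grows only linearly in $n$, which is one way to read the remark that the matrix approaches singularity slowly.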
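Based on this idea, we propose the following $\chi^2$-type statistic for testing the null hypothesis (2.6):

\[ T_n = n \, \left[ (U_1 - M^d), \ (U_2 - M^d) \right] \Sigma_n^{-1} \left[ (U_1 - M^d), \ (U_2 - M^d) \right]^\top. \tag{2.26} \]

The approximate null distribution of $T_n$ can be taken as the chi-squared distribution $\chi^2(2)$. Larger values of $T_n$ imply rejection of the null hypothesis (2.6).

The convergence rate of $\sqrt{n} \, [D(P)]^2 \xrightarrow{P} 0$ in Corollary 2.4 is also very slow. For example, for the symmetric discrepancy, we can write

\[ \sqrt{n} \, D_s(P)^2 = \left( -2, \ \frac{n-1}{n} \right) \left[ \sqrt{n} \, (U_1 - E U_1), \ \sqrt{n} \, (U_2 - E U_2) \right]^\top + R_n. \tag{2.27} \]

The residual term $R_n = [2^d - (4/3)^d] / \sqrt{n}$ in (2.27) tends to zero very slowly. When $n = 10{,}000$, this formula gives $R_n \approx 0.022$ ($d = 2$), $0.278$ ($d = 5$) and $10.06$ ($d = 10$).

For the three special cases of the generalized $L_2$-type discrepancy, we can easily obtain the parameters needed to define the statistics $A_n$ and $T_n$ in (2.24) and (2.26):

1) the symmetric discrepancy:

\[ U_1 = \frac{1}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( 1 + 2 z_{kj} - 2 z_{kj}^2 \right), \qquad U_2 = \frac{2^{d+1}}{n(n-1)} \sum_{k<l} \prod_{j=1}^{d} \left( 1 - | z_{kj} - z_{lj} | \right), \]

$M = 4/3$, $\zeta_1 = (9/5)^d - (16/9)^d$ and $\zeta_2 = 2^d - (16/9)^d$;

2) the centered discrepancy:

\[ U_1 = \frac{1}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( 1 + \tfrac{1}{2} \left| z_{kj} - \tfrac{1}{2} \right| - \tfrac{1}{2} \left| z_{kj} - \tfrac{1}{2} \right|^2 \right), \qquad U_2 = \frac{2}{n(n-1)} \sum_{k<l} \prod_{j=1}^{d} \left[ 1 + \tfrac{1}{2} \left| z_{kj} - \tfrac{1}{2} \right| + \tfrac{1}{2} \left| z_{lj} - \tfrac{1}{2} \right| - \tfrac{1}{2} | z_{kj} - z_{lj} | \right], \]

$M = 13/12$, $\zeta_1 = (47/40)^d - (13/12)^{2d}$ and $\zeta_2 = (57/48)^d - (13/12)^{2d}$;

3) the star discrepancy:

\[ U_1 = \frac{1}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( \frac{3 - z_{kj}^2}{2} \right), \qquad U_2 = \frac{2}{n(n-1)} \sum_{k<l} \prod_{j=1}^{d} \left[ 2 - \max(z_{kj}, z_{lj}) \right], \]

$M = 4/3$, $\zeta_1 = (9/5)^d - (16/9)^d$ and $\zeta_2 = (11/6)^d - (16/9)^d$.

3. Monte Carlo study and applications

The exact finite-sample distributions of the statistics $A_n$ (given by (2.24)) and $T_n$ (given by (2.26)) under the null hypothesis (2.6) are not readily obtained. However, the effectiveness of the approximation of their finite-sample distributions by their asymptotic distributions can be studied by Monte Carlo simulation. The approximation of the distribution of $T_n$ by $\chi^2(2)$ is influenced not only by the convergence of $\sqrt{n} \, U_1$ and $\sqrt{n} \, U_2$ but by the convergence of $\Sigma_n$ as well, while the convergence of $A_n$ to the normal $N(0, 1)$ is influenced only by the convergence of $\sqrt{n} \, U_1$ and $\sqrt{n} \, U_2$. Therefore, it is expected that the approximation of the distribution of $A_n$ by $N(0, 1)$ is better than that of $T_n$ by $\chi^2(2)$.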
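Putting (2.24), (2.25) and (2.26) together with the symmetric-discrepancy parameters gives a complete test. The following is a minimal sketch under stated assumptions: NumPy is used, `Z` is an `(n, d)` array of points in the unit cube, and the function name `symmetric_statistics` is illustrative rather than anything defined in the paper.

```python
import numpy as np

def symmetric_statistics(Z):
    """Return (A_n, T_n) of (2.24) and (2.26) for the symmetric discrepancy."""
    n, d = Z.shape
    Md = (4 / 3) ** d
    zeta1 = (9 / 5) ** d - (16 / 9) ** d
    zeta2 = 2.0**d - (16 / 9) ** d

    # U1: average of the one-point products.
    U1 = np.prod(1 + 2 * Z - 2 * Z**2, axis=1).mean()

    # U2: sum over pairs k < l, taken from the full kernel matrix.
    K = np.prod(1 - np.abs(Z[:, None, :] - Z[None, :, :]), axis=2)
    pair_sum = (K.sum() - np.trace(K)) / 2
    U2 = 2 ** (d + 1) / (n * (n - 1)) * pair_sum

    v = np.array([U1 - Md, U2 - Md])
    A = np.sqrt(n) * (v[0] + 2 * v[1]) / (5 * np.sqrt(zeta1))

    Sigma_n = np.array([
        [zeta1, 2 * zeta1],
        [2 * zeta1, 4 * (n - 2) / (n - 1) * zeta1 + 2 / (n - 1) * zeta2],
    ])
    T = n * (v @ np.linalg.solve(Sigma_n, v))
    return A, T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, T = symmetric_statistics(rng.random((100, 5)))
    # Two-sided 5% normal critical value and upper 5% point of chi-squared(2).
    print("A_n =", A, "reject:", abs(A) > 1.96)
    print("T_n =", T, "reject:", T > 5.991)
```

Solving the 2 by 2 system with `np.linalg.solve` avoids forming $\Sigma_n^{-1}$ explicitly, which is a reasonable precaution given how ill-conditioned $\Sigma_n$ becomes for large $n$.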

3.1. Numerical comparisons between the finite-sample distributions of $A_n$ and $N(0, 1)$, and $T_n$ and $\chi^2(2)$. In the simulation, for each sample size $n$ ($n = 25, 50, 100, 200$), we generate 10,000 uniform samples $Z = (z_1, \dots, z_n)^\top$. The elements of $Z$ are i.i.d. $U(0, 1)$-variates. Then we obtain 10,000 values of $A_n$ and $T_n$ under the three (symmetric, centered and star) discrepancies, respectively. These values are sorted and the ordered samples of $A_n$ and $T_n$ are obtained. The empirical quantiles of $A_n$ and $T_n$, respectively, are calculated from the 10,000 order statistics of $A_n$ and $T_n$. Tables 1, 2, and 3 list the numerical comparisons between some selected $100(1 - \alpha)$-percentiles ($\alpha$ = 1%, 5% and 10%) of $A_n$ and $T_n$ and the $100(1 - \alpha)$-percentiles of $N(0, 1)$ and $\chi^2(2)$. Since a test using $A_n$ is two-sided, and a test using $T_n$ is one-sided, we list the upper (U) and lower (L) percentiles of both $A_n$ and $N(0, 1)$, and only the upper percentiles of both $T_n$ and $\chi^2(2)$. The closer the percentiles of $A_n$ and the percentiles of $N(0, 1)$ are, the better the approximation we obtain by using $N(0, 1)$ as the approximate finite-sample distribution of $A_n$. The same is true for the numerical comparison between the statistic $T_n$ and $\chi^2(2)$.

Several empirical conclusions can be summarized from the numerical results in Tables 1-3:

a) the standard normal $N(0, 1)$ approximates the finite-sample distribution of $A_n$ better than the $\chi^2(2)$ approximates the finite-sample distribution of $T_n$;

b) the approximation of the finite-sample distribution of $A_n$ by $N(0, 1)$, and the approximation of the finite-sample distribution of $T_n$ by $\chi^2(2)$, appear to be the best for the symmetric discrepancy;

c) the approximation of the percentiles of the finite-sample distribution of $A_n$ and the approximation of the percentiles of the finite-sample distribution of $T_n$ for $\alpha$ = 5% and $\alpha$ = 10% are much better than for $\alpha$ = 1%; and

d) the approximation for the case $n = 25$ is almost as good as that for the larger sample sizes.

3.2. Type I error rates.
Based on the numerical comparisons shown in Tables 1-3, we perform a simulation of the empirical type I error rates of the two statistics $A_n$ and $T_n$ under the three discrepancies. For convenience, we choose the null distribution of the random vectors $z_i$ to be composed of i.i.d. $U(0, 1)$ marginals. In the simulation, we generate 2,000 sets of $\{z_1, \dots, z_n\}$ for each $n$ with the components of $z_i$ consisting of i.i.d. $U(0, 1)$ variates. Tables 4, 5, and 6 summarize the simulation results on the type I error rates of $A_n$ and $T_n$ under the three discrepancies, where the percentiles of $A_n$ are chosen as those of $N(0, 1)$ and the percentiles of $T_n$ as those of $\chi^2(2)$. The results show that for $\alpha$ = 5% and $\alpha$ = 10%, the type I error rates are better controlled for the statistics $A_n$ and $T_n$, while the type I error rates for $\alpha$ = 1% tend to be large when using the percentiles of $N(0, 1)$ for $A_n$ and the percentiles of $\chi^2(2)$ for $T_n$ in most cases.

3.3. Power study. Now we turn to study the power of $A_n$ and $T_n$ in testing hypothesis (2.6). The alternative distributions are chosen to be meta-type uniform distributions. The theoretical background of some meta-type multivariate distributions is given in [KS91] and [FFK97]. The idea for constructing the alternative distributions is as follows. Let the random vector $x = (X_1, \dots, X_d)^\top$ have a d.f. $F(x)$ ($x \in R^d$). Denote by $F_i(x_i)$ the marginal d.f. of $X_i$ ($i = 1, \dots, d$), which is assumed to be continuous. Define the random vector $u = (U_1, \dots, U_d)^\top$ by

\[ U_i = F_i(X_i), \quad i = 1, \dots, d. \tag{3.1} \]
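The construction (3.1) can be illustrated with a case where the marginal d.f. is available in closed form. The sketch below assumes, purely for illustration, that $x$ is multivariate normal with zero mean, unit variances and a common correlation `rho` (the default 0.5 is an illustrative choice), so that each $F_i$ is the standard normal d.f. $\Phi$. It uses NumPy and `math.erf`; the function name is hypothetical.

```python
import math
import numpy as np

def meta_normal_uniform(n, d, rho=0.5, seed=None):
    """Draw n points u = (Phi(X_1), ..., Phi(X_d)) as in (3.1).

    X is N_d(0, Sigma) with unit variances and common correlation rho
    (an illustrative choice of the joint distribution of x).
    """
    rng = np.random.default_rng(seed)
    Sigma = np.full((d, d), rho) + (1 - rho) * np.eye(d)
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    # Standard normal d.f. written with the error function.
    Phi = np.vectorize(lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2))))
    return Phi(X)

if __name__ == "__main__":
    U = meta_normal_uniform(20000, 3, seed=0)
    print("marginal means:", U.mean(axis=0))          # each close to 1/2
    print("corr(U_1, U_2):", np.corrcoef(U[:, 0], U[:, 1])[0, 1])
```

Each coordinate is marginally uniform on $[0, 1]$, yet the coordinates remain positively dependent, so the points are not uniformly distributed in the cube.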

Table 1. Comparisons between the empirical percentiles of $A_n$, $T_n$ and the percentiles of $N(0, 1)$ and $\chi^2(2)$ for the symmetric discrepancy (U = upper, L = lower). Columns: $\alpha$ = 1%, 5%, 10% for each of $A_n$ and $T_n$; rows: reference percentiles, then $d$ = 2, 5, 10 for each simulated sample size. [Numerical entries not recoverable from the source.]

Table 2. Comparisons between the empirical percentiles of $A_n$, $T_n$ and the percentiles of $N(0, 1)$ and $\chi^2(2)$ for the centered discrepancy (U = upper, L = lower). Same layout as Table 1. [Numerical entries not recoverable from the source.]

Table 3. Comparisons between the empirical percentiles of $A_n$, $T_n$ and the percentiles of $N(0, 1)$ and $\chi^2(2)$ for the star discrepancy (U = upper, L = lower). Same layout as Table 1. [Numerical entries not recoverable from the source.]

Table 4. Empirical type I error rates of $A_n$ and $T_n$ under the symmetric discrepancy. Columns: $\alpha$ = 1%, 5%, 10% for each statistic; rows: $d$ = 2, 5, 10 for each simulated sample size. [Numerical entries not recoverable from the source.]

Table 5. Empirical type I error rates of $A_n$ and $T_n$ under the centered discrepancy. Same layout as Table 4. [Numerical entries not recoverable from the source.]

Table 6. Empirical type I error rates of $A_n$ and $T_n$ under the star discrepancy. Same layout as Table 4. [Numerical entries not recoverable from the source.]

It is obvious that each $U_i$ in (3.1) has a uniform distribution $U(0, 1)$, but the joint distribution of $u = (U_1, \dots, U_d)^\top$ may be quite different from the uniform distribution in $C^d$. If the joint d.f. $F(x)$ of $x = (X_1, \dots, X_d)^\top$ possesses a density function $f(x) = f(x_1, \dots, x_d)$, where $f_i(x_i)$ denotes the marginal density function of $X_i$, then the joint density function of $u = (U_1, \dots, U_d)^\top$ can be obtained by a direct calculation:

\[ p(u_1, \dots, u_d) = f\left( F_1^{-1}(u_1), \dots, F_d^{-1}(u_d) \right) \Big/ \prod_{i=1}^{d} f_i\left( F_i^{-1}(u_i) \right), \tag{3.2} \]

where $(u_1, \dots, u_d)^\top \in C^d$ and $F_i^{-1}(\cdot)$ denotes the inverse function of $F_i(\cdot)$. It is clear that the complexity of (3.2) is determined by the joint distribution of $x = (X_1, \dots, X_d)^\top$ in (3.1). In particular, if the random variables $X_i$ in (3.1) are independent, then the $U_i$'s given by (3.1) will be i.i.d. $U(0, 1)$ variates. In this case, the random vector $u = (U_1, \dots, U_d)^\top$ is uniformly distributed in $C^d$. This can be seen from (3.2).

In the simulation, we choose the random vector $x = (X_1, \dots, X_d)^\top$ in (3.1) to have a joint distribution belonging to the subclasses of elliptical distributions [FKN90, Chapter 3], where the parameters $\mu$ and $\Sigma$ are chosen as $\mu = 0$ and $\Sigma = (\sigma_{ij})$, where $\sigma_{ii} = 1$ and $\sigma_{ij} = \sigma_{ji} = \rho = 0.5$ for $1 \le i \ne j \le d$. Apart from the normal distribution $N_d(\mu, \Sigma)$, we give general expressions for the density functions of the selected subclasses of elliptical distributions below:

1) the multivariate t-distribution $x \sim Mt_d(m, \mu, \Sigma)$, with density function given by

\[ C \, | \Sigma |^{-1/2} \left( 1 + m^{-1} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)^{-(d+m)/2}, \quad m > 0, \ x \in R^d, \]

where $C$ is a normalizing constant (the following $C$'s have a similar meaning, but their values may be different);

2) the Kotz type distribution, with density function given by

\[ C \, | \Sigma |^{-1/2} \left[ (x - \mu)^\top \Sigma^{-1} (x - \mu) \right]^{N-1} \exp\left\{ -r \left[ (x - \mu)^\top \Sigma^{-1} (x - \mu) \right]^{s} \right\}, \quad r, s > 0, \ 2N + d > 2, \ x \in R^d; \]

3) the Pearson type VII distribution, with density function given by

\[ C \, | \Sigma |^{-1/2} \left( 1 + m^{-1} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)^{-N}, \quad N > d/2, \ m > 0, \ x \in R^d; \]

4) the Pearson type II distribution, with density function given by

\[ C \, | \Sigma |^{-1/2} \left( 1 - (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)^{m}, \quad m > -1, \ (x - \mu)^\top \Sigma^{-1} (x - \mu) < 1; \]

5) the multivariate Cauchy distribution $MC_d(\mu, \Sigma)$, with density function given by

\[ \frac{\Gamma((d+1)/2)}{\pi^{(d+1)/2}} \left( 1 + (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)^{-(d+1)/2}, \quad x \in R^d. \]

When the random vector $x = (X_1, \dots, X_d)^\top$ is generated by one of the elliptical distributions above, the random vector $u = (U_1, \dots, U_d)^\top$ given by (3.1) is considered to have a meta-type uniform distribution, denoted as follows:

0) $u \sim$ MNU when $x \sim N_d(0, \Sigma)$;
1) $u \sim$ MTU when $x$ has a multivariate t-distribution with $m = 5$;
2) $u \sim$ MKU when $x$ has a Kotz type distribution with $N = 2$, $r = 1$ and $s = 0.5$;
3) $u \sim$ MPVIIU when $x$ has a Pearson type VII distribution with $N = 10$ and $m = 2$;
4) $u \sim$ MPIIU when $x$ has a Pearson type II distribution with $m = 3/2$;
5) $u \sim$ MCU when $x$ has a Cauchy distribution.

The power of the multivariate test for uniformity is the probability that the statistical test correctly identifies a sample coming from one of the above distributions as being non-uniform. Table 7 summarizes the simulation results on the power of $A_n$ and $T_n$, where the simulation is done with 2,000 replications, the critical points of $A_n$ and $T_n$ are chosen as those of $N(0, 1)$ and $\chi^2(2)$, respectively, and the empirical samples from the elliptical distributions are generated by the TFWW algorithm [Tas77], [FW94]. The results show that the two statistics $A_n$ and $T_n$ under the three discrepancies are powerful for testing uniformity in $C^d$ in most cases. The $\chi^2$-type statistic $T_n$ seems to be more powerful than the normal-type statistic $A_n$. Both $A_n$ and $T_n$ seem to be more powerful in the higher dimensional case ($d = 10$) than in the lower dimensional case ($d = 5$). It is also noticed that the two statistics $A_n$ and $T_n$ under the symmetric discrepancy seem to be more powerful than under the centered discrepancy and the star discrepancy in most cases.
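A power estimate of the kind described above can be sketched in a few lines for the MNU alternative. This is only an illustration under stated assumptions: it uses the symmetric discrepancy, draws the normal vectors with NumPy's `multivariate_normal` rather than the TFWW algorithm used in the paper, and fixes the 5% critical values 1.96 for $|A_n|$ and 5.991 for $T_n$. All function names are hypothetical, and the replication count is kept small for speed.

```python
import math
import numpy as np

def symmetric_A_T(Z):
    """A_n (2.24) and T_n (2.26) under the symmetric discrepancy."""
    n, d = Z.shape
    Md = (4 / 3) ** d
    zeta1 = (9 / 5) ** d - (16 / 9) ** d
    zeta2 = 2.0**d - (16 / 9) ** d
    U1 = np.prod(1 + 2 * Z - 2 * Z**2, axis=1).mean()
    K = np.prod(1 - np.abs(Z[:, None, :] - Z[None, :, :]), axis=2)
    U2 = 2 ** (d + 1) / (n * (n - 1)) * (K.sum() - np.trace(K)) / 2
    v = np.array([U1 - Md, U2 - Md])
    A = np.sqrt(n) * (v[0] + 2 * v[1]) / (5 * np.sqrt(zeta1))
    S = np.array([[zeta1, 2 * zeta1],
                  [2 * zeta1, 4 * (n - 2) / (n - 1) * zeta1 + 2 / (n - 1) * zeta2]])
    return A, n * (v @ np.linalg.solve(S, v))

def uniform_sampler(rng, n, d):
    """Null hypothesis: i.i.d. uniform points in the unit cube."""
    return rng.random((n, d))

def mnu_sampler(rng, n, d, rho=0.5):
    """MNU alternative: Phi applied to N_d(0, Sigma) with sigma_ij = rho."""
    Sigma = np.full((d, d), rho) + (1 - rho) * np.eye(d)
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    Phi = np.vectorize(lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2))))
    return Phi(X)

def rejection_rates(sampler, n, d, reps=200, seed=0):
    """Fraction of replications rejected at the 5% level by A_n and by T_n."""
    rng = np.random.default_rng(seed)
    rej_A = rej_T = 0
    for _ in range(reps):
        A, T = symmetric_A_T(sampler(rng, n, d))
        rej_A += int(abs(A) > 1.96)      # two-sided N(0, 1) test
        rej_T += int(T > 5.991)          # upper-tail chi-squared(2) test
    return rej_A / reps, rej_T / reps

if __name__ == "__main__":
    print("type I error (A_n, T_n):", rejection_rates(uniform_sampler, 50, 5))
    print("power vs MNU (A_n, T_n):", rejection_rates(mnu_sampler, 50, 5))
```

Running the same loop with the uniform sampler gives the empirical type I error rates, so a single routine covers both kinds of simulation reported in the tables.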
A power comparison between the two statistics $A_n$, $T_n$ and some existing statistics, such as the Kolmogorov-Smirnov type statistic (1.1), seems to be infeasible in high dimensions because of computational difficulties. Thus, we have not performed such comparisons.

Table 7. Empirical power of $A_n$ and $T_n$ for testing multidimensional uniformity against the meta-type uniform distributions ($\alpha$ = 5%). Columns: MNU, MTU, MKU, MPVIIU, MPIIU, MCU; rows: $d$ = 5 and $d$ = 10, each with $A_n$ and $T_n$ under the symmetric, centered and star discrepancies. [Numerical entries not recoverable from the source.]

3.4. Applications. By using the Rosenblatt transformation [Ros52], we can transfer a test for the simple hypothesis

\[ H_0 : x_1, \dots, x_n \ \text{have a known d.f.} \ F(x), \tag{3.3} \]

to a test for the uniformity of random points in $C^d$, and then apply the two statistics $A_n$ and $T_n$ to test for uniformity of the transformed variates in $C^d$. Denote by

\[ \begin{aligned} F_1(x_1) &= \text{the marginal d.f. of } X_1, \\ F_{2|1}(x_2 \mid x_1) &= \text{the conditional d.f. of } X_2 \text{ given } X_1 = x_1, \\ &\ \ \vdots \\ F_{k+1|(1,\dots,k)}(x_{k+1} \mid x_1, \dots, x_k) &= \text{the conditional d.f. of } X_{k+1} \text{ given } (X_1, \dots, X_k) = (x_1, \dots, x_k), \end{aligned} \tag{3.4} \]

where $k = 1, \dots, d - 1$. Perform the following series of Rosenblatt transformations on each observation $x_i = (x_{i1}, \dots, x_{id})^\top$ ($i = 1, \dots, n$):

\[ \begin{aligned} U_{i1} &= F_1(x_{i1}), \\ U_{i2} &= F_{2|1}(x_{i2} \mid x_{i1}), \\ &\ \ \vdots \\ U_{i,k+1} &= F_{k+1|(1,\dots,k)}(x_{i,k+1} \mid x_{i1}, \dots, x_{ik}), \end{aligned} \tag{3.5} \]

where $i = 1, \dots, n$ and $k = 1, \dots, d - 1$. If the null hypothesis (3.3) is true, the random vectors $u_i = (U_{i1}, \dots, U_{id})^\top$ ($i = 1, \dots, n$) are i.i.d. and the components of $u_i$ are i.i.d. with a uniform distribution $U[0, 1]$.

The more frequent cases in applications are those hypotheses which involve unknown parameters in the null distributions. Justel, Peña and Zamar [JPZ97] proposed a multivariate version of the Kolmogorov-Smirnov type statistic (1.1) for testing the simple hypothesis (3.3). Their statistics are difficult to compute for large dimensions ($d \ge 3$) and always require estimating unknown parameters in the null distribution. The two statistics $A_n$ and $T_n$ developed in this paper are easy to compute in arbitrary dimension, and estimation of unknown parameters can be avoided when the null distribution belongs to the class of the multivariate normal distributions, the spherically symmetric distributions, the $l_1$-norm symmetric distributions [FKN90, Chapter 5], the $l_p$-norm symmetric distributions [YM95], or the $L_p$-norm spherical distributions [GS97].

References

[AD54] T. W. Anderson and D. A. Darling, A test of goodness-of-fit, J. Amer. Statist. Assoc. 49 (1954). MR 16:1039h
[DS86] R. B. D'Agostino and M. A. Stephens (eds.), Goodness-of-fit techniques, Marcel Dekker, Inc., New York and Basel. MR 88c:62075
[FFK97] H. B. Fang, K. T. Fang, and S. Kotz, The meta-elliptical distributions with given marginals, Tech. Report MATH-165, Hong Kong Baptist University.
[FKN90] K. T. Fang, S. Kotz, and K. W. Ng, Symmetric multivariate and related distributions, Chapman and Hall, London and New York. MR 91i:62070
[FW94] K. T. Fang and Y. Wang, Number-theoretic methods in statistics, Chapman and Hall, London. MR 95g:65189
[GS97] A. K. Gupta and D. Song, $L_p$-norm spherical distributions, J. Statist. Plann. Inference 60 (1997). MR 98h:62092
[Hic98a] F. J. Hickernell, A generalized discrepancy and quadrature error bound, Math. Comp. 67 (1998). MR 98c:65032
[Hic98b] F. J. Hickernell, Lattice rules: How well do they measure up?, Random and Quasi-Random Point Sets (P. Hellekalek and G. Larcher, eds.), Lecture Notes in Statistics, vol. 138, Springer-Verlag, New York, 1998. CMP 99:06
[Hic99] F. J. Hickernell, Goodness-of-fit statistics, discrepancies and robust designs, Statist. Probab. Lett. 44 (1999). CMP 99:17
[HW81] L. G. Hua and Y. Wang, Applications of number theory to numerical analysis, Springer-Verlag, Berlin, and Science Press, Beijing. MR 83g:10034
[JPZ97] A. Justel, D. Peña, and R. Zamar, A multivariate Kolmogorov-Smirnov test of goodness of fit, Statist. Probab. Lett. 35 (1997). MR 98k:62087
[KS91] S. Kotz and J. P. Seeger, A new approach to dependence in multivariate distributions, Advances in Probability Distributions with Given Marginals (S. Kotz, G. Dall'Aglio and G. Salinetti, eds.), Kluwer, Dordrecht, Netherlands, 1991. MR 95k:62162
[MQ79] F. L. Miller Jr. and C. P. Quesenberry, Power studies of tests for uniformity II, Comm. Statist. Simulation Comput. B8(3) (1979).
[Ney37] J. Neyman, Smooth test for goodness of fit, J. Amer. Statist. Assoc. 20 (1937).
[Nie92] H. Niederreiter, Random number generation and quasi-Monte Carlo methods, CBMS-NSF Regional Conf. Ser. Appl. Math., vol. 63, SIAM, Philadelphia. MR 93h:65008
[Pea39] E. S. Pearson, The probability transformation for testing goodness of fit and combining independent tests of significance, Biometrika 30 (1939).
[QM77] C. P. Quesenberry and F. L. Miller Jr., Power studies of some tests for uniformity, J. Statist. Comput. Simulation 5 (1977).
[Ros52] M. Rosenblatt, Remarks on a multivariate transformation, Ann. Math. Statist. 23 (1952). MR 14:189j
[Ser80] R. J. Serfling, Approximation theorems of mathematical statistics, John Wiley & Sons Inc., New York. MR 82a:62003
[SJ94] I. H. Sloan and S. Joe, Lattice methods for multiple integration, Oxford University Press, Oxford. MR 98a:65026
[Tas77] D. Tashiro, On methods for generating uniform points on the surface of a sphere, Ann. Inst. Statist. Math. 29 (1977). MR 58:8144
[Tez95] S. Tezuka, Uniform random numbers: Theory and practice, Kluwer Academic Publishers, Boston.
[Wat62] G. S. Watson, Goodness-of-fit tests on a circle. II, Biometrika 49 (1962). MR 25:1626
[YM95] X. Yue and C. Ma, Multivariate $l_p$-norm symmetric distributions, Statist. Probab. Lett. 24 (1995). MR 96i:62057

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, China, and Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing, China
E-mail address: jjliag@hkbu.edu.hk

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, China, and Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing, China
E-mail address: ktfag@hkbu.edu.hk

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, China
E-mail address: fred@hkbu.edu.hk
URL: fred

Department of Statistics, University of North Carolina, Chapel Hill, NC, United States of America
E-mail address: lirz@ .uc.edu
