
Bull. Korean Math. Soc. 51 (2014), No. 3, pp. 701-716
http://dx.doi.org/10.4134/BKMS.2014.51.3.701

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS

Yicheng Hong and Sungchul Lee

Abstract. The limiting distribution of the linear placement statistic under the null hypothesis was provided by Orban and Wolfe [9] and Kim [5] when one of the sample sizes goes to infinity, and by Kim, Lee and Wang [6] when the sample sizes of all groups go to infinity simultaneously. In this paper we establish the generalized Kruskal-Wallis one-way analysis of variance for the linear placement statistics.

1. Introduction and statement of main results

To determine whether mutually independent random samples come from a common underlying distribution, one uses various kinds of two-sample tests. Two-sample tests that are valid with no assumptions (except the continuity of the underlying distribution) are called distribution-free procedures. Some distribution-free procedures use the sample ranks, such as the rank-sum test of Wilcoxon [11] and the median test of Mood [8]. Other distribution-free tests utilize the sample placements, such as the linear placement test of Orban and Wolfe [9] and the fixed and updated linear placement test of Kim [5]. Asymptotic theories for the sample rank tests are well developed; see Chernoff and Savage [1], Govindarajulu, Le Cam and Raghavachari [3], Hajek [4], Pyke and Shorack [10], and Dupac and Hajek [2]. However, asymptotic theories for the sample linear placement tests are quite limited. Orban and Wolfe [9] provided the limiting theory when one of the sample sizes goes to infinity. Kim and Wolfe [7] investigated the iterated asymptotic distribution obtained when first the control group sample size and then the treatment group sample size goes to infinity. Similarly, Kim [5] established the limiting theory of the fixed and updated linear placement test when the control group sample size goes to

Received March 2013.
2010 Mathematics Subject Classification.
Primary 60F05, 62E20, 62G20; Secondary 60E05, 60E10.
Key words and phrases. Kruskal-Wallis one-way analysis of variance, central limit theorem, linear placement statistic.
This work was supported by the BK21 project of the Department of Mathematics, Yonsei University, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2-99).
(c) 2014 Korean Mathematical Society

infinity. Moreover, Kim, Lee, and Wang [6] further established the central limit theorem when the sample sizes of all groups go to infinity simultaneously.

In this paper we develop the Kruskal-Wallis one-way analysis of variance based on linear placements. The usual Kruskal-Wallis one-way analysis of variance is based on ranks, but ours is based on linear placements, and it can be used for comparing more than two independent samples.

Our main results are as follows. First we fix our notation. The data set consists of k groups, and group i has n_i observations X_ij, 1 ≤ j ≤ n_i. Overall we have N = Σ_{i=1}^k n_i observations. We assume that the X_ij, 1 ≤ i ≤ k, 1 ≤ j ≤ n_i, are mutually independent samples from the continuous cumulative distribution functions G_i(x) = F(x − θ_i). For each X_ij let V_ij be the rank of X_ij among all observations excluding those from the group to which X_ij belongs, and let W_ij be its normalized rank:

(1.1)  V_ij = Σ_{s ≠ i} Σ_{t=1}^{n_s} 1(X_st ≤ X_ij),

(1.2)  W_ij = V_ij / (N − n_i).

We further let V_i. be the sample mean placement of the group i observations and V_.. be the sample mean placement of all observations:

(1.3)  V_i. = (1/n_i) Σ_{j=1}^{n_i} V_ij,

(1.4)  V_.. = (1/N) Σ_{i=1}^k Σ_{j=1}^{n_i} V_ij.

When the sample sizes of all groups go to infinity simultaneously, we have the following Kruskal-Wallis one-way analysis of variance based on linear placements.

Theorem 1. If θ_1 = ... = θ_k and if for each i, as N → ∞, n_i/N → λ_i, 0 < λ_i < 1, then as N → ∞

(1.5)  H := (12/(N(N+1))) Σ_{i=1}^k n_i (V_i. − EV_i.)² → χ²_{k−1},

where, by (2.17), EV_i. = (N − n_i)/2.

In Figure 1 we create 2000 samples of H using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. More specifically, in Figure 1(a) we consider the equal group sizes and in Figure 1(b) we consider the unequal group sizes. In

[Figure 1. Comparison of the empirical distribution of H and the chi-square distribution: empirical c.d.f. of H against the c.d.f. of χ²_{k−1}, for (a) the equal group sizes and (b) the unequal group sizes. We create 2000 samples of H using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. In both cases the distribution of H is very close to the chi-square distribution.]

both cases we see that the distribution of the Kruskal-Wallis test statistic H based on linear placements is very close to the chi-square distribution, as we present in Theorem 1.

If we have the equal group sizes, we may replace EV_i. in H with V_.. and still get the chi-square limit.

Theorem 2. If θ_1 = ... = θ_k and if for each i, as N → ∞,

(1.6)  n_i = N/k + f_i(N),  f_i(N) = O(N^{β_i}),  β_i < 1/2,

then as N → ∞

(1.7)  Ĥ := (12/(N(N+1))) Σ_{i=1}^k n_i (V_i. − V_..)² → χ²_{k−1}.

However, if we do not have the same group sizes, we cannot replace EV_i. in H with V_.. and still get the chi-square limit.

Proposition 1. If θ_1 = ... = θ_k, if for each i, as N → ∞,

(1.8)  n_i = N/k + f_i(N),  f_i(N) = O(N^{β_i}),  β_i < 1,

and if for some i, as N → ∞,

(1.9)  n_i = N/k + f_i(N),  f_i(N) = Ω(N^{β_i}),  1/2 < β_i < 1,

then as N → ∞, E(H − Ĥ) → ∞ and Var(H − Ĥ) → ∞.
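As a quick numerical companion to Theorem 1, the statistic H of (1.5) can be computed directly from the placements (1.1). The sketch below assumes NumPy; the helper name placement_H and the simulation sizes are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def placement_H(samples):
    """Kruskal-Wallis-type statistic H of (1.5), built from linear placements:
    V_ij counts observations outside group i that do not exceed X_ij."""
    k = len(samples)
    N = sum(len(x) for x in samples)
    H = 0.0
    for i, x in enumerate(samples):
        others = np.sort(np.concatenate([samples[s] for s in range(k) if s != i]))
        # V_ij = #{(s,t): s != i, X_st <= X_ij};  E V_i. = (N - n_i)/2
        V = np.searchsorted(others, x, side="right")
        H += len(x) * (V.mean() - (N - len(x)) / 2.0) ** 2
    return 12.0 / (N * (N + 1)) * H

# Under the null (all groups from one continuous distribution), H should be
# approximately chi-square with k - 1 degrees of freedom.
k, n = 5, 200
draws = [placement_H([rng.normal(size=n) for _ in range(k)]) for _ in range(500)]
print(np.mean(draws))   # should be near k - 1 = 4
```

The Monte Carlo mean of H should sit close to k − 1 = 4, the mean of the χ²_4 limit.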

Remark 1. If in Proposition 1, instead of (1.8) and (1.9), we assume that for each i, as N → ∞,

(1.10)  n_i = N/k + f_i(N),  f_i(N) = O(N^{β_i}),  β_i ≤ 1/2,

and that for some i, as N → ∞,

(1.11)  n_i = N/k + f_i(N),  f_i(N) = Ω(N^{β_i}),  β_i = 1/2,

we cannot draw any meaningful conclusion in this borderline case.

[Figure 2. Comparison of the empirical distribution of Ĥ and the chi-square distribution: empirical c.d.f. of Ĥ against the c.d.f. of χ²_{k−1}, for (a) the equal group sizes and (b) the unequal group sizes. We create 2000 samples of Ĥ using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. In the equal group sizes case (a) the distribution of Ĥ is very close to the chi-square distribution. However, in the unequal group sizes case (b) the distribution of Ĥ is far away from the chi-square distribution.]

Proposition 2. If θ_1 = ... = θ_k and if for each i, as N → ∞,

(1.12)  n_i/N → λ_i,  0 < λ_i < 1,

and

(1.13)  λ_i ≠ 1/k  for some i,

then as N → ∞, E(H − Ĥ) → ∞ and Var(H − Ĥ) → ∞.

In Figure 2 we create 2000 samples of Ĥ using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. More specifically, in Figure 2(a) we consider the equal group sizes and in Figure 2(b) we consider the unequal group sizes. In the equal group sizes case the distribution of the Kruskal-Wallis test statistic Ĥ based on linear placements is very close to the chi-square distribution, as we present in Theorem 2. However, in the unequal group sizes case the distribution of Ĥ is far away from the chi-square distribution, as we present in Propositions 1 and 2.

Let S_N be the total sum of the normalized ranks W_ij, i.e.,

(1.14)  S_N = Σ_{i=1}^k Σ_{j=1}^{n_i} W_ij.

Then we also have the following variant of the central limit theorem for the linear placement statistics of Kim, Lee, and Wang [6].

Theorem 3. If θ_1 = ... = θ_k and if for each i, as N → ∞,

(1.15)  n_i/N → λ_i,  0 < λ_i < 1,

and

(1.16)  λ_i ≠ 1/k  for some i,

then as N → ∞

(1.17)  Ŝ := (S_N − E(S_N)) / √Var(S_N) → N(0,1)

in distribution.

Remark 2. If in Theorem 3, instead of (1.16), we assume that for each i

(1.18)  λ_i = 1/k,

we cannot prove the CLT. However, even in this case, based on simulation we think the CLT should hold under certain mild conditions. A little caution is needed here. As we see in (3.1), in the extremely balanced case n_1 = ... = n_k = N/k the weights 1/(N − n_i) are all equal and the sum Σ_{i=1}^k Σ_{j=1}^{n_i} R_ij is the deterministic constant N(N+1)/2, so S_N is not random; hence in this case the CLT cannot be true.

In Figure 3 we create 2000 samples of Ŝ using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. More specifically, in Figure 3(a) we consider the equal group sizes and in Figure 3(b) we consider the unequal group sizes. In both cases we see that the distribution of Ŝ is very close to the standard normal distribution, as we present in Theorem 3.
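Theorem 3 and Remark 2 can likewise be illustrated by simulation. In this sketch (NumPy assumed; the helper name S_N and the group sizes are ours) the exact moments E(S_N) and Var(S_N) are replaced by Monte Carlo estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

def S_N(samples):
    """Total placement sum S_N = sum_ij W_ij, W_ij = V_ij/(N - n_i); cf. (1.14)."""
    k, N = len(samples), sum(len(x) for x in samples)
    total = 0.0
    for i, x in enumerate(samples):
        others = np.sort(np.concatenate([samples[s] for s in range(k) if s != i]))
        total += np.searchsorted(others, x, side="right").sum() / (N - len(x))
    return total

# Unequal group sizes, so lambda_i != 1/k for some i, as (1.16) requires.
sizes = [50, 100, 150, 200, 300]
vals = np.array([S_N([rng.normal(size=n) for n in sizes]) for _ in range(1000)])
z = (vals - vals.mean()) / vals.std()   # Monte Carlo standardization of S_N
print(np.mean(np.abs(z) < 1))           # near 0.68 if S-hat is approximately normal

# Extremely balanced case of Remark 2: S_N is a deterministic constant.
bal = [S_N([rng.normal(size=100) for _ in range(4)]) for _ in range(5)]
print(max(bal) - min(bal))              # 0 up to rounding
```

The second experiment mirrors Remark 2: with n_1 = ... = n_k the placement sum does not vary at all across replications.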

[Figure 3. Comparison of the empirical distribution of Ŝ and the standard normal distribution: empirical c.d.f. of Ŝ against the c.d.f. of N(0,1), for (a) the equal group sizes and (b) the unequal group sizes. We create 2000 samples of Ŝ using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. In both cases the distribution of Ŝ is very close to the standard normal distribution.]

We can still consider the generalization of Theorem 3 in line with Kim, Lee, and Wang [6]. For a non-constant C¹ function φ : [0,1] → R we let

S^φ_N = Σ_{i=1}^k Σ_{j=1}^{n_i} φ(W_ij).

If θ_1 = ... = θ_k and if for each i, as N → ∞, n_i/N → λ_i, 0 < λ_i < 1, then as N → ∞ we conjecture that

Ŝ^φ := (S^φ_N − E(S^φ_N)) / √Var(S^φ_N) → N(0,1)

in distribution.

In Figure 4, with φ_1(x) = x², φ_2(x) = x, φ_3(x) = log(1 + x), φ_4(x) = e^x, we create 2000 samples of Ŝ^{φ_i} using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i. More specifically, on the left we consider the equal group sizes and on the right we consider the unequal group sizes. In both the equal and unequal cases we see that the distribution of Ŝ^{φ_i} is close to the standard normal distribution, as we conjecture. More interestingly, we notice that the distribution of Ŝ^{φ_i} for the unequal group sizes is closer to the standard normal distribution than that for the equal group sizes. This agrees with the fact that in the case φ(x) = x of Theorem 3, S_N in the extremely balanced case n_1 = ... = n_k = N/k is not random, and hence in this case the CLT does not hold. So, for general φ we conjecture even further that in the extremely balanced case n_1 = ... = n_k =

[Figure 4. Comparison of the empirical distribution of Ŝ^{φ_i} and the standard normal distribution. With φ_1(x) = x², φ_2(x) = x, φ_3(x) = log(1 + x), φ_4(x) = e^x, we create 2000 samples of Ŝ^{φ_i} using iid standard normal X_ij, i = 1, ..., 5, j = 1, ..., n_i, with the equal group sizes on the left and the unequal group sizes on the right. In both cases the distribution of Ŝ^{φ_i} is close to the standard normal distribution, and for the unequal group sizes on the right it is closer to the standard normal distribution than for the equal group sizes on the left.]

N/k, the CLT does not hold, and that the convergence rate for the unequal group sizes is greater than that for the equal group sizes.
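The conjecture for Ŝ^φ can be probed the same way. Here is a sketch for φ_1(x) = x² (NumPy assumed; the helper name S_phi and the group sizes are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def S_phi(samples, phi):
    """Generalized placement sum S^phi_N = sum_ij phi(W_ij)."""
    k, N = len(samples), sum(len(x) for x in samples)
    total = 0.0
    for i, x in enumerate(samples):
        others = np.sort(np.concatenate([samples[s] for s in range(k) if s != i]))
        W = np.searchsorted(others, x, side="right") / (N - len(x))
        total += np.sum(phi(W))
    return total

# phi_1(x) = x^2 with unequal group sizes; standardize by Monte Carlo moments.
sizes = [50, 100, 150, 200]
vals = np.array([S_phi([rng.normal(size=n) for n in sizes], lambda w: w ** 2)
                 for _ in range(1000)])
z = (vals - vals.mean()) / vals.std()
print(np.mean(np.abs(z) < 1))   # near 0.68 for an approximately normal S-hat^phi
```

Other choices of φ (square root, logarithm, exponential) slot in through the phi argument unchanged.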

In Section 2, we prove the Kruskal-Wallis one-way analysis of variance based on linear placements, Theorems 1 and 2 and Propositions 1 and 2, using the tools developed for the Kruskal-Wallis one-way analysis of variance based on ranks. In Section 3, we prove the variant of the central limit theorem for the linear placement statistics of Kim, Lee, and Wang [6] using the Lyapunov CLT.

2. Kruskal-Wallis one-way analysis of variance based on linear placements

In this section, we prove the Kruskal-Wallis one-way analysis of variance based on linear placements, Theorems 1 and 2 and Propositions 1 and 2, using the tools developed for the Kruskal-Wallis one-way analysis of variance based on ranks. We start with the following lemma, which characterizes the chi-square distribution.

Lemma 1. Let X = (X_1, ..., X_n)^T be an n-dimensional multivariate normal random vector with mean vector 0 and covariance matrix B. If the covariance matrix B satisfies the two conditions

B² = B  and  rank(B) = k − 1,

then Σ_{i=1}^n X_i² comes from the chi-square distribution with k − 1 degrees of freedom:

Σ_{i=1}^n X_i² ~ χ²_{k−1}.

Proof. Since B is symmetric, there is an orthogonal matrix P such that P^T B P = Λ, where Λ is a diagonal matrix. Since rank(B) = k − 1, Λ has k − 1 non-zero diagonal elements. By rearranging the coordinates if needed, we may assume that the first k − 1 diagonal elements of Λ are non-zero. Since B² = B, all the non-zero diagonal elements of Λ are in fact 1. Since X ~ N_n(0, B) and since P^T B P = Λ, we see that Z := P^T X = (Z_1, ..., Z_n)^T is an n-dimensional multivariate normal random vector with mean vector 0 and covariance matrix Λ, i.e., Z ~ N_n(0, Λ). Since the covariance matrix Λ is diagonal, and since the first k − 1 diagonal elements of Λ are 1 and all other elements of Λ are 0, Z_1, ..., Z_{k−1} are iid standard normal and Z_k = ... = Z_n = 0. Therefore,

Σ_{i=1}^n X_i² = X^T X = Z^T P^T P Z = Z^T Z = Σ_{i=1}^n Z_i² = Σ_{i=1}^{k−1} Z_i² ~ χ²_{k−1}.

Proof of Theorem 1. Let T := (T_1, ..., T_k)^T be given by

(2.1)  T_i := √(12 n_i/(N(N+1))) (V_i. − EV_i.),  i = 1, ..., k.
By Lemma 1, to prove Theorem 1 it suffices to prove the following three statements:

(2.2)  T → N_k(0, B),

(2.3)  B = B²,

(2.4)  rank(B) = k − 1.

To prove these we use the tools developed for the Kruskal-Wallis one-way analysis of variance based on ranks. For each X_ij let R_ij be the rank of X_ij among all the observations from all groups, and let L_ij be the rank of X_ij among all the observations from group i:

R_ij = Σ_{s=1}^k Σ_{t=1}^{n_s} 1(X_st ≤ X_ij),   L_ij = Σ_{t=1}^{n_i} 1(X_it ≤ X_ij).

We further let R̄_i. be the sample mean rank of the group i observations and R̄_.. be the sample mean rank of all observations, writing R_i. and R_.. for the corresponding sums:

(2.5)  R̄_i. = (1/n_i) Σ_{j=1}^{n_i} R_ij = (1/n_i) R_i.,

(2.6)  R̄_.. = (1/N) Σ_{i=1}^k Σ_{j=1}^{n_i} R_ij = (1/N) R_...

By the definitions of V_ij, R_ij, and L_ij, we can rewrite V_ij as

(2.7)  V_ij = R_ij − L_ij.

Since

(2.8)  Σ_{j=1}^{n_i} L_ij = Σ_{j=1}^{n_i} j = n_i(n_i + 1)/2,

by (2.7) the fluctuation of V_i. comes entirely from the ranks, and hence

(2.9)  V_i. − EV_i. = R̄_i. − ER̄_i..

So, we can rewrite T_i as

(2.10)  T_i = √(12 n_i/(N(N+1))) (R̄_i. − ER̄_i.),  i = 1, ..., k.

Now the standard results on the Kruskal-Wallis one-way analysis of variance based on ranks say that T := (T_1, ..., T_k)^T with T_i given by (2.10) satisfies (2.2)-(2.4). Therefore, Theorem 1 follows from Lemma 1.

Lemma 2. For R_ij, L_ij, and R_i., we have

ER_ij = (N+1)/2,   Var(R_ij) = (N−1)(N+1)/12,   Cov(R_ij, R_st) = −(N+1)/12,
EL_ij = (n_i+1)/2,   Var(L_ij) = (n_i−1)(n_i+1)/12,   Cov(L_ij, L_ik) = −(n_i+1)/12,
ER_i. = n_i(N+1)/2,   Var(R_i.) = n_i(N−n_i)(N+1)/12,   Cov(R_i., R_j.) = −n_i n_j(N+1)/12.
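Before the formal proof, the single-rank moments of Lemma 2 can be checked exactly for a small N by enumerating all N! equally likely orderings (a brute-force sketch in exact rational arithmetic; the variable names are ours):

```python
from fractions import Fraction
from itertools import permutations

# Exact check of the rank moments in Lemma 2: under the null the vector of
# overall ranks (R_ij) is a uniformly random permutation of {1, ..., N}.
N = 5
perms = list(permutations(range(1, N + 1)))
M = Fraction(len(perms))

ER  = sum(Fraction(p[0]) for p in perms) / M          # E R_ij
ER2 = sum(Fraction(p[0] ** 2) for p in perms) / M
ERS = sum(Fraction(p[0] * p[1]) for p in perms) / M   # E R_ij R_st, two positions

var = ER2 - ER ** 2
cov = ERS - ER ** 2
print(ER, var, cov)   # (N+1)/2, (N-1)(N+1)/12, -(N+1)/12  ->  3, 2, -1/2
```

The moments of the within-group ranks L_ij follow from the same computation with N replaced by n_i.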

Proof. By symmetry R_ij is uniform on {1, 2, ..., N}, i.e., for any observation X_ij,

P(R_ij = r) = 1/N  for 1 ≤ r ≤ N.

Also, by symmetry, for any two different observations X_ij and X_st,

P(R_ij = r, R_st = l) = 1/(N(N−1))  for 1 ≤ r ≠ l ≤ N.

The lemma for R_ij and R_i. follows by direct calculation from the above joint probability mass function. By the same reasoning one can get the lemma for L_ij.

By the definitions of Ĥ and H, we can decompose Ĥ into H and C:

(2.11)  Ĥ = (12/(N(N+1))) Σ_{i=1}^k n_i (V_i. − V_..)²
          = (12/(N(N+1))) Σ_{i=1}^k n_i ((V_i. − EV_i.) + (EV_i. − V_..))²
          = H + C,

where, by (2.9),

(2.12)  C := (12/(N(N+1))) [Σ_{i=1}^k n_i (EV_i. − V_..)² + Σ_{i=1}^k 2n_i (EV_i. − V_..)(V_i. − EV_i.)]
           = (12/(N(N+1))) [Σ_{i=1}^k n_i (EV_i. − V_..)² + Σ_{i=1}^k 2n_i (EV_i. − V_..)(R̄_i. − ER̄_i.)].

So by Theorem 1, to prove Theorem 2 it suffices to prove that, under the conditions of Theorem 2, C → 0 in probability, or

(2.13)  EC → 0,

(2.14)  Var(C) → 0,

and to prove Propositions 1 and 2 we just need to show that, under the conditions of Propositions 1 and 2,

(2.15)  EC → ∞,

(2.16)  Var(C) → ∞.

By (2.7), (2.8), and Lemma 2,

(2.17)  EV_i. = (1/n_i) E Σ_{j=1}^{n_i} R_ij − (1/n_i) E Σ_{j=1}^{n_i} L_ij = (N − n_i)/2.
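Identity (2.17) is easy to confirm numerically; a small Monte Carlo sketch (NumPy assumed; the group sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of (2.17): under the null, E V_i. = (N - n_i)/2.
sizes = [20, 30, 50]
N = sum(sizes)
means = np.zeros(len(sizes))
reps = 4000
for _ in range(reps):
    samples = [rng.normal(size=n) for n in sizes]
    for i, x in enumerate(samples):
        others = np.sort(np.concatenate(
            [samples[s] for s in range(len(sizes)) if s != i]))
        means[i] += np.searchsorted(others, x, side="right").mean()
means /= reps
print(means, [(N - n) / 2 for n in sizes])   # both near [40, 35, 25]
```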

By (2.7), (2.17), and (2.8), we see that EV_i. − V_.. is in fact a deterministic constant:

(2.18)  EV_i. − V_.. = (N − n_i)/2 − (1/N) Σ_{s=1}^k Σ_{t=1}^{n_s} V_st = (1/(2N)) Σ_{s=1}^k n_s² − n_i/2.

By (2.12) and (2.18), the random part of C is just (12/(N(N+1))) Σ_{i=1}^k 2n_i (EV_i. − V_..) R̄_i.. Again, by (2.18) we can rewrite the major part Σ_{i=1}^k 2n_i (EV_i. − V_..) R̄_i. of the random part of C as

(2.19)  Σ_{i=1}^k 2n_i (EV_i. − V_..) R̄_i. = Σ_{i=1}^k 2n_i ((1/(2N)) Σ_{s=1}^k n_s² − n_i/2) R̄_i.
                                           = Σ_{i=1}^k ((1/N) Σ_{s=1}^k n_s² − n_i) R_i..

Take the expectation and variance on both sides of (2.19). Then, by Lemma 2 we have

(2.20)  E Σ_{i=1}^k ((1/N) Σ_{s=1}^k n_s² − n_i) R_i. = Σ_{i=1}^k ((1/N) Σ_{s=1}^k n_s² − n_i) E(R_i.) = 0,

and

(2.21)  Var(Σ_{i=1}^k ((1/N) Σ_{s=1}^k n_s² − n_i) R_i.) = ((N+1)/12) (N Σ_{i=1}^k n_i³ − (Σ_{s=1}^k n_s²)²).

So, by (2.12), (2.18), (2.20), and (2.21) we have

(2.22)  E(C) = 3 f(N),

(2.23)  Var(C) = 12 f(N),

where

(2.24)  f(N) := (1/(N²(N+1))) (N Σ_{i=1}^k n_i³ − (Σ_{s=1}^k n_s²)²).

Since n_i = N/k + f_i(N) and since Σ_{i=1}^k f_i(N) = 0, we have

(2.25)  f(N) = (1/(N²(N+1))) (N Σ_{i=1}^k (N/k + f_i(N))³ − (Σ_{i=1}^k (N/k + f_i(N))²)²)

             = (1/(N(N+1))) Σ_{i=1}^k f_i³(N) + (1/(k(N+1))) Σ_{i=1}^k f_i²(N) − (1/(N²(N+1))) (Σ_{i=1}^k f_i²(N))².

Proof of Theorem 2. By dropping the last negative term on the right-hand side of (2.25), we have

(2.26)  f(N) ≤ (1/(N(N+1))) Σ_{i=1}^k f_i³(N) + (1/(k(N+1))) Σ_{i=1}^k f_i²(N).

If by (1.6) for all i, f_i(N) = O(N^{β_i}), β_i < 1/2, then by (2.26), as N → ∞,

f(N) → 0,

and hence by (2.22)-(2.24) both the mean and the variance of C go to 0. Therefore, by (2.13) and (2.14), Theorem 2 follows.

Proof of Proposition 1. By (1.8) the first term on the right-hand side of (2.25) is of smaller order than (1/N) Σ_{i=1}^k f_i²(N):

(2.27)  (1/(N(N+1))) Σ_{i=1}^k f_i³(N) = o((1/N) Σ_{i=1}^k f_i²(N)).

Again by (1.8),

(2.28)  (k/N²) Σ_{i=1}^k f_i²(N) → 0.

So, by (2.28), among the second and third terms on the right-hand side of (2.25) the second is the major term:

(2.29)  (1/(N²(N+1))) (Σ_{i=1}^k f_i²(N))² = ((1/(k(N+1))) Σ_{i=1}^k f_i²(N)) ((k/N²) Σ_{i=1}^k f_i²(N)).

Combining (2.27), (2.28), and (2.29), we see that f(N) is of order (1/N) Σ_{i=1}^k f_i²(N):

(2.30)  f(N) = Ω((1/N) Σ_{i=1}^k f_i²(N)).

If by (1.9) for some i, f_i(N) = Ω(N^{β_i}), 1/2 < β_i < 1, then by (2.30), as N → ∞,

f(N) → ∞,

and hence by (2.22)-(2.24) both the mean and the variance of C go to ∞. Therefore, by (2.15) and (2.16), Proposition 1 follows.

Proof of Proposition 2. Under the conditions (1.12) and (1.13) we have unequal 0 < λ_i < 1 with Σ_{i=1}^k λ_i = 1. For these λ_i we consider a random variable X with

P(X = λ_i) = λ_i  for 1 ≤ i ≤ k.

Let us calculate its variance. Since EX = Σ_{i=1}^k λ_i² and since EX² = Σ_{i=1}^k λ_i³, we have

(2.31)  Var(X) = EX² − (EX)² = Σ_{i=1}^k λ_i³ − (Σ_{i=1}^k λ_i²)².

Since X is a non-constant but finite random variable, it has a non-zero but finite variance. Therefore, by (2.31) we see that, for unequal 0 < λ_i < 1 with Σ_{i=1}^k λ_i = 1,

(2.32)  0 < Σ_{i=1}^k λ_i³ − (Σ_{i=1}^k λ_i²)² < ∞.

By (2.24) and (2.32), as N → ∞,

f(N) = N (Σ_{i=1}^k λ_i³ − (Σ_{i=1}^k λ_i²)²) + o(N),

and hence by (2.22)-(2.24) both the mean and the variance of C go to ∞. Therefore, by (2.15) and (2.16), Proposition 2 follows.

3. Proof of Theorem 3

In this section we use Lyapunov's CLT to prove Theorem 3. By (2.7) and (2.8) we can rewrite S_N as

(3.1)  S_N = Σ_{i=1}^k ((N+1)/(N−n_i)) Σ_{j=1}^{n_i} R_ij/(N+1) − Σ_{i=1}^k n_i(n_i+1)/(2(N−n_i)).

Since the {U_ij := R_ij/(N+1)} have the (joint) probability mass function

P(U_ij = r/(N+1)) = 1/N  for 1 ≤ r ≤ N,

P(U_ij = r/(N+1), U_{i'j'} = r'/(N+1)) = 1/(N(N−1))  for 1 ≤ r ≠ r' ≤ N,

the {U_ij} have the following mean, variance, and covariance:

(3.2)  EU_ij = 1/2,

(3.3)  Var(U_ij) = (2N+1)/(6N+6) − 1/4 = 1/12 − (1/6)N^{−1} + O(N^{−2}),

(3.4)  Cov(U_ij, U_{i'j'}) = (3N² − N − 2)/(12(N+1)(N−1)) − 1/4 = −(1/12)N^{−1} + O(N^{−2}).

Since (N+1)/(N−n_i) → 1/(1−λ_i) and since Σ_{i=1}^k n_i(n_i+1)/(2(N−n_i)) is non-random, we see that the asymptotics of S_N are the same as those of T_N, where T_N is defined as

(3.5)  T_N = Σ_{i=1}^k Σ_{j=1}^{n_i} A_i U_ij,

where

(3.6)  A_i = 1/(1 − λ_i).

By (3.3), (3.4), and (3.6),

(3.7)  Var(T_N) = Var(Σ_{i=1}^k Σ_{j=1}^{n_i} A_i U_ij)
     = Σ_{ij} A_i² (1/12 − (1/6)N^{−1} + O(N^{−2})) + Σ_{(ij) ≠ (i'j')} A_i A_{i'} (−(1/12)N^{−1} + O(N^{−2}))
     = (N/12) (Σ_{i=1}^k λ_i/(1−λ_i)² − (Σ_{i=1}^k λ_i/(1−λ_i))²) + o(N).

Under the conditions (1.15) and (1.16) we have unequal 0 < λ_i < 1 with Σ_{i=1}^k λ_i = 1. For these λ_i we consider a random variable Y with

P(Y = 1/(1−λ_i)) = λ_i  for 1 ≤ i ≤ k.

Let us calculate its variance. Since EY = Σ_{i=1}^k λ_i/(1−λ_i) and since EY² = Σ_{i=1}^k λ_i/(1−λ_i)², we have

(3.8)  Var(Y) = EY² − (EY)² = Σ_{i=1}^k λ_i/(1−λ_i)² − (Σ_{i=1}^k λ_i/(1−λ_i))².

Since by (1.16) Y is a non-constant but finite random variable, it has a non-zero but finite variance. Therefore, by (3.8) we see that, for unequal 0 < λ_i < 1

with Σ_{i=1}^k λ_i = 1,

(3.9)  0 < Σ_{i=1}^k λ_i/(1−λ_i)² − (Σ_{i=1}^k λ_i/(1−λ_i))² < ∞.

Therefore, by (3.7) and (3.9) we have

(3.10)  Var(T_N) = Ω(N).

Since 0 ≤ U_ij ≤ 1, for any fixed δ > 0, E|U_ij − EU_ij|^{2+δ} is bounded by 1. So,

(3.11)  Σ_{i=1}^k Σ_{j=1}^{n_i} E|A_i U_ij − E A_i U_ij|^{2+δ} = Σ_{i=1}^k n_i A_i^{2+δ} E|U_ij − EU_ij|^{2+δ} ≤ N (Σ_{i=1}^k λ_i/(1−λ_i)^{2+δ} + o(1)).

By (3.10) and (3.11) one can easily check the Lyapunov condition:

lim_{N→∞} (Var(T_N))^{−(1+δ/2)} Σ_{i=1}^k Σ_{j=1}^{n_i} E|A_i U_ij − E A_i U_ij|^{2+δ} = 0.

Hence, by Lyapunov's CLT, Theorem 3 follows.

References

[1] H. Chernoff and I. R. Savage, Asymptotic normality and efficiency of certain nonparametric test statistics, Ann. Math. Statist. 29 (1958), 972-994.
[2] V. Dupac and J. Hajek, Asymptotic normality of simple linear rank statistics under alternatives II, Ann. Math. Statist. 40 (1969), 1992-2017.
[3] Z. Govindarajulu, L. Le Cam, and M. Raghavachari, Generalizations of theorems of Chernoff and Savage on the asymptotic normality of test statistics, Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics, pp. 609-638, Univ. California Press, Berkeley, Calif., 1967.
[4] J. Hajek, Asymptotic normality of simple linear rank statistics under alternatives, Ann. Math. Statist. 39 (1968), 325-346.
[5] D. Kim, A class of distribution-free treatments versus control tests based on placements, Far East J. Theor. Stat. 3 (1999), no. 1, 9-33.
[6] D. Kim, S. Lee, and W. Wang, The asymptotic behavior of linear placement statistics, Statist. Probab. Lett. 81 (2011), no. 2, 326-336.
[7] D. Kim and D. A. Wolfe, Properties of distribution-free two-sample procedures based on placements, Far East J. Math. Sci. 1 (1993), no. 2, 179-190.
[8] A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill, New York, 1950.
[9] J. Orban and D. A. Wolfe, A class of distribution-free two-sample tests based on placements, J. Amer. Statist. Assoc. 77 (1982), no. 379, 666-672.
[10] R. Pyke and G. R.
Shorack, Weak convergence of a two-sample empirical process and a new approach to Chernoff-Savage theorems, Ann. Math. Statist. 39 (1968), 755-771.

[11] F. Wilcoxon, Individual comparisons by ranking methods, Biometrics 1 (1945), no. 6, 80-83.

Yicheng Hong
Department of Mathematics
Yanbian University
Yanji 133002, P. R. China
and
Department of Mathematics
Yonsei University
Seoul 120-749, Korea
E-mail address: tophych@gmail.com

Sungchul Lee
Department of Mathematics
Yonsei University
Seoul 120-749, Korea
E-mail address: sungchul@yonsei.ac.kr