A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS


J. Japan Statist. Soc. Vol. 41 No. 1 2011 67–73

Yoichi Nishiyama*

We consider k-sample and change point problems for independent data in a unified way. We propose a test statistic based on the rank statistics. The asymptotic distribution under the null hypothesis is shown to be the supremum of the absolute value of the 2-dimensional standard Brownian pillow. Also, the test is shown to be consistent under the alternative that the k distribution functions are linearly independent. It is important from a practical point of view that our test is not only asymptotically distribution free but also distribution free even for a fixed finite sample.

Key words and phrases: Empirical process, invariance principle, rank statistic, weak convergence.

Received January 4, 2011. Revised March 24, 2011. Accepted April 15, 2011.
*The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan.

1. Introduction

This paper studies the k-sample and change point problems in a unified way. Both problems have long histories. Kiefer (1959) considered k-sample Kolmogorov-Smirnov and Cramér-von Mises tests, while Scholz and Stephens (1987) studied the k-sample Anderson-Darling test. The approaches based on the empirical distribution functions are naive, but the limit distributions are often difficult to compute. On the other hand, other approaches often need some restrictions on the alternatives. For example, Jonckheere (1954) considered the k-sample test against ordered alternatives (see also Odeh (1971)), while Mack and Wolfe (1981) treated the test against umbrella alternatives. Hettmansperger and Norton (1987) considered the k-sample problem against a patterned alternative. (In contrast, the alternative in our approach just requires that the k distribution functions are linearly independent.)

Regarding the change point problems, many authors have considered parametric and non-parametric approaches. See, e.g., the book by Csörgő and Horváth (1997). Since we are interested in a non-parametric approach, we only review preceding results in that direction. Pettitt (1979) applied the Wilcoxon-Mann-Whitney statistic to the non-parametric change point problem. Lombard (1987) proposed a procedure based on quadratic form rank statistics to test for one or more change points. (One of the interesting points of our approach is that we do not require prior knowledge about how many change points exist under the alternative.) Recently, based on Lombard's approach, Murakami (2010) introduced a rank statistic for the change point problem of location-scale parameters. Praagman (1988) established the Bahadur efficiency of some rank tests for the non-parametric change point problem. As for estimation problems, Carlstein (1988), among others, proposed an estimator for a change point without assuming any specific structure of the underlying distribution.

Our idea comes from a kind of CUSUM empirical process, although the resulting test is a rank statistic. Our test has the following merits. (1) It is distribution free under the null hypothesis; the distribution of our test statistic depends only on the total sample size. (2) The asymptotic distribution under the null hypothesis is the supremum of the absolute value of the 2-dimensional standard Brownian pillow; in particular, it does not depend on k. (3) Our alternative is natural; we only assume that the k distribution functions are linearly independent. (4) Our test is easy to compute.

The organization of the rest of this paper is as follows. In Section 2, we state some asymptotic results under the null and alternative hypotheses. These results are proved in Section 3. In Section 4, we present some simulation studies.

2. Asymptotic results

Let us describe the two problems which we consider in this paper.

The first problem is the so-called k-sample problem. Let $X^c_1, \ldots, X^c_{n_c}$, $c = 1, \ldots, k$, be 1-dimensional independent data such that for every $c = 1, \ldots, k$ the data $X^c_1, \ldots, X^c_{n_c}$ come from a 1-dimensional continuous distribution $F_c$. We wish to test the hypotheses:

$H_0$: the $F_c$'s are the same for all $c = 1, \ldots, k$ (we denote the common distribution by $F$);

$H_1$: the $F_c$'s are distribution functions that are linearly independent; that is, for weight constants $(w_1, \ldots, w_k)$ such that $\sum_c w_c = 0$, the identity $\sum_c w_c F_c(\cdot) \equiv 0$ implies $w_c = 0$ for all $c$.

(The additional constraint $\sum_c w_c = 0$ is not a real restriction for the linear independence because $F_c(\infty) = 1$ for all $c$.)

The second problem is the so-called change point problem. Let $X_1, \ldots, X_n$ be 1-dimensional independent data. We wish to test the hypotheses:

$H_0$: all $X_i$'s come from a certain continuous distribution $F$;

$H_1$: there exist $0 = u_0 < u_1 < \cdots < u_k = 1$ such that $X_i$, $i = [nu_{c-1}] + 1, \ldots, [nu_c]$, $c = 1, \ldots, k$, come from a distribution $F_c$, where the $F_c$'s are distribution functions that are linearly independent.
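The block structure of the change point alternative can be made concrete with a small simulation. The following is only a minimal sketch: the function names `segment_labels` and `simulate_change_point` are hypothetical, and the choice of unit-variance Gaussian segments is an assumption made for illustration (any continuous distributions $F_c$ would do).

```python
import numpy as np

def segment_labels(n, u):
    """Return the segment index c in {1, ..., k} of each observation i = 1, ..., n.

    u = (u_0, ..., u_k) with 0 = u_0 < u_1 < ... < u_k = 1; observation i belongs
    to segment c when [n u_{c-1}] < i <= [n u_c].
    """
    bounds = np.floor(n * np.asarray(u, dtype=float)).astype(int)  # [n u_c]
    sizes = np.diff(bounds)                                        # [n u_c] - [n u_{c-1}]
    return np.repeat(np.arange(1, sizes.size + 1), sizes)

def simulate_change_point(n, u, means, seed=0):
    """Draw X_1, ..., X_n with X_i ~ N(means[c-1], 1) on segment c (illustrative choice)."""
    rng = np.random.default_rng(seed)
    labels = segment_labels(n, u)
    return rng.normal(loc=np.asarray(means, dtype=float)[labels - 1], scale=1.0)

# One change point at [3n/4] with n = 40: 30 observations, then 10 shifted ones.
x = simulate_change_point(40, [0.0, 0.75, 1.0], means=[0.0, 1.0])
```

Setting all entries of `means` equal gives data under the null hypothesis.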
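We can treat the second problem as a special case of the first, by regarding $n_c = [nu_c] - [nu_{c-1}]$; hence from now on we deal with the first problem. We shall assume that $\gamma_c = \lim n_c/n$, where $n = \sum_c n_c$, as $n_c \to \infty$ for every $c = 1, \ldots, k$. (We assume that at least two $\gamma_c$'s are positive.) This is a natural assumption for the second problem, where $\gamma_c = u_c - u_{c-1}$, and throughout this paper we consider this asymptotic scheme.

In the first problem, let us set

\[ \{X_1, \ldots, X_n\} = \{X^1_1, \ldots, X^1_{n_1}, \ldots, X^k_1, \ldots, X^k_{n_k}\}. \]

Let us denote by $\{X_{(1)}, \ldots, X_{(n)}\}$ the order statistics and by $\{R_1, \ldots, R_n\}$ the rank statistics of the data $\{X_1, \ldots, X_n\}$. We propose the test statistic

\[ D_n = \frac{1}{\sqrt{n}} \max_{1 \le i,j \le n} \left| \sum_{q=1}^{i} 1\{R_q \le j\} - \frac{ij}{n} \right|. \tag{2.1} \]

The main result of this paper is the following.

Theorem 1. (i) Under the null hypothesis $H_0$, it holds that, as $n \to \infty$,

\[ D_n \xrightarrow{d} \sup_{0 \le s,t \le 1} |W(s,t)|, \]

where $W$ is a centered Gaussian process with the covariance

\[ E[W(s_1,t_1)W(s_2,t_2)] = (s_1 \wedge s_2 - s_1 s_2)(t_1 \wedge t_2 - t_1 t_2). \]

Hence the test is asymptotically distribution free under $H_0$.

(ii) Under the alternative $H_1$, it holds that, as $n_c \to \infty$ for every $c = 1, \ldots, k$,

\[ D_n \ge \sqrt{n} \max_{1 \le c^* \le k} \sup_{x} |B_{c^*}(x)| - o(\sqrt{n}) - O_P(1), \]

where

\[ B_{c^*}(x) = \sum_{c \le c^*} \gamma_c (1 - u_{c^*}) F_c(x) - \sum_{c > c^*} \gamma_c u_{c^*} F_c(x) \]

and where $u_{c^*} = \sum_{c \le c^*} \gamma_c$. (Notice that the sum of the coefficients of the $F_c$'s is zero, so by assumption $\sup_x |B_{c^*}(x)|$ is positive for any $c^*$ with $0 < u_{c^*} < 1$, for which not all coefficients vanish.) Hence the test is consistent under $H_1$.

It is important from a practical point of view that we do not assume any specific structure on the distributions $F_c$. Our approach is purely non-parametric. It is also important to notice that the value k is not used in the construction of our test, so we can treat it as an unknown parameter. Compare our way of constructing the test statistic with those of other tests for change point problems, where the value k has often been assumed to be known.

3. Proof of Theorem 1

To begin with, let us notice that

\[
\begin{aligned}
\widetilde{D}_n &:= \sup_{u \in [0,1]} \sup_{x \in \mathbb{R}} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (1\{i \le nu\} - u) 1\{X_i \le x\} \right| \\
&= \max_{1 \le i \le n} \sup_{x \in \mathbb{R}} \frac{1}{\sqrt{n}} \left| \sum_{q=1}^{i} 1\{X_q \le x\} - \frac{i}{n} \sum_{q=1}^{n} 1\{X_q \le x\} \right| + O\left(\frac{1}{\sqrt{n}}\right) \\
&= \frac{1}{\sqrt{n}} \max_{1 \le i,j \le n} \left| \sum_{q=1}^{i} 1\{X_q \le X_{(j)}\} - \frac{i}{n} \sum_{q=1}^{n} 1\{X_q \le X_{(j)}\} \right| + O\left(\frac{1}{\sqrt{n}}\right) \\
&= D_n + O\left(\frac{1}{\sqrt{n}}\right).
\end{aligned}
\]

The last equality holds because $1\{X_q \le X_{(j)}\} = 1\{R_q \le j\}$ and $\sum_{q=1}^{n} 1\{X_q \le X_{(j)}\} = j$.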
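Because the statistic (2.1) depends on the data only through the ranks, it can be evaluated with a cumulative sum over an n-by-n indicator array. The following is a minimal sketch, not the author's implementation: the function name `rank_statistic` is hypothetical, and the code assumes there are no ties, as is the case almost surely for continuous distributions.

```python
import numpy as np

def rank_statistic(x):
    """Evaluate D_n = n^{-1/2} max_{i,j} | sum_{q<=i} 1{R_q <= j} - i*j/n |.

    Assumes the observations have no ties (continuous data).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1          # R_1, ..., R_n, values in 1..n
    grid = np.arange(1, n + 1)
    indicator = ranks[:, None] <= grid[None, :]    # entry (q, j): 1{R_q <= j}
    partial = np.cumsum(indicator, axis=0)         # entry (i, j): sum_{q<=i} 1{R_q <= j}
    centered = partial - np.outer(grid, grid) / n  # subtract i*j/n
    return np.abs(centered).max() / np.sqrt(n)

# Monotone data of size 4: the maximum is attained at i = j = 2, giving 1/sqrt(4).
print(rank_statistic([1.0, 2.0, 3.0, 4.0]))  # 0.5
```

The memory cost is of order n squared, which is unproblematic for sample sizes such as n = 40 or n = 100; for much larger n a streaming computation over i would be preferable.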

Proof of Theorem 1 (i). Since

\[ \sup_{u \in [0,1]} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (1\{i \le nu\} - u) \right| \le \frac{1}{\sqrt{n}}, \]

the random variable $\widetilde{D}_n$ is asymptotically equivalent to the supremum of the absolute value of

\[ (u, x) \mapsto \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (1\{i \le nu\} - u)(1\{X_i \le x\} - F(x)), \]

which converges weakly in $\ell^\infty([0,1] \times \mathbb{R})$ to the centered Gaussian process $(u, x) \mapsto W(u, F(x))$. (We denote by $\ell^\infty(T)$ the space of bounded functions on $T$, and equip it with the uniform metric. See, e.g., van der Vaart and Wellner (1996) for the weak convergence theory in this space.) Hence the result follows from the continuous mapping theorem.

Proof of Theorem 1 (ii). Let $c^*$ be any of the indices $c$. We also denote by $x^*$ the argsup of $x \mapsto |B_{c^*}(x)|$. Notice that

\[ \widetilde{D}_n \ge \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (1\{i \le nu_{c^*}\} - u_{c^*}) F_{\tau(i)}(x^*) \right| - \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (1\{i \le nu_{c^*}\} - u_{c^*})(1\{X_i \le x^*\} - F_{\tau(i)}(x^*)) \right|, \]

where $\tau(i) = c$ for $i = n_1 + \cdots + n_{c-1} + 1, \ldots, n_1 + \cdots + n_c$. By the central limit theorem, the second term on the right-hand side converges weakly to a tight limit. The first term can be written as $\sqrt{n}\,|B_{c^*}(x^*) + o(1)|$. The proof is finished.

4. Remark for non-asymptotic case

The way of defining the test statistic (2.1) has a merit not only for the asymptotic study but also for the finite sample argument. Since the test is defined only through the rank statistics, it is distribution free under $H_0$ as long as the underlying distribution $F$ is a 1-dimensional continuous distribution. So we can compute the p-values for fixed $n$ by computer simulation, by setting the underlying distribution $F$ to be, for example, the uniform distribution. The p-values obtained by Monte Carlo simulation are more accurate than those obtained by using the asymptotic distribution $\sup_{s,t} |W(s,t)|$, especially when $n$ is not very large.

Here we present some numerical results by figures. For illustration, we demonstrate the simulation for the change point problem with $k = 2$. Figure 1 shows plots of the independent data $X_1, \ldots, X_n$ from the Gaussian distribution $N(0,1)$ [upper], and of $X_1, \ldots, X_{[3n/4]}$ from $N(0,1)$ and $X_{[3n/4]+1}, \ldots, X_n$ from $N(1,1)$ [lower] (with $n = 40$). We perform 1000 simulations for the computation of the test statistic $D_n$. Figure 2 ($n = 40$) and Figure 3 ($n = 100$) present the empirical distribution functions (EDF) under $H_0$ and $H_1$. Finally, we draw the empirical distribution functions of 1000 simulations of $D_n$ under $H_0$ for $n = 10, 40, 100$ (Figure 4). We set $F$ to be the uniform distribution, but the choice of the underlying distribution is not important because we know that the test is distribution free. So we can compute approximate p-values even for small $n$ based on this kind of computer simulation.

[Figure 1. Plots of the data. Upper panel: H0 (no change point). Lower panel: H1 (a change point).]

[Figure 2. EDF of 1000 simulations of $D_{40}$ under $H_0$ (left) and $H_1$ (right).]

[Figure 3. EDF of 1000 simulations of $D_{100}$ under $H_0$ (left) and $H_1$ (right).]

[Figure 4. EDF of 1000 simulations of $D_{10}$, $D_{40}$ and $D_{100}$ under $H_0$.]

Acknowledgements

This work was supported by Grant-in-Aid for Scientific Research (C), 21540157, from Japan Society for the Promotion of Science.

References

Carlstein, E. (1988). Nonparametric change-point estimation, Ann. Statist., 16, 188–197.
Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-Point Analysis, Wiley, New York.
Hettmansperger, T. P. and Norton, R. M. (1987). Tests for patterned alternatives in k-sample problems, J. Amer. Statist. Assoc., 82, 292–299.
Jonckheere, A. R. (1954). A distribution free k-sample test against ordered alternatives, Biometrika, 41, 133–145.
Kiefer, J. (1959). K-sample analogues of the Kolmogorov-Smirnov and Cramér-V. Mises tests, Ann. Math. Statist., 30, 420–447.
Lombard, F. (1987). Rank tests for changepoint problems, Biometrika, 74, 615–624.
Mack, G. A. and Wolfe, D. A. (1981). K-sample rank tests for umbrella alternatives, J. Amer. Statist. Assoc., 76, 175–181.
Murakami, H. (2010). A rank statistic for the change-point problem and its application, J. Jpn. Soc. Comp. Statist., 23, 27–40.
Odeh, R. E. (1971). On Jonckheere's k-sample test against ordered alternatives, Technometrics, 13, 912–918.
Pettitt, A. N. (1979). A non-parametric approach to the change-point problem, Appl. Statist., 28, 126–135.
Praagman, J. (1988). Bahadur efficiency of rank tests for the change-point problem, Ann. Statist., 16, 198–217.
Scholz, F. W. and Stephens, M. A. (1987). K-sample Anderson-Darling tests, J. Amer. Statist. Assoc., 82, 918–924.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics, Springer-Verlag, New York.
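Supplementary sketch for Section 4. The finite-sample calibration described there (simulate the null distribution of the statistic from uniform data and compare it with the observed value) might look as follows. This is an illustration under stated assumptions rather than the code used for the paper's figures: the names `rank_statistic` and `monte_carlo_p_value` are hypothetical, ties are assumed absent, and the add-one correction in the p-value is a common convention that the paper does not specify.

```python
import numpy as np

def rank_statistic(x):
    """D_n = n^{-1/2} max_{i,j} | sum_{q<=i} 1{R_q <= j} - i*j/n | (no ties assumed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1
    grid = np.arange(1, n + 1)
    partial = np.cumsum(ranks[:, None] <= grid[None, :], axis=0)
    return np.abs(partial - np.outer(grid, grid) / n).max() / np.sqrt(n)

def monte_carlo_p_value(x, n_sim=1000, seed=0):
    """Approximate the p-value of D_n for the fixed sample size n = len(x).

    Under H0 the statistic is distribution free, so the null draws may use
    any continuous F; the uniform distribution is used here.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    observed = rank_statistic(x)
    null = np.array([rank_statistic(rng.uniform(size=n)) for _ in range(n_sim)])
    # Add-one correction keeps the estimate strictly positive (a convention, not from the paper).
    return (1 + np.sum(null >= observed)) / (n_sim + 1)

# Example: n = 40 with a mean shift after [3n/4], as in the setting of Figure 1.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(1.0, 1.0, 10)])
p = monte_carlo_p_value(x)
```

Since the null draws depend only on n, they can be generated once and reused for every data set of the same size.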