ON THE LIMITING BEHAVIOUR OF THE EMPIRICAL KERNEL DISTRIBUTION FUNCTION. Pranab Kumar Sen

Department of Biostatistics, University of North Carolina at Chapel Hill. Institute of Statistics Mimeo Series No. 1405, July 1982.

For an estimable parameter of degree $m\ (\ge 1)$, a Glivenko-Cantelli lemma type result for the empirical kernel distribution and the weak convergence of the related empirical process are studied. Some statistical applications of these results are also considered.

AMS Subject Classifications: 60F17, 62G99

Key Words & Phrases: Blackman-type estimator; degree; Glivenko-Cantelli lemma; kernel; reverse (sub-)martingales; U-statistics; weak convergence.

Work partially supported by the National Heart, Lung and Blood Institute, Contract NIH-NHLBI-71-2243-L from the National Institutes of Health.

1. Introduction. Let $\{X_i, i \ge 1\}$ be a sequence of independent and identically distributed random vectors (i.i.d.r.v.) with a distribution function (d.f.) $F$, defined on the real $p$-space $E^p$, for some $p \ge 1$. Consider a functional $\theta(F)$ of the d.f. $F$, for which there exists a function $g(x_1,\ldots,x_m)$ such that

(1.1)  $\theta(F) = \int \cdots \int g(x_1,\ldots,x_m)\, dF(x_1) \cdots dF(x_m)$,

for every $F$ belonging to a class $\mathcal{F}$ of d.f.'s on $E^p$. Without any loss of generality, we may assume that $g(\cdot)$ is a symmetric function of its $m$ arguments. If $m\ (\ge 1)$ is the minimal sample size for which (1.1) holds, then $g(x_1,\ldots,x_m)$ is called the kernel and $m$ the degree of $\theta(F)$, and an optimal (symmetric) estimator of $\theta(F)$ is the U-statistic [viz., Hoeffding (1948)]

(1.2)  $U_n = \binom{n}{m}^{-1} \sum_{C_{n,m}} g(X_{i_1},\ldots,X_{i_m}); \quad C_{n,m} = \{1 \le i_1 < \cdots < i_m \le n\}$,

whenever $n \ge m$. Let us assume that the kernel $g(\cdot)$ is real-valued and denote by

(1.3)  $H(y) = P\{g(X_1,\ldots,X_m) \le y\}, \quad y \in E$.

We are primarily interested in the estimation of the d.f. $H(y)$ in (1.3). Note that $\theta(F)\ (= \int y\, dH(y))$ is also a functional of the d.f. $H$. Analogous to (1.2), we may consider the following estimator of $H$ (to be termed the empirical kernel distribution function (e.k.d.f.)):

(1.4)  $H_n(x) = \binom{n}{m}^{-1} \sum_{C_{n,m}} I(g(X_{i_1},\ldots,X_{i_m}) \le x), \quad x \in E,\ n \ge m$.

Note that, like $U_n$ in (1.2), $H_n$ for $m \ge 2$ does not involve independent summands, and hence the classical results on the asymptotic properties of the sample d.f. may not be directly applicable to $H_n$. Nevertheless, as with $U_n$, such asymptotic results can be derived by using some (reverse) sub-martingale theory. The main objective of the present study is to consider the e.k.d.f. $H_n$ and the related empirical process $\{n^{1/2}[H_n(x) - H(x)],\ x \in E\}$, and to study their asymptotic behaviour.
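A minimal Python sketch of (1.2) and (1.4) follows. It computes $U_n$ and the e.k.d.f. $H_n$ by brute force over all $\binom{n}{m}$ index sets. The function names and the illustrative degree-2 kernel $|x_1 - x_2|$ (the Gini mean difference kernel) are our own choices for illustration, not taken from the paper. Full enumeration costs $\binom{n}{m}$ kernel evaluations, so the sketch is only meant for small $n$.

```python
from bisect import bisect_right
from itertools import combinations
from math import comb
from typing import Callable, Sequence

Kernel = Callable[..., float]

def u_statistic(xs: Sequence[float], g: Kernel, m: int) -> float:
    """U_n of (1.2): average of the symmetric kernel g over all m-subsets of xs."""
    n = len(xs)
    if n < m:
        raise ValueError("need n >= m")
    total = sum(g(*(xs[i] for i in idx)) for idx in combinations(range(n), m))
    return total / comb(n, m)

def ekdf(xs: Sequence[float], g: Kernel, m: int) -> Callable[[float], float]:
    """Return the e.k.d.f. H_n of (1.4) as a step function x -> H_n(x)."""
    n = len(xs)
    if n < m:
        raise ValueError("need n >= m")
    # Sorted kernel values g(X_{i1}, ..., X_{im}) over all C(n, m) index sets.
    values = sorted(g(*(xs[i] for i in idx)) for idx in combinations(range(n), m))

    def H_n(x: float) -> float:
        # Fraction of kernel values that are <= x.
        return bisect_right(values, x) / len(values)

    return H_n

def gini_kernel(a: float, b: float) -> float:
    """Illustrative (hypothetical) degree-2 kernel |a - b|."""
    return abs(a - b)

xs = [1.0, 2.0, 4.0]
print(u_statistic(xs, gini_kernel, 2))  # 2.0
H = ekdf(xs, gini_kernel, 2)
print([H(x) for x in (0.5, 2.0, 3.0)])
```

Note that $H_n$ is itself an average of indicator kernels. This is why the U-statistic machinery applies to it, as the text goes on to exploit.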

Section 2 is devoted to the study of the Glivenko-Cantelli type almost sure (a.s.) convergence result on $H_n - H$. The weak convergence of the empirical process $n^{1/2}(H_n - H)$ is studied in Section 3. The concluding section deals with some statistical applications of the results of Sections 2 and 3.

2. Glivenko-Cantelli lemma for $H_n$. Note that by (1.3) and (1.4),

(2.1)  $E H_n(x) = H(x)$, for every $x \in E$ and $n \ge m$.

We are interested in showing that

(2.2)  $\sup\{|H_n(x) - H(x)| : x \in E\} \to 0$ a.s., as $n \to \infty$.

Towards this, let $\mathcal{C}_n$ be the sigma-field generated by the unordered collection $\{X_1,\ldots,X_n\}$ and by $X_{n+j}$, $j \ge 1$, for $n \ge 1$. Note that $\mathcal{C}_n$ is monotone nonincreasing in $n$.

Lemma 2.1. $\{\sup_{x \in E} |H_n(x) - H(x)|, \mathcal{C}_n;\ n \ge m\}$ is a reverse sub-martingale.

Proof. Note that for every $x \in E$ and $n \ge m$,

(2.3)  $E[H_n(x) \mid \mathcal{C}_{n+1}] = \binom{n}{m}^{-1} \sum_{C_{n,m}} E[I(g(X_{i_1},\ldots,X_{i_m}) \le x) \mid \mathcal{C}_{n+1}]$.

Now, given $\mathcal{C}_{n+1}$, $(X_{i_1},\ldots,X_{i_m})$ can be any $m$ of the units $X_1,\ldots,X_{n+1}$ with the equal conditional probability $\binom{n+1}{m}^{-1}$. Hence, for every $1 \le i_1 < \cdots < i_m \le n$,

(2.4)  $E[I(g(X_{i_1},\ldots,X_{i_m}) \le x) \mid \mathcal{C}_{n+1}] = E[I(g(X_1,\ldots,X_m) \le x) \mid \mathcal{C}_{n+1}] = \binom{n+1}{m}^{-1} \sum_{C_{n+1,m}} I(g(X_{j_1},\ldots,X_{j_m}) \le x) = H_{n+1}(x)$.

Thus, by (2.3) and (2.4), for every $n \ge m$,

(2.5)  $E[\{H_n(x) - H(x)\},\ x \in E \mid \mathcal{C}_{n+1}] = \{H_{n+1}(x) - H(x)\},\ x \in E$ (a.e.).

Since $\sup_x |\cdot|$ is a convex functional, (2.5) ensures the reverse sub-martingale property. Q.E.D.

Let now $n^* = [n/m]$ be the largest integer contained in $n/m$, for $n \ge m$. Also, for every $n \ge m$, let

(2.6)  $H_n^*(x) = (n^*)^{-1} \sum_{i=1}^{n^*} I(g(X_{(i-1)m+1},\ldots,X_{im}) \le x), \quad x \in E$.

Note that $H_n^*$ involves independent summands and, as in (2.4),

(2.7)  $E[H_n^*(x) \mid \mathcal{C}_n] = H_n(x)$, for every $x \in E$.

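A small Monte Carlo sketch of (2.2) follows. Purely for illustration, it assumes $F = N(0,1)$ in one dimension and uses the degree-2 kernel $g(x_1,x_2) = (x_1 - x_2)^2/2$ (the kernel used in Section 4). Under these assumptions $H$ is the chi-square d.f. with 1 degree of freedom, $H(y) = \mathrm{erf}(\sqrt{y/2})$. The helper names and the seed are arbitrary. The printed sup-distances are expected to shrink roughly like $n^{-1/2}$, but the exact values depend on the seed.

```python
import math
import random
from itertools import combinations

def H_normal_var_kernel(y):
    """H(y) of (1.3) for g(a, b) = (a - b)**2 / 2 and X ~ N(0, 1):
    (X1 - X2)**2 / 2 is chi-square with 1 d.f., with d.f. erf(sqrt(y / 2))."""
    return math.erf(math.sqrt(y / 2.0)) if y > 0 else 0.0

def sup_distance(xs):
    """sup_x |H_n(x) - H(x)| for the kernel above. H is continuous, so the
    supremum is attained at (or just before) a jump point of H_n."""
    vals = sorted((a - b) ** 2 / 2.0 for a, b in combinations(xs, 2))
    N = len(vals)
    d = 0.0
    for k, v in enumerate(vals, start=1):
        Hv = H_normal_var_kernel(v)
        # Compare H with H_n at the k-th jump (k/N) and just before it ((k-1)/N).
        d = max(d, k / N - Hv, Hv - (k - 1) / N)
    return d

rng = random.Random(2024)
for n in (20, 80, 320):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(sup_distance(xs), 4))
```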
By Lemma 2.1, (2.7) and the Kolmogorov inequality for reverse sub-martingales, we obtain that for every $\varepsilon > 0$ and $n \ge m$,

(2.8)  $P\{\sup_{N \ge n} \sup_{x \in E} |H_N(x) - H(x)| \ge \varepsilon\} \le \varepsilon^{-1} E\{\sup_{x \in E} |H_n(x) - H(x)|\} \le \varepsilon^{-1} E\{E[\sup_{x \in E} |H_n^*(x) - H(x)| \mid \mathcal{C}_n]\} = \varepsilon^{-1} E\{\sup_{x \in E} |H_n^*(x) - H(x)|\}$.

Now, $H_n^* - H$ relates to the classical case of independent summands, for which the results of Dvoretzky, Kiefer and Wolfowitz (1956) ensure that for every $r > 0$,

(2.9)  $P\{(n^*)^{1/2} \sup_{x \in E} |H_n^*(x) - H(x)| > r\} \le C e^{-2r^2}$, for all $n^* \ge 1$,

where $C$ is a finite positive constant, independent of $r$ and $n^*$. Since $n^* \sim n/m$ as $n \to \infty$, (2.9) ensures that the right hand side of (2.8) converges to 0 as $n \to \infty$. This completes the proof of (2.2). We may also note that for every $n^* \ge 1$, $\{(n^*)^{1/2}[H_n^*(x) - H(x)]/[1 - H(x)],\ x \in E\}$ is a martingale. Hence, by the Hájek-Rényi-Chow inequality, it can be shown that the right hand side of (2.9) may be replaced by the cruder bound $r^{-2}$, for every $r > 1$. The convergence of the right hand side of (2.8) (to 0 as $n \to \infty$) then remains intact.

3. Weak convergence of $n^{1/2}(H_n - H)$. For the sake of simplicity, we assume that $H(x)$ is a continuous function of $x \in E$ and denote by $H^{-1}(t) = \inf\{x : H(x) \ge t\}$, $0 < t < 1$. For every $n\ (\ge m)$, we then introduce a stochastic process $W_n = \{W_n(t);\ 0 \le t \le 1\}$ by letting

(3.1)  $W_n(t) = n^{1/2}\{H_n(H^{-1}(t)) - t\}, \quad 0 \le t \le 1$.

Then $W_n$ belongs to the space $D[0,1]$. We intend to study the weak convergence of $W_n$ to some appropriate (tied-down) Gaussian function $W = \{W(t);\ 0 \le t \le 1\}$. Towards this, note that for arbitrary $r\ (\ge 1)$, $0 < t_1 < \cdots < t_r < 1$ and non-null $\lambda = (\lambda_1,\ldots,\lambda_r)'$, by (1.4) and (3.1),

(3.2)  $\sum_{j=1}^r \lambda_j W_n(t_j) = n^{1/2} \sum_{j=1}^r \lambda_j [H_n(H^{-1}(t_j)) - t_j] = n^{1/2}\{U_n^{(\lambda)} - E U_n^{(\lambda)}\}$, say,

where

(3.3)  $U_n^{(\lambda)} = \binom{n}{m}^{-1} \sum_{C_{n,m}} \phi_\lambda(X_{i_1},\ldots,X_{i_m})$

and

(3.4)  $\phi_\lambda(X_{i_1},\ldots,X_{i_m}) = \sum_{j=1}^r \lambda_j I(g(X_{i_1},\ldots,X_{i_m}) \le H^{-1}(t_j))$.

Thus, $U_n^{(\lambda)}$ is a U-statistic, and we may borrow the classical results of Hoeffding (1948) to show that the right hand side of (3.2) converges in law to a normal distribution with 0 mean and a finite variance (depending on $t_1,\ldots,t_r$ and $\lambda$). If we denote by

(3.5)  $\zeta_c(s,t) = P\{g(X_1,\ldots,X_m) \le H^{-1}(s),\ g(X_1,\ldots,X_c,X_{m+1},\ldots,X_{2m-c}) \le H^{-1}(t)\} - st$,

for every $(s,t) \in [0,1]^2$ and $c = 0,1,\ldots,m$ (note that $\zeta_0(s,t) = 0$), then

(3.6)  $E\{[H_n(H^{-1}(s)) - s][H_n(H^{-1}(t)) - t]\} = \binom{n}{m}^{-1} \sum_{c=1}^m \binom{m}{c}\binom{n-m}{m-c} \zeta_c(s,t)$, for every $(s,t) \in [0,1]^2$.

Note that by (3.1) and (3.6), for every $(s,t) \in [0,1]^2$,

(3.7)  $E W_n(s) W_n(t) \to m^2 \zeta_1(s,t) = \zeta(s,t)$, say, as $n \to \infty$.

Thus, define a Gaussian function $W = \{W(t);\ 0 \le t \le 1\}$ such that $EW = 0$ and the covariance function of $W$ is given by $\{\zeta(s,t),\ (s,t) \in [0,1]^2\}$. From the above discussion it follows that the finite dimensional distributions (f.d.d.) of $\{W_n\}$ converge to those of $W$. Further, $W_n(0) = 0$ with probability 1, and $W$ belongs to the $C[0,1]$ space, in probability. Hence, to establish the weak convergence of $\{W_n\}$ to $W$, it suffices to show that $\{W_n\}$ is tight. For this, it suffices to show that there exist a finite positive constant $K$ and an integer $n_0$ such that, for every $0 \le s_1 < s_2 < s_3 \le 1$ and $n \ge n_0$,

(3.8)  $E\{[W_n(s_2) - W_n(s_1)]^2 [W_n(s_3) - W_n(s_2)]^2\} \le K (s_2 - s_1)(s_3 - s_2)$.

[See Theorem 15.6 of Billingsley (1968) in this context.] For this, we define $W_n^* = \{W_n^*(t);\ 0 \le t \le 1\}$ as in (3.1), with $H_n$ being replaced by $H_n^*$.
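Returning briefly to (3.6) and (3.7): the combinatorial weights $\binom{m}{c}\binom{n-m}{m-c}/\binom{n}{m}$ in (3.6) are what produce the limit $m^2\zeta_1(s,t)$. They sum to one (Vandermonde's identity). Since every $\zeta_c$ is bounded, only the $c = 1$ term survives after multiplying by $n$. The following sketch checks this with exact rationals. The function name `cov_weights` is a hypothetical helper, not from the paper.

```python
from fractions import Fraction
from math import comb

def cov_weights(n, m):
    """Weights multiplying zeta_c(s, t) in (3.6), for c = 0, 1, ..., m:
    C(m, c) * C(n - m, m - c) / C(n, m). (zeta_0 = 0, so c = 0 drops out.)"""
    return [Fraction(comb(m, c) * comb(n - m, m - c), comb(n, m)) for c in range(m + 1)]

for m in (1, 2, 3):
    for n in (10, 100, 10_000):
        w = cov_weights(n, m)
        assert sum(w) == 1  # Vandermonde: the weights sum to one
        # n * w_1 -> m**2, while n * (w_2 + ... + w_m) -> 0, giving (3.7)
        print(m, n, float(n * w[1]), float(n * sum(w[2:])))
```

For $m = 2$ the weight is $n w_1 = 4(n-2)/(n-1)$, which approaches $m^2 = 4$ from below.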

Then,

(3.9)

On the other hand, $W_n^*(\cdot)$ involves $n^*$ independent summands. Hence, using the moment generating function of the multinomial distribution, we obtain that

(3.10)  $E\{[W_n^*(s_2) - W_n^*(s_1)]^2 [W_n^*(s_3) - W_n^*(s_2)]^2\} \le 5(n/n^*)^2 (s_2 - s_1)(s_3 - s_2) \le 5(m+1)^2 (s_2 - s_1)(s_3 - s_2)$,

for every $0 \le s_1 < s_2 < s_3 \le 1$ and $n \ge m$. Thus, (3.8) follows from (3.9) and (3.10). Hence, we arrive at the following.

Theorem 3.1. $W_n$ in (3.1) converges in law to the Gaussian function $W$ with $EW = 0$ and covariance function $\zeta(s,t)$, given by (3.7).

Note that $\zeta(s,t) = 0$ when $s$ or $t$ is equal to 0 or 1, and hence $W$ is tied down at $t = 0$ and $t = 1$. However, for $m \ge 2$, $\zeta(s,t)$ is in general not equal to $\min(s,t) - st$, so that $W$ is not necessarily a Brownian bridge.

4. Some statistical applications. Let $X_1,\ldots,X_n$ be i.i.d.r.v.'s with a d.f. $F(x) = F_0((x-\mu)/\sigma)$, where $F_0$ is a specified d.f. and the location and scale parameters $\mu$ and $\sigma$ are unknown. Blackman (1955) has considered the estimation of the location parameter (when $\sigma$ is assumed to be specified) based on the empirical d.f. We consider here a similar estimator of $\sigma^2$ when $\mu$ is not specified. Let $g(x_i,x_j) = (x_i - x_j)^2/2$. Then $E g(X_i,X_j) = E(X_1 - \mu)^2 = \mathrm{Var}(X) = c_0\sigma^2$, where $c_0$ is a specified positive constant that depends on the specified d.f. $F_0$. We assume that $F_0$ admits a finite variance, and, without any loss of generality, we may set $c_0 = 1$. Thus, we have a kernel of degree 2, and the empirical kernel d.f. $H_n$ may be defined as in (1.4) with $m = 2$. We define $H(y)$ as in (1.3), and since $F_0$ is specified, we may rewrite $H(y)$ as

(4.1)  $H(y) = H_0(y/\sigma^2), \quad y \in E^+ = [0,\infty)$,

where $H_0$ depends on $F_0$ and is of specified form too.
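For the kernel $g(x_i,x_j) = (x_i - x_j)^2/2$, the U-statistic (1.2) coincides with the usual unbiased sample variance (divisor $n - 1$). Because the kernel depends only on differences, $H_n$ does not involve $\mu$. The sketch below assumes, for illustration only, $F_0 = N(0,1)$ (so $c_0 = 1$ and $H_0$ is the chi-square d.f. with 1 degree of freedom) and the arbitrary values $\mu = 3$, $\sigma = 2$ and seed 7. It checks the first fact and compares $H_n(y)$ with $H_0(y/\sigma^2)$ from (4.1). The helper names are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations

def var_kernel(a, b):
    """Degree-2 kernel of Section 4: E g(X_i, X_j) = Var(X) = c_0 * sigma**2."""
    return (a - b) ** 2 / 2.0

def u_stat(xs):
    """U_n of (1.2) with the variance kernel (equals the sample variance)."""
    pairs = list(combinations(xs, 2))
    return sum(var_kernel(a, b) for a, b in pairs) / len(pairs)

def ekdf_at(xs, y):
    """H_n(y) of (1.4) with m = 2; invariant under shifts of xs, so mu is not needed."""
    pairs = list(combinations(xs, 2))
    return sum(var_kernel(a, b) <= y for a, b in pairs) / len(pairs)

def H0_normal(y):
    """Assumed H_0 for F_0 = N(0, 1): (Z1 - Z2)**2 / 2 is chi-square with 1 d.f."""
    return math.erf(math.sqrt(y / 2.0)) if y > 0 else 0.0

rng = random.Random(7)
mu, sigma = 3.0, 2.0
xs = [rng.gauss(mu, sigma) for _ in range(400)]
print(u_stat(xs), statistics.variance(xs))  # agree up to rounding
for y in (1.0, 4.0, 9.0):
    # (4.1): H(y) = H_0(y / sigma**2)
    print(y, round(ekdf_at(xs, y), 3), round(H0_normal(y / sigma ** 2), 3))
```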

Let then

(4.2)  $M_n(t) = \int_0^\infty [H_n(ty) - H_0(y)]^2\, dH_0(y), \quad t \in E^+$.

As an estimator of $\theta = \sigma^2$, we consider $\hat\theta_n$, which is a solution of

(4.3)  $M_n(\hat\theta_n) = \inf_t M_n(t)$.

Note that if we rewrite $M_n(t)$ as $\int_0^\infty [\{H_n(ty) - H(ty)\} + \{H_0(ty/\theta) - H_0(y)\}]^2\, dH_0(y)$, then, for $t$ away from $\theta$, $n M_n(t)$ blows up as $n \to \infty$. For $t$ close to $\theta$, we may proceed as in Pyke (1970) and, through some routine steps, obtain that as $n \to \infty$,

(4.4)  $n^{1/2}(\hat\theta_n - \theta)$ is, up to a term converging to 0 in probability, a linear functional of $W_n(\cdot)$ with weights involving $h_0$,

where $h_0$ is the density function corresponding to $H_0$ and $W_n(\cdot)$ is defined as in (3.1). Hence, the asymptotic normality of $n^{1/2}(\hat\theta_n - \theta)$ can be obtained from Theorem 3.1 and (4.4). A similar treatment holds for other Blackman-type estimators of estimable parameters when the underlying d.f. is specified (apart from some unknown parameters).

In the context of tests of goodness of fit when some of the parameters are unknown, an alternative procedure may be suggested as follows. Corresponding to the unknown parameters, obtain the kernels, and for these kernels, consider the corresponding empirical kernel d.f.'s. Then, a multivariate version of Theorem 3.1 may be employed for the goodness of fit problem, using either the Kolmogorov-Smirnov or the Cramér-von Mises type statistics. The theory can also be extended to the two-sample case along parallel lines.

REFERENCES

BILLINGSLEY, P. (1968). Convergence of Probability Measures. New York: Wiley.

BLACKMAN, J. (1955). On the approximation of a distribution function by an empiric distribution. Ann. Math. Statist. 26, 256-267.

DVORETZKY, A., KIEFER, J. and WOLFOWITZ, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist. 27, 642-669.

HOEFFDING, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293-325.

PYKE, R. (1970). Asymptotic results for rank statistics. In Nonparametric Techniques in Statistical Inference (ed. M. L. Puri). New York: Cambridge Univ. Press, pp. 21-37.