GOODNESS-OF-FIT TESTS VIA PHI-DIVERGENCES. BY LEAH JAGER 1 AND JON A. WELLNER 2 Grinnell College and University of Washington


The Annals of Statistics 2007, Vol. 35, No. 5. DOI: 10.1214/. © Institute of Mathematical Statistics, 2007.

A unified family of goodness-of-fit tests based on φ-divergences is introduced and studied. The new family of test statistics S_n(s) includes both the supremum version of the Anderson–Darling statistic and the test statistic of Berk and Jones [Z. Wahrsch. Verw. Gebiete 47 (1979) 47–59] as special cases (s = 2 and s = 1, resp.). We also introduce integral versions of the new statistics. We show that the asymptotic null distribution theory of Berk and Jones [Z. Wahrsch. Verw. Gebiete 47 (1979) 47–59] and Wellner and Koltchinskii [High Dimensional Probability III (2003) Birkhäuser, Basel] for the Berk–Jones statistic applies to the whole family of statistics S_n(s) with s ∈ [−1, 2]. On the side of power behavior, we study the test statistics under fixed alternatives and give extensions of the "Poisson boundary" phenomena noted by Berk and Jones for their statistic. We also extend the results of Donoho and Jin [Ann. Statist. 32 (2004) 962–994] by showing that all our new tests for s ∈ [−1, 2] have the same optimal detection boundary for normal shift mixture alternatives as Tukey's higher-criticism statistic and the Berk–Jones statistic.

1. Introduction. In this paper we introduce and study a new family of goodness-of-fit tests which includes both the supremum version of the Anderson–Darling statistic (or, equivalently, Tukey's "higher criticism" statistic as discussed by Donoho and Jin [15]) and the test statistic of Berk and Jones [5] as special cases.
The new family is based on phi-divergences somewhat analogously to the phi-divergence tests for multinomial families introduced by Cressie and Read [10], and is indexed by a real parameter s ∈ R: s = 2 gives the Anderson–Darling test statistic, s = 1 gives the Berk–Jones test statistic, s = 1/2 gives a new (Hellinger distance type) statistic, s = 0 corresponds to the "reversed" Berk–Jones statistic studied by Jager and Wellner [24] and s = −1 gives a Studentized (or "empirically weighted") version of the Anderson–Darling statistic. We introduce the corresponding integral versions of the new statistics (but will study them in detail elsewhere). Having a family of statistics available gives the possibility of better

Received March 2006; revised December 2006.
1 Supported in part by NSF Grants DMS and DMS (VIGRE Grant).
2 Supported in part by NSF Grants DMS and DMS and by NI-AID Grant 2R01 AI.
AMS 2000 subject classifications. Primary 62G10, 62G20; secondary 62G30.
Key words and phrases. Alternatives, combining p-values, confidence bands, goodness-of-fit, Hellinger, large deviations, multiple comparisons, normalized empirical process, phi-divergence, Poisson boundaries.

understanding of individual members of the family, as well as the ability to select particular members of the family that have different desirable properties. In Section 2 we introduce the new test statistics. In Section 3 we briefly discuss the null distribution theory of the entire family of statistics, and note that the exact distributions can be handled exactly for sample sizes up to n = 3000 via Noé's recursion formulas (and possibly up to n = 10^4 via the recursion of Khmaladze and Shinjikashvili [31]) along the lines explored for the Berk–Jones statistic by Owen [36]. We also generalize the asymptotic distribution theory of Jaeschke [22] and Eicker [17] for the supremum version of the Anderson–Darling statistic, and of Berk and Jones [5] and Wellner and Koltchinskii [43] for the Berk–Jones statistic, by showing that the existing null distribution theory for s = 1 and s = 2 applies to (an appropriate version of) the whole family of statistics. We generalize the results of Owen [36] by showing that our family of test statistics provides a corresponding family of confidence bands. In Section 4 we study the behavior of the new family of test statistics under fixed alternatives. We show that for 0 < s < 1 and fixed alternatives the test statistics always converge almost surely to their corresponding natural parameters. For 1 < s < ∞, we provide necessary and sufficient conditions on the alternative d.f. F for convergence to the corresponding natural parameter to hold, and show that the Poisson boundary phenomena noted by Berk and Jones for their statistic continue to hold for s ≥ 1 and for s < 0 by identifying the Poisson boundary distributions explicitly. We also briefly discuss further large deviation results and connections between the work of Berk and Jones [5] and Groeneboom and Shorack [19].
In Section 5 we extend the results of Donoho and Jin [15] by showing that all our new tests for s ∈ [−1, 2] have the same optimal detection boundary for normal shift mixture alternatives as Tukey's higher-criticism statistic and the Berk–Jones statistic. Our new family of test statistics not only provides a unifying framework for the study of a number of existing test statistics as special cases, but also gives the possibility of designing a new test with several different desirable properties. For example, the new statistic S_n(1/2) satisfies both: (a) it consistently estimates its natural parameter for every alternative, and (b) it has the same optimal detection boundary for the two-point normal mixture alternatives of Donoho and Jin [15] as the existing test statistics S_n(2) and S_n(1) considered by these authors.

2. The test statistics. Consider the classical goodness-of-fit problem: suppose that X_1, ..., X_n are i.i.d. F, and let F_n(x) = n^{-1} Σ_{i=1}^n 1{X_i ≤ x} be the empirical distribution function of the sample. We want to test H: F = F_0 versus K: F ≠ F_0, where F_0 is continuous. By the probability integral transformation, we can, without loss of generality, suppose that F_0 is the uniform distribution on [0, 1],

F_0(x) = x, 0 ≤ x ≤ 1, and that all the distribution functions F in the alternative K are defined on [0, 1]. The basic idea behind our new family of tests is simple. For fixed x ∈ (0, 1), the interval is divided into two sub-intervals [0, x] and (x, 1], and we can test the (pointwise) null hypothesis H_x: F(x) = x versus the (pointwise) alternative K_x: F(x) ≠ x using any of the general phi-divergence test statistics K_φ(F_n(x), x) proposed by Csiszár [11] (see also Csiszár [12] and Ali and Silvey [2]) and studied further in a multinomial context by Cressie and Read [10], where φ is a convex function mapping [0, ∞) to the extended reals R ∪ {∞} (cf. Liese and Vajda [32], pages 10 and 22, and Vajda [39]). Then our proposed test statistics are of the form

    S_n(φ) ≡ sup_x K_φ(F_n(x), x)   or   T_n(φ) ≡ ∫_0^1 K_φ(F_n(x), x) dx,

where the supremum and/or integral over x may require some restriction depending on the choice of φ. In our particular case, we define φ = φ_s for s ∈ R by

    φ_s(x) = [1 − s + sx − x^s] / [s(1 − s)],   s ≠ 0, 1,
    φ_1(x) = x log x − x + 1,   s = 1,
    φ_0(x) = log(1/x) + x − 1,   s = 0

(cf. Liese and Vajda [32], page 34), so that

    K_s(u, v) = v φ_s(u/v) + (1 − v) φ_s((1 − u)/(1 − v))
             = [1/(s(1 − s))] {1 − u^s v^{1−s} − (1 − u)^s (1 − v)^{1−s}},   s ≠ 0, 1.

Note that this definition makes φ_s(x) continuous in s for all x in (0, ∞), and hence K_s is continuous in s for all (u, v) ∈ (0, 1)^2. Also note that K_{λ+1}(p, q) = I_2^λ(p : q), where p = (p, 1 − p), q = (q, 1 − q), and I_2^λ(p : q) is as defined in (5.1) of Cressie and Read [10], page 456. Then our proposed test statistics S_n(s) and T_n(s) for s ∈ R are defined by

(1)    S_n(s) ≡ sup_{0<x<1} K_s(F_n(x), x),   if s ≥ 1,
       S_n(s) ≡ sup_{X_(1)≤x<X_(n)} K_s(F_n(x), x),   if s < 1,

and

(2)    T_n(s) ≡ ∫_0^1 K_s(F_n(x), x) dx,   if s > 0,
       T_n(s) ≡ ∫_{X_(1)}^{X_(n)} K_s(F_n(x), x) dx,   if s ≤ 0.

The reasons for changing the definitions of the statistics by restricting the supremum or integral for different values of s will be explained in Section 3; basically, the restrictions must be imposed for some appropriate values of s in order to maintain the same null distribution theory for all values of s in [−1, 2]. The most notable special cases of these statistics are s ∈ {−1, 0, 1/2, 1, 2}: it is easily checked that

    K_2(u, v) = (u − v)^2 / (2v(1 − v)),
    K_1(u, v) = u log(u/v) + (1 − u) log((1 − u)/(1 − v)),
    K_{1/2}(u, v) = 4{1 − √(uv) − √((1 − u)(1 − v))} = 2{(√u − √v)^2 + (√(1 − u) − √(1 − v))^2},
    K_0(u, v) = K_1(v, u) = v log(v/u) + (1 − v) log((1 − v)/(1 − u)),
    K_{−1}(u, v) = K_2(v, u) = (u − v)^2 / (2u(1 − u)).

It follows that:
(a) S_n(2) is (1/2 times) the square of the supremum form of the Anderson–Darling statistic (or, in its one-sided form, Tukey's "higher criticism" statistic; see Donoho and Jin [15] and Section 5).
(b) S_n(1) is the statistic studied by Berk and Jones [5].
(c) S_n(1/2) is (4 times) the supremum of the pointwise Hellinger divergences between Bernoulli(F_n(x)) and Bernoulli(F_0(x)); as far as we know, this is a new goodness-of-fit statistic [as are all the statistics S_n(s) for s ∉ {−1, 0, 1, 2}].
(d) S_n(0) is the "reversed" Berk–Jones statistic introduced by Jager and Wellner [24].
(e) S_n(−1) is (1/2 times) a Studentized version of the supremum form of the Anderson–Darling statistic; see, for example, Eicker [17].
(f) T_n(1) is the integral form of the Berk–Jones statistic introduced by Einmahl and McKeague [18].
(g) T_n(2) is (the classical integral form of) the Anderson–Darling statistic introduced by Anderson and Darling [3].

REMARK 2.1. Note that K_{1/2−r}(u, v) = K_{1/2+r}(v, u) for r ∈ R and u, v ∈ (0, 1), so the families of statistics S_n(s) and T_n(s) have a natural symmetry about s = 1/2. We will continue to use the s-parametrization of these families for reasons of notational simplicity.
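These closed forms are easy to verify numerically. The following is a minimal Python sketch (our own illustration, not code from the paper or its companion C and R programs): it implements K_s(u, v) as defined above, checks the special cases (a), (c) and (e) together with the symmetry of Remark 2.1, and evaluates S_n(s) by searching the jump points of F_n, using the convexity of K_s(u, ·) so that the supremum over each constancy interval of F_n sits at an endpoint.

```python
import math

def K(s, u, v):
    """Two-cell phi-divergence K_s(u, v); u in [0, 1], v in (0, 1)."""
    if s == 1:  # Berk-Jones divergence, with the convention 0*log(0) = 0
        a = u * math.log(u / v) if u > 0 else 0.0
        b = (1 - u) * math.log((1 - u) / (1 - v)) if u < 1 else 0.0
        return a + b
    if s == 0:  # reversed Berk-Jones divergence: K_0(u, v) = K_1(v, u)
        return K(1, v, u)
    return (1 - u**s * v**(1 - s) - (1 - u)**s * (1 - v)**(1 - s)) / (s * (1 - s))

def S_n(s, sample):
    """S_n(s) of (1); for s < 1 the sup is restricted to [X_(1), X_(n))."""
    x = sorted(sample)
    n = len(x)
    cells = []
    for i in range(1, n):                 # F_n = i/n on [x[i-1], x[i])
        cells += [(i / n, x[i - 1]), (i / n, x[i])]
    if s >= 1:                            # unrestricted sup: add the outer cells
        cells += [(0.0, x[0]), (1.0, x[-1])]
    return max(K(s, u, v) for u, v in cells if 0 < v < 1)

u, v = 0.3, 0.7
assert abs(K(2, u, v) - (u - v)**2 / (2 * v * (1 - v))) < 1e-12      # case (a)
assert abs(K(-1, u, v) - (u - v)**2 / (2 * u * (1 - u))) < 1e-12     # case (e)
hell = 2 * ((u**0.5 - v**0.5)**2 + ((1 - u)**0.5 - (1 - v)**0.5)**2)
assert abs(K(0.5, u, v) - hell) < 1e-12                              # case (c)
r = 0.3  # Remark 2.1: K_{1/2-r}(u, v) = K_{1/2+r}(v, u)
assert abs(K(0.5 - r, u, v) - K(0.5 + r, v, u)) < 1e-12
assert S_n(1, [0.1, 0.4, 0.6, 0.9]) > 0
```

For this tiny sample the Berk–Jones supremum is attained in an outer cell, S_n(1) = −log(0.9), which makes a convenient hand-checkable value.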

3. Distributions under the null hypothesis.

3.1. Finite sample critical points via Noé's recursion. Owen [36] showed how to use the recursions of Noé [35] to obtain finite sample critical points of the Berk–Jones statistic R_n = S_n(1) for values of n up to 1000. (See Shorack and Wellner [38] for an exposition of Noé's methods.) Jager and Wellner [24] pointed out a minor error in the derivations of Owen [36] and extended his results to the reversed Berk–Jones statistic S_n(0). Jager [23] gives exact finite sample computations for the whole family of statistics via Noé's recursions for values of n up to 3000. (The C and R programs are available at the second author's website.) We will not give details of the finite-sample computations here but refer the interested reader to Jager and Wellner [24] and Jager [23]. See Jager and Wellner [24] and Jager [23] for plots of finite sample critical points and several finite sample approximations based on the asymptotic theory given here. During the revision of this paper we learned of an alternative finite-sample recursion for calculation of the null distribution of S_n(2) proposed by Khmaladze and Shinjikashvili [31] which apparently works for n ≤ 10^4. Presumably this alternative recursion could be used for our entire family of statistics, but this has not yet been carried out.

3.2. Asymptotic distribution theory for S_n(s) under the null hypothesis. Limit distribution theory for S_n(2) and S_n(−1) under the null hypothesis follows from the work of Jaeschke [22] and Eicker [17]; see Shorack and Wellner [38], Chapter 16, for an exposition. These results are closely related to the classical results of Darling and Erdős [14]. Berk and Jones [5] stated the asymptotic distribution of their statistic R_n = S_n(1). For details of the proof, see Wellner and Koltchinskii [43], with a minor correction as noted at the end of the proof here.
Here we show that the limit distribution of nS_n(s) − r_n is the same double-exponential extreme value distribution for all −1 ≤ s ≤ 2, where

    r_n = log_2 n + (1/2) log_3 n − (1/2) log(4π),

with log_2 n ≡ log(log n) and log_3 n ≡ log(log_2 n).

THEOREM 3.1 (Limit distribution under null hypothesis). Suppose that the null hypothesis H holds so that F is the uniform distribution on [0, 1]. Then for −1 ≤ s ≤ 2 it follows that

    nS_n(s) − r_n →_d Y_4 ~ E_v^4,

where E_v^4(x) = exp(−4 exp(−x)) = P(Y_4 ≤ x).
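Theorem 3.1 can be turned into a rough large-sample critical value: reject H at level α when nS_n(s) > r_n + y_α, where y_α is the upper-α point of E_v^4. A minimal sketch (our own illustration; the convergence here is of iterated-logarithm type, so this approximation is crude for moderate n and the finite-sample recursions of Section 3.1 are preferable):

```python
import math

def r_n(n):
    """Centering r_n = log_2(n) + (1/2)log_3(n) - (1/2)log(4 pi) of Theorem 3.1."""
    log2n = math.log(math.log(n))          # requires n > e^e so that log_2(n) > 0
    return log2n + 0.5 * math.log(log2n) - 0.5 * math.log(4 * math.pi)

def e4_upper_quantile(alpha):
    """Solve E_v^4(y) = exp(-4 exp(-y)) = 1 - alpha for y."""
    return -math.log(-math.log(1 - alpha) / 4)

def approx_critical_value(n, alpha):
    """Asymptotic approximation to q_n(s, alpha); by Theorem 3.1 the same
    centering and limit serve every s in [-1, 2]."""
    return (r_n(n) + e4_upper_quantile(alpha)) / n

assert approx_critical_value(1000, 0.05) > 0
# More stringent levels give larger critical values:
assert approx_critical_value(1000, 0.01) > approx_critical_value(1000, 0.10)
# Round trip: plugging the quantile back into E_v^4 recovers 1 - alpha.
y = e4_upper_quantile(0.05)
assert abs(math.exp(-4 * math.exp(-y)) - 0.95) < 1e-12
```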

Define

    b_n = √(2 log_2 n),
    c_n = 2 log_2 n + (1/2) log_3 n − (1/2) log(4π),
    d_n = (log n)^5 / n,
    Z_n ≡ sup_{d_n≤x≤1−d_n} √n |F_n(x) − x| / √(x(1 − x)).

As will be seen, the proof involves the following four facts:

Fact 1. Z_n / b_n →_p 1.
Fact 2. b_n Z_n − c_n →_d Y_4 ~ E_v^4.
Fact 3. (1/2) c_n^2 / b_n^2 = r_n + o(1).
Fact 4. nS_n(s) = (1/2) Z_n^2 + o_p(1).

In the ranges s > 2 and s < −1 we do not know a theorem describing the behavior of the statistics S_n(s) under the null hypothesis.

3.3. Confidence bands. Owen [36] showed how the Berk–Jones statistic R_n = S_n(1) can be inverted to obtain confidence bands for an unknown distribution function F. Similarly, the family of statistics S_n(s) yields a new family of confidence bands for F as follows: given a continuous d.f. F on R, define

    S_n(s, F) ≡ sup_{−∞<x<∞} K_s(F_n(x), F(x)),   if s ≥ 1,
    S_n(s, F) ≡ sup_{X_(1)≤x<X_(n)} K_s(F_n(x), F(x)),   if s < 1.

By the (inverse) probability integral transformation, P_F(S_n(s, F) ≥ t) = P_{F_0}(S_n(s) ≥ t) for all t, where S_n(s) is as defined in (1) and F_0 is the uniform distribution on [0, 1]. Hence, with q_n(s, α) denoting the upper α quantile of the distribution of S_n(s) under F_0 (which is computable via Noé's recursion as discussed in Section 3.1 or can be approximated for large n via Theorem 3.1), it follows that

    P_F( S_n(s, F) ≤ q_n(s, α) ) = P_{F_0}( S_n(s) ≤ q_n(s, α) ) = 1 − α

for each fixed α ∈ (0, 1) and n. Hence,

    {F : S_n(s, F) ≤ q_n(s, α)} = {F : L_n(x; s, α) ≤ F(x) ≤ U_n(x; s, α) for all x ∈ R}

yields a family of 1 − α confidence bands for F. Here L_n(x; s, α) and U_n(x; s, α) are random functions determined by s, α, n and the data in a straightforward way; see Owen [36], Jager and Wellner [24] and Jager [23] for details.

3.4. Asymptotic distribution theory for T_n(s) under the null hypothesis. Limit distribution theory for T_n(2) was established by Anderson and Darling [3]. Einmahl and McKeague [18] noted that this carries over to T_n(1) (for a proof, see Wellner and Koltchinskii [43]) and extended T_n(1) to other testing problems. Here

we show that the limit distribution of nT_n(s) is (1/2 times) the Anderson–Darling limit distribution for all s ∈ [−1, 2], namely, the distribution of

(3)    A^2 ≡ ∫_0^1 [U(t)]^2 / (t(1 − t)) dt =_d Σ_{j=1}^∞ Z_j^2 / (j(j + 1)),

where U is a standard Brownian bridge process on [0, 1] and Z_1, Z_2, ... are i.i.d. N(0, 1); see, for example, Shorack and Wellner [38].

THEOREM 3.2 (Limit distribution of T_n(s) under the null hypothesis). Suppose that the null hypothesis H holds so that F is the uniform distribution on [0, 1]. Then for −1 ≤ s ≤ 2 it follows that nT_n(s) →_d A^2/2, where A^2 is the Anderson–Darling limit defined in (3).

We will not study the statistics T_n(s) further in this paper, but intend to continue their study elsewhere.

4. Limit theory under alternatives. The power behavior of individual members of our new family of statistics has previously been studied separately and somewhat in isolation: see, for example, Berk and Jones [5] (for the Berk–Jones statistic), Durbin, Knott and Taylor [16] and D'Agostino and Stephens [13] (for T_n(2) compared to other integral goodness-of-fit statistics) and Nikitin [34] (for treatment of Bahadur efficiencies for many goodness-of-fit statistics). Interest in these test statistics has received new impetus via the use of appropriate one-sided versions of the test statistic S_n(2) in the context of multiple testing problems; see, for example, Donoho and Jin [15], Jin [28] and Meinshausen and Rice [33]. See Cayón, Jin and Treaster [8] for an interesting application to detection of non-Gaussianity in the cosmic microwave background data gathered by the Wilkinson Microwave Anisotropy Probe (WMAP) satellite, and see Cai, Jin and Low [7] for further work on estimation aspects of the problem in connection with the developments in Meinshausen and Rice [33]. The work of Owen [36] on confidence bands derived from the Berk–Jones statistic S_n(1) was apparently motivated in large part by the Bahadur efficiency results of Berk and Jones [4].
It is clear from the results of Donoho and Jin [15] and earlier efforts by Révész [37] to combine the strengths of the Kolmogorov and Jaeschke–Eicker statistics that tests based on any of the statistics S_n(s) will do best against alternatives in the tails. As suggested by one of the referees of our paper, this may well be the "Achilles heel" of such test statistics, since the results of Révész [37] suggest that our statistics will have no asymptotic power for a large class of contiguous alternatives (with departures from the null hypothesis in the middle of the distribution). On the other hand, having a family of statistics such as {S_n(s) : s ∈ R} available gives the possibility of choosing (or "designing") a test with several desirable properties. We will return to this briefly in Section 6.

Here we study convergence of the family of statistics to their natural parameters under fixed alternatives, comment briefly on the Bahadur efficiency results of Berk and Jones [5] in light of the results of Groeneboom and Shorack [19], and show that the optimal detection boundary results of Donoho and Jin [15] extend to the whole family of statistics S_n(s) for s ∈ [−1, 2]. In spite of the negative results of Janssen [27] for goodness-of-fit statistics in general, much remains to be learned about the power behavior of the family {S_n(s)}.

4.1. Almost sure convergence to natural parameter. Let F_0 be the Uniform(0, 1) distribution function as in Section 2. The Kolmogorov statistic D_n ≡ ‖F_n − F_0‖ has the property that for any distribution function F, if X_1, ..., X_n are i.i.d. F, then D_n →_{a.s.} ‖F − F_0‖ ≡ d(F). We call d(F) = ‖F − F_0‖ the natural parameter for the Kolmogorov statistic D_n. As Berk and Jones [5] pointed out for their statistic R_n = S_n(1), under alternatives F the convergence

    S_n(1) = R_n = sup_{0<x<1} K_1(F_n(x), x) →_{a.s.} sup_{0<x<1} K_1(F(x), x) ≡ r(F)

holds only under some condition on F (the exact condition will be given below), and for a slightly more extreme F, namely, what we call the Poisson boundary distribution function, the behavior changes to convergence in distribution to a functional of a Poisson process rather than convergence to a natural parameter. Thus, Berk and Jones [5] showed that if F(x) = 1/(1 + log(1/x)), then

(4)    S_n(1) = R_n →_d sup_{t>0} N(t)/t =_d 1/U,

where N is a standard Poisson process and U ~ Uniform(0, 1). It turns out that in the range 0 < s < 1 the statistics S_n(s) behave analogously to the Kolmogorov statistic D_n under fixed alternatives. Namely, we show that in this range the statistics converge almost surely to their natural parameter for all d.f.'s F.

PROPOSITION 4.1. Suppose that X_1, ..., X_n are i.i.d. F and that 0 < s < 1. Then

    S_n(s) →_{a.s.} sup_{0<x<1} K_s(F(x), x) ≡ S(s, F).
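The almost-sure convergence to a natural parameter is easy to see in simulation for the Kolmogorov statistic. The sketch below (our own illustration, with an arbitrarily chosen alternative) draws from F(x) = x^2 on [0, 1], for which d(F) = sup_x |x^2 − x| = 1/4, attained at x = 1/2:

```python
import math
import random

def kolmogorov_D(sample):
    """D_n = sup_x |F_n(x) - x| against the Uniform(0, 1) null, computed
    over the order statistics in the usual way."""
    x = sorted(sample)
    n = len(x)
    return max(max(i / n - x[i - 1], x[i - 1] - (i - 1) / n)
               for i in range(1, n + 1))

random.seed(1)
n = 20000
# X = sqrt(U) has d.f. F(x) = x^2 on [0, 1].
sample = [math.sqrt(random.random()) for _ in range(n)]
d_hat = kolmogorov_D(sample)
assert abs(d_hat - 0.25) < 0.02   # D_n -> d(F) = 1/4 a.s.
```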
On the other hand, in the range s > 1 we have the following criterion for almost sure convergence of the statistics S_n(s) to their natural parameters:

PROPOSITION 4.2. Suppose that X_1, ..., X_n are i.i.d. F and that s > 1. Then

    S_n(s) →_{a.s.} sup_{0<x<1} K_s(F(x), x) ≡ S(s, F)

if and only if F satisfies

    ∫_0^1 [u(1 − u)]^{−(s−1)/s} dF(u) < ∞.

By the (inverse) probability integral transformation, the convergence in the last display is equivalent to E_F [X(1 − X)]^{−1+(1/s)} < ∞. As Berk and Jones [5] show, if for some γ > 0 the distribution function F satisfies

    F(x) ≤ {log(1/x) (log_2(1/x))^{1+γ}}^{−1},   x ≤ γ,

and

    1 − F(x) ≤ {log(1/(1 − x)) (log_2(1/(1 − x)))^{1+γ}}^{−1},   x ≥ 1 − γ,

then

    R_n ≡ S_n(1) →_{a.s.} sup_{0<x<1} K_1(F(x), x) ≡ S(1, F) ≡ r(F) < ∞.

It can be shown that this convergence holds if and only if

    ∫_0^1 [x(1 − x)]^{−1} F(x)(1 − F(x)) dx < ∞;

see Jager [23] for details. We do not yet know sharp conditions for S_n(s) →_{a.s.} S(s, F) when s ≤ 0.

4.2. Poisson boundaries for s ≥ 1 and s < 0. As noted in the previous subsection, the statistic R_n = S_n(1) has a Poisson boundary d.f. F_1 for which R_n = S_n(1) →_d 1/U rather than R_n = S_n(1) →_{a.s.} r(F) ≡ S(1, F). Here we note that this behavior persists for the entire range s ≥ 1 and for s < 0. For each fixed s ∈ [0, 1)^c, define the distribution function F_s on [0, 1] by F_s(0) = 0 and, for 0 < x ≤ 1, by

(5)    F_s(x) = (1 + (x^{1−s} − 1)/(s − 1))^{−1/s},   1 < s < ∞,
       F_s(x) = (1 + log(1/x))^{−1},   s = 1,
       F_s(x) = x (1 − s(x^{−1} − 1))^{1/s},   s < 0.

Note that F_s(x) → F_1(x) as s ↓ 1 for 0 ≤ x ≤ 1. The following proposition includes the result (4) of Berk and Jones [5] when s = 1, and it agrees with the case b = 1/2 of Theorem 2 of Jager and Wellner [25] when s = 2. As in (4), let N be a standard Poisson process, and let U ~ Uniform(0, 1).

PROPOSITION 4.3 (Poisson boundaries for s ≥ 1 and s < 0). (i) Fix s ≥ 1 and suppose that X_1, ..., X_n are i.i.d. F_s given in (5). Then

    S_n(s) →_d (1/s) sup_{t>0} (N(t)/t)^s =_d 1/(sU^s).

(ii) Fix s < 0 and suppose that X_1, ..., X_n are i.i.d. F_s given in (5). Then

(6)    S_n(s) →_d (1/(1 − s)) sup_{t≥S_1} (t/N(t))^{−s},

where S_1 = E_1 is the first jump point of N.
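The s > 1 branch of (5) has convenient special values: at s = 2 it reduces to F_2(x) = √x, and as s ↓ 1 it recovers F_1(x) = 1/(1 + log(1/x)). A short sketch checking these facts numerically (our own illustration; only the s ≥ 1 branch is implemented):

```python
import math

def F_boundary(s, x):
    """Poisson-boundary d.f. F_s of (5) for s >= 1 (the s < 0 branch is omitted)."""
    if x <= 0.0:
        return 0.0
    if s == 1:
        return 1.0 / (1.0 + math.log(1.0 / x))
    return (1.0 + (x**(1 - s) - 1.0) / (s - 1.0)) ** (-1.0 / s)

for x in (0.01, 0.1, 0.5, 0.9, 1.0):
    assert abs(F_boundary(2, x) - math.sqrt(x)) < 1e-12            # F_2(x) = sqrt(x)
    assert abs(F_boundary(1 + 1e-8, x) - F_boundary(1, x)) < 1e-6  # continuity at s = 1
grid = [i / 100 for i in range(1, 101)]
vals = [F_boundary(1.5, x) for x in grid]
assert all(a < b for a, b in zip(vals, vals[1:]))  # F_s is strictly increasing
assert abs(F_boundary(1.5, 1.0) - 1.0) < 1e-12     # F_s(1) = 1
```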

REMARK 4.1. The distribution of sup_{t≥S_1}(t/N(t)), which is also the limiting distribution of sup{t/G_n(t) : ξ_(1) ≤ t ≤ 1} where G_n is the empirical distribution function of n i.i.d. Uniform(0, 1) random variables ξ_1, ..., ξ_n, is given by

    P( sup_{t≥S_1} (t/N(t)) ≤ x ) = exp(−x)(1 − δ(x))/δ(x),   x > 1,

where δ(x) ≡ Σ_{k=1}^∞ ((kx)^{k−1}/k!) exp(−kx) is the smallest solution of δ = exp(−x(1 − δ)); see Wellner [40] and Shorack and Wellner [38]. This yields an explicit formula for the distribution of the random variable on the right-hand side of (6).

REMARK 4.2. Although the family of distributions F_s satisfies F_s(x) → x exp(−(1/x − 1)) ≡ F̃_0(x) as s ↑ 0, it appears that the natural limit under F̃_0 is S_n(0) → sup_{0<x<1} K_0(F̃_0(x), x) in this case, so apparently convergence to the natural parameter continues to hold under F̃_0. We do not know if there is a (more extreme) d.f. for which S_n(0) →_d g(N) for a nondegenerate functional g of a standard Poisson process N.

REMARK 4.3. If F ∈ K has Poisson boundary behavior at both 0 and 1, then natural generalizations of Proposition 4.3 involving two independent Poisson processes can easily be proved. For example, if F is the standard arcsine law with density π^{−1} u^{−1/2} (1 − u)^{−1/2} 1_{(0,1)}(u), then

    S_n(2) →_d (2/π^2) max{ sup_{t>0} (N(t)/t)^2, sup_{t>0} (Ñ(t)/t)^2 },

where N, Ñ are independent standard Poisson processes.

4.3. Bahadur efficiency comparisons. Berk and Jones [5] studied the Bahadur efficiency of their statistic S_n(1) = R_n relative to weighted Kolmogorov statistics based on the work of Abrahamson [1]. As pointed out by Groeneboom and Shorack [19], however, the Bahadur efficacies of the weighted Kolmogorov statistics are 0 for weights heavier than the (quite light) logarithmic weight function ψ(x) = −log(x(1 − x)) because the null distribution large-deviation result is degenerate for heavier weights. A second difficulty for Bahadur efficiency comparisons is that both the weighted Kolmogorov statistics and the Berk–Jones statistic fail to converge almost surely to their natural parameters for sufficiently extreme alternative d.f.'s
F, and as noted by Berk and Jones [5] for the Berk–Jones statistic and by Jager and Wellner [25] for the weighted Kolmogorov statistics, there is a certain Poisson boundary d.f. F for which the statistics converge in distribution to a functional of a Poisson process. Thus, comparisons of goodness-of-fit statistics of the supremum type via Bahadur efficiency are rendered difficult by breakdowns in both the large-deviation theory under the null hypothesis and by failure of the statistics to converge almost surely under fixed alternatives. Nevertheless, it would be interesting to be able to make comparisons where possible.

To this end, we consider variants of our statistics in the range 0 < s < 1 with the supremum unrestricted as follows:

    S_n^{ur,+}(s) ≡ sup_{0<x<1} K_s^+(F_n(x), x),
    S_n^{ur}(s) ≡ sup_{0<x<1} K_s(F_n(x), x),
    S_n^{ur,−}(s) ≡ sup_{0<x<1} K_s^−(F_n(x), x),

where

    K_s^+(u, v) = K_s(u, v), if 0 < v < u < 1;  0, if 0 ≤ u ≤ v ≤ 1;  ∞, otherwise,
    K_s^−(u, v) = K_s(u, v), if 0 < u < v < 1;  0, if 0 ≤ v ≤ u ≤ 1;  ∞, otherwise.

Also, set K(u, v) = K_1(u, v) = u log(u/v) + (1 − u) log((1 − u)/(1 − v)) for (u, v) ∈ (0, 1)^2 and K^+(u, v) = K_1^+(u, v). Although we do not yet have large deviation results for the statistics S_n(s) or S_n^+(s) ≡ sup_{X_(1)≤x<X_(n)} K_s^+(F_n(x), x), we can establish the following large deviation results for S_n^{ur,+}(s) and S_n^{ur,−}(s).

THEOREM 4.4. Suppose that X_1, ..., X_n are i.i.d. with continuous d.f. F_0, the uniform distribution on (0, 1). Fix s ∈ (0, 1). Then

(7)    n^{−1} log P_0( S_n^{ur,+}(s) ≥ a ) → −inf_{0<x<1} K^+(τ_s^+(x, a), x) = (1/(1 − s)) log[1 − s(1 − s)a] ≡ −g_s^+(a)

for each 0 ≤ a < 1/[s(1 − s)], where τ_s^+(x, a) = inf{t : K_s^+(t, x) ≥ a}. Furthermore,

    n^{−1} log P_0( S_n^{ur,−}(s) ≥ a ) → −inf_{0<x<1} K^−(τ_s^−(x, a), x) = (1/(1 − s)) log[1 − s(1 − s)a] ≡ −g_s^−(a)

for each a ≥ 0, where τ_s^−(x, a) = sup{t : K_s^−(t, x) ≥ a}.

Combining Theorem 4.4 with Proposition 4.1, we have the following corollary for the Bahadur efficacies of the statistics S_n^{ur,+}(s) and S_n^{ur,−}(s) with 0 < s < 1.

COROLLARY 4.5. Let F be a continuous distribution function on [0, 1]. Then the Bahadur efficacy of S_n^{ur,±}(s) at the alternative F is

    ε_s^±(F) = g_s^±(S^±(s, F)) = g_s^+(S^±(s, F)),

where g_s^+ is defined in (7) and S^±(s, F) ≡ sup_{0<x<1} K_s^±(F(x), x).
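A quick numerical look at g_s^+ (our own illustration, using the closed form appearing in (7) above) makes the behavior of the rate visible at both ends of the parameter range:

```python
import math

def g_plus(s, a):
    """g_s^+(a) = -(1/(1-s)) log(1 - s(1-s)a), for 0 < s < 1 and
    0 <= a < 1/(s(1-s)) (the range of the statistic)."""
    return -math.log(1.0 - s * (1.0 - s) * a) / (1.0 - s)

a = 0.8
# g_s^+(a) -> a as s -> 1 (the Berk-Jones large deviation rate):
assert abs(g_plus(1 - 1e-6, a) - a) < 1e-5
# g_s^+(a) ~ s*a as s -> 0, so the rate degrades for small s:
s = 1e-6
assert abs(g_plus(s, a) / (s * a) - 1.0) < 1e-4
# g_s^+ is increasing in a on the admissible range:
assert g_plus(0.5, 1.0) > g_plus(0.5, 0.5) > g_plus(0.5, 0.0) == 0.0
```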

REMARK 1. Note that lim_{s→1} g_s^+(a) = a, in agreement with Theorem 2.2, page 50, of Berk and Jones [5].

REMARK 2. Since g_s^+(a) = g_s^−(a) ≈ sa as s → 0, the Bahadur efficacies of the statistics S_n^{ur,±}(s) tend to be smaller than the efficacies of the Berk–Jones statistic (with S^+(1, F) = r(F) when the latter exists), and especially so for small s. This, together with extensive numerical computations of Jager [23], strengthens the case in favor of the statistics S_n^+(s) = sup_{X_(1)≤x<X_(n)} K_s^+(F_n(x), x) with restricted supremum. Unfortunately we do not yet know the large deviation behavior of these statistics with restricted supremum.

5. Attainment of the Ingster–Donoho–Jin optimal detection boundary. Jin [28] and Donoho and Jin [15] consider testing in a sparse heterogeneous mixture problem defined as follows: Suppose that Y_1, ..., Y_n are i.i.d. G on R and consider testing H_0: G = Φ, the standard N(0, 1) distribution function, versus H_1: G = (1 − ε)Φ + εΦ(· − µ) for some ε ∈ (0, 1), µ > 0. In particular, they consider the n-dependent alternatives H_1^{(n)} given by

(8)    H_1^{(n)}: G_n = (1 − ε_n)Φ + ε_n Φ(· − µ_n)   for ε_n = n^{−β}, µ_n = √(2r log n),

where 1/2 < β < 1 and 0 < r < 1. By transforming to X_i ≡ Φ̄(Y_i), with Φ̄ ≡ 1 − Φ (so the X_i's are i.i.d. F ≡ 1 − G(Φ̄^{−1}(·)) taking values in [0, 1]), the testing problem becomes: test H_0: F = F_0, the Uniform(0, 1) distribution function, versus

    H_1: F(u) = u + ε{Φ̄(Φ̄^{−1}(u) − µ) − u} > F_0(u) ≡ u.

[The corresponding n-dependent sequence is F_n(u) = u + ε_n{Φ̄(Φ̄^{−1}(u) − µ_n) − u} with the same choice of ε_n and µ_n as in (8).] Donoho and Jin [15] consider several different test statistics, among which the principal contenders are Tukey's higher criticism statistic HC_n^* defined by

    HC_n^* ≡ sup_{X_(1)≤x<X_([α_0 n])} √n (F_n(x) − x) / √(x(1 − x))

for some α_0 > 0 (they seem to usually take α_0 = 1/2), and a one-sided version of the Berk–Jones statistic,

    BJ_n^+ ≡ n sup_{X_(1)≤x<1/2} K^+(F_n(x), x),

where K_s^+(u, v) ≡ K_s(u, v) 1{0 < v < u < 1}.

Jin [28] (see also Ingster [20, 21]) showed that the likelihood ratio test of H_0 versus H_1^{(n)} has a detection boundary defined in terms of the parameters β ∈ (1/2, 1) and r ∈ (0, 1) involved in (8) which is described as follows: set

    ρ*(β) = β − 1/2,   1/2 < β ≤ 3/4,
    ρ*(β) = (1 − √(1 − β))^2,   3/4 < β < 1.

Then for r > ρ*(β), the likelihood ratio test (which makes use of knowledge of β and r) is size and power consistent against H_1^{(n)} as n → ∞. Donoho and Jin [15] show that the tests of H_0 versus H_1^{(n)} based on HC_n^* and BJ_n^+ are also size and power consistent as n → ∞ and that both of these tests dominate several other tests based on multiple comparison procedures such as the sample range, sample maximum, FDR (False Discovery Rate) and Fisher's method; see, for example, Figure 1 of Donoho and Jin [15] and their Theorems 1.4 and 1.5. We show here that the tests based on appropriate one-sided versions of the statistics S_n(s), namely,

    nS_n^+(s) ≡ n sup_{X_(1)≤x≤1/2} K_s^+(F_n(x), x),

have the same detection boundary for testing H_0 versus H_1^{(n)} as the statistics HC_n^* and BJ_n^+. More formally, define a function ρ_s(β) such that if µ_n = √(2r log n) and if we use a sequence of levels α_n → 0 slowly enough [slowly enough so that, with q_n(s, α) as defined in Section 3.3, α_n satisfies nq_n(s, α_n) = (1 + o(1)) log log n], then for r > ρ_s(β) the resulting sequence of tests has power tending to 1 as n → ∞, while for r < ρ_s(β) the sequence of tests has power tending to zero. In terms of the functions ρ_s(β), our theorem is as follows.

THEOREM 5.1. For each s ∈ [−1, 2], ρ_s(β) = ρ*(β) for 1/2 < β < 1.

While this may not be too surprising for 1 ≤ s ≤ 2 in view of the Donoho–Jin results for nS_n^+(1) = BJ_n^+ and nS_n^+(2) = (1/2)(HC_n^*)^2, it seems new and interesting for s ∈ [−1, 1). Figure 1 gives smoothed histograms of the values of the statistics nS_n^+(s) − r_n under the null hypothesis H_0 (solid line) and under the alternative hypothesis H_1^{(n)} (dotted line) for r = 0.5 and β = 1/2.
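The boundary ρ*(β) of the display above is elementary to code; note that the two pieces agree at β = 3/4, so ρ* is continuous. A minimal sketch (our own illustration):

```python
import math

def rho_star(beta):
    """Ingster-Donoho-Jin detection boundary rho*(beta) for 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2

assert abs(rho_star(0.6) - 0.1) < 1e-12
# continuity at beta = 3/4: (1 - sqrt(1/4))^2 = 1/4 = 3/4 - 1/2
assert abs(rho_star(0.75) - 0.25) < 1e-12
assert abs(rho_star(0.7500001) - 0.25) < 1e-6
# rho* is increasing and stays below r = 1 on (1/2, 1)
betas = [0.51 + 0.01 * i for i in range(49)]
vals = [rho_star(b) for b in betas]
assert all(a < b for a, b in zip(vals, vals[1:])) and vals[-1] < 1
```

For a pair (β, r) with r > ρ*(β), Theorem 5.1 says every test in the family s ∈ [−1, 2] (with levels α_n chosen as above) achieves full power in the limit.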
This should be compared with Figure 2 on page 978 of Donoho and Jin [15] showing values of HC_n^* and HC_n^+, corresponding to our s = 2; their

    HC_n^+ ≡ sup_{1/n≤x≤1/2} √n (F_n(x) − x)^+ / √(x(1 − x)).

6. Discussion and some further problems. In Section 4 we have shown that the statistics {S_n(s) : s ∈ R} behave quite differently for 0 < s < 1, for s ≤ 0 and s ≥ 1. In particular, for 0 < s < 1, the statistics S_n(s) converge almost surely to their natural parameters for fixed F ≠ F_0. Moreover, the different Poisson boundary behaviors for s < 0 and s ≥ 1 suggest that the statistics S_n(s) with

FIG. 1. Smoothed histograms of values of the statistics nS_n^+(s) − r_n under the null hypothesis H_0 (solid line) and alternative hypothesis H_1^{(n)} (dotted line) with r = 0.5, β = 1/2, for s = 0, 0.2, 0.5, 1.

s ≥ 1 are geared toward heavy tails, while the statistics S_n(s) with s ≤ 0 are geared more toward light tails. This also becomes apparent from plots of the functions K_1(F_1(x), x) and K_1(F̃_0(x), x), and of K_0(F_1(x), x) and K_0(F̃_0(x), x), where F_1(x) = 1/(1 + log(1/x)) and F̃_0(x) = x exp(−(1/x − 1)); see, for example, Jager [23]. In Section 5 we have shown that all of the statistics {S_n(s) : −1 ≤ s ≤ 2} have the same optimal detection boundary for the two-point normal mixture testing problem considered by Donoho and Jin [15]. Thus, we have some flexibility in designing a test to detect these subtle tail alternatives, and yet behaving very stably under fixed alternatives (in the sense of always consistently estimating a natural parameter).

Thus, it seems that the (Hellinger-type) statistic S_n(1/2) may be a very reasonable compromise test statistic. Here is a brief listing of some of the remaining open problems:

Problem 1: What is the limit distribution of S_n(s) under the null hypothesis when s < −1 or s > 2?
Problem 2: What are necessary and sufficient conditions for S_n(s) to converge to its natural parameter under fixed alternatives F for s ≤ 0?
Problem 3: What is the large deviation behavior of S_n(s) under the null hypothesis for 0 < s < 1?
Problem 4: Is there an appropriate contiguity theory for the statistics S_n(s)? (The only example involving something similar of which we are aware is Theorem A of Bickel and Rosenblatt [6], but their results do not seem to apply to the statistics S_n(s).)

It is fairly easy to construct versions of our statistics S_n(s) in more general settings by replacing the intervals [0, x] and (x, 1] with sets C and C^c for C in some class of sets 𝒞. Then for testing H_0: P = P_0 versus H_1: P ≠ P_0, a natural generalization of the statistics S_n(s) is

    S_n(s, 𝒞) = sup_{C∈𝒞} K_s(P_n(C), P_0(C)),

where P_n is the empirical measure of X_1, ..., X_n i.i.d. P.

Problem 5: Do the statistics S_n(s, 𝒞) have reasonable power behavior for some of the chimeric alternatives of Khmaladze [30] for some choice of 𝒞?

7. Proofs.

7.1. Proofs for Section 3.

PROOF OF THEOREM 3.1. We first carry out the proof for s < 2, and then indicate the changes that are necessary for s = 2. Fix s ∈ [−1, 2). Note that

    (∂/∂u) K_s(u, v) |_{u=v} = [φ_s′(u/v) − φ_s′((1 − u)/(1 − v))] |_{u=v} = φ_s′(1) − φ_s′(1) = 0

and

    (∂²/∂u²) K_s(u, v) = (1/v)(u/v)^{s−2} + (1/(1 − v))((1 − u)/(1 − v))^{s−2} ≡ D_s(u, v).

Hence, it follows by Taylor expansion of u ↦ K_s(u, v) about u = v that

    K_s(u, v) = K_s(v, v) + (∂/∂u) K_s(u, v)|_{u=v} (u − v) + (1/2) (∂²/∂u²) K_s(u, v)|_{u=u*} (u − v)^2
             = (1/2)(u − v)^2 D_s(u*, v)

for some u* satisfying |u* − v| ≤ |u − v|. This yields

(9)    K_s(F_n(x), x) = (1/2)(F_n(x) − x)^2 D_s(F_n*(x), x)   for 0 < x < 1,

where |F_n*(x) − x| ≤ |F_n(x) − x|; that is, x ≤ F_n*(x) ≤ F_n(x) on the event {x ≤ F_n(x)} and F_n(x) ≤ F_n*(x) ≤ x on the event {F_n(x) ≤ x}. We can write (9) as

(10)    K_s(F_n(x), x) = [(F_n(x) − x)^2 / (2x(1 − x))] {1 + Rem_n(x)},

where

    Rem_n(x) ≡ x(1 − x) D_s(F_n*(x), x) − 1
            = (1 − x){(F_n*(x)/x)^{s−2} − 1} + x{((1 − F_n*(x))/(1 − x))^{s−2} − 1}.

Fix δ ∈ (0, 1/2). Now for x ∈ [δ, 1 − δ], F_n*(x) ∈ [δ/2, 1 − δ/2] a.s. for n ≥ N_ω, so, much as in Wellner and Koltchinskii [43],

    sup_{δ≤x≤1−δ} |Rem_n(x)| = O_p(n^{−1/2}).

For 0 < v ≤ 1/2 and 1 ≤ s ≤ 2, the function u ↦ D_s(u, v) is monotone for u ∈ (0, 1/2], while for 0 < v ≤ 1/2 and −1 ≤ s ≤ 1, the function u ↦ D_s(u, v) is monotone for u ∈ (0, b(v, s)], where b(v, s) ≡ 1/(1 + c(v, s)) with c(v, s) ≡ ((1 − v)/v)^{(1−s)/(3−s)}, so that

    x(1 − x) D_s(F_n(x), x) ≤ x(1 − x){D_s(x, x) ∨ D_s(F_n(x), x)}

on the set {F_n(x) < 1/2 ∧ b(x, s)}. Since P(F_n(δ) ≥ 1/2 ∧ b(δ, s)) → 0 for δ < 1/2 and s ∈ [−1, 2], we get

(11)    sup_{X_(1)≤x≤δ} |Rem_n(x)| ≤ sup_{X_(1)≤x≤δ} (x/F_n(x))^{2−s} + sup_{X_(1)≤x≤δ} (F_n(x)/x)^{2−s} = O_p(1) + o_p(1) = O_p(1)

by Shorack and Wellner [38], inequality 1, page 415, and inequality 2, (10.3.6), page 416. Alternatively, see Wellner [42], Lemma 2, page 75, and Remark 1(ii). Similarly,

(12)    sup_{1−δ ≤ x < X_(n)} |Rem_n(x)| ≤ sup_{1−δ ≤ x < X_(n)} |(x/F_n(x))^{2−s} − 1| + sup_{1−δ ≤ x < X_(n)} ((1 − x)/(1 − F_n(x)))^{2−s}
        = o_p(1) + O_p(1) = O_p(1).

Now we write

    S_n(s) = S_n(s, I) ∨ S_n(s, II) ∨ S_n(s, III),

where

    S_n(s, I) ≡ sup_{δ ≤ x ≤ 1−δ} K_s(F_n(x), x) = (1/2) sup_{δ ≤ x ≤ 1−δ} (F_n(x) − x)² D_s(F*_n(x), x),

    S_n(s, II) ≡ sup_{X_(1) ≤ x ≤ δ} K_s(F_n(x), x) = (1/2) sup_{X_(1) ≤ x ≤ δ} (F_n(x) − x)² D_s(F*_n(x), x),

    S_n(s, III) ≡ sup_{1−δ ≤ x < X_(n)} K_s(F_n(x), x) = (1/2) sup_{1−δ ≤ x < X_(n)} (F_n(x) − x)² D_s(F*_n(x), x).

By the monotonicity of u ↦ D_s(u, v) for u ≤ 1/2 again, with probability tending to 1,

    (1/2) sup_{X_(1) ≤ x ≤ δ} (F_n(x) − x)² {D_s(F_n(x), x) ∧ D_s(x, x)}
        ≤ S_n(s, II) ≤ (1/2) sup_{X_(1) ≤ x ≤ δ} (F_n(x) − x)² {D_s(F_n(x), x) ∨ D_s(x, x)},

and similarly, by the monotonicity of u ↦ D_s(u, v) for 1/2 ≤ u < 1,

    (1/2) sup_{1−δ ≤ x < X_(n)} (F_n(x) − x)² {D_s(F_n(x), x) ∧ D_s(x, x)}
        ≤ S_n(s, III) ≤ (1/2) sup_{1−δ ≤ x < X_(n)} (F_n(x) − x)² {D_s(F_n(x), x) ∨ D_s(x, x)}.

For S_n(s, I),

    S_n(s, I) = (1/2) sup_{δ ≤ x ≤ 1−δ} [(F_n(x) − x)²/(x(1 − x))] {1 + O_p(n^{−1/2})}.

In the second region the argument above leading to (11) yields the corresponding two-sided bounds for S_n(s, II), and similarly for S_n(s, III). It follows that

(13)    S_n(s) ≤ (1/2) sup_{X_(1) ≤ x < X_(n)} (F_n(x) − x)² {D_s(F_n(x), x) ∨ D_s(x, x)} {1 + O_p(n^{−1/2})}
            = (1/2) sup_{X_(1) ≤ x < X_(n)} [(F_n(x) − x)²/(x(1 − x))] {x(1 − x) D_s(F_n(x), x) ∨ 1} {1 + O_p(n^{−1/2})}

and, on the other hand,

(14)    S_n(s) ≥ (1/2){1 + O_p(n^{−1/2})} sup_{X_(1) ≤ x < X_(n)} (F_n(x) − x)² {D_s(F_n(x), x) ∧ D_s(x, x)}
            = (1/2){1 + O_p(n^{−1/2})} sup_{X_(1) ≤ x < X_(n)} [(F_n(x) − x)²/(x(1 − x))] {x(1 − x) D_s(F_n(x), x) ∧ 1}.

Now we break the supremum into the regions [X_(1), d_n], [d_n, 1 − d_n] and [1 − d_n, X_(n)) with d_n ≡ (log n)^k/n for any k ≥ 1. Then we have

    sup_{X_(1) ≤ x ≤ d_n} n(F_n(x) − x)²/(x(1 − x)) = o_p(b_n²),

where b_n ≡ (2 log_2 n)^{1/2}; see Shorack and Wellner [38], (26), page 602. Moreover,

    sup_{X_(1) ≤ x ≤ d_n} x(1 − x) D_s(F_n(x), x) = O_p(1),

so

(15)    sup_{X_(1) ≤ x ≤ d_n} n [(F_n(x) − x)²/(x(1 − x))] {x(1 − x) D_s(F_n(x), x) # 1} = o_p(b_n²)

for # = ∨ or # = ∧, and similarly for the region [1 − d_n, X_(n)) by a symmetric argument. On the other hand, if we define

    Z_n ≡ sup_{d_n ≤ x ≤ 1−d_n} n^{1/2} |F_n(x) − x| / (x(1 − x))^{1/2},

then, for k ≥ 5,

(16)    Z_n/b_n →_p 1

and

(17)    b_n Z_n − c_n →_d Y_4 ∼ E_v^4,

where c_n ≡ 2 log_2 n + (1/2) log_3 n − (1/2) log(4π) and E_v(x) = exp(−e^{−x}) denotes the standard extreme value distribution function (see, e.g., Shorack and Wellner [38], page 600, (16.1.20) and (16.1.7)). [Note that for the middle term in (15) we have

    x(1 − x) D_s(F_n(x), x) = (1 − x)(x/F_n(x))^{2−s} + x((1 − x)/(1 − F_n(x)))^{2−s},

so

    sup_{X_(1) ≤ x ≤ d_n} x(1 − x) D_s(F_n(x), x) ≤ sup_{X_(1) ≤ x ≤ d_n} (x/F_n(x))^{2−s} + 1 + o_p(1),

so by Wellner [42], Remark 1, the probability of large values of the main term can be bounded by

    P(sup_{X_(1) ≤ x ≤ d_n} (x/F_n(x))^{2−s} > λ) = P(sup_{X_(1) ≤ x ≤ d_n} x/F_n(x) > λ^{1/(2−s)})
        ≤ P(sup_{X_(1) ≤ x} x/F_n(x) > λ^{1/(2−s)}) ≤ e λ^{1/(2−s)} exp(−λ^{1/(2−s)}).]

Furthermore,

(18)    sup_{d_n ≤ x} |F_n(x) − x|/x = O(a_n)   almost surely,

where

    a_n ≡ (2 log_2 n/(n d_n))^{1/2} = (2 log_2 n/(log n)^k)^{1/2} → 0;

see Shorack and Wellner [38], page 424, (14.5.10) and (14.5.11). It follows from (13), (14) and (15)–(18) that

(19)    n S_n(s) = (1/2) sup_{d_n ≤ x ≤ 1−d_n} {[n(F_n(x) − x)²/(x(1 − x))](1 + O(a_n)) ∨ o_p(b_n²)} {1 + O_p(n^{−1/2})}
              = (1/2){Z_n² ∨ o_p(b_n²)} + o_p(1).

Hence, we can write

    (1/2) Z_n² = (1/2)(Z_n − c_n/b_n)(Z_n + c_n/b_n) + (1/2) c_n²/b_n²
              = (1/2)(b_n Z_n − c_n)(Z_n + c_n/b_n)/b_n + (1/2) c_n²/b_n².

It follows that

(20)    n S_n(s) − (1/2) c_n²/b_n² = [(b_n Z_n − c_n)(Z_n + c_n/b_n)/(2 b_n)] ∨ [o_p(b_n²) − (1/2) c_n²/b_n²] + o_p(1)
                                  = [(b_n Z_n − c_n)(Z_n/b_n + c_n/b_n²)/2] ∨ [o_p(b_n²) − (1/2) c_n²/b_n²] + o_p(1)
                                  →_d Y_4 · 1 ∨ {−∞} = Y_4;

here we used c_n²/b_n² ∼ b_n² in the second equality. Since

    (1/2) c_n²/b_n² = log_2 n + (1/2) log_3 n − (1/2) log(4π) + o(1) = r_n + o(1),

this yields

(21)    P(n S_n(s) − r_n ≤ x) → exp(−4 exp(−x)),

and completes the proof of Theorem 3.1. (Note that the centering c_n²/(2 b_n²) emerges naturally in the course of this proof.)

This completes the proof for the case s ∈ [−1, 1). For 1 ≤ s ≤ 2, there are two additional terms that enter, and both of these are o_p(b_n²) from the arguments in the previous section. The case s = 2 is easy since in this case v(1 − v) D_s(u, v) = 1 for all u, while the result was stated for the case s = 1 by Berk and Jones [4] and proved in Wellner and Koltchinskii [43]. (Wellner and Koltchinskii [43] incorrectly claim (page 324) that K(F_n(x), x) = 0 if x < X_(1); in fact, the supremum over this region is stochastically bounded and, hence, can be neglected.)

PROOF OF THEOREM 3.2. The fact that n T_n(2) = A_n²/2 →_d A²/2 is classical; see Shorack and Wellner [38], page 148. That n T_n(1) →_d A²/2 was noted by Einmahl and McKeague [18] and proved by Wellner and Koltchinskii [43]. The proof for s ∈ (1, 2) proceeds along the same lines as the proof in Wellner and Koltchinskii [43] for the case s = 1, and hence will not be given here. For details, see Jager [23] or Jager and Wellner [26].
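The family of divergences and the limit in (21) are easy to sanity-check numerically. The sketch below is ours, not the authors' code: it assumes the forms φ_s(x) = (1 − s + sx − x^s)/(s(1 − s)) (with the usual log limits at s = 1 and s = 0) and K_s(u, v) = v φ_s(u/v) + (1 − v) φ_s((1 − u)/(1 − v)) from Section 2, and the function names are hypothetical.

```python
import math
import numpy as np

def phi_s(x, s):
    """phi_s(x) = (1 - s + s*x - x^s)/(s(1 - s)); s = 1 and s = 0 are the log limits."""
    x = np.asarray(x, dtype=float)
    if s == 1:
        return x * np.log(x) - x + 1.0
    if s == 0:
        return x - 1.0 - np.log(x)
    return (1.0 - s + s * x - x ** s) / (s * (1.0 - s))

def K_s(u, v, s):
    """K_s(u, v) = v*phi_s(u/v) + (1 - v)*phi_s((1 - u)/(1 - v)).
    s = 2 gives the Anderson-Darling quadratic (u - v)^2/(2 v (1 - v));
    s = 1 gives the Berk-Jones divergence."""
    return v * phi_s(u / v, s) + (1 - v) * phi_s((1 - u) / (1 - v), s)

def S_n(x_sorted, s):
    """S_n(s) = sup over X_(1) <= x < X_(n) of K_s(F_n(x), x) for a sorted
    Uniform(0,1) sample.  F_n is piecewise constant and K_s(u, .) is convex in
    its second argument, so it suffices to check the one-sided limits of F_n
    at the order statistics."""
    n = len(x_sorted)
    vals = []
    for i in range(1, n):
        v = x_sorted[i - 1]                               # X_(i)
        vals.append(float(K_s(i / n, v, s)))              # F_n(X_(i)) = i/n
        if i > 1:
            vals.append(float(K_s((i - 1) / n, v, s)))    # F_n(X_(i)-)
    vals.append(float(K_s((n - 1) / n, x_sorted[-1], s))) # left limit at X_(n)
    return max(vals)

def r_n(n):
    """Centering from (21): log_2 n + (1/2) log_3 n - (1/2) log(4*pi), natural logs."""
    l2 = math.log(math.log(n))
    return l2 + 0.5 * math.log(l2) - 0.5 * math.log(4.0 * math.pi)

def limit_cdf(x):
    """Limiting null d.f. of n*S_n(s) - r_n from (21): exp(-4 exp(-x))."""
    return math.exp(-4.0 * math.exp(-x))
```

For instance, inverting the limit gives a rough level-α critical value r_n(n) − log(−log(1 − α)/4) for n S_n(s); since the convergence in (21) is at iterated-logarithm rates, this is only a crude guide for moderate n.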

7.2. Proofs for Section 4.

PROOF OF PROPOSITION 4.1. We first prove the claim for the unrestricted version of the statistics, S_n^{ur}(s), defined by S_n^{ur}(s) ≡ sup_{0<x<1} K_s(F_n(x), x), and then show that the difference between S_n(s) and S_n^{ur}(s) is negligible. Now for s ∈ (0, 1) and C_s ≡ 1/(s(1 − s)), we have

    |S_n^{ur}(s) − S(s, F)|
        ≤ C_s sup_{0<x<1} |{1 − F_n(x)^s x^{1−s} − (1 − F_n(x))^s (1 − x)^{1−s}} − {1 − F(x)^s x^{1−s} − (1 − F(x))^s (1 − x)^{1−s}}|
        ≤ C_s sup_{0<x<1} {|F_n(x)^s − F(x)^s| x^{1−s} + |(1 − F_n(x))^s − (1 − F(x))^s| (1 − x)^{1−s}}
        ≤ C_s {sup_{0<x<1} |F_n(x)^s − F(x)^s| + sup_{0<x<1} |(1 − F_n(x))^s − (1 − F(x))^s|}
        →_{a.s.} 0.

Thus, the proposition will be proved if we show that

(22)    S_n^{ur}(s) − S_n(s) →_{a.s.} 0.

Now write S_n^{ur}(s) = max{L_n, M_n, R_n}, where

    M_n ≡ sup_{X_(1) ≤ x < X_(n)} K_s(F_n(x), x) = S_n(s),
    L_n ≡ sup_{0 < x < X_(1)} K_s(F_n(x), x),
    R_n ≡ sup_{x ≥ X_(n)} K_s(F_n(x), x).

Note that

    S_n^{ur}(s) − S_n(s) = max{L_n, M_n, R_n} − M_n
        = 0,           if M_n ≥ L_n ∨ R_n,
          L_n − M_n,   if L_n > M_n ∨ R_n,
          R_n − M_n,   if R_n > M_n ∨ L_n.

Now set

    α_0 ≡ α_0(F) ≡ sup{x : F(x) = 0} ∨ 0,
    α_1 ≡ α_1(F) ≡ inf{x : F(x) = 1} ∧ 1.

Note that

    L_n = sup_{0 < x < X_(1)} K_s(F_n(x), x) = [1 − (1 − X_(1))^{1−s}]/(s(1 − s))
        →_{a.s.} [1 − (1 − α_0)^{1−s}]/(s(1 − s)) ≡ l_0(s, F),

and, on the other hand,

    M_n = sup_{X_(1) ≤ x < X_(n)} K_s(F_n(x), x) ≥ K_s(F_n(X_(1)), X_(1)) = K_s(1/n, X_(1))
        = [1 − (1/n)^s X_(1)^{1−s} − (1 − 1/n)^s (1 − X_(1))^{1−s}]/(s(1 − s)) ≡ L_n^0
        →_{a.s.} [1 − (1 − α_0)^{1−s}]/(s(1 − s)) = l_0(s, F).

Similarly,

    R_n = sup_{x ≥ X_(n)} K_s(F_n(x), x) = [1 − X_(n)^{1−s}]/(s(1 − s))
        →_{a.s.} [1 − α_1^{1−s}]/(s(1 − s)) ≡ r_0(s, F),

while

    M_n = sup_{X_(1) ≤ x < X_(n)} K_s(F_n(x), x) ≥ K_s(F_n(X_(n)−), X_(n)) = K_s(1 − 1/n, X_(n))
        = [1 − (1 − 1/n)^s X_(n)^{1−s} − (1/n)^s (1 − X_(n))^{1−s}]/(s(1 − s)) ≡ R_n^0
        →_{a.s.} [1 − α_1^{1−s}]/(s(1 − s)) = r_0(s, F).

By combining these pieces, it follows that

    0 ≤ S_n^{ur}(s) − S_n(s) = max{L_n, M_n, R_n} − M_n
        ≤ 0,             if M_n ≥ L_n ∨ R_n,
          L_n − L_n^0,   if L_n > M_n ∨ R_n,
          R_n − R_n^0,   if R_n > M_n ∨ L_n,
        →_{a.s.} 0.

This shows that (22) holds and completes the proof.

PROOF OF PROPOSITION 4.2. Recall that when s = 2 such a condition follows from Theorem 3 in Jager and Wellner [25]: taking b = 1/2 and applying

continuous mapping, we conclude that

    S_n(2) →_{a.s.} sup_{0<x<1} K_2(F(x), x)   if and only if   E{[X(1 − X)]^{−1/2}} < ∞.

Similarly, for 1 < s < ∞,

    S_n(s) = sup_{0<x<1} [F_n(x)^s x^{1−s} + (1 − F_n(x))^s (1 − x)^{1−s} − 1]/(s(s − 1))
        →_{a.s.} sup_{0<x<1} [F(x)^s x^{1−s} + (1 − F(x))^s (1 − x)^{1−s} − 1]/(s(s − 1))

if and only if

    sup_{0<x<1} |F_n(x)^s − F(x)^s| x^{1−s} →_{a.s.} 0   and   sup_{0<x<1} |(1 − F_n(x))^s − (1 − F(x))^s| (1 − x)^{1−s} →_{a.s.} 0,

if and only if

    sup_{0<x<1} |(F_n(x)/x^{(s−1)/s})^s − (F(x)/x^{(s−1)/s})^s| →_{a.s.} 0   and
    sup_{0<x<1} |((1 − F_n(x))/(1 − x)^{(s−1)/s})^s − ((1 − F(x))/(1 − x)^{(s−1)/s})^s| →_{a.s.} 0,

if and only if

    sup_{0<x<1} |(G_n(F(x))/x^{(s−1)/s})^s − (F(x)/x^{(s−1)/s})^s| →_{a.s.} 0   and
    sup_{0<x<1} |((1 − G_n(F(x)))/(1 − x)^{(s−1)/s})^s − ((1 − F(x))/(1 − x)^{(s−1)/s})^s| →_{a.s.} 0.

Since g(u) = u^s is uniformly continuous on bounded sets, these last two convergences occur if and only if

    sup_{0<x<1} |G_n(F(x)) − F(x)| / x^{(s−1)/s} →_{a.s.} 0   and   sup_{0<x<1} |G_n(F(x)) − F(x)| / (1 − x)^{(s−1)/s} →_{a.s.} 0.

These, in turn, hold if and only if

    sup_{0<u<1} |G_n(u) − u| / F^{−1}(u)^{(s−1)/s} →_{a.s.} 0   and   sup_{0<u<1} |G_n(u) − u| / (1 − F^{−1}(u))^{(s−1)/s} →_{a.s.} 0.

But in view of Wellner [41], these convergences hold if and only if F satisfies

    ∫_0^1 [F^{−1}(u)(1 − F^{−1}(u))]^{−(s−1)/s} du < ∞.

By the (inverse) probability integral transformation, the convergence in the last display is equivalent to E{[X(1 − X)]^{−(s−1)/s}} < ∞. This completes the proof of the claimed equivalences.

PROOF OF PROPOSITION 4.3. For s = 1, this follows from Berk and Jones [5]. Thus, it suffices to prove the claimed convergences for s > 1 and s < 0.

For s > 1, fix α ∈ (0, 1). We begin by breaking the supremum over (0, 1) into the regions 0 < F_s(x) < n^{−α}, n^{−α} ≤ F_s(x) ≤ 1 − n^{−α} and 1 − n^{−α} < F_s(x) < 1:

    S_n(s) = sup_{0<x<1} [F_n(x)^s x^{1−s} + (1 − F_n(x))^s (1 − x)^{1−s} − 1]/(s(s − 1))
           ≡ I_n(s) ∨ II_n(s) ∨ III_n(s),

where I_n(s), II_n(s) and III_n(s) denote the suprema of the same expression over the three regions in order.

For the main term, I_n(s), let G_n be the empirical d.f. of n i.i.d. Uniform(0, 1) random variables and use F_n =_d G_n(F_s) to write

    s(s − 1) I_n(s) =_d sup_{x : 0<F_s(x)<n^{−α}} { (G_n(F_s(x))/F_s(x))^s F_s(x)^s x^{1−s} + ((1 − G_n(F_s(x)))/(1 − F_s(x)))^s (1 − F_s(x))^s (1 − x)^{1−s} − 1 },

where

    sup_{x : F_s(x)<n^{−α}} (G_n(F_s(x))/F_s(x))^s = sup_{0<t<n^{−α}} (G_n(t)/t)^s →_d sup_{t>0} (N(t)/t)^s,

    F_s(x)^s x^{1−s} = 1 uniformly in x ∈ [0, n^{−α}],

while

    sup_{x : F_s(x)<n^{−α}} |((1 − G_n(F_s(x)))/(1 − F_s(x)))^s − 1| →_{a.s.} 0

and

    sup_{x : F_s(x)<n^{−α}} |(1 − F_s(x))^s (1 − x)^{1−s} − 1| → 0.

Combining these last five displays shows that

    I_n(s) →_d (1/(s(s − 1))) sup_{t>0} (N(t)/t)^s,

where N denotes a standard Poisson process; note that the limit variable is ≥ 1/(s(s − 1)) almost surely. To handle the term II_n(s), write

    II_n(s) =_d sup_{n^{−α}<F_s(x)<1−n^{−α}} { (G_n(F_s(x))/F_s(x))^s F_s(x)^s x^{1−s} + ((1 − G_n(F_s(x)))/(1 − F_s(x)))^s (1 − F_s(x))^s (1 − x)^{1−s} − 1 } / (s(s − 1)),

where now the two terms involving the ratio of the empirical d.f. to the true d.f. F_s converge almost surely to 1. Hence, we conclude that

    II_n(s) →_{a.s.} sup_{0<x<1} K_s(F_s(x), x) = 1/(s(s − 1)),

where the equality follows after some calculation. Finally, it is easily shown that III_n(s) →_{a.s.} 0.

For s < 0, fix α ∈ (0, 1). We begin by breaking the supremum into the regions X_(1) ≤ x < F_s^{−1}(n^{−α}), F_s^{−1}(n^{−α}) ≤ x ≤ F_s^{−1}(1 − n^{−α}) and F_s^{−1}(1 − n^{−α}) < x < X_(n):

    S_n(s) = sup_{X_(1) ≤ x < X_(n)} [F_n(x)^s x^{1−s} + (1 − F_n(x))^s (1 − x)^{1−s} − 1]/(s(s − 1))
           ≡ I_n(s) ∨ II_n(s) ∨ III_n(s),

with I_n(s), II_n(s) and III_n(s) the suprema over the three regions in order.

For the main term, I_n(s), let G_n be the empirical d.f. of n i.i.d. Uniform(0, 1) random variables and use F_n =_d G_n(F_s) to write

    s(s − 1) I_n(s) =_d sup_{X_(1) ≤ x < F_s^{−1}(n^{−α})} { (F_s(x)/G_n(F_s(x)))^{−s} F_s(x)^s x^{1−s} + ((1 − G_n(F_s(x)))/(1 − F_s(x)))^s (1 − F_s(x))^s (1 − x)^{1−s} − 1 },

where

    sup_{X_(1) ≤ x < F_s^{−1}(n^{−α})} (F_s(x)/G_n(F_s(x)))^{−s} = sup_{ξ_(1) ≤ t < n^{−α}} (t/G_n(t))^{−s} →_d sup_{t ∈ S} (t/N(t))^{−s},

    F_s(x)^s x^{1−s} = 1 + o(1) uniformly in x ∈ [0, F_s^{−1}(n^{−α})],

while

    sup_{x : F_s(x)<n^{−α}} |((1 − G_n(F_s(x)))/(1 − F_s(x)))^s − 1| →_{a.s.} 0   and   sup_{x : F_s(x)<n^{−α}} |(1 − F_s(x))^s (1 − x)^{1−s} − 1| → 0.

Combining these last five displays shows that

    I_n(s) →_d (1/(s(s − 1))) sup_{t ∈ S} (t/N(t))^{−s};

note that the limit variable is ≥ 1/(s(s − 1)) almost surely. To handle the term II_n(s), write

    II_n(s) =_d sup_{n^{−α}<F_s(x)<1−n^{−α}} { (F_s(x)/G_n(F_s(x)))^{−s} F_s(x)^s x^{1−s} + ((1 − G_n(F_s(x)))/(1 − F_s(x)))^s (1 − F_s(x))^s (1 − x)^{1−s} − 1 } / (s(s − 1)),

where now the two terms involving the ratio of the empirical d.f. to the true d.f. F_s converge almost surely to 1. Hence, we conclude that

    II_n(s) →_{a.s.} sup_{0<x<1} K_s(F_s(x), x) = 1/(s(s − 1)),

where the equality follows after some calculation. Finally, it is easily shown that III_n(s) →_{a.s.} 0.
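The large deviation arguments in the next proofs rest on Chernoff's result for binomial proportions (Lemma 7.1 below). The same divergence also gives the classical nonasymptotic Chernoff-Hoeffding bound P(F_n(x) ≥ t) ≤ exp(−n K(t, F_0(x))) for t > F_0(x), which the following illustrative sketch (our function names, not the paper's) checks against an exact binomial tail:

```python
import math

def K(t, p):
    """Binary Kullback-Leibler divergence K(t, p): the s = 1 member of the family."""
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

def binom_upper_tail(n, k, p):
    """Exact P(Bin(n, p) >= k)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

# P(F_n(x) >= t) = P(Bin(n, p) >= n*t) with p = F_0(x); for t > p the
# Chernoff-Hoeffding bound gives P <= exp(-n K(t, p)), the rate in Lemma 7.1(b).
n, p, t = 20, 0.3, 0.5
exact = binom_upper_tail(n, int(n * t), p)
bound = math.exp(-n * K(t, p))
```

Here the exact tail lies below the bound, and n^{−1} log P approaches −K(t, p) from below as n grows, which is the "convergence from below" in the statement of Lemma 7.1.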

To prove Theorem 4.4 and its corollary, we will use the following lemma from Chernoff [9].

LEMMA 7.1. Let X_1, X_2, ... be i.i.d. with continuous distribution F_0.
(a) If t < F_0(x), then n^{−1} log P(F_n(x) ≤ t) → −K^−(t, F_0(x)).
(b) If t > F_0(x), then n^{−1} log P(F_n(x) ≥ t) → −K^+(t, F_0(x)).
In both cases, the convergence is from below.

PROOF. This follows from Theorem 1 of Chernoff [9].

PROOF OF THEOREM 4.4. We first prove the theorem for S_n^{ur,+}(s). Since K_s^+(t, x) is continuous in t and strictly increasing on [x, 1), then for 0 < a < (1 − x^{1−s})/[s(1 − s)] there is a unique τ = τ(x) in (x, 1) for which K_s^+(τ, x) = a, and {t : K_s^+(t, x) ≥ a} = [τ, 1). If a ≥ (1 − x^{1−s})/[s(1 − s)], then τ = 1 necessarily. For any fixed x ∈ (0, 1), we have

    −n^{−1} log P(S_n^{ur,+}(s) ≥ a) ≤ −n^{−1} log P(K_s^+(F_n(x), x) ≥ a) = −n^{−1} log P(F_n(x) ≥ τ) → K^+(τ, x)

by Lemma 7.1. So

(23)    lim sup_n −n^{−1} log P(S_n^{ur,+}(s) ≥ a) ≤ inf_{0<x<1} K^+(τ(x), x).

Now consider the reverse inequality. Let x_i denote the supremum of K_s^+(F_n(x), x) over X_(i) ≤ x < X_(i+1). Since F_n(x) = i/n on this range, we have

    x_i = K_s^+(i/n, X_(i)) ∨ K_s^+(i/n, X_(i+1)) = K_s^+(i/n, X_(i)).

Note that for x < X_(1), we have F_n(x) = 0, and so K_s^+(F_n(x), x) = 0 also. So we can write

    S_n^{ur,+}(s) = max_{1 ≤ i ≤ n} K_s^+(i/n, X_(i)) = max_{1 ≤ i ≤ n} K_s^+(F_n(X_(i)), X_(i)).

Now, using monotonicity of τ,

    −n^{−1} log P(S_n^{ur,+}(s) ≥ a)
        = −n^{−1} log P(max_{1 ≤ i ≤ n} K_s^+(F_n(X_(i)), X_(i)) ≥ a)
        ≥ −n^{−1} log Σ_{i=1}^n P(K_s^+(F_n(X_(i)), X_(i)) ≥ a)
        = −n^{−1} log Σ_{i=1}^n P(F_n(X_(i)) ≥ τ(X_(i)))

        = −n^{−1} log Σ_{i=1}^n P(i/n ≥ τ(X_(i)))
        = −n^{−1} log Σ_{i=1}^n P(τ^{−1}(i/n) ≥ X_(i))
        ≥ −n^{−1} log Σ_{i=1}^n P(F_n(τ^{−1}(i/n)) ≥ F_n(X_(i)))
        = −n^{−1} log Σ_{i=1}^n P(F_n(τ^{−1}(i/n)) ≥ i/n)
        ≥ −n^{−1} log Σ_{i=1}^n e^{−n K^+(i/n, τ^{−1}(i/n))}        [by Lemma 7.1(b)]
        ≥ −n^{−1} log (n e^{−n min_{1 ≤ i ≤ n} K^+(i/n, τ^{−1}(i/n))})
        ≥ −n^{−1} log (n e^{−n inf_{0<x<1} K^+(x, τ^{−1}(x))})
        = −n^{−1} log (n e^{−n inf_{0<x<1} K^+(τ(x), x)})
        = inf_{0<x<1} K^+(τ(x), x) − (log n)/n.

So we conclude that

(24)    lim inf_n −n^{−1} log P(S_n^{ur,+}(s) ≥ a) ≥ inf_{0<x<1} K^+(τ(x), x).

Combining this last display with (23) yields the convergence part of (7). To prove the explicit formula for g_s^+, note that K^+(τ^+(x), x) is a decreasing function of x until x = [1 − s(1 − s)a]^{1/(1−s)}, where τ^+(x) = 1. Thus,

    inf_{0<x<1} K^+(τ^+(x), x) = lim_{ε↓0} K^+(τ^+([1 − s(1 − s)a]^{1/(1−s)} − ε), [1 − s(1 − s)a]^{1/(1−s)} − ε)
                              = −log[1 − s(1 − s)a]/(1 − s),

so the given formula for the infimum in (7) holds. This completes the proof for S_n^{ur,+}(s). The proof for S_n^{ur,−}(s) is analogous, using Lemma 7.1(a).

7.3. Proofs for Section 5.

The following lemma extends Lemma A.4, page 988, of Donoho and Jin [15].

LEMMA 7.2. (i) For 0 < v ≤ u ≤ 1/2 and −1 ≤ s ≤ 2,

(25)    K_s^+(u, v) ≤ (u − v)²/(2v(1 − v)) ≡ K_2(u, v).

(ii) Let 1 < s ≤ 2 and v = v(u) satisfy 0 < v ≤ u < 1. Then, as u → 0,

    K_s(u, v) = K_2(u, v)[1 + O(1 − (v/u)^{2−s}) + O(((1 − v)/(1 − u))^{2−s} − 1)],   if u/v → 1,
    K_s(u, v) = v φ_s(u/v)(1 + o(1)) = u{(u/v)^{s−1} − s}(1 + o(1))/(s(s − 1)),       if u/v → ∞.

(iii) Let s = 1 and v = v(u) satisfy 0 < v ≤ u < 1. Then, as u → 0,

    K_1(u, v) = K_2(u, v)[1 + O(u + (u/v − 1))],    if u/v → 1,
    K_1(u, v) = u log(u/v)(1 + o(1)),               if u/v → ∞.

(iv) Let s ∈ [−1, 1) \ {0} and v = v(u) satisfy 0 < v ≤ u < 1. Then, as u → 0,

    K_s(u, v) = K_2(u, v)[1 + O(1 − (v/u)^{2−s}) + O(((1 − v)/(1 − u))^{2−s} − 1)],   if u/v → 1,
    K_s(u, v) = u(1 + o(1))/(1 − s),                                                  if u/v → ∞.

(v) Let s = 0 and v = v(u) satisfy 0 < v ≤ u < 1. Then, as u → 0,

    K_0(u, v) = K_2(u, v)[1 + O(1 − (v/u)²) + O(((1 − v)/(1 − u))² − 1)],   if u/v → 1,
    K_0(u, v) = u(1 + o(1)),                                               if u/v → ∞.

REMARK. Note that for 1 < s ≤ 2, as u → 0 and u/v → ∞,

    v φ_s(u/v) = v{(1 − s) + s(u/v) − (u/v)^s}/(s(1 − s))
               = {u(u/v)^{s−1} − su − (1 − s)v}/(s(s − 1))
               = u{(u/v)^{s−1} − s}(1 + o(1))/(s(s − 1)),

where the right-hand side converges to u log(u/v)(1 + o(1)) as s ↓ 1.

PROOF OF LEMMA 7.2. (i) Letting u = tv, it suffices to show that for 0 < v ≤ 1/2 and 1 ≤ t ≤ 1/(2v),

    K_s(tv, v) ≤ v(t − 1)²/(2(1 − v)),

or, equivalently, since K_s(tv, v) = v φ_s(t) + (1 − v) φ_s((1 − tv)/(1 − v)),

    φ_s(t) + ((1 − v)/v) φ_s((1 − tv)/(1 − v)) ≤ (t − 1)²/(2(1 − v)).

Let

    f_s(t) ≡ φ_s(t) + ((1 − v)/v) φ_s((1 − tv)/(1 − v)) − (t − 1)²/(2(1 − v)).

Now by direct calculation, f_s(1) = 0, and

    f'_s(t) = φ'_s(t) + ((1 − v)/v)(−v/(1 − v)) φ'_s((1 − tv)/(1 − v)) − (t − 1)/(1 − v)
            = φ'_s(t) − φ'_s((1 − tv)/(1 − v)) − (t − 1)/(1 − v),

so that f'_s(1) = 0. Furthermore, for 1 ≤ s ≤ 2,

    f''_s(t) = φ''_s(t) + (v/(1 − v)) φ''_s((1 − tv)/(1 − v)) − 1/(1 − v)
             = t^{s−2} + (v/(1 − v))((1 − tv)/(1 − v))^{s−2} − 1/(1 − v)
             = (1/(1 − v)) {(1 − v)(1/t)^{2−s} + v((1 − v)/(1 − tv))^{2−s} − 1}
             ≤ (1/(1 − v)) {[(1 − v)/t + v(1 − v)/(1 − tv)]^{2−s} − 1}
                   (since x ↦ x^{2−s} is concave for 1 ≤ s ≤ 2)
             ≤ 0,

using the fact that v(1 − v) ≤ vt(1 − vt) for 0 ≤ v ≤ vt ≤ 1/2, which implies (1 − v) ≤ t(1 − tv) and hence (1 − v)/t + v(1 − v)/(1 − tv) ≤ 1. Here we have used

    φ'_s(x) = (s − s x^{s−1})/(s(1 − s)) = (1 − x^{s−1})/(1 − s),    φ''_s(x) = x^{s−2}.

Since t ≤ 1/(2v), it follows that v ≤ vt ≤ 1/2 < 1, 1 − vt ≥ 1/2 > 0 and (1 − vt)/(1 − v) ≥ 1/(2(1 − v)).

When s < 1, we calculate

    f'''_s(t) = φ'''_s(t) − (v/(1 − v))² φ'''_s((1 − tv)/(1 − v))

and note that f'''_s(1) < 0, while f'''_s(t) = 0 has a unique root, so to show f''_s(t) ≤ 0, it suffices to show f''_s(1/(2v)) ≤ 0 for 0 ≤ v ≤ 1/2. By a straightforward calculation, we get

    f''_s(1/(2v)) = (2v)^{2−s} + (v/(1 − v))(2(1 − v))^{2−s} − 1/(1 − v),

which is ≤ 0 for 0 ≤ v ≤ 1/2 if s ≥ −1. This shows that f''_s(t) ≤ 0 in the range −1 ≤ s < 1, and completes the proof of (i).

(ii) By expanding K_s(u, v) as a function of u as in (10),

    K_s(u, v) = K_2(u, v){1 + [v(1 − v) D_s(u*, v) − 1]}

with |u* − v| ≤ |u − v|; since 0 < v ≤ u, we necessarily have 0 < v ≤ u* ≤ u. Here

    v(1 − v) D_s(u*, v) − 1 = (1 − v)[(v/u*)^{2−s} − 1] + v[((1 − v)/(1 − u*))^{2−s} − 1] ≡ I + II.

Now 0 < v ≤ u* ≤ u implies 1 ≤ u*/v ≤ u/v, so (v/u)^{2−s} ≤ (v/u*)^{2−s} ≤ 1, and v ≤ u* ≤ u implies 1 ≤ (1 − v)/(1 − u*) ≤ (1 − v)/(1 − u). Thus, I ≤ 0 and II ≥ 0. It follows that

    v(1 − v) D_s(u*, v) − 1 ≥ −(1 − v){1 − (v/u)^{2−s}}.

Similarly,

    v(1 − v) D_s(u*, v) − 1 ≤ v{((1 − v)/(1 − u))^{2−s} − 1}

in this range, and the claimed bound in the first part of (ii) follows.

To prove the second part of (ii), note that when u/v → ∞, we can write

    K_s(u, v)/(v φ_s(u/v)) = [(u/v)^s v + ((1 − u)/(1 − v))^s (1 − v) − 1] / [(u/v)^s v − v(1 − s) − su]
        = {1 + [((1 − u)/(1 − v))^s (1 − v) − 1]/[(u/v)^s v]} / {1 − [v(1 − s) + su]/[(u/v)^s v]}
        ≡ (1 + A(u, v))/(1 − B(u, v)),

where, for 1 < s ≤ 2,

    B(u, v) = [v(1 − s) + su]/[(u/v)^s v] = (1 − s)(v/u)^s + s(v/u)^{s−1} = o(1)

and

    A(u, v) = [((1 − u)/(1 − v))^s (1 − v) − 1]/[(u/v)^s v]
            = {((1 − u)/(1 − v))^s [(1 − v) − 1] + [((1 − u)/(1 − v))^s − 1]}/[(u/v)^s v]

            = −((1 − u)/(1 − v))^s (v/u)^s + [((1 − u)/(1 − v))^s − 1]/[(u/v)^s v]
            = o(1) + [−s(u − v)/(1 − v)](1 + o(1))(v/u)^s / v
            = o(1) − s(v/u)^{s−1}(1 + o(1)) = o(1).

Thus, the second part of (ii) holds. The first part of (iv) is proved exactly as in (ii). To prove the second part of (iv), we write

    K_s(u, v) = {1 − (u/v)^s v − ((1 − u)/(1 − v))^s (1 − v)}/(s(1 − s))
              = {1 − u^s v^{1−s} − (1 − su(1 + o(1)))(1 − (1 − s)v(1 + o(1)))}/(s(1 − s))
              = {su + (1 − s)v − u^s v^{1−s} + o(u) + o(v)}/(s(1 − s))
              = su(1 + o(1))/(s(1 − s))
              = u(1 + o(1))/(1 − s),

since v ≤ u with u/v → ∞ implies v = o(u), and u^s v^{1−s} = u(v/u)^{1−s} = o(u) for s < 1.

(v) The proof of (v) is similar to the proof of (iv).

LEMMA 7.3. Suppose that X_1, ..., X_n are i.i.d. F̄_n with 0 < ρ*(β) < r < β/3. Then r < 1/4 and for any 0 < r_0 < r,

(26)    sup_{n^{−4r} < x < n^{−4r_0}} |F_n(x)/x − 1| →_p 0.

PROOF. Note that F_n(·) =_d G_n(F̄_n(·)), where G_n is the empirical d.f. of n i.i.d. U(0, 1) random variables ξ_1, ..., ξ_n, and

    F̄_n(x) = (1 − ε_n)x + ε_n Φ̄(Φ̄^{−1}(x) − µ_n) ≥ x.

Thus, with ‖f‖_a^b ≡ sup_{a ≤ t ≤ b} |f(t)|,

(27)    ‖F_n(x)/x − 1‖_{n^{−4r}}^{n^{−4r_0}} ≤ ‖(F_n(x) − F̄_n(x))/x‖_{n^{−4r}}^{n^{−4r_0}} + ‖(F̄_n(x) − x)/x‖_{n^{−4r}}^{n^{−4r_0}}

(28)        =_d ‖(G_n(F̄_n(x)) − F̄_n(x))/x‖_{n^{−4r}}^{n^{−4r_0}} + ‖(F̄_n(x) − x)/x‖_{n^{−4r}}^{n^{−4r_0}}.

The second term in this last display converges to 0 by a direct analysis of F̄_n. For the first term, write

    G_n(F̄_n(x))/x − 1 = [G_n(F̄_n(x))/F̄_n(x) − 1](F̄_n(x)/x) + (F̄_n(x)/x − 1),

so again by Theorem 10 of Wellner [42], the first term of (27) converges to 0 in probability if

    lim sup_n ‖F̄_n(x)/x‖_{n^{−4r}}^{n^{−4r_0}} < ∞   and   ‖F̄_n(x)/x − 1‖_{n^{−4r}}^{n^{−4r_0}} → 0.

But this holds by a straightforward analysis using the asymptotics of Φ̄ when r < β/3.

Now we have the tools in place to prove our extension of the results of Donoho and Jin [15].

PROOF OF THEOREM 5.1. First consider 1 < s < 2. As in Donoho and Jin [15], we first consider the case r < β/3. Then r < 1/4 and we can choose 0 < r_0 < r < 1/4. From Lemma 7.3 the convergence (26) holds. Thus, by part (ii) of Lemma 7.2, it follows that for n^{−4r} < x < n^{−4r_0} we have

    n K_s^+(F_n(x), x) = [(n^{1/2}(F_n(x) − x)^+)²/(2x(1 − x))](1 + o_p(1)),

and hence,

    n S_n^+(s) ≥ sup_{n^{−4r} < x < n^{−4r_0}} n K_s^+(F_n(x), x) ≥ (1/2) HC²_{n,r,r_0}(1 + o_p(1)).

Thus, n S_n^+(s) separates H_0 and H_1^{(n)} for s ∈ (1, 2) and r < β/3. Now suppose that r > (1 − √(1 − β))² (and still 1 < s < 2). Since (r + β)/(2√r) < 1, we can pick a constant q < 1 such that (r + β)/(2√r) < √q < 1.
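The statistic HC_{n,r,r_0} appearing in the display above is a restricted version of Tukey's higher-criticism statistic. A minimal sketch of one common unrestricted variant follows; the exact definition and index range vary across papers, so treat the form below (and the function name) as an illustrative assumption rather than the paper's definition:

```python
import numpy as np

def higher_criticism(pvals):
    """One common form of Tukey's higher criticism:
    HC_n = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    The proof above restricts the maximization range to the region
    corresponding to (n^{-4r}, n^{-4r_0}); this sketch does not."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    return float(np.max(np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))))
```

Under sparse alternatives, a few unusually small p-values inflate (i/n − p_(i))/√(p_(i)(1 − p_(i))) at small i, which is precisely the region of the unit interval that the restriction in the proof isolates.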


More information

Hypothesis testing for Stochastic PDEs. Igor Cialenco

Hypothesis testing for Stochastic PDEs. Igor Cialenco Hypothesis testing for Stochastic PDEs Igor Cialenco Department of Applied Mathematics Illinois Institute of Technology igor@math.iit.edu Joint work with Liaosha Xu Research partially funded by NSF grants

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

Exact goodness-of-fit tests for censored data

Exact goodness-of-fit tests for censored data Exact goodness-of-fit tests for censored data Aurea Grané Statistics Department. Universidad Carlos III de Madrid. Abstract The statistic introduced in Fortiana and Grané (23, Journal of the Royal Statistical

More information

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES

AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES Lithuanian Mathematical Journal, Vol. 4, No. 3, 00 AN INEQUALITY FOR TAIL PROBABILITIES OF MARTINGALES WITH BOUNDED DIFFERENCES V. Bentkus Vilnius Institute of Mathematics and Informatics, Akademijos 4,

More information

On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments

On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments UDC 519.2 On probabilities of large and moderate deviations for L-statistics: a survey of some recent developments N. V. Gribkova Department of Probability Theory and Mathematical Statistics, St.-Petersburg

More information

Research Article Some Monotonicity Properties of Gamma and q-gamma Functions

Research Article Some Monotonicity Properties of Gamma and q-gamma Functions International Scholarly Research Network ISRN Mathematical Analysis Volume 11, Article ID 375715, 15 pages doi:1.54/11/375715 Research Article Some Monotonicity Properties of Gamma and q-gamma Functions

More information

arxiv:math/ v1 [math.st] 5 Oct 2004

arxiv:math/ v1 [math.st] 5 Oct 2004 The Annals of Statistics 2004, Vol. 32, No. 3, 962 994 DOI: 0.24/009053604000000265 c Institute of Mathematical Statistics, 2004 arxiv:math/040072v [math.st] 5 Oct 2004 HIGHER CRITICISM FOR DETECTING SPARSE

More information

Lecture 2: CDF and EDF

Lecture 2: CDF and EDF STAT 425: Introduction to Nonparametric Statistics Winter 2018 Instructor: Yen-Chi Chen Lecture 2: CDF and EDF 2.1 CDF: Cumulative Distribution Function For a random variable X, its CDF F () contains all

More information

On a class of stochastic differential equations in a financial network model

On a class of stochastic differential equations in a financial network model 1 On a class of stochastic differential equations in a financial network model Tomoyuki Ichiba Department of Statistics & Applied Probability, Center for Financial Mathematics and Actuarial Research, University

More information

First passage time for Brownian motion and piecewise linear boundaries

First passage time for Brownian motion and piecewise linear boundaries To appear in Methodology and Computing in Applied Probability, (2017) 19: 237-253. doi 10.1007/s11009-015-9475-2 First passage time for Brownian motion and piecewise linear boundaries Zhiyong Jin 1 and

More information

Nonparametric estimation of log-concave densities

Nonparametric estimation of log-concave densities Nonparametric estimation of log-concave densities Jon A. Wellner University of Washington, Seattle Seminaire, Institut de Mathématiques de Toulouse 5 March 2012 Seminaire, Toulouse Based on joint work

More information

arxiv:math/ v2 [math.st] 17 Jun 2008

arxiv:math/ v2 [math.st] 17 Jun 2008 The Annals of Statistics 2008, Vol. 36, No. 3, 1031 1063 DOI: 10.1214/009053607000000974 c Institute of Mathematical Statistics, 2008 arxiv:math/0609020v2 [math.st] 17 Jun 2008 CURRENT STATUS DATA WITH

More information

Stability of Stochastic Differential Equations

Stability of Stochastic Differential Equations Lyapunov stability theory for ODEs s Stability of Stochastic Differential Equations Part 1: Introduction Department of Mathematics and Statistics University of Strathclyde Glasgow, G1 1XH December 2010

More information

A COUNTEREXAMPLE TO AN ENDPOINT BILINEAR STRICHARTZ INEQUALITY TERENCE TAO. t L x (R R2 ) f L 2 x (R2 )

A COUNTEREXAMPLE TO AN ENDPOINT BILINEAR STRICHARTZ INEQUALITY TERENCE TAO. t L x (R R2 ) f L 2 x (R2 ) Electronic Journal of Differential Equations, Vol. 2006(2006), No. 5, pp. 6. ISSN: 072-669. URL: http://ejde.math.txstate.edu or http://ejde.math.unt.edu ftp ejde.math.txstate.edu (login: ftp) A COUNTEREXAMPLE

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

arxiv: v1 [math.ap] 17 May 2017

arxiv: v1 [math.ap] 17 May 2017 ON A HOMOGENIZATION PROBLEM arxiv:1705.06174v1 [math.ap] 17 May 2017 J. BOURGAIN Abstract. We study a homogenization question for a stochastic divergence type operator. 1. Introduction and Statement Let

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

Asymptotic distribution of the sample average value-at-risk

Asymptotic distribution of the sample average value-at-risk Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample

More information

10 Transfer Matrix Models

10 Transfer Matrix Models MIT EECS 6.241 (FALL 26) LECTURE NOTES BY A. MEGRETSKI 1 Transfer Matrix Models So far, transfer matrices were introduced for finite order state space LTI models, in which case they serve as an important

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by

δ xj β n = 1 n Theorem 1.1. The sequence {P n } satisfies a large deviation principle on M(X) with the rate function I(β) given by . Sanov s Theorem Here we consider a sequence of i.i.d. random variables with values in some complete separable metric space X with a common distribution α. Then the sample distribution β n = n maps X

More information

A Brief Analysis of Central Limit Theorem. SIAM Chapter Florida State University

A Brief Analysis of Central Limit Theorem. SIAM Chapter Florida State University 1 / 36 A Brief Analysis of Central Limit Theorem Omid Khanmohamadi (okhanmoh@math.fsu.edu) Diego Hernán Díaz Martínez (ddiazmar@math.fsu.edu) Tony Wills (twills@math.fsu.edu) Kouadio David Yao (kyao@math.fsu.edu)

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

CALCULUS JIA-MING (FRANK) LIOU

CALCULUS JIA-MING (FRANK) LIOU CALCULUS JIA-MING (FRANK) LIOU Abstract. Contents. Power Series.. Polynomials and Formal Power Series.2. Radius of Convergence 2.3. Derivative and Antiderivative of Power Series 4.4. Power Series Expansion

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

1 Delayed Renewal Processes: Exploiting Laplace Transforms

1 Delayed Renewal Processes: Exploiting Laplace Transforms IEOR 6711: Stochastic Models I Professor Whitt, Tuesday, October 22, 213 Renewal Theory: Proof of Blackwell s theorem 1 Delayed Renewal Processes: Exploiting Laplace Transforms The proof of Blackwell s

More information

1 Review of di erential calculus

1 Review of di erential calculus Review of di erential calculus This chapter presents the main elements of di erential calculus needed in probability theory. Often, students taking a course on probability theory have problems with concepts

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Asymptotic efficiency of simple decisions for the compound decision problem

Asymptotic efficiency of simple decisions for the compound decision problem Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

2.1.3 The Testing Problem and Neave s Step Method

2.1.3 The Testing Problem and Neave s Step Method we can guarantee (1) that the (unknown) true parameter vector θ t Θ is an interior point of Θ, and (2) that ρ θt (R) > 0 for any R 2 Q. These are two of Birch s regularity conditions that were critical

More information

A remark on the maximum eigenvalue for circulant matrices

A remark on the maximum eigenvalue for circulant matrices IMS Collections High Dimensional Probability V: The Luminy Volume Vol 5 (009 79 84 c Institute of Mathematical Statistics, 009 DOI: 04/09-IMSCOLL5 A remark on the imum eigenvalue for circulant matrices

More information

On the mean connected induced subgraph order of cographs

On the mean connected induced subgraph order of cographs AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 71(1) (018), Pages 161 183 On the mean connected induced subgraph order of cographs Matthew E Kroeker Lucas Mol Ortrud R Oellermann University of Winnipeg Winnipeg,

More information

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models

1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models Draft: February 17, 1998 1 Local Asymptotic Normality of Ranks and Covariates in Transformation Models P.J. Bickel 1 and Y. Ritov 2 1.1 Introduction Le Cam and Yang (1988) addressed broadly the following

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

OPTIMAL STOPPING OF A BROWNIAN BRIDGE

OPTIMAL STOPPING OF A BROWNIAN BRIDGE OPTIMAL STOPPING OF A BROWNIAN BRIDGE ERIK EKSTRÖM AND HENRIK WANNTORP Abstract. We study several optimal stopping problems in which the gains process is a Brownian bridge or a functional of a Brownian

More information

A generalization of Strassen s functional LIL

A generalization of Strassen s functional LIL A generalization of Strassen s functional LIL Uwe Einmahl Departement Wiskunde Vrije Universiteit Brussel Pleinlaan 2 B-1050 Brussel, Belgium E-mail: ueinmahl@vub.ac.be Abstract Let X 1, X 2,... be a sequence

More information

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2)

X n D X lim n F n (x) = F (x) for all x C F. lim n F n(u) = F (u) for all u C F. (2) 14:17 11/16/2 TOPIC. Convergence in distribution and related notions. This section studies the notion of the so-called convergence in distribution of real random variables. This is the kind of convergence

More information

Brownian Motion and Stochastic Calculus

Brownian Motion and Stochastic Calculus ETHZ, Spring 17 D-MATH Prof Dr Martin Larsson Coordinator A Sepúlveda Brownian Motion and Stochastic Calculus Exercise sheet 6 Please hand in your solutions during exercise class or in your assistant s

More information

Point Process Control

Point Process Control Point Process Control The following note is based on Chapters I, II and VII in Brémaud s book Point Processes and Queues (1981). 1 Basic Definitions Consider some probability space (Ω, F, P). A real-valued

More information

Research Article Strong Convergence Bound of the Pareto Index Estimator under Right Censoring

Research Article Strong Convergence Bound of the Pareto Index Estimator under Right Censoring Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 200, Article ID 20956, 8 pages doi:0.55/200/20956 Research Article Strong Convergence Bound of the Pareto Index Estimator

More information

Defining the Integral

Defining the Integral Defining the Integral In these notes we provide a careful definition of the Lebesgue integral and we prove each of the three main convergence theorems. For the duration of these notes, let (, M, µ) be

More information

Convergence of Multivariate Quantile Surfaces

Convergence of Multivariate Quantile Surfaces Convergence of Multivariate Quantile Surfaces Adil Ahidar Institut de Mathématiques de Toulouse - CERFACS August 30, 2013 Adil Ahidar (Institut de Mathématiques de Toulouse Convergence - CERFACS) of Multivariate

More information

Regularity of the density for the stochastic heat equation

Regularity of the density for the stochastic heat equation Regularity of the density for the stochastic heat equation Carl Mueller 1 Department of Mathematics University of Rochester Rochester, NY 15627 USA email: cmlr@math.rochester.edu David Nualart 2 Department

More information

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get 18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in

More information

Chapter 9: Basic of Hypercontractivity

Chapter 9: Basic of Hypercontractivity Analysis of Boolean Functions Prof. Ryan O Donnell Chapter 9: Basic of Hypercontractivity Date: May 6, 2017 Student: Chi-Ning Chou Index Problem Progress 1 Exercise 9.3 (Tightness of Bonami Lemma) 2/2

More information

arxiv:math/ v1 [math.st] 11 Feb 2006

arxiv:math/ v1 [math.st] 11 Feb 2006 The Annals of Statistics 25, Vol. 33, No. 5, 2228 2255 DOI: 1.1214/95365462 c Institute of Mathematical Statistics, 25 arxiv:math/62244v1 [math.st] 11 Feb 26 ASYMPTOTIC NORMALITY OF THE L K -ERROR OF THE

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Partial Differential Equations

Partial Differential Equations Part II Partial Differential Equations Year 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2015 Paper 4, Section II 29E Partial Differential Equations 72 (a) Show that the Cauchy problem for u(x,

More information

UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES

UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES Applied Probability Trust 7 May 22 UPPER DEVIATIONS FOR SPLIT TIMES OF BRANCHING PROCESSES HAMED AMINI, AND MARC LELARGE, ENS-INRIA Abstract Upper deviation results are obtained for the split time of a

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information