Poisson Approximation for Two Scan Statistics with Rates of Convergence

Size: px

Start display at page:

Download "Poisson Approximation for Two Scan Statistics with Rates of Convergence"

Warren Charles
5 years ago
Views:

1 Poisson Approximation for Two Scan Statistics with Rates of Convergence Xiao Fang (Joint work with David Siegmund) National University of Singapore May 28, 2015

2 Outline The first scan statistic The second scan statistic Other scan statistics

3 A statistical testing problem Let {X 1,..., X n } be an independent sequence of random variables. We want to test the hypothesis against the alternative H 0 : X 1,..., X n F θ0 ( ) H 1 : for some i < j, X i+1,..., X j F θ1 ( ) X 1,..., X i, X j+1,..., X n F θ0 ( ) i and j are called change-points. They are not specified in the alternative hypothesis. θ 0 may be given, or may need to be estimated. θ 1 may be given, or may be a nuisance parameter.

4 The first scan statistic If j i = t is given and F θ0 ( ) and F θ1 ( ) have different mean values, a natural statistic is M n;t = max T i, T i = X i + + X i+t 1. 1 i n t 1 We are interested in its p-value: Assume X 1,..., X n F θ0 ( ), P(M n;t b) = P( max T i b) 1 i n t+1 =?

5 Known results Let Y i = I (T i b). {max 1 i n t+1 T i b} = { n t+1 i=1 Y i 1}. Dembo and Karlin (1992) proved that if t is fixed and b, n plus mild conditions on F θ0 ( ), then n t+1 P(M n;t b) = P( Y i 1) 1 e λ where λ = (n t + 1)E(Y 1 ). i=1 Mild conditions on F θ0 ( ) ensures that P(Y i+1 = 1 Y i = 1) 0.

6 t : If X i Bernoulli(p) and b is an integer, Arratia, Gordon and Waterman (1990) prove that P(M n;t b) (1 e λ ) C(e ct + t )(λ 1) (1) n where λ = (n t + 1)P(T 1 = b)( b t p). Haiman (2007) derived more accurate approximations using the distribution function of Z k := max{t 1,..., T kt+1 } for k = 1 and 2. The distribution functions of Z k for k = 1 and 2 are only known for Bernoulli and Poisson random variables. Our objective is to extend (1) to other random variables.

7 Preparation for the main result: Let µ 0 = E(X 1 ). We assume b = at where a > µ 0. P( max T i b) = P( 1 i n t+1 max 1 i n t+1 X i + + X i+t 1 t a). We assume the distribution of X 1 can be imbedded in an exponential family of distributions df θ (x) = e θx Ψ(θ) df (x), θ Θ. (2) It is known that F θ has mean Ψ (θ) and variance Ψ (θ). Assume θ 0 = 0, i.e., X 1 F and there exists θ a Θ o such that Ψ (θ a ) = a. Example: X 1 N(0, 1), Ψ(θ) = θ2 2, θ a = a, F θa N(a, 1).

8 Assumption (2) is used in two places: 1 To obtain an accurate approximation to the marginal probability P(T 1 at) by change of measure. 2 Local limit theorem Diaconis and Freedman (1988): d TV (L(X 1,..., X m T 1 = at), L(X a 1,..., X a m)) Cm t where X a 1,..., X a m are i.i.d. and X a 1 F θ a. Let D k = k i=1 (X a i X i ). Let σ 2 a = Ψ (θ a ).

9 Theorem Under the assumption (2), for some constant C depending only on the exponential family (2), µ 0, and a, we have P(M n;t at) (1 e λ ) C( (log t)2 + t where if X 1 is nonlattice plus mild conditions, (log t log(n t)) )(λ 1), n t λ = (n t + 1)e [aθa Ψ(θa)]t θ a σ a (2πt) 1/2 exp[ k=1 1 k E(e θad+ k )], and if X 1 is integer-valued with span 1, λ = (n t + 1)e (aθa Ψ(θa))t e θa( at at) (1 e θa )σ a (2πt) 1/2 exp[ k=1 1 k E(e θad+ k )].

10 Remarks: We don t have an explicit expression for the constant C. The relative error 0 if t, n t. Let g(x) = Ee ixd 1 and ξ(x) = log{1/[1 g(x)]}. Woodroofe (1979) proved that for the nonlattice case, 1 k E(e θad+ k ) = log[(a µ0 )θ a ] 1 π k=1 + 1 π 0 0 θ 2 a[i ξ(x) π 2 ] x(θ 2 a + x 2 ) dx θ a {Rξ(x) + log[(a µ 0 )x]} θa 2 + x 2 dx where R and I denote real and imaginary parts. Tu and Siegmund (1999) proved that for the arithmetic case, 1 k E(e θad+ k ) = log(a µ0 ) k= π 2π 0 { ξ(x)e θ a ix 1 e θa ix + ξ(x) + log[(a µ 0)(1 e ix } )] 1 e ix dx.

11 Example 1: Normal distribution. n t a p 1 p

12 Example 2: Bernoulli distribution. n t µ 0 a p 1 p /

13 Sketch of proof: Let m = C(log t log(n t)). Let Let Y i = I (T i at,t i+1 < T i,..., T i+m < T i T i 1 < T i,..., T i m < T i ). W = n t+1 i=1 P(M n;t at) P(W 1). Y i, λ 1 = EW = (n t + 1)EY 1. From the Poisson approximation theorem of Arratia, Goldstein and Gordon (1990), we have P(W 1) (1 e λ 1 ) C( 1 t + 1 )(λ 1). n t

14 Approximating λ 1 by λ: EY 1 =P(T 1 at, T 2 < T 1,..., T 1+m < T 1 ; T 0 < T 1,..., T 1 m < T 1 ) P(T 1 at)p 2 (T 1 T 2 > 0,..., T 1 T 1+m > 0 T 1 at) Note that T 1 T 2 = X 1 X t+1 and that given T 1 at, X 1 F θa approximately and X t+1 F. Thus, {T 1 T 2 > 0} {D 1 > 0} where D 1 = X a 1 X 1. Similarly, {T 1 T k+1 > 0} {D k > 0}, D k = k Therefore, i=1 (X a EY 1 P(T 1 at)p 2 (D k > 0, k = 1, 2,... ). i X i ). Recall λ = (n t + 1)e [aθa Ψ(θa)]t θ a σ a (2πt) 1/2 exp[ k=1 1 k E(e θad+ k )].

15 Corollary Let {X 1,..., X n } be i.i.d. random variables with distribution function F that can be imbedded in an exponential family, as in (2). Let EX 1 = µ 0. Assume X 1 is integer-valued with span 1. Suppose a = sup{x : p x := P(X 1 = x) > 0} is finite. Let b = at. Then we have, with constants C and c depending only on p a, P(M n;t b) (1 e λ ) C(λ 1)e ct where λ = (n t)p t a(1 p a ) + p t a.

16 The second scan statistic Recall that we want to test against the alternative H 0 : X 1,..., X n F θ0 ( ) H 1 : for some i < j, X i+1,..., X j F θ1 ( ) X 1,..., X i, X j+1,..., X n F θ0 ( ) Now assume j i is not given, and F θ0 and F θ1 are from the same exponential family of distributions df θ (x) = e θx Ψ(θ) df (x), θ Θ. Then the log likelihood ratio statistic is j max (θ 1 θ 0 )(X k Ψ(θ 1) Ψ(θ 0 ) ). 0 i<j n θ 1 θ 0 k=i+1

17 It reduces to the following problem: Let {X 1,..., X n } be independent, identically distributed random variables. Let EX 1 = µ 0 < 0. Let S 0 = 0 and S i = i j=1 X j for 1 i n. We are interested in the distribution of M n := max (S j S i ). 0 i<j n Iglehart (1972) observed that it can be interpreted as the maximum waiting time of the first n customers in a single server queue. Karlin, Dembo and Kawabata (1990) discussed genomic applications.

18 The limiting distribution was derived by Iglehart (1972): Assume the distribution of X 1 can be imbedded in an exponential family of distributions df θ (x) = e θx Ψ(θ) df (x), θ Θ. Assume EX 1 = Ψ (0) = µ 0 < 0 and there exists a positive θ 1 Θ such that Ψ (θ 1 ) = µ 1, Ψ(θ 1 ) = 0. When X 1 is nonlattice, we have lim n P(M n log n θ 1 + x) = 1 exp( K e θ 1x ).

19 Theorem Let h(b) > 0 be any function such that h(b), h(b) = O(b 1/2 ) as b. Suppose n b/µ 1 > b 1/2 h(b). We have, P(M n b) (1 e λ ) { } ( Cλ 1 + b/h2 (b) ) e ch2 (b) + b1/2 h(b) n b/µ 1 n b µ 1 where if X 1 is nonlattice plus mild conditions, λ = (n b µ 1 ) e θ 1b θ 1 µ 1 exp( 2 k=1 1 k E θ 1 e θ 1S + k ), and if X 1 is integer-valued with span 1 and b is an integer, λ = (n b e θ 1b 1 ) µ 1 (1 e θ exp( 2 1 )µ1 k E θ 1 e θ 1S + k ). k=1

20 Remarks: By choosing h(b) = b 1/2, we get P(M n b) (1 e λ ) Cλ{e cb + b n } By choosing h(b) = C(log b) 1/2 with large enough C, we can see that the relative error in the Poisson approximation goes to zero under the conditions b, (b log b) 1/2 n b/µ 1 = O(e θ 1b ), where n b/µ 1 = O(e θ 1b ) ensures that λ is bounded. For the smaller range (in which case λ 0) b, δb n b/µ 1 = o(e 1 2 θ 1b ) for some δ > 0, Siegmund (1988) obtained more accurate estimates by a technique different from ours.

21 Let G(z) = 0 p kz k + 1 q kz k, and let z 0 denote the unique root > 1 of G(z) = 1. For the case p k = 0 for k > 1, using the notation Q(z) = k q kz k, one can show for large values of n and b that λ nz0 b {[Q(1) Q(z 1 0 )] (1 z 1 0 )z 1 0 Q (z0 1 )}. For the case q k = 0 for k > 1, λ nz0 b (1 z 1 0 )) G (1) 2 /G (z 0 ). In particular if q 1 = q and p 1 = p, where p + q = 1, both these results specialize to λ n(p/q) b (q p) 2 /q.

22 Sketch of proof (for the case h(b) = b 1/2 ): Recall S i = ik=1 X k. Define T b := inf{n 1 : S n / [0, b)}. For a positive integer m, let ω + m be the m-shifted sample path of ω := {X 1,..., X n }. Let t = b µ 1 + b and m = cb such that m < t. For 1 i n t, let Y i = I ( S i < S i j, 1 j m; T b (ω + i ) t, S Tb (ω + i ) b ). That is, Y i is the indicator of the event that the sequence {S 1,... S n } reaches a local minimum at i and the i-shifted sequence {S i (ω + α )} exits the interval [0, b) within time t and the first exiting position is b. Let W = n t i=1 Y i.

23 Sketch of proof (cont.) P(M n b) P(W 1). P(W 1) (1 e λ 1 ) Cλe cb. λ 1 = (n t)ey 1 (n t)p(τ 0 = )P(S Tb b) where τ 0 := inf{n 1 : S n 0}. λ 1 λ.

24 Other statistics Recall again that we want to test H 0 : X 1,..., X n F θ0 ( ) against the alternative H 1 : for some i < j, X i+1,..., X j F θ1 ( ) X 1,..., X i, X j+1,..., X n F θ0 ( ) 1. If θ 0 is not given, we need to consider P(M n;t b S n ) and P(M n b S n ).

25 2. If θ 0 is given but θ 1 is a nuisance parameter, then the log likelihood ratio statistic is max max[θ(s j S i ) (j i)ψ(θ)]. 0 i<j n For normal distribution, it reduces to θ (S j S i ) 2 max 0 i<j n 2(j i). The limit of is only know for normal distribution and for n b 2 [Siegmund and Venkatraman (1995)].

26 3. Frick, Munk and Sieling (2014) proposed the following multiscale statistic: { Sj S i max 2 log ( n ) }. 0 i<j n j i j i The penalty term 2 log(n/(j i)) was first studied in Dümbgen and Spokoiny (2001) and motivated by Lévy s modulus of continuity theorem.

27 4. Let X 1,..., X m be an independent sequence of Gaussian random variables with mean EX i = µ i and variance 1. We are interested in testing the null hypothesis H 0 : µ 1 = = µ m against the alternative hypothesis that there exist 1 τ 1 < < τ K m 1 such that H 1 :µ 1 =... µ τ1 µ τ1 +1 = = µ τ2 = µ τk µ τk +1 = = µ m.

28 4. (cont.) If K = 1, the log likelihood ratio statistic is S m S t m t St t max. 1 t m 1 1 t + 1 m t If K > 1, an appropriate statistic is max 0 i<j<k m { S j S i j i S k S j k j 1 j i + 1 k j }.

29 Thank you!

Theory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk

Instructor: Victor F. Araman December 4, 2003 Theory and Applications of Stochastic Systems Lecture 0 B60.432.0 Exponential Martingale for Random Walk Let (S n : n 0) be a random walk with i.i.d. increments