Scan Statistics: Theory and Applications

Size: px

Start display at page:

Download "Scan Statistics: Theory and Applications"

Leslie Gilmore
5 years ago
Views:

1 Scan Statistics: Theory and Applications Alexandru Am rioarei Laboratoire de Mathématiques Paul Painlevé Département de Probabilités et Statistique Université de Lille 1, INRIA Modal Team Seminaire de Probabilité et Statistique Laboratoire de Mathématiques Paul Painlevé 5 March, 2014, Lille A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

2 Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

3 Introduction Example Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

4 Introduction Example Epidemiology example 1 year JFMAMJJASONDJFMAMJJASONDJFMAMJJASONDJFMAMJJASONDJFMAMJJASOND Observation of disease cases over time: N = 19 cases over a period of T = 5 years Observation The epidemiologist notes a one year period (from April 93 - through April 94) with 8 cases: 42%! Question Given 19 cases over 5 years, how unusual is it to have a 1 year period containing as many as 8 cases? A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

5 Introduction Example The answer: A First approach A First approach: X = the number of cases falling in [April 93, April 94] X Bin(19, 0.2) P(X 8) = Conclusion: atypical situation! But: it is not the answer to our question: the one year period is not xed but identied after the scanning process! A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

6 Introduction Example The answer: Correct approach The scan statistics: S = the maximum number of cases over any continuous one year period in [0, T ] Thus, P(S 8) = gives the answer to the epidemiologist question. Conclusion: no unusual situation! A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

7 One Dimensional Scan Statistics Denitions and Notations Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

8 One Dimensional Scan Statistics Denitions and Notations Introducing the Scan Statistics Model Let T 1 be a positive integer and X 1, X 2,..., X T1 a sequence of i.i.d. r.v.'s. For m 1 T 1 integer, consider the moving sums Y i1 = i 1 +m 1 1 j=i 1 The discrete one dimensional scan statistics Problem S m1 (T 1 ) = X j max Y i 1. 1 i 1 T 1 m 1 +1 Approximate the distribution of one dimensional scan statistic P (S m1 (T 1 ) k). Used for testing the null hypotheses of randomness against the alternative hypothesis of clustering. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

9 One Dimensional Scan Statistics Denitions and Notations Related Statistics Let X 1,..., X T1 be a sequence of i.i.d. 0 1 Bernoulli of parameter p W m1,k - the waiting time until we rst observe at least k successes in a window of size m 1 P ( W m1,k T 1 ) = P (Sm1 (T 1 ) k) D T1 (k) - the length of the smallest window that contains at least k successes L T1 P (D T1 (k) m 1 ) = P (S m1 (T 1 ) k) - the length of the longest success run P (L T1 m 1 ) = P (S m1 (T 1 ) m 1 ) = P (S m1 (T 1 ) = m 1 ) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

10 One Dimensional Scan Statistics Denitions and Notations Literature A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

11 One Dimensional Scan Statistics Exact Formula by MCIT (Bernoulli Case) Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

12 One Dimensional Scan Statistics Exact Formula by MCIT (Bernoulli Case) Approach Fu (2001) applied the Markov Chain Imbedding Technique to nd the distribution of binary scan statistics. Main Idea Express the distribution of the S m1 (T 1 ) in terms of the waiting time distribution of a special compound pattern dene for 0 k m 1 m1 {}}{ F m 1,k = {Λ i Λ 1 = , Λ }{{} 2 = ,..., Λ }{{} l = } }{{} k k 1 k 1 Fm 1,k the compound pattern Λ = l i=1 Λ i, Λ i F m 1,k m1 k1 ( k 2 + j ) = j j=0 P(S m 1 (T 1) < k) = P(W (Λ) T 1 + 1). P(S m 1 (T 1) < k) = ξn T 1 1, where ξ = (1, 0,..., 0) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

13 One Dimensional Scan Statistics Exact Formula by MCIT (Bernoulli Case) Example Consider the i.i.d. two-state sequence (X i ) i {1,2,...,T1 } with p = P(X 1 = 1) and q = P(X 1 = 0). A realisation for T 1 = For k = 3 and m 1 = 4 F 4,3 = {Λ 1 = 111, Λ 2 = 1011, Λ 3 = 1101} The state space the principal matrix: N = 0 q p q p q p q p q q q Ω = {, 0, 1, 10, 11, 101, 110, α 1, α 2, α 3 } A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

14 One Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

15 One Dimensional Scan Statistics Naus Product Type Approximation Considering that T 1 = L 1 m 1, Naus (1982) gave the following product type approximation ( ) T 1 Q3m 1 P (S m1 (T 1 ) k) Q 2 m 1 2m1 Q 2m, 1 where Q 2m1 = P (S m1 (2m 1 ) k) and Q 3m1 = P (S m1 (3m 1 ) k). Idea of proof: P (S m 1 (T 1) k) = P(E 1 )P(E 2 E 1 ) P ( E L 1 where for j {1, 2,..., L 1}, { } E j = max Y i 1 k, (j 1)m1+1 i1 jm1 and use the Markov-like approximation (exchangeability) l 1 P E l E i P (E l E l 1 ) P (E 2 E 1 ) P (E i=1 1 ) L 2 i=1 E i ) = Q 3m1. Q 2m 1, A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

16 One Dimensional Scan Statistics Product Type Approximation for Bernoulli For Bernoulli r.v.'s, Naus (1982) derived exact formulas for Q 2m1 and Q 3m1 : Q 2m 1 = F 2 (k; m 1, p) kb(k + 1; m 1, p)f (k 1; m 1, p) + m 1 pb(k + 1; m 1, p)f (k 2, m 1 1), Q 3m 1 = F 3 (k; m 1, p) A 1 + A 2 + A 3 A 4, where A 1 = 2b(k + 1; m 1, p)f (k; m 1, p)[kf (k 1; m 1, p) m 1 pf (k 2; m 1 1, p)], A 2 = 0.5b 2 (k + 1; m 1, p) [k(k 1)F (k 2; m 1, p) 2(k 1)m 1 F (k 3; m 1 1, p) +m 1 (m 1 1)p 2 F (k 4; m 1 2, p) ], k A 3 = b(2(k + 1) r; m 1, p)f 2 (r 1; m 1, p), r=1 k A 4 = b(2(k + 1) r; m 1, p)b(r + 1; m 1, p)[rf (r 1; m 1, p) m 1 pf (r 2; m 1 1, p)]. r=2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

17 One Dimensional Scan Statistics Product Type Approximation for Binomial and Poisson If X i Bin(n, p) or X i Pois(λ), we have the approximation P (S m 1 (T 1) k) Q 2m 1 Q 2m 1 1 ( Q3m 1 Q 2m 1 ( Q2m 1 ) T 1 m 1 2, T 1 3m 1 Q 2m 1 1 ) T 1 2m1+1, T 1 2m 1 where Q 2m1 1, Q 2m1 and Q 3m1 are computed by Karwe and Naus (1996) recurrence. Karwe Naus algorithm for Q 2m 1 1 and Q2m 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

18 One Dimensional Scan Statistics Bounds Glaz and Naus (1991) developed a variety of tight bounds: Lower Bounds Q 2m 1 P (S m 1 (T 1) k) [ ], T 1 + Q2m 1 1 Q2m T 1 2m 1 1 2m1 1 Q 2m 1 1Q2m 1 Q 3m 1 [ 1 + Q 2m 1 1 Q2m 1 Q 3m 1 1 ] T 1 3m1, T 1 3m 1 Upper Bounds P (S m 1 (T 1) k) Q 2m 1 [1 Q 2m1 1 + Q 2m 1 ]T 1 2m1, T 1 2m 1 Q 3m 1 [1 Q 2m1 1 + Q 2m 1 ]T 1 3m1, T 1 3m 1 The values Q 2m1 1, Q 2m1, Q 3m1 1, Q 3m1 are computed using Karwe and Naus (1996) algorithm. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

19 One Dimensional Scan Statistics Haiman Type Approximation: Main Observation Haiman proposed in 2000 a dierent approach Main Observation The scan statistic r.v. can be viewed as a maximum of a sequence of 1-dependent stationary r.v.. The idea: discrete and continuous one dimensional scan statistic: Haiman (2000,2007) discrete and continuous two dimensional scan statistic: Haiman and Preda (2002,2006) discrete three dimensional scan statistic: Am rioarei and Preda (2013) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

20 One Dimensional Scan Statistics Haiman Type Approximation: Writing the Scan as an Extreme of 1-Dependent R.V.'s Let T 1 = L 1 m 1 positive integer Dene for j {1, 2,..., L 1 1} Z j = (Z j ) j is 1-dependent and stationary max (j 1)m 1 +1 i 1 jm 1 +1 Y i 1 {}}{ X 1, X 2,..., X m 1, X m m1+1,..., X 2m 1 }{{}, X2m1+1,..., X 3m 1, X 3m1+1,..., X 4m 1 }{{} Z1 Z2 Z3 Observe S m1 (T 1 ) = max 1 j L 1 1 Z j A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

21 One Dimensional Scan Statistics Haiman Type Approximation: Main Tool Let (Z j ) j 1 be a strictly stationary 1-dependent sequence of r.v.'s and let q m = q m (x) = P(max(Z 1,..., Z m ) x), with x < sup{u P(Z 1 u) < 1}. Theorem (Haiman 1999) For any x such that P(Z 1 > x) = 1 q and any integer m > 3, q 6(q 1 q 2 ) 2 + 4q 3 3q 4 m (1 + q 1 q 2 + q 3 q 4 + 2q q2 2 5q 1q 2 ) m H 1 (1 q 1 ) 3, q 2q 1 q 2 m [1 + q 1 q 2 + 2(q 1 q 2 ) 2 ] m H 2 (1 q 1 ) 2, H 1 = m[ m(1 q 1) 3 ] H 1 = (1 q 1) + 3.3m[ m(1 q 1 ) 2 ]. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

22 One Dimensional Scan Statistics Haiman Type Approximation: Improvement Main Theorem (Amarioarei 2012) For x such that P(Z 1 > x) = 1 q 1 α < 0.1 and m > 3 we have q 6(q 1 q 2 ) 2 + 4q 3 3q 4 m (1 + q 1 q 2 + q 3 q 4 + 2q q2 2 5q 1q 2 ) m 1(1 q 1 ) 3, q 2q 1 q 2 m [1 + q 1 q 2 + 2(q 1 q 2 ) 2 ] m 2(1 q 1 ) 2, 1 = 1 (α, q 1, m) = Γ(α) + mk(α) [ 2 = me(α, q 1, m) = m Increased range of applicability m + K(α)(1 q 1) + Γ(α)(1 q 1) m Sharp bounds values (ex. α = 0.025: and ) Selected values for K(α) and Γ(α) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61 ].

23 One Dimensional Scan Statistics Dierence between the results: 1 q 1 = H 2 Value of the coefficients Length of the sequence (n) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

24 One Dimensional Scan Statistics Haiman Type Approximation: Result T 1 Observe that Q 2m1 = P(Z 1 k) Q 3m1 = P(Z 1 k, Z 2 k) X 1 Xm1 X m1 +1 X 2m1 X 2m1 +1 m 1 2m 1 Q 2m1 X 3m1 X T1 If 1 Q 2m1 0.1 the approximation 3m 1 Q 3m1 P(S k) 2Q2m 1 Q 3m 1 [1+Q2m 1 Q 3m 1 +2(Q 2m 1 Q 3m 1) 2]L 1 1 where S = S m1 (T 1 ) Approximation error, about (L 1 1)E(α, L 1 1) (1 Q 2m 1 )2 with E(α, α, m) = E(α, m). Q 2m1 and Q 3m1 are evaluated using the previous approaches A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

25 One Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

26 One Dimensional Scan Statistics Comparison between methods Table 1 : n = 1, p = 0.005, m 1 = 10, T 1 = 1000, It App = 10 4 k Exact Glaz and Naus Haiman Approximation Lower Upper Product type Approximation Error Bound Bound Table 2 : n = 5, p = 0.05, m 1 = 25, T 1 = 500, It App = 10 4, It Sim = 10 3 k ˆP(S k) Glaz and Naus Haiman Total Lower Upper Product type Approximation Error Bound Bound A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

27 Two Dimensional Scan Statistics Model Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

28 Two Dimensional Scan Statistics Model Introducing the Model Let T 1, T 2 be positive integers T 2.. j. 1 X i,j T 1 T 2 Rectangular region R = [0, T 1 ] [0, T 2 ] (X ij ) 1 i T1 1 j T 2 i.i.d. integer r.v.'s Bernoulli(B(1, p)) Binomial(B(n, p)) Poisson(P(λ)) X ij number of observed events in the elementary subregion r ij = [i 1, i] [j 1, j] 1 i T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

29 Two Dimensional Scan Statistics Model Dening the Scan Statistic Let m 1, m 2 be positive integers T 1 Dene for 1 i j T j m j + 1, Y i1 i 2 = i 1 +m 1 1 i 2 +m 2 1 X ij i=i 1 j=i 2 m 2 T 2 The two dimensional scan statistic, m 1 S m1,m 2 (T 1, T 2 ) = max Y i1 i 2 1 i 1 T 1 m i 2 T 2 m 2 +1 Used for testing the null hypotheses of randomness against the alternative hypothesis of clustering A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

30 Two Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

31 Two Dimensional Scan Statistics Product Type Approximation Bernoulli Case Boutsikas and Koutras (2000) using Markov Chain Imbedding approach proposed the approximation Where, P (S m 1,m2 (T 1, T 2 ) k) Q(m 1,m2) (T 1 m 1 1)(T 2 m 2 1) Q(m1+1,m2+1) (T 1 m 1 )(T 2 m 2 ) Q(m1,m2+1) (T 1 m 1 1)(T 2 m 2 ) Q(m1+1,m2) (T 1 m 1 )(T 2 m 2 1) Q(m1, m2) = F (k; m1m2, p) k Q(m1 + 1, m2) = F 2 (k s; m2, p) b (s; (m1 1)m2, p) s=0 k k 1 Q(m1 + 1, m2 + 1) = b(s1; m1 1, p)b(s2; m1 1, p)b(t1; m2 1, p) s 1,s 2 =0 t 1,t 2 =0 i 1,i 2,i 3,i 4 =0 i b(t2; m2 1, p)p j (1 p) 4 i j F (u; (m 1 1)(m2 1), p) u = min {k s1 t1 i1, k s2 t1 i2, k s1 t2 i3, k s2 t2 i4} ( ) n b(s; n, p) = p s (1 p) n s s s F (s; n, p) = b(i; n, p) i =0 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

32 Two Dimensional Scan Statistics Bounds for the Bernoulli Case The following bounds were established by Boutsikas and Koutras (2003) Lower Bound LB = (1 Q 1 ) (T 1 m1)(t2 m2) (1 Q 2 ) T 1 m1 (1 Q 3 ) T 2 m2 (1 Q 4 ) Upper Bound ( UB = (1 Q1) 1 q (m 1 1)(3m 2 2)+(2m 1 1)(m 2 1) (T 1 Q1) m 1 1)(T 2 m 2 1) ( 1 q m 1 (m 2 1) ) T 2 Q1 m 2 1 ( 1 q (m 1 1)(2m 2 1)+(m 1 1)(m 2 1) T 1 Q1) m 1 1 ( 1 q (m 1 1)(2m 2 1)+m 1 (m 2 1) ) T 1 Q2 m 1 ( 1 q (m 1 1)(3m 2 2)+m 1 (m 2 1)+(m 1 1)(m 2 1) T 2 Q3) m 2 ( 1 q (m 1 1)(2m 2 1)+m 1 (m 2 1) ) Q4. Where X ij B(p), q = 1 p and Q1 = F c k+1,m 1 m 2 qm2 F c k+1,(m 1 1)m 2 qm1 F c k+1,m 1 (m 2 1) + qm 1 +m 2 1 F c k+1,(m 1 1)(m 2 1), Q2 = F c k+1,m 1 m 2 qm2 F c k+1,(m 1 1)m 2, Q 3 = F c k+1,m 1 m 2 qm1 F c k+1,m 1 (m 2 1), Q 4 = F c k+1,m 1 m 2 Fi c,m = 1 F (i 1; m, p). A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

33 Two Dimensional Scan Statistics Product Type Approximation Binomial and Poisson For X ij Bin(n, p) or X ij Pois(λ), Chen and Glaz (2009) proposed the product type approximation Q(m (T 1,2m2 1) 1 m1 1)(T2 2m2) Q(m1+1,m2) (T 1 m 1 )(T 2 m 2 1) Q(m1,2m2) (T 1 m 1 1)(T 2 2m 2 +1) P (S m 1,m2 (T 1, T 2 ) k) Q(m 1+1,m2+1) (T 1 m 1 )(T 2 m 2 ) Where, Q(m 1, 2m 2 1) = P (S m1,m 2 (m 1, 2m 2 1) k) Q(m 1, 2m 2 ) = P (S m1,m 2 (m 1, 2m 2 ) k) To compute the unknown variables we use Q(m 1, 2m 2 1) and Q(m 1, 2m 2 ) - adaptation of Karwe and Naus algorithm Q(m 1 + 1, m 2 ) and Q(m 1 + 1, m 2 + 1) - conditioning Formulas for Q(m1 + 1, m2) and Q(m1 + 1, m2 + 1) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

34 Two Dimensional Scan Statistics Lower Bound for Binomial and Poisson Glaz and Chen (1996) gave a lower bound applying Hoover Bonferroni type inequality of order r 3, P ( S m 1,m 2 (T 1, T2) k ) T 1 m 1 +1 = P i 1 =1 T 1 m 1 +1 i 1 =1 T 1 m 1 +1 i 1 =1 T 2 m 2 +1 T 2 m 2 i 2 =1 r 1 l =2 A i i 2 =1 1,i 2 T 1 m 1 +1 i 1 =1 T 2 m 2 +1 i 2 =1 ) P (A T 1 i 1,i 2 m 1 Ai 1,i 2 +1 T 2 m 2 +1 l i 2 =1 Where A i1,i 2 = {Y i1,i 2 k} and for r = 4, Lower Bound i 1 =1 ( ) P A i 1,i 2 ) P (A i 1,1 Ai 1 +1,1 ) P (A i 1,i 2 Aci 1,i Aci 1,i 2 +l 1 Ai 1,i 2 +l P ( S m 1,m 2 (T 1, T2) k ) (T1 m1) (Q(m1 + 1, m2) 2Q(m1, m2)) (T1 m1 + 1)(T2 m2 3) Q(m1, m2 + 2) + (T1 m1 + 1)(T2 m2 2)Q(m1, m2 + 3). Q(m 1 + 1, m 2 ), Q(m 1, m 2 ), Q(m 1, m 2 + 2), Q(m 1, m 2 + 3) - Karwe and Naus algorithm (variant) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

35 Two Dimensional Scan Statistics Upper Bound for Binomial and Poisson For the upper bound we adapt the inequality of Kuai et al. (2000) to the two dimensional framework: where P ( S m 1,m 2 We have T 1 m 1 +1 (T 1, T2) k) = 1 P i 1 =1 T 1 m 1 +1 T 2 m 2 +1 [ 1 Σ(i1, i2) = i 1 =1 T 1 m 1 +1 j 1 =1 i 2 =1 T 2 m 2 +1 j 2 =1 ( ) P A i 1,i 2 Aj 1,j = 2 ) P (Y i 1,i 2 n, Yj 1,j 2 n = Z = T 2 m 2 +1 A i i 2 =1 1,i 2 θ i 1,i 2 P(Ai 1,i 2 )2 Σ(i1, i2) + (1 θ i 1,i 2 )P(Ai 1,i 2 ) + (1 θi 1,i 2 )P(Ai 1,i 2 ) 2 ] Σ(i1, i2) θ i 1,i 2 P(Ai 1,i 2 ) ( ) P A i 1,i 2 Aj 1,j and θ 2 i 1,i 2 = Σ(i 1,i 2 ) P(A i 1,i 2 ) Σ(i 1,i 2 ) P(A i 1,i 2 ). { [1 Q(m 1, m2)] 2, if i1 j1 m1 or i2 ) j2 m2, 1 2Q(m1, m2) + P (Y i 1,i 2 n, Yj 1,j 2 n, otherwise n P(Z = k)p(y i 1,i 2 Z n k)2, k=0 (i 1 +m 1 1) (j 1 +m 1 1) s=i 1 j 1 (i 2 +m 2 1) (j 2 +m 2 1) t=i 2 j 2 X s,t. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

36 Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: Writing the Scan as an Extreme of 1-Dependent R.V.'s Let L j = T j m j, j {1, 2} positive integers Dene for l {1, 2,..., L 2 1} Z l = max Y i1 i 2 1 i 1 (L 1 1)m 1 +1 (l 1)m 2 +1 i 2 lm 2 +1 (Z l ) l is 1-dependent and stationary Observe S m1,m 2 (T 1, T 2 ) = max 1 k L 2 1 Z k T 1 T 2 m 2 m 1 Z 3 Z 1 Z 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

37 Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: First Step Approximation Using Main Theorem we obtain Dene Q 2 = P(Z 1 k) Q 3 = P(Z 1 k, Z 2 k) If 1 Q 2 α 1 < 0.1 the (rst) approximation T 2 m 2 m 1 T 1 R P(S k) 2Q2 Q3 [1+Q2 Q3+2(Q2 Q3) 2 ] L 2 1 T1 T1 where S = S m1,m 2 (T 1, T 2 ) Approximation error (L 2 1)E(α 1, L 2 1)(1 Q 2 ) 2 T2 m1 Q 2 2m2 T2 m1 Q 3 3m2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

38 Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: Second Step Approximation Q 2 : For s {1, 2,..., L 1 1} Z (2) s = max (s 1)m1+1 i1 sm1+1 1 i2 m2+1 Y i 1i2 ( ) Q 2 = P max Z (2) s k 1 s L1 1 Dene Q22 = P(Z (2) 1 k) Q 32 = P(Z (2) 1 k, Z (2) 2 k) Approximation (1 Q 22 α 2 ) Q 2 2Q22 Q32 [1+Q22 Q32+2(Q22 Q32) 2 ] L 1 1 (L 1 1)E(α 2, L 1 1)(1 Q 22 ) 2 Q 3 : For s {1, 2,..., L 1 1} Z (3) s = max (s 1)m1+1 i1 sm1+1 1 i2 2m2+1 Y i 1i2 ( ) Q 3 = P max Z (3) s k 1 l L1 1 Dene Q23 = P(Z (3) 1 k) Q 33 = P(Z (3) 1 k, Z (3) 2 k) Approximation (1 Q 23 α 2 ) Q 3 2Q23 Q33 [1+Q23 Q33+2(Q23 Q33) 2 ] L 1 1 (L 1 1)E(α 2, L 1 1)(1 Q 23 ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

39 Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: Illustration of the Approximation Process T1 T2 R m2 T1 m1 T1 T2 T2 Q 2 2m2 Q 3 3m2 m1 m1 T1 T1 T1 T1 T2 T2 T2 T2 2m1 Q 22 2m2 3m1 Q 32 2m2 Q 23 2m1 3m2 3m1 Q 33 3m2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

40 Two Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

41 Two Dimensional Scan Statistics Comparison between methods Table 3 : n = 1, p = 0.005, m 1 = m 2 = 6, T 1 = T 2 = 30, It App = 10 3, It Sim = 10 3 k ˆP(S k) Glaz and Naus Haiman Total Lower Upper Product type Approximation Error(App+Sim) Bound Bound Table 4 : n = 5, p = 0.002, m 1 = 5, m 2 = 10, T 1 = 50, T 2 = 80, It App = 10 4, It Sim = 10 3 k ˆP(S k) Glaz and Naus Haiman Total Lower Upper Product type Approximation Error(App+Sim) Bound Bound A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

42 Three Dimensional Scan Statistics Model and Approximations Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

43 Three Dimensional Scan Statistics Model and Approximations Introducing the Model T 3 X ijk T 2 T 1 Let T 1, T 2, T 3 be positive integers Rectangular region R = [0, T 1 ] [0, T 2 ] [0, T 3 ] ( ) Xijk 1 i T 1 i.i.d. integer r.v.'s 1 j T 2 1 k T 3 Bernoulli(B(1, p)) Binomial(B(n, p)) Poisson(P(λ)) X ijk number of observed events in the elementary subregion r ijk = [i 1, i] [j 1, j] [k 1, k] A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

44 Three Dimensional Scan Statistics Model and Approximations Dening the Scan Statistic Let m 1, m 2, m 3 be positive integers Dene for 1 i j T j m j + 1, Y i1 i 2 i 3 = i 1 +m 1 1 i 2 +m 2 1 i 3 +m 3 1 X ijk i=i 1 j=i 2 k=i 3 T 3 m 3 Y i1 i 2 i 3 The three dimensional scan statistic, S m1,m 2,m 3 = max 1 i 1 T 1 m i 2 T 2 m i 3 T 3 m 3 +1 Y i1 i 2 i 3. m 2 T T 1 2 m 1 Used for testing the null hypotheses of randomness against the alternative hypothesis of clustering A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

45 Three Dimensional Scan Statistics Model and Approximations Product Type Approximation Glaz et al. proposed in the product type approximation P ( S m 1,m 2,m 3 k) Q(m1 + 1, m2 + 1, m3 + 1) (T 1 m 1 )(T 2 m 2 )(T 3 m 3 ) Q(m1 + 1, m2, m3) (T 1 m 1 )(T 2 m 2 1)(T 3 m 3 1) Q(m1, m2, m3) (T 1 m 1 1)(T 2 m 2 1)(T 3 m 3 1) Q(m1 + 1, m2 + 1, m3) (T 1 m 1 )(T 2 m 2 )(T 3 m 3 1) Q(m1, m2 + 1, m3) (T 1 m 1 1)(T 2 m 2 )(T 3 m 3 1) Q(m1, m2, m3 + 1) (T 1 m 1 1)(T 2 m 2 1)(T 3 m 3 1) Q(m1 + 1, m2, m3 + 1) (T 1 m 1)(T 2 m 2 1)(T 3 m 3) Q(m1, m2 + 1, m3 + 1) (T 1 m 1 1)(T 2 m 2 )(T 3 m 3) Where, Q(N 1, N 2, N 3 ) = P (S m1,m 2,m 3 (N 1, N 2, N 3 ) k) The approximation also works for binomial and Poisson distribution Three Poisson Type Approximation A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

46 Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: Writing the Scan as an Extreme of 1-Dependent R.V.'s m3 m2 m1 Let L j = T j m j, j {1, 2, 3} positive integers Dene for l {1, 2,..., L 3 1} Z l = max Y i1 i 2 i 3 1 i 1 (L 1 1)m i 2 (L 2 1)m 2 +1 (l 1)m 3 +1 i 3 lm 3 +1 Z3 Z2 Z1 T 3 (Z l ) l is 1-dependent and stationary Observe S m1,m 2,m 3 = max 1 l L 3 1 Z l T 2 T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

47 Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: First Step Approximation 2m3 R T 3 T 2 T 1 Q 2 Q 3 T 2 T 1 3m3 T 2 T 1 Dene Q 2 = P(Z 1 k) Q 3 = P(Z 1 k, Z 2 k) If 1 Q 2 α 1 < 0.1 the (rst) approximation P(S k) 2Q2 Q3 [1+Q2 Q3+2(Q2 Q3) 2 ] L 3 1 where S = S m1,m 2,m 3 (T 1, T 2, T 3 ) Approximation error (L 3 1)E(α 1, L 3 1)(1 Q 2 ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

48 Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: Approximation for Q 2 and Q 3 Q 2 : For l {1, 2,..., L 2 1} Z (2) = max Y l i 1i2i3 1 i1 (L1 1)m1+1 (l 1)m2+1 i2 lm2+1 1 i3 m3+1 ( ) Q 2 = P max Z (2) n 1 l L2 1 l Dene Q22 = P(Z (2) 1 n) Q 32 = P(Z (2) 1 n, Z (2) 2 n) Approximation (1 q 22 α 2 ) Q 2 2Q22 Q32 [1+Q22 Q32+2(Q22 Q32) 2 ] L 2 1 (L 2 1)E(α 2, L 2 1)(1 Q 22 ) 2 Q 3 : For l {1, 2,..., L 2 1} Z (3) = max Y l i 1i2i3 1 i1 (L1 1)m1+1 (l 1)m2+1 i2 lm2+1 1 i3 2m3+1 ( ) Q 3 = P max Z (3) n 1 l L2 1 l Dene Q23 = P(Z (3) 1 n) Q 33 = P(Z (3) 1 n, Z (3) 2 n) Approximation (1 q 23 α 2 ) Q 3 2Q23 Q33 [1+Q23 Q33+2(Q23 Q33) 2 ] L 2 1 (L 2 1)E(α 2, L 2 1)(1 Q 23 ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

49 Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: Last Step (Approximating Q ts ) Applying again the second part of the Main Theorem... For s, t {2, 3} and j {1, 2,..., L 1 1} dene Observe Z (ts) j = max (j 1)m1+1 i1 jm1+1 1 i2 (t 1)m2+1 1 i3 (s 1)m3+1 Y i 1i2i3 ( ) Q ts = P max Z (ts) n 1 j L1 1 j ( Dene for r, s, t {2, 3}, Q rts = P Q ts r 1 j=1 ) j n} {Z (ts) If 1 Q 2ts α 3 then the approximation and the error 2Q2ts Q3ts (L [1+Q2ts Q3ts +2(Q2ts Q3ts ) 2 ] L )E(α 3, L 1 1)(1 Q 2ts ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

50 Three Dimensional Scan Statistics Model and Approximations An Illustration of the Approximation Chain (Q 2 ) Q 2 2m3 Q 22 T2 T1 Q 32 2m3 2m3 2m2 T1 3m2 T1 Q 222 Q322 Q 232 Q 332 2m3 2m3 2m3 2m3 2m2 2m1 2m2 3m1 3m2 2m1 3m2 3m1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

51 Three Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

52 Three Dimensional Scan Statistics Bernoulli Case Comparing with existing results: Table 5 : n = 1, p = , m 1 = m 2 = m 3 = 5, T 1 = T 2 = T 3 = 60, It App = 10 5 k ˆP(S k) Glaz et al. Our Approximation Simulation Total Product type Approximation Error Error Error Table 6 : n = 1, p = , m 1 = m 2 = m 3 = 5, T 1 = T 2 = T 3 = 60, It App = 10 5 k ˆP(S k) Glaz et al. Our Approximation Simulation Total Product type Approximation Error Error Error A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

53 References Glaz, J., Naus, J., Wallenstein, S.: Scan statistic. Springer (2001). Glaz, J., Pozdnyakov, V., Wallenstein, S.: Scan statistic: Methods and Applications. Birkhauser (2009). Amarioarei, A.: Approximation for the distribution of extremes of one dependent stationary sequences of random variables, arxiv: v1 (submitted) Amarioarei, A., Preda, C.: Approximation for the distribution of three dimensional discrete scan statistic, Methodol Comput Appl Probab (2013) DOI: /s Amarioarei, A., Preda, C.: Approximation for two dimensional discrete scan statistics in some block-factor type dependent models, arxiv: (submitted) Boutsikas, M.V., Koutras, M.: Reliability approximations for Markov chain imbeddable systems. Methodol Comput Appl Probab 2 (2000), A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

54 References Boutsikas, M. and Koutras, M. Bounds for the distribution of two dimensional binary scan statistics, Probability in the Engineering and Information Sciences, 17, , Chen, J. and Glaz, J. Two-dimensional discrete scan statistics, Statistics and Probability Letters 31, 5968, Glaz, J., Guerriero, M., Sen, R.: Approximations for three dimensional scan statistic. Methodol Comput Appl Probab 12 (2010), Haiman, G.: First passage time for some stationary sequence. Stoch Proc Appl 80 (1999), Haiman, G.: Estimating the distribution of scan statistics with high precision. Extremes 3 (2000), Haiman, G., Preda, C.: A new method for estimating the distribution of scan statistics for a two-dimensional Poisson process. Methodol Comput Appl Probab 4 (2002), A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

55 References Haiman, G., Preda, C.: Estimation for the distribution of two-dimensional scan statistics. Methodol Comput Appl Probab 8 (2006), Haiman, G.: Estimating the distribution of one-dimensional discrete scan statistics viewed as extremes of 1-dependent stationary sequences. J. Stat Plan Infer 137 (2007), Kuai, H., Alajaji, F., Takahara, G.: A lower bound on the probability of a nite union of events. Discrete Mathematics 215 (2000), A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

56 Block-Factor Type Model Introducing the Model Let 1 c s Ts, s {1, 2} integers ( ) Xij i.i.d. r.v.'s 1 i T 1 1 j T 2 conguration matrix in (i, j) C (i,j) = ( C (i,j) (k, l) ) 1 k c 2 1 l c 1 C (i,j) (k, l) = Xi+l 1,j+c2 k transformation Π : M c2,c 1 (R) R X i,j+c X i,j... X i,j Π Xi +c 1 1,j+c 2 1. X i +c 1 1,j Dene the block-factor model, T 1 = T1 c 1 + 1, T 2 = T2 c X i,j = Π ( C (i,j) ), 1 i T 1 1 j T 2 Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

57 Block-Factor Type Model Model: case c 1 = c 2 = 3 To simplify the presentation we consider c 1 = c 2 = 3 X i,j+2 X i+1,j+2 X i+2,j+2 X i,j+1 X i+1,j+1 X i+2,j+1 T 2 X i,j X i+1,j X i+2,j X i,j T 2. j. 1 Π. j. 1 1 i T 1 1 i T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

58 Block-Factor Type Model Model: case c 1 = c 2 = 3 To simplify the presentation we consider c 1 = c 2 = 3 X i,j+2 X i+1,j+2 X i+2,j+2 X i,j+1 X i+1,j+1 X i+2,j+1 T 2 X i,j X i+1,j X i+2,j X i,j T 2. j. 1 Π. j. 1 1 i T 1 1 i T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

Block-Factor Type Model Example: A game of

59 Block-Factor Type Model Example: A game of minesweeper Model: X i,j B(p) (presence, absence of a mine) number of neighboring mines T (C (i,j) ) = (s,t) {0,1,2} 2 (s,t) (1,1) X i,j = Π ( C (i,j) ) X i+s,j+t Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

60 Block-Factor Type Model Computation of Q(m 1 + 1, m 2 ) and Q(m 1 + 1, m 2 + 1) Consider X ij B(n, p) and the notation Y j 1,j2 i1,i2 = j1 i1=1 i2=1 j2 X ij, P ( S m 1,m 2 (m 1 + 1, m2) k ) k (m 1 1)m 2 n = P 2 ( Y 1,m ) ( 2 1,1 k y P Y m 1,m ) 2 2,1 = y y=0 P ( S m 1,m 2 (m 1 + 1, m2 + 1) k ) = k (m 1 1)(m 2 1)n y 1 =0 [k y 1 y 2 y 3 ] (m 1 1)n y 4 =0 ( P Y 1,m 2 +1 ) 1,m 2 +1 a 2 ( P Y m 1,m ) 2 2,2 = y1 P ( P Y m 1,1 ) 2,1 = y4 P (k y 1 ) (m 2 1)n y 2 =0 (k y 1 ) (m 2 1)n y 3 =0 [k y 1 y 2 y 3 ] (m 1 1)n ( P Y m 1 +1,1 ( P Y1,1 1,1 ) a 1 y 5 =0 ) ( m 1 +1,1 a 3 P Y m 1 +1,m 2 +1 ) m 1 +1,m 2 +1 a 4 ) ( P Y m 1 +1,m ) m ,2 = y3 ) ( Y 1,m 2 1,2 = y2 ( Y m 1,m ,m 2 +1 = y5 a1 = k y1 y2 y4, a2 = k y1 y2 y5, a3 = k y1 y3 y4, a4 = k y1 y3 y5, Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

61 Block-Factor Type Model Karwe Naus recursive methods Dene We have the recurrence relations b k 2(m) (y) = P (Sm(2m) k, Y m+1(m) = y) f (y) = P(X1 = y) Q2m k = P (Sm(2m) k) k b k 2(1) (y) = f (j) f (y) j=0 y b k 2(m) (y) = η=0 ν=0 k y+η k Q2m k = b k 2(m) (y) y=0 k Q2m 1 k = f (x)q k x 2(m 1) x=0 b k ν (y η)f (ν)f (η) 2(m 1) Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

62 Block-Factor Type Model Selected Values for K(α) and Γ(α) α K(α) Γ(α) Table 7 : Selected values for K(α) and Γ(α) Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille / 61

Approximations for two-dimensional discrete scan statistics in some dependent models

Approximations for two-dimensional discrete scan statistics in some dependent models Alexandru Am rioarei Cristian Preda Laboratoire de Mathématiques Paul Painlevé Département de Probabilités et Statistique