Scan Statistics: Theory and Applications Alexandru Am rioarei Laboratoire de Mathématiques Paul Painlevé Département de Probabilités et Statistique Université de Lille 1, INRIA Modal Team Seminaire de Probabilité et Statistique Laboratoire de Mathématiques Paul Painlevé 5 March, 2014, Lille A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 1 / 61
Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 2 / 61
Introduction Example Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 3 / 61
Introduction Example Epidemiology example 1 year JFMAMJJASONDJFMAMJJASONDJFMAMJJASONDJFMAMJJASONDJFMAMJJASOND 1991 1992 1993 1994 1995 Observation of disease cases over time: N = 19 cases over a period of T = 5 years Observation The epidemiologist notes a one year period (from April 93 - through April 94) with 8 cases: 42%! Question Given 19 cases over 5 years, how unusual is it to have a 1 year period containing as many as 8 cases? A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 4 / 61
Introduction Example The answer: A First approach A First approach: X = the number of cases falling in [April 93, April 94] X Bin(19, 0.2) P(X 8) = 0.023 Conclusion: atypical situation! But: it is not the answer to our question: the one year period is not xed but identied after the scanning process! A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 5 / 61
Introduction Example The answer: Correct approach The scan statistics: S = the maximum number of cases over any continuous one year period in [0, T ] Thus, P(S 8) = 0.379 gives the answer to the epidemiologist question. Conclusion: no unusual situation! A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 6 / 61
One Dimensional Scan Statistics Denitions and Notations Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 7 / 61
One Dimensional Scan Statistics Denitions and Notations Introducing the Scan Statistics Model Let T 1 be a positive integer and X 1, X 2,..., X T1 a sequence of i.i.d. r.v.'s. For m 1 T 1 integer, consider the moving sums Y i1 = i 1 +m 1 1 j=i 1 The discrete one dimensional scan statistics Problem S m1 (T 1 ) = X j max Y i 1. 1 i 1 T 1 m 1 +1 Approximate the distribution of one dimensional scan statistic P (S m1 (T 1 ) k). Used for testing the null hypotheses of randomness against the alternative hypothesis of clustering. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 8 / 61
One Dimensional Scan Statistics Denitions and Notations Related Statistics Let X 1,..., X T1 be a sequence of i.i.d. 0 1 Bernoulli of parameter p W m1,k - the waiting time until we rst observe at least k successes in a window of size m 1 P ( W m1,k T 1 ) = P (Sm1 (T 1 ) k) D T1 (k) - the length of the smallest window that contains at least k successes L T1 P (D T1 (k) m 1 ) = P (S m1 (T 1 ) k) - the length of the longest success run P (L T1 m 1 ) = P (S m1 (T 1 ) m 1 ) = P (S m1 (T 1 ) = m 1 ) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 9 / 61
One Dimensional Scan Statistics Denitions and Notations Literature A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 10 / 61
One Dimensional Scan Statistics Exact Formula by MCIT (Bernoulli Case) Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 11 / 61
One Dimensional Scan Statistics Exact Formula by MCIT (Bernoulli Case) Approach Fu (2001) applied the Markov Chain Imbedding Technique to nd the distribution of binary scan statistics. Main Idea Express the distribution of the S m1 (T 1 ) in terms of the waiting time distribution of a special compound pattern dene for 0 k m 1 m1 {}}{ F m 1,k = {Λ i Λ 1 = 1... 1, Λ }{{} 2 = 10 1... 1,..., Λ }{{} l = 1... 1 0... 01} }{{} k k 1 k 1 Fm 1,k the compound pattern Λ = l i=1 Λ i, Λ i F m 1,k m1 k1 ( k 2 + j ) = j j=0 P(S m 1 (T 1) < k) = P(W (Λ) T 1 + 1). P(S m 1 (T 1) < k) = ξn T 1 1, where ξ = (1, 0,..., 0) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 12 / 61
One Dimensional Scan Statistics Exact Formula by MCIT (Bernoulli Case) Example Consider the i.i.d. two-state sequence (X i ) i {1,2,...,T1 } with p = P(X 1 = 1) and q = P(X 1 = 0). A realisation for T 1 = 20 00101011101101010110 For k = 3 and m 1 = 4 F 4,3 = {Λ 1 = 111, Λ 2 = 1011, Λ 3 = 1101} The state space the principal matrix: N = 0 q p 0 0 0 0 0 q p 0 0 0 0 0 0 0 q p 0 0 0 q 0 0 0 p 0 0 0 0 0 0 0 q 0 0 0 q 0 0 0 0 q 0 0 0 0 0 Ω = {, 0, 1, 10, 11, 101, 110, α 1, α 2, α 3 } A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 13 / 61
One Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 14 / 61
One Dimensional Scan Statistics Naus Product Type Approximation Considering that T 1 = L 1 m 1, Naus (1982) gave the following product type approximation ( ) T 1 Q3m 1 P (S m1 (T 1 ) k) Q 2 m 1 2m1 Q 2m, 1 where Q 2m1 = P (S m1 (2m 1 ) k) and Q 3m1 = P (S m1 (3m 1 ) k). Idea of proof: P (S m 1 (T 1) k) = P(E 1 )P(E 2 E 1 ) P ( E L 1 where for j {1, 2,..., L 1}, { } E j = max Y i 1 k, (j 1)m1+1 i1 jm1 and use the Markov-like approximation (exchangeability) l 1 P E l E i P (E l E l 1 ) P (E 2 E 1 ) P (E i=1 1 ) L 2 i=1 E i ) = Q 3m1. Q 2m 1, A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 15 / 61
One Dimensional Scan Statistics Product Type Approximation for Bernoulli For Bernoulli r.v.'s, Naus (1982) derived exact formulas for Q 2m1 and Q 3m1 : Q 2m 1 = F 2 (k; m 1, p) kb(k + 1; m 1, p)f (k 1; m 1, p) + m 1 pb(k + 1; m 1, p)f (k 2, m 1 1), Q 3m 1 = F 3 (k; m 1, p) A 1 + A 2 + A 3 A 4, where A 1 = 2b(k + 1; m 1, p)f (k; m 1, p)[kf (k 1; m 1, p) m 1 pf (k 2; m 1 1, p)], A 2 = 0.5b 2 (k + 1; m 1, p) [k(k 1)F (k 2; m 1, p) 2(k 1)m 1 F (k 3; m 1 1, p) +m 1 (m 1 1)p 2 F (k 4; m 1 2, p) ], k A 3 = b(2(k + 1) r; m 1, p)f 2 (r 1; m 1, p), r=1 k A 4 = b(2(k + 1) r; m 1, p)b(r + 1; m 1, p)[rf (r 1; m 1, p) m 1 pf (r 2; m 1 1, p)]. r=2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 16 / 61
One Dimensional Scan Statistics Product Type Approximation for Binomial and Poisson If X i Bin(n, p) or X i Pois(λ), we have the approximation P (S m 1 (T 1) k) Q 2m 1 Q 2m 1 1 ( Q3m 1 Q 2m 1 ( Q2m 1 ) T 1 m 1 2, T 1 3m 1 Q 2m 1 1 ) T 1 2m1+1, T 1 2m 1 where Q 2m1 1, Q 2m1 and Q 3m1 are computed by Karwe and Naus (1996) recurrence. Karwe Naus algorithm for Q 2m 1 1 and Q2m 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 17 / 61
One Dimensional Scan Statistics Bounds Glaz and Naus (1991) developed a variety of tight bounds: Lower Bounds Q 2m 1 P (S m 1 (T 1) k) [ ], T 1 + Q2m 1 1 Q2m T 1 2m 1 1 2m1 1 Q 2m 1 1Q2m 1 Q 3m 1 [ 1 + Q 2m 1 1 Q2m 1 Q 3m 1 1 ] T 1 3m1, T 1 3m 1 Upper Bounds P (S m 1 (T 1) k) Q 2m 1 [1 Q 2m1 1 + Q 2m 1 ]T 1 2m1, T 1 2m 1 Q 3m 1 [1 Q 2m1 1 + Q 2m 1 ]T 1 3m1, T 1 3m 1 The values Q 2m1 1, Q 2m1, Q 3m1 1, Q 3m1 are computed using Karwe and Naus (1996) algorithm. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 18 / 61
One Dimensional Scan Statistics Haiman Type Approximation: Main Observation Haiman proposed in 2000 a dierent approach Main Observation The scan statistic r.v. can be viewed as a maximum of a sequence of 1-dependent stationary r.v.. The idea: discrete and continuous one dimensional scan statistic: Haiman (2000,2007) discrete and continuous two dimensional scan statistic: Haiman and Preda (2002,2006) discrete three dimensional scan statistic: Am rioarei and Preda (2013) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 19 / 61
One Dimensional Scan Statistics Haiman Type Approximation: Writing the Scan as an Extreme of 1-Dependent R.V.'s Let T 1 = L 1 m 1 positive integer Dene for j {1, 2,..., L 1 1} Z j = (Z j ) j is 1-dependent and stationary max (j 1)m 1 +1 i 1 jm 1 +1 Y i 1 {}}{ X 1, X 2,..., X m 1, X m m1+1,..., X 2m 1 }{{}, X2m1+1,..., X 3m 1, X 3m1+1,..., X 4m 1 }{{} Z1 Z2 Z3 Observe S m1 (T 1 ) = max 1 j L 1 1 Z j A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 20 / 61
One Dimensional Scan Statistics Haiman Type Approximation: Main Tool Let (Z j ) j 1 be a strictly stationary 1-dependent sequence of r.v.'s and let q m = q m (x) = P(max(Z 1,..., Z m ) x), with x < sup{u P(Z 1 u) < 1}. Theorem (Haiman 1999) For any x such that P(Z 1 > x) = 1 q 1 0.025 and any integer m > 3, q 6(q 1 q 2 ) 2 + 4q 3 3q 4 m (1 + q 1 q 2 + q 3 q 4 + 2q1 2 + 3q2 2 5q 1q 2 ) m H 1 (1 q 1 ) 3, q 2q 1 q 2 m [1 + q 1 q 2 + 2(q 1 q 2 ) 2 ] m H 2 (1 q 1 ) 2, H 1 = 561 + 88m[1 + 124m(1 q 1) 3 ] H 1 = 9 + 561(1 q 1) + 3.3m[1 + 4.7m(1 q 1 ) 2 ]. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 21 / 61
One Dimensional Scan Statistics Haiman Type Approximation: Improvement Main Theorem (Amarioarei 2012) For x such that P(Z 1 > x) = 1 q 1 α < 0.1 and m > 3 we have q 6(q 1 q 2 ) 2 + 4q 3 3q 4 m (1 + q 1 q 2 + q 3 q 4 + 2q1 2 + 3q2 2 5q 1q 2 ) m 1(1 q 1 ) 3, q 2q 1 q 2 m [1 + q 1 q 2 + 2(q 1 q 2 ) 2 ] m 2(1 q 1 ) 2, 1 = 1 (α, q 1, m) = Γ(α) + mk(α) [ 2 = me(α, q 1, m) = m Increased range of applicability 1 + 3 m + K(α)(1 q 1) + Γ(α)(1 q 1) m Sharp bounds values (ex. α = 0.025: 561 145 and 88 17.5) Selected values for K(α) and Γ(α) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 22 / 61 ].
One Dimensional Scan Statistics Dierence between the results: 1 q 1 = 0.025 9 8 2 H 2 Value of the coefficients 7 6 5 4 3 2 1 0 50 100 150 200 250 300 350 400 450 500 Length of the sequence (n) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 23 / 61
One Dimensional Scan Statistics Haiman Type Approximation: Result T 1 Observe that Q 2m1 = P(Z 1 k) Q 3m1 = P(Z 1 k, Z 2 k) X 1 Xm1 X m1 +1 X 2m1 X 2m1 +1 m 1 2m 1 Q 2m1 X 3m1 X T1 If 1 Q 2m1 0.1 the approximation 3m 1 Q 3m1 P(S k) 2Q2m 1 Q 3m 1 [1+Q2m 1 Q 3m 1 +2(Q 2m 1 Q 3m 1) 2]L 1 1 where S = S m1 (T 1 ) Approximation error, about (L 1 1)E(α, L 1 1) (1 Q 2m 1 )2 with E(α, α, m) = E(α, m). Q 2m1 and Q 3m1 are evaluated using the previous approaches A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 24 / 61
One Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 25 / 61
One Dimensional Scan Statistics Comparison between methods Table 1 : n = 1, p = 0.005, m 1 = 10, T 1 = 1000, It App = 10 4 k Exact Glaz and Naus Haiman Approximation Lower Upper Product type Approximation Error Bound Bound 1 0.810209 0.810216 0.810404 0.001111 0.809903 0.810439 2 0.995764 0.995764 0.995764 3 10 7 0.995764 0.995764 3 0.999950 0.999950 0.999950 4 10 11 0.999950 0.999950 Table 2 : n = 5, p = 0.05, m 1 = 25, T 1 = 500, It App = 10 4, It Sim = 10 3 k ˆP(S k) Glaz and Naus Haiman Total Lower Upper Product type Approximation Error Bound Bound 13 0.712750 0.705787 0.714699 0.039308 0.697431 0.706948 14 0.867498 0.862184 0.865029 0.012502 0.859543 0.862407 15 0.946912 0.943329 0.946177 0.004169 0.942552 0.943362 16 0.980230 0.978959 0.979822 0.001354 0.978733 0.978963 17 0.993486 0.992821 0.993134 0.000433 0.992756 0.992822 18 0.997802 0.997726 0.997849 0.000127 0.997708 0.997726 19 0.999362 0.999327 0.999358 3 10 5 0.999322 0.999327 20 0.999819 0.999813 0.999825 9 10 6 0.999812 0.999813 21 0.999954 0.999951 0.999953 2 10 6 0.999951 0.999951 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 26 / 61
Two Dimensional Scan Statistics Model Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 27 / 61
Two Dimensional Scan Statistics Model Introducing the Model Let T 1, T 2 be positive integers T 2.. j. 1 X i,j T 1 T 2 Rectangular region R = [0, T 1 ] [0, T 2 ] (X ij ) 1 i T1 1 j T 2 i.i.d. integer r.v.'s Bernoulli(B(1, p)) Binomial(B(n, p)) Poisson(P(λ)) X ij number of observed events in the elementary subregion r ij = [i 1, i] [j 1, j] 1 i T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 28 / 61
Two Dimensional Scan Statistics Model Dening the Scan Statistic Let m 1, m 2 be positive integers T 1 Dene for 1 i j T j m j + 1, Y i1 i 2 = i 1 +m 1 1 i 2 +m 2 1 X ij i=i 1 j=i 2 m 2 T 2 The two dimensional scan statistic, m 1 S m1,m 2 (T 1, T 2 ) = max Y i1 i 2 1 i 1 T 1 m 1 +1 1 i 2 T 2 m 2 +1 Used for testing the null hypotheses of randomness against the alternative hypothesis of clustering A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 29 / 61
Two Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 30 / 61
Two Dimensional Scan Statistics Product Type Approximation Bernoulli Case Boutsikas and Koutras (2000) using Markov Chain Imbedding approach proposed the approximation Where, P (S m 1,m2 (T 1, T 2 ) k) Q(m 1,m2) (T 1 m 1 1)(T 2 m 2 1) Q(m1+1,m2+1) (T 1 m 1 )(T 2 m 2 ) Q(m1,m2+1) (T 1 m 1 1)(T 2 m 2 ) Q(m1+1,m2) (T 1 m 1 )(T 2 m 2 1) Q(m1, m2) = F (k; m1m2, p) k Q(m1 + 1, m2) = F 2 (k s; m2, p) b (s; (m1 1)m2, p) s=0 k k 1 Q(m1 + 1, m2 + 1) = b(s1; m1 1, p)b(s2; m1 1, p)b(t1; m2 1, p) s 1,s 2 =0 t 1,t 2 =0 i 1,i 2,i 3,i 4 =0 i b(t2; m2 1, p)p j (1 p) 4 i j F (u; (m 1 1)(m2 1), p) u = min {k s1 t1 i1, k s2 t1 i2, k s1 t2 i3, k s2 t2 i4} ( ) n b(s; n, p) = p s (1 p) n s s s F (s; n, p) = b(i; n, p) i =0 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 31 / 61
Two Dimensional Scan Statistics Bounds for the Bernoulli Case The following bounds were established by Boutsikas and Koutras (2003) Lower Bound LB = (1 Q 1 ) (T 1 m1)(t2 m2) (1 Q 2 ) T 1 m1 (1 Q 3 ) T 2 m2 (1 Q 4 ) Upper Bound ( UB = (1 Q1) 1 q (m 1 1)(3m 2 2)+(2m 1 1)(m 2 1) (T 1 Q1) m 1 1)(T 2 m 2 1) ( 1 q m 1 (m 2 1) ) T 2 Q1 m 2 1 ( 1 q (m 1 1)(2m 2 1)+(m 1 1)(m 2 1) T 1 Q1) m 1 1 ( 1 q (m 1 1)(2m 2 1)+m 1 (m 2 1) ) T 1 Q2 m 1 ( 1 q (m 1 1)(3m 2 2)+m 1 (m 2 1)+(m 1 1)(m 2 1) T 2 Q3) m 2 ( 1 q (m 1 1)(2m 2 1)+m 1 (m 2 1) ) Q4. Where X ij B(p), q = 1 p and Q1 = F c k+1,m 1 m 2 qm2 F c k+1,(m 1 1)m 2 qm1 F c k+1,m 1 (m 2 1) + qm 1 +m 2 1 F c k+1,(m 1 1)(m 2 1), Q2 = F c k+1,m 1 m 2 qm2 F c k+1,(m 1 1)m 2, Q 3 = F c k+1,m 1 m 2 qm1 F c k+1,m 1 (m 2 1), Q 4 = F c k+1,m 1 m 2 Fi c,m = 1 F (i 1; m, p). A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 32 / 61
Two Dimensional Scan Statistics Product Type Approximation Binomial and Poisson For X ij Bin(n, p) or X ij Pois(λ), Chen and Glaz (2009) proposed the product type approximation Q(m (T 1,2m2 1) 1 m1 1)(T2 2m2) Q(m1+1,m2) (T 1 m 1 )(T 2 m 2 1) Q(m1,2m2) (T 1 m 1 1)(T 2 2m 2 +1) P (S m 1,m2 (T 1, T 2 ) k) Q(m 1+1,m2+1) (T 1 m 1 )(T 2 m 2 ) Where, Q(m 1, 2m 2 1) = P (S m1,m 2 (m 1, 2m 2 1) k) Q(m 1, 2m 2 ) = P (S m1,m 2 (m 1, 2m 2 ) k) To compute the unknown variables we use Q(m 1, 2m 2 1) and Q(m 1, 2m 2 ) - adaptation of Karwe and Naus algorithm Q(m 1 + 1, m 2 ) and Q(m 1 + 1, m 2 + 1) - conditioning Formulas for Q(m1 + 1, m2) and Q(m1 + 1, m2 + 1) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 33 / 61
Two Dimensional Scan Statistics Lower Bound for Binomial and Poisson Glaz and Chen (1996) gave a lower bound applying Hoover Bonferroni type inequality of order r 3, P ( S m 1,m 2 (T 1, T2) k ) T 1 m 1 +1 = P i 1 =1 T 1 m 1 +1 i 1 =1 T 1 m 1 +1 i 1 =1 T 2 m 2 +1 T 2 m 2 i 2 =1 r 1 l =2 A i i 2 =1 1,i 2 T 1 m 1 +1 i 1 =1 T 2 m 2 +1 i 2 =1 ) P (A T 1 i 1,i 2 m 1 Ai 1,i 2 +1 T 2 m 2 +1 l i 2 =1 Where A i1,i 2 = {Y i1,i 2 k} and for r = 4, Lower Bound i 1 =1 ( ) P A i 1,i 2 ) P (A i 1,1 Ai 1 +1,1 ) P (A i 1,i 2 Aci 1,i 2 +1... Aci 1,i 2 +l 1 Ai 1,i 2 +l P ( S m 1,m 2 (T 1, T2) k ) (T1 m1) (Q(m1 + 1, m2) 2Q(m1, m2)) (T1 m1 + 1)(T2 m2 3) Q(m1, m2 + 2) + (T1 m1 + 1)(T2 m2 2)Q(m1, m2 + 3). Q(m 1 + 1, m 2 ), Q(m 1, m 2 ), Q(m 1, m 2 + 2), Q(m 1, m 2 + 3) - Karwe and Naus algorithm (variant) A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 34 / 61
Two Dimensional Scan Statistics Upper Bound for Binomial and Poisson For the upper bound we adapt the inequality of Kuai et al. (2000) to the two dimensional framework: where P ( S m 1,m 2 We have T 1 m 1 +1 (T 1, T2) k) = 1 P i 1 =1 T 1 m 1 +1 T 2 m 2 +1 [ 1 Σ(i1, i2) = i 1 =1 T 1 m 1 +1 j 1 =1 i 2 =1 T 2 m 2 +1 j 2 =1 ( ) P A i 1,i 2 Aj 1,j = 2 ) P (Y i 1,i 2 n, Yj 1,j 2 n = Z = T 2 m 2 +1 A i i 2 =1 1,i 2 θ i 1,i 2 P(Ai 1,i 2 )2 Σ(i1, i2) + (1 θ i 1,i 2 )P(Ai 1,i 2 ) + (1 θi 1,i 2 )P(Ai 1,i 2 ) 2 ] Σ(i1, i2) θ i 1,i 2 P(Ai 1,i 2 ) ( ) P A i 1,i 2 Aj 1,j and θ 2 i 1,i 2 = Σ(i 1,i 2 ) P(A i 1,i 2 ) Σ(i 1,i 2 ) P(A i 1,i 2 ). { [1 Q(m 1, m2)] 2, if i1 j1 m1 or i2 ) j2 m2, 1 2Q(m1, m2) + P (Y i 1,i 2 n, Yj 1,j 2 n, otherwise n P(Z = k)p(y i 1,i 2 Z n k)2, k=0 (i 1 +m 1 1) (j 1 +m 1 1) s=i 1 j 1 (i 2 +m 2 1) (j 2 +m 2 1) t=i 2 j 2 X s,t. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 35 / 61
Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: Writing the Scan as an Extreme of 1-Dependent R.V.'s Let L j = T j m j, j {1, 2} positive integers Dene for l {1, 2,..., L 2 1} Z l = max Y i1 i 2 1 i 1 (L 1 1)m 1 +1 (l 1)m 2 +1 i 2 lm 2 +1 (Z l ) l is 1-dependent and stationary Observe S m1,m 2 (T 1, T 2 ) = max 1 k L 2 1 Z k T 1 T 2 m 2 m 1 Z 3 Z 1 Z 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 36 / 61
Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: First Step Approximation Using Main Theorem we obtain Dene Q 2 = P(Z 1 k) Q 3 = P(Z 1 k, Z 2 k) If 1 Q 2 α 1 < 0.1 the (rst) approximation T 2 m 2 m 1 T 1 R P(S k) 2Q2 Q3 [1+Q2 Q3+2(Q2 Q3) 2 ] L 2 1 T1 T1 where S = S m1,m 2 (T 1, T 2 ) Approximation error (L 2 1)E(α 1, L 2 1)(1 Q 2 ) 2 T2 m1 Q 2 2m2 T2 m1 Q 3 3m2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 37 / 61
Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: Second Step Approximation Q 2 : For s {1, 2,..., L 1 1} Z (2) s = max (s 1)m1+1 i1 sm1+1 1 i2 m2+1 Y i 1i2 ( ) Q 2 = P max Z (2) s k 1 s L1 1 Dene Q22 = P(Z (2) 1 k) Q 32 = P(Z (2) 1 k, Z (2) 2 k) Approximation (1 Q 22 α 2 ) Q 2 2Q22 Q32 [1+Q22 Q32+2(Q22 Q32) 2 ] L 1 1 (L 1 1)E(α 2, L 1 1)(1 Q 22 ) 2 Q 3 : For s {1, 2,..., L 1 1} Z (3) s = max (s 1)m1+1 i1 sm1+1 1 i2 2m2+1 Y i 1i2 ( ) Q 3 = P max Z (3) s k 1 l L1 1 Dene Q23 = P(Z (3) 1 k) Q 33 = P(Z (3) 1 k, Z (3) 2 k) Approximation (1 Q 23 α 2 ) Q 3 2Q23 Q33 [1+Q23 Q33+2(Q23 Q33) 2 ] L 1 1 (L 1 1)E(α 2, L 1 1)(1 Q 23 ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 38 / 61
Two Dimensional Scan Statistics Haiman Type Approximation and Error Bounds: Illustration of the Approximation Process T1 T2 R m2 T1 m1 T1 T2 T2 Q 2 2m2 Q 3 3m2 m1 m1 T1 T1 T1 T1 T2 T2 T2 T2 2m1 Q 22 2m2 3m1 Q 32 2m2 Q 23 2m1 3m2 3m1 Q 33 3m2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 39 / 61
Two Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 40 / 61
Two Dimensional Scan Statistics Comparison between methods Table 3 : n = 1, p = 0.005, m 1 = m 2 = 6, T 1 = T 2 = 30, It App = 10 3, It Sim = 10 3 k ˆP(S k) Glaz and Naus Haiman Total Lower Upper Product type Approximation Error(App+Sim) Bound Bound 2 0.915903 0.914013 0.920211 0.041483 0.901935 0.945623 3 0.994292 0.994395 0.994578 0.000803 0.993785 0.996638 4 0.999747 0.999757 0.999760 2 10 5 0.999737 0.999858 5 0.999992 0.999992 0.999992 7 10 7 0.999992 0.999995 Table 4 : n = 5, p = 0.002, m 1 = 5, m 2 = 10, T 1 = 50, T 2 = 80, It App = 10 4, It Sim = 10 3 k ˆP(S k) Glaz and Naus Haiman Total Lower Upper Product type Approximation Error(App+Sim) Bound Bound 4 0.894654 0.873256 0.893724 0.037136 0.803422 0.944318 5 0.988003 0.986249 0.988144 0.002125 0.981418 0.993451 6 0.998963 0.998847 0.998963 0.000152 0.998543 0.999401 7 0.999926 0.999919 0.999925 9 10 6 0.999903 0.999955 8 0.999995 0.999995 0.999995 5 10 7 0.999994 0.999997 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 41 / 61
Three Dimensional Scan Statistics Model and Approximations Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 42 / 61
Three Dimensional Scan Statistics Model and Approximations Introducing the Model T 3 X ijk T 2 T 1 Let T 1, T 2, T 3 be positive integers Rectangular region R = [0, T 1 ] [0, T 2 ] [0, T 3 ] ( ) Xijk 1 i T 1 i.i.d. integer r.v.'s 1 j T 2 1 k T 3 Bernoulli(B(1, p)) Binomial(B(n, p)) Poisson(P(λ)) X ijk number of observed events in the elementary subregion r ijk = [i 1, i] [j 1, j] [k 1, k] A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 43 / 61
Three Dimensional Scan Statistics Model and Approximations Dening the Scan Statistic Let m 1, m 2, m 3 be positive integers Dene for 1 i j T j m j + 1, Y i1 i 2 i 3 = i 1 +m 1 1 i 2 +m 2 1 i 3 +m 3 1 X ijk i=i 1 j=i 2 k=i 3 T 3 m 3 Y i1 i 2 i 3 The three dimensional scan statistic, S m1,m 2,m 3 = max 1 i 1 T 1 m 1 +1 1 i 2 T 2 m 2 +1 1 i 3 T 3 m 3 +1 Y i1 i 2 i 3. m 2 T T 1 2 m 1 Used for testing the null hypotheses of randomness against the alternative hypothesis of clustering A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 44 / 61
Three Dimensional Scan Statistics Model and Approximations Product Type Approximation Glaz et al. proposed in the product type approximation P ( S m 1,m 2,m 3 k) Q(m1 + 1, m2 + 1, m3 + 1) (T 1 m 1 )(T 2 m 2 )(T 3 m 3 ) Q(m1 + 1, m2, m3) (T 1 m 1 )(T 2 m 2 1)(T 3 m 3 1) Q(m1, m2, m3) (T 1 m 1 1)(T 2 m 2 1)(T 3 m 3 1) Q(m1 + 1, m2 + 1, m3) (T 1 m 1 )(T 2 m 2 )(T 3 m 3 1) Q(m1, m2 + 1, m3) (T 1 m 1 1)(T 2 m 2 )(T 3 m 3 1) Q(m1, m2, m3 + 1) (T 1 m 1 1)(T 2 m 2 1)(T 3 m 3 1) Q(m1 + 1, m2, m3 + 1) (T 1 m 1)(T 2 m 2 1)(T 3 m 3) Q(m1, m2 + 1, m3 + 1) (T 1 m 1 1)(T 2 m 2 )(T 3 m 3) Where, Q(N 1, N 2, N 3 ) = P (S m1,m 2,m 3 (N 1, N 2, N 3 ) k) The approximation also works for binomial and Poisson distribution Three Poisson Type Approximation A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 45 / 61
Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: Writing the Scan as an Extreme of 1-Dependent R.V.'s m3 m2 m1 Let L j = T j m j, j {1, 2, 3} positive integers Dene for l {1, 2,..., L 3 1} Z l = max Y i1 i 2 i 3 1 i 1 (L 1 1)m 1 +1 1 i 2 (L 2 1)m 2 +1 (l 1)m 3 +1 i 3 lm 3 +1 Z3 Z2 Z1 T 3 (Z l ) l is 1-dependent and stationary Observe S m1,m 2,m 3 = max 1 l L 3 1 Z l T 2 T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 46 / 61
Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: First Step Approximation 2m3 R T 3 T 2 T 1 Q 2 Q 3 T 2 T 1 3m3 T 2 T 1 Dene Q 2 = P(Z 1 k) Q 3 = P(Z 1 k, Z 2 k) If 1 Q 2 α 1 < 0.1 the (rst) approximation P(S k) 2Q2 Q3 [1+Q2 Q3+2(Q2 Q3) 2 ] L 3 1 where S = S m1,m 2,m 3 (T 1, T 2, T 3 ) Approximation error (L 3 1)E(α 1, L 3 1)(1 Q 2 ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 47 / 61
Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: Approximation for Q 2 and Q 3 Q 2 : For l {1, 2,..., L 2 1} Z (2) = max Y l i 1i2i3 1 i1 (L1 1)m1+1 (l 1)m2+1 i2 lm2+1 1 i3 m3+1 ( ) Q 2 = P max Z (2) n 1 l L2 1 l Dene Q22 = P(Z (2) 1 n) Q 32 = P(Z (2) 1 n, Z (2) 2 n) Approximation (1 q 22 α 2 ) Q 2 2Q22 Q32 [1+Q22 Q32+2(Q22 Q32) 2 ] L 2 1 (L 2 1)E(α 2, L 2 1)(1 Q 22 ) 2 Q 3 : For l {1, 2,..., L 2 1} Z (3) = max Y l i 1i2i3 1 i1 (L1 1)m1+1 (l 1)m2+1 i2 lm2+1 1 i3 2m3+1 ( ) Q 3 = P max Z (3) n 1 l L2 1 l Dene Q23 = P(Z (3) 1 n) Q 33 = P(Z (3) 1 n, Z (3) 2 n) Approximation (1 q 23 α 2 ) Q 3 2Q23 Q33 [1+Q23 Q33+2(Q23 Q33) 2 ] L 2 1 (L 2 1)E(α 2, L 2 1)(1 Q 23 ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 48 / 61
Three Dimensional Scan Statistics Model and Approximations Haiman Type Approximation: Last Step (Approximating Q ts ) Applying again the second part of the Main Theorem... For s, t {2, 3} and j {1, 2,..., L 1 1} dene Observe Z (ts) j = max (j 1)m1+1 i1 jm1+1 1 i2 (t 1)m2+1 1 i3 (s 1)m3+1 Y i 1i2i3 ( ) Q ts = P max Z (ts) n 1 j L1 1 j ( Dene for r, s, t {2, 3}, Q rts = P Q ts r 1 j=1 ) j n} {Z (ts) If 1 Q 2ts α 3 then the approximation and the error 2Q2ts Q3ts (L [1+Q2ts Q3ts +2(Q2ts Q3ts ) 2 ] L 1 1 1 1)E(α 3, L 1 1)(1 Q 2ts ) 2 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 49 / 61
Three Dimensional Scan Statistics Model and Approximations An Illustration of the Approximation Chain (Q 2 ) Q 2 2m3 Q 22 T2 T1 Q 32 2m3 2m3 2m2 T1 3m2 T1 Q 222 Q322 Q 232 Q 332 2m3 2m3 2m3 2m3 2m2 2m1 2m2 3m1 3m2 2m1 3m2 3m1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 50 / 61
Three Dimensional Scan Statistics Outline 1 Introduction Example 2 One Dimensional Scan Statistics Denitions and Notations Exact Formula by MCIT (Bernoulli Case) 3 Two Dimensional Scan Statistics Model 4 Three Dimensional Scan Statistics Model and Approximations 5 References A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 51 / 61
Three Dimensional Scan Statistics Bernoulli Case Comparing with existing results: Table 5 : n = 1, p = 0.00005, m 1 = m 2 = m 3 = 5, T 1 = T 2 = T 3 = 60, It App = 10 5 k ˆP(S k) Glaz et al. Our Approximation Simulation Total Product type Approximation Error Error Error 1 0.841806 0.841424 0.851076 0.011849 0.064889 0.076738 2 0.999119 0.999142 0.999192 0.000000 0.000170 0.000170 3 0.999997 0.999998 0.999997 0.000000 3 10 7 3 10 7 Table 6 : n = 1, p = 0.0001, m 1 = m 2 = m 3 = 5, T 1 = T 2 = T 3 = 60, It App = 10 5 k ˆP(S k) Glaz et al. Our Approximation Simulation Total Product type Approximation Error Error Error 2 0.993294 0.993241 0.993192 0.000010 0.001367 0.001377 3 0.999963 0.999964 0.999963 0.000000 0.000005 0.000005 4 0.999999 0.999999 0.999999 0.000000 2 10 9 2 10 9 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 52 / 61
References Glaz, J., Naus, J., Wallenstein, S.: Scan statistic. Springer (2001). Glaz, J., Pozdnyakov, V., Wallenstein, S.: Scan statistic: Methods and Applications. Birkhauser (2009). Amarioarei, A.: Approximation for the distribution of extremes of one dependent stationary sequences of random variables, arxiv:1211.5456v1 (submitted) Amarioarei, A., Preda, C.: Approximation for the distribution of three dimensional discrete scan statistic, Methodol Comput Appl Probab (2013) DOI:10.1007/s11009-013-9382-3. Amarioarei, A., Preda, C.: Approximation for two dimensional discrete scan statistics in some block-factor type dependent models, arxiv:1401.2822 (submitted) Boutsikas, M.V., Koutras, M.: Reliability approximations for Markov chain imbeddable systems. Methodol Comput Appl Probab 2 (2000), 393412. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 53 / 61
References Boutsikas, M. and Koutras, M. Bounds for the distribution of two dimensional binary scan statistics, Probability in the Engineering and Information Sciences, 17, 509525, 2003. Chen, J. and Glaz, J. Two-dimensional discrete scan statistics, Statistics and Probability Letters 31, 5968, 1996. Glaz, J., Guerriero, M., Sen, R.: Approximations for three dimensional scan statistic. Methodol Comput Appl Probab 12 (2010), 731747. Haiman, G.: First passage time for some stationary sequence. Stoch Proc Appl 80 (1999), 231248. Haiman, G.: Estimating the distribution of scan statistics with high precision. Extremes 3 (2000), 349361. Haiman, G., Preda, C.: A new method for estimating the distribution of scan statistics for a two-dimensional Poisson process. Methodol Comput Appl Probab 4 (2002), 393407. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 54 / 61
References Haiman, G., Preda, C.: Estimation for the distribution of two-dimensional scan statistics. Methodol Comput Appl Probab 8 (2006), 373381. Haiman, G.: Estimating the distribution of one-dimensional discrete scan statistics viewed as extremes of 1-dependent stationary sequences. J. Stat Plan Infer 137 (2007), 821828. Kuai, H., Alajaji, F., Takahara, G.: A lower bound on the probability of a nite union of events. Discrete Mathematics 215 (2000), 147158. A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 55 / 61
Block-Factor Type Model Introducing the Model Let 1 c s Ts, s {1, 2} integers ( ) Xij i.i.d. r.v.'s 1 i T 1 1 j T 2 conguration matrix in (i, j) C (i,j) = ( C (i,j) (k, l) ) 1 k c 2 1 l c 1 C (i,j) (k, l) = Xi+l 1,j+c2 k transformation Π : M c2,c 1 (R) R X i,j+c 2 1.... X i,j... X i,j Π Xi +c 1 1,j+c 2 1. X i +c 1 1,j Dene the block-factor model, T 1 = T1 c 1 + 1, T 2 = T2 c 2 + 1 X i,j = Π ( C (i,j) ), 1 i T 1 1 j T 2 Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 56 / 61
Block-Factor Type Model Model: case c 1 = c 2 = 3 To simplify the presentation we consider c 1 = c 2 = 3 X i,j+2 X i+1,j+2 X i+2,j+2 X i,j+1 X i+1,j+1 X i+2,j+1 T 2 X i,j X i+1,j X i+2,j X i,j T 2. j. 1 Π. j. 1 1 i T 1 1 i T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 57 / 61
Block-Factor Type Model Model: case c 1 = c 2 = 3 To simplify the presentation we consider c 1 = c 2 = 3 X i,j+2 X i+1,j+2 X i+2,j+2 X i,j+1 X i+1,j+1 X i+2,j+1 T 2 X i,j X i+1,j X i+2,j X i,j T 2. j. 1 Π. j. 1 1 i T 1 1 i T 1 A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 57 / 61
Block-Factor Type Model Example: A game of minesweeper Model: X i,j B(p) (presence, absence of a mine) number of neighboring mines T (C (i,j) ) = (s,t) {0,1,2} 2 (s,t) (1,1) X i,j = Π ( C (i,j) ) X i+s,j+t Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 58 / 61
Block-Factor Type Model Computation of Q(m 1 + 1, m 2 ) and Q(m 1 + 1, m 2 + 1) Consider X ij B(n, p) and the notation Y j 1,j2 i1,i2 = j1 i1=1 i2=1 j2 X ij, P ( S m 1,m 2 (m 1 + 1, m2) k ) k (m 1 1)m 2 n = P 2 ( Y 1,m ) ( 2 1,1 k y P Y m 1,m ) 2 2,1 = y y=0 P ( S m 1,m 2 (m 1 + 1, m2 + 1) k ) = k (m 1 1)(m 2 1)n y 1 =0 [k y 1 y 2 y 3 ] (m 1 1)n y 4 =0 ( P Y 1,m 2 +1 ) 1,m 2 +1 a 2 ( P Y m 1,m ) 2 2,2 = y1 P ( P Y m 1,1 ) 2,1 = y4 P (k y 1 ) (m 2 1)n y 2 =0 (k y 1 ) (m 2 1)n y 3 =0 [k y 1 y 2 y 3 ] (m 1 1)n ( P Y m 1 +1,1 ( P Y1,1 1,1 ) a 1 y 5 =0 ) ( m 1 +1,1 a 3 P Y m 1 +1,m 2 +1 ) m 1 +1,m 2 +1 a 4 ) ( P Y m 1 +1,m ) m 2 1 +1,2 = y3 ) ( Y 1,m 2 1,2 = y2 ( Y m 1,m 2 +1 2,m 2 +1 = y5 a1 = k y1 y2 y4, a2 = k y1 y2 y5, a3 = k y1 y3 y4, a4 = k y1 y3 y5, Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 59 / 61
Block-Factor Type Model Karwe Naus recursive methods Dene We have the recurrence relations b k 2(m) (y) = P (Sm(2m) k, Y m+1(m) = y) f (y) = P(X1 = y) Q2m k = P (Sm(2m) k) k b k 2(1) (y) = f (j) f (y) j=0 y b k 2(m) (y) = η=0 ν=0 k y+η k Q2m k = b k 2(m) (y) y=0 k Q2m 1 k = f (x)q k x 2(m 1) x=0 b k ν (y η)f (ν)f (η) 2(m 1) Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 60 / 61
Block-Factor Type Model Selected Values for K(α) and Γ(α) α K(α) Γ(α) 0.1 38.63 480.69 0.05 21.28 180.53 0.025 17.56 145.20 0.01 15.92 131.43 Table 7 : Selected values for K(α) and Γ(α) Return A. Am rioarei (Lab. P. Painlevé) Scan Statistics Lille 2014 61 / 61