A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data


A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data

David S. Matteson (matteson@cornell.edu)
Department of Statistical Science, Cornell University
Joint work with: Nicholas A. James, ORIE, Cornell University
Sponsorship: National Science Foundation
October 2014

Introduction: Change Point Analysis

Change point analysis is the process of detecting distributional changes within time-ordered data.
Framework: retrospective, offline analysis of multivariate observations.
Estimation: the number of change points and their positions, via hierarchical algorithms.
Applications: genetics, finance, emergency medical services.

Introduction: Change Point Analysis

Given independent, time-ordered observations X_1, X_2, ..., X_n in R^d, partition them into k homogeneous, temporally contiguous subsets. Both k and the size of each subset are unknown.


Cluster Analysis

Change point analysis is similar to cluster analysis: in both, we wish to partition the observations into homogeneous subsets. In cluster analysis, however, the subsets need not be contiguous in time unless constraints are imposed.



Hierarchical Estimation

Apply methods from clustering to find change points. An exhaustive search is not practical, O(n^k) in general, although dynamic programming may be considered. We instead use a hierarchical, or sequential, approach with cost O(kn^2):
Divisive: clusters are divided until each observation is its own cluster.
Agglomerative: clusters are merged until all observations belong to a single cluster.

Hierarchical Estimation: Divisive Progression (figures)


Hierarchical Estimation: Agglomerative Progression (figures)


Measuring Multivariate Homogeneity

Suppose X, Y in R^d with X ~ F_x and Y ~ F_y. Let
    φ_x(t) = E(e^{i⟨t,X⟩}) and φ_y(t) = E(e^{i⟨t,Y⟩})
denote their characteristic functions. Define a divergence between F_x and F_y as
    E(X, Y; w) = ∫_{R^d} |φ_x(t) − φ_y(t)|^2 w(t) dt,
where w(t) denotes an arbitrary positive weight function for which E exists.

A Weight Function

A convenient choice for w(t) > 0 (Székely and Rizzo, 2005):
    w(t; α) = [ (2 π^{d/2} Γ(1 − α/2)) / (α 2^α Γ((d + α)/2)) · |t|^{d+α} ]^{−1},
in which Γ(·) is the gamma function. Note: for any fixed (d, α), w(t; α) ∝ |t|^{−(d+α)}.

Equivalent Divergence Measures

Let X and Y be independent, and let (X′, Y′) be an iid copy of (X, Y).

Theorem. Suppose E(|X|^α + |Y|^α) < ∞ for some α ∈ (0, 2]. Then
    E(X, Y; α) = ∫_{R^d} |φ_x(t) − φ_y(t)|^2 [ (2π^{d/2} Γ(1 − α/2)) / (α 2^α Γ((d + α)/2)) · |t|^{d+α} ]^{−1} dt
               = 2 E|X − Y|^α − E|X − X′|^α − E|Y − Y′|^α < ∞.
If 0 < α < 2, then E(X, Y; α) = 0 if and only if X and Y are identically distributed. If α = 2, then E(X, Y; α) = 0 if and only if EX = EY.

An Empirical Measure (U-statistics)

Let X_n = {X_i : i = 1, ..., n} and Y_m = {Y_j : j = 1, ..., m} be independent iid samples from the distributions of X and Y in R^d, respectively, such that E|X|^α, E|Y|^α < ∞ for some α ∈ (0, 2). Define
    Ê(X_n, Y_m; α) = (2 / mn) Σ_{i=1}^{n} Σ_{j=1}^{m} |X_i − Y_j|^α
                     − (n choose 2)^{−1} Σ_{1≤i<k≤n} |X_i − X_k|^α
                     − (m choose 2)^{−1} Σ_{1≤j<k≤m} |Y_j − Y_k|^α
and
    Q(X_n, Y_m; α) = (mn / (m + n)) Ê(X_n, Y_m; α).
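The empirical statistics above translate directly into code from pairwise distances. A minimal NumPy sketch (function names are mine, not part of the ecp package):

```python
import numpy as np

def energy_divergence(x, y, alpha=1.0):
    """Empirical divergence E-hat(X_n, Y_m; alpha): twice the mean
    between-sample distance^alpha minus the two within-sample U-statistics."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # pairwise Euclidean distances raised to alpha
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** alpha
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1) ** alpha
    d_yy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1) ** alpha
    e = 2.0 * d_xy.sum() / (m * n)
    e -= d_xx[np.triu_indices(n, 1)].sum() / (n * (n - 1) / 2)  # (n choose 2)^-1 sum over i<k
    e -= d_yy[np.triu_indices(m, 1)].sum() / (m * (m - 1) / 2)  # (m choose 2)^-1 sum over j<k
    return e

def q_stat(x, y, alpha=1.0):
    """Scaled statistic Q(X_n, Y_m; alpha) = mn/(m+n) * E-hat."""
    n, m = len(x), len(y)
    return (m * n / (m + n)) * energy_divergence(x, y, alpha)
```

For example, for the one-dimensional samples {0, 1, 2, 3} and {10, 11, 12, 13} with α = 1, the between-sample term is 2 · 10 and each within-sample term is 10/6, giving Ê = 20 − 10/3.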


Known Location: Two-Sample Homogeneity Test

By the strong law of large numbers for U-statistics (Hoeffding, 1961),
    Ê(X_n, Y_m; α) → E(X, Y; α) almost surely, as min(m, n) → ∞.
Under the null hypothesis of equal distributions, i.e. E(X, Y; α) = 0,
    Q(X_n, Y_m; α) → Q(X, Y; α) = Σ_{i=1}^{∞} λ_i Q_i in distribution, as min(m, n) → ∞.
Here the λ_i > 0 are constants that depend on α and the distributions of X and Y, and the Q_i are iid χ²_1 random variables; see Rizzo and Székely (2010).
Under the alternative hypothesis of unequal distributions, i.e. E(X, Y; α) > 0,
    Q(X_n, Y_m; α) → ∞ almost surely, as min(m, n) → ∞.


Single Change Point: Unknown Location

Let Z_1, ..., Z_T ∈ R^d be an independent sequence: a heterogeneous sample with observations from two distributions. Let γ ∈ (0, 1) denote the division of the observations, such that
    Z_1, ..., Z_{γT} ~ F_x and Z_{γT+1}, ..., Z_T ~ F_y
for every sample of size T. Define X_τ = {Z_1, Z_2, ..., Z_τ} and Y_τ = {Z_{τ+1}, Z_{τ+2}, ..., Z_T}. A change point location τ̂_T is then estimated as
    τ̂_T = argmax_τ Q_T(X_τ, Y_τ; α).

Theorem. If E(X, Y; α) < ∞ and γ ∈ (0, 1), then τ̂_T / T → γ almost surely, as T → ∞.
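A direct implementation of the estimator τ̂_T scans every admissible split and keeps the one maximizing Q. A sketch under the definitions above (all names are mine):

```python
import numpy as np

def _e_div(x, y, alpha=1.0):
    # Empirical divergence E-hat between two samples (rows = observations).
    n, m = len(x), len(y)
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** alpha
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1) ** alpha
    d_yy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1) ** alpha
    return (2.0 * d_xy.sum() / (m * n)
            - d_xx[np.triu_indices(n, 1)].sum() / (n * (n - 1) / 2)
            - d_yy[np.triu_indices(m, 1)].sum() / (m * (m - 1) / 2))

def single_change_point(z, alpha=1.0, min_size=2):
    """Return (tau_hat, max Q): the split maximizing Q(X_tau, Y_tau; alpha).
    min_size >= 2 so each side supports the within-sample U-statistic."""
    z = np.asarray(z, float)
    if z.ndim == 1:
        z = z[:, None]
    T = len(z)
    best_tau, best_q = None, -np.inf
    for tau in range(min_size, T - min_size + 1):
        q = (tau * (T - tau) / T) * _e_div(z[:tau], z[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q
```

On a series with a clean mean shift, for example twenty 0s followed by twenty 5s, the scan recovers the split at τ = 20, where Ê = 2 · 5 and Q = (20 · 20 / 40) · 10 = 100.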

Multiple Change Points: Unknown Locations

A generalized bisection approach for sequential estimation. For 1 ≤ τ < κ ≤ T, define
    X_τ = {Z_1, Z_2, ..., Z_τ} and Y_τ(κ) = {Z_{τ+1}, Z_{τ+2}, ..., Z_κ}.
A change point location τ̂ is then estimated as
    (τ̂, κ̂) = argmax_{(τ, κ)} Q(X_τ, Y_τ(κ); α).


Sequentially Estimating Multiple Change Points

Suppose k − 1 change points have been estimated: τ̂_1 < ... < τ̂_{k−1}. This partitions the observations into k clusters Ĉ_1, Ĉ_2, ..., Ĉ_k. Given these clusters, we then apply the single change point procedure within each of the k clusters. For the ith cluster Ĉ_i, denote the proposed change point location by τ̂(i) and the associated constant by κ̂(i). Now let
    i* = argmax_{i ∈ {1, ..., k}} Q̂[X_{τ̂(i)}, Y_{τ̂(i)}(κ̂(i)); α],
in which X_{τ̂(i)} and Y_{τ̂(i)}(κ̂(i)) are defined with respect to Ĉ_i. Denote the test statistic by q̂_k = Q̂(X_{τ̂_k}, Y_{τ̂_k}(κ̂_k); α); then τ̂_k = τ̂(i*) is the kth estimated change point, located within cluster Ĉ_{i*}.

The E-Divisive Algorithm: Estimating Location

Let A_τ = {Z_1, Z_2, ..., Z_τ} and B_τ(κ) = {Z_{τ+1}, Z_{τ+2}, ..., Z_κ}. Recall, a change point location τ̂ is estimated as
    (τ̂, κ̂) = argmax_{(τ, κ)} Q(A_τ, B_τ(κ); α).
Thus, we maximize (mn / (n + m)) Ê(A, B; α) over all such subsets A and B.


The E-Divisive Algorithm: Inference via Permutation Test

The distribution of the test statistic q̂ = Q(A_τ, B_τ(κ); α)|_{τ=τ̂} is unknown, so the significance of a proposed change point is measured via a permutation test: randomly permute the series, maximize (mn / (n + m)) Ê(A, B; α), record the maximum, and repeat.
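The permutation test can be sketched as: compute the maximized statistic on the observed ordering, then on many random reorderings, and report the share of permuted maxima at least as large. A minimal sketch assuming a simple single-split scan (all names are mine):

```python
import numpy as np

def _max_q(z, alpha=1.0):
    # Maximum of Q over all admissible single splits, from one distance matrix.
    z = np.asarray(z, float)
    if z.ndim == 1:
        z = z[:, None]
    T = len(z)
    D = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1) ** alpha
    best = -np.inf
    for tau in range(2, T - 1):  # keep at least 2 observations on each side
        n, m = tau, T - tau
        e = (2.0 * D[:tau, tau:].sum() / (n * m)
             - D[:tau, :tau][np.triu_indices(n, 1)].sum() / (n * (n - 1) / 2)
             - D[tau:, tau:][np.triu_indices(m, 1)].sum() / (m * (m - 1) / 2))
        best = max(best, n * m / (n + m) * e)
    return best

def permutation_pvalue(z, alpha=1.0, n_perm=99, seed=0):
    """Approximate p-value: share of permuted series whose maximized
    statistic matches or exceeds the observed one (add-one estimate)."""
    z = np.asarray(z, float)
    rng = np.random.default_rng(seed)
    observed = _max_q(z, alpha)
    exceed = sum(_max_q(rng.permutation(z), alpha) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)
```

With a pronounced change (say fifteen 0s followed by fifteen 8s) the observed maximum is attained only by the perfectly separated orderings, so the p-value lands near the resolution floor of 1/(R + 1).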


The E-Divisive Algorithm: Multiple Change Points

If q̂ = Q(A_τ, B_τ(κ); α)|_{τ=τ̂} is insignificant: STOP. If significant, condition on the estimated location and repeat the procedure within each resulting cluster.


The E-Divisive Algorithm: Multiple Change Points (continued)

Once again, perform a permutation test; however, only permute observations within each cluster.


The ecp R package (CRAN)

Signature: e.divisive(X, sig.lvl=0.05, R=199, k=NULL, min.size=30, alpha=1)

Arguments:
X - A T × d matrix representation of a length-T time series with d-dimensional observations.
sig.lvl - The significance level used for the permutation test.
R - The maximum number of permutations to perform in each permutation test.
k - The number of change points to return. If NULL, only the statistically significant estimated change points are returned.
min.size - The minimum number of observations between change points.
alpha - The index α for the test statistic.

The ecp R package (CRAN): Returned Value

Complexity is O(kT²). The returned list contains:
k.hat - Number of clusters created by the estimated change points.
order.found - The order in which the change points were estimated.
estimates - Locations of the statistically significant change points.
considered.last - Location of the last change point that was not found to be statistically significant at the given significance level.
permutations - The number of permutations performed by each of the sequential permutation tests.
cluster - The estimated cluster membership vector.
p.values - Approximate p-values estimated from each permutation test.


Simulation Study: Rand Index

Compare E-Divisive with a generalized Wilcoxon/Mann-Whitney approach: the MultiRank procedure of Lung-Yut-Fong et al. (2011). For two partitions U and V, the Rand Index considers all pairs of observations. Define
    {A}: pairs in the same cluster under U and in the same cluster under V;
    {B}: pairs in different clusters under U and in different clusters under V.
Then
    Rand Index = (#A + #B) / (T choose 2).
An equivalent definition of the Rand Index can be found in Hubert and Arabie (1985). The adjusted version is
    Adjusted Rand = (Index − Expected Index) / (Max Index − Expected Index)
                  = (Rand − Expected Rand) / (1 − Expected Rand).
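Both indices can be computed from pair counts or, for the adjusted version, from the contingency table of the two partitions (Hubert and Arabie, 1985). A sketch (names are mine):

```python
from collections import Counter
from math import comb

def rand_index(u, v):
    """Fraction of observation pairs on which partitions u and v agree."""
    T = len(u)
    agree = 0
    for i in range(T):
        for j in range(i + 1, T):
            same_u, same_v = u[i] == u[j], v[i] == v[j]
            # counts both same-same pairs ({A}) and different-different pairs ({B})
            agree += (same_u and same_v) or (not same_u and not same_v)
    return agree / comb(T, 2)

def adjusted_rand(u, v):
    """Hubert-Arabie adjusted Rand index via the contingency table."""
    T = len(u)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    sum_u = sum(comb(c, 2) for c in Counter(u).values())
    sum_v = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_u * sum_v / comb(T, 2)
    max_index = (sum_u + sum_v) / 2
    if max_index == expected:  # degenerate partitions (e.g. all singletons)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For example, u = [0, 0, 1, 1] against v = [0, 1, 0, 1]: no pair agrees on "same cluster" and two pairs agree on "different clusters", so the Rand index is 2/6 and the adjusted index is −0.5.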

A change in variance for univariate normal data

Method       Correct k   Average Adjusted Rand
MultiRank    22/
E-Divisive   95/

A change in correlation for bivariate normal data

Method       Correct k   Average Adjusted Rand
MultiRank    72/
E-Divisive   92/

1,000 simulations, 2 change points: N(0,1), N(µ,1), N(0,1)
(Table: average Rand and average adjusted Rand for MultiRank and E-Divisive, by T and µ.)

1,000 simulations, 2 change points: N(0,1), N(0, σ²), N(0,1)
(Table: average Rand and average adjusted Rand for MultiRank and E-Divisive, by T and σ².)

1,000 simulations, 2 change points: N(0,1), t_ν(0, 1), N(0,1)
(Table: average Rand and average adjusted Rand for MultiRank and E-Divisive, by T and ν.)

1,000 simulations, 2 change points: N_2(0, I), N_2(µ, I), N_2(0, I)
(Table: average Rand and average adjusted Rand for MultiRank and E-Divisive, by T and µ.)

1,000 simulations, 2 change points: N_2(0, Σ), N_2(0, I), N_2(0, Σ), with
    Σ = ( 1  ρ
          ρ  1 ).
(Table: average Rand and average adjusted Rand for MultiRank and E-Divisive, by T and ρ.)

1,000 simulations, 2 change points: N_d(0, Σ), N_d(0, I), N_d(0, Σ).
Without noise, Σ has unit variances and all off-diagonal entries equal to ρ; with noise, the correlation ρ is confined to a leading block of coordinates, and the remaining coordinates are independent noise.
(Table: average Rand and average adjusted Rand, with and without noise, by T and d.)

Genetics Data

We applied E-Divisive to the aCGH micro-array dataset of 43 individuals with a bladder tumor (Bleakley and Vert, 2011), using the relative hybridization intensity profile for one individual. For comparison:
MultiRank (Lung-Yut-Fong et al., 2011): k̂ = 17;
KCPA (Arlot et al., 2012): k̂ = 41;
PELT (Killick et al., 2012): k̂ = 47;
with the adjusted Rand index used to compare the resulting segmentations.
(Figure: the estimated segmentations of the signal, by index, for MultiRank, KCPA, PELT, and E-Divisive.)

Financial Data: Cisco Systems

The E-Divisive procedure was applied to the monthly log returns of the Dow 30. A marginal analysis of Cisco Systems Inc. from April 1990 to January found change points at April 2000 and October.


Financial Data: S&P 500 Index

(Figure: S&P 500 log returns by date, May 20, 1999 to April 25, 2011.)

An Agglomerative Algorithm

Given a partition of k clusters C = {C_1, C_2, ..., C_k}, where clusters may or may not be single observations, consider combining a pair of adjacent clusters. The partition that maximizes a goodness-of-fit statistic determines the change point locations.

An Agglomerative Algorithm: Goodness-of-Fit

The goodness-of-fit statistic S(k) sums the E-distances between adjacent clusters. Given clusters C = {C_1, C_2, ..., C_k} with n_i = #C_i, define
    S(k) = Σ_{i=1}^{k−1} ( n_i n_{i+1} / (n_i + n_{i+1}) ) Ê_{n_i, n_{i+1}}(C_i, C_{i+1}; α).
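S(k) is a direct sum over adjacent cluster pairs, reusing the empirical divergence Ê. A sketch under the definitions above (helper and function names are mine):

```python
import numpy as np

def _e_div(x, y, alpha=1.0):
    # Empirical divergence E-hat between two samples (rows = observations).
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** alpha
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1) ** alpha
    d_yy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1) ** alpha
    return (2.0 * d_xy.sum() / (m * n)
            - d_xx[np.triu_indices(n, 1)].sum() / (n * (n - 1) / 2)
            - d_yy[np.triu_indices(m, 1)].sum() / (m * (m - 1) / 2))

def goodness_of_fit(clusters, alpha=1.0):
    """S(k): weighted E-distances summed over adjacent clusters.
    Each cluster is a sequence of d-dimensional observations."""
    s = 0.0
    for c1, c2 in zip(clusters, clusters[1:]):
        n1, n2 = len(c1), len(c2)
        s += (n1 * n2 / (n1 + n2)) * _e_div(c1, c2, alpha)
    return s
```

With three two-point clusters at levels 0, 5, 0 and α = 1, each adjacent pair contributes a weighted E-distance of 10, so S(3) = 20.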

An Agglomerative Algorithm (continued)

The partition that maximizes S(k) is then used to estimate the change point locations.
Figure: progression of the goodness-of-fit statistic, and where it is maximized.

Application: EMS

EMS Priority One response for Toronto, 2007 (figures).


Bibliography

Bleakley, K., and Vert, J.-P. (2011), "The Group Fused Lasso for Multiple Change-Point Detection," Technical Report HAL, Bioinformatics Center (CBIO).
Hoeffding, W. (1961), "The Strong Law of Large Numbers for U-Statistics," Technical Report 302, North Carolina State University, Dept. of Statistics.
Hubert, L., and Arabie, P. (1985), "Comparing Partitions," Journal of Classification, 2(1).
James, N. A., and Matteson, D. S. (2013), "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data," arXiv:1309.3295.
Lung-Yut-Fong, A., Lévy-Leduc, C., and Cappé, O. (2011), "Homogeneity and Change-Point Detection Tests for Multivariate Data Using Rank Statistics."
Matteson, D. S., and James, N. A. (2013), "A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data," Journal of the American Statistical Association, to appear.
Rizzo, M. L., and Székely, G. J. (2010), "DISCO Analysis: A Nonparametric Extension of Analysis of Variance," The Annals of Applied Statistics, 4(2).
Székely, G. J., and Rizzo, M. L. (2005), "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, 22(2).


More information

Complexity of two and multi-stage stochastic programming problems

Complexity of two and multi-stage stochastic programming problems Complexity of two and multi-stage stochastic programming problems A. Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA The concept

More information