Bayesian Complementary Clustering, MCMC and Anglo-Saxon placenames
1 Bayesian Complementary Clustering, MCMC and Anglo-Saxon placenames. Giacomo Zanella, Department of Statistics, University of Warwick, Coventry, UK. 20 May 2014.
2 Overview. 1. Motivation: a historical problem (Anglo-Saxon placenames). 2. Modeling: complementary clustering with random partition models. 3. Computation: MCMC on the space of matchings (data association problems). 4. Real data: mild support for the historians' hypothesis.
3 A classic problem: cluster analysis. Aim: organize objects into groups whose members are similar. Geometrical interpretation: separate points into clusters made of close points. Figure: a point pattern being divided into two clusters.
4 Our clustering problem: original motivation. Problem posed by John Blair (Professor of History at Oxford). Figure: reconstruction of an Anglo-Saxon settlement in West Stow, Suffolk (image borrowed from John Blair).
5 Empirical observations of clusters of settlements: Stretton, Newton, Burton, Carlton in the region of Gt. Glen.
7 Problem considered: recurrent clustering pattern. (Talk outline: Problem considered, The model, MCMC, Real Data.)
12 Historians' hypothesis: small groups of settlements (2-6 per cluster?), closely located (3-10 km?). Each placename corresponds to a function (e.g. Kingston = "police station") and can appear at most once in each cluster. Does the geographical distribution of placenames support this? Can we provide information about cluster sizes and the average intra-cluster distance σ? Which placenames tend to cluster together? We want explicit inference on the cluster partition.
14 The data as a multi-type point process.
17 Model requirement 1: complementary clustering. Classical clustering for marked point processes: clusters are made of points that are close and have similar marks (marks act as an additional dimension).
18 Model requirement 1: complementary clustering (continued). Complementary clustering: each placename (color) can occur at most once in each cluster. Similar to data association problems.
19 Model requirement 2: inference on the partition. x = {x_1, ..., x_n}: observed points; ρ = {C_1, ..., C_N}: unobserved partition. Random Partition Model (RPM): 1. exchangeable prior distribution π(ρ); 2. conditional distribution of x given ρ; 3. Bayesian inference on the partition: π(ρ | x) ∝ π(ρ) π(x | ρ). Figure: ρ = {{1, 3, 5}, {4, 6}, {2, 7}}.
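The RPM recipe can be made concrete on a toy example. The sketch below is a hedged illustration, not the talk's actual model: four 1-D points with made-up locations, a simple λ^N(ρ) prior that penalises partitions with many clusters, and a Gaussian within-cluster likelihood; all partitions of the small set are enumerated and the posterior π(ρ | x) ∝ π(ρ) π(x | ρ) is normalised exactly.

```python
import math

def partitions(items):
    """Recursively generate all set partitions of a list."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `first` in its own block ...
        yield [[first]] + smaller
        # ... or add `first` to each existing block in turn
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]

x = {1: 0.0, 2: 0.9, 3: 0.1, 4: 1.0}   # made-up 1-D locations
sigma, lam = 0.2, 0.2                  # illustrative dispersion and cluster penalty

def prior(rho):
    return lam ** len(rho)             # toy prior: penalise many clusters

def likelihood(rho):
    # Gaussian deviations of each point from its cluster mean
    dev2 = sum((x[i] - sum(x[j] for j in C) / len(C)) ** 2
               for C in rho for i in C)
    return math.exp(-dev2 / (2 * sigma ** 2))

post = {}
for rho in partitions([1, 2, 3, 4]):
    key = tuple(sorted(tuple(sorted(C)) for C in rho))  # canonical form
    post[key] = prior(rho) * likelihood(rho)
Z = sum(post.values())
post = {r: v / Z for r, v in post.items()}
best = max(post, key=post.get)   # -> ((1, 3), (2, 4))
```

The posterior mode pairs the two nearby points 1, 3 and the two nearby points 2, 4, as one would hope; exact enumeration like this is only possible for tiny n, which is exactly why the talk turns to MCMC.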
20 Data generation model. Cluster centers: inhomogeneous Poisson point process.
21 Data generation model. Sizes of clusters: |C_j| ~ (p_1, ..., p_k), where k is the number of colors. Locations: i.i.d. Gaussians centered at the cluster centers.
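A minimal sketch of this generative mechanism, with illustrative (not fitted) parameter values, a homogeneous rather than inhomogeneous center process for simplicity, and the at-most-one-color-per-cluster constraint enforced by sampling colors without replacement:

```python
import math
import random

random.seed(1)
k = 4                      # number of colors (placename types); illustrative
p = [0.4, 0.3, 0.2, 0.1]   # p_s = Pr[cluster has size s], s = 1..k; made up
lam = 10                   # expected number of clusters; made up
sigma = 0.02               # intra-cluster dispersion; made up

def poisson(mu):
    """Knuth's method: multiply uniforms until the product drops below exp(-mu)."""
    limit, n, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return n
        n += 1

clusters = []   # list of color lists, one per cluster
points = []     # (location, color) pairs: superposition of all clusters
for _ in range(poisson(lam)):
    center = (random.random(), random.random())          # homogeneous for simplicity
    size = random.choices(range(1, k + 1), weights=p)[0]  # categorical cluster size
    colors = random.sample(range(k), size)                # each color at most once
    clusters.append(colors)
    for c in colors:
        loc = (random.gauss(center[0], sigma), random.gauss(center[1], sigma))
        points.append((loc, c))
```

Only the superposition `points` would be observed; the cluster memberships in `clusters` are exactly the latent partition ρ that the posterior targets.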
22 Observed data. The observed point process x is the superposition of all the clusters.
23 Posterior distribution. We are interested in the posterior distribution of ρ: π(ρ | x).
24 Posterior distribution. Random elements involved: x (observed marked point process), ρ (partition of x), σ (intra-cluster dispersion), λ (expected number of clusters), p_i = Pr[cluster size = i]. Full conditionals:
π(ρ | x, σ, p, λ) ∝ ∏_{j=1}^{N(ρ)} [ g(x_{C_j}) λ p_{s_j} c_{s_j} exp(−δ²_{C_j} / (2σ²)) ∏_{i,l ∈ C_j, i ≠ l} 1(m_i ≠ m_l) ],
σ² | x, ρ, p, λ ~ InvGamma( α_σ + n(x) − N(ρ), β_σ + ∑_{j=1}^{N(ρ)} δ²_{C_j} / 2 ),
p | x, ρ, σ, λ ~ Dir( α_1 + N_1(ρ), ..., α_k + N_k(ρ) ),
λ | x, ρ, σ, p ~ Gamma( k_λ + N(ρ), θ_λ / (θ_λ + 1) ),
where c_{s_j} = (k choose s_j) s_j (2π)^{s_j − 1} and δ²_{C_j} = ∑_{i ∈ C_j} (x_i − x̄_{C_j})².
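The conjugate conditionals for (σ², p, λ) can be sampled with only the standard library. The sketch below draws one Gibbs-style update given illustrative partition statistics; every hyperparameter and statistic value here is made up, and the Inverse-Gamma draw is obtained as the reciprocal of a Gamma draw.

```python
import random

random.seed(2)

# Hypothetical hyperparameters (not from the talk)
alpha_s, beta_s = 2.0, 1.0        # InvGamma hyperparameters for sigma^2
alpha = [1.0, 1.0, 1.0]           # Dirichlet hyperparameters for p (k = 3 here)
k_lam, theta_lam = 1.0, 1.0       # Gamma hyperparameters for lambda

# Hypothetical sufficient statistics of the current partition rho
n_x, N_rho = 20, 8                # number of points, number of clusters N(rho)
N_s = [3, 4, 1]                   # N_i(rho): number of clusters of each size
sum_delta2 = 0.5                  # sum over clusters of delta^2_{C_j}

# sigma^2 | ... ~ InvGamma(a, b): if G ~ Gamma(shape=a, scale=1/b), 1/G ~ InvGamma(a, b)
a = alpha_s + (n_x - N_rho)
b = beta_s + sum_delta2 / 2
sigma2 = 1.0 / random.gammavariate(a, 1.0 / b)

# p | ... ~ Dirichlet(alpha_i + N_i(rho)), via normalised Gamma draws
g = [random.gammavariate(alpha[i] + N_s[i], 1.0) for i in range(3)]
p = [v / sum(g) for v in g]

# lambda | ... ~ Gamma(k_lam + N(rho), theta_lam / (theta_lam + 1))
lam = random.gammavariate(k_lam + N_rho, theta_lam / (theta_lam + 1))
```

The hard step is the remaining conditional on ρ itself, which has no closed form; that is what the matching MCMC in the next slides addresses.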
25 Intractability of π(ρ). π(ρ) is known only up to a normalizing constant. The sample space has size of order n! (normalizing π(ρ) is NP-hard).
26 Finding ρ_max = argmax_ρ π(ρ) is an optimal assignment problem. 2-color case: solvable in O(n³) with the Hungarian algorithm. k-color case: an NP-hard optimization problem, not approximable in polynomial time by any deterministic algorithm (not in APX).
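For intuition, the 2-color assignment problem on a tiny made-up instance can be solved by brute force over all permutations; the Hungarian algorithm returns the same minimiser in O(n³), which is what makes the 2-color case tractable while the k-color case is not.

```python
from itertools import permutations

# Three red and three blue points with made-up coordinates; each blue point
# sits near one red point, so the optimal assignment should pair them up.
red  = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
blue = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.9)]

def cost(matching):
    """Total squared distance when red i is paired with blue matching[i]."""
    return sum((red[i][0] - blue[j][0]) ** 2 + (red[i][1] - blue[j][1]) ** 2
               for i, j in enumerate(matching))

# Brute force: O(n!) instead of the Hungarian algorithm's O(n^3)
best = min(permutations(range(3)), key=cost)   # -> (0, 1, 2)
```

Here the identity assignment wins, pairing each red point with its nearby blue point; already at n around 12 the n! enumeration becomes hopeless, which motivates sampling instead of exhaustive search.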
27 Feasible approach: Markov chain Monte Carlo. Simulate an ergodic Markov chain (X_n)_{n≥0} with stationary distribution π. Estimate I = E_π[f(X)] with Î_n := (1/n) ∑_{k=t}^{t+n} f(X_k). Metropolis-Hastings algorithm: obtain X_{n+1} from X_n by 1. sampling the proposed move X' ~ Q(X_n, ·); 2. computing the acceptance probability α(X_n, X') = min{1, π(X') Q(X', X_n) / (π(X_n) Q(X_n, X'))}; 3. with probability α(X_n, X') setting X_{n+1} = X', otherwise X_{n+1} = X_n.
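The three MH steps above can be sketched in a few lines. This is a generic toy example, not the matching sampler of the talk: the target is an unnormalised discrete distribution on four states, and the proposal is a symmetric ±1 random walk on a cycle, so the Q terms in the acceptance ratio cancel.

```python
import random

random.seed(0)
weights = [1, 2, 3, 4]   # target pi(x) proportional to weights[x], x in {0,1,2,3}

def step(x):
    y = (x + random.choice([-1, 1])) % 4        # 1. propose a symmetric move
    alpha = min(1.0, weights[y] / weights[x])   # 2. acceptance probability
    return y if random.random() < alpha else x  # 3. accept or stay

x, counts = 0, [0] * 4
for t in range(200000):
    x = step(x)
    if t >= 1000:            # discard burn-in before averaging
        counts[x] += 1

# Ergodic averages: empirical frequencies approach 0.1, 0.2, 0.3, 0.4
freqs = [c / sum(counts) for c in counts]
```

Note that only the ratio weights[y] / weights[x] is ever needed, which is precisely why MH copes with the intractable normalizing constant of π(ρ).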
28 Designing the MH algorithm: 2D case (i.e. 2 colors). Sample space: partial matchings contained in a complete bipartite graph. Example: the partition ρ = {{1}, {2, 6}, {3}, {4, 7}, {5}} corresponds to a matching X(ρ). Target measure: π(X(ρ)) ∝ exp(∑_{e ∈ X} w(e)), for suitably defined edge weights.
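For a tiny instance this sample space can be written out directly. The sketch below enumerates all partial matchings of the complete bipartite graph K_{2,2} (red points 0, 1 vs blue points 0, 1) and evaluates π(X) ∝ exp(∑_{e ∈ X} w(e)) with made-up edge weights that favour the pairing red 0 with blue 0 and red 1 with blue 1.

```python
import math
from itertools import product

# Hypothetical edge weights w(red i, blue j); positive = a good pairing
w = {(0, 0): 2.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.5}

# A state assigns each red point a blue partner or None (unmatched);
# it is a valid partial matching iff no blue point is used twice.
matchings = []
for edges in product([None, 0, 1], repeat=2):
    used = [e for e in edges if e is not None]
    if len(used) == len(set(used)):
        matchings.append(edges)

# Unnormalised target pi(X) = exp(sum of matched edge weights)
weight = {m: math.exp(sum(w[(i, j)] for i, j in enumerate(m) if j is not None))
          for m in matchings}
Z = sum(weight.values())                 # normalizing constant, feasible only here
best = max(weight, key=weight.get)       # -> (0, 1): red 0 - blue 0, red 1 - blue 1
```

K_{2,2} already has 7 partial matchings; the count grows super-exponentially in n, so in general only the MCMC moves over this space, never the full enumeration, are available.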
29 Proposal distribution Q(X_old, X_new): 1. pick a red point i and a blue point j according to q(i, j); 2. propose the corresponding move (add/remove/switch).
30 Proposal distribution. The Markov chain can be represented by a graph: vertices are the states of the chain, edges are the allowed moves. Question: how should q(i, j) be chosen to achieve good mixing? Remark: the optimal choice of q(i, j) is analogous to optimal scaling of the proposal. Figure: Markov chain represented by a graph.
31 What is the best proposal distribution? 1) q(i, j) ∝ 1{w_{i,j} > ε}; 2) q(i, j) ∝ π(X_new); 3) q(i, j) ∝ π(X_new) / (π(X_old) + π(X_new)).
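The three candidates can be compared numerically. In the sketch below, each of five hypothetical moves from the current state leads to a state with unnormalised target value pi_new[m] (all numbers made up; for option 1 the edge weight w_{i,j} is conflated with the target value of the proposed state, a deliberate simplification): option 2 piles probability onto the single best move, while option 3 is bounded above by 1 before normalisation and so flattens large ratios.

```python
pi_old = 1.0                          # target value at the current state
pi_new = [0.1, 0.5, 1.0, 2.0, 10.0]   # target values of the five reachable states
eps = 0.3                             # truncation threshold for option 1

def normalise(q):
    s = sum(q)
    return [v / s for v in q]

q1 = normalise([1.0 if v > eps else 0.0 for v in pi_new])   # 1) truncation
q2 = normalise(list(pi_new))                                # 2) prop. to pi(X_new)
q3 = normalise([v / (pi_old + v) for v in pi_new])          # 3) bounded ratio
```

Option 3 is the most robust of the three here: unlike option 2 it cannot be dominated by one enormous pi_new value, yet unlike option 1 it never assigns zero probability to a reachable state.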
32 Truncating the posterior distribution π(X). 1) q(i, j) ∝ 1{w_{i,j} > ε}; 2) a cheap version of 3); 3) q(i, j) ∝ π(X_new) / (π(X_old) + π(X_new)).
33 Some remarks on the previous MCMC. Mixing problems? For p_1 → 0 (the complete-matching case), π(X) shows increasing multimodality (we used simulated tempering in those cases). Theoretical results? Jerrum and Sinclair [1996] give an upper bound on the mixing time of an MCMC for a similar problem, but it is unfeasible in practice. Parallel computation? We developed a parallelizable multiple-proposal scheme. Figure: the intensity of gray represents the estimated probability of each link.
34 General k-dimensional case. Sample space: partial matchings (i.e. hypergraphs of degree at most 1) contained in a complete k-partite hypergraph.
36 kd Algorithm kd-algorithm Obtain ρ new from ρ old as follows 1) Sample k 2 colors u.a.r.; 2) Evaluate (x 2D, ρ 2D old ); 3) Obtain ρ 2D new using 2D moves; 4) Obtain ρ new from ρ 2D new. Giacomo Zanella (University of Warwick) Bayesian Complementary Clustering, MCMC and Anglo-Saxon placenames 20/05/ / 26
37 Real dataset. Placenames: Aston/Easton, Bolton, Burh-Stall, Burton, Centres, Charlton/Charlcot, Chesterton, Claeg, Draycot/Drayton, Eaton, Kingston, Knighton, Newbold, Newton, Norton, Stratton, Sutton, Tot, Walton/Walcot, Weston. Total number of settlements:
38 Fitting our model to the data.
40 Acknowledgments. Prof. Wilfrid Kendall for PhD supervision. Prof. John Blair for collaboration and for arranging the supply of data. CRiSM and EPSRC for funding.
41 Thank you.
Original citation: Zanella, Giacomo (2015). Random partition models and complementary clustering of Anglo-Saxon place-names. The Annals of Applied Statistics, 9 (4), pp. 1792-1822.
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationReminder of some Markov Chain properties:
Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationSampling Algorithms for Probabilistic Graphical models
Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir
More informationQuantifying Uncertainty
Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationKernel Sequential Monte Carlo
Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section
More informationBayesian Prediction of Code Output. ASA Albuquerque Chapter Short Course October 2014
Bayesian Prediction of Code Output ASA Albuquerque Chapter Short Course October 2014 Abstract This presentation summarizes Bayesian prediction methodology for the Gaussian process (GP) surrogate representation
More informationMonte Carlo Methods in Bayesian Inference: Theory, Methods and Applications
University of Arkansas, Fayetteville ScholarWorks@UARK Theses and Dissertations 1-016 Monte Carlo Methods in Bayesian Inference: Theory, Methods and Applications Huarui Zhang University of Arkansas, Fayetteville
More informationApproximate Counting and Markov Chain Monte Carlo
Approximate Counting and Markov Chain Monte Carlo A Randomized Approach Arindam Pal Department of Computer Science and Engineering Indian Institute of Technology Delhi March 18, 2011 April 8, 2011 Arindam
More informationSampling Methods (11/30/04)
CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:
More informationAdaptive HMC via the Infinite Exponential Family
Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family
More informationGenerative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :
More informationA short introduction to INLA and R-INLA
A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk
More informationComparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters
Journal of Modern Applied Statistical Methods Volume 13 Issue 1 Article 26 5-1-2014 Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Yohei Kawasaki Tokyo University
More information