Gaussian mean-shift algorithms


1 Gaussian mean-shift algorithms Miguel Á. Carreira-Perpiñán Dept. of Computer Science & Electrical Engineering, OGI/OHSU

2 Outline: Gaussian mean-shift (GMS) as an EM algorithm; Acceleration strategies for GMS image segmentation; Gaussian blurring mean-shift (GBMS).

3 Gaussian mixtures, kernel density estimates. Given a dataset $X = \{x_n\}_{n=1}^N \subset \mathbb{R}^D$, define a Gaussian kernel density estimate with bandwidth $\sigma$: $p(x) = \frac{1}{N}\sum_{n=1}^N K\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right)$ with $K(t) \propto e^{-t/2}$. Other kernels: Epanechnikov, $K(t) \propto 1-t$ for $t \in [0,1)$ and $0$ otherwise (finite support); Student's $t$, $K(t) \propto \left(1+\frac{t}{\alpha}\right)^{-\frac{\alpha+D}{2}}$. A useful way to represent the density of a data set, extremely popular in machine learning and statistics. Objective here: find the modes of $p(x)$.
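As a quick illustration, here is a minimal NumPy sketch (not code from the talk) that evaluates such an isotropic Gaussian kde at a single point; the dataset X and the bandwidth are assumed to be supplied by the caller.

import numpy as np

def gaussian_kde(x, X, sigma):
    """Evaluate the isotropic Gaussian kernel density estimate p(x) at one point x.
    X holds one data point x_n per row; sigma is the bandwidth."""
    N, D = X.shape
    d2 = np.sum((x - X) ** 2, axis=1)                 # ||x - x_n||^2 for all n
    norm = (2 * np.pi * sigma**2) ** (D / 2)          # Gaussian normalisation constant
    return np.exp(-0.5 * d2 / sigma**2).sum() / (N * norm)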

4 Why find modes? Nonparametric clustering. [Figure: a density $p(x)$ with three clusters $C_1$, $C_2$, $C_3$.] Modes ↔ clusters: assign each point $x$ to a mode. Nonparametric: no model for the clusters' shape, no predetermined number of clusters.

5 Why find modes? Multivalued mappings. [Figure: joint density $p(x,y)$ and the conditional $p(x \mid y = y_0)$, which has several modes.] Given a density model for $(x, y)$, represent a multivalued mapping $y \to x$ by the modes of the conditional distribution $p(x \mid y)$: map $y_0$ to the modes of $p(x \mid y = y_0)$ (three values in the example). If $p(x, y)$ is a Gaussian mixture, then $p(x \mid y)$ is a Gaussian mixture too.

6 Mode-finding algorithms. Idea: start a hill-climbing algorithm from every centroid; this typically finds all modes in practice (Carreira-Perpiñán 00). Hill-climbing algorithms: gradient ascent, Newton's method, etc.; and the fixed-point iteration known as the mean-shift algorithm. It is based on ideas by Fukunaga & Hostetler 75 (also Cheng 95, Carreira-Perpiñán 00, Comaniciu & Meer 02, etc.).

7 The mean-shift algorithm. It is obtained by reorganising the stationary-point equation $\nabla p(x) = 0$ as a fixed-point iteration $x^{(\tau+1)} = f(x^{(\tau)})$. For example, for an isotropic kde $p(x) = \frac{1}{N}\sum_{n=1}^N K\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right)$ with Gaussian kernel (so $K'(t) = -\frac{1}{2}K(t)$):
$\nabla p(x) = \frac{1}{N}\sum_{n=1}^N K'\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right)\frac{2}{\sigma^2}(x-x_n) = -\frac{1}{N\sigma^2}\left[\sum_{n=1}^N K\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right) x - \sum_{n=1}^N K\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right) x_n\right] = 0$
$\Longrightarrow\quad x = \sum_{n=1}^N \frac{K\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right)}{\sum_{n'=1}^N K\left(\left\|\frac{x-x_{n'}}{\sigma}\right\|^2\right)}\, x_n.$

8 The mean-shift algorithm (cont.)
Gaussian, full covariance: $f(x) = \left(\sum_{n=1}^N p(n|x)\,\Sigma_n^{-1}\right)^{-1} \sum_{n=1}^N p(n|x)\,\Sigma_n^{-1} x_n$
Gaussian, isotropic: $f(x) = \sum_{n=1}^N p(n|x)\, x_n$ with $p(n|x) = \dfrac{e^{-\frac{1}{2}\|(x-x_n)/\sigma\|^2}}{\sum_{n'=1}^N e^{-\frac{1}{2}\|(x-x_{n'})/\sigma\|^2}}$
Other kernel, isotropic: $f(x) = \dfrac{\sum_{n=1}^N K(\|(x-x_n)/\sigma\|^2)\, x_n}{\sum_{n'=1}^N K(\|(x-x_{n'})/\sigma\|^2)}$
Iterating $f$ (i.e., $f \circ f \circ \cdots$) maps points in $\mathbb{R}^D$ to stationary points of $p$. It is a particularly simple algorithm, with no step size.
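A minimal NumPy sketch of the Gaussian, isotropic map $f$ (an illustration, not the talk's code); x is a query point and X holds one data point per row:

import numpy as np

def gms_f(x, X, sigma):
    """One Gaussian, isotropic mean-shift step: f(x) = sum_n p(n|x) x_n."""
    d2 = np.sum((x - X) ** 2, axis=1)         # ||x - x_n||^2
    w = np.exp(-0.5 * d2 / sigma**2)          # unnormalised posteriors p(n|x)
    return (w / w.sum()) @ X                  # weighted mean of the data points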

9 Gaussian mean-shift (GMS) as a clustering algorithm. $x_n$ and $x_m$ are in the same cluster if they converge to the same mode ⇒ nonparametric clustering, able to deal with complex cluster shapes; $\sigma$ determines the number of clusters. Popular in computer vision (segmentation, tracking) (Comaniciu & Meer). Segmentation example: one data point $x = (i, j, I)$ per pixel, where $(i, j)$ = location and $I$ = intensity or colour.

10 Pseudocode: GMS clustering
for n ∈ {1, ..., N}                                                          For each data point
    x ← x_n                                                                  Starting point
    repeat                                                                   Iteration loop
        ∀n: p(n|x) ← exp(−½‖(x−x_n)/σ‖²) / Σ_{n'} exp(−½‖(x−x_{n'})/σ‖²)     Posterior probabilities (E step)
        x ← Σ_{n=1}^N p(n|x) x_n                                             Update x (M step)
    until x's update < tol
    z_n ← x                                                                  Mode
end
connected-components({z_n}_{n=1}^N, ε)                                       Clusters
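A compact NumPy sketch of this procedure (my illustration, not the author's implementation); the final connected-components step is replaced by a crude mode-merging tolerance, which is an assumption:

import numpy as np

def gms_cluster(X, sigma, tol=1e-3, max_iter=500):
    """Run GMS from every data point and label points by the mode they reach."""
    merge_tol = sigma                          # crude mode-merging threshold (assumption)
    modes, labels = [], np.empty(len(X), dtype=int)
    for i, x0 in enumerate(X):
        x = x0.copy()
        for _ in range(max_iter):              # mean-shift iteration (E step + M step)
            w = np.exp(-0.5 * np.sum((x - X) ** 2, axis=1) / sigma**2)
            x_new = (w / w.sum()) @ X
            done = np.linalg.norm(x_new - x) < tol
            x = x_new
            if done:
                break
        for k, m in enumerate(modes):          # assign to a nearby known mode, else create one
            if np.linalg.norm(m - x) < merge_tol:
                labels[i] = k
                break
        else:
            modes.append(x)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)

For example, labels, modes = gms_cluster(X, sigma=0.5) on an N×D array X returns one label per point and the modes found.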

11 GMS is an EM algorithm. Define the following artificial maximum-likelihood problem (Carreira-Perpiñán & Williams 03). Model: a Gaussian mixture that moves as a rigid body ($\{\pi_n, x_n, \Sigma_n\}_{n=1}^N$ fixed; the only parameter is $x$):
$q(y \mid x) = \sum_{n=1}^N \pi_n\, |2\pi\Sigma_n|^{-\frac{1}{2}}\, e^{-\frac{1}{2}(y-(x_n-x))^T \Sigma_n^{-1}(y-(x_n-x))}.$
Data: just one point at the origin, $y = 0$. Then the resulting expectation-maximisation (EM) algorithm that maximises the log-likelihood wrt the parameter $x$ (the translation of the GM) is formally identical to the GMS step. E step: update the posterior probabilities $p(n \mid x)$. M step: update the parameter $x$.
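A worked version of the equivalence for the isotropic case ($\pi_n = 1/N$, $\Sigma_n = \sigma^2 I$); the intermediate steps here are spelled out beyond the slide and follow from the definitions above:
\[
q(0 \mid x) \;=\; \sum_{n=1}^N \frac{1}{N}\,\mathcal{N}\!\bigl(0;\, x_n - x,\ \sigma^2 I\bigr)
\;=\; \sum_{n=1}^N \frac{1}{N}\,\mathcal{N}\!\bigl(x;\, x_n,\ \sigma^2 I\bigr) \;=\; p(x),
\]
so maximising the likelihood of the single datum $y = 0$ over the translation $x$ maximises the kde $p(x)$. The E step gives the responsibilities
\[
p(n \mid x^{(\tau)}) = \frac{e^{-\frac{1}{2}\|(x^{(\tau)}-x_n)/\sigma\|^2}}{\sum_{n'=1}^N e^{-\frac{1}{2}\|(x^{(\tau)}-x_{n'})/\sigma\|^2}},
\]
and the M step maximises $Q(x \mid x^{(\tau)}) = \mathrm{const} - \frac{1}{2\sigma^2}\sum_{n=1}^N p(n \mid x^{(\tau)})\,\|x - x_n\|^2$, whose unique maximiser is
\[
x^{(\tau+1)} = \sum_{n=1}^N p(n \mid x^{(\tau)})\, x_n,
\]
i.e., exactly one GMS step.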

12 GMS is an EM algorithm (cont.) The M step maximises the following lower bound on $\log p$:
$\varrho(x) = \log p(x^{(\tau)}) - Q(x^{(\tau)} \mid x^{(\tau)}) + Q(x \mid x^{(\tau)})$, with $\varrho(x^{(\tau)}) = \log p(x^{(\tau)})$ and $\varrho(x) \le \log p(x)$ for all $x \in \mathbb{R}^D$,
where $Q(x \mid x^{(\tau)}) = \mathrm{constant} - \frac{1}{2\sigma^2}\sum_{n=1}^N p(n \mid x^{(\tau)})\,\|x - x_n\|^2$ is the expected complete-data log-likelihood from the E step.

13 Non-Gaussian mean-shift is a GEM algorithm. Consider a mixture of non-Gaussian kernels. An analogous derivation as a maximum-likelihood problem applies, but now the M step has no closed-form solution for $x$. However, the M step can be solved iteratively with a certain fixed-point iteration. Taking just one step of this fixed-point iteration provides an inexact M step and so a generalised EM (GEM) algorithm, which is formally identical to the mean-shift algorithm (Carreira-Perpiñán 06).

14 Convergence properties of mean-shift. From the properties of (G)EM algorithms (Dempster et al. 77) we expect, generally speaking: convergence to a mode from almost any starting point (it can also converge to saddle points and minima); monotonic increase of $p$ at every step (Jensen's inequality); convergence of linear order. However, particular mean-shift algorithms (for particular kernels) may show different properties. We analyse in detail GMS (isotropic case: $\Sigma_n = \sigma^2 I$).

15 Convergence properties of GMS. Taylor expansion of the fixed-point mapping $f$ around a mode $x^*$:
$x^{(\tau+1)} = f(x^{(\tau)}) = x^* + J(x^*)(x^{(\tau)} - x^*) + O(\|x^{(\tau)} - x^*\|^2)$, so $x^{(\tau+1)} - x^* \approx J(x^*)(x^{(\tau)} - x^*)$,
where the Jacobian of $f$ is $J(x) = \frac{1}{\sigma^2}\sum_{m=1}^M p(m \mid x)\,(\mu_m - f(x))(\mu_m - f(x))^T$ (for the kde, $M = N$ and $\mu_m = x_m$). At a mode, $J(x^*) = \frac{1}{\sigma^2} \times$ (local covariance at $x^*$). Thus the convergence is linear with rate
$r = \lim_{\tau\to\infty} \frac{\|x^{(\tau+1)} - x^*\|}{\|x^{(\tau)} - x^*\|} = \lambda_{\max}(J(x^*)) < 1.$
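A small numerical check of the linear rate (my illustration; the 1-D two-cluster dataset, bandwidth and starting point below are assumptions, not the talk's data):

import numpy as np

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.3, 100), rng.normal(2, 0.3, 100)])[:, None]
sigma = 0.5

def gms_step(x, X, sigma):
    """One isotropic GMS step and the posterior probabilities p(n|x)."""
    w = np.exp(-0.5 * np.sum((x - X) ** 2, axis=1) / sigma**2)
    p = w / w.sum()
    return p @ X, p

x, path = np.array([1.0]), [np.array([1.0])]
for _ in range(300):                            # run GMS long enough to get an accurate mode x*
    x, _ = gms_step(x, X, sigma)
    path.append(x)
x_star = path[-1]

emp = [np.linalg.norm(path[t + 1] - x_star) / np.linalg.norm(path[t] - x_star)
       for t in range(5, 10)]                   # empirical contraction in the linear regime

_, p_star = gms_step(x_star, X, sigma)
d = X - x_star                                  # at the mode, f(x*) = x*
J = (d * p_star[:, None]).T @ d / sigma**2      # J(x*) = (1/sigma^2) * local covariance
print("empirical:", np.round(emp, 3), " theory r =", round(float(np.linalg.eigvalsh(J).max()), 3))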

16 Convergence properties of GMS (cont.) Special cases depending on $\sigma$: $\sigma \to 0$ or $\sigma \to \infty$ (practically uninteresting): $r \to 0$, superlinear convergence. $\sigma$ at which modes merge: $r = 1$, sublinear (awfully slow). Intermediate $\sigma$ (practically useful): $r$ close to $1$ (slow). [Plots: convergence rate $r$ as a function of $x$ and of $\sigma$ for several bandwidth values.]

17 Convergence properties of GMS (cont.) The GMS iterates follow the local principal component at the mode and lie in the interior of the convex hull of the data points. Smooth path: the angle between consecutive steps lies in $(-\frac{\pi}{2}, \frac{\pi}{2})$ (Comaniciu & Meer 02). The GMS step size is not optimal along the GMS search direction.

18 Number of modes of a Gaussian mixture. In principle, one would expect that $N$ points should produce at most $N$ modes (additional modes being an artifact of the kernel). But: in 1D, a Gaussian mixture with $N$ components has at most $N$ modes (Carreira-Perpiñán & Williams 03), while a mixture of non-Gaussian kernels can have more than $N$ modes (examples: a sigmoid-based kernel, the Epanechnikov kernel).

19 Number of modes of a Gaussian mixture (cont.) In dimension $D \ge 2$, even a mixture of isotropic Gaussians can have more than $N$ modes (Carreira-Perpiñán & Williams 03). However, in practice the number of modes is much smaller than $N$ because components coalesce. This may be a reason why, in practice, GMS produces better segmentations than other kernels.

20 Convergence domains of GMS In general, the convergence domains (geometric locus of points that converge to each mode) are nonconvex and can be disconnected and take quite fancy shapes.

21 Convergence domains of GMS (cont.) The boundary between some domains can be fractal. These peculiar domains are probably undesirable for clustering, but the effect is small (seemingly confined to the cluster boundaries).

22 Gaussian mean-shift as an EM algorithm: summary
GMS is an EM algorithm; non-Gaussian mean-shift is a GEM algorithm.
GMS converges to a mode from almost any starting point.
Convergence is linear (occasionally superlinear or sublinear), slow in practice.
The iterates approach a mode along its local principal component.
Gaussian mixtures and kernel density estimates can have more modes than components (but this seems rare).
The convergence domains for GMS can be fractal.

23 Acceleration strategies for GMS image segmentation (outline: Gaussian mean-shift (GMS) as an EM algorithm; Acceleration strategies for GMS image segmentation; Gaussian blurring mean-shift (GBMS)).

24 Computational bottlenecks of GMS. GMS is slow: $O(kN^2D)$. Computational bottlenecks: B1: large average number of iterations $k$ (linear convergence). B2: large cost per iteration, $2ND$ multiplications (E step: $ND$ to obtain $p(n \mid x^{(\tau)})$; M step: $ND$ to obtain $x^{(\tau+1)}$). Acceleration techniques must address B1 and/or B2. We propose four strategies: 1: spatial discretisation; 2: spatial neighbourhood; 3: sparse EM; 4: EM-Newton.

25 Evaluation of strategies. Goal: to achieve the same segmentation as GMS. Visual evaluation of the segmentation is not enough; we compute the segmentation error wrt the GMS segmentation (= number of pixels misclustered, as a % of the whole image). We report running times in normalised iterations (= 1 iteration of GMS) to ensure independence from implementation details. No pre- or postprocessing of clusters (e.g. removal of small clusters). Why segmentation errors? The accelerated scheme may not converge to a mode of $p(x)$, and $x_n$ may converge to a different mode than with exact GMS.

26 Strategy 1: spatial discretisation. Many different pixels converging to the same mode travel almost identical paths. We discretise the spatial domain by subdividing every pixel $(i, j)$ into $n \times n$ cells; points projecting to the same cell share the same fate. This works because paths in $D$ dimensions are well approximated by their 2D projection on the spatial domain. We stop iterating once we hit an already visited cell ⇒ massively reduces the total number of iterations. We start first with a set of pixels uniformly distributed over the image (this finds all modes quickly). Converges to a mode; segmentation error ↓ by increasing $n$. Addresses B1. Parameter: $n = 1, 2, 3, \dots$ (discretisation level). A small code sketch follows.
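A hedged NumPy sketch of the cell-caching idea (my reconstruction, not the paper's implementation); it assumes X holds one row per pixel whose first two coordinates are the spatial position (i, j), and it replaces the mode bookkeeping by a crude merging tolerance:

import numpy as np

def gms_step(x, X, sigma):
    w = np.exp(-0.5 * np.sum((x - X) ** 2, axis=1) / sigma**2)
    return (w / w.sum()) @ X

def gms_discretised(X, sigma, n=2, tol=1e-3, max_iter=500):
    """GMS clustering with spatial discretisation: cells of side 1/n pixel share their fate."""
    cell_label = {}                                  # visited spatial cell -> cluster label
    modes, labels = [], np.full(len(X), -1)
    for i, x0 in enumerate(X):
        x, path_cells, label = x0.copy(), [], None
        for _ in range(max_iter):
            cell = tuple(np.floor(x[:2] * n).astype(int))
            if cell in cell_label:                   # hit an already visited cell: adopt its fate
                label = cell_label[cell]
                break
            path_cells.append(cell)
            x_new = gms_step(x, X, sigma)
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        if label is None:                            # converged to a mode: match or create it
            for k, m in enumerate(modes):
                if np.linalg.norm(m - x) < sigma:    # merging tolerance (assumption)
                    label = k
                    break
            if label is None:
                modes.append(x)
                label = len(modes) - 1
        for c in path_cells:                         # every cell on this path shares the fate
            cell_label[c] = label
        labels[i] = label
    return labels, modes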

27 Strategy 2: spatial neighbourhood. Approximates the E and M steps with a subset of the data points (rather than all $N$) consisting of a neighbourhood $\mathcal{N}(x^{(\tau)})$ in the spatial domain (not the range domain). Finding spatial neighbours is free (unlike finding neighbours in the full space).
$p(n \mid x^{(\tau)}) = \dfrac{\exp\left(-\frac{1}{2}\|(x^{(\tau)} - x_n)/\sigma\|^2\right)}{\sum_{n' \in \mathcal{N}(x^{(\tau)})} \exp\left(-\frac{1}{2}\|(x^{(\tau)} - x_{n'})/\sigma\|^2\right)}, \qquad x^{(\tau+1)} = \sum_{n \in \mathcal{N}(x^{(\tau)})} p(n \mid x^{(\tau)})\, x_n.$
Does not converge to a mode; segmentation error ↓ by increasing $e$. Addresses B2. Parameter: $e \in (0, 1]$ (fraction of the data set used as neighbours).

28 Strategy 3: sparse EM. Sparse EM (Neal & Hinton 98): coordinate ascent on the space of $(x, \tilde p)$, where $\tilde p$ are posterior probabilities; this maximises the free energy $F(\tilde p, x) = \log p(x) - D\left(\tilde p \,\|\, p(n \mid x)\right)$ and also $p(x)$. Run partial E steps frequently: update $p(n \mid x)$ only for $n \in S$, where $S$ = plausible set (nearest neighbours), kept constant over the partial E steps (FAST). Run full E steps infrequently: update all $p(n \mid x)$ and $S$ (SLOW). We choose $S$ containing as many neighbours as necessary to account for a total probability $\sum_{n \in S} p(n \mid x) \ge 1 - \epsilon$. Thus the fraction of data used, $e$, varies after each full step; using a fixed $e$ produced worse results. Converges to a mode no matter how $S$ is chosen; computational savings if few full steps are needed. Segmentation error ↓ by decreasing $\epsilon$. Addresses B2. Parameter: $\epsilon \in [0, 1)$ (probability mass not in $S$).

29 Strategy 4: EM-Newton. Start with EM steps, which quickly increase $p$. Switch to Newton steps when EM slows down (reverting to EM if the Newton step is bad). Specifically, try a Newton step when $\|x^{(\tau)} - x^{(\tau-1)}\| < \theta$. Computing the Newton step $x_N$ yields the EM step $x_{EM}$ for free:
Gradient: $g(x) = \frac{p(x)}{\sigma^2}\sum_{n=1}^N p(n \mid x)(x_n - x) = \frac{p(x)}{\sigma^2}(x_{EM} - x)$
Hessian: $H(x) = \frac{p(x)}{\sigma^2}\left(-I + \frac{1}{\sigma^2}\sum_{n=1}^N p(n \mid x)(x_n - x)(x_n - x)^T\right)$
Newton step: $x_N = x - H^{-1}(x)\,g(x)$.
Converges to a mode with quadratic order; segmentation error ↓ by decreasing $\theta$. Addresses B1. Parameter: $\theta > 0$ (minimum EM step length), relative to $\sigma$.
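A NumPy sketch of these quantities and of one EM-Newton step (my illustration; the acceptance test for a "bad" Newton step, a decrease of the density, is an assumption):

import numpy as np

def kde_quantities(x, X, sigma):
    """Density p(x), gradient g(x), Hessian H(x) and EM step x_EM of an isotropic Gaussian kde."""
    N, D = X.shape
    diff = X - x                                              # rows x_n - x
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / sigma**2)
    p_x = w.sum() / (N * (2 * np.pi * sigma**2) ** (D / 2))
    p_n = w / w.sum()                                         # posteriors p(n|x)
    x_em = p_n @ X
    g = p_x / sigma**2 * (x_em - x)
    S = (diff * p_n[:, None]).T @ diff                        # sum_n p(n|x)(x_n-x)(x_n-x)^T
    H = p_x / sigma**2 * (S / sigma**2 - np.eye(D))
    return p_x, g, H, x_em

def em_newton_step(x, X, sigma, theta=0.1):
    """One EM-Newton step: EM while steps are long, Newton once they fall below theta."""
    p_x, g, H, x_em = kde_quantities(x, X, sigma)
    if np.linalg.norm(x_em - x) >= theta:
        return x_em                                           # plain EM (mean-shift) step
    x_newton = x - np.linalg.solve(H, g)
    if kde_quantities(x_newton, X, sigma)[0] > p_x:           # accept only if p increases
        return x_newton
    return x_em                                               # bad Newton step: revert to EM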

30 Computational cost per iteration relative to exact GMS:
Strategy 1 (spatial discretisation): 1
Strategy 2 (spatial neighbourhood): $e$
Strategy 3 (sparse EM): 2 if full step, $e$ if partial step
Strategy 4 (EM-Newton): 1 if EM step, $1 + \frac{D+1}{4}$ if Newton step, $\frac{D+1}{4}$ if EM step after a failed Newton step
Here $e \in (0, 1]$ is the fraction of the data set used (neighbours for strategy 2, plausible set for strategy 3). Using iterations normalised wrt GMS allows direct comparison of the numbers of iterations in the experiments.

31 Experimental results with image segmentation. Dataset: $x_n = (i_n, j_n, I_n)$ (greyscale) or $x_n = (i_n, j_n, L^*_n, u^*_n, v^*_n)$ (colour), where $(i, j)$ is the pixel's spatial position. We prescale $I$ or $(L^*, u^*, v^*)$ so that the same $\sigma$ applies to all dimensions (in pixels). Best segmentations are obtained for large bandwidths, $\sigma$ of the order of (image side)/5. We study all strategies with different images over a range of $\sigma$; convergence tolerance $10^{-3}$ pixels. [Plots: number of clusters and total number of iterations as a function of $\sigma$.]

32 Experimental results with image segmentation (cont.) [Figure: segmentations, segmentation errors (all below about 1%) and iteration counts/histograms for GMS and each accelerated method under its optimal parameter value at a fixed $\sigma$.]

33 Experimental results with image segmentation (cont.) Optimal parameter value for each method (= the one that minimises the iterations ratio subject to achieving an error < 3%) as a function of $\sigma$. [Plots: optimal $n$, $e$, $\epsilon$, $\theta$ vs. $\sigma$.]

34 Experimental results with image segmentation (cont.) Clustering error (percent) and number of iterations for each method under its optimal parameter value, as a function of $\sigma$. [Plots: error and iterations vs. $\sigma$ for GMS and the four strategies.]

35 Experimental results with image segmentation (cont.) Clustering error (percent) and computational cost for each method as a function of its parameter. [Plots: error and iteration ratio wrt GMS vs. each method's parameter ($n$, $e$, $\epsilon$, $\theta$) for several values of $\sigma$.]

36 Experimental results with image segmentation (cont.) Plot of all iterates for all starting pixels: most cells in strategy 1 are empty (the proportion of cells visited is a small fraction of the $n^2 N$ cells). [Plots: iterates over the spatial domain; proportion of visited cells and average number of iterations per pixel for strategy 1 as a function of $N$, for several discretisation levels $n$.]

37 Acceleration strategies: summary. With parameters set optimally, all methods can obtain nearly the same segmentation as GMS with significant speedups. Near-optimal parameter values can easily be set in advance for most strategies, and somewhat more heuristically for the spatial neighbourhood. Strategy 1 (spatial discretisation): best strategy, largest speedup (an average number of iterations per pixel of only k = 2-4), with a very small error controlled by the discretisation level. Strategy 4 (EM-Newton): second best, with roughly 1.5-6x speedup. Strategies 2 and 3 (neighbourhood methods): up to about 3x speedup; sparse EM attains the lowest (near-zero) error, while the spatial neighbourhood can result in unacceptably large errors for a suboptimal setting of its parameter (the neighbourhood size). The neighbourhood methods are less effective because GMS needs very large neighbourhoods (comparable to the data set size for the larger bandwidths).

38 Acceleration strategies: summary (cont.) Other methods: approximate algorithms for neighbour search (e.g. kd-trees): less effective because GMS needs large neighbourhoods. Other optimisation algorithms (e.g. quasi-Newton): need to ensure the first steps are EM. Fast Gauss transform: can be combined with our methods. Extensions: adaptive and non-isotropic bandwidths: all 4 methods. Non-image data: the sparse-EM and EM-Newton strategies are readily applicable; is it possible to extend the spatial strategies to higher dimensions?

39 Gaussian blurring mean-shift (GBMS) (outline: Gaussian mean-shift (GMS) as an EM algorithm; Acceleration strategies for GMS image segmentation; Gaussian blurring mean-shift (GBMS)).

40 Gaussian blurring mean-shift (GBMS). This is the algorithm that Fukunaga & Hostetler really proposed; it has gone largely unnoticed. Same iterative scheme, but now the data points move at each iteration. Result: a sequence of progressively shrunk datasets $X^{(0)}, X^{(1)}, \dots$ converging to a single, all-points-coincident cluster. [Figure: dataset snapshots at successive iterations $\tau = 0, 1, 2, \dots, 11$.]

41 Pseudocode: GBMS clustering
repeat                                                                           Iteration loop
    for m ∈ {1, ..., N}                                                          For each data point
        ∀n: p(n|x_m) ← exp(−½‖(x_m−x_n)/σ‖²) / Σ_{n'} exp(−½‖(x_m−x_{n'})/σ‖²)
        y_m ← Σ_{n=1}^N p(n|x_m) x_n                                             One GMS step
    end
    ∀m: x_m ← y_m                                                                Update whole dataset
until stop                                                                       Stopping criterion
connected-components({x_n}_{n=1}^N, ε)                                           Clusters
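A minimal NumPy sketch of the GBMS loop (my illustration, not the talk's code); it uses a naive stop (small mean update), and the proper entropy-based stopping criterion appears a couple of slides below:

import numpy as np

def gbms_step(X, sigma):
    """One GBMS iteration: every point takes one GMS step over the current dataset."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # pairwise ||x_m - x_n||^2
    P = np.exp(-0.5 * sq / sigma**2)
    P /= P.sum(axis=1, keepdims=True)                           # P[m, n] = p(n | x_m)
    return P @ X                                                # y_m = sum_n p(n|x_m) x_n

def gbms(X, sigma, max_iter=50, tol=1e-4):
    """Blur the dataset until the points (nearly) stop moving; then read off clusters
    with a connected-components pass (not included here)."""
    X = X.copy()
    for _ in range(max_iter):
        X_new = gbms_step(X, sigma)
        if np.mean(np.linalg.norm(X_new - X, axis=1)) < tol:
            return X_new
        X = X_new
    return X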

42 Stopping criterion for GBMS. GBMS converges to an all-points-coincident cluster (Cheng 95); proof: the diameter of the data set decreases at least geometrically. However, GBMS shows two phases. Phase 1: points merge into clusters of coincident points (a few iterations); we want to stop here. Phase 2: the clusters keep approaching and merging (a few to hundreds of iterations); this slowly erases the clustering structure. We need a stopping criterion (as opposed to a convergence criterion) to stop just after phase 1. Simply checking $\|X^{(\tau)} - X^{(\tau-1)}\| < \text{tol}$ does not work because the points are always moving. Instead, consider the histogram of updates $\{e_n^{(\tau)}\}_{n=1}^N$ with $e_n^{(\tau)} = \|x_n^{(\tau)} - x_n^{(\tau-1)}\|$. Though the histograms change as the points move, in phase 2 the entropy $H$ does not change (the histogram bins change their position but not their values).

43 Stopping criterion for GBMS (cont.) [Figure: histograms of the updates $e_n^{(\tau)}$ at successive iterations $\tau = 0, \dots, 20$.]

44 Stopping criterion for GBMS (cont.) Stopping criterion: stop as soon as $\bigl(\,|H(e^{(\tau+1)}) - H(e^{(\tau)})| < 10^{-8}\,\bigr)$ (the entropy of the update histogram stops changing) OR $\bigl(\frac{1}{N}\sum_{n=1}^N e_n^{(\tau+1)} < \text{tol}\bigr)$ (the average update becomes small enough). [Plots: average update and update entropy as a function of $\tau$.]
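A small sketch of this test (my illustration; the histogram binning and the thresholds are assumptions consistent with the slide):

import numpy as np

def update_entropy(e, bins=50):
    """Entropy of the histogram of the per-point updates e_n."""
    h, _ = np.histogram(e, bins=bins)
    p = h[h > 0] / h.sum()
    return -np.sum(p * np.log(p))

def gbms_should_stop(e_new, H_prev, tol=1e-4, dH=1e-8):
    """Stop when the update-histogram entropy stops changing or the updates are tiny."""
    H_new = update_entropy(e_new)
    return (abs(H_new - H_prev) < dH) or (e_new.mean() < tol), H_new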

45 Convergence rate of GBMS. GBMS shrinks a Gaussian cluster towards its mean with cubic convergence rate. Proof: work with a continuous pdf. The kde $p(x) = \frac{1}{N}\sum_{n=1}^N K\left(\left\|\frac{x-x_n}{\sigma}\right\|^2\right)$ becomes $p(x) = \int_{\mathbb{R}^D} q(y)\,K(x - y)\,dy$. New (blurred) dataset: $\tilde x = \int_{\mathbb{R}^D} p(y \mid x)\, y\, dy = E\{y \mid x\}$, $x \in \mathbb{R}^D$, with $p(y \mid x) = \frac{K(x - y)\,q(y)}{p(x)}$. Considering $D = 1$ w.l.o.g. (the dimensions decouple): if $q(x) = \mathcal{N}(x; 0, s^2)$ then $p(x) = \mathcal{N}(x; 0, s^2 + \sigma^2)$ and $\tilde q(\tilde x) = \mathcal{N}(\tilde x; 0, (rs)^2)$ with $r = \frac{1}{1 + (\sigma/s)^2} \in (0, 1)$.

46 Convergence rate of GBMS (cont.) Recurrence relation for the standard deviation $s^{(\tau)}$:
$s^{(\tau+1)} = \dfrac{s^{(\tau)}}{1 + (\sigma/s^{(\tau)})^2}$ for $s^{(0)} > 0$,
which converges to $0$ with cubic order, i.e., $\lim_{\tau\to\infty} \frac{s^{(\tau+1)}}{(s^{(\tau)})^3} < \infty$. Effectively, $\sigma$ increases at every step (relative to the shrinking cluster). [Figure: a Gaussian cluster over iterations $\tau = 0, \dots, 5$.] The cluster shrinks without changing orientation or mean; the principal direction shrinks more slowly relative to the other directions.
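A tiny numeric illustration of the cubic shrinkage (the values sigma = 1 and s(0) = 1 are assumptions):

# Iterate the recurrence s <- s / (1 + (sigma/s)^2) and watch s_next / s^3 approach a constant.
sigma, s = 1.0, 1.0
for tau in range(6):
    s_next = s / (1.0 + (sigma / s) ** 2)
    print(f"tau={tau}  s={s:.3e}  s_next/s^3={s_next / s**3:.3f}")
    s = s_next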

47 Convergence rate of GBMS (cont.) In summary, a Gaussian distribution remains Gaussian with the same mean, but each principal axis decreases cubically. This explains the practical behaviour shown by GBMS: 1. clusters collapse extremely fast (clustering); 2. after a few iterations only the local principal component survives ⇒ temporary linearly-shaped clusters (denoising). [Figure: dataset at iterations $\tau = 1, 2, 3$.]

48 Convergence rate of GBMS (cont.) Number of GBMS iterations $\tau$ necessary to bring $s^{(\tau)}$ below a small threshold, as a function of the bandwidth $\sigma$, for $s^{(0)} = 1$. [Plot: $\tau$ vs. $\sigma$.] Note that GMS converges with linear order ($p = 1$), thus requiring many iterations to converge to a mode.

49 Connection with spectral clustering
GBMS pseudocode (inner loop):
for m ∈ {1, ..., N}                                                          For each data point
    ∀n: p(n|x_m) ← exp(−½‖(x_m−x_n)/σ‖²) / Σ_{n'} exp(−½‖(x_m−x_{n'})/σ‖²)
    y_m ← Σ_{n=1}^N p(n|x_m) x_n                                             One GMS step
end
∀m: x_m ← y_m                                                                Update whole dataset
Same pseudocode in matrix form:
W = (exp(−½‖(x_m−x_n)/σ‖²))_{nm}                                             Gaussian affinity matrix
D = diag(Σ_{n=1}^N w_{nm})                                                   Degree (normalising) matrix
X ← X W D⁻¹                                                                  Update whole dataset
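A NumPy sketch of the matrix form, run on a small hypothetical two-cluster dataset to also peek at the leading eigenvalues of P (an illustration of the spectral connection discussed next, not the talk's code):

import numpy as np

def random_walk_matrix(X, sigma):
    """P = W D^{-1}: Gaussian affinities with columns normalised to sum to 1 (X is D x N)."""
    sq = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)   # pairwise squared distances
    W = np.exp(-0.5 * sq / sigma**2)
    return W / W.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 0.2, (2, 30)), rng.normal(3, 0.2, (2, 30))])  # two clusters (assumed data)
sigma = 0.5
for tau in range(5):
    P = random_walk_matrix(X, sigma)
    ev = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]            # leading eigenvalues of P
    print(f"tau={tau}  leading eigenvalues: {np.round(ev[:4], 3)}")
    X = X @ P                                                   # one GBMS step in matrix form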

50 Connection with spectral clustering (cont.) GBMS can be written as repeated products $X \leftarrow XP$ with the random-walk matrix $P = WD^{-1}$, closely related to the normalised graph Laplacian ($W$: Gaussian affinities, $D$: degree matrix, $P$: posterior probabilities $p(n \mid x_m)$). In phase 1, $P$ changes quickly as the points cluster. In phase 2, $P$ is almost constant (and perfectly blocky), so GBMS implicitly extracts its leading eigenvectors (power method), like spectral clustering. Thus GBMS is much faster than computing eigenvectors (about 5 matrix-vector products are enough). Actually, since $P$ is a positive matrix, it has a single leading eigenvector (Perron-Frobenius theorem) with constant components, so eventually all points collapse.

51 Connection with spectral clustering (cont.) [Figure: the leading 7 eigenvectors $u_1, \dots, u_7 \in \mathbb{R}^N$ and the leading eigenvalues $\mu_1, \mu_2, \dots \in (0, 1]$ of $P = (p(n \mid x_m))_{nm}$ at several iterations $\tau$.]

52 Accelerated GBMS algorithm. At each iteration, replace the clusters already formed with a single point whose mass equals the number of points in the cluster. The algorithm is now an alternation between GMS blurring steps and connected-component reduction steps. Equivalent to the original GBMS but faster: the effective number of points $N^{(\tau)}$ (and thus the computation) decreases very quickly. Computational cost ($k_1$: average number of iterations per point for GMS; $k_2$: number of iterations for GBMS and accelerated GBMS): GMS $\propto N^2 D k_1$; GBMS $\propto N^2 D k_2$; accelerated GBMS $\propto D \sum_{\tau=1}^{k_2} (N^{(\tau)})^2$.

53 Experiments with image segmentation. [Figure: original images and the GMS and GBMS segmentations. Table: number of iterations per image for GMS, GBMS and accelerated GBMS at the respective bandwidths.]

54 Experiments with image segmentation (cont.) Effective number of points $N^{(\tau)}$ in accelerated GBMS. [Plot: $N^{(\tau)}$ vs. $\tau$ for the cameraman and hand images.] Accelerated GBMS is 2-4x faster than GBMS and 5-6x faster than GMS.

55 Experiments with image segmentation (cont.) [Plots: number of clusters $C$, number of iterations and speedup as a function of $\sigma$ for GMS, GBMS and accelerated GBMS.]

56 Experiments with image segmentation (cont.) Segmentations of similar quality to those of GMS, but faster. Computational cost $O(kN^2)$. Possible further accelerations: fast Gauss transform; sparse affinities $W$ (truncated Gaussian kernel, proximity graph such as $k$-nearest-neighbours, etc.). The bandwidth $\sigma$ is set by the user; good values: for GMS, $\sigma$ of the order of (image side)/5; for GBMS, somewhat smaller. One can also try values of $\sigma$ on a scaled-down version of the image (fast) and then scale up $\sigma$.

57 GBMS: summary
We have turned a neglected algorithm originally due to Fukunaga & Hostetler into a practical algorithm by introducing a reliable stopping criterion and an acceleration.
Fast convergence (cubic order for Gaussian clusters).
Connection with spectral clustering (GBMS interleaves refining the affinities with power iterations).
Excellent segmentation results, faster than GMS and spectral clustering; the only parameter is $\sigma$ (which controls the granularity).
Very simple to implement. We hope to see it more widely applied.
