Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks
Anuva Kulkarni, Carnegie Mellon University
Filipe Condessa, Carnegie Mellon University / IST, University of Lisbon
Jelena Kovacevic, Carnegie Mellon University
Outline
- Motivation: training-free methods, Comparative Reasoning
- Related work
- Approach: Winner Take All (WTA) hash, clustering based on Random Walks
- Some experimental results
Acknowledgements
Example and test images taken from the Berkeley Segmentation Dataset (BSDS) and the Prague Texture Segmentation Data Generator and Benchmark.
Motivation
Goals:
- Segment images where the number of classes is unknown
- Eliminate the need for training data (it may not be available)
- Provide a fast pre-processing step for classification
Segmentation is a similarity search. Comparative Reasoning is rank correlation built on the machine-learning concept of hashing.
Hashing
Hashing is used to speed up the searching process. A hash function maps data values to keys (hash codes), and the hash table is a shortened representation of the data:

  Hash value | Data
  001        | Bird_type1
  010        | Bird_type2
  011        | Dog_type1
  100        | Fox_type1
Hashing
Similar data points have the same (or nearby) hash keys or hash codes. Properties of hash functions:
- A hash function always returns a number for an object
- Two equal objects always have the same number
- Two unequal objects may not always have different numbers
Images from https://upload.wikimedia.org/wikipedia/commons/3/32/house_sparrow04.jpg, Wikipedia, and www.weknowyourdreams.com
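As a toy illustration of these properties, here is a minimal Python sketch; the hash function, bucket count, and labels are hypothetical, chosen only to mirror the table above:

```python
# A minimal sketch of the hash-table idea: values map to short keys,
# and equal values always receive the same key.
def hash_key(value: str, num_buckets: int = 4) -> int:
    # Sum of character codes modulo the table size; a toy hash function.
    return sum(map(ord, value)) % num_buckets

table = {}
for label in ["Bird_type1", "Bird_type2", "Dog_type1", "Fox_type1"]:
    table.setdefault(hash_key(label), []).append(label)

print(table)  # equal labels always collide; unequal labels may or may not
```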
Hashing for Segmentation
Each pixel is described by a feature vector (e.g., color). Hashing is used to cluster the pixels into groups: color features are computed for each pixel, and similar features are hashed into the same groups.
Segmentation and Randomized Hashing
Random hashing, i.e., using a hash code to indicate the region in which a feature vector lies after splitting the space using a set of randomly chosen splitting planes. (Figure: a 2-D feature space split by random planes, each cell labeled by its binary code.)

C. J. Taylor and A. Cowley, "Fast segmentation via randomized hashing," in BMVC, pp. 1-11, 2009.
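A minimal sketch of this splitting-plane idea, assuming Gaussian random plane normals and offsets; the function and parameter names are ours, not from the cited paper:

```python
import numpy as np

def random_hyperplane_hash(features, num_planes=4, seed=0):
    """Hash feature vectors by which side of each random hyperplane
    they fall on; one bit per plane, num_planes bits per code."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    normals = rng.standard_normal((num_planes, d))  # random plane normals
    offsets = rng.standard_normal(num_planes)       # random plane offsets
    bits = (features @ normals.T - offsets) > 0     # side of each plane
    # Pack the bits into one integer code per feature vector.
    return bits.astype(int) @ (1 << np.arange(num_planes))

# Feature vectors in the same cell of the split space get the same code.
codes = random_hyperplane_hash(np.random.rand(100, 3))  # e.g. RGB features
```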
Winner Take All (WTA) Hash
- A way to convert feature vectors into compact binary hash codes
- The absolute values of the features do not matter; only the ordering of the values matters
- Rank correlation is preserved, which provides stability
- The distance between hashes approximates rank correlation

J. Yagnik, D. Strelow, D. A. Ross, and R. Lin, "The power of comparative reasoning," in ICCV, pp. 2431-2438, IEEE, 2011.
Calculating the WTA Hash
Consider three feature vectors and a random permutation vector θ = (3, 1, 5, 2, 6, 4):

  feature 1: (13, 4, 2, 11, 5, 3)
  feature 2: (12, 5, 3, 10, 4, 2)   [feature 1 perturbed by one]
  feature 3: (1, 90, 44, 5, 15, 6)  [very different from feature 1]

Step 1: Permute each feature vector with θ:

  feature 1 -> (2, 13, 5, 4, 3, 11)
  feature 2 -> (3, 12, 4, 5, 2, 10)
  feature 3 -> (44, 1, 15, 90, 6, 5)

Step 2: Keep only the first K entries. Let K = 3:

  feature 1 -> (2, 13, 5)
  feature 2 -> (3, 12, 4)
  feature 3 -> (44, 1, 15)

Step 3: Pick the index of the maximum entry among the K kept entries. This is the hash code h of that feature vector:

  feature 1 -> h = 2
  feature 2 -> h = 2
  feature 3 -> h = 1

Notice that feature 2 is just feature 1 perturbed by one, and it receives the same hash code, while the very different feature 3 receives a different one: feature 1 and feature 2 are similar. A sketch implementing these steps follows.
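A short Python sketch of the three steps, reproducing the worked example above (positions are 1-indexed, as on the slides):

```python
import numpy as np

def wta_hash(x, theta, K):
    """Winner Take All hash: permute x by theta, keep the first K entries,
    and return the (1-indexed) position of the maximum."""
    permuted = x[theta - 1]           # theta holds 1-indexed positions
    return int(np.argmax(permuted[:K])) + 1

theta = np.array([3, 1, 5, 2, 6, 4])
f1 = np.array([13, 4, 2, 11, 5, 3])
f2 = np.array([12, 5, 3, 10, 4, 2])   # f1 perturbed by one
f3 = np.array([1, 90, 44, 5, 15, 6])  # very different from f1

print([wta_hash(f, theta, K=3) for f in (f1, f2, f3)])  # [2, 2, 1]
```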
Random Walks
- A way of understanding proximity in graphs
- Useful for propagation in graphs: it creates probability maps
- Analogous to an electrical network with voltages and resistances (figure: a small resistor network with +1 V and -1 V seed terminals and the resulting node voltages)
- The algorithm is supervised: the user must specify seeds (see the sketch below)
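A minimal dense sketch of the seeded random walker in this electrical-network view, assuming the standard formulation in which the unseeded-node probabilities solve a Laplacian system (the function name and toy graph are ours):

```python
import numpy as np

def random_walker_probs(W, seeds, labels):
    """Probabilities that a walker started at each unseeded node reaches a
    seed of each label first, via the Laplacian system L_u X = -B M."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W             # graph Laplacian
    unseeded = np.setdiff1d(np.arange(n), seeds)
    Lu = L[np.ix_(unseeded, unseeded)]         # unseeded block of L
    B = L[np.ix_(unseeded, seeds)]             # unseeded-to-seed block
    M = np.eye(labels.max() + 1)[labels]       # one-hot seed labels
    return unseeded, np.linalg.solve(Lu, -B @ M)

# Tiny chain graph 0 - 1 - 2 with unit-weight edges, seeded at both ends:
W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
nodes, probs = random_walker_probs(W, np.array([0, 2]), np.array([0, 1]))
print(nodes, probs)  # node 1 is equidistant from both seeds: [[0.5 0.5]]
```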
Our Approach
- Block I (Similarity Search): input image -> random projections -> WTA hash
- Block II: transform to a graph with (nodes, edges) -> RW algorithm
- Block III: automatic seed selection -> probabilities from the RW algorithm -> stop? If no, iterate; if yes, emit the segmented output
Block I: Similarity Search
WTA Hash
The image has dimensions P x Q x d. It is vectorized into a PQ x d matrix and projected onto R randomly chosen hyperplanes (random projections onto R pairs of points), so each point in the image has R features.
WTA Hash
Run the WTA hash on the PQ x R feature matrix: we get one hash code for each point in the image. With K = 3, each hash code takes one of three values and fits in two bits. Repeat this N times to get a PQ x N matrix of hash codes (see the sketch below).
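A sketch of this block under our reading of the slides: "random projections onto R pairs of points" is interpreted as projecting each pixel's feature vector onto the difference of a random pair of points, and all function names and defaults are illustrative:

```python
import numpy as np

def pixel_features(image, R, seed=0):
    """Vectorize a P x Q x d image into PQ x d, then project every pixel
    onto R random directions, one per random pair of points."""
    P, Q, d = image.shape
    X = image.reshape(P * Q, d)                     # PQ x d
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((R, d)) - rng.standard_normal((R, d))
    return X @ directions.T                         # PQ x R

def hash_codes(features, N, K=3, seed=0):
    """Run the WTA hash N times with fresh permutations: PQ x N codes.
    With K = 3 each code takes one of three values (two bits)."""
    rng = np.random.default_rng(seed)
    PQ, R = features.shape
    codes = np.empty((PQ, N), dtype=np.uint8)
    for n in range(N):
        theta = rng.permutation(R)                  # fresh random permutation
        codes[:, n] = np.argmax(features[:, theta[:K]], axis=1)
    return codes

img = np.random.rand(8, 8, 3)                        # toy 8 x 8 RGB image
codes = hash_codes(pixel_features(img, R=16), N=20)  # 64 x 20 hash matrix
```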
Block II: Create Graph
Create Graph
Run the WTA hash N times, so each point has N hash codes. The image is transformed into a lattice graph, and the edge weight between nodes i and j is calculated as

$$\delta_{i,j} = \sigma \, d_H(i, j), \qquad w_{i,j} = \exp(-\beta \, \delta_{i,j})$$

where $d_H(i, j)$ is the average Hamming distance over all N hash codes of i and j, $\sigma$ is a scaling factor, and $\beta$ is the weight parameter for the RW algorithm.
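In code, the weight of a single lattice edge could look like the following sketch; the sigma and beta values are placeholders, not tuned values from the slides:

```python
import numpy as np

def edge_weight(codes_i, codes_j, sigma=1.0, beta=30.0):
    """Edge weight between lattice neighbors i and j from their N hash
    codes: d_H is the average Hamming distance over the N codes,
    delta = sigma * d_H, and w = exp(-beta * delta)."""
    d_h = np.mean(np.asarray(codes_i) != np.asarray(codes_j))
    return np.exp(-beta * sigma * d_h)

# Identical hash codes give weight 1; disagreeing codes give tiny weights,
# so the random walker rarely crosses boundaries between dissimilar pixels.
print(edge_weight([2, 2, 1, 0], [2, 2, 1, 0]))  # 1.0
print(edge_weight([2, 2, 1, 0], [1, 0, 2, 2]))  # close to 0
```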
Block III: RW Algorithm
Seed Selection
The RW algorithm needs initial seeds to be defined. We obtain them with unsupervised draws from a Dirichlet process DP($G_0$, $\alpha$), where $G_0$ is the base distribution and $\alpha$ is the discovery parameter: a larger $\alpha$ leads to the discovery of more classes.
Seed Selection
The probability that a new seed belongs to a new class is proportional to $\alpha$. For the i-th sample with class label $y_i$ (a result by Blackwell and MacQueen, 1973):

$$p(y_i = c \mid y_{-i}, \alpha) = \frac{n_c^{-i} + \alpha / C_{\text{tot}}}{n - 1 + \alpha}$$

where
$C_{\text{tot}}$ = total number of classes,
$y_i$ = class label $c$, $c \in \{1, 2, \ldots, C_{\text{tot}}\}$,
$y_{-i} = \{y_j : j \neq i\}$,
$n_c^{-i}$ = number of samples in the c-th class excluding the i-th sample.
Seed Selection
The method is unsupervised, hence $C_{\text{tot}}$ is infinite. Hence,

$$\lim_{C_{\text{tot}} \to \infty} p(y_i = c \mid y_{-i}, \alpha) = \frac{n_c^{-i}}{n - 1 + \alpha} \qquad \forall c,\; n_c^{-i} > 0 \;\text{(class is non-empty)}$$

This is the clustering effect, or "rich get richer". The probability that a new class is discovered is

$$\lim_{C_{\text{tot}} \to \infty} p(y_i \neq y_j \text{ for all } j < i \mid y_{-i}, \alpha) = \frac{\alpha}{n - 1 + \alpha} \qquad \forall c,\; n_c^{-i} = 0 \;\text{(class is empty or new)}$$
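These two limits are exactly the Chinese-restaurant-process sampling rule, which can be sketched as follows (the function name and defaults are ours):

```python
import numpy as np

def crp_draws(n_samples, alpha, seed=0):
    """Sequential class assignments with Chinese-restaurant-process
    probabilities: existing class c with prob n_c / (n - 1 + alpha),
    a brand-new class with prob alpha / (n - 1 + alpha)."""
    rng = np.random.default_rng(seed)
    counts, labels = [], []
    for i in range(n_samples):              # i samples assigned so far
        probs = np.array(counts + [alpha]) / (i + alpha)
        c = rng.choice(len(probs), p=probs)
        if c == len(counts):
            counts.append(0)                # a new class is discovered
        counts[c] += 1                      # rich get richer
        labels.append(c)
    return labels

print(crp_draws(10, alpha=1.0))  # larger alpha tends to discover more classes
```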
Random Walks
- Use the RW algorithm to generate probability maps in each iteration
- Entropy is calculated from the probability maps
- Entropy-based stopping criterion: as cluster purity goes up, the average image entropy goes down (see the sketch below)
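A sketch of the entropy computation behind such a stopping criterion, assuming the RW probability maps are stacked per class and iteration stops once the average entropy falls below a user-chosen threshold (the shapes and threshold are our assumptions):

```python
import numpy as np

def avg_image_entropy(prob_maps, eps=1e-12):
    """Average per-pixel Shannon entropy of the RW probability maps.
    prob_maps: array of shape (num_classes, P, Q) whose values sum to 1
    over the class axis at each pixel."""
    h = -np.sum(prob_maps * np.log2(prob_maps + eps), axis=0)
    return float(h.mean())

# Pure clusters (one-hot probabilities) give entropy 0; uniform maps give
# the maximum log2(num_classes), so stopping at a low average entropy
# rewards confident, pure segmentations.
```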
Experimental Results
(Figures: histology images with automatically picked seeds, and segmentations on the Berkeley segmentation subset.) Avg. GCE of the dataset = 0.186.
Experimental Results
(Figures: histology images with the respective sets of seeds used; texture images with ground-truth images in the middle row and our segmented outputs in the bottom row; the method demonstrated on some natural images.) Avg. GCE: TexGeo = 0.134, TexBTF = 0.061.
Experimental Results
Comparison measure: Global Consistency Error (GCE)*. Lower GCE indicates lower error.

  GCE score
  No. of features | BSDSubset | TexBTF | TexColor | TexGeo
  10              | 0.179     | 0.063  | 0.159    | 0.102
  20              | 0.180     | 0.065  | 0.159    | 0.129
  40              | 0.186     | 0.061  | 0.156    | 0.134

*C. Fowlkes, D. Martin, and J. Malik, "Learning affinity functions for image segmentation: Combining patch-based and gradient-based approaches," vol. 2, pp. II-54, IEEE, 2003.
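For reference, a sketch of how GCE can be computed between two label maps, following the local-refinement-error definition of Martin et al. as we understand it; this is our illustrative implementation, not the code used to produce the numbers above:

```python
import numpy as np

def gce(seg_a, seg_b):
    """Global Consistency Error between two label maps. Lower is better;
    0 means one segmentation is a refinement of the other everywhere."""
    a, b = seg_a.ravel(), seg_b.ravel()
    n = a.size
    # Contingency counts between the two sets of region labels.
    ids_a, a_inv = np.unique(a, return_inverse=True)
    ids_b, b_inv = np.unique(b, return_inverse=True)
    counts = np.zeros((ids_a.size, ids_b.size))
    np.add.at(counts, (a_inv, b_inv), 1)
    size_a = counts.sum(axis=1, keepdims=True)   # region sizes in seg_a
    size_b = counts.sum(axis=0, keepdims=True)   # region sizes in seg_b
    # Local refinement error summed over all pixels, in both directions;
    # GCE forces the same direction for the whole image (the min).
    e_ab = np.sum(counts * (size_a - counts) / size_a)
    e_ba = np.sum(counts * (size_b - counts) / size_b)
    return min(e_ab, e_ba) / n
```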
Experimental Results
Comparison with other methods**, performed on the BSDS subset:

  Method           | GCE
  Human            | 0.080
  RAD              | 0.205
  Seed             | 0.209
  Learned Affinity | 0.214
  Mean Shift       | 0.260
  Normalized cuts  | 0.336

**E. Vazquez, J. Van De Weijer, and R. Baldrich, "Image segmentation in the presence of shadows and highlights," pp. 1-14, Springer, 2008.
Conclusions
- Comparative reasoning and the Winner Take All hash enable fast similarity search
- Our method performs unsupervised segmentation using context (Random Walks-based clustering)
- There is no need to predefine the number of classes
- The method can be used as a pre-processing step for classification of hyperspectral images, biomedical images, etc.
Thank you