Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks

Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks
Anuva Kulkarni, Carnegie Mellon University
Filipe Condessa, Carnegie Mellon / IST-University of Lisbon
Jelena Kovacevic, Carnegie Mellon University

Outline
- Motivation
- Training-free methods
- Comparative Reasoning
- Related work
- Approach
  - Winner Take All (WTA) Hash
  - Clustering based on Random Walks
- Some experimental results

Acknowledgements
- Example and test images taken from the Berkeley Segmentation Dataset (BSDS)
- The Prague Texture Segmentation Data Generator and Benchmark

Motivation
Goals:
- Segment images where the number of classes is unknown
- Eliminate training data (it may not be available)
- Provide a fast pre-processing step for classification
Key ideas:
- Segmentation is a similarity search
- Comparative Reasoning is rank correlation using the machine-learning concept of hashing

Hashing
- Used to speed up the searching process
- A hash function maps data values to keys or hash codes: value → hash function → key/hash code
- A hash table is a shortened representation of the data, for example:

  Hash value | Data
  001        | Bird_type1
  010        | Bird_type2
  011        | Dog_type1
  100        | Fox_type1

Hashing
- Similar data points have the same (or nearby) hash keys or hash codes: input data → hash code
- Properties of hash functions:
  - Always return a number for an object
  - Two equal objects will always have the same number
  - Two unequal objects may not always have different numbers
Images from https://upload.wikimedia.org/wikipedia/commons/3/32/house_sparrow04.jpg, Wikipedia, www.weknowyourdreams.com
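
As a toy illustration of these properties (the labels come from the hash-table example above; the bucket count and the use of Python's built-in hash are choices made for this sketch, not part of the talk):

```python
# Toy illustration of hash-function properties (bucket count is invented;
# the slides only state the properties themselves).
labels = ["Bird_type1", "Bird_type2", "Dog_type1", "Fox_type1"]

num_buckets = 4
table = {}
for label in labels:
    key = hash(label) % num_buckets      # always returns a number for an object
    table.setdefault(key, []).append(label)

# Two equal objects always map to the same key within a run ...
assert hash("Bird_type1") % num_buckets == hash("Bird_type1") % num_buckets
# ... while two unequal objects may or may not collide (distinct keys are not guaranteed).
print(table)
```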

Hashing for Segmentation
- Each pixel is described by some feature vector (e.g. color)
- Hashing is used to cluster the pixels into groups
[Figure: color features of each pixel are computed, and similar features are hashed into the same groups]

Segmentation and Randomized Hashing
Random hashing, i.e. using a hash code to indicate the region in which a feature vector lies after splitting the space with a set of randomly chosen splitting planes.
[Figure: feature space split by random planes, with each region labelled by a binary code]
C. J. Taylor and A. Cowley, "Fast segmentation via randomized hashing," in BMVC, pp. 1-11, 2009.
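
A minimal sketch of the randomized-hashing idea, assuming sign-of-projection splitting planes; this is an illustration of how a region code can be read off a set of random planes, not the code of the cited BMVC paper:

```python
import numpy as np

def random_plane_hash(features, num_planes=4, seed=0):
    """Hash d-dimensional feature vectors to binary codes given by which side
    of each randomly chosen splitting plane the vector falls on
    (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    normals = rng.standard_normal((num_planes, d))   # random plane orientations
    offsets = rng.standard_normal(num_planes)        # random plane offsets
    bits = (features @ normals.T - offsets) > 0      # one bit per plane
    return ["".join("1" if b else "0" for b in row) for row in bits]

# Feature vectors that fall in the same region get the same code.
feats = np.array([[0.9, 0.1, 0.2], [0.88, 0.12, 0.2], [0.1, 0.9, 0.7]])
print(random_plane_hash(feats))
```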

Winner Take All (WTA) Hash
- A way to convert feature vectors into compact binary hash codes
- The absolute value of a feature does not matter; only the ordering of the values matters
- Rank correlation is preserved
- Stability: the distance between hashes approximates rank correlation
J. Yagnik, D. Strelow, D. A. Ross, and R.-S. Lin, "The power of comparative reasoning," in ICCV 2011, pp. 2431-2438, IEEE, 2011.

Calculating WTA Hash
Consider 3 feature vectors.
Step 1: Create random permutations and permute each feature vector with the permutation vector θ = (3, 1, 5, 2, 6, 4):
  feature 1: (13, 4, 2, 11, 5, 3) → (2, 13, 5, 4, 3, 11)
  feature 2: (12, 5, 3, 10, 4, 2) → (3, 12, 4, 5, 2, 10)
  feature 3: (1, 90, 44, 5, 15, 6) → (44, 1, 15, 90, 6, 5)

Calculating WTA Hash
Step 2: Choose the first K entries of each permuted vector. Let K = 3:
  feature 1: (2, 13, 5)
  feature 2: (3, 12, 4)
  feature 3: (44, 1, 15)

Calculating WTA Hash
Step 3: Pick the index of the maximum entry among those K. This index is the hash code h of the feature vector:
  feature 1: max of (2, 13, 5) is 13 at index 2 → h = 2
  feature 2: max of (3, 12, 4) is 12 at index 2 → h = 2
  feature 3: max of (44, 1, 15) is 44 at index 1 → h = 1

Calculating WTA Hash
Notice that feature 2 is just feature 1 perturbed by one, while feature 3 is very different. Feature 1 and feature 2 receive the same hash code (h = 2), so similar vectors are hashed together.
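
The worked example above translates directly into a short sketch; the function name and the 1-based indexing follow the slides' example rather than any reference implementation:

```python
import numpy as np

def wta_hash(x, theta, K):
    """Winner-Take-All hash of one feature vector: permute with theta,
    keep the first K entries, return the (1-based) index of the maximum."""
    permuted = x[theta - 1]              # theta is 1-based, as on the slides
    return int(np.argmax(permuted[:K])) + 1

theta = np.array([3, 1, 5, 2, 6, 4])
features = {
    "feature 1": np.array([13, 4, 2, 11, 5, 3]),
    "feature 2": np.array([12, 5, 3, 10, 4, 2]),
    "feature 3": np.array([1, 90, 44, 5, 15, 6]),
}
for name, x in features.items():
    print(name, "h =", wta_hash(x, theta, K=3))
# feature 1 -> h = 2, feature 2 -> h = 2 (similar vectors agree), feature 3 -> h = 1
```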

Random Walks
- A way of understanding proximity in graphs
- Useful for propagation in graphs: creates probability maps
- Analogous to an electrical network with voltages and resistances
- It is supervised: the user must specify seeds
[Figure: small resistor-network example with +1 V and -1 V seed nodes and the resulting voltages at the other nodes]
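
To make the electrical-network analogy concrete, here is a minimal sketch of the standard random-walker linear solve on a tiny made-up graph with two seeds: seed nodes act like fixed voltages, and the probability map at the unseeded nodes is the solution of a Laplacian system. This illustrates only the core computation, not the full pipeline of this talk.

```python
import numpy as np

# Tiny weighted graph (4 nodes, made-up edge weights / "conductances").
# Nodes 0 and 3 are seeds for class A and class B respectively.
W = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.5, 0.1],
              [0.2, 0.5, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W          # graph Laplacian

seeds, unseeded = [0, 3], [1, 2]
m = np.array([1.0, 0.0])                # seed 0 belongs to class A, seed 3 does not

# Probability of reaching a class-A seed first: solve L_U x = -B m,
# where L_U is the unseeded block of L and B couples unseeded to seeded nodes.
L_U = L[np.ix_(unseeded, unseeded)]
B = L[np.ix_(unseeded, seeds)]
prob_A = np.linalg.solve(L_U, -B @ m)
print(prob_A)   # class-A probability map at the unseeded nodes
```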

Our Approach
Block I (Similarity Search): input image → random projections → WTA hash
Block II: transform to a graph with (nodes, edges) → RW algorithm
Block III: automatic seed selection → probabilities from the RW algorithm → stop? If no, iterate; if yes, output the segmented image.

Block I: Similarity Search
[Pipeline diagram repeated, with Block I (random projections + WTA hash) highlighted]

WTA hash
- Image dimensions: P x Q x d
- Vectorize the image into a PQ x d matrix
- Project onto R randomly chosen hyperplanes (random projections onto R pairs of points)
- Each point in the image then has R features

WTA hash
- Run the WTA hash on the R features of each point: we get one hash code for each point in the image
- With K = 3, the possible values of the hash codes are 00, 01, 11
- Repeat this N times to get a PQ x N matrix of hash codes
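
A hedged sketch of this step, assuming Gaussian random projections and a fresh permutation per repeat; the function and parameter names are illustrative, not from the paper:

```python
import numpy as np

def hash_matrix(image, R=16, N=8, K=3, seed=0):
    """Sketch: vectorize a P x Q x d image, give each pixel R random-projection
    features, then WTA-hash those features N times with independent random
    permutations. Returns a (P*Q) x N matrix of hash codes.
    (Exact projection scheme and parameter names are assumptions.)"""
    rng = np.random.default_rng(seed)
    P, Q, d = image.shape
    X = image.reshape(P * Q, d)                       # vectorize: PQ x d
    proj = rng.standard_normal((d, R))
    F = X @ proj                                      # PQ x R projection features
    H = np.empty((P * Q, N), dtype=np.int64)
    for n in range(N):
        theta = rng.permutation(R)                    # fresh permutation per repeat
        H[:, n] = np.argmax(F[:, theta[:K]], axis=1)  # WTA: index of max among first K
    return H

H = hash_matrix(np.random.rand(32, 48, 3))
print(H.shape)   # (1536, 8)
```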

Block II: Create Graph
[Pipeline diagram repeated, with Block II (transform to graph, RW algorithm) highlighted]

Create Graph
- Run the WTA hash N times: each point has N hash codes
- The image is transformed into a lattice graph
- Calculate the edge weight between nodes i and j as

    δ_{i,j} = β · d_H(i, j) / σ
    w_{i,j} = exp(−δ_{i,j})

  where:
    d_H(i, j) = average Hamming distance over all N hash codes of i and j
    σ = scaling factor
    β = weight parameter for the RW algorithm
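
A small sketch of this edge-weight computation under the formula above; the β and σ values are placeholders rather than the settings used in the talk, and each pair of codes is compared as equal/unequal, which is a simplification of a per-code Hamming distance:

```python
import numpy as np

def edge_weight(h_i, h_j, beta=1.0, sigma=1.0):
    """Edge weight between two pixels from their N WTA hash codes:
    average Hamming distance, scaled, then passed through an exponential.
    (beta and sigma are placeholder values.)"""
    d_H = np.mean(h_i != h_j)            # avg. code disagreement over the N codes
    delta = beta * d_H / sigma
    return np.exp(-delta)

h_i = np.array([2, 2, 1, 3, 2, 1, 2, 2])
h_j = np.array([2, 2, 1, 3, 1, 1, 2, 2])   # differs in one of N = 8 codes
print(edge_weight(h_i, h_j))
```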

Block III: RW Algorithm
[Pipeline diagram repeated, with Block III (automatic seed selection, RW probabilities, stopping check) highlighted]

Seed Selection
- The RW algorithm needs initial seeds to be defined
- Unsupervised: seeds are drawn using a Dirichlet process DP(G₀, α)
  - G₀ is the base distribution
  - α is the discovery parameter: larger α leads to the discovery of more classes

Seed Selection
- The probability that a new seed belongs to a new class is proportional to α
- Probability for the i-th sample having class label y_i (result by Blackwell and MacQueen, 1973):

    p(y_i = c | y_{−i}, α) = (n_c^{−i} + α/C_tot) / (n − 1 + α)

  where:
    C_tot = total number of classes
    y_i = class label c, c ∈ {1, 2, ..., C_tot}
    y_{−i} = {y_j : j ≠ i}
    n_c^{−i} = number of samples in the c-th class, excluding the i-th sample

Seed Selection
- Unsupervised, hence C_tot is infinite. Taking the limit:

    lim_{C_tot → ∞} p(y_i = c | y_{−i}, α) = n_c^{−i} / (n − 1 + α)   ∀c with n_c^{−i} > 0   (class is non-empty)

  This is the clustering effect, or "rich gets richer".
- Probability that a new class is discovered:

    lim_{C_tot → ∞} p(y_i ≠ y_j for all j < i | y_{−i}, α) = α / (n − 1 + α)   ∀c with n_c^{−i} = 0   (class is empty or new)
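
A minimal sketch of drawing a class label for a new seed with these limiting probabilities (the counts and α below are made up for the example):

```python
import numpy as np

def sample_seed_class(counts, alpha, rng):
    """Sample a class label for a new seed, Chinese-restaurant-process style:
    an existing class c is chosen with probability proportional to n_c,
    a brand-new class with probability proportional to alpha."""
    probs = np.append(np.asarray(counts, dtype=float), alpha)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)   # index == len(counts) means "new class"

rng = np.random.default_rng(0)
counts = [5, 2]                 # two classes discovered so far (made-up counts)
label = sample_seed_class(counts, alpha=1.0, rng=rng)
print("new class" if label == len(counts) else f"existing class {label}")
```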

Random Walks
- Use the RW algorithm to generate probability maps in each iteration
- Entropy is calculated from the probability maps
- Entropy-based stopping criterion: as cluster purity increases, the average image entropy decreases
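
A sketch of how the average image entropy could be computed from the RW probability maps; the stopping rule indicated in the comments is an assumption, since the talk only states that the criterion is entropy-based:

```python
import numpy as np

def average_entropy(prob_maps, eps=1e-12):
    """Average per-pixel entropy of the RW probability maps.
    prob_maps has shape (num_classes, P, Q) and sums to 1 over classes."""
    p = np.clip(prob_maps, eps, 1.0)
    pixel_entropy = -(p * np.log(p)).sum(axis=0)
    return pixel_entropy.mean()

# Hypothetical stopping check (assumption): stop once the average entropy
# no longer decreases between iterations.
# prev_H, curr_H = average_entropy(prev_maps), average_entropy(curr_maps)
# stop = curr_H >= prev_H
```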

Experimental Results
[Figure: histology images with automatically picked seeds, and results on a Berkeley segmentation subset]
Avg. GCE of dataset = 0.186

Experimental Results
[Figure: histology images with the respective sets of seeds used; natural and texture images with ground-truth images (middle row) and segmented outputs using our method, demonstrated on some natural images]
Avg. GCE of dataset: TexGeo = 0.134, TexBTF = 0.061

Experimental Results
Comparison measure: Global Consistency Error (GCE)*. Lower GCE indicates lower error.

GCE score by number of features:
  No. of features | BSDSubset | TexBTF | TexColor | TexGeo
  10              | 0.179     | 0.063  | 0.159    | 0.102
  20              | 0.180     | 0.065  | 0.159    | 0.129
  40              | 0.186     | 0.061  | 0.156    | 0.134

*C. Fowlkes, D. Martin, and J. Malik, "Learning affinity functions for image segmentation: Combining patch-based and gradient-based approaches," vol. 2, pp. II-54, IEEE, 2003.

Experimental Results
Comparison measure: Global Consistency Error (GCE). Lower GCE indicates lower error. (GCE scores by number of features as on the previous slide.)

Comparison with other methods**, performed on the BSDS subset:
  Method           | GCE
  Human            | 0.080
  RAD              | 0.205
  Seed             | 0.209
  Learned Affinity | 0.214
  Mean Shift       | 0.260
  Normalized cuts  | 0.336

**E. Vazquez, J. Van De Weijer, and R. Baldrich, "Image segmentation in the presence of shadows and highlights," pp. 1-14, Springer, 2008.

Conclusions
- Comparative reasoning and the Winner Take All hash enable fast similarity search
- Our method performs unsupervised segmentation using context (Random Walks-based clustering)
- There is no need to predefine the number of classes
- This can be used as a pre-processing step for the classification of hyperspectral images, biomedical images, etc.

Thank you