Learning Spectral Graph Segmentation

Learning Spectral Graph Segmentation. AISTATS 2005. Timothée Cour, Jianbo Shi, Nicolas Gogin. Computer and Information Science Department, University of Pennsylvania; Computer Science, École Polytechnique.

Graph-based Image Segmentation. Weighted graph $G = (V, W)$: $V$ = vertices (pixels $i$), and $W_{ij} = W_{ji} \ge 0$ is the similarity between pixels $i$ and $j$. Segmentation = graph partition of the pixels. (slide 2)
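
As an illustration of such a weighted graph, here is a minimal sketch (not the authors' code) that builds a dense pixel affinity matrix from a small grayscale image, using only intensity differences between nearby pixels; the neighborhood radius and the intensity scale sigma_i are illustrative choices.

```python
# A minimal sketch (not the authors' code) of a pixel affinity matrix built
# from intensity differences between nearby pixels; the neighborhood radius
# and the intensity scale sigma_i are illustrative choices.
import numpy as np

def intensity_affinities(image, radius=2, sigma_i=0.1):
    """Dense W for a small grayscale image, with W_ij = W_ji >= 0."""
    h, w = image.shape
    n = h * w
    W = np.zeros((n, n))
    for i in range(n):
        yi, xi = divmod(i, w)
        for j in range(i, n):
            yj, xj = divmod(j, w)
            if abs(yi - yj) <= radius and abs(xi - xj) <= radius:
                diff = float(image[yi, xi]) - float(image[yj, xj])
                W[i, j] = W[j, i] = np.exp(-diff ** 2 / (2 * sigma_i ** 2))
    return W
```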

Spectral Graph Segmentation pipeline: Image $I$ (intensity, color, edges, texture) → graph affinities $W = W(I, \Theta)$. (slide 3)

Spectral Graph Segmentation pipeline: Image $I$ (intensity, color, edges, texture) → graph affinities $W = W(I, \Theta)$, partitioned with the normalized cut criterion
$$\mathrm{NCut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(B)}.$$ (slide 4)

Spectral Graph Segmentation pipeline: Image $I$ → graph affinities $W = W(I, \Theta)$ → eigenvector $X(W)$. The NCut criterion
$$\mathrm{NCut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{Vol}(B)}$$
is relaxed into the generalized eigenproblem $W X = \lambda D X$, whose solution approximates the indicator vector $X_A(i) = 1$ if $i \in A$, $0$ if $i \notin A$. (slide 5)

Spectral Graph Segmentation pipeline: Image $I$ → graph affinities $W = W(I, \Theta)$ → eigenvector $X(W)$ → discretisation. Solve $W X = \lambda D X$ and discretise the eigenvector back into an indicator $X_A(i) = 1$ if $i \in A$, $0$ if $i \notin A$. (slide 6)
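
A minimal sketch of this relaxation, assuming a symmetric affinity matrix with positive degrees: solve the generalized eigenproblem $W x = \lambda D x$ and discretise the second-largest eigenvector by thresholding (the median threshold is one common heuristic, not necessarily the discretisation used in the paper).

```python
# A minimal sketch of the NCut relaxation (illustrative, not the paper's code):
# solve the generalized eigenproblem W x = lambda D x and discretise the
# second-largest eigenvector by thresholding at its median.
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """W: symmetric affinity matrix with positive row sums (degrees)."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # eigh(a, b) solves a x = lambda b x; eigenvalues are returned in
    # ascending order, so the second-largest eigenvector is column -2.
    vals, vecs = eigh(W, D)
    x = vecs[:, -2]
    return x > np.median(x)   # indicator of segment A after discretisation
```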

Spectral Graph Segmentation, reversed: Image $I$ (intensity + color + edges + texture) → graph affinities $W = W(I, \Theta)$ → eigenvector $X(W)$. Goal: learn the graph parameters $\Theta$ from a target segmentation, i.e. run the pipeline in reverse. (slide 7)

Intensity cues, edge cues, color cues [Shi & Malik, 97]; multiscale cues [Yu, 04; Fowlkes, 04]; texture cues. (slide 8)

Where is Waldo? Do you use edge cues? Color cues? Texture cues? That's not enough: you need shape cues and high-level object priors. (slide 9)

Spectral Graph Segmentation: Image $I$ (intensity + color + edges + texture) → graph affinities $W = W(I, \Theta)$ → eigenvector $X(W)$; learn the graph parameters $\Theta$ from a target segmentation. (slide 10)

Learning setup: $\Theta$, Image $I$ → affinities $W = W(I, \Theta)$ → eigenvector $X(W)$, compared against a target $X^*$ (or a target $P^* \sim W^*$). Previous work places the error criterion on the affinity matrix $W$ itself: [1] Meila and Shi (2001), [2] Fowlkes, Martin, Malik (2004). (slide 11)

Same setup, but constraining every $W_{ij}$ is overkill: many different $W$ yield the same segmentation. $W$ has $O(n^2)$ parameters, whereas the segments have only $O(n)$ parameters. [1] Meila and Shi (2001), [2] Fowlkes, Martin, Malik (2004). (slide 12)

Instead: $\Theta$, Image $I$ → affinities $W = W(I, \Theta)$ → eigenvector $X(W)$, with the error criterion on the partition vector $X$ only, compared against the target $X^*$. (slide 13)

Energy function for segmenting one image $I$: compare the eigenvector $X(W)$ with the target $X^*$,
$$E_I(W) = \| X(W(I, \Theta)) - X^*(I) \|^2.$$ (slide 14)

Energy function for segmenting one image $I$:
$$E_I(W) = \| X(W(I, \Theta)) - X^*(I) \|^2,$$
minimized over the training set:
$$\min_\Theta \; E(\Theta) = \sum_{\text{images } I} \| X(W(I, \Theta)) - X^*(I) \|^2.$$ (slide 15)

Energy function for image $I$: $E_I(W) = \| X(W(I, \Theta)) - X^*(I) \|^2$, minimized over all training images, $\min_\Theta E(\Theta) = \sum_I \| X(W(I, \Theta)) - X^*(I) \|^2$. One can use gradient descent, but $X(W)$ is only implicitly defined, through $(W(I, \Theta) - \lambda D) X = 0$; there is no explicit formula $X = f(W, \Theta)$, so the gradient cannot easily be propagated back through the eigenvector. (slide 16)

$$\min_\Theta \; E(\Theta) = \sum_{\text{images } I} \| X(W(I, \Theta)) - X^*(I) \|^2$$
Gradient descent:
$$\Delta\Theta = -\eta \frac{\partial E}{\partial \Theta} = -\eta \, \frac{\partial E}{\partial X} \frac{\partial X}{\partial W} \frac{\partial W}{\partial \Theta}.$$ (slide 17)

$$\min_\Theta \; E(\Theta) = \sum_{\text{images } I} \| X(W(I, \Theta)) - X^*(I) \|^2$$
Gradient descent:
$$\Delta\Theta = -\eta \frac{\partial E}{\partial \Theta} = -\eta \, \frac{\partial E}{\partial X} \frac{\partial X}{\partial W} \frac{\partial W}{\partial \Theta},$$
where $E(X)$ is quadratic, $X(W)$ is implicit, and $W(\Theta)$ depends on the chosen parameterization. (slide 18)

$$\Delta\Theta = -\eta \, \frac{\partial E}{\partial X} \frac{\partial X}{\partial W} \frac{\partial W}{\partial \Theta}$$
Theorem (derivative of NCut eigenvectors). The map $W \mapsto (X, \lambda)$ is $C^1$ over $\Omega$, and the derivatives along any $C^1$ path $W(t)$ are
$$\frac{d\lambda(W(t))}{dt} = \frac{X^T (W' - \lambda D') X}{X^T D X},$$
$$\frac{dX(W(t))}{dt} = -(W - \lambda D)^{\dagger} \Big( W' - \lambda D' - \frac{d\lambda}{dt} D \Big) X.$$ (slide 19)

The same theorem, with the feasible set made explicit. Feasible set in the space of graph weight matrices: $W \in \Omega$ iff (1) $W$ is an $n \times n$ symmetric matrix, (2) $W \mathbf{1} > 0$, and (3) $\lambda_2$ is a simple eigenvalue, with $W X_2 = \lambda_2 D X_2$. (slide 20)

The same theorem, with the reference [Meila, Shortreed, Xu, 05]. (slide 21)
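
The theorem translates directly into code. The sketch below is an illustration under the stated feasibility conditions, not the authors' implementation: it computes the directional derivatives of $\lambda$ and $X$ along a direction $W'$, taking $D' = \mathrm{diag}(W' \mathbf{1})$ and using a pseudo-inverse for $(W - \lambda D)^{\dagger}$.

```python
# A sketch of the theorem's derivative formulas (illustration only).
# dW is the direction W'(t); the induced degree change is D' = diag(dW 1),
# and a pseudo-inverse stands in for (W - lambda D)^dagger.
import numpy as np

def ncut_eig_derivative(W, X, lam, dW):
    """Directional derivatives of (lambda, X) for W X = lambda D X along dW."""
    D = np.diag(W.sum(axis=1))
    dD = np.diag(dW.sum(axis=1))
    # dlambda/dt = X^T (W' - lambda D') X / (X^T D X)
    dlam = (X @ (dW - lam * dD) @ X) / (X @ D @ X)
    # dX/dt = -(W - lambda D)^+ (W' - lambda D' - (dlambda/dt) D) X
    dX = -np.linalg.pinv(W - lam * D) @ ((dW - lam * dD - dlam * D) @ X)
    return dlam, dX
```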

Task 1: Learning 2D point segmentation. 100 points at $(x_i, y_i)$, with affinities
$$W_{ij} = e^{-\sigma_x (x_i - x_j)^2 - \sigma_y (y_i - y_j)^2}$$
and parameters $\Theta = (\sigma_x, \sigma_y)$. (slide 22)

Learn $W_{ij} = \exp\!\big(-\sigma_x (x_i - x_j)^2 - \sigma_y (y_i - y_j)^2\big)$ by minimizing
$$E(\sigma_x, \sigma_y) = \| X(W(\sigma_x, \sigma_y)) - X^* \|^2,$$
given the ground-truth segmentation $X^*$; $X(W)$ is the learned eigenvector. (slide 23)

Learning 2D point segmentation: learn $W_{ij} = \exp\!\big(-\sigma_x (x_i - x_j)^2 - \sigma_y (y_i - y_j)^2\big)$ by minimizing $E(\sigma_x, \sigma_y) = \| X(W(\sigma_x, \sigma_y)) - X^* \|^2$ against the ground-truth segmentation $X^*$. (slide 24)
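
A toy version of Task 1, purely illustrative: the paper uses the exact eigenvector derivatives above, whereas this sketch falls back on finite differences for the gradient of $E(\sigma_x, \sigma_y)$; the point data, step size, and iteration count are arbitrary assumptions.

```python
# A toy version of Task 1 (illustrative only): fit (sigma_x, sigma_y) by
# gradient descent on E = ||X(W(sigma)) - X*||^2.  Finite differences stand
# in for the exact eigenvector derivatives used in the paper.
import numpy as np
from scipy.linalg import eigh

def affinity(points, sx, sy):
    dx = points[:, 0, None] - points[None, :, 0]
    dy = points[:, 1, None] - points[None, :, 1]
    return np.exp(-sx * dx ** 2 - sy * dy ** 2)

def second_eigvec(W):
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(W, D)          # ascending eigenvalues
    return vecs[:, -2]            # second-largest generalized eigenvector

def energy(points, x_target, sx, sy):
    x = second_eigvec(affinity(points, sx, sy))
    x = x if x @ x_target >= 0 else -x   # resolve the eigenvector sign
    return float(np.sum((x - x_target) ** 2))

def learn_sigmas(points, x_target, sx=1.0, sy=1.0, lr=0.1, steps=200, eps=1e-4):
    for _ in range(steps):
        e = energy(points, x_target, sx, sy)
        gx = (energy(points, x_target, sx + eps, sy) - e) / eps
        gy = (energy(points, x_target, sx, sy + eps) - e) / eps
        sx = max(sx - lr * gx, 1e-6)      # keep the scales positive
        sy = max(sy - lr * gy, 1e-6)
    return sx, sy
```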

Proposition (exponential convergence). Under the gradient flow $\dot W = -\frac{\partial E}{\partial W}$, either $W(t)$ converges to a global minimum $W_\infty$, with $E(W(t)) \to 0$ exponentially, or $W(t)$ escapes any compact $K \subset \Omega$; the latter happens when (1) $\lambda_2(t) \to 1$ or $\lambda_2(t) - \lambda_3(t) \to 0$, or (2) $D(i, i) \to 0$ for some $i$. (slide 25)

Is there a ground truth segmentation? Graph nodes = $\{e_i\}$: edge element at $(x_i, y_i)$ with angle $\alpha_i$. (slide 26)

Task 2: Learning rectangular shape. $W(e_i, e_j) = f(x_i, y_i, \alpha_i; x_j, y_j, \alpha_j)$, with edge location $(x_i, y_i)$ and edge orientation $\alpha_i$. Configurations are labeled convex, concave, or impossible. (slide 27)

Task 2: Learning rectangular shape. $W(e_i, e_j) = f(x_i, y_i, \alpha_i; x_j, y_j, \alpha_j)$, with edge location $(x_i, y_i)$, edge orientation $\alpha_i$, and configurations labeled convex, concave, or impossible. With $10 \times 10$ edge locations and 4 edge angles, $W$ has $(n_x n_y n_\alpha)^2 = 160{,}000$ parameters. (slide 28)

Task 2: Learning rectangular shape, with invariance through parameterization: $W(e_i, e_j) = f(x_j - x_i, y_j - y_i, \alpha_i, \alpha_j)$, giving $20 \times 20 \times 4 \times 4 = 6400$ parameters. (slide 29)
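
The translation-invariant parameterization can be pictured as a learnable lookup table indexed by relative offset and the two quantized orientations. The sketch below is a hypothetical rendering of that idea: the table shape and offset handling are assumptions, and symmetry of $W$ would additionally require $f(dx, dy, a, b) = f(-dx, -dy, b, a)$.

```python
# A hypothetical rendering of the translation-invariant parameterization:
# affinities between edge elements are read from a learnable table f indexed
# by relative offset (dx, dy) and the two quantized orientations, mirroring
# the 20 x 20 x 4 x 4 = 6400 parameters on the slide.
import numpy as np

def build_W(edges, f, offset=10):
    """edges: (n, 3) array of (x, y, angle_index); f: table of shape (20, 20, 4, 4)."""
    n = len(edges)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dx = int(edges[j, 0] - edges[i, 0]) + offset
            dy = int(edges[j, 1] - edges[i, 1]) + offset
            if 0 <= dx < f.shape[0] and 0 <= dy < f.shape[1]:
                W[i, j] = f[dx, dy, int(edges[i, 2]), int(edges[j, 2])]
    return W
```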

Synthetic case (figure): the training row shows the input edges, the NCut eigenvector after training, and the target; the testing row shows the input edges and the resulting NCut eigenvector. (slide 30)

Testing on real images: learning with precise edges (figure). (slide 31)

Comparison of the spectral graph learning schemes:
Meila and Shi; Fowlkes, Martin and Malik: $\arg\min_W \| D^{-1} W - D^{-1} W(X^*) \|$ (error on the normalized affinity matrix).
Bach-Jordan: $\arg\min_W J(W, X^*)$.
Our method: $\arg\min \| X(W) - X^* \|$.
Bach and Jordan (2003), Learning spectral clustering, use a differentiable approximation of the eigenvectors; our work uses the exact analytical form and a computationally efficient solution. (slide 32)

Conclusion: supervise the unsupervised spectral clustering; exact and efficient learning of the graph weights using derivatives of eigenvectors. (slide 33)