Graph Cut based Inference with Co-occurrence Statistics Ľubor Ladický, Chris Russell, Pushmeet Kohli, Philip Torr
Image labelling Problems Assign a label to each image pixel Geometry Estimation Image Denoising Object Segmentation Sky Building Tree Grass
Pairwise CRF models Standard CRF Energy Data term Smoothness term
Pairwise CRF models Standard CRF Energy Data term Smoothness term Restricted expressive power
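Written out, the standard pairwise CRF energy named on the slide above decomposes into the two terms it mentions (the notation is the conventional one, assumed here):

```latex
E(\mathbf{x}) \;=\; \underbrace{\sum_{i \in \mathcal{V}} \psi_i(x_i)}_{\text{data term}}
\;+\; \underbrace{\sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)}_{\text{smoothness term}}
```

An energy of this form only couples neighbouring pixels, which is the restricted expressive power noted on the slide.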
Structures in CRFs: Taskar et al. 02 - associative potentials; Kohli et al. 08 - segment consistency; Woodford et al. 08 - planarity constraint; Vicente et al. 08 - connectivity constraint; Nowozin & Lampert 09 - connectivity constraint; Roth & Black 09 - field of experts; Ladický et al. 09 - consistency over several scales; Woodford et al. 09 - marginal probability; Delong et al. 10 - label occurrence costs
Pairwise CRF models Standard CRF Energy for Object Segmentation Local context Cannot encode global consistency of labels!!
Detection Suppression If we have 1000 categories (detectors), and each detector produces 1 false positive every 10 images, we will have 100 false alarms per image - pretty much garbage [Torralba et al. 10, Leibe & Schiele 09, Barinova et al. 10] [Image from Torralba et al. 10]
Encoding Co-occurrence Co-occurrence is a powerful cue [Heitz et al. 08] [Rabinovich et al. 07] Thing - Thing, Stuff - Stuff, Stuff - Thing [Images from Rabinovich et al. 07]
Encoding Co-occurrence Co-occurrence is a powerful cue [Heitz et al. 08] [Rabinovich et al. 07] Thing - Thing, Stuff - Stuff, Stuff - Thing Proposed solutions: 1. Csurka et al. 08 - Hard decision for label estimation 2. Torralba et al. 03 - GIST based unary potential 3. Rabinovich et al. 07 - Fully-connected CRF [Images from Rabinovich et al. 07]
So... What properties should these global co-occurrence potentials have?
1. No hard decisions Desired properties
Desired properties 1. No hard decisions Incorporation in probabilistic framework Unlikely possibilities are not completely ruled out
Desired properties 1. No hard decisions 2. Invariance to region size
Desired properties 1. No hard decisions 2. Invariance to region size Cost for occurrence of {people, house, road etc.. } invariant to image area
Desired properties 1. No hard decisions 2. Invariance to region size The only possible solution : L(x)={,, } Local context Global context Cost defined over the assigned labels L(x)
Desired properties 1. No hard decisions 2. Invariance to region size 3. Parsimony simple solutions preferred L(x)={ aeroplane, tree, flower, building, boat, grass, sky } L(x)={ building, tree, grass, sky }
Desired properties 1. No hard decisions 2. Invariance to region size 3. Parsimony simple solutions preferred 4. Efficiency
Desired properties 1. No hard decisions 2. Invariance to region size 3. Parsimony simple solutions preferred 4. Efficiency a) Memory requirements grow as O(n) with the image size and number of labels b) Inference tractable
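A minimal sketch of the key idea behind properties 2 and 3: the global cost is defined only on the set of labels present, L(x), so it cannot depend on how many pixels carry each label. The function names and the MDL-style per-label cost below are illustrative, not taken from the paper.

```python
def label_set(x):
    """L(x): the set of labels assigned anywhere in the labelling x."""
    return frozenset(x)

def cooccurrence_cost(x, C):
    """Global cost depends only on which labels occur, not on region sizes."""
    return C(label_set(x))

# Illustrative parsimony-style cost: penalise each label that occurs (MDL-like),
# so simpler explanations with fewer labels are preferred.
K = 1.0
C = lambda labels: K * len(labels)

small_sky = ["sky"] * 2 + ["grass"] * 98   # tiny sky region
large_sky = ["sky"] * 98 + ["grass"] * 2   # huge sky region
assert cooccurrence_cost(small_sky, C) == cooccurrence_cost(large_sky, C)
```

Both labellings use the set {sky, grass}, so they pay the same global cost regardless of region sizes.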
Previous work Torralba et al. (2003) - GIST-based unary potentials Rabinovich et al. (2007) - complete pairwise graphs Csurka et al. (2008) - hard estimation of labels present
Related work Zhu & Yuille 1996 - MDL prior Bleyer et al. 2010 - Surface Stereo MDL prior Hoiem et al. 2007 - 3D Layout CRF MDL prior C(x) = K |L(x)| Delong et al. 2010 - label occurrence cost C(x) = Σ_l K_l δ_l(x)
Related work Zhu & Yuille 1996 - MDL prior Bleyer et al. 2010 - Surface Stereo MDL prior Hoiem et al. 2007 - 3D Layout CRF MDL prior C(x) = K |L(x)| Delong et al. 2010 - label occurrence cost C(x) = Σ_l K_l δ_l(x) All special cases of our model
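Both priors listed above are instances of a single cost defined over the set of assigned labels L(x), which is the general model of this work (δ_l denotes the indicator that label l occurs somewhere in the image):

```latex
C(\mathbf{x}) = C\big(L(\mathbf{x})\big), \qquad
\underbrace{C(\mathbf{x}) = K\,\lvert L(\mathbf{x}) \rvert}_{\text{MDL prior}}, \qquad
\underbrace{C(\mathbf{x}) = \sum_{l \in \mathcal{L}} K_l\, \delta_l(\mathbf{x})}_{\text{label occurrence costs}},
\qquad \delta_l(\mathbf{x}) = [\, l \in L(\mathbf{x}) \,].
```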
Inference Pairwise CRF Energy
Inference IP formulation (Schlesinger 73)
Inference Pairwise CRF Energy with co-occurrence
Inference IP formulation with co-occurrence
Inference IP formulation with co-occurrence Pairwise CRF cost Pairwise CRF constraints
Inference IP formulation with co-occurrence Co-occurrence cost
Inference IP formulation with co-occurrence Inclusion constraints
Inference IP formulation with co-occurrence Exclusion constraints
Inference LP relaxation Relaxed constraints
Inference LP relaxation Very Slow! 80 x 50 subsampled image takes 20 minutes
Inference: Our Contribution Pairwise representation One auxiliary variable Z ∈ 2^L Infinite pairwise costs if x_i ∉ Z [see technical report] Solvable using standard methods: BP, TRW etc.
Inference: Our Contribution Pairwise representation One auxiliary variable Z ∈ 2^L Infinite pairwise costs if x_i ∉ Z [see technical report] Solvable using standard methods: BP, TRW etc. Relatively faster but still computationally expensive!
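One way to write the pairwise representation sketched on this slide, with a single auxiliary variable Z ranging over label subsets (a reconstruction under assumptions; the exact construction is in the technical report):

```latex
E(\mathbf{x}) \;=\; \min_{Z \in 2^{\mathcal{L}}}
\Big[ \sum_i \psi_i(x_i) \;+\; \sum_{(i,j)} \psi_{ij}(x_i, x_j)
\;+\; C(Z) \;+\; \sum_i \phi(x_i, Z) \Big],
\qquad
\phi(x_i, Z) =
\begin{cases}
0 & \text{if } x_i \in Z,\\
\infty & \text{if } x_i \notin Z.
\end{cases}
```

The infinite pairwise cost forces every assigned label into Z, so the co-occurrence cost C is paid exactly on the set of labels actually used.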
Inference using Moves Graph Cut based move making algorithms [Boykov et al. 01] Series of locally optimal moves Each move reduces energy Optimal move by minimizing a submodular function Current Solution Search Neighbourhood Space of Solutions (x): L^N Move Space (t): 2^N (N = number of variables, L = number of labels) α-expansion transformation function
Inference using Moves Graph Cut based move making algorithms [Boykov, Veksler, Zabih 01] α-expansion transformation function
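The move-making loop above can be sketched on a toy chain MRF. In each α-expansion move every pixel either keeps its label or switches to α; the real algorithm finds the optimal move with a graph cut, but for clarity this sketch finds it by brute force over all 2^N move vectors, so it is only feasible for a handful of pixels. All costs below are illustrative.

```python
from itertools import product

def energy(x, unary, pair_w):
    """Pairwise CRF energy on a chain: unary data terms + Potts smoothness."""
    e = sum(unary[i][x[i]] for i in range(len(x)))
    e += sum(pair_w for i in range(len(x) - 1) if x[i] != x[i + 1])
    return e

def alpha_expansion(x, labels, unary, pair_w):
    """Sweep over labels; for each alpha take the energy-minimising expansion
    move (found here by exhaustive search instead of a graph cut). Each
    accepted move reduces the energy, so the loop terminates."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for alpha in labels:
            best = x
            for t in product([0, 1], repeat=len(x)):   # binary move vector
                cand = [alpha if ti else xi for ti, xi in zip(t, x)]
                if energy(cand, unary, pair_w) < energy(best, unary, pair_w):
                    best = cand
            if best != x:
                x, improved = best, True
    return x
```

For example, on a 4-pixel chain with unary costs favouring the labels 0, 1, 1, 2 and a Potts weight of 1, starting from all zeros the sweeps reach the labelling [0, 1, 1, 2].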
Inference using Moves Co-occurrence representation Label indicator functions
Inference using Moves Move Energy Cost of current label set
Inference using Moves Move Energy Decomposition to α-dependent and α-independent part α-independent α-dependent
Inference using Moves Move Energy Decomposition to α-dependent and α-independent part Either α or all labels in the image after the move
Inference using Moves Move Energy submodular non-submodular
Inference Move Energy non-submodular Non-submodular energy overestimated by E'(t) E'(t) = E(t) for current solution E'(t) ≥ E(t) for any other labelling
Inference Move Energy non-submodular Non-submodular energy overestimated by E'(t) E'(t) = E(t) for current solution E'(t) ≥ E(t) for any other labelling Occurrence - tight
Inference Move Energy non-submodular Non-submodular energy overestimated by E'(t) E'(t) = E(t) for current solution E'(t) ≥ E(t) for any other labelling Co-occurrence overestimation
Inference Move Energy non-submodular Non-submodular energy overestimated by E'(t) E'(t) = E(t) for current solution E'(t) ≥ E(t) for any other labelling General case [See the paper]
Inference Move Energy non-submodular Non-submodular energy overestimated by E'(t) E'(t) = E(t) for current solution E'(t) ≥ E(t) for any other labelling Quadratic representation
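The overestimation conditions repeated on the preceding slides give the usual majorize-minimize guarantee. Writing t = 0 for the "no move" vector that keeps the current solution:

```latex
E'(\mathbf{t}) \ge E(\mathbf{t}) \;\; \forall \mathbf{t},
\qquad E'(\mathbf{0}) = E(\mathbf{0})
\quad\Longrightarrow\quad
E(\mathbf{t}^\ast) \;\le\; E'(\mathbf{t}^\ast) \;\le\; E'(\mathbf{0}) \;=\; E(\mathbf{0}),
```

where t* minimizes the submodular overestimate E'. Minimizing E' therefore never increases the true energy E, which is why each move still reduces (or preserves) the energy.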
Application: Object Segmentation Standard MRF model for Object Segmentation Label based Costs Cost defined over the assigned labels L(x)
Training of label based potentials Label set costs Approximated by a 2nd-order representation Indicator variables for occurrence of each label
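The 2nd-order representation mentioned above can be sketched as follows (the coefficients c_l and c_{ll'} are assumed notation for per-label and per-pair costs fitted to co-occurrence statistics):

```latex
C\big(L(\mathbf{x})\big) \;\approx\;
\sum_{l \in \mathcal{L}} c_l\, \delta_l(\mathbf{x})
\;+\; \sum_{l < l'} c_{l l'}\, \delta_l(\mathbf{x})\, \delta_{l'}(\mathbf{x}),
```

where δ_l(x) = 1 if label l occurs anywhere in x and 0 otherwise.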
Experiments Methods: Segment CRF, Segment CRF + Co-occurrence Potential, Associative HCRF [Ladický et al. 09], Associative HCRF + Co-occurrence Potential Datasets: MSRC-21 (591 images, 21 classes, 50% training set / 50% test set), PASCAL VOC 2009 (1499 images, 21 classes, 50% training set / 50% test set)
MSRC - Qualitative
VOC 2010 - Qualitative
Quantitative Results MSRC-21 PASCAL VOC 2009
Summary and further work Incorporated label based potentials in CRFs Proposed feasible inference Open questions Optimal training method for co-occurrence Bounds of graph cut based inference Questions?