Level-Set Person Segmentation and Tracking with Multi-Region Appearance Models and Top-Down Shape Information Appendix

Esther Horbert, Konstantinos Rematas, Bastian Leibe
UMIC Research Centre, RWTH Aachen University
{horbert, leibe}@umic.rwth-aachen.de

This appendix contains a detailed derivation of our segmentation and tracking framework [27] and lists typical values for important parameters.

A. Derivation

Fig. 1 shows the results of our full model for the sequence WALKSTRAIGHT [26], illustrating the evolution of the three contours.

Figure 1. Results of our full model for the sequence WALKSTRAIGHT (125 frames, from [26]). This example nicely illustrates the evolution of the separating contours (dark red). The foreground contour Φ_f is initialized with a horizontal line at 50% of the object frame height, the background contour Φ_b with a line at 60%. However, this is only the initialization, and the lines can evolve to very different forms (as the person's contour does).

Figure 2. The generative model used in this approach, M = {M_f1, M_f2, M_b1, M_b2}.

Table 1. Notation used in this paper:

  x              pixel's coordinates inside the reference frame
  y              pixel's color
  p              reference frame position
  h              shape model
  W(x, p)        warp with parameters p
  M_f1, M_f2     foreground regions
  M_b1, M_b2     background regions
  P(y | M_k)     appearance models
  Φ              level set embedding function
  Φ_c, Φ_f, Φ_b  embeddings for the person and the fore-/background dividers
  C_k            contour represented by the zero level set
  H(z)           smoothed Heaviside step function
  δ(z)           smoothed Dirac delta function

The joint distribution for one pixel given by the model in Fig. 2 is:

P(x_i, y_i, h_i, \Phi, p, M) = P(x_i \mid \Phi, p, M)\, P(y_i \mid M)\, P(h_i \mid M)\, P(M)\, P(\Phi)\, P(p)   (1)

Applying Bayes' rule (assumption: y, h independent):

P(x_i, \Phi, p, M \mid y_i, h_i)\, P(y_i)\, P(h_i) = P(x_i, y_i, h_i, \Phi, p, M)   (2)
  = P(x_i \mid \Phi, p, M)\, P(y_i \mid M)\, P(h_i \mid M)\, P(M)\, P(\Phi)\, P(p)   (3)
P(x_i, \Phi, p, M \mid y_i, h_i) = P(x_i \mid \Phi, p, M)\, P(M \mid y_i)\, P(M \mid h_i)\, P(\Phi)\, P(p)   (4)

Marginalization over the models M yields the pixel-wise posterior probability of shape Φ and location p given a pixel (x_i, y_i, h_i):

P(x_i, \Phi, p \mid y_i, h_i) = \sum_{k \in \{f1, f2, b1, b2\}} P(x_i, \Phi, p, M_k \mid y_i, h_i) = P(\Phi, p \mid x_i, y_i, h_i)\, P(x_i)   (5)
  = \sum_{k \in \{f1, f2, b1, b2\}} P(x_i \mid \Phi, p, M_k)\, \frac{P(y_i \mid M_k)\, P(M_k)}{\sum_l P(y_i \mid M_l)\, P(M_l)}\, P(M_k \mid h_i)\, P(\Phi)\, P(p)   (6)

It follows

P(\Phi, p \mid x_i, y_i, h_i) = \frac{1}{P(x_i)} \sum_{k \in \{f1, f2, b1, b2\}} P(x_i \mid \Phi, p, M_k)\, \frac{P(y_i \mid M_k)\, P(M_k)}{\sum_l P(y_i \mid M_l)\, P(M_l)}\, P(M_k \mid h_i)\, P(\Phi)\, P(p)   (7)

We use a smoothed Heaviside step function H_\epsilon to select the respective regions and a smoothed Dirac delta function \delta_\epsilon to select the contours:

H_c = H_\epsilon(\Phi_c(x_i)),  \bar{H}_c = 1 - H_\epsilon(\Phi_c(x_i))   (8)
H_f = H_\epsilon(\Phi_f(x_i)),  \bar{H}_f = 1 - H_\epsilon(\Phi_f(x_i))   (9)
H_b = H_\epsilon(\Phi_b(x_i)),  \bar{H}_b = 1 - H_\epsilon(\Phi_b(x_i))   (10)

P(M_k) = \frac{\eta_k}{\eta},  k \in \{f1, f2, b1, b2\},  M_f = M_{f1} \cup M_{f2},  M_b = M_{b1} \cup M_{b2}   (11)

The number of pixels in the four respective regions can be obtained as follows:

\eta = \sum_k \eta_k,  \eta_{f1} = \sum_{i=1}^N H_c H_f,  \eta_{f2} = \sum_{i=1}^N H_c \bar{H}_f,   (12)
\eta_{b1} = \sum_{i=1}^N \bar{H}_c H_b,  \eta_{b2} = \sum_{i=1}^N \bar{H}_c \bar{H}_b   (13)
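As an illustration of this region bookkeeping, the sketch below computes the four soft region masks, their sizes η_k, and the normalized per-pixel region models from three level-set embeddings. This is a minimal NumPy sketch under our own naming, not the authors' implementation; the smoothed Heaviside profile matches the one defined later in this appendix.

```python
import numpy as np

def region_models(phi_c, phi_f, phi_b, eps=6.0):
    """Soft region masks, sizes eta_k, and normalized per-pixel region
    models for the four regions f1, f2, b1, b2 (hypothetical helper)."""
    def heaviside(z):
        # smoothed Heaviside step: 0 below -eps, 1 above +eps,
        # linear + sine blend in between
        inner = z / (2 * eps) + np.sin(np.pi * z / eps) / (2 * np.pi) + 0.5
        return np.where(z < -eps, 0.0, np.where(z > eps, 1.0, inner))

    Hc, Hf, Hb = heaviside(phi_c), heaviside(phi_f), heaviside(phi_b)
    masks = {
        'f1': Hc * Hf,                  # person AND positive side of Phi_f
        'f2': Hc * (1.0 - Hf),          # person AND negative side of Phi_f
        'b1': (1.0 - Hc) * Hb,          # background AND positive side of Phi_b
        'b2': (1.0 - Hc) * (1.0 - Hb),  # background AND negative side of Phi_b
    }
    etas = {k: m.sum() for k, m in masks.items()}  # (soft) pixel counts
    # per-pixel region model = mask_k / eta_k; assumes every region is non-empty
    pix_models = {k: m / etas[k] for k, m in masks.items()}
    return masks, etas, pix_models
```

Each `pix_models[k]` is then a distribution over pixel positions that sums to one, mirroring the normalization by the region sizes above.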

and thus the probability of pixel x_i for each region:

P(x_i \mid \Phi, p, M_{f1}) = \frac{H_c H_f}{\eta_{f1}},  P(x_i \mid \Phi, p, M_{f2}) = \frac{H_c \bar{H}_f}{\eta_{f2}},   (14)
P(x_i \mid \Phi, p, M_{b1}) = \frac{\bar{H}_c H_b}{\eta_{b1}},  P(x_i \mid \Phi, p, M_{b2}) = \frac{\bar{H}_c \bar{H}_b}{\eta_{b2}}.   (15)

Fusing the pixel-wise posteriors with a logarithmic opinion pool yields

P(\Phi, p \mid x, y, h)   (16)
= \prod_{i=1}^N \left[ \sum_{k \in \{f1,f2,b1,b2\}} P(x_i \mid \Phi, p, M_k)\, \frac{P(y_i \mid M_k)\, P(M_k)}{\sum_l P(y_i \mid M_l)\, P(M_l)}\, P(M_k \mid h_i) \right] P(\Phi)\, P(p)   (17)
= \prod_{i=1}^N \left[ \frac{H_c H_f}{\eta_{f1}} \frac{P(y_i \mid M_{f1})\, P(M_{f1})}{\sum_l P(y_i \mid M_l)\, P(M_l)} P(M_{f1} \mid h_i) + \frac{H_c \bar{H}_f}{\eta_{f2}} \frac{P(y_i \mid M_{f2})\, P(M_{f2})}{\sum_l P(y_i \mid M_l)\, P(M_l)} P(M_{f2} \mid h_i) \right.   (18)
\left. + \frac{\bar{H}_c H_b}{\eta_{b1}} \frac{P(y_i \mid M_{b1})\, P(M_{b1})}{\sum_l P(y_i \mid M_l)\, P(M_l)} P(M_{b1} \mid h_i) + \frac{\bar{H}_c \bar{H}_b}{\eta_{b2}} \frac{P(y_i \mid M_{b2})\, P(M_{b2})}{\sum_l P(y_i \mid M_l)\, P(M_l)} P(M_{b2} \mid h_i) \right] P(\Phi)\, P(p)   (19)
= \prod_{i=1}^N \left[ H_c H_f P_{f1} + H_c \bar{H}_f P_{f2} + \bar{H}_c H_b P_{b1} + \bar{H}_c \bar{H}_b P_{b2} \right] P(\Phi)\, P(p)   (20)
= \prod_{i=1}^N P(x_i \mid \Phi, p, y_i, h_i)\, P(\Phi)\, P(p)   (21)

where

P_k = \frac{P(y_i \mid M_k)\, P(M_k)\, P(M_k \mid h_i)}{\eta_k \sum_l P(y_i \mid M_l)\, P(M_l)} = \frac{P(y_i \mid M_k)\, P(M_k \mid h_i)}{\sum_l \eta_l\, P(y_i \mid M_l)},  k \in \{f1, f2, b1, b2\}   (22)

P(x_i \mid \Phi, p, y_i, h_i) = H_c H_f P_{f1} + H_c \bar{H}_f P_{f2} + \bar{H}_c H_b P_{b1} + \bar{H}_c \bar{H}_b P_{b2}.   (23)

P(y_i | M_k) is computed from the appearance models, i.e. color histograms. Eq. (21) contains P(M_k | h) for four regions, but the detector only provides probabilities for two regions: foreground and background. However, (21) selects a model for each region by use of H_c, H_f and H_b, respectively. We set:

P(M_f \mid h) = H_f\, P(M_{f1} \mid h) + \bar{H}_f\, P(M_{f2} \mid h)   (24)
P(M_b \mid h) = H_b\, P(M_{b1} \mid h) + \bar{H}_b\, P(M_{b2} \mid h).   (25)

Thus it holds:

Either P(M_f | h) = P(M_{f1} | h) or P(M_f | h) = P(M_{f2} | h), and   (26)
either P(M_b | h) = P(M_{b1} | h) or P(M_b | h) = P(M_{b2} | h).   (27)

This means in practice P(M_{f1} | h) and P(M_{f2} | h) are both set to P(M_f | h) (background accordingly), which is possible because for each pixel only one of the subregions is selected.

We now specify P(Φ) as the internal energy of the level set embedding function(s). It contains a geometric prior that rewards a signed distance function: the gradient magnitude of the level set function is modeled as normally distributed with mean 1. It thus keeps the level set embedding function numerically stable without the need for periodic re-initializations [20]. The second term (as in [10]) describes the length of the contour and rewards a smoother contour. This is very useful for cluttered scenes, where pixels with foreground or background appearance can form very small regions, which can easily result in a very uneven contour.

P(\Phi) = \prod_{i=1}^N \left[ \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(|\nabla\Phi(x_i)| - 1)^2}{2\sigma^2} \right) \exp\left( -\lambda\, |\nabla H_\epsilon(\Phi(x_i))| \right) \right]   (28)
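To make the two prior terms of Eq. (28) concrete, the sketch below evaluates the corresponding internal energy of a discrete level-set function: the signed-distance term (|∇Φ| − 1)²/(2σ²) summed over pixels, plus the weighted length term λ|∇H_ε(Φ)|. A minimal NumPy sketch with our own function name, not the paper's code:

```python
import numpy as np

def internal_energy(phi, sigma2=50.0, lam=2.0, eps=6.0):
    """Internal (prior) energy of a level-set function phi:
    a geometric term rewarding |grad(phi)| = 1 (signed distance function)
    plus lam times the (smoothed) contour length."""
    gy, gx = np.gradient(phi)
    grad_norm = np.sqrt(gx ** 2 + gy ** 2)
    sdf_term = ((grad_norm - 1.0) ** 2 / (2.0 * sigma2)).sum()

    # smoothed Heaviside of phi; the sum of its gradient magnitude
    # approximates the contour length
    inner = phi / (2 * eps) + np.sin(np.pi * phi / eps) / (2 * np.pi) + 0.5
    h = np.where(phi < -eps, 0.0, np.where(phi > eps, 1.0, inner))
    hy, hx = np.gradient(h)
    length_term = lam * np.sqrt(hx ** 2 + hy ** 2).sum()
    return sdf_term + length_term
```

For an exact signed distance function the geometric term vanishes, so the energy rewards embeddings that stay close to unit gradient magnitude; any scaling away from it is penalized quadratically.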

where σ and λ are the weights of the priors. Maximizing the posterior is equivalent to minimizing its negative logarithm:

E(\Phi) = -\log P(\Phi, p \mid x, y, h) = \sum_{i=1}^N \left[ -\log P(x_i \mid \Phi, p, y_i, h_i) + \frac{(|\nabla\Phi(x_i)| - 1)^2}{2\sigma^2} + \lambda\, |\nabla H_\epsilon(\Phi(x_i))| \right] + N \log\left(\sigma\sqrt{2\pi}\right) - \log P(p)   (29)

A.1. Derivation of the Segmentation Framework

For segmentation we optimize (29) w.r.t. Φ, so the last two terms can be dropped; the rest is then differentiated by calculus of variations:

\frac{\partial\Phi_c}{\partial t} = -\frac{\partial E(\Phi_c)}{\partial\Phi_c} = \frac{\delta_\epsilon(\Phi_c)\left( H_f P_{f1} + \bar{H}_f P_{f2} - H_b P_{b1} - \bar{H}_b P_{b2} \right)}{P(x_i \mid \Phi, p, y_i, h_i)} + \frac{1}{\sigma^2}\left[ \nabla^2\Phi_c - \operatorname{div}\!\left(\frac{\nabla\Phi_c}{|\nabla\Phi_c|}\right) \right] + \lambda\, \delta_\epsilon(\Phi_c)\, \operatorname{div}\!\left(\frac{\nabla\Phi_c}{|\nabla\Phi_c|}\right)   (30)

\frac{\partial\Phi_f}{\partial t} = -\frac{\partial E(\Phi_f)}{\partial\Phi_f} = \frac{H_c\, \delta_\epsilon(\Phi_f)\left( P_{f1} - P_{f2} \right)}{P(x_i \mid \Phi, p, y_i, h_i)} + \frac{1}{\sigma^2}\left[ \nabla^2\Phi_f - \operatorname{div}\!\left(\frac{\nabla\Phi_f}{|\nabla\Phi_f|}\right) \right] + \lambda\, \delta_\epsilon(\Phi_f)\, \operatorname{div}\!\left(\frac{\nabla\Phi_f}{|\nabla\Phi_f|}\right)   (31)

\frac{\partial\Phi_b}{\partial t} = -\frac{\partial E(\Phi_b)}{\partial\Phi_b} accordingly.   (32)

We evolve the two additional level set functions interleaved with the original level set function Φ_c. In this way, the four appearance models are optimized at the same time, which leads to more robust and accurate segmentation results. In our implementation we use σ² = 50, λ = 2, τ = 2, ε = 6, and

H_\epsilon(x) = \begin{cases} 0 & \text{if } x < -\epsilon \\ \frac{x}{2\epsilon} + \frac{1}{2\pi}\sin\!\left(\frac{\pi x}{\epsilon}\right) + \frac{1}{2} & \text{if } |x| \le \epsilon \\ 1 & \text{if } x > \epsilon \end{cases}   (33)

\delta_\epsilon(x) = \begin{cases} \frac{1}{2\epsilon}\left( 1 + \cos\!\left(\frac{\pi x}{\epsilon}\right) \right) & \text{if } |x| < \epsilon \\ 0 & \text{else} \end{cases}   (34)

A.2. Derivation of the Tracking Framework

In preparation for differentiation w.r.t. p, some terms in (29) can be dropped:

E(\Phi) = -\sum_{i=1}^N \log P(x_i \mid \Phi, p, y_i, h_i) - \log P(p) + \text{const.}   (35)

Now the warp W(x_i, p) is introduced into (35), i.e. pixels x_i are warped with parameters p. P(p) is dropped for the moment; this is handled with drift correction, as in [3]:

E(\Phi) = -\sum_{i=1}^N \log P(W(x_i, p) \mid \Phi, p, y_i, h_i)   (36)

We maximize w.r.t. p:

\hat{p} = \arg\max_p \sum_{i=1}^N \log P(W(x_i, p) \mid \Phi, p, y_i, h_i)   (37)
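The smoothed Heaviside and Dirac functions of Eqs. (33) and (34) can be implemented directly; the sketch below (our own function names, not the paper's code) also lets one verify numerically that δ_ε is the derivative of H_ε:

```python
import numpy as np

def heaviside_eps(x, eps=6.0):
    """Smoothed Heaviside step of Eq. (33): 0 below -eps, 1 above +eps,
    and a linear-plus-sine blend in between."""
    x = np.asarray(x, dtype=float)
    inner = x / (2 * eps) + np.sin(np.pi * x / eps) / (2 * np.pi) + 0.5
    return np.where(x < -eps, 0.0, np.where(x > eps, 1.0, inner))

def dirac_eps(x, eps=6.0):
    """Smoothed Dirac delta of Eq. (34): the derivative of heaviside_eps,
    supported only on the band |x| < eps around the contour."""
    x = np.asarray(x, dtype=float)
    inner = (1.0 + np.cos(np.pi * x / eps)) / (2 * eps)
    return np.where(np.abs(x) < eps, inner, 0.0)
```

H_ε saturates exactly at ±ε (the blend reaches 0 and 1 there with matching slope), and δ_ε is non-zero only in the narrow band |x| < ε, which is what confines the contour-driven updates in Eqs. (30)-(31) to that band.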

We use a second-order Newton optimization scheme as in [4]. With the shorthand notation P(...) = P(W(x_i, p) | Φ, p, y_i, h_i):

\Delta p = \left[ \sum_{i=1}^N \left( \frac{1}{P(...)} \frac{\partial P(...)}{\partial p} \right)^{\!T} \left( \frac{1}{P(...)} \frac{\partial P(...)}{\partial p} \right) \right]^{-1} \sum_{i=1}^N \left( \frac{1}{P(...)} \frac{\partial P(...)}{\partial p} \right)^{\!T}   (38)

where

\frac{\partial P(...)}{\partial p} = (J_c H_f + H_c J_f) P_{f1} + (J_c \bar{H}_f - H_c J_f) P_{f2} - J_c H_b P_{b1} - J_c \bar{H}_b P_{b2}
  = J_c \left( H_f P_{f1} + \bar{H}_f P_{f2} - H_b P_{b1} - \bar{H}_b P_{b2} \right) + J_f H_c \left( P_{f1} - P_{f2} \right)   (39)

with

J_c = \frac{\partial H_c}{\partial \Phi_c} \frac{\partial \Phi_c}{\partial x} \frac{\partial W}{\partial p} = \delta_\epsilon(\Phi_c(x_i))\, \nabla\Phi_c(x_i)\, \frac{\partial W}{\partial p},   (40)
J_f = \frac{\partial H_f}{\partial \Phi_f} \frac{\partial \Phi_f}{\partial x} \frac{\partial W}{\partial p} = \delta_\epsilon(\Phi_f(x_i))\, \nabla\Phi_f(x_i)\, \frac{\partial W}{\partial p}   (41)

We assume that the background does not move with the foreground; thus H_ε(Φ_b(x_i)) is constant w.r.t. the derivation. Eq. (39) illustrates that both the person's contour and the division line of the foreground contribute to the warp, whereas the division line of the background does not: J_c and J_f contain the factor δ_ε(Φ_k), the Dirac delta function of the respective level set function, which is only greater than zero in a narrow band (of width ε) around the contour.

For parameters p = (s, t_x, t_y)^T with scale and translation, the warp is:

W(x; p) = \begin{pmatrix} 1+s & 0 \\ 0 & 1+s \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} = \begin{pmatrix} (1+s)\, x + t_x \\ (1+s)\, y + t_y \end{pmatrix}   (42)

The term \frac{\partial W}{\partial p} is the Jacobian of the warp; with W(x; p) = (W_x(x; p), W_y(x; p))^T:

\frac{\partial W}{\partial p} = \begin{pmatrix} x & 1 & 0 \\ y & 0 & 1 \end{pmatrix}   (43)

In our implementation the rigid registration typically converges after 10 to 40 iterations.

References
[3] C. Bibby and I. Reid. Robust Real-Time Visual Tracking using Pixel-Wise Posteriors. ECCV, 2008.
[4] C. Bibby and I. Reid. Real-time Tracking of Multiple Occluding Objects using Level Sets. CVPR, 2010.
[10] D. Cremers, M. Rousson, and R. Deriche. A Review of Statistical Approaches to Level Set Segmentation: Integrating Color, Texture, Motion and Shape. IJCV, 72(2):195-215, 2007.
[20] C. Li, C. Xu, C. Gui, and M. Fox. Level Set Evolution without Re-initialization: A New Variational Formulation. CVPR, 2005.
[26] H. Sidenbladh and M. J. Black. Learning the statistics of people in images and video. IJCV, 54(1-3):183-209, 2003.
[27] E. Horbert, K. Rematas, and B. Leibe.
Level-Set Person Segmentation and Tracking with Multi-Region Appearance Models and Top-Down Shape Information. ICCV, 2011.
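As a quick numerical check of the scale-and-translation warp of Eq. (42) and its Jacobian in Eq. (43), the sketch below (our own function names, not the paper's code) warps a point and verifies the scale column of the Jacobian by finite differences:

```python
import numpy as np

def warp(x, p):
    """Scale-and-translation warp of Eq. (42), with p = (s, t_x, t_y)."""
    s, tx, ty = p
    return np.array([(1.0 + s) * x[0] + tx,
                     (1.0 + s) * x[1] + ty])

def warp_jacobian(x):
    """Jacobian dW/dp of Eq. (43): a 2x3 matrix [[x, 1, 0], [y, 0, 1]]."""
    return np.array([[x[0], 1.0, 0.0],
                     [x[1], 0.0, 1.0]])
```

Note that the Jacobian depends only on the pixel position, not on p, so it can be precomputed once per pixel for the Newton iterations of Eq. (38).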