Differential Motion Analysis (II)

Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL 60208
yingwu@northwestern.edu
http://www.eecs.northwestern.edu/~yingwu

Outline
- Kernel-based Tracking
  - Basic kernel-based tracking
  - Multiple kernel tracking
- Context Flow

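Representation

The target is represented by a feature histogram $q = [q_1, q_2, \ldots, q_m]^T \in \mathbb{R}^m$, where

$$q_u = \frac{1}{C} \sum_{i=1}^{n} K(s_i - x)\, \delta(b(s_i), u).$$

Here $s_i$ are the pixel locations, $b(s_i)$ is the histogram bin of pixel $s_i$, and $C$ is a normalization constant.

Its matrix form is $q(x) = U^T K(x)$, with

$$U = \begin{bmatrix} \delta(b(s_1), u_1) & \cdots & \delta(b(s_1), u_m) \\ \vdots & & \vdots \\ \delta(b(s_n), u_1) & \cdots & \delta(b(s_n), u_m) \end{bmatrix} \in \mathbb{R}^{n \times m}, \qquad K = \frac{1}{C} \begin{bmatrix} K(s_1 - x) \\ \vdots \\ K(s_n - x) \end{bmatrix} \in \mathbb{R}^{n}.$$

Kernel profile: $K(x) = k(\|x\|^2)$; denote $g(x) = -k'(x)$.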
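The histogram definition above can be sketched in a few lines of Python. This is only an illustration: the Epanechnikov profile, the bandwidth parameter `h`, and the function names are assumptions that do not appear on the slide.

```python
def epanechnikov_profile(r2):
    """Profile k(r2) of the Epanechnikov kernel: 1 - r2 inside the unit ball, 0 outside."""
    return max(0.0, 1.0 - r2)


def kernel_histogram(pixels, bins, center, h, m):
    """Return q(x) with q_u = (1/C) * sum_i K(s_i - x) * delta(b(s_i), u).

    pixels: list of (sx, sy) locations s_i
    bins:   list of bin indices b(s_i), each in range(m)
    center: kernel center x
    h:      kernel bandwidth (assumed parameter, not on the slide)
    """
    weights = []
    for sx, sy in pixels:
        r2 = ((sx - center[0]) ** 2 + (sy - center[1]) ** 2) / (h * h)
        weights.append(epanechnikov_profile(r2))
    C = sum(weights)  # normalizer, so that the histogram sums to 1
    if C == 0.0:
        raise ValueError("no pixel falls inside the kernel support")
    q = [0.0] * m
    for w, b in zip(weights, bins):
        q[b] += w / C  # delta(b(s_i), u) selects bin u = b(s_i)
    return q


pixels = [(0, 0), (1, 0), (0, 1), (2, 2)]
bins = [0, 1, 1, 2]
q = kernel_histogram(pixels, bins, center=(0.0, 0.0), h=2.0, m=3)
```

The pixel at (2, 2) lies outside the kernel support, so its bin receives no mass; pixels near the center contribute more than pixels near the boundary.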

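Formulation

The target is initially at $x$. The goal is to find the optimal motion $\Delta x$ by

$$\min_{\Delta x} O(q, p(x + \Delta x)),$$

where $q$ is the target model and $p$ is the image observation.

Choices of $O(\cdot, \cdot)$ (square roots are taken elementwise):
- Bhattacharyya coefficient: $O_B(\Delta x) = \langle \sqrt{q}, \sqrt{p(x + \Delta x)} \rangle = \sqrt{q}^T \sqrt{p(x + \Delta x)}$. This is a similarity measure, so it is maximized (equivalently, the distance $\sqrt{1 - O_B}$ is minimized).
- Matusita metric: $O_M(\Delta x) = \|\sqrt{q} - \sqrt{p(x + \Delta x)}\|^2$.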

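Mean-shift tracking [1]

$$O_B(\Delta x) = \sum_{u=1}^{m} \sqrt{p_u(x + \Delta x)\, q_u}$$

First-order approximation:

$$O_B(\Delta x) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(x)\, q_u} + \frac{1}{2C} \sum_{i=1}^{n} w_i\, k\!\left( \left\| \frac{x + \Delta x - s_i}{h} \right\|^2 \right),$$

where

$$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(x)}}\, \delta(b(s_i), u)$$

is the weight for $s_i$, and $h$ is the kernel bandwidth.

[1] D. Comaniciu, V. Ramesh and P. Meer, Real-Time Tracking of Non-Rigid Objects using Mean Shift, CVPR'00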

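Mean-shift tracking

Only the second term of the approximation depends on $\Delta x$, so maximizing the Bhattacharyya coefficient $O_B(\Delta x)$ (equivalently, minimizing the distance $\sqrt{1 - O_B}$) amounts to

$$\max_{\Delta x} \sum_{i=1}^{n} w_i\, k\!\left( \left\| \frac{x + \Delta x - s_i}{h} \right\|^2 \right).$$

The solution is an iterative mean-shift procedure:

$$x \leftarrow \frac{\sum_{i=1}^{n} s_i\, w_i\, g\!\left( \left\| \frac{x - s_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} w_i\, g\!\left( \left\| \frac{x - s_i}{h} \right\|^2 \right)}$$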
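A single mean-shift update can be sketched as follows. The Epanechnikov profile is assumed here (so $g = -k'$ is 1 inside the kernel support and 0 outside), and the candidate histogram `p` is passed in rather than recomputed; a full tracker would recompute `p` at the new location and repeat until the shift is small. All names are illustrative.

```python
import math


def mean_shift_step(x, pixels, bins, q, p, h):
    """One update x <- sum_i s_i w_i g_i / sum_i w_i g_i.

    q, p: target and candidate histograms (lists of length m)
    w_i = sqrt(q_u / p_u) for the bin u = b(s_i)
    """
    num_x = num_y = den = 0.0
    for (sx, sy), b in zip(pixels, bins):
        r2 = ((x[0] - sx) ** 2 + (x[1] - sy) ** 2) / (h * h)
        g = 1.0 if r2 < 1.0 else 0.0  # g = -k' for the Epanechnikov profile
        if g == 0.0 or p[b] <= 0.0:
            continue  # pixel outside the kernel, or empty candidate bin
        w = math.sqrt(q[b] / p[b])
        num_x += sx * w * g
        num_y += sy * w * g
        den += w * g
    if den == 0.0:
        return x  # no informative pixel: stay in place
    return (num_x / den, num_y / den)


pixels = [(0.0, 0.0), (2.0, 0.0)]
bins = [0, 1]
q = [0.0, 1.0]  # the target model contains only bin 1
p = [0.5, 0.5]  # the candidate sees both bins equally
new_x = mean_shift_step((1.0, 0.0), pixels, bins, q, p, h=10.0)
```

In this toy case only the pixel at (2, 0) matches the target model, so the estimate moves onto it.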

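SSD kernel-based tracking [2]

Let's use $O_M(\Delta x) = \|\sqrt{q} - \sqrt{p(x + \Delta x)}\|^2$.

Linearization:

$$\sqrt{p(x + \Delta x)} \approx \sqrt{p(x)} + \frac{1}{2}\, d(p(x))^{-\frac{1}{2}}\, U^T J_K(x)\, \Delta x,$$

where

$$d(p(x)) = \mathrm{diag}(p_1(x), \ldots, p_m(x)), \qquad J_K = \left[ \frac{\partial K}{\partial u}, \frac{\partial K}{\partial v} \right] = \begin{bmatrix} \nabla_x K(s_1 - x) \\ \vdots \\ \nabla_x K(s_n - x) \end{bmatrix},$$

with $(u, v)$ the two coordinates of the kernel center $x$.

[2] G. Hager, M. Dewan and C. Stewart, Multiple Kernel Tracking with SSD, CVPR'04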

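SSD kernel-based tracking

So the objective function is

$$O_M(\Delta x) = \left\| \sqrt{q} - \sqrt{p(x)} - \frac{1}{2}\, d(p(x))^{-\frac{1}{2}}\, U^T J_K(x)\, \Delta x \right\|^2.$$

Denote

$$M(x) = \frac{1}{2}\, d(p(x))^{-\frac{1}{2}}\, U^T J_K(x).$$

We have a linear system

$$M \Delta x = \sqrt{q} - \sqrt{p(x)},$$

whose least-squares solution is

$$\Delta x = M^{\dagger} \left( \sqrt{q} - \sqrt{p(x)} \right).$$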
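For a 2-D translation, $M$ has two columns, so the pseudo-inverse solution reduces to a 2×2 set of normal equations. The sketch below assumes $M$ has already been computed and is given as a list of rows; the function name and the singularity threshold are illustrative choices, not taken from the slides.

```python
import math


def ssd_kernel_step(M, q, p):
    """Least-squares solution of M dx = sqrt(q) - sqrt(p) for a 2-D motion dx.

    M: list of m rows [d_x, d_y]; q, p: histograms of length m.
    Solves the normal equations (M^T M) dx = M^T r with a closed-form 2x2 inverse.
    """
    r = [math.sqrt(qu) - math.sqrt(pu) for qu, pu in zip(q, p)]
    a = sum(row[0] * row[0] for row in M)  # entries of M^T M
    b = sum(row[0] * row[1] for row in M)
    d = sum(row[1] * row[1] for row in M)
    gx = sum(row[0] * ru for row, ru in zip(M, r))  # entries of M^T r
    gy = sum(row[1] * ru for row, ru in zip(M, r))
    det = a * d - b * b
    if abs(det) < 1e-12:  # assumed threshold
        raise ValueError("M^T M is singular: the motion is not uniquely determined")
    return ((d * gx - b * gy) / det, (a * gy - b * gx) / det)


# Toy numbers chosen so that sqrt(q) - sqrt(p) = M @ (0.5, -0.25) exactly.
M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = [0.25, 0.25, 0.25]
q = [1.0, 0.0625, 0.5625]
dx = ssd_kernel_step(M, q, p)
```

In practice the step is repeated: after moving by `dx`, the histogram `p` and the matrix `M` are recomputed at the new location.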

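Singularities

$M$ has the form

$$M = \begin{bmatrix} d_x^1 & d_y^1 \\ \vdots & \vdots \\ d_x^m & d_y^m \end{bmatrix},$$

where

$$[\, d_x^j,\ d_y^j \,] = \frac{1}{2\sqrt{p_j}} \sum_i (s_i^j - x)\, g\!\left( \left\| \frac{s_i^j - x}{h} \right\|^2 \right),$$

which is the center of mass for feature $j$ (with $s_i^j$ the pixels falling in bin $j$).

If the rows $\{ [\, d_x^j,\ d_y^j \,],\ j = 1, \ldots, m \}$ are all collinear, then $\mathrm{rank}(M) = 1$ and the solution is not unique.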

Optimal Kernel Placement [3]

Different image regions have different properties. Some of them are singular, and some are far from singular. How can we find those that are far from singular? By checking the properties of $M$.

- The Schatten 1-norm: $\|A\|_S = \sum_i \sigma_i$, where $\sigma_i$ are the singular values of $A$
- The S-norm condition number (for a $2 \times 2$ matrix): $\kappa_S(A) = \left( \sum_i \sigma_i \right)^2 / \prod_i \sigma_i$
- It can be computed in closed form:

$$\kappa_S(M^T M) = \|M^T M\|_S\, \|(M^T M)^{-1}\|_S = \frac{\left( \sum_j \left[ (d_x^j)^2 + (d_y^j)^2 \right] \right)^2}{\sum_j (d_x^j)^2\, \sum_j (d_y^j)^2 - \left( \sum_j d_x^j d_y^j \right)^2}$$

- Exhaustive search vs. gradient-based search

[3] Zhimin Fan, Ming Yang, Ying Wu, Gang Hua and Ting Yu, Efficient Optimal Kernel Placement for Reliable Visual Tracking, CVPR'06
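The closed form is the squared trace of $M^T M$ divided by its determinant, which is cheap to evaluate for every candidate kernel position. A minimal sketch, assuming $M$ is given as a list of rows $[d_x^j, d_y^j]$ (the function name is illustrative):

```python
def schatten_condition_number(M):
    """kappa_S(M^T M) = trace(M^T M)^2 / det(M^T M) for an m x 2 matrix M.

    Returns infinity when M^T M is singular (all rows collinear).
    """
    a = sum(dx * dx for dx, dy in M)  # sum_j (d_x^j)^2
    d = sum(dy * dy for dx, dy in M)  # sum_j (d_y^j)^2
    b = sum(dx * dy for dx, dy in M)  # sum_j d_x^j d_y^j
    det = a * d - b * b
    if det <= 0.0:
        return float("inf")
    return (a + d) ** 2 / det


well_placed = schatten_condition_number([[1.0, 0.0], [0.0, 1.0]])
stretched = schatten_condition_number([[2.0, 0.0], [0.0, 1.0]])
singular = schatten_condition_number([[1.0, 2.0], [2.0, 4.0]])
```

The smallest possible value is 4, reached when the two singular values are equal; larger values indicate a placement closer to singular, and collinear rows give infinity.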

Optimal Kernel Placement (figure)

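Kernel Concatenation

Concatenate multiple kernels to increase the dimensionality of the measurement.
- This is the same as using more features.
- A set of $K$ kernels: $p_i(x) = U^T K_i(x)$
- Stack the histograms into $p$ and $q$.
- The objective function is

$$\min_{\Delta x} \sum_{i=1}^{K} \left\| \sqrt{q_i} - \sqrt{p_i(x + \Delta x)} \right\|^2$$

- It is easy to see that the solution satisfies

$$M \Delta x = \sqrt{q} - \sqrt{p(x)},$$

where

$$M = \frac{1}{2}\, d(p)^{-\frac{1}{2}} \begin{bmatrix} U^T & & \\ & \ddots & \\ & & U^T \end{bmatrix} \begin{bmatrix} J_{K_1} \\ \vdots \\ J_{K_K} \end{bmatrix}.$$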

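Kernel Combination

Aggregate the histograms to produce new features:

$$q = \sum_{i=1}^{K} U^T K_i, \qquad p(x) = \sum_{i=1}^{K} U^T K_i(x).$$

The objective function is

$$\min_{\Delta x} \left\| \sqrt{\sum_{i=1}^{K} q_i} - \sqrt{\sum_{i=1}^{K} p_i(x + \Delta x)} \right\|^2.$$

The corresponding linear system is

$$\sqrt{\sum_{i=1}^{K} q_i} - \sqrt{\sum_{i=1}^{K} p_i(x)} = M \Delta x,$$

where

$$M = \frac{1}{2}\, d(p)^{-\frac{1}{2}}\, U^T \sum_{i=1}^{K} J_{K_i} = \sum_{i=1}^{K} M_i.$$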

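Collaborative Multiple Kernels [4]

Relaxed motion representation:

$$\Delta x = \begin{bmatrix} \Delta x_1 \\ \vdots \\ \Delta x_k \end{bmatrix}$$

Consider a structural constraint

$$\Omega(x_1, \ldots, x_k) = 0.$$

Objective function:

$$O(x_1, \ldots, x_k) = \sum_{i=1}^{k} \left\| \sqrt{q_i} - \sqrt{p_i(x_i)} \right\|^2 + \gamma\, \| \Omega(x_1, \ldots, x_k) \|^2$$

After linearization, this is equivalent to a linear system

$$\begin{cases} l = G\, \Delta x + \omega_1 \\ y = M\, \Delta x + \omega_2 \end{cases}$$

where $\omega_1$ and $\omega_2$ are residual terms.

[4] Zhimin Fan, Ying Wu and Ming Yang, Multiple Collaborative Kernel Tracking, CVPR'05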

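Collaborative Multiple Kernels

where

$$y = \begin{bmatrix} \sqrt{q_1} - \sqrt{p_1(x_1)} \\ \vdots \\ \sqrt{q_k} - \sqrt{p_k(x_k)} \end{bmatrix}, \qquad M = \begin{bmatrix} M_1 & 0 & \cdots & 0 \\ 0 & M_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & M_k \end{bmatrix},$$

$$G = \left[ \frac{\partial \Omega}{\partial x_1},\ \frac{\partial \Omega}{\partial x_2},\ \ldots,\ \frac{\partial \Omega}{\partial x_k} \right], \qquad l = -\Omega(x_1, x_2, \ldots, x_k).$$

The sign of $l$ follows from linearizing the constraint: $\Omega(x + \Delta x) \approx \Omega(x) + G\, \Delta x = 0$.

We have

$$\mathrm{rank}\left( \begin{bmatrix} M \\ \sqrt{\gamma}\, G \end{bmatrix} \right) \geq \mathrm{rank}(M),$$

so the constraint enhances the motion observability.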

An Example

Special case: $x_1 = x_2 = \cdots = x_k$, and $\gamma$ is chosen as the optimal Lagrangian multiplier. Then

$$G = \begin{bmatrix} I & -I & & & \\ & I & -I & & \\ & & \ddots & \ddots & \\ & & & I & -I \end{bmatrix}, \qquad l = 0,$$

and we have $\mathrm{rank}(G) = (k - 1) \cdot \dim(x_1)$.

E.g., supposing $k = 10$ and $\dim(x_1) = 2$, this implies that the motion resides in a 2-D manifold in $\mathbb{R}^{20}$.

Thus, as long as $\mathrm{rank}(M) \geq \dim(x_1)$, all the motion parameters are observable, i.e., can be uniquely determined.

This is easily satisfied: it holds if any one of the $\Delta x_i$ is observable through its own kernel, or if $\dim(x_1)$ motion parameters are observable through multiple kernels.

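Solution and Collaboration

The solution:

$$\Delta x = (M^T M + \gamma\, G^T G)^{-1} (M^T y + \gamma\, G^T l).$$

A more efficient solution:

$$\Delta x = (I - D)(M^T M)^{-1} (M^T y + \gamma\, G^T l),$$

where

$$D = \gamma\, (M^T M)^{-1} G^T \left( \gamma\, G (M^T M)^{-1} G^T + I \right)^{-1} G.$$

Notice that

$$\Delta x_u = (M^T M)^{-1} M^T y = M^{\dagger} y$$

is the solution for the independent kernels, and

$$\Delta x = (I - D)\, \Delta x_u + z(x), \qquad z(x) = \gamma\, (I - D)(M^T M)^{-1} G^T l.$$

The collaboration is carried out through a fixed-point recursion:

$$\Delta x^{k+1} \leftarrow (I - D^k)\, [M(\Delta x^k)]^{\dagger}\, y^k + z^k$$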
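The step from the first solution to the more efficient one is not spelled out on the slide. One way to see it, sketched here under the assumption that $A = M^T M$ is invertible, is the Woodbury matrix identity:

```latex
\begin{aligned}
(A + \gamma\, G^T G)^{-1}
  &= A^{-1} - A^{-1} G^T \left( \gamma^{-1} I + G A^{-1} G^T \right)^{-1} G A^{-1} \\
  &= A^{-1} - \gamma\, A^{-1} G^T \left( I + \gamma\, G A^{-1} G^T \right)^{-1} G A^{-1} \\
  &= (I - D)\, A^{-1},
  \qquad D = \gamma\, A^{-1} G^T \left( \gamma\, G A^{-1} G^T + I \right)^{-1} G .
\end{aligned}
```

Applying this to $M^T y + \gamma\, G^T l$ gives $\Delta x = (I - D)\, A^{-1} M^T y + \gamma\, (I - D)\, A^{-1} G^T l$, where the first term is $(I - D)\, \Delta x_u$ and the second plays the role of $z(x)$. The gain in efficiency presumably comes from $M$ being block diagonal: $A^{-1}$ can then be computed kernel by kernel, and the remaining inverse is only as large as the number of rows of $G$.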

MKL for scale [5]

- Determining the scale of the target is an important issue.
- It is related to the scale of the kernel.
- Basic idea: use mean-shift in the spatial-scale space $(x, \sigma)$.
- Algorithm: alternate a spatial mean-shift and a scale mean-shift.
  1. Initialize the state $(x_0, \sigma_0)$.
  2. Fix $\sigma_0$ and perform a 2-D spatial mean-shift to obtain $x^*$.
  3. Fix $x^*$ and perform a 1-D scale mean-shift to obtain $\sigma^*$.
  4. Repeat steps 2 and 3 until convergence.

[5] Robert Collins, Mean-shift Blob Tracking through Scale Space, CVPR'03
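The alternation in steps 1 to 4 can be sketched as a small driver loop. The two mean-shift procedures are passed in as callables because their details are not given on the slide; the function names, the tolerance, and the toy stand-ins below are all assumptions made for illustration.

```python
def track_in_scale_space(x0, sigma0, spatial_mean_shift, scale_mean_shift,
                         tol=1e-6, max_iter=100):
    """Alternate a 2-D spatial update and a 1-D scale update until both settle.

    spatial_mean_shift(x, sigma) -> new x      (sigma held fixed)
    scale_mean_shift(x, sigma)   -> new sigma  (x held fixed)
    """
    x, sigma = x0, sigma0
    for _ in range(max_iter):
        x_new = spatial_mean_shift(x, sigma)        # step 2
        sigma_new = scale_mean_shift(x_new, sigma)  # step 3
        change = max(abs(x_new[0] - x[0]),
                     abs(x_new[1] - x[1]),
                     abs(sigma_new - sigma))
        x, sigma = x_new, sigma_new
        if change < tol:                            # step 4: convergence test
            break
    return x, sigma


# Toy stand-ins: each call moves halfway toward a fixed mode at x = (4, 2), sigma = 3.
def toy_spatial(x, sigma):
    return ((x[0] + 4.0) / 2.0, (x[1] + 2.0) / 2.0)


def toy_scale(x, sigma):
    return (sigma + 3.0) / 2.0


x_star, sigma_star = track_in_scale_space((0.0, 0.0), 1.0, toy_spatial, toy_scale)
```

With real mean-shift procedures plugged in, the same loop structure applies; only the two callables change.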

Outline
- Kernel-based Tracking
  - Basic kernel-based tracking
  - Multiple kernel tracking
- Context Flow

Distraction and Matching Ambiguity

- Spatial context can reduce matching ambiguity
- Questions:
  - Modeling context for motion analysis?
  - Methods resilient to local variations?

Spatial Context (for object recognition)

- Structure-stiff (e.g., template and filters)
- Structure-flexible
  - random fields
  - deformable templates
  - shape context, AutoContext
- Structure-free
  - bag-of-words or bag-of-features

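Modeling Spatial Context

- Location $x$ is associated with features $f(x)$
- Feature classes: $\{\omega_1, \ldots, \omega_N\}$
- Individual context: $C_i = \{ y \mid f(y) \in \omega_i,\ y \in \Omega(x) \}$, with $\Omega(x)$ a neighborhood of $x$
- Total context: $C = \bigcup_{i=1}^{N} C_i$
- Context representation: $p(\omega_i \mid x) \propto p(x \mid \omega_i)\, p(\omega_i)$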

Contextual Maps (figure)

Brightness Constancy → Context Constancy [6]

Context constancy:

$$p(\omega_i \mid x + \Delta x,\ t + \Delta t,\ C) = p(\omega_i \mid x,\ t,\ C)$$

- The motion $\Delta x$ shall not change the context
- More flexible than constant brightness
  - insensitive to lighting
  - insensitive to local deformation
- Let's impose a small motion assumption...

[6] Ying Wu and Jialue Fan, Contextual Flow, CVPR'09

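A Differential Form

$$\underbrace{\nabla_x^T\, p(\omega_i \mid x, t)}_{\text{contextual gradient}}\ \Delta x + \underbrace{\nabla_t\, p(\omega_i \mid x, t)}_{\text{contextual frame difference}}\ \Delta t = 0$$

- The contextual frame difference is approximated by $p(\omega_i \mid x, t + \Delta t) - p(\omega_i \mid x, t)$
- Contextual gradient (details follow):

$$\nabla_x\, p(\omega_i \mid x) = \nabla_x \left\{ p(\omega_i)\, \frac{p(x \mid \omega_i)}{p(x)} \right\} = \frac{1}{c}\, p(\omega_i \mid x) \left[ \mu_i(x) - \mu_0(x) \right]$$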

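Context Gradient

Conditional shift:

$$\mu_i(x) = E\{ (y - x) \mid y \in \omega_i \} = \frac{1}{Z_i(x)} \int_{\Omega} (y - x)\, p(y \mid \omega_i)\, dy$$

After simple manipulation:

$$\mu_i(x) = c\, \frac{\nabla_x\, p(x \mid \omega_i)}{p(x \mid \omega_i)}$$

Total shift:

$$\mu_0(x) = E\{ (y - x) \mid y \in \Omega \} = c\, \frac{\nabla_x\, p(x)}{p(x)}$$

So we have

$$\nabla_x\, p(\omega_i \mid x) = \frac{1}{c}\, p(\omega_i \mid x) \left[ \mu_i(x) - \mu_0(x) \right]$$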
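The final identity follows from Bayes' rule and the quotient rule. A short worked derivation, assuming $p(x \mid \omega_i) > 0$ and $p(x) > 0$ so that the divisions are defined:

```latex
\begin{aligned}
\nabla_x\, p(\omega_i \mid x)
  &= p(\omega_i)\, \nabla_x \frac{p(x \mid \omega_i)}{p(x)} \\
  &= p(\omega_i) \left[ \frac{\nabla_x\, p(x \mid \omega_i)}{p(x)}
       - \frac{p(x \mid \omega_i)\, \nabla_x\, p(x)}{p(x)^2} \right] \\
  &= \frac{p(\omega_i)\, p(x \mid \omega_i)}{p(x)}
       \left[ \frac{\nabla_x\, p(x \mid \omega_i)}{p(x \mid \omega_i)}
       - \frac{\nabla_x\, p(x)}{p(x)} \right] \\
  &= \frac{1}{c}\, p(\omega_i \mid x) \left[ \mu_i(x) - \mu_0(x) \right].
\end{aligned}
```

The last line substitutes the two shift expressions, $\mu_i(x) = c\, \nabla_x p(x \mid \omega_i) / p(x \mid \omega_i)$ and $\mu_0(x) = c\, \nabla_x p(x) / p(x)$, and recognizes the prefactor as $p(\omega_i \mid x)$.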

Illustration: Contextual Gradient (figure)

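Context Flow Constraint

Substituting the contextual gradient and the frame difference (with $\Delta t = 1$) into the differential form, and dividing by $p(\omega_i \mid x, t) / c$, it is easy to see:

$$\underbrace{\left[ \mu_i(x) - \mu_0(x) \right]^T}_{\bar{\mu}_i(x)^T} \Delta x + \underbrace{c \left[ \frac{p(\omega_i \mid x, t + 1)}{p(\omega_i \mid x, t)} - 1 \right]}_{-b_i} = 0$$

- $\bar{\mu}_i(x) = \mu_i(x) - \mu_0(x)$ is the centered shift
- $b_i$ is the change of context ratio

Contextual flow constraint:

$$\bar{\mu}_i(x)^T \Delta x - b_i = 0$$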

Local Contextual System

Each context gives a constraint weighted by $W_i(x) = p(\omega_i \mid x, t)$, and

$$W(x) = \mathrm{diag}[\, W_1(x), \ldots, W_N(x) \,].$$

Denote

$$U_r(x) = [\, \bar{\mu}_1(x), \ldots, \bar{\mu}_N(x) \,]^T, \qquad b_r(x) = [\, b_1, b_2, \ldots, b_N \,]^T,$$

$$U(x) = W(x)\, U_r(x), \qquad b(x) = W(x)\, b_r(x).$$

We have a linear contextual system

$$U(x)\, \Delta x = b(x), \quad \text{or simply} \quad U \Delta x = b.$$

Extended Lucas-Kanade Method

- If $U$ is rank deficient, we have an aperture problem as well.
- Consider the nearby locations $X = \{x_1, \ldots, x_m\}$, each of which is associated with a contextual system

$$U_i(x_i)\, \Delta x_i = b(x_i), \quad \text{or simply} \quad U_i \Delta x_i = b_i,$$

  where $\Delta x_i$ is the motion for location $x_i$.
- If they share the same motion, i.e., $\Delta x_i = \Delta x$, then we obtain the extended Lucas-Kanade method:

$$\begin{bmatrix} U_1 \\ \vdots \\ U_m \end{bmatrix} \Delta x = U_c\, \Delta x = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$$
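The stacked system can be solved in the least-squares sense without forming $U_c$ explicitly, because $U_c^T U_c = \sum_i U_i^T U_i$ and $U_c^T b = \sum_i U_i^T b_i$. The sketch below assumes a 2-D motion and represents each local system as a pair of plain lists; the function name and the singularity threshold are illustrative.

```python
def extended_lucas_kanade(systems):
    """Least-squares motion shared by all local contextual systems.

    systems: list of (U_i, b_i), where U_i is a list of rows [ux, uy]
             and b_i is a list of scalars of the same length.
    Returns (dx, dy), or None if the stacked system is still rank deficient.
    """
    a = c = d = gx = gy = 0.0
    for U, b in systems:
        for (ux, uy), bi in zip(U, b):
            a += ux * ux   # accumulate U_c^T U_c
            c += ux * uy
            d += uy * uy
            gx += ux * bi  # accumulate U_c^T b
            gy += uy * bi
    det = a * d - c * c
    if abs(det) < 1e-12:   # assumed threshold
        return None        # aperture problem: motion not uniquely determined
    return ((d * gx - c * gy) / det, (a * gy - c * gx) / det)


# Each location alone constrains only one direction of the motion (1, -2).
location_1 = ([[1.0, 0.0], [2.0, 0.0]], [1.0, 2.0])  # sees only the x component
location_2 = ([[0.0, 1.0]], [-2.0])                  # sees only the y component
alone = extended_lucas_kanade([location_1])
joint = extended_lucas_kanade([location_1, location_2])
```

In this toy case each location by itself suffers from the aperture problem, while the two together determine the motion uniquely.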