Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University
Mean Shift Algorithm
A mode seeking algorithm, Fukunaga & Hostetler (1975)
Find the region of highest density:
1. Pick a point
2. Draw a window
3. Compute the mean
4. Shift the window
5. Repeat steps 3 and 4 until convergence
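A minimal NumPy sketch of this window-shifting loop on a 2-D point set; the data, window radius, and starting point below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def mean_shift_window(points, x, radius=1.0, eps=1e-5, max_iter=100):
    """Shift a circular window to the mean of the points inside it."""
    for _ in range(max_iter):
        inside = points[np.linalg.norm(points - x, axis=1) <= radius]
        if len(inside) == 0:
            break                        # empty window: nowhere to go
        mean = inside.mean(axis=0)       # compute the mean
        if np.linalg.norm(mean - x) < eps:
            break                        # window stopped moving: a mode
        x = mean                         # shift the window
    return x

# illustrative data: two clumps; the window climbs to the nearer mode
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, (200, 2)),
                    rng.normal(4.0, 0.5, (200, 2))])
print(mean_shift_window(points, x=np.array([1.0, 1.0])))
```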
Kernel Density Estimation 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University
To understand the mean shift algorithm: Kernel Density Estimation.
Approximate the underlying PDF from samples: put a bump on every sample to approximate the PDF.
[Figure: a probability density function p(x) and its cumulative density function; samples are drawn at random, Gaussian bumps are placed on the samples, and their sum (the kernel density estimate) approximates the original PDF.]
Kernel Density Estimation
Approximate the underlying PDF from samples drawn from it:
$$p(x) = \sum_i c_i \, e^{-\frac{(x - x_i)^2}{2\sigma^2}}$$
Each Gaussian bump is a kernel: put a bump on every sample to approximate the PDF.
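A small sketch of this idea for 1-D samples, assuming a Gaussian kernel with an illustrative bandwidth:

```python
import numpy as np

def kde(x, samples, sigma=0.3):
    """Evaluate a Gaussian kernel density estimate at the points x."""
    bumps = np.exp(-(x[:, None] - samples[None, :])**2 / (2 * sigma**2))
    # normalize so the estimate integrates to one over x
    return bumps.sum(axis=1) / (len(samples) * sigma * np.sqrt(2 * np.pi))

samples = np.random.default_rng(0).normal(5.0, 1.0, size=100)
xs = np.linspace(0.0, 10.0, 200)
density = kde(xs, samples)    # approximates the N(5, 1) pdf from samples
```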
Kernel Function $K(x, x')$: a function of the distance between two points.
Radially symmetric kernels:

Epanechnikov kernel:
$$K(x, x') = \begin{cases} c\,(1 - \|x - x'\|^2) & \|x - x'\|^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Uniform kernel:
$$K(x, x') = \begin{cases} c & \|x - x'\|^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Normal kernel:
$$K(x, x') = c \exp\left(-\tfrac{1}{2}\|x - x'\|^2\right)$$
Radially symmetric kernels can be written in terms of a profile $k$:
$$K(x, x') = c\, k(\|x - x'\|^2)$$
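A sketch of these three profiles as plain functions of the squared distance $r = \|x - x'\|^2$; the constant $c$ is left as 1 for illustration:

```python
import numpy as np

# K(x, x') = c * k(r) with r = ||x - x'||^2; c is left as 1 here
def k_epanechnikov(r):
    return np.where(r <= 1.0, 1.0 - r, 0.0)

def k_uniform(r):
    return np.where(r <= 1.0, 1.0, 0.0)

def k_normal(r):
    return np.exp(-0.5 * r)

r = np.linspace(0.0, 2.0, 5)          # squared distances
print(k_epanechnikov(r), k_uniform(r), k_normal(r))
```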
Connecting KDE and the Mean Shift Algorithm
Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University
Mean-Shift Tracking
Given a set of points $\{x_s\}_{s=1}^{S}$, $x_s \in \mathbb{R}^d$, and a kernel $K(x, x')$:
find the mean sample point $\bar{x}$.
Mean-Shift Algorithm
Initialize $x$. While $\|v(x)\| > \epsilon$:
1. Compute the mean shift (where does this come from?):
$$m(x) = \frac{\sum_s K(x, x_s)\, x_s}{\sum_s K(x, x_s)}, \qquad v(x) = m(x) - x$$
2. Update $x \leftarrow x + v(x)$
Where does this algorithm come from?
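A minimal NumPy version of this loop, assuming the normal kernel; any radially symmetric kernel would do:

```python
import numpy as np

def mean_shift(points, x, bandwidth=1.0, eps=1e-5, max_iter=200):
    """Iterate x <- x + v(x) until the mean-shift vector is tiny."""
    for _ in range(max_iter):
        # kernel weights K(x, x_s) for every sample (normal kernel)
        w = np.exp(-0.5 * np.sum((points - x)**2, axis=1) / bandwidth**2)
        m = (w[:, None] * points).sum(axis=0) / w.sum()   # m(x)
        v = m - x                                         # v(x) = m(x) - x
        if np.linalg.norm(v) < eps:
            break
        x = x + v                                         # update
    return x
```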
How is the KDE related to the mean shift algorithm?
Recall the kernel density estimate (radially symmetric kernels):
$$P(x) = \frac{c}{N} \sum_n k(\|x - x_n\|^2)$$
We can show that the gradient of the PDF is related to the mean shift vector:
$$\nabla P(x) \propto m(x) - x$$
The mean shift is a step in the direction of the gradient of the KDE.
In mean-shift tracking, we are trying to find the peak of this density, which means we are trying to maximize the KDE.
We are trying to optimize this:
$$\hat{x} = \arg\max_x P(x) = \arg\max_x \frac{c}{N} \sum_n k(\|x - x_n\|^2)$$
This is usually non-linear and non-parametric. How do we optimize this non-linear function?
Compute partial derivatives and take gradient steps (gradient ascent).
Compute the gradient of
$$P(x) = \frac{c}{N} \sum_n k(\|x - x_n\|^2)$$

Gradient:
$$\nabla P(x) = \frac{c}{N} \sum_n \nabla k(\|x - x_n\|^2)$$

Expand the gradient (algebra):
$$\nabla P(x) = \frac{2c}{N} \sum_n (x - x_n)\, k'(\|x - x_n\|^2)$$

Change of notation (kernel-shadow pairs): call the negated derivative of the profile $g$, so
$$\nabla P(x) = \frac{2c}{N} \sum_n (x_n - x)\, g(\|x - x_n\|^2)$$
keep this in memory: $-k'(\cdot) = g(\cdot)$
Multiply it out:
$$\nabla P(x) = \frac{2c}{N} \sum_n x_n\, g(\|x - x_n\|^2) - \frac{2c}{N} \sum_n x\, g(\|x - x_n\|^2)$$

Too long! Use the shorthand $g_n = g(\|x - x_n\|^2)$:
$$\nabla P(x) = \frac{2c}{N} \sum_n x_n g_n - \frac{2c}{N} \sum_n x\, g_n$$

Multiply by one and collect like terms:
$$\nabla P(x) = \frac{2c}{N} \left[\sum_n g_n\right] \left( \frac{\sum_n x_n g_n}{\sum_n g_n} - x \right)$$

Does this look familiar? The first factor is a constant; the term in parentheses is the mean shift!
$$v(x) = m(x) - x = \frac{\sum_n x_n g_n}{\sum_n g_n} - x = \frac{\nabla P(x)}{\frac{2c}{N}\sum_n g_n}$$
The mean shift is a step in the direction of the gradient of the KDE: gradient ascent with adaptive step size.
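A quick numeric sanity check of this identity, as a sketch with a Gaussian profile and small random data (all values illustrative):

```python
import numpy as np

# Gaussian profile k(r) = exp(-r/2) and its shadow g(r) = -k'(r)
k = lambda r: np.exp(-0.5 * r)
g = lambda r: 0.5 * np.exp(-0.5 * r)

rng = np.random.default_rng(0)
pts = rng.normal(0.0, 1.0, (50, 2))
x = np.array([0.3, -0.2])
c, N = 1.0, len(pts)

gn = g(np.sum((pts - x)**2, axis=1))
v = (gn[:, None] * pts).sum(axis=0) / gn.sum() - x    # mean-shift vector

# finite-difference gradient of P(x) = (c/N) sum_n k(||x - x_n||^2)
P = lambda x: (c / N) * k(np.sum((pts - x)**2, axis=1)).sum()
h = 1e-6
grad = np.array([(P(x + h * e) - P(x - h * e)) / (2 * h) for e in np.eye(2)])

print(grad, (2 * c / N) * gn.sum() * v)   # the two vectors should match
```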
Mean-Shift Algorithm
Initialize $x$. While $\|v(x)\| > \epsilon$:
1. Compute the mean shift:
$$m(x) = \frac{\sum_s K(x, x_s)\, x_s}{\sum_s K(x, x_s)}, \qquad v(x) = m(x) - x$$
2. Update $x \leftarrow x + v(x)$
This is a gradient step with adaptive step size:
$$v(x) = \frac{\nabla P(x)}{\frac{2c}{N}\sum_n g_n}$$
Everything up to now has been about distributions over samples
Dealing with images
Pixels form a lattice, so the spatial density is the same everywhere! What can we do?
Consider a set of points $\{x_s\}_{s=1}^{S}$, $x_s \in \mathbb{R}^d$, with associated weights $w(x_s)$.
Sample mean:
$$m(x) = \frac{\sum_s K(x, x_s)\, w(x_s)\, x_s}{\sum_s K(x, x_s)\, w(x_s)}$$
Mean shift: $m(x) - x$
Mean-Shift Algorithm (weighted)
Initialize $x$. While $\|v(x)\| > \epsilon$:
1. Compute the mean shift:
$$m(x) = \frac{\sum_s K(x, x_s)\, w(x_s)\, x_s}{\sum_s K(x, x_s)\, w(x_s)}, \qquad v(x) = m(x) - x$$
2. Update $x \leftarrow x + v(x)$
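The same loop with per-point weights, as a small sketch (normal kernel again assumed):

```python
import numpy as np

def weighted_mean_shift(points, weights, x, bandwidth=1.0,
                        eps=1e-5, max_iter=200):
    """Mean shift where every sample x_s also carries a weight w(x_s)."""
    for _ in range(max_iter):
        k = np.exp(-0.5 * np.sum((points - x)**2, axis=1) / bandwidth**2)
        kw = k * weights                 # K(x, x_s) * w(x_s)
        v = (kw[:, None] * points).sum(axis=0) / kw.sum() - x
        if np.linalg.norm(v) < eps:
            break
        x = x + v
    return x
```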
For images, each pixel is a point with a weight.
Finally, mean-shift tracking in video
Goal: find the best candidate location in frame 2.
$x$: center coordinate of the target (Frame 1); $y$: center coordinate of a candidate (Frame 2).
There are many candidates but only one target.
Use the mean shift algorithm to find the best candidate location.
Non-rigid object tracking
Compute a descriptor for the target.
Search for a similar descriptor in the neighborhood in the next frame (Target → Candidate).
Compute a descriptor for the new target.
Search for a similar descriptor in the neighborhood in the next frame (Target → Candidate).
How do we model the target and candidate regions?
Modeling the target
M-dimensional target descriptor $q = \{q_1, \ldots, q_M\}$ (centered at the target center).
A fancy (confusing) way to write a weighted histogram:
$$q_m = C \sum_n k(\|x_n\|^2)\, \delta[b(x_n) - m]$$
$C$: normalization factor; the sum runs over all pixels; $k(\|x_n\|^2)$: weight that decreases with distance from the center; $\delta$: Kronecker delta; $b(\cdot)$: quantization function mapping a pixel to its bin ID $m$.
A normalized color histogram (weighted by distance).
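A sketch of this descriptor for a grayscale patch; the Epanechnikov profile, bin count, and patch size are illustrative assumptions:

```python
import numpy as np

def target_descriptor(patch, M=16):
    """Kernel-weighted, normalized gray-level histogram q of a patch."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # squared distance from the patch center, ~1 at the border
    r2 = ((ys - cy) / cy)**2 + ((xs - cx) / cx)**2
    k = np.maximum(1.0 - r2, 0.0)                 # Epanechnikov profile
    bins = (patch.astype(np.int64) * M) // 256    # quantization b(x_n)
    q = np.bincount(bins.ravel(), weights=k.ravel(), minlength=M)
    return q / q.sum()                            # normalization C

patch = np.random.default_rng(0).integers(0, 256, (31, 31))
print(target_descriptor(patch))
```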
Modeling the candidate
M-dimensional candidate descriptor $p(y) = \{p_1(y), \ldots, p_M(y)\}$ (centered at location $y$).
A weighted histogram at $y$:
$$p_m(y) = C_h \sum_n k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right) \delta[b(x_n) - m]$$
$h$: bandwidth.
Similarity between the target and candidate
Distance function:
$$d(y) = \sqrt{1 - \rho[p(y), q]}$$
Bhattacharyya coefficient:
$$\rho[p(y), q] = \sum_m \sqrt{p_m(y)\, q_m}$$
Just the cosine of the angle between the two unit vectors $\sqrt{p(y)}$ and $\sqrt{q}$:
$$\rho[p(y), q] = \cos\theta_y = \frac{\sqrt{p(y)}^{\top}\sqrt{q}}{\|\sqrt{p(y)}\|\,\|\sqrt{q}\|} = \sum_m \sqrt{p_m(y)\, q_m}$$
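A tiny sketch comparing two normalized histograms (the values are illustrative):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return np.sum(np.sqrt(p * q))

def distance(p, q):
    """Distance d(y) derived from the coefficient."""
    return np.sqrt(1.0 - bhattacharyya(p, q))

p = np.array([0.20, 0.50, 0.30])
q = np.array([0.25, 0.45, 0.30])
print(bhattacharyya(p, q), distance(p, q))
```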
Now we can compute the similarity between a target and multiple candidate regions
[Figure: the target descriptor q, candidate descriptors p(y) across the image, and the similarity ρ[p(y), q] plotted over the image. We want to find the peak of this similarity surface.]
Objective function:
$$\min_y d(y) \quad \text{same as} \quad \max_y \rho[p(y), q]$$
Assuming a good initial guess: $\rho[p(y_0 + \Delta y), q]$.
Linearize around the initial guess (Taylor series expansion):
$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{1}{2} \sum_m p_m(y) \sqrt{\frac{q_m}{p_m(y_0)}}$$
(function at the specified value + derivative term)

Remember the definition of $p_m(y)$?
$$p_m(y) = C_h \sum_n k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right) \delta[b(x_n) - m]$$
Fully expanded:
$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{C_h}{2} \sum_m \sum_n k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right) \delta[b(x_n) - m] \sqrt{\frac{q_m}{p_m(y_0)}}$$

Moving terms around:
$$\rho[p(y), q] \approx \frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m} + \frac{C_h}{2} \sum_n w_n\, k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)$$
where
$$w_n = \sum_m \sqrt{\frac{q_m}{p_m(y_0)}}\, \delta[b(x_n) - m]$$
The first term does not depend on the unknown $y$; the second is a weighted kernel density estimate. The weight is bigger when $q_m > p_m(y_0)$.
OK, why are we doing all this math?
We want to maximize this:
$$\max_y \rho[p(y), q]$$
Fully expanded linearized objective:
$$\rho[p(y), q] \approx \underbrace{\frac{1}{2} \sum_m \sqrt{p_m(y_0)\, q_m}}_{\text{doesn't depend on unknown } y} + \underbrace{\frac{C_h}{2} \sum_n w_n\, k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)}_{\text{only need to maximize this!}}$$
where
$$w_n = \sum_m \sqrt{\frac{q_m}{p_m(y_0)}}\, \delta[b(x_n) - m]$$
What can we use to maximize this weighted KDE? The Mean Shift Algorithm!
$$\frac{C_h}{2} \sum_n w_n\, k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)$$
The new sample mean of this KDE (the new candidate location) is
$$y_1 = \frac{\sum_n x_n\, w_n\, g\!\left(\left\|\frac{y_0 - x_n}{h}\right\|^2\right)}{\sum_n w_n\, g\!\left(\left\|\frac{y_0 - x_n}{h}\right\|^2\right)}$$
(this was derived earlier)
Mean-Shift Object Tracking
For each frame:
1. Initialize location $y_0$; compute $q$; compute $p(y_0)$
2. Derive weights $w_n$
3. Shift to new candidate location (mean shift): $y_1$
4. Compute $p(y_1)$
5. If $\|y_0 - y_1\| < \epsilon$, return. Otherwise set $y_0 \leftarrow y_1$ and go back to 2.
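Putting it together: a compact sketch of steps 1–5 for a grayscale frame. The window half-size, bin count, Epanechnikov kernel, and the assumption that the window stays inside the frame are all illustrative choices, not prescribed by the slides.

```python
import numpy as np

def descriptor(frame, y, half, M=16):
    """Kernel-weighted histogram of the window centered at y = (row, col)."""
    r, c = int(round(y[0])), int(round(y[1]))
    # assumes the window stays fully inside the frame
    patch = frame[r - half:r + half + 1, c - half:c + half + 1]
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    k = np.maximum(1.0 - (ys**2 + xs**2) / float(half**2), 0.0)
    bins = (patch.astype(np.int64) * M) // 256     # quantization b(x_n)
    p = np.bincount(bins.ravel(), weights=k.ravel(), minlength=M)
    return p / p.sum(), bins, ys, xs

def track_step(frame, q, y0, half, M=16, eps=0.5, max_iter=20):
    """Move y0 toward the best candidate location in this frame."""
    for _ in range(max_iter):
        p, bins, ys, xs = descriptor(frame, y0, half, M)
        w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))   # weights w_n
        # g (shadow of the Epanechnikov profile) is constant in the window,
        # so y1 is just the w_n-weighted mean of the pixel positions
        y1 = y0 + np.array([np.sum(w * ys), np.sum(w * xs)]) / np.sum(w)
        if np.linalg.norm(y1 - y0) < eps:
            return y1
        y0 = y1
    return y0
```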
Compute a descriptor for the target: $q$
Search for a similar descriptor in the neighborhood in the next frame (Target → Candidate): $\max_y \rho[p(y), q]$
Compute a descriptor for the new target: $q$
Search for a similar descriptor in the neighborhood in the next frame (Target → Candidate): $\max_y \rho[p(y), q]$