Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University

Mean Shift Algorithm: a mode seeking algorithm (Fukunaga & Hostetler, 1975). To find the region of highest density: pick a point, draw a window around it, compute the mean of the points inside the window, shift the window to that mean, and repeat (compute the mean, shift the window) until the window stops moving.
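As a rough illustration (not from the slides), here is a minimal 1-D sketch of this window-based mode seeking, assuming a fixed-radius uniform window; the function name, radius, and data are illustrative.

```python
import numpy as np

def mode_seek_1d(samples, x0, radius=1.0, eps=1e-5, max_iter=100):
    """Window-based mode seeking: repeatedly move a window to the mean
    of the samples it currently contains."""
    x = float(x0)
    for _ in range(max_iter):
        in_window = samples[np.abs(samples - x) <= radius]
        if in_window.size == 0:
            break                      # empty window: nothing to average
        new_x = in_window.mean()       # compute the mean
        if abs(new_x - x) < eps:       # converged: the shift is tiny
            break
        x = new_x                      # shift the window
    return x

# Example: samples drawn around two modes; start near the larger one.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(4, 0.5, 800)])
print(mode_seek_1d(samples, x0=3.0))   # converges near the mode at 4
```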

Kernel Density Estimation 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University

To understand the mean shift algorithm, we need Kernel Density Estimation: approximate the underlying PDF from samples by putting a bump on every sample.

[Figure: a probability density function p(x) and its cumulative distribution function; points are randomly sampled, Gaussian bumps are placed on the samples, and the resulting kernel density estimate approximates the original PDF.]

Kernel Density Estimation: approximate the underlying PDF from samples drawn from it by putting a bump (a Gaussian, aka kernel) on every sample:

$$p(x) = \sum_i c_i \, e^{-\frac{(x - x_i)^2}{2\sigma^2}}$$
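A minimal sketch of this sum-of-Gaussian-bumps estimate, assuming a fixed bandwidth sigma; the function name, sigma, and the evaluation grid are illustrative.

```python
import numpy as np

def gaussian_kde(x, samples, sigma=0.5):
    """p(x) as a sum of Gaussian bumps centered on the samples."""
    x = np.atleast_1d(x)[:, None]                           # shape (Q, 1)
    bumps = np.exp(-(x - samples[None, :])**2 / (2 * sigma**2))
    c = 1.0 / (len(samples) * sigma * np.sqrt(2 * np.pi))   # Gaussian normalizing constant
    return c * bumps.sum(axis=1)

samples = np.array([1.2, 1.5, 2.0, 5.1, 5.3])
grid = np.linspace(0, 7, 8)
print(gaussian_kde(grid, samples))   # approximate PDF evaluated on the grid
```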

Kernel Function $K(x, x')$: a function of the distance between two points.

Radially symmetric kernels:

Epanechnikov kernel: $K(x, x') = c\,(1 - \|x - x'\|^2)$ if $\|x - x'\|^2 \le 1$, and $0$ otherwise

Uniform kernel: $K(x, x') = c$ if $\|x - x'\|^2 \le 1$, and $0$ otherwise

Normal kernel: $K(x, x') = c \exp\!\left(-\tfrac{1}{2}\|x - x'\|^2\right)$
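These kernels are easy to write as functions of the squared distance $d^2 = \|x - x'\|^2$; a small sketch, with the constant $c$ left unnormalized for simplicity.

```python
import numpy as np

def epanechnikov(d2, c=1.0):
    """Epanechnikov kernel: c(1 - d2) where d2 = ||x - x'||^2 <= 1, else 0."""
    return np.where(d2 <= 1.0, c * (1.0 - d2), 0.0)

def uniform(d2, c=1.0):
    """Uniform kernel: c where d2 <= 1, else 0."""
    return np.where(d2 <= 1.0, c, 0.0)

def normal(d2, c=1.0):
    """Normal (Gaussian) kernel: c * exp(-d2 / 2)."""
    return c * np.exp(-0.5 * d2)

d2 = np.sum((np.array([0.3, 0.4]) - np.zeros(2))**2)   # squared distance ||x - x'||^2
print(epanechnikov(d2), uniform(d2), normal(d2))        # 0.75 1.0 0.882...
```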

A radially symmetric kernel can be written in terms of its profile $k$: $K(x, x') = c\, k(\|x - x'\|^2)$.

Connecting KDE and the Mean Shift Algorithm


Mean-Shift Tracking. Given a set of points $\{x_s\}_{s=1}^S$, $x_s \in \mathbb{R}^d$, and a kernel $K(x, x')$, find the mean sample point $x$.

Mean-Shift Algorithm
Initialize $x$. While $\|v(x)\| > \epsilon$:
1. Compute the mean shift: $m(x) = \dfrac{\sum_s K(x, x_s)\, x_s}{\sum_s K(x, x_s)}$, and $v(x) = m(x) - x$
2. Update: $x \leftarrow x + v(x)$
Where does this algorithm come from?
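Before getting to where it comes from, here is a minimal sketch of that loop, assuming a Gaussian kernel; the names, bandwidth, and stopping threshold are illustrative.

```python
import numpy as np

def mean_shift(x, points, kernel, eps=1e-5, max_iter=200):
    """Mean-shift iteration: move x to the kernel-weighted mean m(x)
    until the shift v(x) is small. `kernel(d2)` maps squared distances to weights."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((points - x)**2, axis=1)                 # ||x - x_s||^2 for all samples
        w = kernel(d2)                                        # K(x, x_s)
        m = (w[:, None] * points).sum(axis=0) / w.sum()       # weighted mean m(x)
        v = m - x                                             # mean-shift vector v(x)
        if np.linalg.norm(v) < eps:
            break
        x = x + v                                             # update x
    return x

points = np.random.default_rng(1).normal([2.0, 3.0], 0.4, size=(500, 2))
gaussian = lambda d2: np.exp(-0.5 * d2 / 0.5**2)              # Normal kernel, bandwidth 0.5
print(mean_shift(np.array([1.0, 2.0]), points, gaussian))     # converges near (2, 3)
```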


How is the KDE related to the mean shift algorithm? Recall the kernel density estimate (for radially symmetric kernels):

$$P(x) = \frac{1}{N}\, c \sum_n k(\|x - x_n\|^2)$$

We can show that the gradient of this PDF is proportional to the mean shift vector, $\nabla P(x) \propto m(x) - x$: the mean shift is a step in the direction of the gradient of the KDE.

In mean-shift tracking we are trying to find the mode of this density, which means we are trying to optimize

$$x^* = \arg\max_x P(x) = \arg\max_x \frac{1}{N}\, c \sum_n k(\|x - x_n\|^2),$$

which is usually non-linear and non-parametric. How do we optimize this non-linear function? Compute partial derivatives and do gradient ascent.

Start from the kernel density estimate:

$$P(x) = \frac{1}{N}\, c \sum_n k(\|x - x_n\|^2)$$

Compute the gradient and expand it (algebra):

$$\nabla P(x) = \frac{1}{N}\, c \sum_n \nabla k(\|x - x_n\|^2) = \frac{1}{N}\, 2c \sum_n (x - x_n)\, k'(\|x - x_n\|^2)$$

Change of notation (kernel-shadow pairs): call $g(\cdot) = -k'(\cdot)$ and keep this in memory. Then

$$\nabla P(x) = \frac{1}{N}\, 2c \sum_n (x_n - x)\, g(\|x - x_n\|^2)$$

Multiply it out, using the shorthand $g_n \equiv g(\|x - x_n\|^2)$ (the full expression is too long):

$$\nabla P(x) = \frac{1}{N}\, 2c \sum_n x_n g_n \;-\; \frac{1}{N}\, 2c \sum_n x\, g_n$$

Multiply by one, $\frac{\sum_n g_n}{\sum_n g_n}$, and collect like terms:

$$\nabla P(x) = \frac{1}{N}\, 2c \left[\sum_n g_n\right] \left[\frac{\sum_n x_n g_n}{\sum_n g_n} - x\right]$$

Does this look familiar? The first bracket is a constant and the second bracket is the mean shift. The mean shift is a step in the direction of the gradient of the KDE:

$$v(x) = m(x) - x = \frac{\sum_n x_n g_n}{\sum_n g_n} - x = \frac{\nabla P(x)}{\frac{1}{N}\, 2c \sum_n g_n}$$

This is gradient ascent with an adaptive step size.
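A quick numerical sanity check of this relation (a sketch, not from the slides): with the Gaussian profile $k(d) = e^{-d/2}$ we have $g(d) = -k'(d) = \tfrac{1}{2} e^{-d/2}$, and the mean-shift vector should match the finite-difference gradient of the KDE divided by $\frac{1}{N} 2c \sum_n g_n$. The constants and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.normal(0.0, 1.0, size=(300, 2))
c, N = 1.0, len(pts)

k = lambda d2: np.exp(-0.5 * d2)          # Gaussian profile k(d)
g = lambda d2: 0.5 * np.exp(-0.5 * d2)    # g(d) = -k'(d)

def P(x):
    d2 = np.sum((pts - x)**2, axis=1)
    return (c / N) * k(d2).sum()          # KDE value at x

x = np.array([1.0, -0.5])
d2 = np.sum((pts - x)**2, axis=1)
gn = g(d2)
v = (gn[:, None] * pts).sum(axis=0) / gn.sum() - x    # mean-shift vector v(x)

# Finite-difference gradient of the KDE at x
eps = 1e-5
grad = np.array([(P(x + eps * e) - P(x - eps * e)) / (2 * eps) for e in np.eye(2)])

print(v, grad / ((2 * c / N) * gn.sum()))   # the two vectors should agree
```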

Mean-Shift Algorithm
Initialize $x$. While $\|v(x)\| > \epsilon$:
1. Compute the mean shift: $m(x) = \dfrac{\sum_s K(x, x_s)\, x_s}{\sum_s K(x, x_s)}$, and $v(x) = m(x) - x$
2. Update: $x \leftarrow x + v(x)$
This is the gradient with an adaptive step size: $v(x) = \dfrac{\nabla P(x)}{\frac{1}{N}\, 2c \sum_n g_n}$.

Everything up to now has been about distributions over samples

Dealing with images. Pixels form a lattice, so the spatial density is the same everywhere! What can we do?

Consider a set of points $\{x_s\}_{s=1}^S$, $x_s \in \mathbb{R}^d$, with associated weights $w(x_s)$. Sample mean: $m(x) = \dfrac{\sum_s K(x, x_s)\, w(x_s)\, x_s}{\sum_s K(x, x_s)\, w(x_s)}$. Mean shift: $m(x) - x$.

Mean-Shift Algorithm
Initialize $x$. While $\|v(x)\| > \epsilon$:
1. Compute the mean shift: $m(x) = \dfrac{\sum_s K(x, x_s)\, w(x_s)\, x_s}{\sum_s K(x, x_s)\, w(x_s)}$, and $v(x) = m(x) - x$
2. Update: $x \leftarrow x + v(x)$
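The weighted version only changes how the mean is computed; a minimal sketch with illustrative names. The only difference from the unweighted loop is that every kernel value is multiplied by the per-sample weight $w(x_s)$.

```python
import numpy as np

def weighted_mean_shift(x, points, weights, kernel, eps=1e-5, max_iter=200):
    """Mean shift where each sample x_s also carries a weight w(x_s)."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((points - x)**2, axis=1)                  # ||x - x_s||^2
        kw = kernel(d2) * weights                             # K(x, x_s) * w(x_s)
        m = (kw[:, None] * points).sum(axis=0) / kw.sum()     # weighted sample mean m(x)
        v = m - x                                             # mean-shift vector
        if np.linalg.norm(v) < eps:
            break
        x = x + v
    return x
```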

For images, each pixel is a point with a weight.

Finally, mean shift tracking in video.

Goal: find the best candidate location in frame 2 ($x$: center coordinate of the target in frame 1; $y$: center coordinate of a candidate in frame 2). There are many candidates but only one target. Use the mean shift algorithm to find the best candidate location.

Non-rigid object tracking

Compute a descriptor for the target.

Search for a similar descriptor in a neighborhood in the next frame (target, candidate).

Compute a descriptor for the new target.

Search for a similar descriptor in a neighborhood in the following frame (target, candidate).

How do we model the target and candidate regions?

Modeling the target. An M-dimensional target descriptor $q = \{q_1, \ldots, q_M\}$ (centered at the target center): a fancy (confusing) way to write a weighted histogram,

$$q_m = C \sum_n k(\|x_n\|^2)\, \delta[b(x_n) - m]$$

where $C$ is a normalization factor, the sum is over all pixels, $k(\|x_n\|^2)$ is a function of (inverse) distance (the weight), $\delta$ is the Kronecker delta, and $b(x_n)$ is the quantization function returning the bin ID. In short: a normalized color histogram, weighted by distance from the center.
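A sketch of this weighted histogram, assuming grayscale pixel values in [0, 1) quantized into M bins by $b(\cdot)$, pixel offsets normalized so the region has radius 1, and an Epanechnikov-style profile for $k$; the names and defaults are illustrative.

```python
import numpy as np

def target_model(pixels, coords, M=16, kernel=lambda d2: np.maximum(1.0 - d2, 0.0)):
    """q_m = C * sum_n k(||x_n||^2) * delta[b(x_n) - m]
    pixels: intensities in [0, 1); coords: (N, 2) offsets from the region center,
    scaled so the region radius is 1 (the kernel support)."""
    bins = np.minimum((pixels * M).astype(int), M - 1)   # b(x_n): bin IDs
    w = kernel(np.sum(coords**2, axis=1))                # k(||x_n||^2)
    q = np.bincount(bins, weights=w, minlength=M)        # sum the weights per bin
    return q / q.sum()                                    # normalization C
```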

Modeling the candidate. An M-dimensional candidate descriptor $p(y) = \{p_1(y), \ldots, p_M(y)\}$ (centered at location $y$): a weighted histogram at $y$,

$$p_m(y) = C_h \sum_n k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right) \delta[b(x_n) - m]$$

where $h$ is the bandwidth.

Similarity between the target and the candidate. Distance function: $d(y) = \sqrt{1 - \rho[p(y), q]}$, with the Bhattacharyya coefficient

$$\rho(y) \equiv \rho[p(y), q] = \sum_m \sqrt{p_m(y)\, q_m}.$$

This is just the cosine similarity between the two unit vectors $(\sqrt{p_1(y)}, \ldots, \sqrt{p_M(y)})$ and $(\sqrt{q_1}, \ldots, \sqrt{q_M})$:

$$\rho(y) = \cos\theta_y = \frac{\sqrt{p(y)}^{\top}\sqrt{q}}{\|\sqrt{p(y)}\|\,\|\sqrt{q}\|} = \sum_m \sqrt{p_m(y)\, q_m}.$$
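The coefficient and the distance are one line each; a minimal sketch with illustrative histograms.

```python
import numpy as np

def bhattacharyya(p, q):
    """Return (rho, d): rho = sum_m sqrt(p_m q_m), d = sqrt(1 - rho)."""
    rho = float(np.sum(np.sqrt(p * q)))
    return rho, np.sqrt(max(0.0, 1.0 - rho))   # guard against rho > 1 from round-off

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
print(bhattacharyya(p, q))   # rho near 1 (similar histograms), small distance d
```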

Now we can compute the similarity between a target and multiple candidate regions

[Figure: the similarity $\rho[p(y), q]$ between the target model $q$ and the candidate $p(y)$, evaluated over the image; we want to find the peak of this similarity surface.]

Objective function: $\min_y d(y)$, which is the same as $\max_y \rho[p(y), q]$. Assuming a good initial guess $y_0$ (so we only need $\rho[p(y_0 + \Delta y), q]$), linearize around the initial guess (Taylor series expansion):

$$\rho[p(y), q] \approx \frac{1}{2}\sum_m \sqrt{p_m(y_0)\, q_m} + \frac{1}{2}\sum_m p_m(y)\sqrt{\frac{q_m}{p_m(y_0)}}$$

(the first term is the function at the specified value, the second comes from its derivative).

Linearized objective:

$$\rho[p(y), q] \approx \frac{1}{2}\sum_m \sqrt{p_m(y_0)\, q_m} + \frac{1}{2}\sum_m p_m(y)\sqrt{\frac{q_m}{p_m(y_0)}}$$

Remember the definition $p_m(y) = C_h \sum_n k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)\delta[b(x_n) - m]$? Fully expanded:

$$\rho[p(y), q] \approx \frac{1}{2}\sum_m \sqrt{p_m(y_0)\, q_m} + \frac{1}{2}\sum_m \left\{ C_h \sum_n k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)\delta[b(x_n) - m] \right\} \sqrt{\frac{q_m}{p_m(y_0)}}$$

Moving terms around in the fully expanded linearized objective:

$$\rho[p(y), q] \approx \frac{1}{2}\sum_m \sqrt{p_m(y_0)\, q_m} + \frac{C_h}{2}\sum_n w_n\, k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)
\quad\text{where}\quad
w_n = \sum_m \sqrt{\frac{q_m}{p_m(y_0)}}\, \delta[b(x_n) - m]$$

The first term does not depend on the unknown $y$; the second is a weighted kernel density estimate. The weight $w_n$ is bigger when $q_m > p_m(y_0)$.
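Because the Kronecker delta selects exactly one bin per pixel, $w_n$ is just a per-bin lookup. A sketch, reusing the bin IDs from the hypothetical target_model sketch above; the epsilon guard against empty candidate bins is my addition.

```python
import numpy as np

def mean_shift_weights(bins, q, p_y0, eps=1e-10):
    """w_n = sum_m sqrt(q_m / p_m(y0)) * delta[b(x_n) - m]
    Since the delta picks one bin per pixel, this is a table lookup."""
    ratio = np.sqrt(q / np.maximum(p_y0, eps))   # sqrt(q_m / p_m(y0)) per bin
    return ratio[bins]                           # weight w_n for each pixel n
```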

OK, why are we doing all this math?

We want to maximize $\max_y \rho[p(y), q]$. In the fully expanded linearized objective

$$\rho[p(y), q] \approx \frac{1}{2}\sum_m \sqrt{p_m(y_0)\, q_m} + \frac{C_h}{2}\sum_n w_n\, k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right),
\qquad
w_n = \sum_m \sqrt{\frac{q_m}{p_m(y_0)}}\, \delta[b(x_n) - m],$$

the first term doesn't depend on the unknown $y$, so we only need to maximize the second term, which is a weighted KDE. What can we use to maximize a weighted KDE? The Mean Shift Algorithm!

For the weighted KDE $\frac{C_h}{2}\sum_n w_n\, k\!\left(\left\|\frac{y - x_n}{h}\right\|^2\right)$, the new sample mean of this KDE (the new candidate location) is

$$y_1 = \frac{\sum_n x_n\, w_n\, g\!\left(\left\|\frac{y_0 - x_n}{h}\right\|^2\right)}{\sum_n w_n\, g\!\left(\left\|\frac{y_0 - x_n}{h}\right\|^2\right)}$$

(this was derived earlier).

Mean-Shift Object Tracking. For each frame:
1. Initialize the location $y_0$; compute $q$ and $p(y_0)$.
2. Derive the weights $w_n$.
3. Shift to the new candidate location (mean shift) $y_1$.
4. Compute $p(y_1)$.
5. If $\|y_0 - y_1\| < \epsilon$, return. Otherwise set $y_0 \leftarrow y_1$ and go back to step 2.
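Putting the pieces together, a sketch of one tracking update under the same illustrative assumptions as above (grayscale pixels in [0, 1), an Epanechnikov kernel so that $g$ is constant on the window, and hypothetical names); this is a sketch of the procedure, not the authors' reference implementation. The target model $q$ would be computed once from the first frame, e.g. with the target_model sketch above.

```python
import numpy as np

def track_step(frame_pixels, frame_coords, q, y0, h, M=16, eps=1e-2, max_iter=20):
    """One mean-shift tracking update for a new frame.
    frame_pixels: intensities in [0, 1); frame_coords: (N, 2) pixel positions.
    With the Epanechnikov kernel, g is constant on the window, so
    y1 = sum_n x_n w_n / sum_n w_n over pixels inside the window."""
    bins = np.minimum((frame_pixels * M).astype(int), M - 1)
    y0 = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum(((frame_coords - y0) / h)**2, axis=1)
        inside = d2 <= 1.0                                   # kernel support (assumes it overlaps the object)
        # candidate histogram p(y0)
        k = 1.0 - d2[inside]                                 # Epanechnikov profile on its support
        p = np.bincount(bins[inside], weights=k, minlength=M)
        p = p / p.sum()
        # weights w_n and the new location y1
        w = np.sqrt(q / np.maximum(p, 1e-10))[bins[inside]]
        y1 = (w[:, None] * frame_coords[inside]).sum(axis=0) / w.sum()
        if np.linalg.norm(y1 - y0) < eps:
            return y1
        y0 = y1
    return y0
```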

Compute a descriptor $q$ for the target.

Search for a similar descriptor in a neighborhood in the next frame: $\max_y \rho[p(y), q]$.

Compute a descriptor $q$ for the new target.

Search for a similar descriptor in a neighborhood in the following frame: $\max_y \rho[p(y), q]$.