What is Motion?
As visual input: change in the spatial distribution of light on the sensors. Minimally, $dI(x,y,t)/dt \neq 0$.
As perception: inference about the causes of intensity change, e.g. recovering $v_{OBJ}(x,y,z,t)$ from $I(x,y,t)$.
Motion Field: the movement of projected scene points across the image.
Essential Matrix
With camera centers $O$ and $O'$ related by rotation $R$ and translation $t$ (so $x' = Rx + t$), the rays $O'x'$, $t$, and $Rx$ are coplanar:
$$x'^{T}\,(t \times Rx) = 0$$
$$x'^{T} E\, x = 0, \qquad E = [t]_{\times} R = \begin{pmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{pmatrix} R$$
Differential Camera Motion
For a small rotation about an axis $\omega$ over time $dt$, the rigid motion $x' = Rx + t$ becomes
$$x' = x + (\omega \times x)\,dt + v\,dt = (I + dt\,[\omega]_{\times})\,x + v\,dt$$
and the motion of an image point is $p' = p + \frac{dp}{dt}\,dt$.
Epipolar:
Basic Idea
1) Estimate point motions
2) Use point motions to estimate camera/object motion
Problem: the motion of projected points is not directly measurable.
- Movement of projected points creates displacements of image patches
- Infer point motion from image patch motion:
  - Matching across frames
  - Differential approach
  - Fourier/filtering methods
The brightness-constancy assumption is frequently violated → need invariant methods
Displacement: $h(x) = u\,dt$
Derivation
Brightness constancy: $I(x, y, t) = I(x + v_x\,dt,\; y + v_y\,dt,\; t + dt)$
A first-order Taylor expansion gives the gradient constraint:
$$I_x v_x + I_y v_y + I_t = 0 \quad\Longleftrightarrow\quad \nabla I^{T} v + I_t = 0$$
A single constraint only determines the normal component:
$$v = -\frac{\nabla I\; I_t}{\|\nabla I\|^2}$$
Stacking the constraints over a patch, $A^{T} v = -I_t$ with $A = (\nabla I_1, \ldots, \nabla I_N)$, and the least-squares solution is
$$v = -(A A^{T})^{-1} A\, I_t$$
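The gradient constraint and the normal-flow formula above can be checked numerically. A minimal sketch: the sinusoidal pattern, the (1, 0) px/frame velocity, and the finite-difference scheme are illustrative choices, not from the lecture.

```python
import numpy as np

# Synthetic image translating by (vx, vy) = (1, 0) pixels per frame
x, y = np.meshgrid(np.arange(64), np.arange(64))
I0 = np.sin(2 * np.pi * x / 16.0)
I1 = np.sin(2 * np.pi * (x - 1.0) / 16.0)

Iy, Ix = np.gradient(I0)   # spatial gradients (np.gradient returns the row axis first)
It = I1 - I0               # temporal derivative (forward difference)

# Gradient constraint: Ix*vx + Iy*vy + It should be near zero for the true velocity
residual = Ix * 1.0 + Iy * 0.0 + It

# Normal flow v = -It * grad(I) / |grad(I)|^2, evaluated where the gradient is strong
g2 = Ix ** 2 + Iy ** 2
mask = g2 > 0.1
vn_x = (-It * Ix / g2)[mask]
print(np.abs(residual).max(), vn_x.mean())
```

The residual is small but not exactly zero: the Taylor expansion is first order, so a 1-pixel shift on a period-16 sinusoid leaves a quantifiable truncation error.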
Brightness constraint revealed
Image sequence from Egomotion
Optic Flow (Gibson, 1950)
Assigns local image velocities $v(x,y,t)$
Time scale ~100 msec; spatial scale ~1–10 deg
Local translations
Brightness constraint
Photometric Motion
Measurement Problems
- Brightness constraint violations
- Ambiguity
- Aliasing
- Non-translational motions
Aliasing
Non-translational motion
Problem: images contain many edges → the aperture problem.
Normal flow: the motion component perpendicular to the edge (along the gradient); the component parallel to the edge is locally unrecoverable.
Find the least-squares solution by combining constraints over multiple patches.
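A minimal sketch of the least-squares step for a single patch, in the style of Lucas–Kanade: stack the gradient constraint over every pixel in a window and solve the overdetermined system. The pattern, patch size, and helper name `patch_flow` are illustrative assumptions.

```python
import numpy as np

def patch_flow(I0, I1, cx, cy, r=7):
    """Least-squares flow for one patch: stack Ix*vx + Iy*vy = -It over all
    pixels in a (2r+1)x(2r+1) window around (cx, cy) and solve for v."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    win = (slice(cy - r, cy + r + 1), slice(cx - r, cx + r + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # estimated (vx, vy)

# Textured synthetic pattern translating by (1, 0) pixels per frame
x, y = np.meshgrid(np.arange(64), np.arange(64))
I0 = np.sin(2 * np.pi * x / 16.0) + np.sin(2 * np.pi * y / 20.0)
I1 = np.sin(2 * np.pi * (x - 1.0) / 16.0) + np.sin(2 * np.pi * y / 20.0)
vx, vy = patch_flow(I0, I1, 32, 32)
print(vx, vy)
```

The texture must vary in both directions; on a pure edge the system is rank deficient and only normal flow is recoverable, which is exactly the aperture problem.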
Aperture Problem (Motion/Form Ambiguity) Result: Early visual measurements are ambiguous w.r.t. motion.
Aperture Problem (Motion/Form Ambiguity)
However, both the motion and the form of the pattern are implicitly encoded across the population of V1 neurons. (figure: actual motion)
Fourier Methods
More fundamental than the Taylor approximation is the brightness constraint itself; Fourier methods apply it directly in the frequency domain.
For a translating image, the brightness constraint puts all spectral energy on
$$\omega_t = -(v_x \omega_x + v_y \omega_y)$$
which is the equation of a plane, weighted by the spatial texture.
Localizing the velocity to a patch using a windowing function has the effect of blurring the plane with the transform of the window: a "Fourier pancake."
X-T Slice of a Translating Camera (figure: image intensity in the x–t plane)
2-D Fourier Analysis
Image translations:
- are oriented in space-time
- have power spectra along a line in Fourier space
3-D Motion Information (x-y-t)
Information in Translating Images
The power spectral density of a translating image lies on a plane in $(\omega_x, \omega_y, \omega_t)$ space.
The orientation of this plane is uniquely determined by the velocity of the translation.
The amplitudes on the plane are determined by the (spatial) image spectrum.
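This can be verified exactly in one spatial dimension plus time, where the plane becomes a line $\omega_t = -v\,\omega_x$. A sketch, assuming a random pattern and circular shifts (which make the DFT relationship exact):

```python
import numpy as np

# 1-D pattern translating at v pixels/frame: I(t, x) = f(x - v*t).
# Circular shifts keep the signal periodic, so the DFT is exact; f and v are arbitrary.
N, v = 64, 3
rng = np.random.default_rng(0)
f = rng.standard_normal(N)
I = np.stack([np.roll(f, v * t) for t in range(N)])   # space-time image, axes (t, x)

S = np.abs(np.fft.fft2(I)) ** 2                       # power over (omega_t, omega_x)
kt, kx = np.nonzero(S > 1e-8 * S.max())
# Every significant frequency satisfies omega_t = -v * omega_x (mod N):
print(np.all((kt + v * kx) % N == 0))                 # True
```

The spectral amplitudes along the line are those of $f$ itself, matching the statement that the amplitudes on the plane come from the spatial image spectrum.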
Solving the ambiguities
Pool neurons tuned to frequencies in a common plane. (figure: velocity plane in $(\omega_x, \omega_y, \omega_t)$ space with V1 receptive fields)
Generalizations
The three main issues in tracking
Tracking
Very general model:
- We assume there are moving objects, which have an underlying state $X$
- There are measurements $Y$, some of which are functions of this state
- There is a clock; at each tick, the state changes and we get a new observation
Examples:
- Object is a ball; state is 3D position + velocity; measurements are stereo pairs
- Object is a person; state is body configuration; measurements are frames; clock is in the camera (30 fps)
Three main steps
Simplifying Assumptions
Assumptions allow recursive solutions
Decompose the estimation problem into:
- a part that depends on the new observation
- a part that can be computed from the previous history
E.g., the running average:
$$a_t = \frac{1}{t}\sum_{i=1}^{t} y_i = \frac{1}{t}\sum_{i=1}^{t-1} y_i + \frac{1}{t}\,y_t = \frac{t-1}{t}\left(\frac{1}{t-1}\sum_{i=1}^{t-1} y_i\right) + \frac{1}{t}\,y_t = \frac{t-1}{t}\,a_{t-1} + \frac{1}{t}\,y_t$$
Now in a form that allows recursive application.
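The running-average recursion is easy to sanity-check in a few lines (the observation stream here is arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(1)
ys = rng.standard_normal(100)        # arbitrary observation stream

a = 0.0
for t, y in enumerate(ys, start=1):
    # a_t = ((t-1)/t) * a_{t-1} + (1/t) * y_t
    a = (t - 1) / t * a + y / t
print(abs(a - ys.mean()) < 1e-12)    # recursive form matches the batch mean
```

Only the previous estimate and the new observation are ever stored, which is the point: the same decomposition is what makes Kalman filtering tractable.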
Tracking as induction
Assume data association is done (a dangerous assumption: it presumes good segmentation)
- Do correction for the 0th frame
- Assume we have a corrected estimate for the i-th frame; show we can do prediction for i+1, then correction for i+1
Base case
Induction step: given the corrected estimate for frame i
Induction step
Linear dynamic models
Use the notation $x \sim N(a, b)$ to mean $x$ has a normal pdf with mean $a$ and covariance $b$. A linear dynamic model then has the form:
State dynamics: $x_i \sim N(D_{i-1} x_{i-1},\; \Sigma_{d_i})$
Measurement model: $y_i \sim N(M_i x_i,\; \Sigma_{m_i})$
This is much, much more general than it looks, and extremely powerful.
Examples
Drifting points: we assume that the new position of the point is the old one, plus noise.
For the measurement model, we may not need to observe the whole state of the object:
- e.g. for a point moving in 3D, at the 3k-th tick we see x, the (3k+1)-th tick we see y, the (3k+2)-th tick we see z
- in this case, we can still make decent estimates of all three coordinates at each tick
This property, which does not apply to every model, is called observability.
Examples
- Points moving with constant velocity
- Points moving with constant acceleration
- Periodic motion
- Etc.
Points moving with constant velocity
We have
$$u_i = u_{i-1} + \Delta t\, v_{i-1} + \varepsilon_i$$
$$v_i = v_{i-1} + \varsigma_i$$
(the Greek letters denote noise terms). Stacking $(u, v)$ into a single state vector gives the form we had above:
$$\begin{pmatrix} u \\ v \end{pmatrix}_i = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}_{i-1} + \text{noise}$$
Points moving with constant acceleration
We have
$$u_i = u_{i-1} + \Delta t\, v_{i-1} + \varepsilon_i$$
$$v_i = v_{i-1} + \Delta t\, a_{i-1} + \varsigma_i$$
$$a_i = a_{i-1} + \xi_i$$
(the Greek letters denote noise terms). Stacking $(u, v, a)$ into a single state vector gives the form we had above:
$$\begin{pmatrix} u \\ v \\ a \end{pmatrix}_i = \begin{pmatrix} 1 & \Delta t & 0 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \\ a \end{pmatrix}_{i-1} + \text{noise}$$
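The constant-acceleration transition matrix is easy to exercise directly. A sketch with the noise terms set to zero and an arbitrary tick length, to show the state vector evolving under $x_i = D\,x_{i-1}$:

```python
import numpy as np

dt = 0.5   # illustrative tick length
# Constant-acceleration state transition for state (u, v, a), as on the slide
D = np.array([[1.0, dt, 0.0],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])
M = np.array([[1.0, 0.0, 0.0]])    # measurement model: observe position u only

x = np.array([0.0, 0.0, 2.0])      # start at rest with constant acceleration 2
for _ in range(4):
    x = D @ x                      # noise-free rollout of x_i = D x_{i-1}
print(x)                           # [3. 4. 2.]: u grows quadratically, v linearly
```

The measurement matrix `M` illustrates the observability point from the previous slide: only position is observed, yet velocity and acceleration remain estimable through the dynamics.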
The Kalman Filter
Key ideas:
- Linear models interact uniquely well with Gaussian noise: make the prior Gaussian, everything else stays Gaussian, and the calculations are easy
- Gaussians are really easy to represent: once you know the mean and covariance, you're done
The Kalman Filter in 1D
- Dynamic model
- Notation
- Predicted mean
- Corrected mean
Correction for the 1D Kalman filter
Pattern match to identities given in the book (basically, guess the integrals) to get the update.
Notice: if the measurement noise is small, we rely mainly on the measurement; if it's large, mainly on the prediction.
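A minimal sketch of the full 1D predict/correct cycle. The scalar dynamics, the parameter values, and the function name `kalman_1d` are illustrative assumptions; the gain formula shows the measurement-vs-prediction trade-off noted above.

```python
import numpy as np

def kalman_1d(ys, d=1.0, m=1.0, sd2=0.1, sm2=0.5, x0=0.0, p0=1.0):
    """1-D Kalman filter for x_i ~ N(d*x_{i-1}, sd2), y_i ~ N(m*x_i, sm2)."""
    x, p = x0, p0
    out = []
    for y in ys:
        # Predict
        x_pred = d * x
        p_pred = d * d * p + sd2
        # Correct: gain K trades off measurement vs. prediction.
        # Small sm2 -> K near 1/m (trust the measurement); large sm2 -> K near 0.
        K = p_pred * m / (m * m * p_pred + sm2)
        x = x_pred + K * (y - m * x_pred)
        p = (1.0 - K * m) * p_pred
        out.append(x)
    return np.array(out)

# Drifting point: a random walk observed with heavier measurement noise
rng = np.random.default_rng(2)
truth = np.cumsum(rng.normal(0.0, 0.1 ** 0.5, 200))
ys = truth + rng.normal(0.0, 0.5 ** 0.5, 200)
est = kalman_1d(ys)
```

On this drifting-point model the filtered estimates track the true state more closely than the raw measurements do, which is the whole point of blending the prediction with each observation.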
In higher dimensions, the derivation follows the same lines, but isn't as easy. Expressions here.
Smoothing
Idea: we don't have the best estimate of state; what about using the future as well?
Run two filters, one moving forward and the other backward in time, then combine the state estimates.
The crucial point: we can obtain a smoothed estimate by viewing the backward filter's prediction as yet another measurement for the forward filter, so we've already done the equations.
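The combination step amounts to fusing two Gaussian estimates, treating the backward prediction as an extra measurement. A sketch in 1D (the function name and example numbers are illustrative):

```python
def fuse(mu_f, var_f, mu_b, var_b):
    """Fuse forward and backward estimates as independent Gaussians:
    inverse-variance weighting, i.e. the product of the two densities."""
    var = 1.0 / (1.0 / var_f + 1.0 / var_b)
    mu = var * (mu_f / var_f + mu_b / var_b)
    return mu, var

# Equally confident forward/backward estimates: the fused mean is their
# average and the fused variance is halved.
print(fuse(2.0, 1.0, 4.0, 1.0))   # (3.0, 0.5)
```

The fused variance is never larger than either input variance, which is why smoothing always improves on filtering away from the sequence ends.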
Data Association
Nearest neighbours:
- choose the measurement with highest probability given the predicted state
- popular, but can lead to catastrophe
Probabilistic data association:
- combine measurements, weighting by probability given the predicted state
- gate using the predicted state