CS 4495 Computer Vision Hidden Markov Models


CS 4495 Computer Vision. Aaron Bobick, School of Interactive Computing

Administrivia: PS4 going OK? Please share your experiences on Piazza, e.g. you discovered something that is subtle about using vl_sift. If you want to talk about what scales worked and why, that's OK too.

Outline: Time Series; Markov Models; 3 computational problems of HMMs; Applying HMMs in vision - Gesture. Slides borrowed from UMd and elsewhere. Material from slides by Sebastian Thrun and Yair Weiss.

Audio Spectrum: audio spectrum of the song of the Prothonotary Warbler.

Bird Sounds: Prothonotary Warbler, Chestnut-sided Warbler.

Questions One Could Ask: What bird is this? (time series classification) How will the song continue? (time series prediction) Is this bird sick? (outlier detection) What phases does this song have? (time series segmentation)

Other Sound Samples

Another Time Series Problem: stock prices of Cisco, General Electric, Intel, Microsoft.

Questions One Could Ask: Will the stock go up or down? (time series prediction) What type of stock is this, e.g. risky? (time series classification) Is the behavior abnormal? (outlier detection)

Music Analysis

Questions One Could Ask: Is this Beethoven or Bach? (time series classification) Can we compose more of that? (time series prediction/generation) Can we segment the piece into themes? (time series segmentation)

For vision: waving, pointing, controlling?

The Real Question: How do we model these problems? How do we formulate these questions as inference/learning problems?

Outline For Today: Time Series; Markov Models; 3 computational problems of HMMs; Applying HMMs in vision - Gesture; Summary.

Weather: A Markov Model (maybe?) [State transition diagram over Sunny, Rainy, Snowy; e.g. Sunny stays Sunny 80% of the time, Rainy stays Rainy 60%, Snowy stays Snowy 20%.] The probability of moving to a given state depends only on the current state: 1st-order Markovian.

Ingredients of a Markov Model
States: {S_1, S_2, ..., S_N}
State transition probabilities: a_ij = P(q_{t+1} = S_j | q_t = S_i)
Initial state distribution: π_i = P[q_1 = S_i]
[Same Sunny/Rainy/Snowy transition diagram as before.]

Ingredients of Our Markov Model
States: {S_sunny, S_rainy, S_snowy}
State transition probabilities:
A = | 0.8  0.15 0.05 |
    | 0.38 0.6  0.02 |
    | 0.75 0.05 0.2  |
Initial state distribution: π = (0.7, 0.25, 0.05)

Probability of a Time Series
Given A and π (below), what is the probability of the series Sunny, Rainy, Rainy, Rainy, Snowy, Snowy?
P(S_sunny) P(S_rainy | S_sunny) P(S_rainy | S_rainy) P(S_rainy | S_rainy) P(S_snowy | S_rainy) P(S_snowy | S_snowy)
= 0.7 × 0.15 × 0.6 × 0.6 × 0.02 × 0.2 ≈ 1.5 × 10^-4
A = | 0.8  0.15 0.05 |
    | 0.38 0.6  0.02 |
    | 0.75 0.05 0.2  |
π = (0.7, 0.25, 0.05)
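
As a concrete check of the arithmetic above, here is a minimal Python sketch (function and variable names such as markov_sequence_prob are mine, not from the slides) that multiplies the initial probability by the chain of transition probabilities for the Sunny, Rainy, Rainy, Rainy, Snowy, Snowy sequence.

```python
import numpy as np

# Transition matrix and initial distribution from the weather example
# (states ordered: sunny, rainy, snowy).
A = np.array([[0.80, 0.15, 0.05],
              [0.38, 0.60, 0.02],
              [0.75, 0.05, 0.20]])
pi = np.array([0.70, 0.25, 0.05])

def markov_sequence_prob(states, A, pi):
    """P(s_1,...,s_T) = pi[s_1] * prod_t A[s_{t-1}, s_t] for a 1st-order Markov chain."""
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]
    return p

SUNNY, RAINY, SNOWY = 0, 1, 2
seq = [SUNNY, RAINY, RAINY, RAINY, SNOWY, SNOWY]
print(markov_sequence_prob(seq, A, pi))  # about 1.5e-4
```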

Outline For Today: Time Series; Markov Models; 3 computational problems of HMMs; Applying HMMs in vision - Gesture; Summary.

[Diagram: the same Sunny/Rainy/Snowy weather chain with its transition probabilities, but now the states are NOT OBSERVABLE; only the outputs emitted from each hidden state, each with its own emission probabilities, are OBSERVABLE.]

Probability of a Time Series
Given the model below, what is the probability of this series of observations?
P(O) = P(O_coat, O_coat, O_umbrella, ..., O_umbrella)
     = Σ_{all Q} P(O | Q) P(Q) = Σ_{q_1,...,q_7} P(O | q_1, ..., q_7) P(q_1, ..., q_7)
     = (0.3 × 0.1 × ... × 0.6)(0.7 × 0.8 × ...) + ...
A = | 0.8  0.15 0.05 |
    | 0.38 0.6  0.02 |
    | 0.75 0.05 0.2  |
π = (0.7, 0.25, 0.05)
B = | 0.6  0.3  0.1  |
    | 0.05 0.3  0.65 |
    | 0    0.5  0.5  |

Specification of an HMM
N - number of states
Q = {q_1, q_2, ..., q_T} - sequence of states
Some form of output symbols:
  Discrete - a finite vocabulary of symbols of size M. One symbol is emitted each time a state is visited (or a transition is taken).
  Continuous - an output density in some feature space associated with each state, where an output is emitted with each visit.
For a given observation sequence O = {o_1, o_2, ..., o_T}, o_i is the observed symbol or feature at time i.

Specification of an HMM
A - the state transition probability matrix: a_ij = P(q_{t+1} = j | q_t = i)
B - the observation probability distribution:
  Discrete: b_j(k) = P(o_t = k | q_t = j), 1 ≤ k ≤ M
  Continuous: b_j(x) = p(o_t = x | q_t = j)
π - the initial state distribution: π(j) = P(q_1 = j)
A full HMM over a set of states and an output space is thus specified as a triple: λ = (A, B, π)
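
To make the triple λ = (A, B, π) concrete, here is a minimal sketch using the weather example; the observation symbols and the exact B values are my reading of the slide's example, so treat them as illustrative. The HMM dataclass name is mine.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    A: np.ndarray   # N x N transition matrix, A[i, j] = P(q_{t+1}=j | q_t=i)
    B: np.ndarray   # N x M observation matrix, B[j, k] = P(o_t=k | q_t=j)
    pi: np.ndarray  # length-N initial state distribution

# Hidden states: sunny, rainy, snowy; observations: shorts, coat, umbrella (illustrative).
weather = HMM(
    A=np.array([[0.80, 0.15, 0.05],
                [0.38, 0.60, 0.02],
                [0.75, 0.05, 0.20]]),
    B=np.array([[0.60, 0.30, 0.10],
                [0.05, 0.30, 0.65],
                [0.00, 0.50, 0.50]]),
    pi=np.array([0.70, 0.25, 0.05]),
)
```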

What does this have to do with Vision? Given some sequence of observations, what model generated it? Using the previous example: given some observation sequence of clothing, is this Philadelphia, Boston or Newark? Notice that if it were Boston vs. Arizona you would not need the sequence!

Outline For Today: Time Series; Markov Models; 3 computational problems of HMMs; Applying HMMs in vision - Gesture; Summary.

The 3 great problems in HMM modelling:
1. Evaluation: Given the model λ = (A, B, π), what is the probability of occurrence of a particular observation sequence O = {o_1, ..., o_T}, i.e. P(O | λ)? This is the heart of the classification/recognition problem: I have a trained model for each of a set of classes; which one would most likely generate what I saw?
2. Decoding: Find the optimal state sequence to produce an observation sequence O = {o_1, ..., o_T}. Useful in recognition problems - it helps give meaning to states - which is not exactly legal but often done anyway.
3. Learning: Determine the model λ, given a training set of observations. Find λ such that P(O | λ) is maximal.

Problem 1: Naïve solution
State sequence Q = (q_1, ..., q_T)
Assume independent observations:
P(O | q, λ) = Π_{i=1}^{T} P(o_i | q_i, λ) = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)
NB: Observations are mutually independent, given the hidden states. That is, if I know the states then the previous observations don't help me predict a new observation. The states encode *all* the information. Usually only kind-of true - see CRFs.

Problem 1: Naïve solution
But we know the probability of any given sequence of states:
P(q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

Problem 1: Naïve solution
Given P(O | q, λ) = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T) and P(q | λ) = π_{q_1} a_{q_1 q_2} ... a_{q_{T-1} q_T}, we get:
P(O | λ) = Σ_q P(O | q, λ) P(q | λ)
NB: The above sum is over all state paths. There are N^T state paths, each costing O(T) calculations, leading to O(T N^T) time complexity.
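
A minimal sketch of this naïve evaluation (function name is mine), enumerating all N^T state paths. It is only practical for toy problems, which is exactly the slide's point about the O(T N^T) cost, but it is a useful correctness check for the efficient algorithms below.

```python
import itertools
import numpy as np

def naive_evaluate(obs, A, B, pi):
    """P(O | lambda) by brute force: sum P(O | q) P(q | lambda) over all N**T state paths."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        p_q = pi[q[0]] * np.prod([A[q[t - 1], q[t]] for t in range(1, T)])
        p_o_given_q = np.prod([B[q[t], obs[t]] for t in range(T)])
        total += p_q * p_o_given_q
    return total
```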

Problem 1: Efficient solution
Define the auxiliary forward variable α:
α_t(i) = P(o_1, ..., o_t, q_t = i | λ)
α_t(i) is the probability of observing the partial sequence of observables o_1, ..., o_t AND being in state q_t = i at time t.

Problem 1: Efficient solution
Recursive algorithm:
Initialise: α_1(i) = π_i b_i(o_1)
Calculate: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})
  (partial obs seq up to t AND state i at t) × (transition to j at t+1) × (sensor); the sum is there because we can reach j from any preceding state.
Obtain: P(O | λ) = Σ_{i=1}^{N} α_T(i), the sum over the different ways of getting the obs seq.
Complexity is only O(N²T)!

The Forward Algorithm (trellis diagram: states S_1, S_2, S_3 at each time step, observations O_1, O_2, ..., O_T)
α_t(i) = P(O_1, ..., O_t, q_t = S_i)
α_{t+1}(j) = P(O_1, ..., O_{t+1}, q_{t+1} = S_j)
           = Σ_{i=1}^{N} P(O_1, ..., O_t, q_t = S_i, O_{t+1}, q_{t+1} = S_j)
           = Σ_{i=1}^{N} P(O_1, ..., O_t, q_t = S_i) P(q_{t+1} = S_j | q_t = S_i) b_j(O_{t+1})
           = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(O_{t+1})
α_1(i) = π_i b_i(O_1)
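
The forward recursion above translates almost line for line into code. This is a minimal sketch (no log/scaling tricks, which the learning slides later note you would want in practice); on short sequences it should agree with naive_evaluate from earlier.

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward algorithm: returns alpha (T x N) and P(O | lambda) in O(N^2 T)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
    return alpha, alpha[-1].sum()                     # P(O | lambda) = sum_i alpha_T(i)
```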

Problem 1: Alternative solution
Backward algorithm: define the auxiliary backward variable β:
β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)
β_t(i) is the probability of observing the sequence of observables o_{t+1}, ..., o_T GIVEN state q_t = i at time t, and λ.

Problem 1: Alternative solution
Recursive algorithm:
Initialize: β_T(j) = 1
Calculate: β_t(i) = Σ_{j=1}^{N} β_{t+1}(j) a_ij b_j(o_{t+1}), for t = T-1, ..., 1
Terminate: p(O | λ) = Σ_{i=1}^{N} β_1(i) π_i b_i(o_1)
Complexity is O(N²T)
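
A matching sketch of the backward pass, under the same assumptions (no scaling); the final line is the cross-check that it yields the same P(O | λ) as the forward pass.

```python
import numpy as np

def backward(obs, A, B, pi):
    """Backward algorithm: returns beta (T x N) and P(O | lambda) as a cross-check."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                      # beta_T(j) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    prob = np.sum(pi * B[:, obs[0]] * beta[0])          # P(O | lambda)
    return beta, prob
```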

Forward-Backward
Optimality criterion: choose the states q_t that are individually most likely at each time t.
The probability of being in state i at time t:
γ_t(i) = p(q_t = i | O, λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)
(the numerator is p(O and q_t = i | λ); the denominator is p(O | λ))
α_t(i) accounts for the partial observation sequence o_1, o_2, ..., o_t
β_t(i) accounts for the remainder o_{t+1}, o_{t+2}, ..., o_T
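
Given the forward and backward sketches above, the per-time-step posterior γ is one line of normalization. This reuses the forward and backward functions defined earlier; the function name is mine.

```python
def state_posteriors(obs, A, B, pi):
    """gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda): individually most likely states."""
    alpha, prob = forward(obs, A, B, pi)
    beta, _ = backward(obs, A, B, pi)
    gamma = alpha * beta / prob         # T x N, each row sums to 1
    return gamma, gamma.argmax(axis=1)  # posteriors and the per-time best state
```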

Problem 2: Decoding
Choose the state sequence that maximises the probability of the observation sequence.
Viterbi algorithm - an inductive algorithm that keeps the best state sequence at each instant.
[Trellis diagram: states S_1, S_2, S_3 at each time step, observations O_1, O_2, O_3, O_4, ..., O_T]

Problem 2: Decoding
Viterbi algorithm: find the state sequence maximizing P(O, Q | λ), i.e. P(q_1, q_2, ..., q_T | O, λ).
Define the auxiliary variable δ:
δ_t(i) = max_{q_1,...,q_{t-1}} P(q_1, q_2, ..., q_t = i, o_1, o_2, ..., o_t | λ)
δ_t(i) is the probability of the most probable path ending in state q_t = i.

Problem 2: Decoding
Recurrent property: δ_{t+1}(j) = max_i (δ_t(i) a_ij) b_j(o_{t+1})
To get the state sequence, we need to keep track of the argument that maximises this, for each t and j. This is done via the array ψ_t(j).
Algorithm:
1. Initialise: δ_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N; ψ_1(i) = 0

Problem 2: Decoding
2. Recursion:
   δ_t(j) = max_{1≤i≤N} (δ_{t-1}(i) a_ij) b_j(o_t)
   ψ_t(j) = argmax_{1≤i≤N} (δ_{t-1}(i) a_ij)
   for 2 ≤ t ≤ T, 1 ≤ j ≤ N
3. Terminate:
   P* = max_{1≤i≤N} δ_T(i)
   q*_T = argmax_{1≤i≤N} δ_T(i)
P* gives the state-optimized probability.
Q* is the optimal state sequence (Q* = {q*_1, q*_2, ..., q*_T}).

Problem 2: Decoding
4. Backtrack state sequence: q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1
[Trellis diagram: states S_1, S_2, S_3, observations O_1, ..., O_T]
O(N²T) time complexity
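
A minimal sketch of the Viterbi recursion and backtrace (again without log-space arithmetic, which longer sequences would need in practice):

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state path: returns (best path, its probability P*)."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                  # delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)            # best predecessor for each j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                 # q*_T
    for t in range(T - 2, -1, -1):                # backtrack via psi
        path[t] = psi[t + 1][path[t + 1]]
    return path, delta[-1].max()
```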

Problem 3: Learning
Train the HMM to encode an observation sequence such that the HMM should identify a similar observation sequence in the future.
Find λ = (A, B, π) maximizing P(O | λ).
General algorithm:
1. Initialize: λ_0
2. Compute a new model λ, using λ_0 and the observed sequence O
3. Set λ_0 ← λ
4. Repeat steps 2 and 3 until: log P(O | λ) - log P(O | λ_0) < d

Problem 3: Learning
Step 1 of the Baum-Welch algorithm:
Let ξ_t(i,j) be the probability of being in state i at time t and in state j at time t+1, given λ and the observation sequence O:
ξ_t(i,j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)
         = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)
(the numerator is p(O and take i to j | λ); the denominator is p(O | λ); so ξ_t(i,j) = p(take i to j at time t | O, λ))

Problem 3: Learning
[Figure: operations required for the computation of the joint event that the system is in state S_i at time t and state S_j at time t+1.]

Problem 3: Learning
Let γ_t(i) be the probability of being in state i at time t, given O:
γ_t(i) = Σ_{j=1}^{N} ξ_t(i,j)
Σ_{t=1}^{T-1} γ_t(i) - expected no. of transitions from state i
Σ_{t=1}^{T-1} ξ_t(i,j) - expected no. of transitions i → j

Problem 3: Learning
Step 2 of the Baum-Welch algorithm:
π̂_i = γ_1(i): the expected frequency of state i at time t = 1
â_ij = Σ_t ξ_t(i,j) / Σ_t γ_t(i): the ratio of the expected no. of transitions from state i to j over the expected no. of transitions from state i
b̂_j(k) = Σ_{t: o_t = k} γ_t(j) / Σ_t γ_t(j): the ratio of the expected no. of times in state j observing symbol k over the expected no. of times in state j

Problem 3: Learning
The Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables α, β.
The B-W algorithm is a special case of the EM algorithm:
E-step: calculation of ξ and γ
M-step: iterative calculation of π̂, â_ij, b̂_j(k)
Practical issues: can get stuck in local maxima; numerical problems (use logs and scaling).
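
Putting the E-step (ξ, γ) and M-step (π̂, â, b̂) together, here is a compact sketch of one Baum-Welch iteration for a single discrete observation sequence, reusing the forward and backward sketches above. A real implementation would add the log/scaling tricks the slide mentions and iterate until the likelihood improvement falls below a threshold.

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM iteration of Baum-Welch for a single discrete observation sequence."""
    N, M, T = A.shape[0], B.shape[1], len(obs)
    alpha, prob = forward(obs, A, B, pi)
    beta, _ = backward(obs, A, B, pi)
    # E-step: xi[t, i, j] = p(q_t=i, q_{t+1}=j | O, lambda), gamma[t, i] = p(q_t=i | O, lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()
    gamma = alpha * beta / prob
    # M-step: re-estimate pi, A, B from expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, pi_new
```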

Now HMMs and Vision: Gesture Recognition

"Gesure recogniion"-like aciviies

Some thoughts about gesture: There is a conference on Face and Gesture Recognition, so obviously gesture recognition is an important problem. Prototype scenario: the subject does several examples of "each gesture"; the system "learns" (or is trained) to have some sort of model for each; at run time, compare the input to the known models and pick one. New-found life for gesture recognition:

Generic Gesture Recognition using HMMs: Nam, Y., & Wohn, K. (1996, July). Recognition of space-time hand-gestures using hidden Markov model. In ACM Symposium on Virtual Reality Software and Technology (pp. 51-58).

Generic gesture recognition using HMMs (1): Data glove

Generic gesture recognition using HMMs (2)

Generic gesture recognition using HMMs (3)

Generic gesture recognition using HMMs (4)

Generic gesture recognition using HMMs (5)

Wins and Losses of HMMs in Gesture
Good points about HMMs: a learning paradigm that acquires spatial and temporal models and does some amount of feature selection. Recognition is fast; training is not so fast, but not too bad.
Not-so-good points: if you know something about state definitions, it is difficult to incorporate. Every gesture is a new class, independent of anything else you've learned. -> Particularly bad for parameterized gesture.

Parameterized Gesture: "I caught a fish this big."

Parametric HMMs (PAMI, 1999)
Basic ideas: make the output probabilities of the state a function of the parameter of interest, so b_j(x) becomes b_j(x, θ). Maintain the same temporal properties (the transition probabilities unchanged). Train with known parameter values to solve for the dependence of b_j on θ. During testing, use EM to find the θ that gives the highest probability. That probability is the confidence in recognition; the best θ is the parameter estimate.
Issues: How to represent the dependence on θ? How to train given θ? How to test for θ? What are the limitations on the dependence on θ?

Linear PHMM - Representation
Represent the dependence on θ as linear movement of the mean of the Gaussians of the states. We need to learn W_j and μ_j for each state j. (ICCV '98)
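
As a rough sketch of the representation idea only (not the PHMM training or testing procedures): under the linear assumption each state's Gaussian mean is shifted linearly by the gesture parameter θ, i.e. b_j(x, θ) = N(x; μ_j + W_j θ, Σ_j). The function name, argument shapes, and covariance Σ_j here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def phmm_output_prob(x, theta, W_j, mu_j, Sigma_j):
    """Linear PHMM output density for state j: b_j(x, theta) = N(x; mu_j + W_j @ theta, Sigma_j)."""
    mean = mu_j + W_j @ theta   # the parameter theta linearly moves the state's mean
    return multivariate_normal.pdf(x, mean=mean, cov=Sigma_j)
```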

Linear PHMM - Training: Need to derive the EM equations for the linear parameters and proceed as normal:

Linear PHMM - Testing: Derive the EM equations with respect to θ. We are testing by EM! (i.e. iteratively): solve for γ_k given a guess for θ; solve for θ given a guess for γ_k.

How big was the fish?

Pointing: Pointing is the prototypical example of a parameterized gesture. Assuming two DOF, we can parameterize either by (x, y) or by (θ, φ). Under the linear assumption we must choose carefully. A generalized non-linear map would allow greater freedom. (ICCV '99)

Linear pointing results: Test for both recognition and recovery. If we prune based on legal θ (MAP via a uniform density):

Noise sensitivity: Compare an ad hoc procedure with PHMM parameter recovery (ignoring their recognition problem!).

HMMs and vision: HMMs capture sequencing nicely in a probabilistic manner. Moderate time to train, fast to test. More when we do activity recognition.