CS 4495 Computer Vision



1 CS 4495 Computer Vision: Hidden Markov Models. Aaron Bobick, School of Interactive Computing. [Figure: trellis of hidden states S1, S2, S3 unrolled over time above the observations O1, O2, O3, O4, ..., OT]

2 Administrivia PS 6 easier yes? Still due Tues Nov 25th, 11:55pm. And the final is still: Tues, December 9, 2:50pm - 5:40pm Room TBD?

3 Outline: Time Series; Markov Models; Hidden Markov Models; 3 computational problems of HMMs; Applying HMMs in vision: Gesture. Slides borrowed from UMd and elsewhere. Material from slides by Sebastian Thrun and Yair Weiss.

4 Audio Spectrum Audio Spectrum of the Song of the Prothonotary Warbler

5 Bird Sounds Prothonotary Warbler Chestnut-sided Warbler

6 Questions One Could Ask. What bird is this? (Time series classification.) How will the song continue? (Time series prediction.) Is this bird sick? (Outlier detection.) What phases does this song have? (Time series segmentation.)

7 Another Time Series Problem [Figure: stock price series for Cisco, General Electric, Intel, Microsoft]

8 Questions One Could Ask. Will the stock go up or down? (Time series prediction.) What type of stock is this (e.g., risky)? (Time series classification.) Is the behavior abnormal? (Outlier detection.)

9 Music Analysis

10 Questions One Could Ask. Is this Beethoven or Bach? (Time series classification.) Can we compose more of that? (Time series prediction/generation.) Can we segment the piece into themes? (Time series segmentation.)

11 What does this have to do with vision?

12 For vision: Waving, pointing, controlling?

13 The Real Questions. How do we model these problems? How do we formulate these questions as inference/learning problems?

14 Outline For Today: Time Series; Markov Models; Hidden Markov Models; 3 computational problems of HMMs; Applying HMMs in vision: Gesture; Summary

15 Weather: A Markov Model (maybe?) [Figure: state diagram over Sunny, Rainy, Snowy with edge labels 80%, 60%, 20%, 15%, 38%, 5%, 2%, 75%, 5%] Probability of moving to a given state depends only on the current state: 1st-order Markovian.

16 Ingredients of a Markov Model. States: $\{S_1, S_2, \ldots, S_N\}$. State transition probabilities: $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$. Initial state distribution: $\pi_i = P[q_1 = S_i]$. [Figure: weather state diagram over Sunny, Rainy, Snowy with edge labels 80%, 60%, 20%, 15%, 38%, 5%, 2%, 75%, 5%]

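17 Ingredients of Our Weather Markov Model. States: $\{S_{sunny}, S_{rainy}, S_{snowy}\}$. State transition probabilities: $A =$ [matrix not preserved]. Initial state distribution: $\pi =$ [values not preserved]. [Figure: weather state diagram over Sunny, Rainy, Snowy with edge labels 80%, 60%, 20%, 15%, 38%, 5%, 2%, 75%, 5%]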

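18 Probability of a Time Series. Given: [Figure: observed weather sequence sunny, rainy, rainy, rainy, snowy, snowy]. What is the probability of this series? $P(S_{sunny}) \cdot P(S_{rainy} \mid S_{sunny}) \cdot P(S_{rainy} \mid S_{rainy}) \cdot P(S_{rainy} \mid S_{rainy}) \cdot P(S_{snowy} \mid S_{rainy}) \cdot P(S_{snowy} \mid S_{snowy})$ [numeric values of the product, $A$, and $\pi$ not preserved]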
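The chain-rule product on this slide can be sketched in a few lines of Python. The numeric values of `PI` and `A` below are assumptions: they are chosen to be consistent with the percentages printed on the weather diagram, but the transcript does not confirm which edge carries which number, so treat them as illustrative only.

```python
# Hypothetical weather chain. PI and A are assumed values, not confirmed by the slides.
PI = {"sunny": 0.70, "rainy": 0.25, "snowy": 0.05}  # assumed initial distribution
A = {  # assumed transition probabilities: A[current][next]
    "sunny": {"sunny": 0.80, "rainy": 0.15, "snowy": 0.05},
    "rainy": {"sunny": 0.38, "rainy": 0.60, "snowy": 0.02},
    "snowy": {"sunny": 0.75, "rainy": 0.05, "snowy": 0.20},
}

def sequence_probability(seq):
    """P(q_1) * product over t of P(q_{t+1} | q_t) for a fully observed state sequence."""
    p = PI[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

series = ["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]
print(sequence_probability(series))  # roughly 1.5e-4 under the assumed numbers
```

Because every state is observed, the cost is one multiplication per time step; no summation over alternatives is needed.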

19 Outline For Today: Time Series; Markov Models; Hidden Markov Models; 3 computational problems of HMMs; Applying HMMs in vision: Gesture; Summary

20 Markov Models [Figure: weather state diagram over Sunny, Rainy, Snowy with edge labels 80%, 60%, 20%, 15%, 38%, 5%, 2%, 75%, 5%]

21 Hidden Markov Models [Figure: the weather chain over Sunny, Rainy, Snowy is marked NOT OBSERVABLE; each hidden state emits OBSERVABLE symbols with the emission percentages labeled on the diagram]

22 Probability of a Time Series Given: What is the probability of this series?

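23 Probability of a Time Series. Given: $P(O) = P(O_{coat}, O_{coat}, O_{umbrella}, \ldots, O_{umbrella}) = \sum_{\text{all } Q} P(O \mid Q)\, P(Q) = \sum_{q_1, \ldots, q_T} P(O \mid q_1, \ldots, q_T)\, P(q_1, \ldots, q_T)$. The first term of the sum is the all-sun path. [Numeric values of the terms, $A$, $\pi$, and $B$ not preserved]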

24 Specification of an HMM. N: number of states. $Q = \{q_1; q_2; \ldots; q_T\}$: sequence of states. T: number of time steps (observations). Some form of output symbols. Discrete: a finite vocabulary of symbols of size M; one symbol is emitted each time a state is visited (or a transition is taken). Continuous: an output density in some feature space is associated with each state, and an output is emitted with each visit. Consider a given observation sequence (or set) $O = \{o_1; o_2; \ldots; o_T\}$, where $o_i$ is the observed symbol or feature at time i.

25 Specification of an HMM: $\lambda = (A, B, \pi)$. A: the state transition probability matrix, $a_{ij} = P(q_{t+1} = j \mid q_t = i)$. B: the observation probability distribution. Discrete: $b_j(k) = P(o_t = k \mid q_t = j)$, $1 \le k \le M$. Continuous: $b_j(x) = p(o_t = x \mid q_t = j)$. π: the initial state distribution, $\pi(j) = P(q_1 = j)$. A full HMM over a set of states and an output space is thus specified as a triplet: $\lambda = (A, B, \pi)$. [Figure: three-state diagram S1, S2, S3]
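To make the triplet concrete, here is a minimal sketch of a discrete HMM used as a generator: draw the first state from π, emit a symbol from the row of B for that state, then move according to the row of A. The two-state, three-symbol model and the helper names `draw` and `sample_hmm` are illustrative assumptions, not part of the lecture.

```python
import random

# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]            # A[i][j] = P(q_{t+1} = j | q_t = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # B[j][k] = P(o_t = k | q_t = j)
pi = [0.6, 0.4]                         # pi[j] = P(q_1 = j)

def draw(dist, rng):
    """Draw an index with probability given by the entries of dist."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]

def sample_hmm(A, B, pi, T, rng):
    """Generate T hidden states and T observations from lambda = (A, B, pi)."""
    states, observations = [], []
    q = draw(pi, rng)
    for _ in range(T):
        states.append(q)
        observations.append(draw(B[q], rng))  # emit one symbol per visit
        q = draw(A[q], rng)                   # then take a transition
    return states, observations

rng = random.Random(0)
states, observations = sample_hmm(A, B, pi, T=10, rng=rng)
print(states)
print(observations)
```

In practice only `observations` would be available; `states` is the hidden part that the three problems below reason about.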

26 What does this have to do with Vision? Given some sequence of observations, what model generated those?

27 What does this have to do with Vision? Suppose we had two different models of weather: Boston, Massachusetts and Philadelphia, PA. Given some observation sequence of clothing, is this Boston or Philly? Mark Twain: "If you don't like the weather in New England, wait 5 minutes." So the statistics of the sequence matter. Notice that for Boston vs. Phoenix we would not need the sequence: the two are distinguishable from the distribution of observations alone, like a bag-of-visual-words (BOVW) distribution.

28 Outline For Today: Time Series; Markov Models; Hidden Markov Models; 3 computational problems of HMMs; Applying HMMs in vision: Gesture; Summary

29 The 3 great problems in HMM modelling. 1. Evaluation, $P(O \mid \lambda)$: given the model $\lambda = (A, B, \pi)$, what is the probability of occurrence of a particular observation sequence $O = o_1, \ldots, o_T$? (This is the heart of the classification/recognition problem: I have a trained model for each of a set of classes; which one would most likely generate what I saw?)
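The classification use of evaluation can be sketched as: score the observation sequence under each class model and pick the highest likelihood. The two models below (a fast-switching "boston" and a sticky "philly"), their numbers, and the names `likelihood` and `classify` are all hypothetical; the likelihood is computed here with the forward recursion, one standard way of evaluating $P(O \mid \lambda)$.

```python
def likelihood(model, obs):
    """P(O | lambda) for a discrete HMM, via the forward recursion."""
    A, B, pi = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

COAT, UMBRELLA = 0, 1
EMISSION = [[0.9, 0.1], [0.1, 0.9]]  # assumed: state 0 mostly coat, state 1 mostly umbrella
MODELS = {  # assumed transition behaviour; not taken from the slides
    "boston": ([[0.5, 0.5], [0.5, 0.5]], EMISSION, [0.5, 0.5]),  # weather flips freely
    "philly": ([[0.9, 0.1], [0.1, 0.9]], EMISSION, [0.5, 0.5]),  # weather persists
}

def classify(obs):
    """Return the name of the model with the highest P(O | lambda)."""
    return max(MODELS, key=lambda name: likelihood(MODELS[name], obs))

observed = [COAT, UMBRELLA, COAT, UMBRELLA, COAT, UMBRELLA]
print(classify(observed))
```

Both hypothetical models produce coats and umbrellas equally often in the long run, so only the ordering of the observations separates them.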

30 The 3 great problems in HMM modelling. 2. Decoding: find the optimal state sequence to produce an observation sequence $O = \{o_1, \ldots, o_T\}$. Useful in recognition problems: helps give meaning to states. But grad students: see CRFs.

31 The 3 great problems in HMM modelling. 2. Decoding: find the optimal state sequence to produce an observation sequence $O = \{o_1, \ldots, o_T\}$. Useful in recognition problems: helps give meaning to states. But grad students: see CRFs. 3. Learning: determine the model λ given a training set of observations. Find λ such that $P(O \mid \lambda)$ is maximal. (Why?)

32 Problem 1: Naïve solution. State sequence $Q = (q_1, \ldots, q_T)$. Assume independent observations: $P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$. NB: Observations are mutually independent, given the hidden states. That is, if I know the states, then the previous observations don't help me predict a new observation. The states encode *all* the information. Usually only kind-of true; see CRFs.

33 Problem 1: Naïve solution. But we know the probability of any given sequence of states: $P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$

34 Problem 1: Naïve solution. Given: $P(O \mid q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$ and $P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$. Then $P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$. NB: The above sum is over all state paths. There are $N^T$ state paths, each costing $O(T)$ calculations, leading to $O(T N^T)$ time complexity. Big number.
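The naïve sum over all state paths can be written directly, which is handy as a reference for checking faster algorithms on tiny inputs. This is a minimal sketch; the two-state model and the function name `brute_force_likelihood` are illustrative assumptions.

```python
from itertools import product

# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

def brute_force_likelihood(A, B, pi, obs):
    """Sum P(O | q, lambda) * P(q | lambda) over all N**T state paths."""
    N, T = len(pi), len(obs)
    total = 0.0
    for path in product(range(N), repeat=T):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, T):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

print(brute_force_likelihood(A, B, pi, obs))
print(len(pi) ** len(obs), "paths enumerated")
```

With N = 2 and T = 3 there are only 8 paths, but the count grows as $N^T$, so this is only usable for very short sequences.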

35 Problem 1: Efficient solution. Define the auxiliary forward variable α: $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$. $\alpha_t(i)$ is the probability of observing the partial sequence of observables $o_1, \ldots, o_t$ AND being in state $q_t = i$ at time t.

36 Problem 1: Efficient solution. Forward recursive algorithm. Initialise: $\alpha_1(i) = \pi_i\, b_i(o_1)$. Each time step (can reach j from any preceding state): $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$. Conclude: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$. Complexity: $O(N^2 T)$. The probability of the entire observation sequence is just the sum, over all i, of the probability of seeing the observations and ending up in state i.
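The three steps above (initialise, recurse, conclude) can be sketched as follows. The model values and the function name `forward` are illustrative assumptions; observations are symbol indices into the columns of `B`.

```python
# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

def forward(A, B, pi, obs):
    """Return (alpha, P(O | lambda)), where alpha[t][i] covers o_1..o_{t+1} (0-based t)."""
    N = len(pi)
    # Initialise: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    # Each time step: sum over all predecessors i, then emit the next symbol from j
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    # Conclude: sum over the final states
    return alpha, sum(alpha[-1])

alpha, prob = forward(A, B, pi, obs)
print(prob)
```

Each time step does N sums of N terms, which gives the $O(N^2 T)$ cost quoted on the slide, compared with $O(T N^T)$ for path enumeration.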

37 The Forward Algorithm. Trellis Diagram [Figure: trellis of states S1, S2, S3 over observations O1, O2, O3, O4, ..., OT]. $\alpha_t(i) = P(O_1, \ldots, O_t, q_t = S_i)$, with $\alpha_1(i) = \pi_i\, b_i(O_1)$. Then $\alpha_{t+1}(j) = P(O_1, \ldots, O_{t+1}, q_{t+1} = S_j) = \sum_{i=1}^{N} P(O_1, \ldots, O_{t+1}, q_{t+1} = S_j \mid O_1, \ldots, O_t, q_t = S_i)\, P(O_1, \ldots, O_t, q_t = S_i) = \sum_{i=1}^{N} P(O_{t+1}, q_{t+1} = S_j \mid q_t = S_i)\, \alpha_t(i) = \sum_{i=1}^{N} b_j(O_{t+1})\, a_{ij}\, \alpha_t(i)$

38 Problem 1: Alternative solution. Backward algorithm: define the auxiliary backward variable β: $\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$. $\beta_t(i)$ is the probability of observing the sequence of observables $o_{t+1}, \ldots, o_T$ GIVEN state $q_t = i$ at time t, and λ.

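39 Problem 1: Alternative solution. Backward recursive algorithm. Initialize: $\beta_T(j) = 1$. Calculate: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$ for $t = T-1, \ldots, 1$. Terminate: $p(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$; the factor $\pi_i\, b_i(o_1)$ is needed because $\beta_1(i)$ does not account for the first state or the first observation. Complexity is $O(N^2 T)$.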
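A minimal sketch of the backward pass follows. Since $\beta_1(i)$ covers only $o_2, \ldots, o_T$ given $q_1 = i$, the likelihood is recovered here by weighting it with $\pi_i\, b_i(o_1)$. The model values and the function name `backward` are illustrative assumptions.

```python
# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

def backward(A, B, obs):
    """Return beta, where beta[t][i] = P(later observations | state i at step t)."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]  # initialise: beta_T(j) = 1
    for t in range(T - 2, -1, -1):        # t = T-1, ..., 1 in the slide's 1-based indexing
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

beta = backward(A, B, obs)
# Terminate: fold in the initial distribution and the first emission.
prob = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(prob)
```

On the same inputs this agrees with the forward pass, which is a useful sanity check when implementing both.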

40 Forward-Backward. Optimality criterion: choose the states $q_t$ that are individually most likely at each time t. Define $\gamma_t(i)$ as the probability of being in state i at time t: $\gamma_t(i) = p(q_t = i \mid O, \lambda) = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} = \dfrac{p(O \text{ and } q_t = i \mid \lambda)}{p(O \mid \lambda)}$. $\alpha_t(i)$ accounts for the partial observation sequence $o_1, o_2, \ldots, o_t$; $\beta_t(i)$ accounts for the remainder $o_{t+1}, o_{t+2}, \ldots, o_T$.
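Combining the two passes gives the per-time-step state posteriors. The sketch below recomputes α and β inline so that it stands on its own; the model values and the function name `state_posteriors` are illustrative assumptions.

```python
# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

def state_posteriors(A, B, pi, obs):
    """Return gamma, where gamma[t][i] = P(q_t = i | O, lambda)."""
    N, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    gamma = []
    for t in range(T):
        joint = [alpha[t][i] * beta[t][i] for i in range(N)]  # P(O, q_t = i | lambda)
        norm = sum(joint)                                     # equals P(O | lambda)
        gamma.append([x / norm for x in joint])
    return gamma

gamma = state_posteriors(A, B, pi, obs)
most_likely = [max(range(len(pi)), key=lambda i: g[i]) for g in gamma]
print(gamma)
print(most_likely)
```

Picking the individually most likely state at each step, as `most_likely` does, need not yield a valid path when some transitions have zero probability; that is one reason the decoding problem is treated separately.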

41 Problem 2: Decoding. Choose the state sequence that maximises the probability of the observation sequence. Viterbi algorithm: an inductive algorithm that keeps the best state sequence at each instance. [Figure: trellis of states S1, S2, S3 over observations O1, O2, O3, O4, ..., OT]

42 Problem 2: Decoding. Viterbi algorithm: find the state sequence that maximizes $P(O, Q \mid \lambda)$, equivalently $P(q_1, q_2, \ldots, q_T \mid O, \lambda)$. Define the auxiliary variable δ: $\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_t = i, o_1, o_2, \ldots, o_t \mid \lambda)$. $\delta_t(i)$ is the probability of the most probable path ending in state $q_t = i$.

43 Problem 2: Decoding. Recurrent property: $\delta_{t+1}(j) = \max_i \left( \delta_t(i)\, a_{ij} \right) b_j(o_{t+1})$. Algorithm: 1. Initialise: $\delta_1(i) = \pi_i\, b_i(o_1)$, $\psi_1(i) = 0$, $1 \le i \le N$. To get the state sequence, we need to keep track of the argument that maximizes this, for each t and j. This is done via the array $\psi_t(j)$.

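44 Problem 2: Decoding. 2. Recursion: $\delta_t(j) = \max_{1 \le i \le N} \left( \delta_{t-1}(i)\, a_{ij} \right) b_j(o_t)$ and $\psi_t(j) = \arg\max_{1 \le i \le N} \left( \delta_{t-1}(i)\, a_{ij} \right)$, for $2 \le t \le T$, $1 \le j \le N$. 3. Terminate: $P^* = \max_{1 \le i \le N} \delta_T(i)$ and $q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$. $P^*$ gives the state-optimized probability; $Q^*$ is the optimal state sequence ($Q^* = \{q_1^*, q_2^*, \ldots, q_T^*\}$).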

45 Problem 2: Decoding. 4. Backtrack the state sequence: $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$. [Figure: trellis of states S1, S2, S3 over observations O1, O2, O3, O4, ..., OT] $O(N^2 T)$ time complexity.
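The four Viterbi steps (initialise, recurse, terminate, backtrack) can be sketched as below. The model values and the function name `viterbi` are illustrative assumptions; states are 0-based indices.

```python
# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

def viterbi(A, B, pi, obs):
    """Return (best_path, P*), the most probable state path and its joint probability."""
    N, T = len(pi), len(obs)
    # 1. Initialise
    delta = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    psi = [[0] * N]
    # 2. Recursion: keep the best predecessor for every (t, j)
    for t in range(1, T):
        d_row, p_row = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            p_row.append(best_i)
            d_row.append(delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]])
        delta.append(d_row)
        psi.append(p_row)
    # 3. Terminate
    last = max(range(N), key=lambda i: delta[T - 1][i])
    best_prob = delta[T - 1][last]
    # 4. Backtrack through the stored arg-max pointers
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, best_prob

path, best_prob = viterbi(A, B, pi, obs)
print(path, best_prob)
```

The only change from the forward recursion is that the sum over predecessors becomes a max, plus the bookkeeping array `psi` needed for backtracking.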

46 Problem 3: Learning. Train an HMM to encode an observation sequence such that the HMM should identify a similar observation sequence in future. Find $\lambda = (A, B, \pi)$ maximizing $P(O \mid \lambda)$. General algorithm: 1. Initialize: $\lambda_0$. 2. Compute a new model λ, using $\lambda_0$ and the observed sequence O. 3. Then set $\lambda_0 \leftarrow \lambda$. 4. Repeat steps 2 and 3 until $P(O \mid \lambda) - P(O \mid \lambda_0) < d$, or $\log P(O \mid \lambda) - \log P(O \mid \lambda_0) < d'$.

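47 Problem 3: Learning. Step 1 of the Baum-Welch algorithm: let $\xi_t(i, j)$ be the probability of being in state i at time t and in state j at time t+1, given λ and the observation sequence O: $\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = p(\text{take } i \text{ to } j \text{ at time } t \mid O, \lambda) = \dfrac{p(O \text{ and (take } i \text{ to } j) \mid \lambda)}{p(O \mid \lambda)}$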

48 Problem 3: Learning. $\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$ [Figure: operations required for the computation of the joint event that the system is in state $S_i$ at time t and state $S_j$ at time t+1]

49 Problem 3: Learning, the E step. Remember that $\gamma_t(i)$ is the probability of being in state i at time t. Also, given O: $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$. Expected number of transitions in O from state i: $\sum_{t=1}^{T-1} \gamma_t(i)$. Expected number of transitions $i \to j$: $\sum_{t=1}^{T-1} \xi_t(i, j)$.

50 Problem 3: Learning, the M step. Step 2 of the Baum-Welch algorithm: $\hat{\pi}_i = \gamma_1(i)$, the expected frequency of state i at time t = 1. $\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$, the ratio of the expected number of transitions from state i to j over the expected number of transitions from state i. $\hat{b}_j(k) = \dfrac{\sum_{t:\, o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$, the ratio of the expected number of times in state j observing symbol k over the expected number of times in state j.
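One full E step plus M step for a single discrete observation sequence can be sketched as follows. This is a minimal, unscaled version suitable only for short sequences; the starting model, the training sequence, and the function names are illustrative assumptions.

```python
def forward(A, B, pi, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

def baum_welch_step(A, B, pi, obs):
    """One EM iteration; returns (new_A, new_B, new_pi, likelihood under the old model)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    like = sum(alpha[-1])
    # E step: state posteriors gamma and transition posteriors xi
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M step: ratios of expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return new_A, new_B, new_pi, like

# Hypothetical starting guess and training sequence (assumptions for illustration).
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi0 = [0.6, 0.4]
train = [0, 1, 2, 2, 1, 0, 0, 2]

A1, B1, pi1, old_like = baum_welch_step(A0, B0, pi0, train)
new_like = sum(forward(A1, B1, pi1, train)[-1])
print(old_like, new_like)
```

Repeating `baum_welch_step` until the likelihood stops improving gives the iterative scheme outlined for Problem 3; the likelihood should not decrease from one iteration to the next.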

51 Problem 3: Learning. The Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables α, β. The B-W algorithm is a special case of the EM algorithm. E-step: use the current guess of λ to estimate ξ and γ. M-step: maximize likelihood, i.e. calculate $\pi$, $a_{ij}$, $b_j(k)$. Practical issues: can get stuck in local maxima; numerical problems (use logs and scaling).
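The numerical issue arises because α shrinks geometrically with T and eventually underflows to zero. One common remedy, sketched here under illustrative assumptions (model values and function name are not from the lecture), is to renormalise α at every step and accumulate the logs of the scale factors, which yields $\log P(O \mid \lambda)$ directly.

```python
import math

# Hypothetical 2-state, 3-symbol model; every number here is an assumption.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]

def forward_log_likelihood(A, B, pi, obs):
    """log P(O | lambda) via a forward pass that rescales alpha at each step."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                     for j in range(N)]
        scale = sum(alpha)                    # P(o_t | o_1..o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]    # keep alpha a proper distribution
    return log_lik

short = [0, 1, 2]
long_obs = [0, 1, 2] * 700  # 2100 steps: the unscaled product would underflow
print(forward_log_likelihood(A, B, pi, short))
print(forward_log_likelihood(A, B, pi, long_obs))
```

The product of the per-step scale factors equals $P(O \mid \lambda)$, so summing their logs loses nothing while keeping every intermediate value in a safe numeric range.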

52 HMMs: General. HMMs: generative probabilistic models of time series (with hidden state). Forward-Backward: algorithm for computing probabilities over hidden states. Given the forward-backward algorithms, you can also train the models. Best known methods in speech, computer vision, and robotics, though for really big data CRFs are winning.

53 Now HMMs and Vision: Gesture Recognition

54 "Gesture recognition"-like activities

55 Some thoughts about gesture. There is a conference on Face and Gesture Recognition, so obviously gesture recognition is an important problem. Prototype scenario: the subject does several examples of "each gesture"; the system "learns" (or is trained) to have some sort of model for each; at run time, compare the input to the known models and pick one.

56 New found life for gesture recognition:

57 Generic Gesture Recognition using HMMs Nam, Y., & Wohn, K. (1996, July). Recognition of space-time hand-gestures using hidden Markov model. In ACM symposium on Virtual reality software and technology (pp ).

58 Generic gesture recognition using HMMs (1) Data glove

59 Generic gesture recognition using HMMs (2)

60 Generic gesture recognition using HMMs (3)

61 Generic gesture recognition using HMMs (4)

62 Generic gesture recognition using HMMs (5)

63 Wins and Losses of HMMs in Gesture. Good points about HMMs: a learning paradigm that acquires spatial and temporal models and does some amount of feature selection. Recognition is fast; training is not so fast but not too bad. Not so good points: if you know something about state definitions, it is difficult to incorporate. Every gesture is a new class, independent of anything else you've learned. This is particularly bad for parameterized gesture.

64 Parameterized Gesture I caught a fish this big.

65 Parametric HMMs (PAMI, 1999). Basic ideas: make the output probabilities of the state a function of the parameter of interest, so $b_j(x)$ becomes $b_j(x, \theta)$. Maintain the same temporal properties: $a_{ij}$ unchanged. Train with known parameter values to solve for the dependencies of $b_j$ on θ. During testing, use EM to find the θ that gives the highest probability. That probability is the confidence in recognition; the best θ is the parameter. Issues: How to represent dependence on θ? How to train given θ? How to test for θ? What are the limitations on dependence on θ?

66 Linear PHMM - Representation. Represent the dependence on θ as linear movement of the mean of the Gaussians of the states: $\hat{\mu}_j(\theta) = W_j \theta + \mu_j$ and $P(x_t \mid q_t = j, \theta) = \mathcal{N}(x_t, \hat{\mu}_j(\theta), \Sigma_j)$. Need to learn $W_j$ and $\mu_j$ for each state j. (ICCV 98)
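The linear representation amounts to shifting each state's Gaussian mean by $W_j \theta$ before evaluating the emission density. The sketch below assumes a diagonal covariance purely to stay dependency-free (the slide allows a general $\Sigma_j$), and the function names and numbers are hypothetical.

```python
import math

def shifted_mean(W, mu, theta):
    """mu_hat(theta) = W @ theta + mu, with W given as a list of rows."""
    return [sum(W[r][c] * theta[c] for c in range(len(theta))) + mu[r]
            for r in range(len(mu))]

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (a simplifying assumption)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def phmm_log_emission(x, W, mu, var, theta):
    """log b_j(x, theta) for one state j with parameters (W, mu, var)."""
    return log_gaussian_diag(x, shifted_mean(W, mu, theta), var)

# Hypothetical state: 2-D feature, 1-D gesture parameter theta.
W_j = [[1.0], [2.0]]
mu_j = [0.5, -0.5]
var_j = [1.0, 1.0]

theta = [2.0]
print(shifted_mean(W_j, mu_j, theta))
print(phmm_log_emission([2.5, 3.5], W_j, mu_j, var_j, theta))
```

Plugging `phmm_log_emission` in place of a fixed $b_j(x)$ leaves the transition structure untouched, which matches the idea of keeping $a_{ij}$ unchanged.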

67 Linear PHMM - training Need to derive EM equations for linear parameters and proceed as normal:

68 Linear PHMM - testing. Derive EM equations with respect to θ. We are testing by EM (i.e., iteratively): solve for $\gamma_{tk}$ given a guess for θ; solve for θ given a guess for $\gamma_{tk}$.
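As a sketch of what the "solve for θ" step can look like under the linear-mean model $\hat{\mu}_k(\theta) = W_k \theta + \mu_k$: with the state posteriors $\gamma_{tk}$ held fixed, maximizing the expected log-likelihood in θ is a weighted least-squares problem. The derivation below follows from that assumption and is not copied from the slides, so the exact form used in the original work may differ.

```latex
% Expected log-likelihood terms that depend on \theta (\gamma_{tk} held fixed):
Q(\theta) = -\tfrac{1}{2} \sum_{t=1}^{T} \sum_{k=1}^{N} \gamma_{tk}\,
  \bigl(x_t - W_k\theta - \mu_k\bigr)^{\top} \Sigma_k^{-1}
  \bigl(x_t - W_k\theta - \mu_k\bigr) + \text{const}

% Setting \partial Q / \partial \theta = 0:
\sum_{t,k} \gamma_{tk}\, W_k^{\top} \Sigma_k^{-1} \bigl(x_t - W_k\theta - \mu_k\bigr) = 0

% Closed-form update for \theta:
\theta = \Bigl[\sum_{t,k} \gamma_{tk}\, W_k^{\top} \Sigma_k^{-1} W_k\Bigr]^{-1}
         \Bigl[\sum_{t,k} \gamma_{tk}\, W_k^{\top} \Sigma_k^{-1} (x_t - \mu_k)\Bigr]
```

Alternating this update with a forward-backward pass that recomputes $\gamma_{tk}$ under the new θ gives the iterative test-time procedure described on the slide.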

69 How big was the fish?

70 Pointing Pointing is the prototypical example of a parameterized gesture. Assuming two DOF, can parameterize either by (x,y) or by (θ,φ). Under linear assumption must choose carefully. A generalized non-linear map would allow greater freedom. (ICCV 99)

71 Linear pointing results Test for both recognition and recovery: If prune based on legal θ (MAP via uniform density) :

72 Noise sensitivity Compare ad hoc procedure with PHMM parameter recovery (ignoring their recognition problem!!).

73 Validating EM testing of Θ

74 HMMs and vision. HMMs capture sequencing nicely in a probabilistic manner. They are generative models. Forward-Backward: recursive algorithms for computing probabilities over hidden states. Learning models: EM iterates between estimation of the hidden state and model fitting. Moderate time to train, fast to test. Recent resurgence in the robotics universe of action recognition, where there are many fewer examples than discriminative methods may require. Numerous extensions exist (continuous observations and states; factorial HMMs; controllable HMMs = POMDPs; ...).

75 The End
