Hidden Markov Models
CI/CI(CS) UE, SS 2015
Christian Knoll
Signal Processing and Speech Communication Laboratory, Graz University of Technology
June 23, 2015
Content
- Motivation/Goals
- Modeling of sequential data
- Definition of (Hidden) Markov Models
- Parameters of HMMs
- Calculations / Illustrations / Algorithms
- Examples
Motivation
For a list of samples, the joint likelihood factorizes:
p(X | Θ) = ∏_{n=1}^{N} p(x_n | Θ)
This requires independence of the samples x_n (i.i.d.)!
[Figure: white noise (independent samples) vs. a speech signal (strongly correlated samples), 200 samples each]
A strong requirement that might not be met! How do we model dependencies?
Example Weather (1)
Observations: Q = {q_1, ..., q_5}, the weather on five specific days
Data modeled as i.i.d.: the probability of rain on one day depends only on the relative frequency of rain in Q
Correlations/trends over multiple days are ignored
Example Weather (2)
Weather on day n: q_n, one of three weather states
We are interested in the following conditional probability:
P(q_n | q_{n-1}, q_{n-2}, ..., q_1)
How probable is rain on day n, given the observations above, e.g.
P(q_4 = Rain | q_3, q_2, q_1)?
One possibility: estimate it from prior observations of the sequence {q_1, q_2, q_3, q_4}
Problem: the number of such statistics grows exponentially with the sequence length
Simplification: Markov Assumption
The following assumption simplifies the problem:
P(q_n | q_{n-1}, q_{n-2}, ..., q_1) = P(q_n | q_{n-1})
The weather on day n depends only on the weather on day n-1: the first-order Markov assumption
A model using this assumption is called a Markov model; an output sequence of it is a Markov chain
Probability of a sequence {q_n, ..., q_2, q_1}:
P(q_n, ..., q_1) = P(q_1) ∏_{i=2}^{n} P(q_i | q_{i-1})
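Under the Markov assumption, the chain probability is just a product of table look-ups. A minimal Python sketch (the course itself uses Matlab); it uses the 3-state transition table from the weather example, and the uniform distribution over the first day's state is an assumption for illustration:

```python
# Transition table from the weather example: A[i][j] = P(tomorrow = s_j | today = s_i).
A = [[0.8, 0.05, 0.15],
     [0.2, 0.60, 0.20],
     [0.2, 0.30, 0.50]]
# Assumed uniform distribution over the first day's state (not given on the slides).
pi = [1/3, 1/3, 1/3]

def chain_prob(states):
    """P(q_1, ..., q_n) = P(q_1) * prod_{i=2}^{n} P(q_i | q_{i-1})."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p
```

For example, chain_prob([0, 0, 1]) multiplies P(q_1 = s_1) · a_{1,1} · a_{1,2}.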
Simplification: Markov Assumption (2)
Now only 3 × 3 = 9 statistics are necessary. Arranged as a table, they hold the transition probabilities P(q_i | q_{i-1}):

Weather today | Weather tomorrow (s_1, s_2, s_3)
s_1           | 0.8   0.05  0.15
s_2           | 0.2   0.6   0.2
s_3           | 0.2   0.3   0.5

Each row sums to 1: tomorrow there will certainly be some weather.
Representation as an automaton: blackboard
Computation of Probabilities
Reminder, the product rule: P(X, Y) = P(Y | X) P(X)
Tomorrow, given yesterday and today: P(q_3 | q_2, q_1)
Tomorrow and the day after tomorrow, given today: P(q_3, q_2 | q_1)
The day after tomorrow, given today: P(q_3 | q_1)
Blackboard
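The last case, P(q_3 | q_1), marginalizes out the unknown middle day. A sketch, again with the transition table of the weather example:

```python
# A[i][j] = P(next = s_j | current = s_i) from the weather example.
A = [[0.8, 0.05, 0.15],
     [0.2, 0.60, 0.20],
     [0.2, 0.30, 0.50]]

def two_step(i, j):
    """P(q_3 = s_j | q_1 = s_i) = sum_k P(q_3 = s_j | q_2 = s_k) P(q_2 = s_k | q_1 = s_i)."""
    return sum(A[i][k] * A[k][j] for k in range(3))
```

This is just entry (i, j) of the matrix product A·A; n-step predictions are the corresponding powers of A.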
Hidden Markov Models (HMMs)
The states (weather) of the system are no longer directly observable
Instead, we get observations that depend on the states (emission probabilities)
E.g. a prison guard either carries an umbrella with him or not:

Weather q_i | P(umbrella | q_i)
s_1         | 0.1
s_2         | 0.8
s_3         | 0.3

Hidden states q_i ∈ {s_1, s_2, s_3} and observations x_i ∈ {umbrella, no umbrella}
We want P(q_i | x_i); by Bayes' rule:
P(q_i | x_i) = P(x_i | q_i) P(q_i) / P(x_i)
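For a single day, the Bayes inversion above can be computed directly. A sketch using the umbrella probabilities from the table, with a uniform prior P(q_i) = 1/3 assumed for illustration:

```python
# P(umbrella | q_i) for the three weather states, as in the table above.
likelihoods = [0.1, 0.8, 0.3]
prior = [1/3, 1/3, 1/3]  # assumed uniform; the slides do not specify P(q_i)

def posterior(likelihoods, prior):
    """P(q_i | x) = P(x | q_i) P(q_i) / P(x), with P(x) = sum_i P(x | q_i) P(q_i)."""
    joint = [l * p for l, p in zip(likelihoods, prior)]
    evidence = sum(joint)
    return [j / evidence for j in joint]
```

Seeing the umbrella shifts the belief strongly towards the state with P(umbrella | q_i) = 0.8.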
HMMs Definition and Parameters
An HMM consists of a set of N_s states S = {s_1, ..., s_{N_s}} and parameters Θ = {π, A, B}:
- π_i = P(q_1 = s_i): a-priori probabilities of the states (first state in a sequence)
- a_{i,j} = P(q_{n+1} = s_j | q_n = s_i): transition probabilities
- the entries of B are the emission probabilities:
  - discrete observations x_n ∈ {v_1, ..., v_K}: b_{i,k} = P(x_n = v_k | q_n = s_i) is the probability of observing x_n = v_k in state s_i
  - continuous observations, e.g. x_n ∈ R^D: b_i(x_n) = p(x_n | q_n = s_i) (PDFs of the observations)
HMMs Definition and Parameters: Transition Matrix A
a_{i,j} = P(q_{n+1} = s_j | q_n = s_i), row: state at time n, column: state at time n+1

    | 0.8  0.05  0.15 |
A = | 0.2  0.6   0.2  |
    | 0.2  0.3   0.5  |
HMMs Definition and Parameters: Emission Matrix B
x_n ∈ {v_1 = umbrella, v_2 = no umbrella}
b_{i,k} = P(x_n = v_k | q_n = s_i)

    | 0.1  0.9 |
B = | 0.8  0.2 |
    | 0.3  0.7 |
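With π, A, and B fixed, the generative process of the HMM is easy to simulate. A sketch (the course uses Matlab; here Python, with a uniform π assumed and 0-based state/observation indices):

```python
import random

A  = [[0.8, 0.05, 0.15], [0.2, 0.6, 0.2], [0.2, 0.3, 0.5]]  # transition probabilities
B  = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]                    # emissions: umbrella / no umbrella
pi = [1/3, 1/3, 1/3]                                         # assumed start distribution

def sample_hmm(n, seed=0):
    """Draw a hidden state sequence q and an observation sequence x of length n."""
    rng = random.Random(seed)
    draw = lambda probs: rng.choices(range(len(probs)), weights=probs)[0]
    q = [draw(pi)]
    for _ in range(n - 1):
        q.append(draw(A[q[-1]]))    # next state depends only on the current one
    x = [draw(B[s]) for s in q]     # each observation depends only on its state
    return q, x
```

Note that only x would be visible to an observer; q stays hidden.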
HMMs Useful formulas
Most of this is already known; here it is just in HMM notation.
Probability of a state sequence Q = {q_1, ..., q_N}:
P(Q | Θ) = π_{q_1} ∏_{n=1}^{N-1} a_{q_n, q_{n+1}}
Probability of an observation sequence given a state sequence:
P(X | Q, Θ) = ∏_{n=1}^{N} b_{q_n, x_n}
Joint likelihood of X and Q (product rule):
P(X, Q | Θ) = P(X | Q, Θ) P(Q | Θ)
Likelihood of an observation sequence given an HMM:
P(X | Θ) = Σ_{all Q} P(X, Q | Θ)
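The last formula can be evaluated by brute force, summing the joint likelihood over all N_s^N state sequences (fine for tiny examples; the forward algorithm does this efficiently). A sketch with the weather/umbrella parameters and an assumed uniform π:

```python
from itertools import product

A  = [[0.8, 0.05, 0.15], [0.2, 0.6, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
pi = [1/3, 1/3, 1/3]  # assumed start distribution

def joint(Q, X):
    """P(X, Q | Theta) = P(Q | Theta) * P(X | Q, Theta)."""
    p = pi[Q[0]] * B[Q[0]][X[0]]
    for n in range(1, len(Q)):
        p *= A[Q[n - 1]][Q[n]] * B[Q[n]][X[n]]
    return p

def likelihood(X):
    """P(X | Theta) = sum over all state sequences Q of P(X, Q | Theta)."""
    return sum(joint(Q, X) for Q in product(range(3), repeat=len(X)))
```

Summed over all possible observation sequences of a fixed length, these likelihoods add up to 1, which is a handy sanity check.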
HMMs Trellis Diagram
Representation of an HMM and its parameters with time evolution.
[Figure: trellis with states 1-3 stacked vertically and time steps n = 1, 2, ..., N horizontally; edges between consecutive time steps carry the transition probabilities a_{i,j}, and each state i at time n emits the observation x_n with probability b_{i,k}; the observation sequence x_1, x_2, ..., x_N runs along the time axis]
HMMs with continuous observations
The observations are continuous; B might hold, e.g., the parameters of Gaussians.
Example here: speech recognition. The observations are feature vectors; a state could be a word element, the observation its feature vector.
[Figures: Gaussian observation PDFs over the feature (formant) plane F1/F2 for states s1, s2, s3]
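For continuous observations, the rows of B are replaced by density functions b_i(x). A sketch of a diagonal-covariance Gaussian emission over a 2-D feature vector; the means and variances below are invented for illustration, not taken from the slides:

```python
import math

def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density, used as b_i(x) = p(x | q_n = s_i)."""
    return math.prod(
        math.exp(-(xi - mi) ** 2 / (2.0 * vi)) / math.sqrt(2.0 * math.pi * vi)
        for xi, mi, vi in zip(x, mean, var)
    )

# Hypothetical emission density for one state over (F1, F2) formant features.
b_s1 = lambda x: gaussian_pdf(x, mean=[500.0, 1500.0], var=[100.0**2, 200.0**2])
```

In the algorithms that follow, the table look-up b_{i,k} is simply replaced by evaluating b_i(x_n).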
HMMs Examples
Matlab examples
HMMs 3 basic problems
1. Given an observation sequence X and a model Θ, calculate P(X | Θ), the probability that Θ generated the sequence; e.g. for classification; forward-backward algorithm
2. Given an observation sequence X and a model Θ, estimate the optimal state sequence Q: find the hidden states; Viterbi algorithm
3. Given an observation sequence X (training sequence), estimate the model Θ such that P(X | Θ) is maximized: the learning problem; e.g. Baum-Welch algorithm
We cover 1) and 2)
Sequence classification with HMMs
Given: an observation sequence X and multiple HMMs Θ_k
X could be extracted speech features; the HMMs model word elements
Problem: which of the HMMs generated the sequence?
Calculate the likelihood of X under every model; we already know the equation
P(X | Θ_k) = Σ_{all Q} P(X, Q | Θ_k)
i.e. the sum over all state sequences that the model allows
Matlab
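Classification then just evaluates P(X | Θ_k) for every model and picks the largest. A sketch with two invented single-state toy models (real word models would have many states and a structured topology):

```python
from itertools import product

def likelihood(X, model):
    """Brute-force P(X | Theta): sum the joint likelihood over all state sequences."""
    pi, A, B = model
    total = 0.0
    for Q in product(range(len(pi)), repeat=len(X)):
        p = pi[Q[0]] * B[Q[0]][X[0]]
        for n in range(1, len(X)):
            p *= A[Q[n - 1]][Q[n]] * B[Q[n]][X[n]]
        total += p
    return total

def classify(X, models):
    """Index of the model most likely to have generated X."""
    return max(range(len(models)), key=lambda k: likelihood(X, models[k]))

# Two hypothetical single-state models that mostly emit symbol 0 resp. symbol 1.
model_a = ([1.0], [[1.0]], [[0.9, 0.1]])
model_b = ([1.0], [[1.0]], [[0.1, 0.9]])
```

A sequence dominated by symbol 0 is assigned to model_a, one dominated by symbol 1 to model_b.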
Optimal State Sequence
Given: an observation sequence X and an HMM Θ
Problem: what is the optimal state sequence, i.e. the Q* that explains X best?
That is the Q which maximizes P(Q | X, Θ): the sequence with maximum a-posteriori probability
We can calculate the likelihood L(q_1, ..., q_n | x_1, ..., x_n) for every path (ML path), but the complexity grows exponentially with n
Viterbi algorithm (used in GSM, WLAN, satellite communication, speech recognition, ...)
Optimal state sequence Weather example
E.g. at n = 3: every state is possible, but per state exactly one path has the largest likelihood up to this point (the survivor path):
δ_3(s_1) = 0.0019, δ_3(s_2) = 0.0269, δ_3(s_3) = 0.0052
[Figure: trellis over the three states for the observation sequence x_1, x_2, x_3 at times n = 1, 2, 3]
Viterbi Algorithm General
Blackboard
Viterbi Algorithm Induction
Assume δ_n(i) holds the maximum likelihood of a path ending in q_n = s_i at time n.
The likelihood at the next time step follows directly:
δ_{n+1}(j) = max_i {δ_n(i) a_{i,j}} · b_{j, x_{n+1}}
and the most probable predecessor is
ψ_{n+1}(j) = argmax_i {δ_n(i) a_{i,j}}
If we also know δ_1(i), we are (almost) done. For the initialization we take the a-priori probabilities of the states:
δ_1(i) = π_i b_{i, x_1},   ψ_1(i) = 0
Viterbi Algorithm Weather example (1)
Most probable path to a state s_j at n = 2:
δ_2(s_j) = max(δ_1(s_1) a_{1,j}, δ_1(s_2) a_{2,j}, δ_1(s_3) a_{3,j}) · b_{j, x_2}
with e.g. δ_1(s_1) = 0.3; the maximizing predecessor is stored in ψ_2(s_j)
[Figure: trellis for the observation sequence x_1, x_2, x_3 at times n = 1, 2, 3]
Viterbi Algorithm Weather example (2)
The three most probable paths at n = 3:
δ_3(s_1) = 0.0019, δ_3(s_2) = 0.0269, δ_3(s_3) = 0.0052
[Figure: trellis with the three survivor paths for x_1, x_2, x_3 at n = 1, 2, 3]
Viterbi Algorithm Termination and Backtracking
From the δ_n(i) and ψ_n(i) we want to recover the optimal state sequence Q* = {q*_1, ..., q*_N}.
Last time step n = N:
p*(X | Θ) = max_{1 ≤ i ≤ N_s} δ_N(i)   (maximum likelihood)
q*_N = argmax_{1 ≤ i ≤ N_s} δ_N(i)   (last state of the ML path)
Given the last state, the prior ones can be traced back via ψ_n (backtracking):
q*_{n-1} = ψ_n(q*_n)
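Initialization, induction, termination, and backtracking fit together as follows; the parameters are those of the weather/umbrella example. The uniform π is an assumption (it matches δ_1 = 0.3 = (1/3)·0.9 in the worked example), and with the observation sequence (no umbrella, umbrella, umbrella) the recursion reproduces the δ_3 values 0.0019, 0.0269, 0.0052 from the slides:

```python
A  = [[0.8, 0.05, 0.15], [0.2, 0.6, 0.2], [0.2, 0.3, 0.5]]
B  = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]   # columns: umbrella, no umbrella
pi = [1/3, 1/3, 1/3]                         # assumed start distribution
Ns = 3

def viterbi(X):
    """Return the ML state path q* (0-based) and the table of delta_n(i) values."""
    # Initialization: delta_1(i) = pi_i * b_{i, x_1}, psi_1(i) = 0
    delta = [[pi[i] * B[i][X[0]] for i in range(Ns)]]
    psi = [[0] * Ns]
    # Induction: delta_{n+1}(j) = max_i(delta_n(i) * a_{i,j}) * b_{j, x_{n+1}}
    for x in X[1:]:
        prev = delta[-1]
        delta.append([max(prev[i] * A[i][j] for i in range(Ns)) * B[j][x]
                      for j in range(Ns)])
        psi.append([max(range(Ns), key=lambda i: prev[i] * A[i][j])
                    for j in range(Ns)])
    # Termination: last state of the ML path; Backtracking: q*_{n-1} = psi_n(q*_n)
    path = [max(range(Ns), key=lambda i: delta[-1][i])]
    for n in range(len(X) - 1, 0, -1):
        path.append(psi[n][path[-1]])
    path.reverse()
    return path, delta
```

The ML path ends in the state with δ_3 = 0.0269 and is then traced back through the stored ψ entries.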
Viterbi Algorithm Weather example (3)
Backtracking of the optimal path:
δ_3(s_1) = 0.0019, δ_3(s_2) = 0.0269, δ_3(s_3) = 0.0052
[Figure: the optimal path traced backwards through the trellis for x_1, x_2, x_3 at n = 1, 2, 3]