15-381: Artificial Intelligence. Hidden Markov Models (HMMs)

Size: px

Start display at page:

Download "15-381: Artificial Intelligence. Hidden Markov Models (HMMs)"

Hortense Young
5 years ago
Views:

1 15-381: Artificial Intelligence Hidden Markov Models (HMMs)

2 What s wrong with Bayesian networks Bayesian networks are very useful for modeling joint distributions But they have their limitations: - Cannot account for temporal / sequence models - DAG s (no self or any other loops) This is not a valid Bayesian network!

3 Hidden Markov models Model a set of observation with a set of hidden states - Robot movement Observations: range sensor, visual sensor Hidden states: location (on a map) - Speech processing Observations: sound signals Hidden states: parts of speech, words - Biology Observations: DNA base pairs Hidden states: Genes

4 Hidden Markov models Model a set of observation with a set of hidden states - Robot movement Observations: range sensor, visual sensor Hidden states: location (on a map) - 1. Speech Hidden processing states generate observations Observations: sound signals 2. Hidden states transition to other hidden states Hidden states: parts of speech, words - Biology Observations: DNA base pairs Hidden states: Genes

5 Examples: Speech processing

CTGAAGAACAACTGGGAGTGTCGCTAC TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGG

6 Example: Biological data ATGAAGCTACTGTCTTCTATCGAACAAGCATGCG ATATTTGCCGACTTAAAAAGCTCAAG TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGT CTGAAGAACAACTGGGAGTGTCGCTAC TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGG GCACATCTGACAGAAGTGGAATCAAGG CTAGAAAGACTGGAACAGCTATTTCTACTGATTT TTCCTCGAGAAGACCTTGACATGATT

9 Example: Gambling on dice outcome Two dices, both skewed (output model). Can either stay with the same dice or switch to the second dice (transition mode) A B

10 A Hidden Markov model A set of states {s 1 s n } - In each time point we are in exactly one of these states denoted by q t Π i, the probability that we start at state s i A transition probability model, P(q t = s i q t-1 = s j ) A set of possible outputs Σ - In time point t we emit a symbol σ Σ An emission probability model, p(o t = σ s i ) A B

11 The Markov property A set of states {s 1 s n } - In each time point we are in exactly one of these states denoted by q t Π i, the probability that we start at state s i A transition probability model, P(q t = s i q t-1 = s j ) A set of possible outputs Σ An important aspect of this definitions is the Markov property: q - In time point t we emit a symbol o t Σ t+1 is conditionally independent of q t-1 (and any earlier time points) An emission given q t probability model, p(o t s i ) More formally P(q t+1 = s i q t = s j ) = P(q t+1 = s i q t = s j,q t-1 = 0.8 s j ) 0.8 A

12 What can we ask when using a HMM? A few examples: What dice is currently being used? What is the probability of a 6 in the next role? What is the probability of 6 in any of the next 3 roles? A B

13 Inference in HMMs Computing P(Q) and P(q t = s i ) - If we cannot look at observations Computing P(Q O) and P(q t = s i O) - When we have observation and care about the last state only Computing argmax Q P(Q O) - When we care about the entire path

14 What dice is currently being used? There where t rounds so far We want to determine P(q t = A) Lets assume for now that we cannot observe any outputs (we are blind folded) How can we compute this? A B 0.5

15 P(q t = A)? Simple answer: Lets determine P(Q) where Q is any path that ends in A Q = q 1, q t-1, A P(Q) = P(q 1, q t-1, A) = P(A q 1, q t-1 ) P(q 1, q t-1 ) = P(A q t-1 ) P(q 1, q t-1 ) = = P(A q t-1 ) P(q 2 q 1 ) P(q 1 ) Markov property! Initial probability A B

16 P(q t = A)? Simple answer: 1. Lets determine P(Q) where Q is any path that ends in A Q = q 1, q t-1, A P(Q) = P(q 1, q t-1, A) = P(A q 1, q t-1 ) P(q 1, q t-1 ) = P(A q t-1 ) P(q 1, q t-1 ) = = P(A q t-1 ) P(q 2 q 1 ) P(q 1 ) 2. P(q t = A) = ΣP(Q) where the sum is over all sets of t sates that end in A

17 P(q t = A)? Simple answer: 1. Lets determine P(Q) where Q is any path that ends in A Q = q 1, q t-1, A P(Q) = P(q 1, q t-1, A) = P(A q 1, q t-1 ) P(q 1, q t-1 ) = P(A q t-1 ) P(q 1, q t-1 ) = = P(A q t-1 ) P(q 2 q 1 ) P(q 1 ) 2. P(q t = A) = ΣP(Q) where the sum is over all sets of t sates that end in A Q: How many sets Q are there? A: A lot! (2 t-1 ) Not a feasible solution

18 P(q t = A), the smart way Lets define p t (i) = probability state i at time t = p(q t = s i ) We can determine p t (i) by induction 1. p 1 (i) = Π i 2. p t (i) =?

19 P(q t = A), the smart way Lets define p t (i) = probability state i at time t = p(q t = s i ) We can determine p t (i) by induction 1. p 1 (i) = Π i 2. p t (i) = Σ j p(q t = s i q t-1 = s j )p t-1 (j)

20 P(q t = A), the smart way Lets define p t (i) = probability state i at time t = p(q t = s i ) We can determine p t (i) by induction 1. p 1 (i) = Π i 2. p t (i) = Σ j p(q t = s i q t-1 = s j )p t-1 (j) This type of computation is called dynamic programming Time / state s1 t1.3 t2 t3 Complexity: O(n 2 *t) s2.7 Number of states in our HMM

21 Inference in HMMs Computing P(Q) and P(q t = s i ) Computing P(Q O) and P(q t = s i O) Computing argmax Q P(Q)

22 But what if we observe outputs? So far, we assumed that we could not observe the outputs In reality, we almost always can. v P(v A) P(v B) A B

23 But what if we observe outputs? So far, we assumed that we could not observe the outputs In reality, we almost always Does can. observing the sequence v P(v A) P(v B) , 6, 4, 5, 6, 6 Change our belief about the state? A B

24 P(q t = A) when outputs are observed We want to compute P(q t = A O 1 O t ) For ease of writing we will use the following notations (common in literature) a i,j = P(q t = s i q t-1 = s j ) b i (o t ) = P(o t s i )

25 P(q t = A) when outputs are observed We want to compute P(q t = A O 1 O t ) Lets start with a simpler question. Given a sequence of states Q, what is P(Q O 1 O t ) = P(Q O)? - It is pretty simple to move from P(Q) to P(q t = A ) - In some cases P(Q) is the more important question - Speech processing - NLP

26 P(Q O) We can use Bayes rule: P ( QO ) = P( O Q) P( Q) P( O) Easy, P(O Q) = P(o 1 q 1 ) P(o 2 q 2 ) P(o t q t )

27 P(Q O) We can use Bayes rule: P ( QO ) = P( O Q) P( Q) P( O) Easy, P(Q) = P(q 1 ) P(q 2 q 1 ) P(q t q t-1 )

28 P(Q O) We can use Bayes rule: P ( QO ) = P( O Q) P( Q) P( O) Hard!

29 P(O) What is the probability of seeing a set of observations: - An important question in it own rights, for example classification using two HMMs Define α t (i) = P(o 1, o 2, o t q t = s i ) α t (i) is the probability that we: 1. Observe o 1, o 2, o t 2. End up at state i How do we compute α t (i)?

30 Computing α t (i) " t +1 (i) = P(O 1 KO t +1 #q t +1 = s i ) = $ P(O 1 KO t #q t = s j #O t +1 #q t +1 = s i ) = j $ P(O t +1 #q t +1 = s i O 1 KO t #q t = s j )P(O 1 KO t #q t = s j ) = j $ P(O t +1 #q t +1 = s i O 1 KO t #q t = s j )" t ( j) = j $ P(O t +1 q t +1 = s i )P(q t +1 = s i q t = s j )" t ( j) = j $ j α 1 (i) = P(o 1 q 1 = i) = P(o 1 q 1 = s i )Π i We must be at a state in time t b i (O t +1 )a i, j " t ( j) chain rule Markov property

31 Example: Computing α 3 (B) We observed 2,3,6 α 1 (A) = P(2 q 1 = A) = P(2 q 1 = A)Π A =.2*.7 =.14, α 1 (B) =.1*.3 =.03 α 2 (A) = Σ j=a,,b b A (3)a j,a α 1 ( j)=.2*.8*.14+.2*.2*.03 = , α 2 (B) = α 3 (B) = Σ j=a,,b b B (6)a j,b α 2 ( j)=.3*.2* *.8*.0052 = Π A =0.7 Π b =0.3 v P(v A) P(v B) A B

32 Where we are We want to compute P(Q O) For this, we only need to compute P(O) We know how to compute α t (i) From here its easy α t (i) = P(o 1, o 2, o t q t = s i ) so P(O) = P(o 1, o 2, o t ) = Σ i P(o 1, o 2, o t q t = s i ) = Σ i α t (i) note that p(q t =s i o 1, o 2, o t ) = " t (i) # j " t (j) P(A B) = P(A B) / P(B)

33 Complexity How long does it take to compute P(Q O)? P(Q): O(n) P(O Q): O(n) P(O): O(n 2 t)

34 Inference in HMMs Computing P(Q) and P(q t = s i ) Computing P(Q O) and P(q t = s i O) Computing argmax Q P(Q)

Introduction to Artificial Intelligence (AI)

Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 10 Oct, 13, 2011 CPSC 502, Lecture 10 Slide 1 Today Oct 13 Inference in HMMs More on Robot Localization CPSC 502, Lecture