Hidden Markov Models NIKOLAY YAKOVETS

1 Hidden Markov Models NIKOLAY YAKOVETS

3 A Markov System: $N$ states $s_1, \ldots, s_N$ (e.g. modeling weather). [Figure: three states $S_1$, $S_2$, $S_3$]

5 A Markov System: the state changes over time (e.g. modeling weather): $S_1, S_2, S_2, S_3, S_2, S_1$. The state at time $t$ is $q_t \in \{s_1, \ldots, s_N\}$.

6 The Markov Property: the system is memoryless. $P(q_{t+1} = S_j \mid q_t = S_i) = P(q_{t+1} = S_j \mid q_t = S_i, \text{any earlier history})$

7 A Markov System as a Directed Graph: the nodes are the states $S_1$, $S_2$, $S_3$; each edge $S_i \to S_j$ carries the transition probability $P(q_{t+1} = S_j \mid q_t = S_i)$.

10 Weather Prediction: given the initial probabilities and the transition probabilities [tables not preserved], what is the probability of a 3-day forecast? $P(q_1)\,P(q_2 \mid q_1)\,P(q_3 \mid q_2) = 0.1 \times 0.7 \times 0.3 = 0.021$
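
The 3-day forecast calculation can be sketched in a few lines of Python. The slide's tables and weather icons did not survive, so the state names and every number below are hypothetical, chosen only so that one path reproduces the product 0.1 × 0.7 × 0.3; the helper name `sequence_probability` is also an assumption.

```python
def sequence_probability(initial, transition, states):
    """Return P(q_1) * prod_t P(q_{t+1} | q_t) for a first-order Markov chain."""
    prob = initial[states[0]]
    for prev, nxt in zip(states, states[1:]):
        prob *= transition[prev][nxt]
    return prob


# Hypothetical numbers: NOT the values from the original slide.
initial = {"sun": 0.6, "cloud": 0.3, "rain": 0.1}
transition = {
    "sun": {"sun": 0.5, "cloud": 0.3, "rain": 0.2},
    "cloud": {"sun": 0.4, "cloud": 0.4, "rain": 0.2},
    "rain": {"sun": 0.7, "cloud": 0.2, "rain": 0.1},
}

# P(rain) * P(sun | rain) * P(cloud | sun) = 0.1 * 0.7 * 0.3
forecast_probability = sequence_probability(
    initial, transition, ["rain", "sun", "cloud"]
)
print(round(forecast_probability, 6))
```

Because of the Markov property, each factor looks only one step back, so the cost grows linearly with the length of the forecast.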

11 Towards Hidden Markov: what if we can't observe the current state? For example..

14 CRAZY VENDING MACHINE: Prefers dispensing either Coke or Iced Tea. Changes its mind all the time. We don't know its preference at a given moment.

15 CRAZY VENDING MACHINE [Figure: observations and hidden states]

16 CRAZY VENDING MACHINE [Figure: state(t), state(t+1), and the observation]

17 e.g. Initial P: (1, 0). Transitional P and Output P: [tables not preserved]

22 e.g. Probability of vending [drink sequence not preserved]? Consider all HMM paths and sum, over the four paths, the products of transition (T) and output (O) probabilities: T( )O( ) T( )O( ) + T( )O( ) T( )O( ) + T( )O( ) T( )O( ) + T( )O( ) T( )O( ) = [value not preserved]

24 Hidden Markov: Set of states $S = \{s_1, \ldots, s_N\}$. Output alphabet $K = \{k_1, \ldots, k_M\} = \{1, \ldots, M\}$.

27 Hidden Markov: Initial state probabilities $\Pi = \{\pi_i\},\ i \in S$. State transition probabilities $A = \{a_{ij}\},\ i, j \in S$. Symbol emission probabilities $B = \{b_{ijk}\},\ i, j \in S,\ k \in K$.

29 Hidden Markov: State sequence $X = (X_1, \ldots, X_{T+1})$. Output sequence $O = (o_1, \ldots, o_T)$.
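
To make the notation concrete, here is a minimal sketch that draws a state sequence and an output sequence from such a model. It assumes the arc-emission reading of $b_{ijk}$ (symbol $k$ is emitted on the transition from $i$ to $j$), which is why $X$ has $T+1$ entries while $O$ has $T$. The function name `sample_hmm` and all parameter values are hypothetical; only the initial distribution (1, 0) echoes the vending machine slide.

```python
import random


def sample_hmm(pi, A, B, T, rng):
    """Draw X = (X_1..X_{T+1}) and O = (o_1..o_T) from an arc-emission HMM.

    pi[i]      : probability of starting in state i
    A[i][j]    : probability of moving from state i to state j
    B[i][j][k] : probability of emitting symbol k on the arc i -> j
    """
    n_states = len(pi)
    n_symbols = len(B[0][0])
    current = rng.choices(range(n_states), weights=pi)[0]
    states, outputs = [current], []
    for _ in range(T):
        nxt = rng.choices(range(n_states), weights=A[current])[0]
        symbol = rng.choices(range(n_symbols), weights=B[current][nxt])[0]
        states.append(nxt)
        outputs.append(symbol)
        current = nxt
    return states, outputs


# Hypothetical two-state, two-symbol model (values are illustrative only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[[0.6, 0.4], [0.6, 0.4]],
     [[0.2, 0.8], [0.2, 0.8]]]

states, outputs = sample_hmm(pi, A, B, T=5, rng=random.Random(0))
print(len(states), len(outputs))  # T + 1 states, T outputs
```

In an HMM setting only `outputs` would be visible; `states` is the hidden part that the evaluation and inference problems reason about.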

30 Fundamental Problems. Evaluation: how likely is a certain observation $O$? Given: $\mu = (A, B, \Pi)$ and $O$. Find: $P(O \mid \mu)$.

35 Naïve Evaluation: sum the probability of $O$ over every possible state sequence [equations not preserved]. This takes $(2T + 1) \cdot N^{T+1}$ calculations!
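
As a rough illustration of why the naïve approach is expensive, the sketch below enumerates all $N^{T+1}$ state sequences and sums $\pi_{X_1} \prod_{t=1}^{T} a_{X_t X_{t+1}} b_{X_t X_{t+1} o_t}$. The formula is written out from the definitions of $\Pi$, $A$ and $B$ above, under the arc-emission assumption; the name `naive_likelihood` and the parameter values are hypothetical.

```python
from itertools import product


def naive_likelihood(pi, A, B, obs):
    """P(O | mu) by brute force: sum over all N**(T+1) state sequences."""
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs) + 1):
        prob = pi[path[0]]
        for t, symbol in enumerate(obs):
            i, j = path[t], path[t + 1]
            prob *= A[i][j] * B[i][j][symbol]
        total += prob
    return total


# Hypothetical two-state, two-symbol model (values are illustrative only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[[0.6, 0.4], [0.6, 0.4]],
     [[0.2, 0.8], [0.2, 0.8]]]

likelihood = naive_likelihood(pi, A, B, [0, 1])
print(likelihood)
```

With two states and two observations this visits only 8 paths, but the count multiplies by $N$ for every extra observation, which is what motivates the dynamic-programming approach.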

38 Smarter Evaluation: Use DP! (Forward-backward algorithm.) DP table: state over time. Store forward variables: $\alpha_i(t) = P(o_1 o_2 \cdots o_{t-1}, X_t = i \mid \mu)$

41 Smarter Evaluation: compute forward variables:
1. initialization: $\alpha_i(1) = \pi_i$
2. induction: $\alpha_j(t+1) = \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
3. total: $P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T+1)$

43 Smarter Evaluation: much lower complexity than naïve: $2N^2T$ calculations vs. $(2T + 1) \cdot N^{T+1}$ calculations! Similarly, can work backwards: $\beta_i(t) = P(o_t \cdots o_T \mid X_t = i, \mu)$
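
The forward and backward passes can be sketched as follows, again assuming arc emission. The backward recursion used here, $\beta_i(T+1) = 1$ and $\beta_i(t) = \sum_j a_{ij} b_{ij o_t} \beta_j(t+1)$, is the standard companion to the forward induction and is not spelled out on the slides; function names and parameter values are hypothetical.

```python
def forward(pi, A, B, obs):
    """Return rows alpha(1)..alpha(T+1); alpha[t][i] holds alpha_i(t+1)."""
    n = len(pi)
    alpha = [list(pi)]  # initialization: alpha_i(1) = pi_i
    for symbol in obs:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] * B[i][j][symbol] for i in range(n))
            for j in range(n)
        ])
    return alpha


def backward(A, B, obs):
    """Return rows beta(1)..beta(T+1); beta[t][i] holds beta_i(t+1)."""
    n = len(A)
    beta = [[1.0] * n]  # initialization: beta_i(T+1) = 1
    for symbol in reversed(obs):
        nxt = beta[0]
        beta.insert(0, [
            sum(A[i][j] * B[i][j][symbol] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


# Hypothetical two-state, two-symbol model (values are illustrative only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[[0.6, 0.4], [0.6, 0.4]],
     [[0.2, 0.8], [0.2, 0.8]]]
obs = [0, 1]

alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
p_forward = sum(alpha[-1])                               # sum_i alpha_i(T+1)
p_backward = sum(pi[i] * beta[0][i] for i in range(len(pi)))  # sum_i pi_i beta_i(1)
print(p_forward, p_backward)
```

Both totals should agree up to floating-point rounding, since they compute the same $P(O \mid \mu)$ from opposite ends of the table. Each table row costs on the order of $N^2$ multiplications, in line with the $2N^2T$ figure above.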

44 Fundamental Problems. Inference: finding the $X$ that best explains $O$. Given: $\mu = (A, B, \Pi)$ and $O$. Find: $\arg\max_X P(X \mid O, \mu)$.

47 Smarter Inference: Again, use DP! Viterbi Algorithm. Store: the probability of the most probable path that leads to a node: $\delta_j(t) = \max_{X_1 \cdots X_{t-1}} P(X_1 \cdots X_{t-1}, o_1 \cdots o_{t-1}, X_t = j \mid \mu)$. Backtrack through the max solution to find the path.

50 Smarter Inference: compute the variables (fill in the DP table):
1. initialization: $\delta_i(1) = \pi_i$
2. induction: $\delta_j(t+1) = \max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{ij o_t}$; store backtrace: $\psi_j(t+1) = \arg\max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{ij o_t}$

51 Smarter Inference: 3. termination and path readout: $\hat{X}_{T+1} = \arg\max_{1 \le i \le N} \delta_i(T+1)$, then $\hat{X}_t = \psi_{\hat{X}_{t+1}}(t+1)$ backwards through the table; $P(\hat{X}, O \mid \mu) = \max_{1 \le i \le N} \delta_i(T+1)$
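
A minimal Viterbi sketch under the same arc-emission assumption is shown below. It fills the $\delta$ table with the max-product recursion, keeps the arg max of each cell as a backpointer, and reads the path out backwards from the best final state. The function name `viterbi` and the parameter values are hypothetical.

```python
def viterbi(pi, A, B, obs):
    """Return (most probable state path of length T+1, its joint probability)."""
    n = len(pi)
    delta = [list(pi)]  # initialization: delta_i(1) = pi_i
    backpointers = []   # backpointers[t][j]: best predecessor of state j
    for symbol in obs:
        prev = delta[-1]
        row, back = [], []
        for j in range(n):
            scores = [prev[i] * A[i][j] * B[i][j][symbol] for i in range(n)]
            best_i = max(range(n), key=scores.__getitem__)
            row.append(scores[best_i])
            back.append(best_i)
        delta.append(row)
        backpointers.append(back)
    # Termination: pick the best final state, then follow the backpointers.
    last = max(range(n), key=delta[-1].__getitem__)
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[-1][last]


# Hypothetical two-state, two-symbol model (values are illustrative only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[[0.6, 0.4], [0.6, 0.4]],
     [[0.2, 0.8], [0.2, 0.8]]]

best_path, best_prob = viterbi(pi, A, B, [0, 1])
print(best_path, best_prob)
```

The only change from the forward pass is replacing the sum over predecessors with a max, plus the extra bookkeeping needed to recover the path. For long sequences a practical implementation would typically work with log probabilities to avoid underflow.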

52 Fundamental Problems. Estimation: finding the $\mu$ that best explains $O$. Given: $O_{\text{training}}$. Find: $\arg\max_\mu P(O_{\text{training}} \mid \mu)$.

58 Estimation: MLE. There is no known analytic method; find a local maximum using iterative hill-climbing. Baum-Welch (outline):
1. choose a model $\mu$ (perhaps randomly)
2. estimate $P(O \mid \mu)$
3. choose a revised model $\mu$ that raises the probabilities of the transitions and emissions on the paths used a lot
4. repeat steps 2-3, hoping to converge on values of $\mu$
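
One Baum-Welch iteration could look roughly like the sketch below. The re-estimation formulas (expected arc counts computed from $\alpha$ and $\beta$, then normalized) are the standard ones for an arc-emission HMM and are not given in the slides, so treat them as an assumption; function names and starting values are hypothetical. The sketch also assumes all starting probabilities are strictly positive, so that no denominator is zero.

```python
def forward(pi, A, B, obs):
    n, alpha = len(pi), [list(pi)]
    for o in obs:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] * B[i][j][o] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, beta = len(A), [[1.0] * len(A)]
    for o in reversed(obs):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[i][j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation step; returns (new_pi, new_A, new_B, P(O | old model))."""
    n, m, T = len(pi), len(B[0][0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # arc[t][i][j]: posterior probability of traversing arc i -> j at step t
    arc = [[[alpha[t][i] * A[i][j] * B[i][j][obs[t]] * beta[t + 1][j] / likelihood
             for j in range(n)] for i in range(n)] for t in range(T)]
    # gamma[t][i]: posterior probability of being in state i at step t
    gamma = [[sum(arc[t][i]) for i in range(n)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[sum(arc[t][i][j] for t in range(T)) /
              sum(gamma[t][i] for t in range(T))
              for j in range(n)] for i in range(n)]
    new_B = [[[sum(arc[t][i][j] for t in range(T) if obs[t] == k) /
               sum(arc[t][i][j] for t in range(T))
               for k in range(m)] for j in range(n)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical starting model and training sequence (illustrative only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[[0.6, 0.4], [0.5, 0.5]],
     [[0.3, 0.7], [0.2, 0.8]]]
obs = [0, 1, 1, 0, 1, 0, 0, 1]

history = []
model = (pi, A, B)
for _ in range(3):
    *model, likelihood = baum_welch_step(*model, obs)
    history.append(likelihood)
print(history)
```

Each entry in `history` is the likelihood of the training sequence under the model before that step, so the list should be non-decreasing. As the outline notes, this only climbs to a local maximum, and the result depends on the starting model.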

59 When HMMs are good.. Observations are ordered. The random process can be represented by a stochastic finite state machine with emitting states.

60 Why HMMs are good.. 1. Statistical Grounding 2. Modularity 3. Transparency of a Model 4. Incorporation of Prior Knowledge

61 Why HMMs are bad.. 1. Markov Chains (each state is assumed to depend only on the previous one) 2. Local Maxima/Overfitting 3. Slower Speed

62 Speech Recognition given an audio waveform, would like to robustly extract & recognize any spoken words

63 Target Tracking: radar-based tracking of multiple targets; visual tracking of articulated objects. Estimate the motion of targets in the 3D world from indirect, potentially noisy measurements.

64 Robot Navigation: as the robot moves, estimate its world geometry. [Figures: Landmark SLAM (E. Nebot, Victoria Park); CAD Map (S. Thrun, San Jose Tech Museum); Estimated Map]

65 Financial Forecasting predict future market behavior from historical data, news reports, expert opinions,..

66 Bioinformatics multiple sequence alignment, gene finding, motif/promoter region finding..

67 HMM Applications: HMMs can be applied in many more fields where the goal is to recover a sequence that is not immediately observable: cryptanalysis, POS tagging, MT, activity recognition, etc.

68 Thank You
