Chapter 05: Hidden Markov Models



1 LEARNING AND INFERENCE IN GRAPHICAL MODELS
Chapter 05: Hidden Markov Models
Dr. Martin Lauer
University of Freiburg, Machine Learning Lab
Karlsruhe Institute of Technology, Institute of Measurement and Control Systems
Learning and Inference in Graphical Models. Chapter 05 p. 1/21

2 References for this chapter
Christopher M. Bishop, Pattern Recognition and Machine Learning, ch. 13, Springer, 2006
Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, ch. 15, Prentice Hall, 2003
Gernot A. Fink, Mustererkennung mit Markov-Modellen: Theorie, Praxis, Anwendungsgebiete, Teubner, 2003
Lawrence R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, in: Proceedings of the IEEE, vol. 77, no. 2, pp , 1989
Andrew J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, in: IEEE Transactions on Information Theory, vol. 13, no. 2, pp , 1967
Stefan Hensel, Wirbelstrombasierte Lokalisierung von Schienenfahrzeugen in topologischen Karten, KIT Scientific Publishing, 2011

3 Processes in time
Model the following process with a Bayesian network:
Every week, a student is in either a good mood or a bad mood. The mood changes randomly from week to week. If the mood was good in the previous week, it remains good in the following week with probability 0.7; otherwise it changes to bad. If the mood was bad, it remains bad with probability 0.5. Given the mood of the previous week, the change of mood is statistically independent of all earlier weeks.
Extend this process:
A lecturer sets a test each week and observes the student's result, which is either good or bad. Assume that the result is good with probability 0.7 if the student is in a good mood and with probability 0.4 if the student is in a bad mood. Otherwise, the result of the test is bad.
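The two conditional probability tables of this process can be written down directly. The following minimal sketch (Python, standard library only) encodes them as nested dictionaries and propagates the mood distribution by one week. The slide does not state the mood distribution of the first week, so the uniform `INITIAL` is an assumption made purely for illustration, as are all identifier names.

```python
# p(X_{i+1} | X_i): outer key = mood in the previous week, inner key = mood now.
TRANSITION = {
    "good": {"good": 0.7, "bad": 0.3},
    "bad": {"good": 0.5, "bad": 0.5},
}

# p(Z_i | X_i): outer key = mood, inner key = test result.
EMISSION = {
    "good": {"good": 0.7, "bad": 0.3},
    "bad": {"good": 0.4, "bad": 0.6},
}

# ASSUMPTION: the first-week mood distribution is not given in the text;
# a uniform prior is used here only for illustration.
INITIAL = {"good": 0.5, "bad": 0.5}


def propagate(mood, transition):
    """One step of the Markov chain: p(X_{i+1}) = sum_x p(X_{i+1} | x) p(x)."""
    return {
        nxt: sum(mood[cur] * transition[cur][nxt] for cur in mood)
        for nxt in transition
    }


def result_distribution(mood, emission):
    """Distribution of the test result: p(Z_i) = sum_x p(Z_i | x) p(x)."""
    return {
        res: sum(mood[m] * emission[m][res] for m in mood)
        for res in ("good", "bad")
    }


week2 = propagate(INITIAL, TRANSITION)
print(week2, result_distribution(INITIAL, EMISSION))
```

Under the assumed uniform prior, the probability of a good mood in the second week is 0.5 · 0.7 + 0.5 · 0.5 = 0.6.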

4 Hidden Markov model
Definitions:
A Markov model or Markov chain is a Bayesian network organized as a chain of random variables $X_i$, where $X_{i+1}$ depends solely on $X_i$.
(Figure: chain $X_1 \to X_2 \to \dots \to X_i \to \dots \to X_n$)
A hidden Markov model (HMM) is a Bayesian network with two kinds of nodes, state variables $X_i$ and observation nodes $Z_i$, such that $X_{i+1}$ depends solely on $X_i$ and $Z_i$ depends solely on $X_i$.
(Figure: chain $X_1 \to X_2 \to \dots \to X_i \to \dots \to X_n$ with an additional edge $X_i \to Z_i$ for every $i$)
Markov chains and HMMs are used to model stochastic processes in time; $i$ refers to the point in time.

5 Hidden Markov model
Joint probability of an HMM:
$$p(X_1,\dots,X_n,Z_1,\dots,Z_n) = p(X_1) \prod_{i=2}^{n} p(X_i \mid X_{i-1}) \prod_{i=1}^{n} p(Z_i \mid X_i)$$
How can we calculate $p(X_n \mid Z_1,\dots,Z_n)$?
Sum-product algorithm for HMMs, in this context also known as the forward algorithm.
Observation: once we have calculated all messages up to node $X_i$, we only need to remember $m_{X_i \to f_{X_i X_{i+1}}}$ to calculate the messages for node $X_{i+1}$. This yields an iterative algorithm.
(Figure: factor graph around $X_i$ with the factors $f_{X_{i-1} X_i}$, $f_{X_i Z_i}$ and $f_{X_i X_{i+1}}$)
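The iterative scheme described above can be sketched for categorical distributions as follows. This is a minimal illustration, not a reference implementation: the tables are those of the mood example from slide 3, the uniform `INITIAL` distribution is an assumption (the slides do not specify it), and all function names are hypothetical. Each message is renormalized, which does not change the final posterior but avoids numerical underflow for long sequences.

```python
TRANSITION = {"good": {"good": 0.7, "bad": 0.3}, "bad": {"good": 0.5, "bad": 0.5}}
EMISSION = {"good": {"good": 0.7, "bad": 0.3}, "bad": {"good": 0.4, "bad": 0.6}}
INITIAL = {"good": 0.5, "bad": 0.5}  # ASSUMPTION: not given in the slides


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def forward_filter(initial, transition, emission, observations):
    """Return p(X_n | z_1, ..., z_n) as a dict over the states."""
    states = list(initial)
    # Message after absorbing z_1: proportional to p(X_1) p(z_1 | X_1).
    message = normalize(
        {s: initial[s] * emission[s][observations[0]] for s in states}
    )
    for z in observations[1:]:
        # Pass through the transition factor: sum out the previous state.
        predicted = {
            s: sum(message[r] * transition[r][s] for r in states) for s in states
        }
        # Multiply with the observation factor and renormalize.
        message = normalize({s: predicted[s] * emission[s][z] for s in states})
    return message


print(forward_filter(INITIAL, TRANSITION, EMISSION, ["good", "good"]))
```

Only the most recent message is kept between iterations, which is exactly the observation made on the slide.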

6 Hidden Markov model
How can we calculate $p(X_1 \mid Z_1,\dots,Z_n)$?
Sum-product algorithm for HMMs, in this context also known as the backward algorithm.
(Figure: factor graph around $X_i$ with the factors $f_{X_{i-1} X_i}$, $f_{X_i Z_i}$ and $f_{X_i X_{i+1}}$)

7 Hidden Markov model
How can we calculate $p(X_i \mid Z_1,\dots,Z_n)$?
Sum-product algorithm for HMMs, in this context also known as the forward-backward algorithm (Rabiner, 1989).
(Figure: factor graph around $X_i$ with the factors $f_{X_{i-1} X_i}$, $f_{X_i Z_i}$ and $f_{X_i X_{i+1}}$)
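A compact sketch of the forward-backward computation for categorical distributions is given below. It is an illustration under stated assumptions: the tables come from the mood example of slide 3, the uniform `INITIAL` distribution is assumed because the slides do not specify it, and the function names are hypothetical. List indices are zero-based, so `marginals[0]` corresponds to $X_1$.

```python
TRANSITION = {"good": {"good": 0.7, "bad": 0.3}, "bad": {"good": 0.5, "bad": 0.5}}
EMISSION = {"good": {"good": 0.7, "bad": 0.3}, "bad": {"good": 0.4, "bad": 0.6}}
INITIAL = {"good": 0.5, "bad": 0.5}  # ASSUMPTION: not given in the slides


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def forward_backward(initial, transition, emission, observations):
    """Return the list of smoothed marginals p(X_i | z_1, ..., z_n)."""
    states = list(initial)
    n = len(observations)

    # Forward pass: alpha[i][s] is proportional to p(X_i = s, z_1..z_i).
    alpha = [normalize({s: initial[s] * emission[s][observations[0]] for s in states})]
    for z in observations[1:]:
        prev = alpha[-1]
        alpha.append(normalize({
            s: emission[s][z] * sum(prev[r] * transition[r][s] for r in states)
            for s in states
        }))

    # Backward pass: beta[i][s] is proportional to p(z_{i+1}..z_n | X_i = s).
    beta = [None] * n
    beta[n - 1] = {s: 1.0 for s in states}
    for i in range(n - 2, -1, -1):
        z_next = observations[i + 1]
        beta[i] = normalize({
            s: sum(transition[s][r] * emission[r][z_next] * beta[i + 1][r]
                   for r in states)
            for s in states
        })

    # The marginal at node X_i is the normalized product of both messages.
    return [normalize({s: alpha[i][s] * beta[i][s] for s in states})
            for i in range(n)]


marginals = forward_backward(INITIAL, TRANSITION, EMISSION, ["good", "good"])
print(marginals)
```

The last entry coincides with the result of the forward algorithm, $p(X_n \mid Z_1,\dots,Z_n)$, and the first entry with the result of the backward algorithm, $p(X_1 \mid Z_1,\dots,Z_n)$.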

8 Hidden Markov model
How can we calculate $\arg\max_{x_1,\dots,x_n} p(X_1,\dots,X_n \mid Z_1,\dots,Z_n)$?
Max-sum algorithm for HMMs, also known as the Viterbi algorithm (Viterbi, 1967).
The forward-backward and Viterbi algorithms are usually used with categorical distributions.
Application areas:
- speech recognition
- handwriting recognition
- interpretation of sensor signals
- analysis of economic time series
- bioinformatics
- modeling of human behavior
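The max-sum recursion behind the Viterbi algorithm can be sketched for categorical distributions as follows. The sketch works with log-probabilities, so products become sums and the maximization replaces the summation of the forward algorithm. The tables are those of the mood example from slide 3; the uniform `INITIAL` distribution and all function names are assumptions for illustration.

```python
import math

TRANSITION = {"good": {"good": 0.7, "bad": 0.3}, "bad": {"good": 0.5, "bad": 0.5}}
EMISSION = {"good": {"good": 0.7, "bad": 0.3}, "bad": {"good": 0.4, "bad": 0.6}}
INITIAL = {"good": 0.5, "bad": 0.5}  # ASSUMPTION: not given in the slides


def safe_log(p):
    """Logarithm that maps probability zero to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(initial, transition, emission, observations):
    """Return the most probable state sequence given the observations."""
    states = list(initial)
    # score[s]: best log-probability of any state sequence ending in s.
    score = {
        s: safe_log(initial[s]) + safe_log(emission[s][observations[0]])
        for s in states
    }
    backpointers = []
    for z in observations[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Max instead of sum: keep only the best predecessor of s.
            best_prev = max(
                states, key=lambda r: score[r] + safe_log(transition[r][s])
            )
            pointer[s] = best_prev
            new_score[s] = (
                score[best_prev]
                + safe_log(transition[best_prev][s])
                + safe_log(emission[s][z])
            )
        score = new_score
        backpointers.append(pointer)

    # Backtracking from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


print(viterbi(INITIAL, TRANSITION, EMISSION, ["good", "bad", "bad"]))
```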

9 Example: switch detection
Recognition of switches in eddy current sensor data (work of Dr.-Ing. Stefan Hensel)
- application in railway engineering
- task: detect switches in the signals of a sensor
(Figure: eddy current sensor)

10 Example: switch detection
Physical principle of the eddy current sensor:
- an AC voltage is applied to a field coil E, which induces an electromagnetic field
- the field induces eddy currents in the rail and in all metal parts below the sensor
- the eddy currents change the electromagnetic field
- two sensor coils P1, P2 sense the strength of the electromagnetic field
- the outputs of P1 and P2 are subtracted to eliminate the influence of the field coil
- a second subsystem operates in parallel

11 Example: switch detection
Signals of an eddy current sensor (after filtering):
- small peaks: clamps of sleepers
- large peaks: parts of switches, cables, etc.
How can we use the signals?
- velocity estimation
- switch detection

12 Example: switch detection
How can we use an HMM for switch detection?
Observations:
- different parts of a switch cause different amplitudes of the signal
- the parts occur in a specific order depending on the way we pass a switch
State $X_i$ is one out of 17 possible values:
- no switch
- facing left II, facing left III, facing left IV, facing left V
- facing right II, facing right III, facing right IV, facing right V
- trailing left II, trailing left III, trailing left IV, trailing left V
- trailing right II, trailing right III, trailing right IV, trailing right V

13 Example: switch detection
State $X_i$ is one out of the 17 possible values listed above.
Transition probabilities are non-zero for reflexive transitions and for transitions like:
- no switch → facing left II, no switch → facing right II, ...
- facing left II → facing left III, facing left III → facing left IV, ...
- facing left V → no switch, facing right V → no switch, ...
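The sparsity pattern of the transition table described above can be sketched as follows. Only the structure (which entries are non-zero) is taken from the slide, read as: each of the four ways of passing a switch is a chain II → III → IV → V that starts from and returns to "no switch". The numerical values are hypothetical: the slides do not give them, so the sketch simply assigns an assumed `stay_probability` to the reflexive transition and splits the remainder evenly among the successors.

```python
PASSES = ["facing left", "facing right", "trailing left", "trailing right"]
PARTS = ["II", "III", "IV", "V"]
STATES = ["no switch"] + [f"{p} {part}" for p in PASSES for part in PARTS]


def successors(state):
    """States reachable in one step, excluding the reflexive transition."""
    if state == "no switch":
        # A switch can be entered in any of the four ways.
        return [f"{p} {PARTS[0]}" for p in PASSES]
    pass_name, part = state.rsplit(" ", 1)
    index = PARTS.index(part)
    if index + 1 < len(PARTS):
        return [f"{pass_name} {PARTS[index + 1]}"]
    # After the last part, the vehicle has left the switch.
    return ["no switch"]


def build_transition_table(stay_probability=0.9):
    """Build p(X_{i+1} | X_i); the probability values are HYPOTHETICAL."""
    table = {}
    for state in STATES:
        row = {target: 0.0 for target in STATES}
        row[state] = stay_probability
        targets = successors(state)
        for target in targets:
            row[target] = (1.0 - stay_probability) / len(targets)
        table[state] = row
    return table


table = build_transition_table()
print(len(STATES), [t for t, p in table["no switch"].items() if p > 0])
```

In practice the non-zero entries would be estimated from data or chosen from domain knowledge; the sketch only shows that most of the 17 × 17 entries are zero.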

14 Example: switch detection
Observations $Z_i$ are real values. The conditional distributions $p(Z_i \mid X_i)$ are given by the plot on the right.
(Figure: plot of the conditional distributions $p(Z_i \mid X_i)$)
After specifying the HMM, we can
- apply the forward algorithm (sum-product) to determine $p(X_n \mid Z_1,\dots,Z_n)$
- apply the Viterbi algorithm (max-sum) to determine $\arg\max_{x_1,\dots,x_n} p(X_1,\dots,X_n \mid Z_1,\dots,Z_n)$
in order to determine the most probable state values, i.e. to determine when we passed a switch and in which way.

15 Example: switch detection
Results:
- tests have been made on several railway lines
- detection rate 70-90%

16 HMMs with continuous states
Can we also use the sum-product algorithm for HMMs with real-valued state variables $X_i$?
- resolving the terms analytically is in general hard or impossible
- requires conjugate distributions
Special case: Gauss-linear models. An HMM is Gauss-linear if
$$X_1 \sim N(\mu_1, \Sigma_1)$$
$$X_{i+1} \mid X_i \sim N(A_i X_i + b_i, Q_i)$$
$$Z_i \mid X_i \sim N(H_i X_i + d_i, R_i)$$
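To make the generative reading of these three conditions concrete, the following sketch draws a state and observation sequence from a Gauss-linear HMM. It is restricted to the scalar, time-invariant case (all $A_i$, $b_i$, $Q_i$, $H_i$, $d_i$, $R_i$ are constant numbers), which is a simplifying assumption; the function name and parameter names are hypothetical. Note that the second argument of $N(\cdot,\cdot)$ is a variance, whereas `random.gauss` expects a standard deviation.

```python
import math
import random


def sample_gauss_linear(n, mu1, sigma1, a, b, q, h, d, r, rng):
    """Draw (x_1..x_n, z_1..z_n) from a scalar, time-invariant Gauss-linear HMM.

    sigma1, q and r are variances, matching the N(mean, variance) notation.
    """
    states, observations = [], []
    # X_1 ~ N(mu1, sigma1)
    x = rng.gauss(mu1, math.sqrt(sigma1))
    for _ in range(n):
        states.append(x)
        # Z_i | X_i ~ N(h * X_i + d, r)
        observations.append(rng.gauss(h * x + d, math.sqrt(r)))
        # X_{i+1} | X_i ~ N(a * X_i + b, q)
        x = rng.gauss(a * x + b, math.sqrt(q))
    return states, observations


rng = random.Random(0)
xs, zs = sample_gauss_linear(5, mu1=0.0, sigma1=1.0, a=1.0, b=0.1,
                             q=0.01, h=1.0, d=0.0, r=0.25, rng=rng)
print(xs)
print(zs)
```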

17 Gauss-linear HMMs
Lemma: for Gauss-linear HMMs, all distributions $p(X_i \mid Z_1,\dots,Z_i)$ are Gaussian.
Proof: by induction over $i$. The proof is very technical and is therefore omitted.
Result:
$$p(X_i \mid Z_1,\dots,Z_i) = m_{X_i \to f_{X_i X_{i+1}}}(x_i) \sim N(\tilde{\mu}_i, \tilde{\Sigma}_i)$$
$$p(X_{i+1} \mid Z_1,\dots,Z_i) = m_{f_{X_i X_{i+1}} \to X_{i+1}}(x_{i+1}) \sim N(\mu_{i+1}, \Sigma_{i+1})$$
with
$$\tilde{\mu}_i = \mu_i + \Sigma_i H_i^T (H_i \Sigma_i H_i^T + R_i)^{-1} (z_i - (H_i \mu_i + d_i))$$
$$\tilde{\Sigma}_i = \Sigma_i - \Sigma_i H_i^T (H_i \Sigma_i H_i^T + R_i)^{-1} H_i \Sigma_i$$
$$\mu_{i+1} = A_i \tilde{\mu}_i + b_i$$
$$\Sigma_{i+1} = A_i \tilde{\Sigma}_i A_i^T + Q_i$$

18 Gauss-linear HMMs
Applying the sum-product algorithm, we calculate in turn:
$m_{X_1 \to f_{X_1 X_2}}$, $m_{f_{X_1 X_2} \to X_2}$, $m_{X_2 \to f_{X_2 X_3}}$, $m_{f_{X_2 X_3} \to X_3}$, $m_{X_3 \to f_{X_3 X_4}}$, $m_{f_{X_3 X_4} \to X_4}$, ...
This yields a two-step algorithm called the Kalman filter (Kalman, 1960):
Innovation step: calculate $\tilde{\mu}_i, \tilde{\Sigma}_i$ from $\mu_i, \Sigma_i$ and $z_i$, i.e. calculate $m_{X_i \to f_{X_i X_{i+1}}} = p(X_i \mid Z_1,\dots,Z_i)$
Prediction step: calculate $\mu_{i+1}, \Sigma_{i+1}$ from $\tilde{\mu}_i, \tilde{\Sigma}_i$, i.e. calculate $m_{f_{X_i X_{i+1}} \to X_{i+1}} = p(X_{i+1} \mid Z_1,\dots,Z_i)$
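The alternation of innovation and prediction steps can be sketched for the scalar, time-invariant case, where all matrices reduce to numbers and the matrix inverse becomes a division. This restriction and all identifier names are assumptions made for illustration; `mu_tilde` and `sigma_tilde` denote mean and variance after the innovation step, i.e. after $z_i$ has been absorbed.

```python
def innovation_step(mu, sigma, z, h, d, r):
    """Absorb observation z: return mean and variance of p(X_i | z_1..z_i)."""
    gain = sigma * h / (h * sigma * h + r)
    mu_tilde = mu + gain * (z - (h * mu + d))
    sigma_tilde = sigma - gain * h * sigma
    return mu_tilde, sigma_tilde


def prediction_step(mu_tilde, sigma_tilde, a, b, q):
    """Propagate one step: return mean and variance of p(X_{i+1} | z_1..z_i)."""
    return a * mu_tilde + b, a * sigma_tilde * a + q


def kalman_filter(mu1, sigma1, observations, a, b, q, h, d, r):
    """Return the list of (mean, variance) of p(X_i | z_1..z_i) for all i."""
    mu, sigma = mu1, sigma1
    filtered = []
    for z in observations:
        mu_tilde, sigma_tilde = innovation_step(mu, sigma, z, h, d, r)
        filtered.append((mu_tilde, sigma_tilde))
        mu, sigma = prediction_step(mu_tilde, sigma_tilde, a, b, q)
    return filtered


print(kalman_filter(mu1=0.0, sigma1=1.0, observations=[2.0, 2.0],
                    a=1.0, b=0.0, q=0.5, h=1.0, d=0.0, r=1.0))
```

As in the forward algorithm for categorical HMMs, only the most recent pair of mean and variance has to be stored between iterations.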

19 Kalman filter
Application areas:
- state estimation in linear systems
- tracking of objects, estimation of object motion
- robot localization
- ...
Nonlinear extensions (extended Kalman filter, unscented Kalman filter) also allow its use for (slightly) nonlinear problems.

20 Kalman filter
Example 1: Assume a straight road on which a car is driving with almost constant velocity. We assume that we have a sensor, e.g. a stereo camera, with which we can obtain unbiased measurements of the vehicle position. Design a Kalman filter to estimate the velocity of the car.
Example 2: Assume an object that is rotating around one axis. We have two sensors to measure the angular velocity. Sensor 1 is unbiased, i.e. on average it measures the correct velocity, but it is very noisy. Sensor 2 has low noise; however, it is biased by an unknown offset. We can assume that the angular velocity of the object changes only slightly over time. Design a Kalman filter to obtain accurate, unbiased estimates of the angular velocity.
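One possible design for Example 1 is sketched below; it is an assumption-laden illustration, not the solution given in the lecture. The state is taken to be (position, velocity) with $A = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix}$, $b = 0$, $H = (1 \; 0)$ and $d = 0$. The diagonal process noise `q_pos`, `q_vel`, the measurement variance `r`, the broad initial covariance and all names are hypothetical choices. Because the measurement is scalar, the matrix inverse reduces to a division, and the symmetric 2 × 2 covariance is stored as the triple `(s00, s01, s11)`.

```python
def kalman_constant_velocity(measurements, dt, r, q_pos, q_vel,
                             mean=(0.0, 0.0), cov=(100.0, 0.0, 100.0)):
    """Estimate (position, velocity) from position measurements only.

    Returns the filtered (position, velocity) estimate after each measurement.
    All noise parameters and the initial distribution are ASSUMED values.
    """
    pos, vel = mean
    s00, s01, s11 = cov
    estimates = []
    for z in measurements:
        # Innovation step with H = [1, 0] and d = 0.
        innovation_var = s00 + r
        k0, k1 = s00 / innovation_var, s01 / innovation_var
        residual = z - pos
        pos, vel = pos + k0 * residual, vel + k1 * residual
        s00, s01, s11 = s00 - k0 * s00, s01 - k0 * s01, s11 - k1 * s01
        estimates.append((pos, vel))

        # Prediction step with A = [[1, dt], [0, 1]], b = 0, Q = diag(q_pos, q_vel).
        pos = pos + dt * vel
        s00 = s00 + 2 * dt * s01 + dt * dt * s11 + q_pos
        s01 = s01 + dt * s11
        s11 = s11 + q_vel
    return estimates


# Noise-free position measurements of a car driving at 2 units per time step.
measurements = [2.0 * k for k in range(50)]
estimates = kalman_constant_velocity(measurements, dt=1.0, r=0.01,
                                     q_pos=1e-4, q_vel=1e-4)
print(estimates[-1])
```

The velocity is never measured directly; it is inferred through the cross-covariance `s01` that the prediction step builds up between position and velocity. Example 2 could be approached analogously by using the angular velocity and the unknown sensor offset as the two state components.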

21 Summary
- hidden Markov models as a special case of Bayesian networks
- forward, backward and forward-backward algorithms as special cases of the sum-product algorithm
- Viterbi algorithm as a special case of the max-sum algorithm
- Kalman filter as a special case of the sum-product algorithm for Gauss-linear HMMs
