Inference in Explicit Duration Hidden Markov Models


Inference in Explicit Duration Hidden Markov Models
Frank Wood, joint work with Chris Wiggins and Mike Dewar
Columbia University, November 2011

Hidden Markov Models
Hidden Markov models (HMMs) [Rabiner, 1989] are an important tool for data exploration and engineering applications. Applications include speech recognition [Jelinek, 1997, Juang and Rabiner, 1985], natural language processing [Manning and Schütze, 1999], handwriting recognition [Nag et al., 1986], DNA and other biological sequence modeling [Krogh et al., 1994], gesture recognition [Tanguay Jr, 1995, Wilson and Bobick, 1999], financial data modeling [Rydén et al., 1998], and many more.

Graphical Model: Hidden Markov Model
[Figure: graphical model with transition distributions π_m and emission parameters θ_m (plates over the K states), latent chain z_0, z_1, z_2, z_3, ..., z_T, and observations y_1, y_2, y_3, ..., y_T.]

Notation: Hidden Markov Model
z_t | z_{t-1} = m ~ Discrete(π_m)
y_t | z_t = m ~ F_θ(θ_m)
A = [π_1; ...; π_m; ...; π_K], the transition matrix whose m-th row is π_m.
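
As a concrete reading of this notation, here is a minimal generative sketch (not from the talk; the three-state parameter values are invented for illustration). It draws a latent chain and Gaussian observations, with the rows of A playing the role of the π_m:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state HMM: rows of A are the per-state transition
# distributions pi_m; emissions are Gaussian with state-specific means.
A = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
mu, sigma = np.array([-3.0, 0.0, 3.0]), 1.0

def sample_hmm(T, z0=0):
    """Draw (z_1..z_T, y_1..y_T): z_t | z_{t-1}=m ~ Discrete(pi_m),
    y_t | z_t=m ~ Normal(mu_m, sigma^2)."""
    z, y = np.empty(T, dtype=int), np.empty(T)
    prev = z0
    for t in range(T):
        prev = rng.choice(len(A), p=A[prev])   # transition
        z[t] = prev
        y[t] = rng.normal(mu[prev], sigma)     # emission
    return z, y

z, y = sample_hmm(200)
```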

HMM: Typical Usage Scenario (Character Recognition)
Training data: multiple observed sequences y_t = {v_t, h_t} of stylus positions for each kind of character.
Task: train Σ different models, one for each character.
Latent states: sequences of strokes.
Usage: classify new stylus position sequences using the trained models M_σ = {A_σ, Θ_σ}:
P(M_σ | y_1, ..., y_T) ∝ P(y_1, ..., y_T | M_σ) P(M_σ)
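
The classification rule above needs the marginal likelihood P(y_1, ..., y_T | M_σ), which the standard scaled forward recursion supplies. A hedged sketch, assuming Gaussian emissions; the helper name and the per-character model dictionary are hypothetical, not from the talk:

```python
import numpy as np
from scipy.stats import norm

def log_marginal_likelihood(A, mu, sigma, y, pi0=None):
    """log P(y_1..y_T | M) via the scaled forward recursion."""
    K, T = len(A), len(y)
    pi0 = np.full(K, 1.0 / K) if pi0 is None else pi0
    alpha = pi0 * norm.pdf(y[0], mu, sigma)
    logp = np.log(alpha.sum()); alpha /= alpha.sum()
    for t in range(1, T):
        alpha = (alpha @ A) * norm.pdf(y[t], mu, sigma)
        logp += np.log(alpha.sum()); alpha /= alpha.sum()
    return logp

# Pick the character model with the highest posterior (uniform prior P(M) here).
# models = {"a": (A_a, mu_a, sigma_a), ...}   # hypothetical trained per-character models
# best = max(models, key=lambda s: log_marginal_likelihood(*models[s], y))
```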

Shortcomings of Original HMM Specification
Latent state dwell times are not usually geometrically distributed, yet in an HMM
P(z_{t+1} = m, ..., z_{t+L} = m | z_t = m, A) = ∏_{l=1}^{L} P(z_{t+l} = m | z_{t+l-1} = m, A) = π_m(m)^L,
so the dwell time in state m follows Geometric(L; π_m(m)).
Prior knowledge often suggests structural constraints on allowable transitions.
The state cardinality K of the latent Markov chain is usually unknown.
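
A quick simulation makes the geometric dwell-time point concrete. This is an illustration with a made-up self-transition probability π_m(m) = 0.9; the empirical dwell-time distribution matches the geometric distribution with success probability 1 − π_m(m):

```python
import numpy as np

rng = np.random.default_rng(1)
p_self = 0.9  # pi_m(m): probability of remaining in state m

def dwell_time():
    """Number of steps spent in state m before the first transition away."""
    L = 1
    while rng.random() < p_self:
        L += 1
    return L

samples = np.array([dwell_time() for _ in range(100_000)])
# Empirical P(L = k) vs. geometric pmf p_self**(k-1) * (1 - p_self)
for k in range(1, 6):
    print(k, (samples == k).mean(), p_self**(k - 1) * (1 - p_self))
```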

Explicit Duration HMM / Hidden Semi-Markov Model
[Mitchell et al., 1995, Murphy, 2002, Yu and Kobayashi, 2003, Yu, 2010]
Latent state sequence z = ({s_1, r_1}, ..., {s_T, r_T})
Latent state id sequence s = (s_1, ..., s_T)
Latent remaining duration sequence r = (r_1, ..., r_T)
State-specific duration distribution F_r(λ_m); the other distributions are the same as in the HMM.
An EDHMM transitions between states differently than a typical HMM does. Unless r_t = 0, the current remaining duration is decremented and the state does not change. If r_t = 0, then the EDHMM transitions to a state m ≠ s_t according to the distribution defined by π_{s_t}.
Problem: inference requires enumerating possible durations.

EDHMM Notation
The latent state z_t = {s_t, r_t} is a tuple consisting of the state identity and the time left in that state.
s_t | s_{t-1}, r_{t-1} ~ I(s_t = s_{t-1}) if r_{t-1} > 0, and Discrete(π_{s_{t-1}}) if r_{t-1} = 0
r_t | s_t, r_{t-1} ~ I(r_t = r_{t-1} - 1) if r_{t-1} > 0, and F_r(λ_{s_t}) if r_{t-1} = 0
y_t | s_t ~ F_θ(θ_{s_t})
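
Read generatively, these conditionals give a simple forward sampler: while the remaining duration is positive, the state is copied and the counter decremented; when it reaches zero, a new state is drawn from π_{s_{t-1}} and a fresh duration from F_r(λ). Below is a sketch under a Poisson-duration, Gaussian-emission choice (the specific parameter values are made up); it is reused by the inference sketch later in the transcript:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 3-state EDHMM: zero diagonal (no self-transitions),
# Poisson durations, Gaussian emissions with unit variance.
Pi  = np.array([[0.0, 0.5, 0.5],
                [0.5, 0.0, 0.5],
                [0.5, 0.5, 0.0]])
lam = np.array([10.0, 20.0, 3.0])   # duration rates, F_r = Poisson(lam)
mu  = np.array([-6.0, 0.0, 6.0])    # emission means

def sample_edhmm(T, s0=0):
    s = np.empty(T, dtype=int); r = np.empty(T, dtype=int); y = np.empty(T)
    cur_s, cur_r = s0, rng.poisson(lam[s0])
    for t in range(T):
        if cur_r > 0:                      # stay: decrement remaining duration
            cur_r -= 1
        else:                              # leave: new state, then new duration
            cur_s = rng.choice(len(Pi), p=Pi[cur_s])
            cur_r = rng.poisson(lam[cur_s])
        s[t], r[t] = cur_s, cur_r
        y[t] = rng.normal(mu[cur_s], 1.0)
    return s, r, y

s, r, y = sample_edhmm(500)
```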

EDHMM: Graphical Model
[Figure: graphical model with duration parameters λ_m, transition distributions π_m, and emission parameters θ_m (plates over the K states); duration chain r_0, r_1, ..., r_T; state chain s_0, s_1, ..., s_T; observations y_1, ..., y_T.]

Structured HMMs, e.g. the Left-to-Right HMM [Rabiner, 1989]
Example: chicken pox.
Observations: vital signs.
Latent states: pre-infection, infected, post-infection.
State transition structure: you can't go from infected back to pre-infection (disregarding shingles).
Structured transitions imply zeros in the transition matrix A, i.e. p(s_t = m | s_{t-1} = l) = 0 for m < l.
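
In code, the left-to-right constraint is nothing more than a zero pattern in A: entries corresponding to transitions back to earlier states are clamped to zero before each row is normalized. A small illustrative sketch (state names follow the chicken pox example above; the random weights are placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
states = ["pre-infection", "infected", "post-infection"]
K = len(states)

# Draw positive weights, then enforce p(s_t = m | s_{t-1} = l) = 0 for m < l.
A = rng.gamma(1.0, size=(K, K))
A = np.triu(A)                       # zero out "backward" transitions (m < l)
A /= A.sum(axis=1, keepdims=True)    # renormalize each row

print(np.round(A, 2))   # upper triangular: infected can never return to pre-infection
```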

Bayesian HMM
We put a prior on the parameters so that we can arrive at a solution that conforms to our ideas about what the solution should look like.
Structured prior examples: A_{i,j} = 0 (hard constraints); A_{i,j} ∝ Σ_i A_{i,j} (rich get richer).
Regularization means that we can specify a model with more parameters than could possibly be needed:
infinite complexity (i.e. K → ∞) avoids many model selection problems;
extra states can be thought of as auxiliary or nuisance variables;
inference requires sampling in a model with countably infinite support.
The posterior over the latent variables encodes uncertainty about the interpretation of the data.

Bayesian HMM: Graphical Model
[Figure: graphical model with priors H_z on the transition distributions π_m and H_y on the emission parameters θ_m (plate over the K states), latent chain z_0, ..., z_T, and observations y_1, ..., y_T.]
π_m ~ H_z
θ_m ~ H_y
z_t | z_{t-1} = m ~ Discrete(π_m)
y_t | z_t = m ~ F_θ(θ_m)

Infinite HMMs (IHMM) [Beal et al., 2002, Teh et al., 2006]
[Figure: graphical model with hyperparameters c_0, γ, c_1, transition distributions π_m, emission parameters θ_m ~ H_y, latent chain z_0, ..., z_T, observations y_1, ..., y_T, with K → ∞.]
See also the Sticky Infinite HMM [Fox et al., 2011].

Inference for the Explicit Duration HMM (EDHMM)
Simple idea: infinite HMMs and EDHMMs share a fundamental characteristic, namely countable support, so inference techniques for Bayesian nonparametric (infinite) HMMs can be used for EDHMM inference.
Result: an approximation-free, efficient inference algorithm for the EDHMM.

Bayesian EDHMM: Graphical Model
[Figure: graphical model with priors H_r on the duration parameters λ_m, H_z on the transition distributions π_m, and H_y on the emission parameters θ_m (plates over the K states); duration chain r_0, ..., r_T; state chain s_0, ..., s_T; observations y_1, ..., y_T.]
A choice of prior:
λ_m | H_r ~ Gamma(H_r)
π_m | H_z ~ Dir(1/K, 1/K, ..., 1/K, 0, 1/K, ..., 1/K, 1/K), with the zero in the m-th position (no self-transitions).
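
A minimal sketch of one draw from this prior (the hyperparameter values are placeholders, not from the talk): each duration rate comes from a Gamma, and each transition row comes from a symmetric Dirichlet with a structural zero in the self-transition slot, since the EDHMM handles dwell time through r_t rather than through self-transitions:

```python
import numpy as np

rng = np.random.default_rng(4)
K = 10                      # deliberately more states than we expect to need
a_r, b_r = 2.0, 0.2         # H_r: Gamma shape/rate hyperparameters (placeholders)

# lambda_m | H_r ~ Gamma(H_r)
lam = rng.gamma(a_r, 1.0 / b_r, size=K)

# pi_m | H_z ~ Dir(1/K, ..., 1/K) with a structural zero at position m
Pi = np.zeros((K, K))
for m in range(K):
    alpha = np.full(K, 1.0 / K)
    alpha[m] = 0.0                          # no self-transition mass
    idx = np.flatnonzero(alpha)
    Pi[m, idx] = rng.dirichlet(alpha[idx])  # Dirichlet over the remaining K-1 states

assert np.allclose(Pi.sum(axis=1), 1.0) and np.allclose(np.diag(Pi), 0.0)
```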

EDHMM Inference: Beam Sampling [Dewar, Wiggins and Wood, 2011]
We employ the forward-filtering, backward slice-sampling approach for the IHMM of [Van Gael et al., 2008], in which the state and duration variables s and r are sampled conditioned on auxiliary slice variables u.
Net result: an efficient, always finite forward-backward procedure for sampling the latent states and the amount of time spent in them.

Auxiliary Variables for Sampling
Objective: get samples of x.
Sometimes it is easier to introduce an auxiliary variable u, Gibbs sample the joint P(x, u) (i.e. sample from P(x | u, λ), then P(u | x, λ), and so on), and then discard the u values, than it is to sample directly from P(x | λ).
This is useful when P(x | λ) does not have a known parametric form but adding u results in one, and when x has countable support so that sampling it directly would require enumerating all of its values.
[Figure: graphical models λ → x and λ → x → u.]

Slice Sampling: A Very Useful Auxiliary Variable Sampling Trick
Pedagogical example: x | λ ~ Poisson(λ) (countable support), for which the enumeration strategy for sampling x is impossible.¹
Introduce an auxiliary variable u with
P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ).
Note that the marginal distribution of x is unchanged:
P(x | λ) = ∫ P(x, u | λ) du = ∫ P(x | λ) P(u | x, λ) du = ∫ I(0 ≤ u ≤ P(x | λ)) du = P(x | λ).
¹ Necessary in the EDHMM.

Slice Sampling: A Very Useful Auxiliary Variable Sampling Trick (continued)
This suggests a Gibbs sampling scheme that alternately samples from
P(x | u, λ) ∝ I(u ≤ P(x | λ)) (finite support, uniform above the slice, so enumeration is possible) and
P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ) (uniform between 0 and P(x | λ)),
and then discards the u values to arrive at x samples marginally distributed according to P(x | λ).
[Figure: Poisson pmf with rate λ_m = 5 over r_t = 0, ..., 10, with a slice level u_t = 0.05.]
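
The Poisson example translates directly into code. Given x, draw u uniformly below the pmf value; given u, the slice {k : P(k | λ) > u} is a finite, contiguous set of integers around the mode, so it can be enumerated and sampled from uniformly. A sketch, not the talk's code:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(5)

def slice_sample_poisson(lam, n_samples, x0=0):
    """Gibbs sample (x, u): u | x ~ Uniform(0, P(x | lam)), then x uniform on
    the slice {k : P(k | lam) > u}. The x draws are marginally Poisson(lam)."""
    x, samples = x0, []
    for _ in range(n_samples):
        u = rng.uniform(0.0, poisson.pmf(x, lam))        # slice height
        # Enumerate the finite, contiguous slice {k : P(k | lam) > u}.
        slice_set, k, entered = [], 0, False
        while True:
            p = poisson.pmf(k, lam)
            if p > u:
                slice_set.append(k); entered = True
            elif entered:
                break                                    # left the slice: done
            k += 1
        x = int(rng.choice(slice_set))                   # uniform on the slice
        samples.append(x)
    return np.array(samples)

draws = slice_sample_poisson(lam=5.0, n_samples=5000)
print(draws.mean(), draws.var())   # both should be close to 5
```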

EDHMM Graphical Model with Auxiliary Variables
[Figure: graphical model with duration chain r_0, ..., r_T, state chain s_0, ..., s_T, observations y_1, ..., y_T, and auxiliary slice variables u_1, ..., u_T attached to the transitions.]

EDHMM Inference: Beam Sampling
With the auxiliary variables defined as
p(u_t | z_t, z_{t-1}) = I(0 < u_t < p(z_t | z_{t-1})) / p(z_t | z_{t-1})
and
p(z_t | z_{t-1}) = p((s_t, r_t) | (s_{t-1}, r_{t-1})) = I(s_t = s_{t-1}) I(r_t = r_{t-1} - 1) if r_{t-1} > 0, and π_{s_{t-1}}(s_t) F_r(r_t; λ_{s_t}) if r_{t-1} = 0,
one can run standard forward-backward conditioned on the u's. Forward-backward slice sampling only has to consider a finite number of successor states at each timestep.

Forward Recursion
α̂_t(z_t) = p(z_t, y_{1:t}, u_{1:t})
= Σ_{z_{t-1}} p(z_t, z_{t-1}, y_{1:t}, u_{1:t})
= Σ_{z_{t-1}} p(u_t | z_t, z_{t-1}) p(z_t, z_{t-1}, y_{1:t}, u_{1:t-1})
= Σ_{z_{t-1}} I(0 < u_t < p(z_t | z_{t-1})) p(y_t | z_t) α̂_{t-1}(z_{t-1}).

Backward Sampling
Recursively sample a state sequence from the distribution p(z_{t-1} | z_t, Y, U), which can be expressed in terms of the forward variable:
p(z_{t-1} | z_t, Y, U) ∝ p(z_t, z_{t-1}, Y, U)
∝ p(u_t | z_t, z_{t-1}) p(z_t | z_{t-1}) α̂_{t-1}(z_{t-1})
∝ I(0 < u_t < p(z_t | z_{t-1})) α̂_{t-1}(z_{t-1}).
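
Putting the pieces together, the sketch below implements one conditional draw of the latent (s_t, r_t) path: slice variables computed from the previous path, a forward pass restricted to the finitely many successors above each slice, and backward sampling. It is a simplified illustration rather than the authors' implementation: it holds the initial state fixed, rescales α̂ for numerical stability, and reuses Pi, lam, mu and sample_edhmm from the earlier EDHMM sketch.

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(6)

def trans_prob(z_prev, z, Pi, lam):
    """p(z_t | z_{t-1}) for z = (state id, remaining duration)."""
    (s_prev, r_prev), (s, r) = z_prev, z
    if r_prev > 0:
        return 1.0 if (s == s_prev and r == r_prev - 1) else 0.0
    return Pi[s_prev, s] * poisson.pmf(r, lam[s])

def successors(z_prev, u, Pi, lam):
    """All z_t with p(z_t | z_{t-1}) > u; a finite set thanks to the slice."""
    s_prev, r_prev = z_prev
    if r_prev > 0:
        return [(s_prev, r_prev - 1)]          # deterministic countdown
    out = []
    for s in range(len(lam)):
        if Pi[s_prev, s] == 0.0:
            continue
        thresh, d, entered = u / Pi[s_prev, s], 0, False
        while d < 10 * lam[s] + 100:           # contiguous Poisson slice in d
            if poisson.pmf(d, lam[s]) > thresh:
                out.append((s, d)); entered = True
            elif entered:
                break
            d += 1
    return out

def beam_sample_path(y, z_path, Pi, lam, mu):
    """One draw of the (s, r) path given the previous path z_path
    (length T+1; z_path[0] is the fixed initial state)."""
    T = len(y)
    # 1. Slice variables u_t ~ Uniform(0, p(z_t | z_{t-1})).
    u = [rng.uniform(0.0, trans_prob(z_path[t - 1], z_path[t], Pi, lam))
         for t in range(1, T + 1)]
    # 2. Forward filtering over only the states above each slice.
    alpha = [dict() for _ in range(T)]
    for z in successors(z_path[0], u[0], Pi, lam):
        alpha[0][z] = norm.pdf(y[0], mu[z[0]], 1.0)
    for t in range(1, T):
        for z_prev, a in alpha[t - 1].items():
            for z in successors(z_prev, u[t], Pi, lam):
                alpha[t][z] = alpha[t].get(z, 0.0) + a * norm.pdf(y[t], mu[z[0]], 1.0)
        tot = sum(alpha[t].values())
        alpha[t] = {z: a / tot for z, a in alpha[t].items()}   # rescale
    # 3. Backward sampling of the path.
    def draw(weights):
        zs, ws = list(weights), np.array(list(weights.values()))
        return zs[rng.choice(len(zs), p=ws / ws.sum())]
    path = [draw(alpha[T - 1])]
    for t in range(T - 2, -1, -1):
        w = {z: a for z, a in alpha[t].items()
             if trans_prob(z, path[0], Pi, lam) > u[t + 1]}
        path.insert(0, draw(w))
    return [z_path[0]] + path

# Usage: resample the latent path of data generated by the earlier sampler.
s, r, y = sample_edhmm(200)
init = (int(s[0]), int(r[0]) + 1)                  # a valid predecessor of (s[0], r[0])
old_path = [init] + list(zip(s.tolist(), r.tolist()))
new_path = beam_sample_path(y, old_path, Pi, lam, mu)
```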

Results
To illustrate EDHMM learning on synthetic data, five hundred datapoints were generated using a 4-state EDHMM with Poisson duration distributions with rates λ = (10, 20, 3, 7) and Gaussian emission distributions with means µ = (−6, −2, 2, 6) and unit variance.

EDHMM: Synthetic Data Results
[Figure: emission/mean (range −10 to 10) versus time (0 to 500) for the synthetic data.]

EDHMM: Synthetic Data, State Mean Posterior
[Figure: histogram of posterior samples of the state emission means (x-axis: emission mean, −10 to 10; y-axis: count).]

EDHMM: Synthetic Data, State Duration Parameter Posterior
[Figure: histogram of posterior samples of the state duration rates (x-axis: duration rate, 0 to 25; y-axis: count).]

EDHMM: Nanoscale Transistor Spontaneous Voltage Fluctuation
[Figure: emission/mean versus time (0 to 5000) for the nanoscale transistor voltage fluctuation data.]

EDHMM: States Not Distinguishable by Output
Task: understand a system whose states have identical observation distributions and differ only in their duration distributions. The observation distributions have means µ_1 = 0, µ_2 = 0, and µ_3 = 3, and the duration distributions have rates λ_1 = 5, λ_2 = 15, λ_3 = 20.
(Next slide) a) observations; b) true states overlaid with 20 randomly selected state traces produced by the sampler; c) samples from the posterior over the observation distribution means; d) samples from the posterior over the duration distribution rates.

EDHMM: States Not Distinguishable by Output
[Figure, four panels: (a) observations y_t versus time; (b) true states overlaid with sampled state traces x_t versus time; (c) posterior over observation means p(µ_j | Y); (d) posterior over duration rates p(λ_j | Y).]

EDHMM vs. IHMM: Modeling the Morse Code Cepstrum

Wrap-Up
A novel inference procedure for EDHMMs that doesn't require truncation and is more efficient than considering all possible durations.
[Figure: mean number of transitions considered per time point by the beam sampler over 1000 post-burn-in iterations, roughly 25 to 55.]
Compare this to the (KT)^2 = O(10^6) transitions that would need to be considered by standard forward-backward without truncation, the only surely safe, truncation-free alternative.

Future Work
A novel Gamma process construction for dependent, structured, infinite-dimensional HMM transition distributions.
Generalize to a spatial prior on HMM states ("location"); simultaneous localization and mapping.
Process diagram modeling for systems biology.
Applications; seeking users.

Questions? Thank you!

Bibliography I
M J Beal, Z Ghahramani, and C E Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, March 2002.
M Dewar, C Wiggins, and F Wood. Inference in hidden Markov models with explicit state duration distributions. In submission, 2011.
E B Fox, E B Sudderth, M I Jordan, and A S Willsky. A sticky HDP-HMM with application to speaker diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.
F Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.
B H Juang and L R Rabiner. Mixture autoregressive hidden Markov models for speech signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1404–1413, 1985.

Bibliography II
A Krogh, M Brown, I S Mian, K Sjolander, and D Haussler. Hidden Markov models in computational biology: Applications to protein modelling. Journal of Molecular Biology, 235:1501–1531, 1994.
C Manning and H Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.
C Mitchell, M Harper, and L Jamieson. On the complexity of explicit duration HMM's. IEEE Transactions on Speech and Audio Processing, 3(3):213–217, 1995.
K P Murphy. Hidden semi-Markov models (HSMMs). Technical report, MIT, 2002.
R Nag, K Wong, and F Fallside. Script recognition using hidden Markov models. In ICASSP86, pages 2071–2074, 1986.
L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, pages 257–286, 1989.

Bibliography III
T Rydén, T Teräsvirta, and S Åsbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3):217–244, 1998.
D O Tanguay Jr. Hidden Markov models for gesture recognition. PhD thesis, Massachusetts Institute of Technology, 1995.
Y W Teh, M I Jordan, M J Beal, and D M Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
J Van Gael, Y Saatci, Y W Teh, and Z Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088–1095. ACM, 2008.
A D Wilson and A F Bobick. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):884–900, 1999.

Bibliography IV
S Yu and H Kobayashi. An efficient forward-backward algorithm for an explicit-duration hidden Markov model. IEEE Signal Processing Letters, 10(1):11–14, 2003.
S Z Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215–243, 2010.