ECE521 Lecture 19 HMM cont. Inference in HMM

Outline:
Hidden Markov models: model definitions and notations
Inference in HMMs
Learning in HMMs

Formally, a hidden Markov model defines a generative process over a sequence of observed random variables through a corresponding sequence of latent states. HMMs are generative models: the state variables are hidden and form a Markov chain, and each observation (input) is emitted from its state.

Applications of HMMs:
Speech recognition (Cortana, Siri, Google Assistant before 2016). Inputs: observed Fourier coefficients; states: utterances.
Bioinformatics (GeneMark and its variants). Inputs: raw DNA sequence; states: protein-coding region or not.
Communication (coding theory, error-correcting codes). Inputs: raw bit stream; states: corrected bit stream.
Tracking (Kalman filtering, particle filtering). Inputs: raw sensory data; states: location, velocity, pose.
Sources: Dahl et al. 2012; Kitzman et al. 2010; Lu et al. 2007

Definition:
Observations: x_1, ..., x_T
States: z_1, ..., z_T
State transition probabilities: p(z_t | z_{t-1})
Emission probabilities: p(x_t | z_t)

Definition (continued):
Sequence modelling assumption: the conditional probability distributions are shared across the sequence, which buys computational efficiency. For a length-D sequence, the transition parameters need O(|Z|^2) memory when shared, instead of O(D |Z|^2) when each timestep has its own table.
State transition probabilities p(z_t | z_{t-1}): shared across the sequence/timesteps.
Emission probabilities p(x_t | z_t): shared across the sequence/timesteps.
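
As a concrete illustration (not from the lecture: the number of states, the observation alphabet, and every probability below are made up), the shared parameters of a discrete HMM are just three small tables that get reused at every timestep:

import numpy as np

# Shared HMM parameters: the same tables are reused at every timestep, so
# memory is O(|Z|^2) + O(|Z| * |X|), independent of the sequence length D.
pi = np.array([0.3, 0.4, 0.3])               # initial distribution p(z_1)
A = np.array([[0.90, 0.05, 0.05],            # A[i, j] = p(z_t = j | z_{t-1} = i)
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
B = np.array([[0.40, 0.05, 0.05, 0.50],      # B[i, k] = p(x_t = k | z_t = i),
              [0.05, 0.80, 0.10, 0.05],      # for an observation alphabet of size 4,
              [0.05, 0.10, 0.80, 0.05]])     # e.g. {A, C, G, T}

# Rows of A and B are conditional distributions, so they must sum to one.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)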

Example 1: Consider an HMM with three discrete states. We have the following transition probabilities, drawn as a stochastic finite state machine.
[Figure: state-transition diagram over states 1, 2, 3, with self-transition probability 0.9 at each state and probability 0.1 of moving to another state.]

Example 2: Consider an HMM with three discrete states. We have the following transition probabilities; the transition probability defines a degree-2 Markov model.
[Figure: state-transition diagram over states 1, 2, 3 with edge probabilities 0.5, 0.1 and 0.8.]
Generate states: we can generate a sequence of states as follows. Start from an initial state z_1 = 2, sample z_2 from p(z_2 | z_1), and repeat the process for the next timestep.

Example 2 (continued): Starting from the initial state z_1 = 2 and repeatedly sampling the next state from the transition probabilities gives a sample sequence of hidden states:
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, ...
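
A minimal sketch of this sampling loop. The transition matrix below is a stand-in with a "sticky" state 2, since the lecture's exact numbers live in the slide figure:

import numpy as np

rng = np.random.default_rng(0)

# Stand-in transition matrix with a sticky state 2 (index 1).
A = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])

def sample_states(A, z1, T):
    """Generate z_1, ..., z_T by repeatedly sampling z_t ~ p(. | z_{t-1})."""
    z = [z1]
    for _ in range(T - 1):
        z.append(int(rng.choice(len(A), p=A[z[-1]])))
    return z

# States are 0-indexed here; the lecture labels them 1..3.
print(sample_states(A, z1=1, T=14))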

Example 2 (continued): Prediction: we can also predict future states given the current state, using the transition probabilities. Consider the following state sequence: 2, 2, 2, 2, 2, 3, 3, 2, 2, ... What is the next state in the sequence?

Example 2 (continued): For the same sequence 2, 2, 2, 2, 2, 3, 3, 2, 2, ..., what is the 15th hidden state in the sequence?
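
One way to answer such questions is to propagate the current state's distribution forward with powers of the transition matrix. This sketch reuses the stand-in matrix from above; the matrix and the step count are assumptions for illustration:

import numpy as np

# Same stand-in transition matrix as in the sampling sketch above.
A = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])

# If the current (9th) state is z = 2, the distribution of the state k steps
# ahead is the one-hot row vector for z = 2 multiplied by A^k.
p_now = np.array([0.0, 1.0, 0.0])            # z_9 = 2 (index 1)
k = 6                                        # the 15th state is 6 steps ahead
p_future = p_now @ np.linalg.matrix_power(A, k)
print(p_future, "-> most likely 15th state:", p_future.argmax() + 1)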

Example 3: Consider an HMM with three discrete states. The observations are also discrete random variables. We have the following emission probabilities.
Generate observations: let the state be fixed at z = 2. A sequence of observations can be generated (emitted) from the emission probability distribution by sampling x; a particular realization:
C G C C C C C C C G C C C C

Example 3 (continued): Now let the state be fixed at z = 1. Sampling x from the corresponding emission distribution gives a different kind of sequence; a particular realization:
T T T A T T A A T T T A T T
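
A sketch of this emission sampling. The alphabet {A, C, G, T} matches the slides, but the emission table below is invented: state 2 favours C and state 1 favours A/T, just to reproduce the flavour of the two realizations above:

import numpy as np

rng = np.random.default_rng(0)
alphabet = np.array(list("ACGT"))

# Invented emission table B[i, k] = p(x = alphabet[k] | z = i + 1).
B = np.array([[0.45, 0.05, 0.05, 0.45],
              [0.05, 0.80, 0.10, 0.05],
              [0.05, 0.10, 0.80, 0.05]])

def sample_observations(B, z, T):
    """Emit x_1, ..., x_T i.i.d. from p(x | z) while the state stays fixed at z."""
    idx = rng.choice(B.shape[1], size=T, p=B[z - 1])
    return " ".join(alphabet[idx])

print(sample_observations(B, z=2, T=14))   # mostly C's, as on the slide
print(sample_observations(B, z=1, T=14))   # mostly A's and T's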

Example 4: Consider an HMM with three discrete states and discrete observations. We have the following transition and emission probabilities.
Generate states and observations by ancestral sampling; a particular realization:
states:       2 2 2 2 2 3 3 2 2 2 2 3 2 2
observations: C G C C C G G C C G C G C C
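
Ancestral sampling simply means sampling parents before children: first the state chain, then each observation from its state. A sketch, with made-up tables standing in for the lecture's:

import numpy as np

rng = np.random.default_rng(0)
alphabet = list("ACGT")

# Stand-in parameters; the lecture's exact tables live in the slide figures.
pi = np.array([0.2, 0.6, 0.2])                 # p(z_1)
A = np.array([[0.5, 0.4, 0.1],                 # p(z_t | z_{t-1})
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])
B = np.array([[0.45, 0.05, 0.05, 0.45],        # p(x_t | z_t)
              [0.05, 0.80, 0.10, 0.05],
              [0.05, 0.10, 0.80, 0.05]])

def ancestral_sample(pi, A, B, T):
    """Sample (z_1..z_T, x_1..x_T): parents (states) first, then children (observations)."""
    z = [int(rng.choice(3, p=pi))]
    for _ in range(T - 1):
        z.append(int(rng.choice(3, p=A[z[-1]])))
    x = [alphabet[rng.choice(4, p=B[zt])] for zt in z]
    return [zt + 1 for zt in z], " ".join(x)

states, obs = ancestral_sample(pi, A, B, T=14)
print(states)
print(obs)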

Example 5: Consider an HMM with three discrete states whose emission distributions are 2D Gaussians.
[Figure: samples plotted in the (x1, x2) plane, coloured by hidden state. Source: Bishop 2008.]

Example 6: Consider an HMM with three discrete states and discrete observations, with given transition and emission probabilities.
Inference: what are the latent states that generated the given observations?
Given a sequence of observations: C C C C G C C T T A C G C C
Hidden states:                    ? ? ? ? ? ? ? ? ? ? ? ? ? ?

Example 6 (continued): Inference: what are the latent states that generated the given observations? For the sequence C C C C G C C T T A C G C C, a plausible guess is z = 2 over the C/G-rich stretches, z = 1 over the T/A stretch, then z = 2 again (compare the emission probabilities in Example 3).

We can reason about these inference problems mathematically using the joint probability of the HMM and Bayes' rule.
Joint distribution of the observations and the latent states in an HMM (transition and emission factors):
p(x_1, ..., x_T, z_1, ..., z_T) = p(z_1) ∏_{t=2}^{T} p(z_t | z_{t-1}) ∏_{t=1}^{T} p(x_t | z_t)
Inference: the conditional distribution (posterior) over the latent states given the observations, p(z_1, ..., z_T | x_1, ..., x_T).
Prediction: the marginal distribution of the next observation given the previous observations, p(x_{T+1} | x_1, ..., x_T).
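
Evaluating this joint for a particular pair of state and observation sequences is a straightforward product of table lookups; a sketch in log space, with stand-in parameter values:

import numpy as np

# Stand-in parameters, purely for illustration.
pi = np.array([0.2, 0.6, 0.2])
A = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])
B = np.array([[0.45, 0.05, 0.05, 0.45],
              [0.05, 0.80, 0.10, 0.05],
              [0.05, 0.10, 0.80, 0.05]])

def log_joint(z, x, pi, A, B):
    """log p(x_1..T, z_1..T) = log p(z_1) + sum_t log p(z_t | z_{t-1}) + sum_t log p(x_t | z_t)."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(z)):
        lp += np.log(A[z[t - 1], z[t]])    # transition factor
        lp += np.log(B[z[t], x[t]])        # emission factor
    return lp

z = [1, 1, 1, 2, 2, 1]                     # 0-indexed hidden states
x = [1, 1, 1, 2, 2, 1]                     # "C C C G G C" as indices into {A, C, G, T}
print(log_joint(z, x, pi, A, B))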

Inference in HMMs
Inference and marginalization (and generation) are the fundamental operations performed on a graphical model, and inference and marginalization are two sides of the same coin: we have to perform marginalization in order to do inference.
E.g. given the joint distribution, computing the marginal distribution of the observations requires a sum over the latent states:
p(x_1, ..., x_T) = Σ_{z_1, ..., z_T} p(x_1, ..., x_T, z_1, ..., z_T)
The marginal distribution is what normalizes the posterior in inference:
p(z_1, ..., z_T | x_1, ..., x_T) = p(x_1, ..., x_T, z_1, ..., z_T) / p(x_1, ..., x_T)
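
For a very short sequence this sum over latent states can be done by brute force, which makes the normalization explicit (and makes it obvious why this does not scale: the sum has |Z|^T terms). A toy sketch with assumed parameters:

import itertools
import numpy as np

# Stand-in parameters, as in the previous sketch.
pi = np.array([0.2, 0.6, 0.2])
A = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])
B = np.array([[0.45, 0.05, 0.05, 0.45],
              [0.05, 0.80, 0.10, 0.05],
              [0.05, 0.10, 0.80, 0.05]])

def joint(z, x):
    """p(x_1..T, z_1..T) as a product of initial, transition and emission factors."""
    p = pi[z[0]] * B[z[0], x[0]]
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p

x = [1, 1, 2, 1]                       # a short observation sequence, "C C G C"
# Brute-force marginal: sum the joint over all |Z|^T = 3^4 = 81 state sequences.
p_x = sum(joint(z, x) for z in itertools.product(range(3), repeat=len(x)))
# The posterior of any particular state sequence is the joint divided by this marginal.
z = (1, 1, 2, 1)
print(joint(z, x) / p_x)               # p(z | x)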

Sidetracking: Inference in HMMs
Let us consider a simpler marginalization problem first. Recall the degree-2 Markov model, which has a chain structure z_1 → z_2 → ... → z_T.
Joint distribution: p(z_1, ..., z_T) = p(z_1) ∏_{t=2}^{T} p(z_t | z_{t-1})
Suppose we would like to obtain the marginal distribution of z_T, that is
p(z_T) = Σ_{z_1} ... Σ_{z_{T-1}} p(z_1, ..., z_T)
Is there any shortcut to performing this summation? Insight: each term in the product depends only on a subset of the marginalized variables!

Sidetracking: Inference in HMMs
The shortcut idea is to distribute the summations into the product. First, we make the summation over each random variable explicit:
p(z_T) = Σ_{z_{T-1}} ... Σ_{z_1} p(z_1) p(z_2 | z_1) ... p(z_T | z_{T-1})
Then, perform local summations by distributing each sum into the products:
p(z_T) = Σ_{z_{T-1}} p(z_T | z_{T-1}) ... Σ_{z_2} p(z_3 | z_2) Σ_{z_1} p(z_2 | z_1) p(z_1)
This is the intuition behind message-passing algorithms. Can we simplify this further?
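
In code, distributing the sums turns the |Z|^T-term summation into a sequence of vector-matrix products, one local summation per timestep; a sketch with an assumed initial distribution and transition matrix:

import numpy as np

# Stand-in initial distribution and transition matrix.
pi = np.array([0.2, 0.6, 0.2])            # p(z_1)
A = np.array([[0.5, 0.4, 0.1],            # p(z_t | z_{t-1})
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])

def chain_marginal(pi, A, T):
    """p(z_T) for a Markov chain, computed by pushing each sum inward:
    m_2(z_2) = sum_{z_1} p(z_2 | z_1) p(z_1), then m_3, ..., m_T.
    Cost is O(T |Z|^2) instead of the naive O(|Z|^T)."""
    m = pi
    for _ in range(T - 1):
        m = m @ A                          # one local summation per timestep
    return m

print(chain_marginal(pi, A, T=15))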

Inference in HMMs
We can apply the same distribution trick to obtain marginals in HMMs. Compared with the plain Markov model there is an additional emission distribution attached to each latent state, so each local summation also carries an emission factor:
p(x_1, ..., x_T) = Σ_{z_T} p(x_T | z_T) Σ_{z_{T-1}} p(z_T | z_{T-1}) p(x_{T-1} | z_{T-1}) ... Σ_{z_1} p(z_2 | z_1) p(x_1 | z_1) p(z_1)
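
This nested summation is exactly what a forward recursion computes: keep a vector alpha_t(z) and absorb one transition and one emission factor per step. A sketch with the same stand-in parameters as above:

import numpy as np

# Stand-in parameters, as in the earlier sketches.
pi = np.array([0.2, 0.6, 0.2])
A = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.8, 0.1],
              [0.4, 0.1, 0.5]])
B = np.array([[0.45, 0.05, 0.05, 0.45],
              [0.05, 0.80, 0.10, 0.05],
              [0.05, 0.10, 0.80, 0.05]])

def forward_marginal(x, pi, A, B):
    """Sum out z_1..z_T one variable at a time (the forward recursion):
    alpha_1(z) = p(z) p(x_1 | z),  alpha_t(z) = p(x_t | z) sum_{z'} alpha_{t-1}(z') p(z | z').
    Returns p(x_1..T) in O(T |Z|^2) time."""
    alpha = pi * B[:, x[0]]
    for t in range(1, len(x)):
        alpha = (alpha @ A) * B[:, x[t]]
    return alpha.sum()

x = [1, 1, 1, 2, 1, 1]                     # "C C C G C C" as indices into {A, C, G, T}
print(forward_marginal(x, pi, A, B))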

Learning in HMMs
Learning in HMMs works the same way as in the other graphical models we have seen: for HMMs, we would like to adapt the parameters of the transition and emission probability distributions.
First, obtain the marginal likelihood of the data/observations by marginalizing over the latent state variables:
p(x_1, ..., x_T | θ) = Σ_{z_1, ..., z_T} p(x_1, ..., x_T, z_1, ..., z_T | θ)
Then, perform maximum likelihood estimation (MLE) by following the gradient of the log marginal likelihood (gradient ascent on log p(x | θ), equivalently gradient descent on its negative).
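
A toy end-to-end sketch of this recipe. Everything below is an assumption for illustration: the tables are parameterized by row-wise softmax of unconstrained logits, the gradient comes from finite differences rather than from EM or an autodiff library, and the data and step size are made up. The point is only the shape of the procedure: compute log p(x | θ) with the forward recursion, then follow its gradient.

import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def log_marginal(x, logits_pi, logits_A, logits_B):
    """log p(x_1..T | theta), computed with the (rescaled) forward recursion.
    The unconstrained logits are mapped to probability tables row-wise by softmax."""
    pi, A, B = softmax(logits_pi), softmax(logits_A), softmax(logits_B)
    alpha = pi * B[:, x[0]]
    ll = 0.0
    for t in range(1, len(x)):
        c = alpha.sum(); ll += np.log(c); alpha = alpha / c   # rescale for numerical stability
        alpha = (alpha @ A) * B[:, x[t]]
    return ll + np.log(alpha.sum())

def numeric_grad(f, p, eps=1e-5):
    """Finite-difference gradient of f() w.r.t. the array p (a stand-in for autodiff or EM)."""
    g = np.zeros_like(p)
    for i, _ in np.ndenumerate(p):
        p[i] += eps; hi = f()
        p[i] -= 2 * eps; lo = f()
        p[i] += eps
        g[i] = (hi - lo) / (2 * eps)
    return g

x = [1, 1, 1, 2, 1, 1, 3, 0, 1, 1]                              # a made-up observation sequence
params = [rng.normal(size=s) for s in [(3,), (3, 3), (3, 4)]]   # logits for pi, A, B

objective = lambda: log_marginal(x, *params)
for step in range(200):                        # plain gradient ascent on log p(x | theta)
    grads = [numeric_grad(objective, p) for p in params]
    for p, g in zip(params, grads):
        p += 0.1 * g
print(objective())                             # log marginal likelihood after training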