Graphical Models Seminar


Graphical Models Seminar: Forward-Backward and Viterbi Algorithm for HMMs
Bishop, PRML, Chapters 13.2.2, 13.2.3, 13.2.5
Dinu Kaufmann, Departement Mathematik und Informatik, Universität Basel
April 8, 2013

Forward-Backward Algorithm for HMMs
Sum-Product Algorithm for HMMs
Viterbi Algorithm

Introduction
Examples of sequential data: rainfall measurements, currency exchange rates, acoustics/speech recognition, nucleotide base pair sequences (DNA), sequences of characters in an English sentence.

State-Space Model
Figure: Bayesian network for a state-space model with latent variables z_0, z_1, ..., z_N and observations x_1, ..., x_N (z_{i+1} ⊥ z_{i-1} | z_i).

p(z_0, ..., z_N, x_1, ..., x_N) = p(z_0) ∏_{i=1}^{N} p(z_i | z_{i-1}) p(x_i | z_i)   (1)

Hidden Markov model: discrete state-space model. Linear dynamical system: Gaussian state-space model.
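To make the factorization in Eq. (1) concrete, here is a minimal Python sketch (not part of the original slides) that evaluates the joint probability of one particular state and observation sequence for a small discrete HMM; the parameter names pi, A and B, and their numerical values, are illustrative assumptions.

    import numpy as np

    # Illustrative HMM with K = 2 hidden states and 3 observation symbols (assumed values).
    # pi[k]   = p(z_0 = k)                 initial state distribution
    # A[j, k] = p(z_i = k | z_{i-1} = j)   transition probabilities
    # B[k, m] = p(x_i = m | z_i = k)       emission probabilities
    pi = np.array([0.5, 0.5])
    A  = np.array([[0.8, 0.2],
                   [0.3, 0.7]])
    B  = np.array([[0.6, 0.3, 0.1],
                   [0.1, 0.4, 0.5]])

    def joint_probability(z, x):
        """Eq. (1): p(z_0, ..., z_N, x_1, ..., x_N) for state indices z (length N+1) and observation indices x (length N)."""
        p = pi[z[0]]
        for i in range(1, len(z)):
            p *= A[z[i - 1], z[i]] * B[z[i], x[i - 1]]
        return p

    print(joint_probability([0, 0, 1], [0, 2]))  # probability of one particular path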

Forward-Backward Algorithm
Goal: find the posterior marginals of the hidden state variables z_n given the observations X = {x_1, ..., x_N} in a state-space model. The tree structure allows computation of exact marginals. Focus on discrete random variables (HMM).

Derivation of the Forward-Backward Algorithm
Use the chain rule and conditional independence:

p(z_n | X) = p(z_n, X) / p(X)   (2)
           = p(x_1, ..., x_n, z_n) p(x_{n+1}, ..., x_N | z_n) / p(X)   (3)
           = α(z_n) β(z_n) / p(X)   (4)

Forward Recursion

α(z_n) = p(x_1, ..., x_n, z_n)   (5)
       = p(x_1, ..., x_n | z_n) p(z_n)   (6)
       = p(x_1, ..., x_{n-1} | z_n) p(x_n | z_n) p(z_n)   (7)
       = p(x_n | z_n) p(x_1, ..., x_{n-1}, z_n)   (8)
       = p(x_n | z_n) Σ_{z_{n-1}} p(x_1, ..., x_{n-1}, z_{n-1}, z_n)   (9)
       = p(x_n | z_n) Σ_{z_{n-1}} p(x_1, ..., x_{n-1}, z_n | z_{n-1}) p(z_{n-1})   (10)
       = p(x_n | z_n) Σ_{z_{n-1}} p(x_1, ..., x_{n-1} | z_{n-1}) p(z_n | z_{n-1}) p(z_{n-1})   (11)
       = p(x_n | z_n) Σ_{z_{n-1}} p(x_1, ..., x_{n-1}, z_{n-1}) p(z_n | z_{n-1})   (12)
       = p(x_n | z_n) Σ_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1})   (13)
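As a sketch of how this recursion can be implemented, the following Python function computes the (unscaled) alpha values for the discrete HMM parameters pi, A, B introduced in the earlier snippet; the indexing convention (alpha[n] corresponding to z_{n+1} in the slides' 1-based notation) is an assumption of this sketch, and for long sequences one would rescale or work in log space to avoid underflow.

    def forward(pi, A, B, x):
        """alpha[n, k] = p(x_1, ..., x_{n+1}, z_{n+1} = k) for observation indices x."""
        N, K = len(x), len(pi)
        alpha = np.zeros((N, K))
        # First step: alpha(z_1) = p(x_1 | z_1) * sum_{z_0} alpha(z_0) p(z_1 | z_0), with alpha(z_0) = p(z_0)
        alpha[0] = B[:, x[0]] * (pi @ A)
        for n in range(1, N):
            # Eq. (13): alpha(z_n) = p(x_n | z_n) * sum_{z_{n-1}} alpha(z_{n-1}) p(z_n | z_{n-1})
            alpha[n] = B[:, x[n]] * (alpha[n - 1] @ A)
        return alpha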

Backward Recursion

β(z_n) = p(x_{n+1}, ..., x_N | z_n)   (14)
       = Σ_{z_{n+1}} p(x_{n+1}, ..., x_N, z_{n+1} | z_n)   (15)
       = Σ_{z_{n+1}} p(x_{n+1}, ..., x_N | z_n, z_{n+1}) p(z_{n+1} | z_n)   (16)
       = Σ_{z_{n+1}} p(x_{n+2}, ..., x_N | z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)   (17)
       = Σ_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)   (18)
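The corresponding backward pass, again as an unscaled sketch with the same assumed conventions (beta[n] corresponds to β(z_{n+1}) and the final step is initialized to one, matching the starting condition β(z_N) = 1):

    def backward(A, B, x):
        """beta[n, k] = p(x_{n+2}, ..., x_N | z_{n+1} = k), with beta[N-1, :] = 1."""
        N, K = len(x), A.shape[0]
        beta = np.ones((N, K))
        for n in range(N - 2, -1, -1):
            # Eq. (18): beta(z_n) = sum_{z_{n+1}} beta(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)
            beta[n] = A @ (beta[n + 1] * B[:, x[n + 1]])
        return beta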

Forward-Backward Algorithm

Forward recursion:
α(z_n) = p(x_n | z_n) Σ_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1})   (19)

Backward recursion:
β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)   (20)

Starting conditions: α(z_0) = p(z_0), β(z_N) = 1.

p(z_n | X) = α(z_n) β(z_n) / p(X)   (21)
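Combining the two passes gives the posterior marginals of Eq. (21); a small sketch using the assumed forward and backward helpers above, where p(X) is recovered as the sum of the final alpha values:

    def posterior_marginals(pi, A, B, x):
        """gamma[n, k] = p(z_{n+1} = k | X), Eq. (21)."""
        alpha = forward(pi, A, B, x)
        beta = backward(A, B, x)
        p_X = alpha[-1].sum()                 # p(X) = sum_{z_N} alpha(z_N)
        return alpha * beta / p_X

    x = [0, 1, 2]                             # an illustrative observation sequence
    print(posterior_marginals(pi, A, B, x))   # each row sums to 1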

Summary
The recursive forward-backward algorithm efficiently computes the posterior marginals of the hidden state variables given the observations. Computational cost for K states and N steps: O(K^2 N). Brute-force cost: O(K^N).

Sum-Product Algorithm for HMMs
Figure: Bayesian network for a state-space model (z_{i+1} ⊥ z_{i-1} | z_i).

Sum-Product Algorithm for HMMs
Figure: Factor graph for a state-space model with factors Ψ_n and g_n (z_{i+1} ⊥ z_{i-1} | z_i).
Absorb z_0 into Ψ_0. Replace directed edges by undirected edges. Insert factor nodes.

Sum-Product Algorithm for HMMs
Figure: Factor graph for an HMM with factors f_n.
Absorb the emission probabilities into the transition probability factors:

f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n)   (22)
f_1(z_1) = p(z_0) p(z_1 | z_0) p(x_1 | z_1)   (23)
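In the discrete case this absorption is simply an elementwise product of the transition matrix with the relevant emission column; a short sketch under the assumed A, B conventions from the earlier snippets:

    def factor(A, B, x_n):
        """f_n(z_{n-1}, z_n) = p(z_n | z_{n-1}) p(x_n | z_n) as a K x K table, Eq. (22)."""
        return A * B[:, x_n]                  # broadcasts the emission column over the rows of A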

Sum-Product Algorithm for HMMs
Figure: Factor graph for an HMM.

μ_{f_{n-1} → z_{n-1}}(z_{n-1}) = μ_{z_{n-1} → f_n}(z_{n-1})   (24)
μ_{f_n → z_n}(z_n) = Σ_{z_{n-1}} f_n(z_{n-1}, z_n) μ_{z_{n-1} → f_n}(z_{n-1})   (25)
                   = Σ_{z_{n-1}} p(z_n | z_{n-1}) p(x_n | z_n) μ_{z_{n-1} → f_n}(z_{n-1})   (26)

Here μ_{f_n → z_n}(z_n) = α(z_n) and μ_{z_{n-1} → f_n}(z_{n-1}) = α(z_{n-1}). Analogously for the β messages.

Sum-Product Algorithm for HMMs
The forward-backward algorithm is the sum-product algorithm applied to an HMM.

Viterbi Algorithm
Find the most probable sequence of states {ẑ_1, ..., ẑ_N} given all observations {x_1, ..., x_N}.

Max-product algorithm:
{ẑ_1, ..., ẑ_N} = argmax_{z_1, ..., z_N} p(z_1, ..., z_N, x_1, ..., x_N)   (27)

Max-sum algorithm:
{ẑ_1, ..., ẑ_N} = argmax_{z_1, ..., z_N} log p(z_1, ..., z_N, x_1, ..., x_N)   (28)

For HMMs, this is known as the Viterbi algorithm.

Derivation of the Viterbi Algorithm
Use the chain rule and conditional independence, or use factor graph notation and check the equivalence. Rule of thumb: exchange summations for maximizations in the sum-product algorithm; take the log of the argument for the max-sum algorithm.

Recursion for the Viterbi Algorithm
Figure: Factor graph for an HMM.

ω(z_{n+1}) := μ_{f_{n+1} → z_{n+1}}(z_{n+1})   (29)
μ_{z_n → f_{n+1}}(z_n) = μ_{f_n → z_n}(z_n)   (30)
μ_{f_{n+1} → z_{n+1}}(z_{n+1}) = max_{z_n} { log f_{n+1}(z_n, z_{n+1}) + μ_{z_n → f_{n+1}}(z_n) }   (31)
    = max_{z_n} { log [p(z_{n+1} | z_n) p(x_{n+1} | z_{n+1})] + μ_{f_n → z_n}(z_n) }   (32)
    = log p(x_{n+1} | z_{n+1}) + max_{z_n} { log p(z_{n+1} | z_n) + ω(z_n) }   (33)

Viterbi Algorithm (Max-Sum)

Initialization:
ω(z_0) = log p(z_0)   (34)

Forward recursion:
ω(z_{n+1}) = log p(x_{n+1} | z_{n+1}) + max_{z_n} { log p(z_{n+1} | z_n) + ω(z_n) }   (35)

Store the most probable previous state:
φ(z_n) = argmax_{z_{n-1}} { log p(x_n | z_n) + log p(z_n | z_{n-1}) + ω(z_{n-1}) }   (36)

At the root node:
z_N^{max} = argmax_{z_N} μ_{f_N → z_N}(z_N)   (37)

Backtracking: ẑ_{n-1} = φ(ẑ_n), starting from ẑ_N = z_N^{max}.
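A compact sketch of this max-sum recursion with backtracking, under the same assumed pi, A, B conventions as the earlier snippets and working in log space:

    def viterbi(pi, A, B, x):
        """Most probable state sequence z_1, ..., z_N for observation indices x (Eqs. 34-37)."""
        N, K = len(x), len(pi)
        log_A, log_B = np.log(A), np.log(B)
        omega = np.zeros((N, K))
        phi = np.zeros((N, K), dtype=int)                       # most probable previous state
        # omega(z_1) = log p(x_1 | z_1) + max_{z_0} {log p(z_1 | z_0) + omega(z_0)}, with omega(z_0) = log p(z_0)
        omega[0] = log_B[:, x[0]] + np.max(np.log(pi)[:, None] + log_A, axis=0)
        for n in range(1, N):
            scores = omega[n - 1][:, None] + log_A              # scores[j, k] = omega(z_{n-1}=j) + log p(z_n=k | z_{n-1}=j)
            phi[n] = np.argmax(scores, axis=0)                  # Eq. (36)
            omega[n] = log_B[:, x[n]] + np.max(scores, axis=0)  # Eq. (35)
        path = [int(np.argmax(omega[-1]))]                      # root node, Eq. (37)
        for n in range(N - 1, 0, -1):                           # backtracking
            path.append(int(phi[n, path[-1]]))
        return path[::-1]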

Intuition
Number of possible paths through the trellis: O(K^N).
Figure: Two paths in a fragment of a trellis.
Store only the most probable incoming path for each state at each step, then backtrack.

Example (Wikipedia)
Figure: Doctor in a village. Observed symptom sequence: {normal, cold, dizzy}.
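To make the example concrete, one could run the Viterbi sketch above on this observation sequence; the state labels (Healthy, Fever) and the numerical values below are illustrative choices in the spirit of the Wikipedia example, not taken from the slides.

    # Hidden states: 0 = Healthy, 1 = Fever.  Observations: 0 = normal, 1 = cold, 2 = dizzy.
    pi = np.array([0.6, 0.4])                     # initial state distribution (illustrative)
    A  = np.array([[0.7, 0.3],
                   [0.4, 0.6]])                   # transition probabilities (illustrative)
    B  = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.3, 0.6]])              # emission probabilities (illustrative)

    observations = [0, 1, 2]                      # normal, cold, dizzy
    print(viterbi(pi, A, B, observations))        # [0, 0, 1] -> Healthy, Healthy, Fever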

Summary
Sum-product / forward-backward algorithm: compute the state posteriors in an HMM.
Max-sum / max-product / Viterbi algorithm: compute the most likely sequence of hidden states in an HMM.
Costs are linear instead of exponential in N.
Not covered: Baum-Welch algorithm for estimating the parameters of an HMM.

References
Bishop, PRML, Chapters 8.4 and 13.
Further reading on factor graphs (with a slightly improved notation and emphasis on linear dynamical systems): Loeliger et al. (2007), The Factor Graph Approach to Model-Based Signal Processing.