Hidden Markov Models. Terminology and Basic Algorithms


What is machine learning?
From http://en.wikipedia.org/wiki/machine_learning: Machine learning, a branch of artificial intelligence, is about the construction and study of systems that can learn from data. For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders. [...] Machine learning focuses on prediction, based on known properties learned from the training data.

What is machine learning?
Machine learning means different things to different people, and there is no generally agreed-upon core set of algorithms that must be learned. To us, the core of machine learning (in bioinformatics) boils down to three things: (1) building a mathematical model that captures some desired structure of the data that you are working on, (2) training the model (i.e. setting the parameters of the model) based on existing data to optimize it as well as we can, and (3) making predictions using the model on new data.

Motivation
We make predictions based on models of observed data (machine learning). A simple model is to assume that observations are independent and identically distributed (iid), but this assumption is not always the best, e.g. for (1) measurements of weather patterns, (2) daily values of stocks, (3) acoustic features in successive time frames used for speech recognition, (4) the composition of texts, (5) the composition of DNA, and so on.

Markov Models
If the n'th observation in a chain of observations is influenced only by the (n-1)'th observation, i.e.

p(x_n | x_1, ..., x_{n-1}) = p(x_n | x_{n-1}),

then the chain of observations is a 1st-order Markov chain, and the joint probability of a sequence of N observations is

p(x_1, ..., x_N) = p(x_1) ∏_{n=2}^{N} p(x_n | x_{n-1}).

If the distributions p(x_n | x_{n-1}) are the same for all n, then the chain of observations is a homogeneous 1st-order Markov chain.

Extension: in a higher-order Markov chain, each observation depends on the previous m observations rather than only the immediately preceding one.
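As a small illustration (not from the original slides), the sketch below computes this joint probability for a homogeneous 1st-order Markov chain; the two-state transition matrix and initial distribution are made-up values.

```python
import numpy as np

# Hypothetical 2-state homogeneous Markov chain (all numbers made up).
pi = np.array([0.5, 0.5])        # pi[k]   = p(x_1 = k)
A  = np.array([[0.9, 0.1],       # A[j, k] = p(x_n = k | x_{n-1} = j)
               [0.2, 0.8]])

def markov_chain_probability(xs):
    """p(x_1,...,x_N) = p(x_1) * prod_{n=2}^N p(x_n | x_{n-1})."""
    p = pi[xs[0]]
    for prev, cur in zip(xs[:-1], xs[1:]):
        p *= A[prev, cur]
    return p

print(markov_chain_probability([0, 0, 1, 1, 0]))  # 0.5 * 0.9 * 0.1 * 0.8 * 0.2
```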

Hidden Markov Models
What if the n'th observation in a chain of observations is influenced by a corresponding latent (i.e. hidden) variable? If the latent variables are discrete and form a Markov chain, then it is a hidden Markov model (HMM). [Figure: a sequence of latent values, e.g. H H L L H, forming a Markov model, with a corresponding sequence of observations below; together they form a hidden Markov model.]

Computational problems: (1) determine the likelihood of the sequence of observations, (2) predict the next observation in the sequence of observations, and (3) find the most likely underlying explanation of the sequence of observations (i.e. the most likely sequence of hidden values).

The predictive distribution p(x_{n+1} | x_1, ..., x_n) for observation x_{n+1} can be shown to depend on all previous observations, i.e. the sequence of observations is not a Markov chain of any order.

The joint distribution is

p(x_1, ..., x_N, z_1, ..., z_N) = p(z_1) [∏_{n=2}^{N} p(z_n | z_{n-1})] ∏_{n=1}^{N} p(x_n | z_n),

where the factors p(z_n | z_{n-1}) are the transition probabilities and the factors p(x_n | z_n) are the emission probabilities.

Transition Probabilities
Notation: In Bishop, the latent variables z_n are discrete multinomial variables in 1-of-K encoding, e.g. if z_n = (0,0,1) then the model in step n is in state k=3.

Transition probabilities: If the latent variables are discrete with K states, the conditional distribution p(z_n | z_{n-1}) is a K x K table A, and the marginal distribution p(z_1) describing the initial state is a K-vector π.

The probability of going from state j to state k is A_jk = p(z_nk = 1 | z_{n-1,j} = 1).

The probability of state k being the initial state is π_k = p(z_1k = 1).

The transition probabilities can be drawn as a state transition diagram, with one node per state and an edge for each non-zero A_jk.

Emission Probabilities
Emission probabilities: the conditional distributions p(x_n | z_n) of the observed variables given a specific state. If the observed values x_n are discrete (e.g. D symbols), the emission probabilities Φ form a K x D table of probabilities which, for each of the K states, specifies the probability of emitting each observable, i.e.

p(x_n | z_n, Φ) = ∏_{k=1}^{K} p(x_n | Φ_k)^{z_nk},

where z_nk = 1 iff the n'th latent variable in the sequence is in state k, and 0 otherwise, i.e. the product just picks out the emission probabilities corresponding to state k.

HMM joint probability distribution

p(X, Z | Θ) = p(z_1 | π) [∏_{n=2}^{N} p(z_n | z_{n-1}, A)] ∏_{n=1}^{N} p(x_n | z_n, Φ)

Observables: X = {x_1, ..., x_N}. Latent states: Z = {z_1, ..., z_N}. Model parameters: Θ = {π, A, Φ}. If A and Φ are the same for all n, then the HMM is homogeneous.
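A minimal sketch of this joint probability for a small homogeneous HMM; the 2-state model, its alphabet of 4 symbols, and all parameter values below are made-up assumptions, not taken from the slides.

```python
import numpy as np

# Hypothetical homogeneous HMM with K=2 states and D=4 symbols (made-up numbers).
pi  = np.array([0.5, 0.5])                  # pi[k]     = p(z_1 = k)
A   = np.array([[0.9, 0.1],                 # A[j, k]   = p(z_n = k | z_{n-1} = j)
                [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1],       # phi[k, d] = p(x_n = d | z_n = k)
                [0.1, 0.4, 0.1, 0.4]])

def joint_probability(xs, zs):
    """p(X, Z | theta) = p(z_1) prod_{n>=2} p(z_n | z_{n-1}) * prod_n p(x_n | z_n)."""
    p = pi[zs[0]] * phi[zs[0], xs[0]]
    for n in range(1, len(xs)):
        p *= A[zs[n-1], zs[n]] * phi[zs[n], xs[n]]
    return p

xs = [0, 2, 1, 3, 0]   # observed symbols, as indices into the alphabet
zs = [0, 0, 1, 1, 0]   # one candidate hidden state sequence (e.g. H H L L H)
print(joint_probability(xs, zs))
```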

HMMs as a generative model
An HMM generates a sequence of observables by moving from latent state to latent state according to the transition probabilities, and emitting an observable (from a discrete set of observables, i.e. a finite alphabet) from each latent state visited according to the emission probabilities of that state. A run follows a sequence of states, e.g. H H L L H, and emits a corresponding sequence of symbols. A special end-state can be added to generate finite output.
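A minimal sketch of running an HMM as a generative model, under the same made-up 2-state model as above; the state names H/L and the DNA-like alphabet are illustrative assumptions. For simplicity it runs for a fixed number of steps instead of adding an end-state.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical homogeneous HMM (made-up numbers); states H/L, alphabet A/C/G/T.
pi  = np.array([0.5, 0.5])
A   = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
states, alphabet = "HL", "ACGT"

def generate(n_steps):
    """Follow a random sequence of states and emit one symbol from each state."""
    z = rng.choice(len(pi), p=pi)                # draw the initial state from pi
    zs, xs = "", ""
    for _ in range(n_steps):
        zs += states[z]
        xs += alphabet[rng.choice(len(alphabet), p=phi[z])]  # emit from state z
        z = rng.choice(len(pi), p=A[z])          # move according to row z of A
    return zs, xs

print(generate(10))   # prints a (state sequence, symbol sequence) pair
```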

Using HMMs
(1) Determine the likelihood of the sequence of observations:

p(X) = Σ_Z p(X, Z).

The sum has K^N terms, but it can be computed in O(K^2 N) time.

(2) Predict the next observation in the sequence of observations.

(3) Find the most likely underlying explanation of the sequence of observations.

The forward-backward algorithm
α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n:

α(z_n) = p(x_1, ..., x_n, z_n).

β(z_n) is the conditional probability of the future observations x_{n+1},...,x_N given being in state z_n:

β(z_n) = p(x_{n+1}, ..., x_N | z_n).

The values α(z_nk) can be arranged in a K x N table, with one row per state k and one column per position n (and likewise for β). Using α(z_n) and β(z_n) we get the likelihood of the observations as

p(X) = Σ_{z_n} α(z_n) β(z_n)   for any n.

The forward algorithm
α(z_n) is the joint probability of observing x_1,...,x_n and being in state z_n:

α(z_n) = p(x_1, ..., x_n, z_n).

The α-recursion follows from the product rule, the sum rule, and the conditional independence properties of the HMM:

α(z_n) = p(x_n | z_n) Σ_{z_{n-1}} α(z_{n-1}) p(z_n | z_{n-1}).

Basis:

α(z_1) = p(x_1, z_1) = p(z_1) p(x_1 | z_1).

Filling in the K x N table of α-values column by column takes time O(K^2 N) and space O(K N) using memoization.
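A minimal sketch of the forward algorithm, filling in the K x N table of α-values column by column; the 2-state model and its parameters are the same made-up values used above. (For long sequences the α-values underflow; numerically sound implementations are the topic of the next lecture.)

```python
import numpy as np

def forward(xs, pi, A, phi):
    """alpha[k, n] = p(x_1,...,x_{n+1}, z_{n+1} = k) with 0-based columns;
    runs in O(K^2 N) time and O(K N) space."""
    K, N = len(pi), len(xs)
    alpha = np.zeros((K, N))
    alpha[:, 0] = pi * phi[:, xs[0]]                         # basis
    for n in range(1, N):
        alpha[:, n] = phi[:, xs[n]] * (A.T @ alpha[:, n-1])  # recursion
    return alpha

# Hypothetical 2-state HMM with made-up parameters.
pi  = np.array([0.5, 0.5])
A   = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
xs  = [0, 2, 1, 3, 0]

alpha = forward(xs, pi, A, phi)
print(alpha[:, -1].sum())     # likelihood p(X) = sum over the last column
```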

The backward algorithm
β(z_n) is the conditional probability of the future observations x_{n+1},...,x_N given being in state z_n:

β(z_n) = p(x_{n+1}, ..., x_N | z_n).

The β-recursion follows from the sum rule, the product rule, and the conditional independence properties of the HMM:

β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n).

Basis:

β(z_N) = 1.

Filling in the K x N table of β-values takes time O(K^2 N) and space O(K N) using memoization.
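A minimal sketch of the backward algorithm under the same made-up model, together with the forward sketch from above, so that the identity p(X) = Σ_{z_n} α(z_n) β(z_n) can be checked for every column n.

```python
import numpy as np

def forward(xs, pi, A, phi):
    K, N = len(pi), len(xs)
    alpha = np.zeros((K, N))
    alpha[:, 0] = pi * phi[:, xs[0]]
    for n in range(1, N):
        alpha[:, n] = phi[:, xs[n]] * (A.T @ alpha[:, n-1])
    return alpha

def backward(xs, pi, A, phi):
    """beta[k, n] = p(x_{n+2},...,x_N | z_{n+1} = k) with 0-based columns;
    runs in O(K^2 N) time and O(K N) space."""
    K, N = len(pi), len(xs)
    beta = np.zeros((K, N))
    beta[:, N-1] = 1.0                                       # basis
    for n in range(N - 2, -1, -1):
        beta[:, n] = A @ (phi[:, xs[n+1]] * beta[:, n+1])    # recursion
    return beta

# Hypothetical 2-state HMM with made-up parameters.
pi  = np.array([0.5, 0.5])
A   = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
xs  = [0, 2, 1, 3, 0]

alpha, beta = forward(xs, pi, A, phi), backward(xs, pi, A, phi)
# Every column gives the same likelihood: p(X) = sum_k alpha[k, n] * beta[k, n].
print([(alpha[:, n] * beta[:, n]).sum() for n in range(len(xs))])
```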

Using HMMs
Having determined the likelihood of the sequence of observations, we turn to the second problem: predicting the next observation in the sequence of observations.

Predicting the next observation
Using the product rule, the sum rule, and the conditional independence properties of the HMM, the predictive distribution can be expressed in terms of the forward values:

p(x_{n+1} | x_1, ..., x_n) = (1 / p(x_1, ..., x_n)) Σ_{z_{n+1}} p(x_{n+1} | z_{n+1}) Σ_{z_n} p(z_{n+1} | z_n) α(z_n),

where p(x_1, ..., x_n) = Σ_{z_n} α(z_n).
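A minimal sketch of this predictive distribution using the forward values, again under the made-up 2-state model; the returned vector is a distribution over the D symbols and sums to 1.

```python
import numpy as np

def forward(xs, pi, A, phi):
    K, N = len(pi), len(xs)
    alpha = np.zeros((K, N))
    alpha[:, 0] = pi * phi[:, xs[0]]
    for n in range(1, N):
        alpha[:, n] = phi[:, xs[n]] * (A.T @ alpha[:, n-1])
    return alpha

def predict_next(xs, pi, A, phi):
    """p(x_{n+1} | x_1..x_n) = (1/p(x_1..x_n)) sum_{z_{n+1}} p(x_{n+1}|z_{n+1})
                                             sum_{z_n} p(z_{n+1}|z_n) alpha(z_n)."""
    alpha = forward(xs, pi, A, phi)
    next_state = A.T @ alpha[:, -1]        # sum_{z_n} p(z_{n+1}|z_n) alpha(z_n)
    return (phi.T @ next_state) / alpha[:, -1].sum()

# Hypothetical 2-state HMM with made-up parameters.
pi  = np.array([0.5, 0.5])
A   = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
xs  = [0, 2, 1, 3, 0]

print(predict_next(xs, pi, A, phi))   # distribution over the next symbol
```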

Using HMMs
The remaining problem is to find the most likely underlying explanation of the sequence of observations. The Viterbi algorithm finds the most probable sequence of states generating the observations.

The Viterbi algorithm
ω(z_n) is the probability of the most likely sequence of states z_1,...,z_n generating the observations x_1,...,x_n:

ω(z_n) = max_{z_1,...,z_{n-1}} p(x_1, ..., x_n, z_1, ..., z_n).

Intuition: Find the "longest" path from column 1 to column n in the K x N table, where the length of a path is its total probability, i.e. the product of the transition and emission probabilities along the path.

Recursion:

ω(z_n) = p(x_n | z_n) max_{z_{n-1}} ω(z_{n-1}) p(z_n | z_{n-1}).

Basis:

ω(z_1) = p(z_1) p(x_1 | z_1).

Filling in the K x N table of ω-values takes time O(K^2 N) and space O(K N) using memoization. The most probable path itself can be retrieved in time O(K N) by backtracking.
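A minimal sketch of the Viterbi recursion plus backtracking, under the same made-up 2-state model; it returns the most probable state sequence together with its joint probability.

```python
import numpy as np

def viterbi(xs, pi, A, phi):
    """Most probable state path via the omega-recursion and backtracking."""
    K, N = len(pi), len(xs)
    omega = np.zeros((K, N))            # omega[k, n] = best-path probability
    back  = np.zeros((K, N), dtype=int) # back[k, n]  = best predecessor of k
    omega[:, 0] = pi * phi[:, xs[0]]                       # basis
    for n in range(1, N):
        scores = omega[:, n-1, None] * A                   # scores[j, k]
        back[:, n]  = scores.argmax(axis=0)
        omega[:, n] = phi[:, xs[n]] * scores.max(axis=0)   # recursion
    path = [int(omega[:, -1].argmax())]                    # backtrack
    for n in range(N - 1, 0, -1):
        path.append(int(back[path[-1], n]))
    return path[::-1], float(omega[:, -1].max())

# Hypothetical 2-state HMM with made-up parameters.
pi  = np.array([0.5, 0.5])
A   = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
xs  = [0, 2, 1, 3, 0]

print(viterbi(xs, pi, A, phi))   # (most probable state sequence, its probability)
```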

Another kind of decoding: posterior decoding
Using α(z_n) and β(z_n) we can also find the most likely state to be in at the n'th step:

z*_n = argmax_{z_n} p(z_n | x_1, ..., x_N) = argmax_{z_n} α(z_n) β(z_n) / p(X).

Finding z*_1,...,z*_N is called posterior decoding.

Viterbi vs. posterior decoding: A sequence of states z_1,...,z_N where p(x_1,...,x_N, z_1,...,z_N) > 0 is a legal (or syntactically correct) decoding of X. Viterbi finds the most likely syntactically correct decoding of X. What does posterior decoding find? Does it always find a syntactically correct decoding of X?
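A minimal sketch of posterior decoding, reusing the forward and backward sketches from above under the same made-up model; note that, unlike Viterbi, the returned sequence maximizes each p(z_n | X) separately and need not be a syntactically correct decoding.

```python
import numpy as np

def forward(xs, pi, A, phi):
    K, N = len(pi), len(xs)
    alpha = np.zeros((K, N))
    alpha[:, 0] = pi * phi[:, xs[0]]
    for n in range(1, N):
        alpha[:, n] = phi[:, xs[n]] * (A.T @ alpha[:, n-1])
    return alpha

def backward(xs, pi, A, phi):
    K, N = len(pi), len(xs)
    beta = np.zeros((K, N))
    beta[:, N-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[:, n] = A @ (phi[:, xs[n+1]] * beta[:, n+1])
    return beta

def posterior_decode(xs, pi, A, phi):
    """z*_n = argmax_k p(z_n = k | X) = argmax_k alpha(z_n) beta(z_n) / p(X)."""
    alpha, beta = forward(xs, pi, A, phi), backward(xs, pi, A, phi)
    posterior = alpha * beta / alpha[:, -1].sum()   # p(z_n = k | x_1..x_N)
    return list(posterior.argmax(axis=0))

# Hypothetical 2-state HMM with made-up parameters.
pi  = np.array([0.5, 0.5])
A   = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
xs  = [0, 2, 1, 3, 0]

print(posterior_decode(xs, pi, A, phi))   # most likely state at each position
```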

Summary
Introduced hidden Markov models (HMMs). The forward and backward algorithms for determining the likelihood of a sequence of observations, and for predicting the next observation in a sequence of observations. The Viterbi algorithm for finding the most likely underlying explanation (sequence of latent states) of a sequence of observations. Next: how to implement the basic algorithms (forward, backward, and Viterbi) in a numerically sound manner.