Lecture 9. Intro to Hidden Markov Models (finish up)


Review: structure
Number of states: Q_1 .. Q_N
Number of output symbols: M
Parameters:
Transition probability matrix a_ij
Emission probabilities b_i(a), the probability that state i emits character a
Initial distribution vector π_i

Assumptions
Markov assumption: the current state depends only on the previous state.
Stationary assumption: transition probabilities are independent of time ("memoryless").
Output independence: each observation depends only on the current state, not on previous observations.
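Together, these assumptions mean the joint probability of an observation sequence x_1 .. x_T and a state path q_1 .. q_T factorizes into a product of transition and emission terms (using the notation of the review slide above):

P(x_1 .. x_T, q_1 .. q_T) = π_{q_1} · b_{q_1}(x_1) · ∏_{t=2..T} a_{q_{t-1} q_t} · b_{q_t}(x_t)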

Example: the dishonest casino (from Durbin et al.)

Cases

Example    Observations     Hidden state
Football   Plays            Coach
Text       Words            Shakespeare / monkey
Casino     Rolled numbers   Fair / loaded
DNA        ACGT             Coding / not

In-class (re)review: suppose that instead of a dishonest casino we used fair and loaded coins. Just like before, the player shifts between fair and loaded states. How could we model this?

[State diagram: two states, fair and loaded, each emitting Heads or Tails; start probability 0.5 for each state; self-transition probability 0.9 for each state; probability 0.1 of switching between fair and loaded in either direction.]
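As a minimal sketch, this model can be written down as plain Python dictionaries. The start (0.5), self-transition (0.9) and switching (0.1) probabilities come from the diagram; the emission probabilities of the loaded coin are not given on the slide, so the 0.9/0.1 values below are an assumption used only for illustration.

```python
# Minimal sketch of the fair/loaded coin HMM as plain dictionaries.
# Transition and start probabilities follow the diagram above; the
# emission probabilities of the loaded coin are assumed (not given on
# the slide) purely for illustration.

states = ["fair", "loaded"]

start = {"fair": 0.5, "loaded": 0.5}      # initial distribution pi_i

trans = {                                 # transition matrix a_ij
    "fair":   {"fair": 0.9, "loaded": 0.1},
    "loaded": {"fair": 0.1, "loaded": 0.9},
}

emit = {                                  # emission probabilities b_i(a)
    "fair":   {"H": 0.5, "T": 0.5},
    "loaded": {"H": 0.9, "T": 0.1},       # assumed bias of the loaded coin
}
```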

Basic problems
Evaluation: what is the probability that the observations were generated by a given model?
Decoding: given a model and a sequence of observations, what is the most likely sequence of hidden states?
Learning: given a model and a sequence of observations, how should we modify the model parameters to maximize P(observations | model)?

Decoding. Text: Shakespeare or monkey?
Case 1: Fehwufhweuromeojulietpoisonjigjreijge
Case 2: mmmmbananammmmmmmbananammm

Forward algorithm

Question: suppose the sequence of our game is HHHTHHHTTHHTH. What is the probability of this sequence given the model (the fair/loaded coin HMM above)?

The structure of the Forward algorithm is essentially the same as that of the Viterbi algorithm, except that a maximization operation is replaced by summation.
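As a concrete illustration, here is a minimal Python sketch of the Forward algorithm for the fair/loaded coin model, answering the evaluation question above (the loaded coin's emission probabilities are again an assumption).

```python
# Forward algorithm sketch for the fair/loaded coin HMM above.
# f[t][k] = P(x_1..x_t, q_t = k); summing the last column gives P(x | model).
# Replacing sum() with max() (and keeping back-pointers) turns this into Viterbi.

states = ["fair", "loaded"]
start = {"fair": 0.5, "loaded": 0.5}
trans = {"fair": {"fair": 0.9, "loaded": 0.1},
         "loaded": {"fair": 0.1, "loaded": 0.9}}
emit = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}   # loaded-coin emissions are assumed

def forward(obs):
    # Initialization: t = 0
    f = [{k: start[k] * emit[k][obs[0]] for k in states}]
    # Recursion: t = 1 .. T-1
    for x in obs[1:]:
        prev = f[-1]
        f.append({l: emit[l][x] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    # Termination: total probability of the observation sequence
    return sum(f[-1][k] for k in states)

print(forward("HHHTHHHTTHHTH"))   # P(sequence | model)
```

In practice these products underflow quickly, so real implementations work in log space or rescale the forward variables at each step.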

Solutions

Problem     Algorithm            Complexity
Evaluation  Forward / Backward   O(TN^2)
Decoding    Viterbi              O(TN^2)
Learning    Baum-Welch (EM)      O(TN^2)

T = number of timesteps (observations); N = number of states.

Observed sequence, hidden path and Viterbi path (figure from Durbin et al.)

Learning
If the state path is known (no hidden states), learning is easy:
Count how often each parameter is used
Normalize the counts to get probabilities
Then treat it just like a Markov chain model
It is harder without knowing the state paths.
Idea: estimate the counts by considering every path, weighted by its probability.

Parameter estimation for HMMs
We generally need to estimate the transition and emission probabilities a_kl and e_k(b). We have in hand a set of training examples that correspond to output from the HMM. Two potential strategies:
Estimation when the state sequence is known
Estimation when the paths are unknown

Estimation when the state sequence is known
Easier than estimation when the paths are unknown.
The maximum likelihood estimators are:

a_kl = A_kl / Σ_{l'} A_{kl'}
e_k(b) = E_k(b) / Σ_{b'} E_k(b')

where A_kl = (number of transitions k to l in the training data) + r_kl
and E_k(b) = (number of emissions of b from state k in the training data) + r_k(b)
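These estimators are simple enough to sketch directly. The snippet below is a minimal sketch assuming the training data is given as (observation sequence, state path) pairs, with a single illustrative pseudocount value r standing in for the r_kl and r_k(b) terms.

```python
# ML estimation of a_kl and e_k(b) when the state paths are known.
# Training data: list of (observations, state_path) pairs of equal length.
# r is a pseudocount added to every count (illustrative stand-in for r_kl / r_k(b)).

def estimate(training_pairs, states, alphabet, r=1.0):
    A = {k: {l: r for l in states} for k in states}       # transition counts + pseudocounts
    E = {k: {b: r for b in alphabet} for k in states}      # emission counts + pseudocounts

    for obs, path in training_pairs:
        for t, (x, k) in enumerate(zip(obs, path)):
            E[k][x] += 1                                    # emission of x from state k
            if t + 1 < len(path):
                A[k][path[t + 1]] += 1                      # transition k -> next state

    # Normalize counts to probabilities: a_kl = A_kl / sum_l' A_kl', etc.
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e

# Example with the coin model: one labeled game
pairs = [("HHTHHHT", "fair fair fair loaded loaded loaded fair".split())]
a, e = estimate(pairs, ["fair", "loaded"], ["H", "T"])
```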

Potential problems
Maximum likelihood estimators are prone to overfitting; for example, transitions or emissions never encountered in the training data would be estimated as zero.
For this reason we introduce r_kl and r_k(b), which reflect prior biases.
They can be interpreted as the parameters of a Bayesian Dirichlet prior.

Estimation when the paths are unknown
More complex than when the paths are known.
Because we cannot use the simple maximum likelihood estimators, we use an iterative algorithm: Baum-Welch.

Baum-Welch algorithm
Also known as the forward-backward algorithm.
It is an example of an expectation maximization (EM) algorithm.
Idea: estimate how the hidden state paths explain a training sequence, and update the parameters accordingly.

Overview
More formally, Baum-Welch calculates A_kl and E_k(b) as the expected number of times each transition or emission is used. This uses the same forward and backward probabilities as posterior decoding. Topic of discussion for Thursday.
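A rough preview of that computation, sketched for the coin model (with the assumed loaded-coin emissions) and without the numerical scaling a real implementation needs:

```python
# Sketch of the Baum-Welch E-step for one training sequence: expected
# transition counts A_kl and emission counts E_k(b), computed from the
# forward (f) and backward (b) tables. No scaling/log-space, so only
# suitable for short sequences.

states = ["fair", "loaded"]
start = {"fair": 0.5, "loaded": 0.5}
trans = {"fair": {"fair": 0.9, "loaded": 0.1},
         "loaded": {"fair": 0.1, "loaded": 0.9}}
emit = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}   # assumed loaded-coin emissions

def expected_counts(obs):
    T = len(obs)
    # Forward table f[t][k] = P(x_1..x_t, q_t = k)
    f = [{k: start[k] * emit[k][obs[0]] for k in states}]
    for t in range(1, T):
        f.append({l: emit[l][obs[t]] * sum(f[t-1][k] * trans[k][l] for k in states)
                  for l in states})
    # Backward table b[t][k] = P(x_{t+1}..x_T | q_t = k)
    b = [dict() for _ in range(T)]
    b[T-1] = {k: 1.0 for k in states}
    for t in range(T - 2, -1, -1):
        b[t] = {k: sum(trans[k][l] * emit[l][obs[t+1]] * b[t+1][l] for l in states)
                for k in states}
    px = sum(f[T-1][k] for k in states)          # P(x | model)

    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {c: 0.0 for c in "HT"} for k in states}
    for t in range(T):
        for k in states:
            E[k][obs[t]] += f[t][k] * b[t][k] / px            # expected emissions
            if t + 1 < T:
                for l in states:
                    A[k][l] += (f[t][k] * trans[k][l] *
                                emit[l][obs[t+1]] * b[t+1][l] / px)   # expected transitions
    return A, E

A, E = expected_counts("HHHTHHHTTHHTH")
# The M-step then normalizes A and E (plus pseudocounts) exactly as in the
# known-path case, and the two steps are iterated until convergence.
```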

Drawbacks
ML estimators:
Vulnerable to overfitting if there is not enough data
Estimates can be undefined for parameters never used in the training set (hence the use of pseudocounts)
Baum-Welch:
Can converge to one of many local maxima instead of the global maximum, depending on the starting values of the parameters
This problem is worse for large HMMs

Example from Durbin:

True fair die:        1: 1/6   2: 1/6   3: 1/6   4: 1/6   5: 1/6   6: 1/6
True loaded die:      1: 1/10  2: 1/10  3: 1/10  4: 1/10  5: 1/10  6: 1/2
Estimated fair die:   1: 0.19  2: 0.19  3: 0.23  4: 0.08  5: 0.23  6: 0.06
Estimated loaded die: 1: 0.07  2: 0.10  3: 0.10  4: 0.17  5: 0.05  6: 0.52

Note that the estimated parameters differ from the real ones. This is partly a result of local maxima, but it is never possible to estimate the parameters exactly.

Other methods
Durbin et al. also discuss an alternative method, called Viterbi training, based on the Viterbi algorithm. It does not maximize the true likelihood as a function of the model parameters, but instead derives the model from the most probable paths. For this reason it generally does worse than Baum-Welch, but it is still widely used.
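One iteration of Viterbi training can be sketched as: decode each training sequence with Viterbi, then re-estimate the parameters from those single best paths as if they were known. A minimal sketch, again assuming the coin model (loaded-coin emissions assumed) and an illustrative pseudocount:

```python
# Sketch of one Viterbi-training iteration for the coin HMM: find the most
# probable path for each sequence, then re-count as if the paths were known.

states = ["fair", "loaded"]
start = {"fair": 0.5, "loaded": 0.5}
trans = {"fair": {"fair": 0.9, "loaded": 0.1},
         "loaded": {"fair": 0.1, "loaded": 0.9}}
emit = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}   # assumed loaded-coin emissions

def viterbi(obs):
    # v[t][k] = probability of the best path ending in state k at time t
    v = [{k: start[k] * emit[k][obs[0]] for k in states}]
    ptr = [{}]
    for x in obs[1:]:
        row, back = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[-1][k] * trans[k][l])
            row[l] = v[-1][best] * trans[best][l] * emit[l][x]
            back[l] = best
        v.append(row)
        ptr.append(back)
    # Trace back the most probable path
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for back in reversed(ptr[1:]):
        path.append(back[path[-1]])
    return list(reversed(path))

def viterbi_training_step(sequences, r=1.0):
    A = {k: {l: r for l in states} for k in states}
    E = {k: {c: r for c in "HT"} for k in states}
    for obs in sequences:
        path = viterbi(obs)
        for t, (x, k) in enumerate(zip(obs, path)):
            E[k][x] += 1
            if t + 1 < len(path):
                A[k][path[t + 1]] += 1
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {c: E[k][c] / sum(E[k].values()) for c in "HT"} for k in states}
    return a, e

a, e = viterbi_training_step(["HHHTHHHTTHHTH", "HHHHHHTHHHHH"])
```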

Topology
Some characteristics of an HMM's topology: the number of nodes, the alphabet, and the subset of edges allowed.
We want to exploit domain knowledge:
Limit the number of states and edges
While staying expressive enough to model the relevant relationships
Length distributions (Durbin 3.4)

Silent states
States that do not emit symbols, for example the begin state B (emitting the empty string ε).
Silent states can also appear in other places in an HMM.

Example (training sequences): NOTRE, NOTRE, NOTMEI, NOTME
e_M4(R) = 1
e_M5(M) = 5

Insertions

http://genome.jouy.inra.fr/doc/genome/suite-logicielle/gcg-11.1/html/figure/hmmessay_2.gif