Hidden Markov Models. Hosein Mohimani, GHC7717, hoseinm@andrew.cmu.edu

Fair Bet Casino Problem
The dealer flips a coin and the player bets on the outcome. The dealer uses either a fair coin (heads and tails equally likely) or a biased coin (heads with probability 3/4). Can you guess which of the following sequences were generated with the fair coin, and which with the biased one?
1 HTHHHHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTT
2 THTHHHTHHHHHTHTTHHHTHHHHHHHHHHHTHHHHHHH
3 HHTHTTHHHTTHHTTTHHHHHHHTHTHHHTHHHHHHHHH
4 HTHTHTTTHTTHHHHTHHTHTTHHHTHTTTTHTHTHTHHH

Fair Bet Casino Problem
The dealer flips a coin and the player bets on the outcome. The dealer uses either a fair coin (heads and tails equally likely) or a biased coin (heads with probability 3/4). The fraction of heads in each sequence:
1 HTHHHHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTT 19/40
2 THTHHHTHHHHHTHTTHHHTHHHHHHHHHHHTHHHHHHH 31/40
3 HHTHTTHHHTTHHTTTHHHHHHHTHTHHHTHHHHHHHHH 28/40
4 HTHTHTTTHTTHHHHTHHTHTTHHHTHTTTTHTHTHTHHH 21/40

Fair Bet Casino Problem
The dealer changes the coin occasionally:
HTHTTHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTTTHTHH HTHHHHHTHTTHHHTHHHHHHHHHHHTHHHHHHHHHTHTTHHH TTHHTTTHHHHHHHTHTHHHTHHHHHHHHHHTHTHTTTHTTHHH HTHHTHTTHHHTHTTTTHTHTHTHHH
Can you guess in which parts the fair coin was used, and in which parts the biased coin was used?

Fair Bet Casino Problem
The dealer changes the coin occasionally:
HTHTTHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTTTHTHH HTHHHHHTHTTHHHTHHHHHHHHHHHTHHHHHHHHHTHTTHHH TTHHTTTHHHHHHHTHTHHHTHHHHHHHHHHTHTHTTTHTTHHH HTHHTHTTHHHTHTTTTHTHTHTHHH
Answer: each toss is labeled F (fair coin) or B (biased coin); the head-rich stretches in the middle are labeled B and the remaining stretches F.

Fair Bet Casino Problem: multiple hypothesis testing
Is the following sequence coming from a fair (1/2, 1/2) coin or a biased (3/4, 1/4) coin?
HTHHHHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTT 19/40
Hypothesis 1: fair coin (1/2, 1/2)
Hypothesis 2: biased coin (3/4, 1/4)
Which of the two hypotheses is more likely?

Let's compute the probabilities
Consider the sequence HTH.
What is the likelihood of HTH under the fair coin hypothesis?
What is the likelihood of HTH under the biased coin hypothesis?
Which of the two hypotheses is more likely?

Let's compute the probabilities
P(HTH | fair) = P(H | fair) · P(T | fair) · P(H | fair) = 1/2 · 1/2 · 1/2 = 1/8
P(HTH | biased) = P(H | biased) · P(T | biased) · P(H | biased) = 3/4 · 1/4 · 3/4 = 9/64
The biased coin hypothesis is slightly more likely (9/64 ≈ 0.141 vs. 1/8 = 0.125).

Multiple Hypothesis Testing
seq = HTHHHHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTT
What is the likelihood of this sequence under each hypothesis?
Under the fair hypothesis:
P(seq | fair) = (1/2)^40 ≈ 9.1 · 10^-13
Log-likelihood: log10 P(seq | fair) = 40 · log10(1/2) ≈ -12.04
Under the biased hypothesis:
P(seq | biased) = (3/4)^19 · (1/4)^21 ≈ 9.8 · 10^-16
Log-likelihood: log10 P(seq | biased) = 19 · log10(3/4) + 21 · log10(1/4) ≈ -15.01
The fair hypothesis is more likely than the biased hypothesis.
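As a quick check on the numbers above, here is a minimal Python sketch that computes both log-likelihoods for the sequence on this slide (variable names are mine):

```python
import math

seq = "HTHHHHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTT"  # sequence 1 from the earlier slide
n = len(seq)          # 40 tosses
k = seq.count("H")    # 19 heads

# log10-likelihood under the fair coin: every toss has probability 1/2
loglik_fair = n * math.log10(0.5)

# log10-likelihood under the biased coin: P(H) = 3/4, P(T) = 1/4
loglik_biased = k * math.log10(0.75) + (n - k) * math.log10(0.25)

print(loglik_fair, loglik_biased)  # about -12.04 and -15.0 -> the fair hypothesis wins
```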

Conditional Probabilities
P(x | F) = ∏_{i=1..n} P_F(x_i) = (1/2)^n
P(x | B) = ∏_{i=1..n} P_B(x_i) = (1/4)^(n-k) · (3/4)^k = 3^k / 4^n   (k is the number of heads in x)
P(x | F) > P(x | B): the dealer probably used a fair coin (k < n / log2(3))
P(x | F) < P(x | B): the dealer probably used a biased coin (k > n / log2(3))
Log odds ratio: log2 [ P(x | F) / P(x | B) ] = Σ_{i=1..n} log2 [ P_F(x_i) / P_B(x_i) ] = n − k · log2(3)
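The decision rule above reduces to comparing the head count k against n / log2(3). A small sketch of that rule (the function name is mine, not part of the slides):

```python
import math

def more_likely_fair(x: str) -> bool:
    """True when P(x|F) > P(x|B), i.e. log2[P(x|F)/P(x|B)] = n - k*log2(3) > 0."""
    n, k = len(x), x.count("H")
    return n - k * math.log2(3) > 0   # equivalently: k < n / log2(3)

print(more_likely_fair("HTHHHHHTHTHTTTTHHTHTTTHHTTTHHHTHHTTTHTTT"))  # True: 19 < 40 / 1.585
```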

Markov Model of the Fair Bet Casino
Hidden coin states:  F  F  F  F  F  F  F  ... (one of F or B at each step)
Observed tosses:     H  H  T  T  H  T  H  H  T  H  H  H  H  T  T
We observe a sequence of coin tosses, e.g. HHTTHTHHTHHHHTT. How can we predict when the coin was fair and when it was biased, i.e. what is the most likely sequence of hidden states (F's and B's) that produced this outcome?

Fair Bet Casino Problem
Given a sequence of coin tosses, determine when the dealer used a fair coin and when he used a biased coin.
Input: a sequence x = x_1 x_2 … x_n of coin tosses (each H or T) made with two possible coins (F or B).
Output: a sequence π = π_1 π_2 … π_n, with each π_i being either F or B, indicating that x_i is the result of tossing the fair or the biased coin, respectively.

Is this even possible?
In principle, any hidden sequence π = π_1 π_2 … π_n can generate any outcome x = x_1 x_2 … x_n.
We are therefore interested in the most probable sequence π that explains the outcome x.
Fair coin: P_F(H) = P_F(T) = 1/2
Biased coin: P_B(H) = 3/4, P_B(T) = 1/4

Markov Model of the Fair Bet Casino
Example outcome:   F    F    F    F    F    F    F    ...
                   H    H    T    T    H    T    H    H  T  H  H  H  H  T  T
Hidden states:     F/B  F/B  F/B  F/B  F/B  F/B  F/B
Observed states:   H/T  H/T  H/T  H/T  H/T  H/T  H/T

Hidden Markov Model of the Fair Bet Casino
Hidden states:   F/B  F/B  F/B  F/B  F/B  F/B  F/B
Observed states: H/T  H/T  H/T  H/T  H/T  H/T  H/T

Transition probabilities:      Emission probabilities:
      F    B                         H     T
F    0.9  0.1                   F   1/2   1/2
B    0.1  0.9                   B   3/4   1/4

Hidden Markov Model
An abstract machine with the ability to produce output by tossing coins.
At the beginning, the machine is in one of k hidden states (Fair or Biased in the case of the Fair Bet Casino).
At each step, the HMM makes two decisions:
What state will I move into (Fair or Biased)?
What symbol from an alphabet Σ will I emit (Head or Tail)?
The observer sees the emitted symbols but not the HMM's states.
The goal of the observer is to infer the most likely sequence of states by analyzing the emitted symbols.

HMM: formal definition
Formally, an HMM M is defined by:
An alphabet of emitted symbols Σ
A set of hidden states Q
A |Q| × |Q| matrix of transition probabilities A = (a_{kl}), where a_{kl} is the probability of moving to state l after being in state k
A |Q| × |Σ| matrix of emission probabilities E = (e_k(b)), where e_k(b) is the probability of emitting the symbol b during a step in which the HMM is in state k

Back to the Fair Bet Casino problem
Each row of the matrix A can be viewed as a die with |Q| sides; each row of the matrix E as a die with |Σ| sides.
The Fair Bet Casino problem: M(Σ, Q, A, E)
Σ = {0, 1}, corresponding to tails (0) and heads (1)
Q = {F, B}, corresponding to the fair (F) and biased (B) coin
a_FF = a_BB = 0.9, a_FB = a_BF = 0.1
e_F(0) = 1/2, e_F(1) = 1/2, e_B(0) = 1/4, e_B(1) = 3/4

      F    B              H     T
F    0.9  0.1        F   1/2   1/2
B    0.1  0.9        B   3/4   1/4
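The same model M(Σ, Q, A, E) written out as a small Python sketch. The variable names and the uniform starting distribution are my assumptions (the slides do not fix a start distribution); everything else is taken from the slide above:

```python
# Fair Bet Casino HMM written out as plain Python dictionaries.
states = ["F", "B"]                       # Q: fair and biased coin
symbols = ["H", "T"]                      # Sigma (written as H/T rather than 1/0)
transition = {                            # A: transition[k][l] = a_kl
    "F": {"F": 0.9, "B": 0.1},
    "B": {"F": 0.1, "B": 0.9},
}
emission = {                              # E: emission[k][b] = e_k(b)
    "F": {"H": 0.5,  "T": 0.5},
    "B": {"H": 0.75, "T": 0.25},
}
initial = {"F": 0.5, "B": 0.5}            # starting distribution; assumed uniform
```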

Fair Bet Casino problem: example

Decoding Problem
Goal: find an optimal hidden path of states given the observations.
Input: a sequence of observations x = x_1 … x_n generated by an HMM M(Σ, Q, A, E).
Output: a path π = π_1 π_2 … π_n that maximizes P(x | π) over all possible paths.

Viterbi Decoding
Developed by Andrew Viterbi, 1967 (Andrew Viterbi, Electrical Engineering Department, USC).
Diagram: a column of hidden states above each observed symbol x_1 x_2 x_3 … x_n.

Viterbi Decoding
Given the observed symbols, what is the most likely path through the hidden states?
Diagram: hidden states above the observed symbols x_1 x_2 x_3 … x_n.

Viterbi Decoding
Diagram: hidden-state node s_{k,i} above the observed symbols x_1 x_2 x_3 … x_n.
s_{k,i}: the probability of the most likely path ending in state k ∈ Q at step i (1 ≤ i ≤ n), given the observations x_1 x_2 … x_i.

Viterbi Decoding
s_{1,1}  s_{1,2}  s_{1,3}  s_{1,4}  …  s_{1,n}
s_{2,1}  s_{2,2}  s_{2,3}  s_{2,4}  …  s_{2,n}
  ⋮
s_{m,1}  s_{m,2}  s_{m,3}  s_{m,4}  …  s_{m,n}
s_{k,i}: the probability of the most likely path ending in state k ∈ Q at step i (1 ≤ i ≤ n), given the observations x_1 x_2 … x_i.

Viterbi Decoding
Diagram: each of s_{1,i}, s_{2,i}, …, s_{m,i} connects to s_{l,i+1} with transition probability a_{1l}, a_{2l}, …, a_{ml}; the symbol x_{i+1} is emitted with probability e_l(x_{i+1}).
The most likely path to (l, i+1) passes through (k, i) for some 1 ≤ k ≤ m.
The likelihood of the optimal path to (l, i+1) equals the maximum, over 1 ≤ k ≤ m, of the likelihood of the optimal path to (k, i), times the transition probability a_{kl} from state k to state l, times the emission probability e_l(x_{i+1}):
s_{l,i+1} = max_{k ∈ Q} { s_{k,i} · a_{kl} · e_l(x_{i+1}) }

Viterbi Decoding
The probability of the optimal path is obtained at the last step n: it is the maximum of s_{k,n} over k ∈ Q.

Finding the optimal path
By recording the maximizing state at each step, we can recover the hidden states π_1 π_2 … π_n that maximize P(x_1 x_2 … x_n | π_1 π_2 … π_n).
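Putting the recurrence, the termination step, and the traceback together, here is a minimal Viterbi sketch. It reuses the states/transition/emission/initial dictionaries from the earlier sketch; a real implementation would work in log space, as discussed two slides below:

```python
def viterbi(x, states, transition, emission, initial):
    """Return the most probable hidden path for the observation sequence x."""
    n = len(x)
    s = [{} for _ in range(n)]       # s[i][k]: probability of the best path ending in state k at step i
    back = [{} for _ in range(n)]    # back[i][k]: predecessor state on that best path

    for k in states:                 # initialization with the first observation
        s[0][k] = initial[k] * emission[k][x[0]]

    for i in range(n - 1):           # recurrence: s_{l,i+1} = max_k s_{k,i} * a_kl * e_l(x_{i+1})
        for l in states:
            best_k = max(states, key=lambda k: s[i][k] * transition[k][l])
            s[i + 1][l] = s[i][best_k] * transition[best_k][l] * emission[l][x[i + 1]]
            back[i + 1][l] = best_k

    last = max(states, key=lambda k: s[n - 1][k])   # termination: best final state
    path = [last]
    for i in range(n - 1, 0, -1):                   # traceback via the recorded back pointers
        path.append(back[i][path[-1]])
    return "".join(reversed(path))

print(viterbi("HHTTHTHHTHHHHTT", states, transition, emission, initial))  # prints the decoded F/B path
```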

Forward & Backward algorithms
Given an observation x and a state k, what is the probability P(π_i = k | x) that the HMM was in state k at time i?
P(π_i = k | x) is affected not only by the values x_1, …, x_i but also by x_{i+1}, …, x_n.
P(x, π_i = k) = P(x_1, …, x_n, π_i = k) = P(x_1, …, x_i, π_i = k) · P(x_{i+1}, …, x_n | π_i = k) = f_{k,i} · b_{k,i}
f_{k,i}: the probability of emitting the prefix x_1, …, x_i and being in state k at step i.
b_{k,i}: the probability of emitting the suffix x_{i+1}, …, x_n given π_i = k.
f_{k,i} and b_{k,i} can be computed using iterative methods very similar to the Viterbi recursion on the previous slides.

Forward & Backward algorithm
f_{k,i} can be computed using the following iteration (the maximization of Viterbi is replaced by a summation):
f_{k,i} = e_k(x_i) · Σ_{l ∈ Q} f_{l,i-1} · a_{lk}
b_{k,i} can be computed using the following iteration:
b_{k,i} = Σ_{l ∈ Q} a_{kl} · e_l(x_{i+1}) · b_{l,i+1}
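A sketch of both recursions using the same dictionaries as before. The initialization and termination choices (starting from `initial`, setting b to 1 at the last position) are standard but not spelled out on the slide:

```python
def forward(x, states, transition, emission, initial):
    """f[i][k] = P(x_1..x_i, pi_i = k)."""
    f = [{} for _ in x]
    for k in states:
        f[0][k] = initial[k] * emission[k][x[0]]
    for i in range(1, len(x)):
        for k in states:
            f[i][k] = emission[k][x[i]] * sum(f[i - 1][l] * transition[l][k] for l in states)
    return f

def backward(x, states, transition, emission):
    """b[i][k] = P(x_{i+1}..x_n | pi_i = k)."""
    n = len(x)
    b = [{} for _ in x]
    for k in states:
        b[n - 1][k] = 1.0
    for i in range(n - 2, -1, -1):
        for k in states:
            b[i][k] = sum(transition[k][l] * emission[l][x[i + 1]] * b[i + 1][l] for l in states)
    return b

# P(pi_i = k | x) is proportional to f[i][k] * b[i][k]; normalize by P(x) = sum_k f[n-1][k].
```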

Complexity
Viterbi decoding runs in O(n · |Q|²) time.
Because each step multiplies many probabilities, the numbers quickly become extremely small and we risk numerical underflow.
To avoid this, we switch to logarithmic space by defining S_{k,i} = log s_{k,i}:
S_{l,i+1} = max_{k ∈ Q} { S_{k,i} + log(a_{kl}) + log e_l(x_{i+1}) }
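For illustration, a single log-space Viterbi update might look like this (a hypothetical helper, not part of the slides; it assumes all probabilities are non-zero):

```python
import math

def log_viterbi_step(S_prev, l, x_next, states, transition, emission):
    """S_{l,i+1} = max_{k in Q} { S_{k,i} + log a_kl } + log e_l(x_{i+1})."""
    # S_prev[k] holds S_{k,i}; the products of the plain recurrence become sums of logs.
    return max(S_prev[k] + math.log(transition[k][l]) for k in states) + math.log(emission[l][x_next])
```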

Estimating HMM parameters
How do we estimate the transition probabilities A and the emission probabilities E?
x^1, …, x^m: training sequences.
Find the maximum-likelihood parameters Θ = (A, E) from x^1, …, x^m:
max_Θ ∏_{j=1..m} P(x^j | Θ)
This is an optimization over a multi-dimensional parameter space.

Supervised & unsupervised estimation
If for each x_i we know the corresponding hidden states π_i, the problem is easy.
If the π_i are unknown, the problem is more difficult.

Parameter estimation in the supervised case
A_{kl}: the number of transitions from state k to state l
E_k(b): the number of times b is emitted from state k
a_{kl} = A_{kl} / Σ_{l' ∈ Q} A_{kl'}
e_k(b) = E_k(b) / Σ_{σ ∈ Σ} E_k(σ)
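A sketch of the supervised estimate: count the transitions and emissions in a labeled pair (x, π), then normalize each row. Pseudocounts, which are usually added in practice, are omitted here; the function name is mine:

```python
from collections import defaultdict

def estimate_parameters(x, pi, states, symbols):
    """Maximum-likelihood a_kl and e_k(b) from an observation x and its known state path pi."""
    A = defaultdict(float)   # A[(k, l)]: number of k -> l transitions
    E = defaultdict(float)   # E[(k, b)]: number of times symbol b is emitted from state k
    for i in range(len(x)):
        E[(pi[i], x[i])] += 1
        if i + 1 < len(x):
            A[(pi[i], pi[i + 1])] += 1
    # Normalize each row of counts into probabilities (guarding against empty rows).
    a = {k: {l: A[(k, l)] / (sum(A[(k, m)] for m in states) or 1) for l in states} for k in states}
    e = {k: {b: E[(k, b)] / (sum(E[(k, c)] for c in symbols) or 1) for b in symbols} for k in states}
    return a, e
```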

Baum-Welch iterations (unsupervised case)
Guess initial state labels π_i for each input x_i and perform the following iterations:
Estimate Θ from the labels π_i
Compute optimal states π_i from Θ and x_i using Viterbi decoding
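The loop described above is often called Viterbi training; here is a minimal sketch that reuses the viterbi and estimate_parameters functions from the earlier sketches. Full Baum-Welch would instead re-estimate the parameters from the forward-backward posteriors rather than from a single decoded path:

```python
def viterbi_training(x, states, symbols, transition, emission, initial, n_iter=10):
    """Alternate between decoding the hidden states and re-estimating the parameters."""
    for _ in range(n_iter):
        pi = viterbi(x, states, transition, emission, initial)               # decode labels with current parameters
        transition, emission = estimate_parameters(x, pi, states, symbols)   # refit a_kl and e_k(b) from those labels
    return transition, emission
```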