Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010



Markov Model

When a dynamical system is probabilistic, it is determined by the transition probabilities $P_{ij} = P(x_{t+1} = j \mid x_t = i) = P(\phi_t(i) = j)$ for finitely many states $i$ and $j$, together with an initial distribution $\pi_0(i) = P(x_0 = i)$. Such a system is called a Markov chain.
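
As a minimal sketch (not from the lecture's R scripts), the chain can be simulated once a transition matrix and an initial distribution are chosen; the state labels, the matrix P, and the function name simulate.chain below are illustrative assumptions.

## A minimal sketch: simulating a Markov chain with a hypothetical transition matrix P.
set.seed(1)
states <- c("M", "I", "D")                      # example state labels
P <- matrix(c(0.8, 0.1, 0.1,                    # rows: current state i
              0.2, 0.7, 0.1,                    # columns: next state j
              0.2, 0.1, 0.7),
            nrow = 3, byrow = TRUE,
            dimnames = list(states, states))
pi0 <- c(M = 0.8, I = 0.1, D = 0.1)             # initial distribution pi_0

simulate.chain <- function(T, pi0, P) {
  x <- character(T + 1)
  x[1] <- sample(names(pi0), 1, prob = pi0)     # draw x_0 from pi_0
  for (t in 1:T)
    x[t + 1] <- sample(colnames(P), 1, prob = P[x[t], ])   # draw x_{t+1} from P[x_t, ]
  x
}
simulate.chain(10, pi0, P)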

Hidden Markov Model

A hidden Markov model has two processes: (1) the evolution of the state is internal and unobservable, and (2) an observation is emitted from each internal state according to the emission probability $Q_{jk} = P(y_t = k \mid x_t = j)$.
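
Continuing the sketch above (again, not from the lecture's code), an HMM adds an emission matrix Q on top of the hidden chain; the alphabet, the uniform Q, and the function name simulate.hmm are illustrative choices, and the sketch reuses pi0, P, and simulate.chain defined above.

## A minimal sketch: simulating observations from an HMM with a hypothetical emission matrix Q.
obs <- c("A", "C", "G", "T")
Q <- matrix(rep(0.25, 12), nrow = 3,
            dimnames = list(states, obs))       # uniform emissions, for illustration only

simulate.hmm <- function(T, pi0, P, Q) {
  x <- simulate.chain(T, pi0, P)                # hidden path (unobserved in practice)
  y <- sapply(x, function(s) sample(colnames(Q), 1, prob = Q[s, ]))
  list(states = x, observations = unname(y))
}
simulate.hmm(10, pi0, P, Q)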

Example: DNA Sequence Alignment

DNA is composed of an alphabet of four nucleotides, A, C, G, and T, and two DNA sequences may have descended from a common ancestor. The alignment problem is complicated by insertions, deletions, and mutations. We can introduce three hidden states: match (M), insertion (I), and deletion (D).

HMM for DNA Sequence Alignment

A family of models is introduced by the parameters $(\theta_0, \theta_1, \theta_2, \theta_3)$. The initial distribution $\pi_0$ has the form $\pi_0(M) = 1 - 2\theta_0$, $\pi_0(I) = \pi_0(D) = \theta_0$, and the transition probabilities $P_{ij}$ and emission probabilities $Q_{jk}$ are expressed in terms of $\theta_1$, $\theta_2$, and $\theta_3$.

DNA Sequence Alignment: A Challenge

Two sequences are not yet aligned: the observed values $y_0, y_1, \ldots$ are given, and the sequences become aligned only when the hidden control values, M (match/mismatch), I (insert), or D (delete), are estimated. At first glance, the sequence alignment seems impossible to pursue!

DNA Sequence Alignment

Dynamic programming over an indefinite time horizon works for the Viterbi algorithm when it is applied to DNA sequence alignment. To keep memory usage effective, we set the maximum length of the time horizon to the total length of the two sequences.

> source("viterbi.r")
> source("dnaprofile.r")
> DNA = strsplit(scan("gene57.txt", what="character"), "")
> cat(DNA[[1]], "\n", DNA[[2]], "\n", sep="")
> th = c(0.1, 0.1, 0.1, 0.1)
> out = viterbi(3, length(DNA[[1]])+length(DNA[[2]]), k0, cc)
> aligned(out[[1]])

Conditional Distribution

Suppose that a pair $(X, Y)$ of discrete random variables has a joint frequency function $p(x, y)$. Then we can introduce the conditional frequency function
$$ p(x \mid y) = \frac{p(x, y)}{p(y)} $$
Here the marginal frequency function $p(y)$ is known as the normalizing constant in the sense that it guarantees
$$ \sum_x p(x \mid y) = 1 $$
Since $p(x \mid y)$ is proportional to $p(x, y)$, we simply write $p(x \mid y) \propto p(x, y)$.

The joint distribution of the evolution $x_0, \ldots, x_T$ and the observation $y_0, \ldots, y_T$,
$$ p(x_0, \ldots, x_T, y_0, \ldots, y_T) = \pi_0(x_0)\, Q_{x_0, y_0} \prod_{t=1}^{T} P_{x_{t-1}, x_t}\, Q_{x_t, y_t}, $$
is proportional to the conditional distribution $p(x_0, \ldots, x_T \mid y_0, \ldots, y_T)$ of the evolution given $y_0, \ldots, y_T$.

Hidden Markov Models: Filtering Recursion

The filtering problem is to compute the conditional distribution of the internal state $x_T$ given $y_0, \ldots, y_T$,
$$ \pi_{T|T}(i) = p(x_T = i \mid y_0, \ldots, y_T) = \sum_{x_0, \ldots, x_{T-1}} p(x_0, \ldots, x_T \mid y_0, \ldots, y_T). $$
It is computed by the forward algorithm:
1. $\pi_{0|0}(i) \propto \pi_0(i)\, Q_{i, y_0}$
2. $\pi_{t|t}(j) \propto \sum_i \pi_{t-1|t-1}(i)\, P_{i,j}\, Q_{j, y_t}$ for $t = 1, \ldots, T$.
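
A minimal sketch of the forward recursion (not the lecture's viterbi.r), reusing the hypothetical pi0, P, Q from the simulation sketches above; the function name forward.filter and the per-step normalization are my own choices.

## A minimal sketch of the forward (filtering) recursion.
## y is a vector of observation labels; the result holds pi_{t|t} row by row.
forward.filter <- function(y, pi0, P, Q) {
  n <- length(y)
  alpha <- matrix(0, nrow = n, ncol = nrow(P), dimnames = list(NULL, rownames(P)))
  a <- pi0 * Q[, y[1]]                        # unnormalized pi_{0|0}(i) = pi_0(i) Q_{i,y_0}
  alpha[1, ] <- a / sum(a)
  for (t in 2:n) {
    a <- (alpha[t - 1, ] %*% P) * Q[, y[t]]   # sum_i pi_{t-1|t-1}(i) P_{ij} Q_{j,y_t}
    alpha[t, ] <- a / sum(a)                  # normalize at each step
  }
  alpha
}
y <- simulate.hmm(20, pi0, P, Q)$observations
forward.filter(y, pi0, P, Q)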

Hidden Markov Models: Smoothing Recursion

Let $t < T$. The smoothing problem is to compute the conditional distribution of the internal state $x_t$ given $y_0, \ldots, y_T$,
$$ \pi_{t|T}(i) = p(x_t = i \mid y_0, \ldots, y_T) = \sum_{x_0, \ldots, x_{t-1}, x_{t+1}, \ldots, x_T} p(x_0, \ldots, x_T \mid y_0, \ldots, y_T). $$
It combines the filtered distribution $\pi_{t|t}(i)$ from the forward algorithm with the backward algorithm:
1. $\beta_{T|T}(j) = 1$
2. $\beta_{k-1|T}(i) = \sum_j P_{i,j}\, Q_{j, y_k}\, \beta_{k|T}(j)$ for $k = T, \ldots, t+1$.
Then the smoothed distribution is given by $\pi_{t|T}(i) \propto \pi_{t|t}(i)\, \beta_{t|T}(i)$.
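
A minimal sketch of the backward pass and the smoothing combination (not from the lecture's code), reusing forward.filter, pi0, P, Q, and y from above; the per-step rescaling of beta is an implementation choice for numerical stability.

## A minimal sketch of the backward recursion and smoothing.
backward.smooth <- function(y, pi0, P, Q) {
  n <- length(y)
  m <- nrow(P)
  alpha <- forward.filter(y, pi0, P, Q)
  beta <- matrix(0, nrow = n, ncol = m, dimnames = list(NULL, rownames(P)))
  beta[n, ] <- 1                                      # beta_{T|T}(j) = 1
  for (k in n:2) {
    b <- P %*% (Q[, y[k]] * beta[k, ])                # sum_j P_{ij} Q_{j,y_k} beta_{k|T}(j)
    beta[k - 1, ] <- b / sum(b)                       # rescale for numerical stability
  }
  gamma <- alpha * beta                               # pi_{t|T}(i) proportional to alpha * beta
  gamma / rowSums(gamma)                              # normalize each time step
}
backward.smooth(y, pi0, P, Q)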

Baum-Welch Algorithm

Let $t < T$. The bivariate smoothing problem is to compute the conditional probability of a transition,
$$ \lambda_{t|T}(i, j) = p(x_t = i, x_{t+1} = j \mid y_0, \ldots, y_T), $$
conditioned upon $y_0, \ldots, y_T$. The forward-backward algorithm can be applied to obtain
$$ \lambda_{t|T}(i, j) \propto \pi_{t|t}(i)\, P_{i,j}\, Q_{j, y_{t+1}}\, \beta_{t+1|T}(j). $$
Summation over $j$ recovers the univariate smoothing $\pi_{t|T}(i) = \sum_j \lambda_{t|T}(i, j)$.
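
A minimal sketch of the bivariate smoothing quantity (not from the lecture's scripts), reusing forward.filter and the same rescaled backward pass as above; the function name pairwise.smooth is an assumption.

## A minimal sketch computing lambda_{t|T}(i, j) by the forward-backward algorithm.
pairwise.smooth <- function(y, pi0, P, Q) {
  n <- length(y)
  m <- nrow(P)
  alpha <- forward.filter(y, pi0, P, Q)
  beta <- matrix(0, n, m); beta[n, ] <- 1             # same backward pass as backward.smooth
  for (k in n:2) {
    b <- P %*% (Q[, y[k]] * beta[k, ])
    beta[k - 1, ] <- b / sum(b)
  }
  lambda <- array(0, dim = c(n - 1, m, m))            # lambda[t, i, j]
  for (t in 1:(n - 1)) {
    L <- outer(alpha[t, ], Q[, y[t + 1]] * beta[t + 1, ]) * P   # pi_{t|t}(i) P_ij Q_{j,y_{t+1}} beta_{t+1|T}(j)
    lambda[t, , ] <- L / sum(L)                       # normalize to a probability over (i, j)
  }
  lambda
}
lam <- pairwise.smooth(y, pi0, P, Q)
apply(lam, c(1, 2), sum)                              # summing over j recovers pi_{t|T}(i)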

Maximum Likelihood for Markov Models

The transition probabilities $P_{ij}$ are considered as model parameters. Having observed the evolution $x_0, x_1, \ldots, x_T$, we can infer the parameters by maximizing the likelihood
$$ L = P_{x_0, x_1} P_{x_1, x_2} \cdots P_{x_{T-1}, x_T}. $$
The maximum likelihood estimate (MLE) is proportional to the occurrence count,
$$ P_{ij} \propto \sum_{t=1}^{T} I(x_{t-1} = i, x_t = j), $$
where $I(x_{t-1} = i, x_t = j) = 1$ or $0$ indicates whether the transition occurred.
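
A minimal sketch of this count-and-normalize MLE (not from the lecture's code), reusing the hypothetical states, pi0, P, and simulate.chain from above; the function name mle.transition is an assumption.

## A minimal sketch: MLE of the transition matrix from an observed state sequence x.
mle.transition <- function(x, states) {
  counts <- table(factor(x[-length(x)], levels = states),
                  factor(x[-1], levels = states))     # I(x_{t-1} = i, x_t = j) summed over t
  counts / rowSums(counts)                            # normalize each row to probabilities
}
x <- simulate.chain(500, pi0, P)
mle.transition(x, states)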

Model Inference for HMM

The initial probabilities $\pi_0(i)$ and the transition and emission probabilities $P_{ij}$ and $Q_{jk}$ become model parameters. Given the observation $y_0, \ldots, y_T$, the likelihood is
$$ L(\theta) = \sum_{x_0, \ldots, x_T} \pi_0(x_0)\, Q_{x_0, y_0} \prod_{t=1}^{T} P_{x_{t-1}, x_t}\, Q_{x_t, y_t}. $$
The evolution $x_0, \ldots, x_T$ is not observed; these are the latent variables. It is not tractable to maximize the likelihood $L$ analytically under the probability constraints on $\pi_0(i)$ (i.e., $\pi_0(i) \ge 0$ and $\sum_i \pi_0(i) = 1$) as well as those on $P_{ij}$ and $Q_{jk}$.

Maximization with latent variables

If the hidden states $x_0, \ldots, x_T$ were known, the maximum likelihood estimate for $P_{ij}$ could be formed from the occurrence counts of hidden transitions. Since they are not, we use the current estimates of $\pi_0(i)$, $P_{ij}$, and $Q_{jk}$ and replace the count with its conditional expectation,
$$ P_{ij} \propto E\!\left[\sum_{t=1}^{T} I(x_{t-1} = i, x_t = j) \,\middle|\, y_0, \ldots, y_T\right] = \sum_{t=1}^{T} \lambda_{t-1|T}(i, j), $$
where the conditional probabilities $\lambda_{t-1|T}(i, j)$ can be obtained via the Baum-Welch algorithm.

Baum-Welch Training

1. Estimate internal states: given the current estimates of $\pi_0(i)$, $P_{ij}$, and $Q_{jk}$, compute $\lambda_{t|T}(i, j)$ by the Baum-Welch algorithm.
2. Update model parameters: for example,
$$ \pi_0(i) \leftarrow \pi_{0|T}(i), \qquad P_{ij} \propto \sum_{t=0}^{T-1} \lambda_{t|T}(i, j), \qquad Q_{jk} \propto \sum_{t:\, y_t = k} \pi_{t|T}(j). $$
3. Repeat the above steps until the estimates converge.
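
A minimal sketch of one style of this training loop (not from the lecture's scripts), reusing backward.smooth and pairwise.smooth from above; the fixed iteration count stands in for a proper convergence check, and the function name baum.welch is an assumption.

## A minimal sketch of Baum-Welch training on the toy HMM above.
baum.welch <- function(y, pi0, P, Q, n.iter = 50) {
  states <- rownames(P); obs <- colnames(Q)
  for (it in 1:n.iter) {
    gamma  <- backward.smooth(y, pi0, P, Q)            # pi_{t|T}(j)
    lambda <- pairwise.smooth(y, pi0, P, Q)            # lambda_{t|T}(i, j)
    pi0 <- gamma[1, ]                                  # pi_0(i) <- pi_{0|T}(i)
    Pn  <- apply(lambda, c(2, 3), sum)                 # sum_t lambda_{t|T}(i, j)
    P   <- Pn / rowSums(Pn)
    Qn  <- sapply(obs, function(k) colSums(gamma[y == k, , drop = FALSE]))
    Q   <- Qn / rowSums(Qn)                            # Q_{jk} from smoothed state weights
    dimnames(P) <- list(states, states); dimnames(Q) <- list(states, obs)
  }
  list(pi0 = pi0, P = P, Q = Q)
}
baum.welch(y, pi0, P, Q, n.iter = 20)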

Dynamical System with Control

Knowing the state $x_t$ at time $t$, the control value $u_t$ is used to determine $x_{t+1}$. Then the evolution of states is governed by
$$ x_{t+1} = \phi_t(x_t, u_t). $$

Optimal Control Problem

Given the initial state $x_0$ and the control sequence $u_0, \ldots, u_T$, we obtain the trajectory $x_0, \ldots, x_T$. The real value
$$ V = c_0(x_0, u_0) + \cdots + c_T(x_T, u_T) + k_T(x_T) $$
is defined over the horizon from $t = 0$ to $T$, and is viewed as the sum of the running and the terminal reward (or cost). The optimal control problem is to find the control sequence $u_0, \ldots, u_T$ that maximizes the reward $V$ (or minimizes the cost).

Starting from the terminal reward $k_T(x_T)$, we can work backward and find the optimal value $V_0(x_0)$. Then, from any time $t$ onward, the remaining control sequence is optimal.
1. $V_{T+1}(i) = k_T(i)$
2. Compute backward, for $t = T, \ldots, 0$,
$$ V_t(i) = \max_{u_t} \left[ c_t(i, u_t) + V_{t+1}(\phi_t(i, u_t)) \right], \qquad \psi_t(i) = \operatorname*{argmax}_{u_t} \left[ c_t(i, u_t) + V_{t+1}(\phi_t(i, u_t)) \right]. $$
3. Set $u_0^* = \psi_0(x_0)$, and calculate $u_t^* = \psi_t(x_t^*)$ with $x_t^* = \phi_{t-1}(x_{t-1}^*, u_{t-1}^*)$ forward for $t = 1, \ldots, T$.
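
A minimal sketch of this backward induction on an invented toy problem (not from the lecture): the dynamics phi, the reward cost, the terminal reward kT, the number of states, and the horizon are all illustrative assumptions.

## A minimal sketch of backward dynamic programming for a toy finite-state control problem.
n.states <- 5; controls <- c(-1, 0, 1); T.hor <- 10
phi  <- function(t, i, u) min(max(i + u, 1), n.states)    # toy deterministic dynamics
cost <- function(t, i, u) -abs(i - 3) - 0.1 * abs(u)      # toy running reward
kT   <- function(i) 0                                     # toy terminal reward

V   <- matrix(0, T.hor + 2, n.states)                     # V[t + 1, i] stores V_t(i)
psi <- matrix(NA, T.hor + 1, n.states)
V[T.hor + 2, ] <- sapply(1:n.states, kT)                  # V_{T+1}(i) = k_T(i)
for (t in T.hor:0) {
  for (i in 1:n.states) {
    vals <- sapply(controls, function(u) cost(t, i, u) + V[t + 2, phi(t, i, u)])
    V[t + 1, i]   <- max(vals)                            # optimal value V_t(i)
    psi[t + 1, i] <- controls[which.max(vals)]            # optimal control psi_t(i)
  }
}
## Roll the optimal controls forward from an initial state
x <- 5; u.star <- numeric(T.hor + 1)
for (t in 0:T.hor) { u.star[t + 1] <- psi[t + 1, x]; x <- phi(t, x, u.star[t + 1]) }
u.star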

Log Likelihood for Optimal Decoding

Assume that the model parameters $\pi_0(i)$, $P_{ij}$, and $Q_{jk}$ are known. Given the observation $y_0, \ldots, y_T$, the log likelihood becomes
$$ V = k_0(x_0) + c_1(x_0, x_1) + \cdots + c_T(x_{T-1}, x_T), $$
where
$$ k_0(x_0) = \log\left(\pi_0(x_0)\, Q_{x_0, y_0}\right), \qquad c_t(x_{t-1}, x_t) = \log\left(P_{x_{t-1}, x_t}\, Q_{x_t, y_t}\right), \quad t = 1, \ldots, T. $$
The MLE problem is to obtain the optimal decoding $x_0, \ldots, x_T$.

Viterbi Decoding Algorithm

Starting from the initial cost $k_0(x_0)$, we can work forward and find the optimal value $V_T(x_T)$.
1. $V_0(i) = k_0(i)$
2. Compute forward, for $t = 1, \ldots, T$,
$$ V_t(j) = \max_{x_{t-1}} \left[ V_{t-1}(x_{t-1}) + c_t(x_{t-1}, j) \right], \qquad \psi_t(j) = \operatorname*{argmax}_{x_{t-1}} \left[ V_{t-1}(x_{t-1}) + c_t(x_{t-1}, j) \right]. $$
3. Set $x_T^* = \operatorname*{argmax}_{x_T} V_T(x_T)$, and calculate $x_{t-1}^* = \psi_t(x_t^*)$ backward for $t = T, \ldots, 1$.
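
A minimal sketch of this decoder in log space (a generic HMM Viterbi, not the lecture's viterbi.r for alignment), reusing the hypothetical pi0, P, Q, and observation sequence y from the sketches above; the function name viterbi.decode is an assumption.

## A minimal sketch of Viterbi decoding in log space.
viterbi.decode <- function(y, pi0, P, Q) {
  n <- length(y); m <- nrow(P); states <- rownames(P)
  V <- matrix(-Inf, n, m); psi <- matrix(NA_integer_, n, m)
  V[1, ] <- log(pi0) + log(Q[, y[1]])                    # V_0(i) = k_0(i)
  for (t in 2:n) {
    for (j in 1:m) {
      cand <- V[t - 1, ] + log(P[, j]) + log(Q[j, y[t]]) # V_{t-1}(i) + c_t(i, j)
      V[t, j]   <- max(cand)
      psi[t, j] <- which.max(cand)
    }
  }
  path <- integer(n)
  path[n] <- which.max(V[n, ])                           # x_T = argmax V_T(x_T)
  for (t in n:2) path[t - 1] <- psi[t, path[t]]          # backtrack x_{t-1} = psi_t(x_t)
  states[path]
}
viterbi.decode(y, pi0, P, Q)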

Viterbi Training

1. Estimate internal states: given the current estimates of $\pi_0(i)$, $P_{ij}$, and $Q_{jk}$, decode $x_0, \ldots, x_T$ by the Viterbi algorithm.
2. Update model parameters: for example,
$$ P_{ij} \propto \sum_{t=1}^{T} I(x_{t-1} = i, x_t = j), \qquad Q_{jk} \propto \sum_{t=0}^{T} I(x_t = j, y_t = k). $$
3. Repeat the above steps until the estimates converge.
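
A minimal sketch of this hard-assignment training loop (not from the lecture's scripts), reusing viterbi.decode and the toy pi0, P, Q, y from above; the small pseudocount and the fixed iteration count are my own choices to keep the sketch robust.

## A minimal sketch of Viterbi training on the toy HMM above.
viterbi.train <- function(y, pi0, P, Q, n.iter = 20) {
  states <- rownames(P); obs <- colnames(Q)
  for (it in 1:n.iter) {
    x  <- viterbi.decode(y, pi0, P, Q)                   # hard assignment of hidden states
    Pn <- table(factor(x[-length(x)], levels = states),
                factor(x[-1], levels = states)) + 1e-6   # small pseudocount avoids log(0)
    Qn <- table(factor(x, levels = states),
                factor(y, levels = obs)) + 1e-6
    P  <- as.matrix(Pn / rowSums(Pn)); Q <- as.matrix(Qn / rowSums(Qn))
    dimnames(P) <- list(states, states); dimnames(Q) <- list(states, obs)
  }
  list(P = P, Q = Q)
}
viterbi.train(y, pi0, P, Q)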

DNA Sequence Alignment: Position State Setting

Two sequences are dynamically aligned: starting from the empty state $(0, 0)$ of the aligned sequences, we choose the control value M (match/mismatch), I (insert), or D (delete) to add letters to the aligned sequences.

DNA Sequence Alignment: Dynamical System

This introduces the dynamical system: given the current state $x_t = (i, j)$, the control value $u_t = M$, $I$, or $D$ updates the state as
$$ x_{t+1} = \phi_t(x_t, u_t) = \begin{cases} (i + 1, j + 1) & \text{if } u_t = M; \\ (i, j + 1) & \text{if } u_t = I; \\ (i + 1, j) & \text{if } u_t = D. \end{cases} $$
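
A minimal sketch of these alignment dynamics as an R function (not from the lecture's dnaprofile.r); the function name phi.align is an assumption.

## A minimal sketch of the alignment dynamics phi; the state is the pair of positions (i, j).
phi.align <- function(x, u) {
  i <- x[1]; j <- x[2]
  switch(u,
         M = c(i + 1, j + 1),   # match/mismatch consumes a letter from both sequences
         I = c(i, j + 1),       # insert consumes a letter from the second sequence
         D = c(i + 1, j))       # delete consumes a letter from the first sequence
}
phi.align(c(0, 0), "M")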

Needleman-Wunsch Algorithm

At the state $x = (i, j)$ the reward function $c(x, u)$ is given by
$$ c(x, u) = \begin{cases} 1 & \text{if } u = M \text{ and the pair at } (i, j) \text{ match}; \\ 0 & \text{otherwise.} \end{cases} $$
When the two sequences have $N$ and $L$ letters, any state $x = (N, j)$ or $(i, L)$ becomes a boundary state with terminal cost $k(x) = 0$. Then the dynamic programming principle applies with an indefinite time horizon.
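
A minimal sketch of the resulting alignment recursion with the reward above (not the lecture's needleman.r, and without traceback); the function name align.score and the example sequences are assumptions.

## A minimal sketch: optimal total reward (+1 per matching pair) by dynamic programming.
align.score <- function(s1, s2) {
  N <- length(s1); L <- length(s2)
  V <- matrix(0, N + 1, L + 1)               # V[i + 1, j + 1] = best reward from state (i, j)
  for (i in N:0) for (j in L:0) {
    if (i == N || j == L) { V[i + 1, j + 1] <- 0; next }   # boundary state: k(x) = 0
    m   <- (s1[i + 1] == s2[j + 1]) + V[i + 2, j + 2]      # control M
    ins <- V[i + 1, j + 2]                                  # control I
    del <- V[i + 2, j + 1]                                  # control D
    V[i + 1, j + 1] <- max(m, ins, del)
  }
  V[1, 1]                                     # optimal reward from the empty state (0, 0)
}
align.score(strsplit("GATTACA", "")[[1]], strsplit("GCATGCT", "")[[1]])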

Comparison with the Needleman-Wunsch Algorithm

If the reward function is changed, the algorithm can be adjusted into the Needleman-Wunsch algorithm. Note that it is not an HMM, and that it does not require a prior estimate of the parameter $\theta$. Compare its output with the Viterbi decoding obtained under various prior estimates of $\theta$.

> source("needleman.r")
> DNA = strsplit(scan("gene57.txt", what="character"), "")
> cat(DNA[[1]], "\n", DNA[[2]], "\n", sep="")
> out = viterbi(3, length(DNA[[1]])+length(DNA[[2]]), k0, cc)
> aligned(out[[1]])