Applications of Hidden Markov Models

18.417 Introduction to Computational Molecular Biology
Lecture 18: November 9, 2004
Lecturer: Ross Lippert
Scribe and Editor: Chris Peikert

Review of Notation

Recall our notation for Hidden Markov Models:

T(i, j) = Pr[i -> j], the probability of transitioning from state i to state j.

E(i, x) = Pr[x is emitted at state i].

The starting distribution over states is π.

D_x = diag(E(:, x)); that is, D_x is the square matrix with E(i, x) as entry (i, i) and zeros elsewhere.

Using this notation, the probability that the chain outputs a particular sequence of symbols x_1, ..., x_n is

    Pr[x_1, ..., x_n] = π D_{x_1} T D_{x_2} T ⋯ T D_{x_n} 1.

This product can be split into two important auxiliary quantities that appear frequently:

f(k, i) = Pr[x_1, ..., x_k, s_k = i], the probability that symbols x_1, ..., x_k are output and the Markov chain is in state i at time k. As a row vector over all states,

    f(k, :) = π D_{x_1} T D_{x_2} ⋯ T D_{x_k}.

b(k, i) = Pr[x_{k+1}, ..., x_n | s_k = i], the probability that symbols x_{k+1}, ..., x_n are output given that the Markov chain is in state i at time k. As a (column) vector over all states,

    b(k, :) = T D_{x_{k+1}} T D_{x_{k+2}} ⋯ T D_{x_n} 1.

Thus, for any k,

    Pr[x_1, ..., x_n] = Σ_{i ∈ states} f(k, i) b(k, i).

Using these quantities, we can compute the probability that, given a sequence x_1, ..., x_n of output symbols, the Markov chain was in state i at time k:

    Pr[s_k = i | x_1, ..., x_n] = f(k, i) b(k, i) / Σ_j f(k, j) b(k, j).
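To make the matrix formulas above concrete, here is a minimal NumPy sketch (added for illustration; not part of the original notes). The two-state model and the names T_mat, E_mat, pi, obs, and D are assumptions chosen for the example; the recursions compute f, b, the likelihood, and the posterior Pr[s_k = i | x_1, ..., x_n] exactly as written above.

    import numpy as np

    # Toy two-state HMM over the symbol alphabet {0, 1}; all numbers are illustrative.
    T_mat = np.array([[0.9, 0.1],    # T(i, j) = Pr[i -> j]
                      [0.2, 0.8]])
    E_mat = np.array([[0.7, 0.3],    # E(i, x) = Pr[x emitted at state i]
                      [0.1, 0.9]])
    pi = np.array([0.5, 0.5])        # starting distribution over states
    obs = [0, 0, 1, 1, 0]            # observed symbols x_1, ..., x_n

    def D(x):                        # D_x = diag(E(:, x))
        return np.diag(E_mat[:, x])

    n, S = len(obs), len(pi)
    f = np.zeros((n, S))             # f(k, :) = pi D_{x_1} T ... T D_{x_k}
    b = np.zeros((n, S))             # b(k, :) = T D_{x_{k+1}} ... T D_{x_n} 1

    f[0] = pi @ D(obs[0])
    for k in range(1, n):
        f[k] = f[k - 1] @ T_mat @ D(obs[k])

    b[n - 1] = np.ones(S)
    for k in range(n - 2, -1, -1):
        b[k] = T_mat @ D(obs[k + 1]) @ b[k + 1]

    likelihood = f[-1] @ np.ones(S)  # Pr[x_1, ..., x_n]; equals (f[k] * b[k]).sum() for every k
    posterior = f * b / likelihood   # row k is Pr[s_k = i | x_1, ..., x_n]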

However, this per-position posterior is not very useful on its own, because it does not capture dependencies between states and output symbols at different times.

Tropical Notation

A semi-ring is a tuple (S, id_+, id_×, +, ×), where S is a set of elements, id_+ and id_× are the additive and multiplicative identities (respectively), and + and × are the addition and multiplication operations. Both operations are closed, commutative, and associative, and × distributes over +.

There are many examples of semi-rings:

Example 1. (R⁺ ∪ {0}, 0, 1, +, ×), the non-negative reals under standard addition and multiplication, is a semi-ring.

Example 2. (R ∪ {∞}, ∞, 0, min_T, +), the Boltzmann semi-ring, where min_T(a, b) = -T log(exp(-a/T) + exp(-b/T)), is a semi-ring useful in statistical mechanics. Note that as T -> 0⁺, min_T approaches min, in the sense that it becomes a closer and closer approximation of the min operation. The semi-ring at this limit is known as one of the tropical semi-rings.

Example 3. Likewise, (R ∪ {-∞}, -∞, 0, max, +), which arises from the Boltzmann semi-ring in the analogous limit with the signs reversed, is the other tropical semi-ring.

Let us interpret our HMM quantities in the tropical semi-ring of Example 3. To do so, we simply take logarithms of the real quantities, replace Σ by max, and replace multiplication by +. Recall that

    Pr[x_1, ..., x_n] = Σ_{i_1, ..., i_n} π_{i_1} E(i_1, x_1) T(i_1, i_2) E(i_2, x_2) ⋯ T(i_{n-1}, i_n) E(i_n, x_n) = Σ_{i ∈ states} f(k, i) b(k, i).

Tropicalizing this expression, i.e., viewing it in the appropriate semi-ring, gives

    max_{i_1, ..., i_n} [log π_{i_1} + log E(i_1, x_1) + log T(i_1, i_2) + ⋯ + log E(i_n, x_n)] = max_i (f'(k, i) + b'(k, i)),

where f' and b' are defined analogously to f and b, but in the tropical semi-ring. From this we get a compact, easy-to-evaluate expression for the state at time k along the most likely path:

    state_k = arg max_i (f'(k, i) + b'(k, i)).

It is possible to derive other interesting results via tropicalization:

The Viterbi algorithm can be viewed as a tropicalization of the likelihood calculation (sketched below);

Approximations can be made about random gapped alignment scores;

Tropical determinants of distance matrices provide useful information about trees.
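To illustrate the (max, +) view concretely, the following sketch (again an illustration, not from the notes; it redefines the same toy model as before) computes the tropical analogue of the forward recursion: logarithms replace probabilities, + replaces multiplication, and max replaces the sum over previous states. This is precisely the Viterbi algorithm mentioned in the first bullet above.

    import numpy as np

    # Same toy model as in the previous sketch (illustrative numbers).
    T_mat = np.array([[0.9, 0.1], [0.2, 0.8]])
    E_mat = np.array([[0.7, 0.3], [0.1, 0.9]])
    pi = np.array([0.5, 0.5])
    obs = [0, 0, 1, 1, 0]

    logT, logE, logpi = np.log(T_mat), np.log(E_mat), np.log(pi)
    n, S = len(obs), len(pi)

    V = np.zeros((n, S))                # V[k, i] = log-probability of the best path ending in state i at time k
    back = np.zeros((n, S), dtype=int)  # backpointers for recovering the best path

    V[0] = logpi + logE[:, obs[0]]      # tropical analogue of f(1, :) = pi D_{x_1}
    for k in range(1, n):
        # scores[i, j] = V[k-1, i] + log T(i, j) + log E(j, x_k): products became sums ...
        scores = V[k - 1][:, None] + logT + logE[:, obs[k]][None, :]
        back[k] = scores.argmax(axis=0)     # ... and the sum over previous states became a max
        V[k] = scores.max(axis=0)

    # Trace the backpointers to recover the most likely (Viterbi) state path.
    path = [int(V[-1].argmax())]
    for k in range(n - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    path.reverse()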

Training an HMM

We want to compute the HMM that was most likely to produce a given sequence of symbols.

Input: the sequence (or, more generally, set of sequences) x_1, ..., x_n, and the number of states in the HMM.

Output: the HMM that maximizes Pr[x_1, ..., x_n].

In general, this is a nonlinear optimization problem with the constraints T ≥ 0, E ≥ 0, T·1 = 1, E·1 = 1. The objective being optimized, Pr[x_1, ..., x_n], can actually be written as a polynomial in the entries of T and E (each term of the sum-over-paths expansion above is a monomial in those entries). Unfortunately, the general problem of polynomial optimization is NP-hard.

Expectation Maximization

Expectation Maximization (EM) is a very common tool (often used in machine learning) for solving these kinds of optimization problems. It is an iterative method that walks through the solution space and is guaranteed never to decrease the likelihood at any step. It follows from general techniques for doing gradient search on constrained polynomials, and it always converges to some local (but possibly not global) optimum.

In EM, we start with some initial E and T and iterate, computing more likely values with each iteration. Given the values of E and T at one iteration, the values at the next iteration (call them Ê and T̂) are given, up to normalization, by

    Ê(i, x) ∝ Σ_{k : x_k = x} f(k, i) b(k, i),

    T̂(i, j) ∝ Σ_k f(k, i) T(i, j) E(j, x_{k+1}) b(k+1, j),

where f and b are calculated using the current E and T, and each row of Ê and T̂ is then normalized to sum to one.
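A minimal sketch of one such iteration for a single sequence, assuming the toy parameterization used in the earlier sketches (the function name em_step and the variable names are illustrative, not from the lecture):

    import numpy as np

    def em_step(T_mat, E_mat, pi, obs):
        """One EM (Baum-Welch) update for a single observation sequence; an illustrative sketch."""
        n, S = len(obs), len(pi)
        A = E_mat.shape[1]                   # alphabet size

        def D(x):                            # D_x = diag(E(:, x))
            return np.diag(E_mat[:, x])

        # Forward and backward vectors, exactly as in the Review of Notation section.
        f = np.zeros((n, S))
        b = np.zeros((n, S))
        f[0] = pi @ D(obs[0])
        for k in range(1, n):
            f[k] = f[k - 1] @ T_mat @ D(obs[k])
        b[n - 1] = np.ones(S)
        for k in range(n - 2, -1, -1):
            b[k] = T_mat @ D(obs[k + 1]) @ b[k + 1]

        # Unnormalized updates:
        #   E_hat(i, x) proportional to  sum over {k : x_k = x} of f(k, i) b(k, i)
        #   T_hat(i, j) proportional to  sum over k of f(k, i) T(i, j) E(j, x_{k+1}) b(k+1, j)
        E_hat = np.zeros((S, A))
        T_hat = np.zeros((S, S))
        for k in range(n):
            E_hat[:, obs[k]] += f[k] * b[k]
        for k in range(n - 1):
            T_hat += np.outer(f[k], E_mat[:, obs[k + 1]] * b[k + 1]) * T_mat

        # Enforce the constraints E 1 = 1 and T 1 = 1 by normalizing each row.
        E_hat /= E_hat.sum(axis=1, keepdims=True)
        T_hat /= T_hat.sum(axis=1, keepdims=True)
        return T_hat, E_hat

Iterating em_step from an initial guess of T and E gives the EM procedure described above; each iteration never decreases Pr[x_1, ..., x_n], and the parameters converge to a local optimum.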

HMMs Applied to Global Alignment

Figure 18.1: An HMM to support alignment

Consider the problem of global alignment with affine gap penalties. We can view this problem as one of finding an HMM that produces the observed sequences with good likelihood. We limit the HMM to a specific structure (see Figure 18.1), which consists of several layers, each containing one state of each of the following three kinds:

Match state: emits one base (usually with a strong bias toward a particular base), and links to the insertion state at the same layer and to the match and deletion states of the next layer.

Insertion state: emits a random base, and links back to itself as well as to the deletion and match states of the next layer.

Deletion state: emits nothing, and links to the deletion and match states of the next layer.

A path through the HMM corresponds to an alignment in the following way: passing through a match state corresponds to matching bases between the two sequences; cycling through an insertion state corresponds to some number of inserted bases (i.e., gaps in one sequence); and passing through deletion states corresponds to some number of deleted bases (i.e., gaps in the other sequence).

Extensions of Alignment via HMMs

Using HMMs, we can do alignment to profiles (related sequences serving as training data). Multi-sequence alignment is useful in many ways:

The Viterbi algorithm produces good multi-alignments (recall that the previous best algorithm for this problem took time and space exponential in the number of sequences).

Transition probabilities give position-dependent alignment penalties, which can be much more accurate than generic affine gap penalties.

Probabilities within the HMM highlight regions of both high and low conservation of bases across different sequences.

Pairs of HMMs provide a means of clustering: after training two HMMs on different profiles, a new sequence can be assigned to a cluster by choosing the HMM under which it has the higher likelihood (see the sketch below).
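A minimal sketch of this clustering rule, assuming two already-trained models are available as parameter tuples (pi, T_mat, E_mat); the names log_likelihood, assign_cluster, hmm1, and hmm2 are illustrative, not from the lecture:

    import numpy as np

    def log_likelihood(pi, T_mat, E_mat, obs):
        """log Pr[x_1, ..., x_n] = log(pi D_{x_1} T D_{x_2} ... T D_{x_n} 1), with rescaling to avoid underflow."""
        v = pi * E_mat[:, obs[0]]          # pi D_{x_1}
        log_p = 0.0
        for x in obs[1:]:
            v = (v @ T_mat) * E_mat[:, x]  # multiply by T D_x
            s = v.sum()
            log_p += np.log(s)             # accumulate the scale factor on the log scale
            v /= s
        return log_p + np.log(v.sum())

    def assign_cluster(obs, hmm1, hmm2):
        """Assign a sequence to whichever of two trained profile HMMs gives it the higher likelihood."""
        return 1 if log_likelihood(*hmm1, obs) >= log_likelihood(*hmm2, obs) else 2

For example, assign_cluster(obs, (pi1, T1, E1), (pi2, T2, E2)) returns 1 or 2 according to which of the two (hypothetical) trained profile HMMs explains the new sequence better.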