Hidden Markov Models (HMMs) November 14, 2017

inferring a hidden truth 1) You hear a static-filled radio transmission. How can you determine what the sender intended to say? 2) You know that genes have open reading frames and are segmented (interrupted by introns) in eukaryotic genomes. Looking at a large genomic region with lots of open reading frames, which ones belong to genes?

inferring a hidden truth: simple HMMs hidden states (intended words, gene presence) are related to the observed states (static, lots of open reading frames). Each hidden state has one observed state. Characteristics of the current hidden state are governed in some way by the previous hidden state, but not that state's previous state or any earlier states. Characteristics of the current observed state are governed by the current hidden state.

Markov chains 3 states of weather: sunny, cloudy, rainy Observed once a day at the same time All transitions are possible, with some probability Each state depends only on the previous state

Markov chains: another view [Diagram: start → state1 → state2 → state3 → state4 → state5, one state per time point t1 through t5.]

Markov chains State transition matrix: the probability of the weather today given yesterday's weather. The rows of the transition matrix must sum to one. An initial distribution must also be defined (day one: p(sunny)=?, p(cloudy)=?, p(rainy)=?).
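As a minimal sketch in Python (the probabilities below are illustrative assumptions, not numbers from the lecture), a Markov chain is just an initial distribution plus a row-stochastic transition matrix:

    import random

    states = ["sunny", "cloudy", "rainy"]

    # Initial distribution for day one (illustrative values).
    initial = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}

    # Transition matrix: rows are "yesterday", columns are "today";
    # each row sums to one (illustrative values).
    transition = {
        "sunny":  {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
        "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
        "rainy":  {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},
    }

    def sample_chain(n_days):
        """Sample a weather sequence of length n_days from the chain."""
        day = random.choices(states, weights=[initial[s] for s in states])[0]
        seq = [day]
        for _ in range(n_days - 1):
            row = transition[day]
            day = random.choices(states, weights=[row[s] for s in states])[0]
            seq.append(day)
        return seq

    print(sample_chain(7))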

Markov chains P(x_L | x_{L-1}, x_{L-2}, ..., x_1) = P(x_L | x_{L-1}) for all L. What does this mean? The state at position L does not depend on anything but the previous state. This is the memoryless (Markov) property. Very important.

First-order Markov model P(x) = probability of a particular sequence of observations x = {x_1, x_2, ..., x_n}. p_ij = probability that if the previous symbol is i, the next symbol will be j. Under this model, P(ACCGATA) (the probability of observing this sequence) is just p_A · p_AC · p_CC · p_CG · p_GA · p_AT · p_TA, where p_AC = p(there will be a C after an A) = P(C | A), and that probability does NOT depend on anything in the sequence besides the preceding A. In general, P(x) = p_{x_1} · Π p_{x_{i-1} x_i}.
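A small sketch of that calculation; the transition values below are illustrative assumptions, and only the factorization itself comes from the slide:

    # First-order Markov model over DNA (illustrative numbers).
    initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}   # p_{x_1}
    transition = {                                           # p_ij = P(j | i)
        "A": {"A": 0.3, "C": 0.3, "G": 0.2, "T": 0.2},
        "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "G": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
        "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    }

    def sequence_probability(seq):
        """P(x) = p_{x1} * product over i of p_{x_{i-1} x_i}."""
        p = initial[seq[0]]
        for prev, curr in zip(seq, seq[1:]):
            p *= transition[prev][curr]
        return p

    print(sequence_probability("ACCGATA"))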

Higher-order Markov chains Sunny = S, cloudy = C. 2nd-order Markov model: the weather depends on yesterday plus the day before. A 2nd-order chain can be rewritten as a 1st-order chain over pairs of days (states SS, SC, CS, CC), but then not all state transitions are possible! For example, SSCSCC corresponds to the pair sequence S_1S_2 → S_2C_3 → C_3S_4 → S_4C_5 → C_5C_6.
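A brief sketch of that pair-state encoding (only the idea, not any probabilities, comes from the slide):

    # Encode a 2nd-order chain over {S, C} as a 1st-order chain over pairs.
    def to_pair_states(seq):
        """'SSCSCC' -> ['SS', 'SC', 'CS', 'SC', 'CC'] (overlapping pairs)."""
        return [seq[i:i + 2] for i in range(len(seq) - 1)]

    print(to_pair_states("SSCSCC"))
    # A transition like 'SS' -> 'CS' is impossible: consecutive pairs must
    # overlap, so the second letter of one pair is the first letter of the next.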

Hidden Markov Models Back to the weather example. All we can observe now is the behavior of a dog; only the dog can see the weather, we cannot! The dog can be in, out, or standing pathetically on the porch. This depends on the weather in a quantifiable way. How do we figure out what the weather is if we can only observe the dog?

Hidden Markov Models The dog's behavior is the emission of the weather (the hidden states). Output matrix = emission probabilities. Hidden states = the system described by the Markov model. Observable states = side effects of the Markov model.

Hidden Markov Models: another view [Diagram: hidden states q1 → q2 → q3 → q4 → q5 at times t1 through t5, with initial probability π_q1 and transition probabilities p_q1q2 ... p_q4q5; each hidden state q_t emits an observation.] The q's can be sunny, cloudy, or rainy. The observations are the dog's behavior (in, out, porch).

Hidden Markov Models All we observe is the dog: IOOOIPIIIOOOOOPPIIIIIPI. What's the underlying weather (the hidden states)? How likely is this sequence, given our model of how the dog works? What portion of the sequence was generated by each state?

Hidden Markov Models start: p(C)=0.2, p(R)=0.2, p(S)=0.6. All we observe is the dog: IOIOIPI. What was the weather? Guess RSRSRRR? Then p(dog's behavior) = 0.023, but p(RSRSRRR) is only 0.00012. Guess CCCCCCC? Then p(dog) = 0.002, but p(weather) = 0.0094.

Hidden Markov Models: the three questions Evaluation: given an HMM, M, and a sequence of observations, x, find P(x | M). Decoding: given an HMM, M, and a sequence of observations, x, find the sequence Q of hidden states that maximizes P(x, Q | M). Learning: given an HMM, M, with unknown parameters and a sequence of observations, x, find the parameters θ that maximize P(x | θ, M).

Hidden Markov model: Five components 1. A set of N hidden states S_1, S_2, ..., S_N. Here S_1 = sunny, S_2 = cloudy, S_3 = rainy, and N = 3.

Hidden Markov model: Five components 2. An alphabet of distinct observation symbols A = {in, out, porch} = {I,O,P}

Hidden Markov model: Five components 3. Transition probability matrix P = (p_ij), where q_t is shorthand for the hidden state at time t (q_t = S_i means that the hidden state at time t was state S_i): p_ij = P(q_{t+1} = S_j | q_t = S_i). The transition matrix is over the hidden states!

Hidden Markov model: Five components 4. Emission probabilities: for each state S_i and each symbol a in A, b_i(a) = p(S_i emits symbol a). The probabilities b_i(a) form an N x M matrix, where N = #hidden states and M = #observed symbols. Example: b_1(O) = p(S_1 emits "out") = 0.7.

Hidden Markov model: Five components 5. An initial distribution vector π = (π_i), where π_i = P(q_1 = S_i). With start: p(C)=0.2, p(R)=0.2, p(S)=0.6, we have P(q_1 = S_1) = the probability that the (hidden) first state is sunny = 0.6, so π = (0.6, 0.2, 0.2). NOTE that the first emitted symbol is not specified by the initial distribution vector; that's part of the model.
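Putting the five components together as a sketch in code; the transition matrix reuses the illustrative values from the Markov-chain sketch above, and the emission values are assumptions apart from b_sunny(O) = 0.7 and π = (0.6, 0.2, 0.2), which come from the slides:

    # The weather/dog HMM: five components.
    states = ["sunny", "cloudy", "rainy"]          # 1. N hidden states
    alphabet = ["I", "O", "P"]                     # 2. observation symbols (in, out, porch)

    transition = {                                 # 3. p_ij = P(q_{t+1}=S_j | q_t=S_i), rows sum to 1
        "sunny":  {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
        "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
        "rainy":  {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},
    }

    emission = {                                   # 4. b_i(a) = P(S_i emits a), an N x M matrix
        "sunny":  {"I": 0.1, "O": 0.7, "P": 0.2},  # b_sunny(O) = 0.7 as on the slide
        "cloudy": {"I": 0.3, "O": 0.4, "P": 0.3},
        "rainy":  {"I": 0.6, "O": 0.1, "P": 0.3},
    }

    initial = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}   # 5. pi_i = P(q_1 = S_i)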

HMMs: another view x = {O, I, I, O, P}, S = {S, C, R}. [Trellis diagram: hidden states q1 → q2 → q3 → q4 → q5 with initial probability π_q1 and transition probabilities p_q1q2 ... p_q4q5; each q_t emits observation x_t with probability b_qt(x_t), at times t1 through t5.]

HMM: solve problem 1 (evaluation) Given an HMM, M, and a sequence, x, find P(x | M); this tells you how unusual the observations are, regardless of hidden states. One way to do this is brute force: find all possible sequences of hidden states, calculate P(x | Q) for each, then P(x) = Σ P(x | Q) P(Q) (summing over ALL hidden state sequences Q). But this takes an exponential number of calculations, on the order of 2T·N^T, where N = #hidden states and T = length of the observed sequence.
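A sketch of that brute-force sum (runnable but exponential; it takes the model components as arguments, for example the weather/dog dictionaries defined in the five-components sketch above):

    from itertools import product

    def brute_force_evaluate(obs, states, initial, transition, emission):
        """P(x) = sum over every hidden path Q of P(x | Q) * P(Q)."""
        total = 0.0
        for path in product(states, repeat=len(obs)):   # N^T hidden paths
            p_path = initial[path[0]]                   # P(Q)
            p_obs = emission[path[0]][obs[0]]           # P(x | Q)
            for t in range(1, len(obs)):
                p_path *= transition[path[t - 1]][path[t]]
                p_obs *= emission[path[t]][obs[t]]
            total += p_obs * p_path
        return total

    # Example (with the dictionaries from the five-components sketch):
    # print(brute_force_evaluate("IOIOIPI", states, initial, transition, emission))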

HMM: solve problem 1 (evaluation) Given an HMM, M, and a sequence, x, find P(x | M). Forward algorithm: calculate the probability of the sequence of observations up to and including time t: P(x_1 x_2 x_3 x_4 ... x_t) = ?? That's the same problem. If we knew the hidden state at time t, we could use that, so let α(t, i) = P(x_1 x_2 x_3 x_4 ... x_t, q_t = S_i) (a joint probability).

HMM: solve problem 1 (evaluation)
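A sketch of the forward algorithm just defined; the recursion is the standard one, α(1, i) = π_i · b_i(x_1), α(t+1, j) = b_j(x_{t+1}) · Σ_i α(t, i) · p_ij, and P(x | M) = Σ_i α(T, i):

    def forward(obs, states, initial, transition, emission):
        """Return P(x | M) and the full table of alpha(t, i) values."""
        alpha = [{s: initial[s] * emission[s][obs[0]] for s in states}]  # alpha(1, i)
        for t in range(1, len(obs)):
            prev = alpha[-1]
            alpha.append({
                j: emission[j][obs[t]] * sum(prev[i] * transition[i][j] for i in states)
                for j in states
            })
        return sum(alpha[-1].values()), alpha

    # Example (with the weather/dog dictionaries from the earlier sketch):
    # p_x, alpha = forward("IOIOIPI", states, initial, transition, emission)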

evaluation [Trellis diagram for the observation sequence I O P P I: hidden states q1 ... q5 with initial probability π_q1, transition probabilities p_q1q2 ... p_q4q5, and emission probabilities b_qt(x_t), at times t1 through t5 = T.]

HMM: solve problem 1 (evaluation) Given an HMM, M, and a sequence, x, find P(x | M). We can also use the Backward algorithm for this problem. Briefly: define β(t, i) = P(x_{t+1} x_{t+2} ... x_T | q_t = S_i), where T is the total number of observed states, and compute it by a recursion that runs backward from t = T.

evaluation [The same I O P P I trellis, as above.]
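A matching sketch of the backward pass; the recursion is the standard one, β(T, i) = 1, β(t, i) = Σ_j p_ij · b_j(x_{t+1}) · β(t+1, j), and P(x | M) = Σ_i π_i · b_i(x_1) · β(1, i):

    def backward(obs, states, initial, transition, emission):
        """Return P(x | M) and the full table of beta(t, i) values."""
        T = len(obs)
        beta = [dict() for _ in range(T)]
        beta[T - 1] = {i: 1.0 for i in states}                 # beta(T, i) = 1
        for t in range(T - 2, -1, -1):
            beta[t] = {
                i: sum(transition[i][j] * emission[j][obs[t + 1]] * beta[t + 1][j]
                       for j in states)
                for i in states
            }
        p_x = sum(initial[i] * emission[i][obs[0]] * beta[0][i] for i in states)
        return p_x, beta

    # Example (with the weather/dog dictionaries from the earlier sketch):
    # p_x, beta = backward("IOIOIPI", states, initial, transition, emission)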

HMM forward-backward algorithm If you're following this, you realize that the forward and backward algorithms alone are just formalizations of the brute-force method! How does this help? Now, if we fix the identity of the hidden state at time t, we can calculate the probability of the sequence. These formulas come in handy later.

HMM forward-backward algorithm The forward and backward algorithms work together: I know how to calculate the probability of a sequence up to a time point, if I know the hidden state at that time point (α(t,i)). I know how to calculate the probability of an observed sequence from the end backward to a time point, given the hidden state at that time point (β(t,i))

HMM forward-backward algorithm dog: IOOOPPPIO (positions 1-9). Fixing the hidden state at position 4: p(obs 1-4, state4 = S) · p(obs 4-9 | state4 = S) + p(obs 1-4, state4 = C) · p(obs 4-9 | state4 = C) + p(obs 1-4, state4 = R) · p(obs 4-9 | state4 = R) = p(IOOOPPPIO).
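A sketch of that combination, reusing the forward and backward functions from the sketches above: the posterior probability of the hidden state at position t is α(t, i) · β(t, i) / P(x):

    def state_posterior(obs, t, states, initial, transition, emission):
        """P(q_t = S_i | x) for every state i, via forward * backward."""
        p_x, alpha = forward(obs, states, initial, transition, emission)
        _, beta = backward(obs, states, initial, transition, emission)
        return {i: alpha[t][i] * beta[t][i] / p_x for i in states}

    # Example: posterior over the weather at position 4 (0-based index 3)
    # of the dog sequence IOOOPPPIO, with the earlier model dictionaries:
    # print(state_posterior("IOOOPPPIO", 3, states, initial, transition, emission))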

HMM: solve problem 2 (decoding) Decoding: given an HMM M and a sequence x, find the sequence Q of hidden states that maximizes P(x, Q | M). We'll use the Viterbi algorithm. Assumptions needed for the Viterbi algorithm: 1) the observed and hidden events must be in a sequence; 2) an observed event must correspond to one and only one hidden event; 3) computing the most likely sequence up to point t depends only on the observed event at t and the most likely sequence up to t-1.

HMM: solve problem 2 (decoding) Viterbi algorithm

HMM: solve problem 2 (decoding) Viterbi algorithm In this problem, we don't really care what the probability of the sequence is. It happened. What we want to know is whether it was sunny when the dog was inside on the third day.

HMM: solve problem 2 (decoding) Viterbi algorithm: uses a form of dynamic programming to decode hidden state at each time point, using the forward and backward algorithms. Observed: OOIP What is q3?

HMM: solve problem 2 (decoding) [Trellis for OOIP: times t1-t4; hidden states q1 → q2 → q3 → q4 with initial probability π(q1) and transition probabilities p12, p23, p34; emission probabilities b_q1(O), b_q2(O), b_q3(I), b_q4(P); emitted symbols O O I P.]

HMM: solve problem 2 (decoding) OOIP: what is q3? [Same trellis as above.] We need to figure out the most likely sequence of hidden states and look at which state occupies t3.

HMM: solve problem 2 (decoding) [The same O O I P trellis.] The probability of the path up to t3 is π(q1) · b_q1(O) · p_12 · b_q2(O) · p_23 · b_q3(I); the remaining part is b_q4(P) · p_34.

HMM: solve problem 2 (decoding) [Trellis with one row per weather state (Sunny, Cloudy, Rainy) and one column per observation O O I P.]

HMM: solve problem 2 (decoding) O O I P: fixing q3 = Sunny, the probability of the path up to t3 is π(q1) · b_q1(O) · p_12 · b_q2(O) · p_2S · b_S(I); the remaining part is b_q4(P) · p_S4.
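A sketch of the full decoding for an observation string such as OOIP, using the standard Viterbi recursion δ(1, i) = π_i · b_i(x_1) and δ(t, j) = b_j(x_t) · max_i δ(t-1, i) · p_ij, with a traceback of the argmax choices:

    def viterbi(obs, states, initial, transition, emission):
        """Most likely hidden-state path Q and its joint probability P(x, Q)."""
        delta = [{s: initial[s] * emission[s][obs[0]] for s in states}]
        back = []                                    # argmax pointers for traceback
        for t in range(1, len(obs)):
            row, ptr = {}, {}
            for j in states:
                best_i = max(states, key=lambda i: delta[-1][i] * transition[i][j])
                ptr[j] = best_i
                row[j] = delta[-1][best_i] * transition[best_i][j] * emission[j][obs[t]]
            delta.append(row)
            back.append(ptr)
        last = max(states, key=lambda s: delta[-1][s])
        path = [last]
        for ptr in reversed(back):
            path.append(ptr[path[-1]])
        path.reverse()
        return path, delta[-1][last]

    # Example: decode the weather behind OOIP (earlier model dictionaries),
    # then read off the state at t3:
    # path, p = viterbi("OOIP", states, initial, transition, emission)
    # print(path[2])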

HMM: solve problem 2 (decoding) The Viterbi algorithm is very powerful and can distinguish subtle features of strings. It was originally developed for decoding noisy digital transmissions and is widely used in speech processing. Example: the dishonest casino problem.

One example: very crude gene finding. Evaluate nonoverlapping chunks of 21 bp of sequence: C = coding (no stop codon), N = noncoding (one or more stop codons).

Transition probabilities (hidden state n_i, rows, to n_{i+1}, columns):
              exon   intron   intergenic
exon          0.4    0.5      0.1
intron        0.2    0.8      0
intergenic    0.1    0        0.9

Emission probabilities (21 bp coding = C, 21 bp noncoding = N):
              C      N
exon          0.90   0.1
intron        0.2    0.8
intergenic    0.3    0.7

Observed: CCCCCNNNNNNCCNNNCCCCCCCCCNNCNCCCNNNNNN
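A sketch wiring these tables into the viterbi function from the earlier sketch (the initial distribution is not given on the slide, so a uniform start is assumed here):

    gene_states = ["exon", "intron", "intergenic"]
    gene_initial = {s: 1.0 / 3 for s in gene_states}      # assumed uniform start
    gene_transition = {
        "exon":       {"exon": 0.4, "intron": 0.5, "intergenic": 0.1},
        "intron":     {"exon": 0.2, "intron": 0.8, "intergenic": 0.0},
        "intergenic": {"exon": 0.1, "intron": 0.0, "intergenic": 0.9},
    }
    gene_emission = {
        "exon":       {"C": 0.90, "N": 0.1},
        "intron":     {"C": 0.2,  "N": 0.8},
        "intergenic": {"C": 0.3,  "N": 0.7},
    }

    chunks = "CCCCCNNNNNNCCNNNCCCCCCCCCNNCNCCCNNNNNN"
    # path, p = viterbi(chunks, gene_states, gene_initial, gene_transition, gene_emission)
    # print(path)   # most likely exon/intron/intergenic label for each 21 bp chunk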

Applications of HMMs and Markov models: sequence alignment, pairwise and multiple (PFAM, HMMPro, HMMER, SAM); making profiles to describe sequence families; finding signals in DNA; gene finding (GLIMMER, GENSCAN); motif finding; segmentation analysis (microarray data, any signals); finding CpG islands.