Genome 373: Hidden Markov Models II. Doug Fowler


Review From Hidden Markov Models I What does a Markov model describe?

Review From Hidden Markov Models I [diagram: a two-state Markov model with states A and T] A Markov model describes a random process of transitions from one state to another in a state space.

Review From Hidden Markov Models I A Markov model describes a random process of transitions from one state to another in a state space. What does a Markov model produce, if used generatively?

Review From Hidden Markov Models I Sequence: AAAATTTT A Markov model describes a random process of transitions from one state to another in a state space. It produces a sequence of states/symbols. And what governs the sequence?

Review From Hidden Markov Models I Sequence: AAAATTTT A Markov model describes a random process of transitions from one state to another in a state space. It produces a sequence of states/symbols. And what governs the sequence? The transition probabilities.

Review From Hidden Markov Models I Sequence: AAAATTTT We learned that a Markov model describes a random process of transitions from one state to another in a state space. What is hidden in a hidden Markov model and how does this relate to emission probabilities?

Review From Hidden Markov Models I [HMM: A-rich state emits A: 0.8, T: 0.2; T-rich state emits A: 0.2, T: 0.8] Sequence: AAAATTTT State path: ???????? We learned that a Markov model describes a random process of transitions from one state to another in a state space. In an HMM, states are unknown to us and associated with a set of emission probabilities, so that many different state paths can generate a given sequence.

Review From Hidden Markov Models I Sequence: AAAATTTT State path #1: a a a a t t t t State path #2: t t t t a a a a Finally, recall that we can calculate the probability of any particular (hidden) state path giving rise to a sequence: P(x, π) = a_{0,π_1} ∏_{i=1}^{L} e_{π_i}(x_i) a_{π_i,π_{i+1}}, where a_{0,π_1} is P(initial state), e_{π_i}(x_i) is P(emitting symbol x_i in state π_i), and a_{π_i,π_{i+1}} is P(transition from state π_i to state π_{i+1}).

Review From Hidden Markov Models I [A-rich state emits A: 0.8, T: 0.2; T-rich state emits A: 0.2, T: 0.8] Sequence: AAAATTTT State path #1: a a a a t t t t State path #2: t t t t a a a a P(x, π) = a_{0,π_1} ∏_{i=1}^{L} e_{π_i}(x_i) a_{π_i,π_{i+1}} This is the crux of an HMM and illustrates how we can use HMMs to calculate the probability of a particular state path, if we have a model and emission/transition probabilities.
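To make the formula concrete, here is a minimal sketch in Python of how P(x, π) could be computed for the A-rich/T-rich example. The emission probabilities are the ones from the review slide; the transition and initial probabilities (0.9 to stay in a state, 0.1 to switch, 0.5 to start in either state) are assumed values for illustration, and the function and variable names are ours.

```python
# Minimal sketch: computing P(x, pi) for the A-rich/T-rich example.
# Emission probabilities are from the slide; transition and initial
# probabilities below are assumed for illustration.

emit = {
    'a-rich': {'A': 0.8, 'T': 0.2},
    't-rich': {'A': 0.2, 'T': 0.8},
}
trans = {  # a_kl: assumed 0.9 to stay in the same state, 0.1 to switch
    'a-rich': {'a-rich': 0.9, 't-rich': 0.1},
    't-rich': {'a-rich': 0.1, 't-rich': 0.9},
}
init = {'a-rich': 0.5, 't-rich': 0.5}  # a_0k: assumed equal start probabilities

def joint_prob(seq, path):
    """Joint probability of a sequence and a state path:
    a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i,pi_{i+1}} (no explicit end state)."""
    p = init[path[0]]
    for i, (symbol, state) in enumerate(zip(seq, path)):
        p *= emit[state][symbol]            # emission probability e_{pi_i}(x_i)
        if i + 1 < len(path):
            p *= trans[state][path[i + 1]]  # transition probability a_{pi_i,pi_{i+1}}
    return p

# Two different state paths for the same sequence have different probabilities:
print(joint_prob("AAAATTTT", ['a-rich'] * 4 + ['t-rich'] * 4))
print(joint_prob("AAAATTTT", ['t-rich'] * 4 + ['a-rich'] * 4))
```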

Outline The Viterbi Algorithm (or, how can we find the most probable state path?) The Forward-Backward Algorithm (or, how can we find the probability of a state at a particular time) What is an algorithm, anyhow? A procedure for solving a problem

Recalling Our Motivation Given a sequence, we want to be able to predict the major features of genes in the sequence (e.g. create gene models). [Figure: a raw genomic sequence shown next to the same sequence annotated with Start, Exon 1, Intron 1, Exon 2, and Stop]

Recalling Our Motivation We want a model that can predict whether each base in a sequence is in one of a known set of states (intergenic, start, exon, intron, stop). [Figure: the same raw and annotated sequences as above]

How Can We Find the Most Probable Path? Can anyone tell me a way to find the most probable state path? Hint: we talked about a way to calculate the probability of any individual state path given a sequence: P(x, π) = a_{0,π_1} ∏_{i=1}^{L} e_{π_i}(x_i) a_{π_i,π_{i+1}}, where the factors are P(initial state), P(emitting symbol x_i in state π_i), and P(transition from state π_i to state π_{i+1}).

Could We Work Out Every Possibility? Simplest answer: calculate all possible state path probabilities and choose the largest. However, there is a big problem with this way of doing things.

Could We Work Out Every Possibility? Simplest answer: calculate all possible state path probabilities and choose the largest. However, there is a big problem with this way of doing things, which is that there are a very large number of possible state paths! In fact, there are S^N possibilities for S states and N symbols.

Could We Work Out Every Possibility? No. Simplest answer: calculate all possible state path probabilities and choose the largest. However, there is a big problem with this way of doing things, which is that there are a very large number of possible state paths! In fact, there are S^N possibilities for S states and N symbols. Two states, 100 positions: 2^100 ≈ 1.3 × 10^30. Even a fast computer won't help you much.
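As a small sanity check on the counting argument (not something from the slides), a couple of lines of Python show how quickly the number of paths blows up:

```python
# Toy illustration of the S^N blow-up in the number of state paths.
from itertools import product

states = ['a-rich', 't-rich']
print(len(list(product(states, repeat=3))))  # 8 = 2^3 paths for a 3-symbol sequence
print(2 ** 100)                              # ~1.27e30 paths for a 100-symbol sequence
```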

Sound Like A Familiar Problem? You all have seen something very similar to this already: the goal is to find the optimal path without having to explicitly test every possibility.

Sound Like A Familiar Problem? You all have seen something very similar to this already: sequence alignment. There, you learned about dynamic programming approaches to find the best alignment between two sequences without examining all the possibilities.

Sound Like A Familiar Problem? You all have seen something very similar to this already: sequence alignment. There, you learned about dynamic programming approaches to find the best alignment between two sequences without examining all the possibilities. The Viterbi algorithm is similar, finding the most probable state path given a sequence and a model without examining all the possible state paths.

The Viterbi Algorithm [A-rich state emits A: 0.7, T: 0.3; T-rich state emits A: 0.2, T: 0.8] Let's go back to our AT example, with one small change (note the A-rich emission probabilities). Can someone talk through the parts of this model?

The Viterbi Algorithm We can write down a graph of all possible state paths for an example sequence (AAT): from Begin (probability 0.5 to each state), the path can pass through either the a-rich or the t-rich state at each of the three positions.

The Viterbi Algorithm Position 1: a-rich: 0.5*0.7 = 0.35; t-rich: 0.5*0.2 = 0.1. We calculate the probability of each transition/emission step.

The Viterbi Algorithm Position 1: a-rich: 0.5*0.7 = 0.35; t-rich: 0.5*0.2 = 0.1. What calculation should we do here, to get P(A, A, a-rich, a-rich)?

The Viterbi Algorithm Position 1: a-rich: 0.5*0.7 = 0.35; t-rich: 0.5*0.2 = 0.1. Position 2, a-rich (coming from a-rich): 0.35*0.9*0.7 = 0.22. Multiply P(A, a-rich) by the appropriate transition and emission probabilities.

The Viterbi Algorithm Position 2, a-rich: from a-rich: 0.35*0.9*0.7 = 0.22; from t-rich: 0.1*0.1*0.7 = 0.007. We calculate the probability of each transition/emission step.

The Viterbi Algorithm Position 1: a-rich: 0.35; t-rich: 0.1. Position 2, a-rich: 0.35*0.9*0.7 = 0.22. We calculate the probability of each transition/emission step and discard all but the most likely path leading to each state.

The Viterbi Algorithm Position 1: a-rich: 0.35; t-rich: 0.1. How about for the t-rich state at position 2? Take a minute to do the two probability calculations.

The Viterbi Algorithm Position 2, t-rich: from a-rich: 0.35*0.1*0.2 = 0.007; from t-rich: 0.1*0.9*0.2 = 0.018. And which path should we discard?

The Viterbi Algorithm Position 2: a-rich: 0.22; t-rich: 0.018. Position 3: a-rich: 0.22*0.9*0.3 = 0.0594; t-rich: 0.22*0.1*0.8 = 0.0176. Here is the answer for the third step. Of the 8 (2 states, 3 symbols: 2^3) possible paths, the Viterbi algorithm leaves us with two.

The Viterbi Algorithm Position 3: a-rich: 0.22*0.9*0.3 = 0.0594; t-rich: 0.22*0.1*0.8 = 0.0176. Which should we pick?

The Viterbi Algorithm Position 3: a-rich: 0.0594; t-rich: 0.0176. We can pick the most likely.

The Viterbi Algorithm Position 3: a-rich: 0.0594; t-rich: 0.0176. Note, we didn't switch to the t-rich state at this step. What would happen if we kept getting T's?

The Viterbi Algorithm Position 3: a-rich: 0.0594; t-rich: 0.0176. Eventually, the likeliest state path would become one with a transition to the t-rich state!

The Viterbi Algorithm Said another way, if several paths converge on a particular state, then instead of recalculating them all when we calculate probabilities for the next step, we discard the less likely paths.

The Viterbi Algorithm For practical reasons, we typically operate in log space (i.e. take the log of the probabilities), since the probabilities get very small very quickly.
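Putting the walkthrough together, here is a sketch of the Viterbi algorithm in Python for this two-state model. The transition probabilities (0.9 to stay, 0.1 to switch) are the values implied by the slide calculations above; the code structure and names are ours, not course-provided code.

```python
# Sketch of the Viterbi algorithm for the two-state example above.
# Transition probabilities (0.9 stay / 0.1 switch) are the values implied
# by the slide calculations; the implementation itself is an illustration.

emit = {'a-rich': {'A': 0.7, 'T': 0.3}, 't-rich': {'A': 0.2, 'T': 0.8}}
trans = {'a-rich': {'a-rich': 0.9, 't-rich': 0.1},
         't-rich': {'a-rich': 0.1, 't-rich': 0.9}}
init = {'a-rich': 0.5, 't-rich': 0.5}

def viterbi(seq):
    # v[k] = probability of the best path ending in state k at the current position
    v = {k: init[k] * emit[k][seq[0]] for k in init}
    backpointers = []  # best previous state for each state, at each position
    for symbol in seq[1:]:
        ptr, new_v = {}, {}
        for k in v:
            # keep only the most likely path leading into state k, discard the rest
            prev, p = max(((l, v[l] * trans[l][k]) for l in v), key=lambda t: t[1])
            ptr[k] = prev
            new_v[k] = p * emit[k][symbol]
        backpointers.append(ptr)
        v = new_v
    # trace back from the most likely final state
    state = max(v, key=v.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return list(reversed(path)), max(v.values())

# For AAT the best path stays a-rich throughout, with probability ~0.06,
# matching the 0.0594 worked out on the slides (up to rounding).
print(viterbi("AAT"))
```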

Outline The Viterbi Algorithm (or, how can we find the most probable state path?) The Forward-Backward Algorithm (or, how can we find the probability of a state at a particular time)

A slightly different question P(π_i = k | x) What if we are interested in the probability that the HMM was in a particular state k at a particular position i?

A slightly different question P(π_i = k | x) Any thoughts about conceptually how to do this?

A slightly different question P(π_i = k | x) = P(x, π_i = k) / P(x) We can obtain this probability by dividing the probability of all state paths with π_i = k by the sum of the probability of all paths.

A slightly different question P(π_i = k | x) = P(x, π_i = k) / P(x), where P(x, π_i = k) = Σ_{π: π_i = k} P(π, x) and P(x) = Σ_π P(π, x). We can obtain this probability by dividing the probability of all state paths with π_i = k by the sum of the probability of all paths.

A slightly different question P(π_i = k | x) = P(x, π_i = k) / P(x), where P(x, π_i = k) = Σ_{π: π_i = k} P(π, x) and P(x) = Σ_π P(π, x). What problem are we going to run into here, without an algorithm to help?

A slightly different question P(π_i = k | x) = P(x, π_i = k) / P(x), where P(x, π_i = k) = Σ_{π: π_i = k} P(π, x) and P(x) = Σ_π P(π, x). As before, the number of possible state paths is too large to brute force.

The forward-backward algorithm P(π_2 = t-rich | A, A, T) Let's revisit our simple example. Our goal is to calculate the probability, given the model and the sequence, that the state at position 2 was t-rich.

The forward-backward algorithm P(π_2 = t-rich | A, A, T) What arrows should we remove to illustrate the possible paths through state space that correspond to our question?

The forward-backward algorithm These are the possible paths through state space where π_2 = t-rich.

The forward-backward algorithm Let's first just consider the forward part of the problem: the probability of seeing AA and reaching the t-rich state.

The forward-backward algorithm Position 1: a-rich: 0.5*0.7 = 0.35; t-rich: 0.5*0.2 = 0.1. f_t-rich(2) = 0.2 × (0.35×0.1 + 0.1×0.9) In the forward algorithm, we sum all the joint transition/emission probabilities leading to π_2 = t-rich.

The forward-backward algorithm f_t-rich(2) = 0.025 This gives us the probability of seeing AA and reaching the t-rich state.

The forward-backward algorithm But, if our goal is to calculate P(π_2 = t-rich | A, A, T), we're not done yet. Why?

The forward-backward algorithm But, if our goal is to calculate P(π_2 = t-rich | A, A, T), we're not done yet. Why? Because we have the rest of the sequence to account for!

The forward-backward algorithm So, let's consider the backward part of the problem, which is the probability of getting the rest of the sequence given that π_2 = t-rich.

The forward-backward algorithm b_t-rich(2) = (0.1×0.3 + 0.9×0.8) = 0.75 In the backward algorithm we sum the emission and transition probabilities across all states.

The forward-backward algorithm P(π_2 = t-rich | A, A, T) = f_t-rich(2) × b_t-rich(2) / P(x) Now, we can solve our problem. The probability of the model being in the t-rich state at position 2 is equal to the product of the forward and backward probabilities divided by the probability of all paths, P(x). How could we obtain this quantity?

The forward-backward algorithm P(π_2 = t-rich | A, A, T) = f_t-rich(2) × b_t-rich(2) / P(x) One way is to use the forward algorithm, summing over all the possible ending states at the final position.

General form of the F-B Algorithm f_{k,i} = e_k(x_i) Σ_l f_{l,i-1} a_{lk}, where f_{k,i} is P(sequence from 1 to i, π_i = k), e_k(x_i) is P(emitting x_i | π_i = k), a_{lk} is P(transitioning from l to k), and f_{l,i-1} is P(sequence from 1 to i-1, π_{i-1} = l). In our simple example with a three-symbol sequence, we calculated one step forward and one step backward to the middle position. The forward algorithm is recursive, with each calculation being reused rather than recomputed.

General form of the F-B Algorithm f_{k,i} = e_k(x_i) Σ_l f_{l,i-1} a_{lk} and b_{k,i} = Σ_l e_l(x_{i+1}) b_{l,i+1} a_{kl} The power of these algorithms is that they eliminate the need to calculate all possible state paths.
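For completeness, here is a sketch of the forward and backward recursions in Python for the same two-state model, used to recover the posterior P(π_2 = t-rich | AAT) from the worked example. As before, the 0.9/0.1 transition probabilities are the values implied by the slide calculations, and the code is an illustrative sketch rather than course-provided code.

```python
# Sketch of the forward-backward algorithm for the two-state model above.
emit = {'a-rich': {'A': 0.7, 'T': 0.3}, 't-rich': {'A': 0.2, 'T': 0.8}}
trans = {'a-rich': {'a-rich': 0.9, 't-rich': 0.1},
         't-rich': {'a-rich': 0.1, 't-rich': 0.9}}
init = {'a-rich': 0.5, 't-rich': 0.5}
states = list(init)

def forward(seq):
    # f[i][k] = P(x_1..x_i, pi_i = k)
    f = [{k: init[k] * emit[k][seq[0]] for k in states}]
    for symbol in seq[1:]:
        f.append({k: emit[k][symbol] * sum(f[-1][l] * trans[l][k] for l in states)
                  for k in states})
    return f

def backward(seq):
    # b[i][k] = P(x_{i+1}..x_L | pi_i = k); defined as 1 at the last position
    b = [{k: 1.0 for k in states}]
    for symbol in reversed(seq[1:]):
        b.insert(0, {k: sum(emit[l][symbol] * trans[k][l] * b[0][l] for l in states)
                     for k in states})
    return b

seq = "AAT"
f, b = forward(seq), backward(seq)
p_x = sum(f[-1][k] for k in states)                 # P(x): sum over ending states
posterior = f[1]['t-rich'] * b[1]['t-rich'] / p_x   # position 2 is index 1
print(f[1]['t-rich'], b[1]['t-rich'], posterior)    # 0.025, 0.75, ~0.2
```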

Viterbi vs F-B Algorithms Viterbi start: at the beginning of the sequence of symbols x algorithm: results:

Viterbi vs F-B Algorithms Viterbi start: at the beginning of the sequence of symbols x algorithm: moving forward, find the likeliest path for each state in each position and discard all the rest results:

Viterbi vs F-B Algorithms Viterbi start: at the beginning of the sequence of symbols x algorithm: moving forward, find the likeliest path for each state in each position and discard all the rest results: the most likely state path

Viterbi vs F-B Algorithms Viterbi start: at the beginning of the sequence of symbols x algorithm: moving forward, find the likeliest path for each state in each position and discard all the rest results: the most likely state path This process is often referred to as decoding an HMM, because it reveals the most likely sequence of (hidden, encoded) states

Viterbi vs F-B Algorithms F-B Algorithm start: at the i th position algorithm: results:

Viterbi vs F-B Algorithms F-B Algorithm start: at the i th position algorithm: moving forward or backward, sum the probabilities of paths leading to a particular state π i = k results:

Viterbi vs F-B Algorithms F-B Algorithm start: at the i th position algorithm: moving forward or backward, sum the probabilities of paths leading to a particular state π i = k results: the probability that the model was in state k at position i The F-B algorithm can be used to solve many other decoding problems (e.g. find the most probable state at position i)