
Hidden Markov Models. Ivan Gesteira Costa Filho, IZKF Research Group Bioinformatics, RWTH Aachen. Adapted from: www.ioalgorithms.info

Outline: CG-islands; the Fair Bet Casino; Hidden Markov Models; the Decoding Algorithm; the Forward-Backward Algorithm; HMM Parameter Estimation: Viterbi training and the Baum-Welch algorithm.

CG-Islands. Given 4 nucleotides, the probability of occurrence of each is ~1/4, so the probability of occurrence of any particular dinucleotide is ~1/16. However, the frequencies of dinucleotides in DNA sequences vary widely. In particular, CG is typically underrepresented (its frequency is typically < 1/16).

Why CG-Islands? CG is the least frequent dinucleotide because the C in CG is easily methylated and then tends to mutate into T. However, methylation is suppressed around genes, so CG appears at relatively high frequency within these CG-islands. Finding the CG-islands in a genome is therefore an important problem.

Markov Chains. Given a sequence x = AGCTCTC...T and transition probabilities a_{AG} = p(x_i = G | x_{i-1} = A), what is p(x)?

Markov Chains. By the chain rule,
p(x) = p(x_L, x_{L-1}, x_{L-2}, ..., x_1)
     = p(x_L | x_{L-1}, x_{L-2}, ..., x_1) · p(x_{L-1} | x_{L-2}, ..., x_1) ··· p(x_1).
Markov assumption: p(x_i | x_{i-1}, x_{i-2}, ..., x_1) = p(x_i | x_{i-1}). Then
p(x) = p(x_L | x_{L-1}) · p(x_{L-1} | x_{L-2}) ··· p(x_1) = p(x_1) Π_{i=2}^{L} p(x_i | x_{i-1}).

Markov Chains with a start state. Given a sequence x = b AGCTCTC...T e, where b is the begin state and e the end state, and transition probabilities a_{AG} = p(x_i = G | x_{i-1} = A), we have p(x) = p(x_1) Π_{i=2}^{L} a_{x_{i-1} x_i}, with p(x_1) = 1 for the begin state and, e.g., p(x_2 | x_1) = 1/4 for the transition from the begin state to each nucleotide.
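To make the factorization concrete, here is a minimal Python sketch that evaluates p(x) for a short DNA sequence under a first-order Markov chain. The transition and initial probabilities below are made-up placeholders, not estimates from real data.

```python
# Minimal sketch: probability of a DNA sequence under a first-order Markov chain.
# The transition values below are placeholders, not estimates from real data.
transitions = {
    ('A', 'G'): 0.30, ('G', 'C'): 0.35, ('C', 'T'): 0.25, ('T', 'C'): 0.30,
    ('C', 'A'): 0.20, ('A', 'A'): 0.25, ('T', 'T'): 0.25, ('G', 'G'): 0.25,
}
initial = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}  # p(x_1)

def markov_prob(x, initial, transitions):
    """p(x) = p(x_1) * prod_{i=2..L} p(x_i | x_{i-1})."""
    p = initial[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= transitions[(prev, cur)]
    return p

print(markov_prob("AGCTC", initial, transitions))
```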

Using a Markov Model for CG discrimination. Use curated sequences of CG-islands and of non-CG-islands to build two Markov models (model + and model -). Estimate the transition probabilities from counts: a_{st} = c_{st} / Σ_{t'} c_{st'}, where c_{st} is the number of times nucleotide t follows nucleotide s in the training sequences.

Markov Model for CG discrimination. For a given sequence x, calculate the log-odds score log( p(x | model +) / p(x | model -) ). (Figure: histograms of the log-odds scores for CpG islands and non-CpG islands.)
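A sketch of how the two models could be built and used, assuming small made-up training sets: train_markov estimates a_{st} = c_{st} / Σ_{t'} c_{st'} from dinucleotide counts (with pseudocounts to avoid zeros, an addition not mentioned on the slide), and log_odds computes log( p(x | model +) / p(x | model -) ).

```python
import math
from collections import defaultdict

NUCS = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate a_st = c_st / sum_t' c_st' from training sequences (with pseudocounts)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1.0
    model = {}
    for s in NUCS:
        total = sum(counts[s][t] + pseudocount for t in NUCS)
        model[s] = {t: (counts[s][t] + pseudocount) / total for t in NUCS}
    return model

def log_odds(x, model_plus, model_minus):
    """log( p(x | model+) / p(x | model-) ), summed over transitions."""
    score = 0.0
    for s, t in zip(x, x[1:]):
        score += math.log(model_plus[s][t] / model_minus[s][t])
    return score

# Toy training data (illustrative only, not real CpG-island sequences).
cpg_like = ["CGCGCGTACGCG", "GCGCGCGATCGC"]
background = ["ATATTTAAGCAT", "TTTAGCATATAA"]
plus, minus = train_markov(cpg_like), train_markov(background)
print(log_odds("GCGCGC", plus, minus))   # a positive score suggests a CpG island
```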

What if we don't know any CG islands in advance?

CG Islands and the Fair Bet Casino. The CG-islands problem can be modeled after a problem named "The Fair Bet Casino". The game is a sequence of coin flips, each with only two possible outcomes: Heads or Tails. The Fair coin gives Heads and Tails with the same probability 1/2; the Biased coin gives Heads with probability 3/4.

The Fair Bet Casino (cont'd). Thus, we define the probabilities: P(H | F) = P(T | F) = 1/2, P(H | B) = 3/4, P(T | B) = 1/4. The crooked dealer switches between the Fair and Biased coins with probability 10%.

The Fair Bet Casino Problem. Input: a sequence x = x_1 x_2 x_3 ... x_n of coin tosses made by two possible coins (F or B). Output: a sequence π = π_1 π_2 π_3 ... π_n, with each π_i being either F or B, indicating that x_i is the result of tossing the Fair or Biased coin respectively.

Fair Bet Casino Problem. Any observed outcome of coin tosses could have been generated by any sequence of states π! We need a way to grade different sequences differently. Decoding Problem: find the sequence π with maximum probability.

Hidden Markov Model (HMM). Can be viewed as an abstract machine with k hidden states that emits symbols from an alphabet Σ. Each state has its own emission probability distribution, and the machine switches between states according to transition probabilities. While in a certain state, the machine makes two decisions: What state should I move to next? What symbol from the alphabet Σ should I emit?

HMM for the Fair Bet Casino (cont'd). (Figure: HMM model for the Fair Bet Casino Problem.)

Why Hidden? Observers can see the emitted symbols of an HMM but cannot observe which state the HMM is currently in. Thus, the goal is to infer the most likely hidden states of an HMM from the given sequence of emitted symbols.

HMM Parameters. Σ: set of emission characters, e.g. Σ = {H, T} for coin tossing, Σ = {1, 2, 3, 4, 5, 6} for dice tossing. Q: set of hidden states, each emitting symbols from Σ, e.g. Q = {F, B} for coin tossing.

HMM Parameters (cont'd). A = (a_{kl}): a |Q| × |Q| matrix of the probabilities of changing from state k to state l, e.g. a_{FF} = 0.9, a_{FB} = 0.1, a_{BF} = 0.1, a_{BB} = 0.9. E = (e_k(b)): a |Q| × |Σ| matrix of the probabilities of emitting symbol b while in state k, e.g. e_F(0) = 1/2, e_F(1) = 1/2, e_B(0) = 1/4, e_B(1) = 3/4.

HMM for the Fair Bet Casino. The Fair Bet Casino in HMM terms: Σ = {0, 1} (0 for Tails and 1 for Heads); Q = {F, B}, F for the Fair and B for the Biased coin.

Transition probabilities A:
           Fair            Biased
  Fair     a_FF = 0.9      a_FB = 0.1
  Biased   a_BF = 0.1      a_BB = 0.9

Emission probabilities E:
           Tails (0)       Heads (1)
  Fair     e_F(0) = 1/2    e_F(1) = 1/2
  Biased   e_B(0) = 1/4    e_B(1) = 3/4
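To illustrate the generative view of the HMM described above, here is a small sketch that samples a state path and an emitted sequence from the Fair Bet Casino model. Starting in the Fair state with probability 1 is an arbitrary assumption, since the slides leave the start distribution implicit.

```python
import random

# Fair Bet Casino parameters from the slides.
A = {'F': {'F': 0.9, 'B': 0.1},          # transition probabilities a_kl
     'B': {'F': 0.1, 'B': 0.9}}
E = {'F': {0: 0.5, 1: 0.5},              # emission probabilities e_k(b); 0 = Tails, 1 = Heads
     'B': {0: 0.25, 1: 0.75}}

def sample(n, start='F', seed=0):
    """Generate n (state, symbol) pairs by walking the HMM."""
    rng = random.Random(seed)
    state, path, symbols = start, [], []
    for _ in range(n):
        # Decision 1: which symbol to emit from the current state.
        symbols.append(rng.choices([0, 1], weights=[E[state][0], E[state][1]])[0])
        path.append(state)
        # Decision 2: which state to move to next.
        state = rng.choices(['F', 'B'], weights=[A[state]['F'], A[state]['B']])[0]
    return path, symbols

print(sample(10))
```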

Hidden Paths. A path π = π_1 ... π_n in the HMM is a sequence of states. Consider the path π = FFFBBBBBFFF and the sequence x = 01011101001. The table gives the probability that x_i was emitted from state π_i, and the transition probability from state π_{i-1} to state π_i:

  x                   0     1     0     1     1     1     0     1     0     0     1
  π                   F     F     F     B     B     B     B     B     F     F     F
  P(x_i | π_i)        1/2   1/2   1/2   3/4   3/4   3/4   1/4   3/4   1/2   1/2   1/2
  P(π_i | π_{i-1})          9/10  9/10  1/10  9/10  9/10  9/10  9/10  1/10  9/10  9/10

P(x, π) Calculation. P(x, π) is the probability that sequence x was generated by the path π:
P(x, π) = Π_{i=1}^{n} P(x_i | π_i) · P(π_{i+1} | π_i) = Π_{i=1}^{n} e_{π_i}(x_i) · a_{π_i, π_{i+1}},
where π_{n+1} can be taken as a silent end state (or the last transition factor dropped).
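A sketch of this computation for the example path above. How the first factor is handled (a uniform initial distribution over states) is an assumption, since the slides leave the start probability implicit.

```python
A = {'F': {'F': 0.9, 'B': 0.1}, 'B': {'F': 0.1, 'B': 0.9}}
E = {'F': {0: 0.5, 1: 0.5}, 'B': {0: 0.25, 1: 0.75}}

def joint_prob(x, path, A, E, initial=None):
    """P(x, pi) = p(pi_1) * prod_i e_{pi_i}(x_i) * a_{pi_{i-1}, pi_i}.
    `initial` maps states to start probabilities; uniform if not given (an assumption)."""
    initial = initial or {k: 1.0 / len(A) for k in A}
    p = initial[path[0]] * E[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= A[path[i - 1]][path[i]] * E[path[i]][x[i]]
    return p

x = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1]
pi = list("FFFBBBBBFFF")
print(joint_prob(x, pi, A, E))
```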

HMM Algorithms. We will describe algorithms that allow us to compute: argmax_π P(π, x), the most probable path for a given string x (the Viterbi path); and P(π_i = k | x), the posterior probability that the i-th state is k given x, a more refined measure of which states x may be in.

Decoding Problem. Goal: find an optimal hidden path of states given the observations. Input: a sequence of observations x = x_1 ... x_n generated by an HMM M(Σ, Q, A, E). Output: a path π that maximizes P(x, π) over all possible paths.

Building a Manhattan for the Decoding Problem. Andrew Viterbi used the Manhattan grid model to solve the Decoding Problem. It exploits the fact that the solution for (π_1 ... π_n) is built from the solution for (π_1 ... π_{n-1}). A node s_{i,k} in the graph encodes the probability of the best path for the prefix x_1 ... x_i that ends in state π_i = k, i.e. max_{π_1 ... π_{i-1}} P(x_1, ..., x_i, π_1, ..., π_{i-1}, π_i = k). The Viterbi algorithm finds the path that maximizes P(x, π) among all possible paths.

Edit Graph for the Decoding Problem. (Figure: a grid of nodes s_{i,k} with one row per state 1, 2, ..., K and one column per observation x_1, x_2, x_3, ..., x_n.)

Filling the Edit Graph. (Figure: the grid is filled column by column, from x_1 to x_n.)
Initialization: s_{1,l} = e_l(x_1).
Recursion: s_{i+1,l} = e_l(x_{i+1}) · max_{j ∈ Q} { s_{i,j} · a_{jl} } (keep a pointer to the maximizing j).

Edit Graph for the Decoding Problem: termination. Let π* be the optimal path. Then 1. P(x, π*) = max_{l ∈ Q} { s_{n,l} }, and 2. π* is recovered by backtracking along the stored pointers.

Viterbi Algorithm. Computational complexity: O(n · |Q|^2). The value of the product can become extremely small, which leads to underflow. To avoid underflow, work with log values instead:
s_{i+1,l} = log e_l(x_{i+1}) + max_{k ∈ Q} { s_{i,k} + log(a_{kl}) }.
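A compact sketch of the Viterbi recursion in log space with backtracking pointers, applied to the Fair Bet Casino example; the uniform initial distribution is an assumption.

```python
import math

def viterbi(x, states, A, E, initial=None):
    """Most probable state path: s_{i,l} = log e_l(x_i) + max_k { s_{i-1,k} + log a_{kl} }."""
    initial = initial or {k: 1.0 / len(states) for k in states}
    n = len(x)
    score = [{} for _ in range(n)]
    ptr = [{} for _ in range(n)]
    for l in states:                               # initialization
        score[0][l] = math.log(initial[l]) + math.log(E[l][x[0]])
    for i in range(1, n):                          # recursion
        for l in states:
            best_k = max(states, key=lambda k: score[i - 1][k] + math.log(A[k][l]))
            ptr[i][l] = best_k
            score[i][l] = math.log(E[l][x[i]]) + score[i - 1][best_k] + math.log(A[best_k][l])
    last = max(states, key=lambda l: score[n - 1][l])   # termination
    path = [last]
    for i in range(n - 1, 0, -1):                  # backtrack on pointers
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), score[n - 1][last]

A = {'F': {'F': 0.9, 'B': 0.1}, 'B': {'F': 0.1, 'B': 0.9}}
E = {'F': {0: 0.5, 1: 0.5}, 'B': {0: 0.25, 1: 0.75}}
print(viterbi([0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1], ['F', 'B'], A, E))
```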

Posterior Problem Given: a sequence of coin tosses generated by an HMM. Goal: find the probability that the dealer was using a biased coin at a particular time (posterior problem).

Posterior Algorithm. We want to compute P(π_i = k | x), the probability distribution over the i-th position given x. We start by computing
P(π_i = k, x) = P(x_1 ... x_i, π_i = k, x_{i+1} ... x_N)
= P(x_1 ... x_i, π_i = k) · P(x_{i+1} ... x_N | x_1 ... x_i, π_i = k)
= P(x_1 ... x_i, π_i = k) · P(x_{i+1} ... x_N | π_i = k)
= f_k(i) · b_k(i)   (forward · backward).
Then, P(π_i = k | x) = P(π_i = k, x) / P(x).

Backward Algorithm (derivation). Define the backward probability b_k(i) as the probability of emitting the suffix x_{i+1} ... x_N given that the i-th state is π_i = k:
b_k(i) = P(x_{i+1} ... x_N | π_i = k)   (starting from the i-th state = k, generate the rest of x)
= Σ_{π_{i+1} ... π_N} P(x_{i+1}, x_{i+2}, ..., x_N, π_{i+1}, ..., π_N | π_i = k)
= Σ_l Σ_{π_{i+2} ... π_N} P(x_{i+1}, x_{i+2}, ..., x_N, π_{i+1} = l, π_{i+2}, ..., π_N | π_i = k)
= Σ_l e_l(x_{i+1}) a_{kl} Σ_{π_{i+2} ... π_N} P(x_{i+2}, ..., x_N, π_{i+2}, ..., π_N | π_{i+1} = l)
= Σ_l e_l(x_{i+1}) a_{kl} b_l(i+1).

Backward Algorithm. We can compute b_k(i) for all k and i using dynamic programming on the edit graph.
Initialization: b_k(n) = 1 for all k.
Iteration: b_k(i) = Σ_l e_l(x_{i+1}) a_{kl} b_l(i+1).
Termination: P(x) = Σ_l a_{0l} e_l(x_1) b_l(1), where a_{0l} is the initial probability of state l.

Forward Algorithm. Define the forward probability f_k(i) as the probability of being in state π_i = k and having emitted the prefix x_1 ... x_i.
Initialization: f_0(0) = 1 (begin state).
Iteration: f_k(i) = e_k(x_i) Σ_l f_l(i-1) a_{lk}.
Termination: P(x) = Σ_k f_k(N).

Posterior Decoding. We can now calculate P(π_i = k | x) and perform posterior decoding:
P(π_i = k | x) = f_k(i) · b_k(i) / P(x),   π̂_i = argmax_k P(π_i = k | x).
(Figure: the grid of states versus positions x_1 x_2 x_3 ... x_N, with P(π_i = l | x) attached to each node.)
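A sketch of the forward and backward recursions and of posterior decoding for the Fair Bet Casino model, following the formulas above. No rescaling is used, so for long sequences the values would underflow; in practice one rescales each column or works in log space.

```python
def forward(x, states, A, E, initial):
    """f_k(i) = e_k(x_i) * sum_l f_l(i-1) * a_lk."""
    n = len(x)
    f = [{} for _ in range(n)]
    for k in states:
        f[0][k] = initial[k] * E[k][x[0]]
    for i in range(1, n):
        for k in states:
            f[i][k] = E[k][x[i]] * sum(f[i - 1][l] * A[l][k] for l in states)
    return f

def backward(x, states, A, E):
    """b_k(i) = sum_l e_l(x_{i+1}) * a_kl * b_l(i+1), with b_k(n) = 1."""
    n = len(x)
    b = [{} for _ in range(n)]
    for k in states:
        b[n - 1][k] = 1.0
    for i in range(n - 2, -1, -1):
        for k in states:
            b[i][k] = sum(E[l][x[i + 1]] * A[k][l] * b[i + 1][l] for l in states)
    return b

def posterior(x, states, A, E, initial):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
    f, b = forward(x, states, A, E, initial), backward(x, states, A, E)
    px = sum(f[len(x) - 1][k] for k in states)     # termination of the forward algorithm
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

A = {'F': {'F': 0.9, 'B': 0.1}, 'B': {'F': 0.1, 'B': 0.9}}
E = {'F': {0: 0.5, 1: 0.5}, 'B': {0: 0.25, 1: 0.75}}
initial = {'F': 0.5, 'B': 0.5}
post = posterior([0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1], ['F', 'B'], A, E, initial)
print([max(p, key=p.get) for p in post])           # posterior decoding: argmax_k P(pi_i = k | x)
```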

Example: Fair dice. (Figure: P(π_i = Fair | x) plotted along a given sequence of dice rolls x.)

HMM Parameter Estimation. So far, we have assumed that the transition and emission probabilities are known. However, in most HMM applications the probabilities are not known, and estimating them is hard.

HMM Parameter Estimation Problem. Given: an HMM with its states and alphabet (emission characters), and independent training sequences x^1, ..., x^m. Find: HMM parameters Θ (that is, a_{kl}, e_k(b)) that maximize P(x^1, ..., x^m | Θ), the joint probability of the training sequences.

Maximize the likelihood. P(x^1, ..., x^m | Θ), as a function of Θ, is called the likelihood of the model. The training sequences are assumed independent, therefore P(x^1, ..., x^m | Θ) = Π_i P(x^i | Θ). The parameter estimation problem seeks the Θ that realizes max_Θ Π_i P(x^i | Θ). In practice the log-likelihood is computed to avoid underflow errors.

Two situations. Known paths for the training sequences: CpG islands are marked on the training sequences, or one evening the casino dealer allows us to see when he changes dice. Unknown paths: CpG islands are not marked, and we do not see when the casino dealer changes dice.

Known paths. Let A_{kl} = number of times the transition k → l is taken in the training sequences, and E_k(b) = number of times b is emitted from state k in the training sequences. Compute a_{kl} and e_k(b) as maximum-likelihood estimators:
a_{kl} = A_{kl} / Σ_{l'} A_{kl'},   e_k(b) = E_k(b) / Σ_{b'} E_k(b').

Pseudocounts. Some state k may not appear in any of the training sequences. Then A_{kl} = 0 for every state l, and a_{kl} cannot be computed with the equation above. To avoid this overfitting, use predetermined pseudocounts r_{kl} and r_k(b): A_{kl} = (number of transitions k → l) + r_{kl}, E_k(b) = (number of emissions of b from k) + r_k(b). The pseudocounts reflect our prior biases about the probability values.
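A sketch of the known-paths estimators with pseudocounts; the training sequences and paths below are toy data for illustration only.

```python
from collections import defaultdict

def estimate_parameters(sequences, paths, states, alphabet, r_trans=1.0, r_emit=1.0):
    """Maximum-likelihood estimates with pseudocounts:
    a_kl = (A_kl + r_kl) / sum_l' (A_kl' + r_kl'),  e_k(b) = (E_k(b) + r_k(b)) / sum_b' (...)."""
    A_counts = defaultdict(lambda: defaultdict(float))
    E_counts = defaultdict(lambda: defaultdict(float))
    for x, pi in zip(sequences, paths):
        for i, (sym, state) in enumerate(zip(x, pi)):
            E_counts[state][sym] += 1.0
            if i > 0:
                A_counts[pi[i - 1]][state] += 1.0
    A, E = {}, {}
    for k in states:
        tot = sum(A_counts[k][l] + r_trans for l in states)
        A[k] = {l: (A_counts[k][l] + r_trans) / tot for l in states}
        tot = sum(E_counts[k][b] + r_emit for b in alphabet)
        E[k] = {b: (E_counts[k][b] + r_emit) / tot for b in alphabet}
    return A, E

# Toy labeled data (known paths), illustrative only.
xs = [[0, 1, 1, 1, 0, 0], [1, 1, 0, 1, 1, 1]]
pis = [list("FFBBFF"), list("BBBBBB")]
A, E = estimate_parameters(xs, pis, states=['F', 'B'], alphabet=[0, 1])
print(A, E)
```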

Unknown paths: Viterbi training. Idea: use Viterbi decoding to compute the most probable path for each training sequence x. Start with some guess for the initial parameters and compute π*, the most probable path for x under the initial parameters. Iterate until there is no change in π*: 1. determine A_{kl} and E_k(b) as before; 2. compute new parameters a_{kl} and e_k(b) using the same formulas as before; 3. compute a new π* for x under the current parameters.
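A sketch of the Viterbi training loop. It assumes the viterbi() and estimate_parameters() functions from the earlier sketches are in scope, ignores ties among equally probable paths, and starts from an arbitrary parameter guess.

```python
def viterbi_training(sequences, states, alphabet, A, E, max_iter=100):
    """Iterate: decode with the current parameters, then re-estimate from the decoded paths.
    Assumes viterbi() and estimate_parameters() from the sketches above are in scope."""
    prev_paths = None
    for _ in range(max_iter):
        paths = [viterbi(x, states, A, E)[0] for x in sequences]
        if paths == prev_paths:        # stop when the decoded paths no longer change
            break
        A, E = estimate_parameters(sequences, paths, states, alphabet)
        prev_paths = paths
    return A, E

# Usage sketch: start from a rough guess and refine.
A0 = {'F': {'F': 0.8, 'B': 0.2}, 'B': {'F': 0.2, 'B': 0.8}}
E0 = {'F': {0: 0.5, 1: 0.5}, 'B': {0: 0.4, 1: 0.6}}
xs = [[0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]]
A_hat, E_hat = viterbi_training(xs, ['F', 'B'], [0, 1], A0, E0)
```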

Viterbi training analysis. The algorithm converges precisely: there are finitely many possible paths, and the new parameters are uniquely determined by the current π*. (There may be several paths for x with the same probability, so one must compare the new π* with all previous paths having the highest probability.) Viterbi training does not maximize the likelihood Π_x P(x | Θ) but rather the contribution to the likelihood of the most probable paths, Π_x P(x, π*(x) | Θ). In general it performs less well than Baum-Welch.

Overview. Introduction to Hidden Markov Models; basic evaluation algorithms: Viterbi path, posterior decoding; basic training methods: the supervised case, and the unsupervised case: Viterbi training & Baum-Welch (not seen today).

References. Original slides from the bioalgorithms website (see title slide). Some slides were adapted from the CS 262 course at Stanford, given by Serafim Batzoglou.