HIDDEN MARKOV MODELS


Outline
- CG-islands
- The Fair Bet Casino
- Hidden Markov Model
- Decoding Algorithm
- Forward-Backward Algorithm
- Profile HMMs
- HMM Parameter Estimation
- Viterbi training
- Baum-Welch algorithm

CG-Islands (= CpG islands) Given 4 nucleotides, the probability of occurrence of each is ~1/4, so the probability of occurrence of a given dinucleotide is ~1/16. However, dinucleotide frequencies in DNA sequences vary widely. In particular, CG is typically underrepresented (its frequency is typically < 1/16).

Why CG-Islands? CG is the least frequent dinucleotide because the C in CG is easily methylated and then tends to mutate into T. However, methylation is suppressed around genes in a genome, so CG appears at relatively high frequency within these CG-islands. Finding the CG-islands in a genome is therefore an important problem.

CG-Islands and the Fair Bet Casino The CG-islands problem can be modeled after a problem named The Fair Bet Casino. The game is to flip coins, where each flip has only two possible outcomes: Heads or Tails. The Fair coin gives Heads and Tails with the same probability ½. The Biased coin gives Heads with probability ¾.

The Fair Bet Casino (cont'd) Thus, we define the probabilities: P(H|F) = P(T|F) = ½, P(H|B) = ¾, P(T|B) = ¼. The crooked dealer switches between the Fair and Biased coins with probability 10% at each toss.
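To make the setup concrete, here is a minimal simulation sketch in Python of the crooked dealer. The function name simulate_dealer, the H/T string encoding, and the assumption that the dealer starts with the Fair coin are illustrative choices, not part of the slides.

```python
import random

def simulate_dealer(n, p_switch=0.1, seed=0):
    """Simulate n coin tosses by the crooked dealer.

    Returns (tosses, states): tosses is a string of 'H'/'T',
    states is a string of 'F'/'B' (hidden from the observer).
    Probabilities follow the slides: P(H|F) = 1/2, P(H|B) = 3/4,
    and the dealer switches coins with probability p_switch.
    """
    rng = random.Random(seed)
    state = 'F'                      # assumption: the dealer starts with the Fair coin
    tosses, states = [], []
    for _ in range(n):
        p_heads = 0.5 if state == 'F' else 0.75
        tosses.append('H' if rng.random() < p_heads else 'T')
        states.append(state)
        if rng.random() < p_switch:  # switch coins with probability 10%
            state = 'B' if state == 'F' else 'F'
    return ''.join(tosses), ''.join(states)

x, pi = simulate_dealer(20)
print(x)   # observed tosses
print(pi)  # hidden coin sequence
```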

The Fair Bet Casino Problem Input: A sequence x = x_1 x_2 x_3 ... x_n of coin tosses made by two possible coins (F or B). Output: A sequence π = π_1 π_2 π_3 ... π_n, with each π_i being either F or B, indicating that x_i is the result of tossing the Fair or Biased coin, respectively.

Problem with the Fair Bet Casino Problem: any observed outcome of coin tosses could have been generated by any sequence of states! We need a way to grade different state sequences differently. This leads to the Decoding Problem.

P(x|fair coin) vs. P(x|biased coin) Suppose first that the dealer never changes coins. Some definitions: P(x|fair coin): prob. of the dealer using the F coin and generating the outcome x. P(x|biased coin): prob. of the dealer using the B coin and generating the outcome x.

P(x|fair coin) vs. P(x|biased coin)
P(x|fair coin) = P(x_1 ... x_n|fair coin) = Π_{i=1..n} p(x_i|fair coin) = (1/2)^n
P(x|biased coin) = P(x_1 ... x_n|biased coin) = Π_{i=1..n} p(x_i|biased coin) = (3/4)^k (1/4)^(n-k) = 3^k / 4^n
where k is the number of Heads in x.
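A small Python sketch of these two formulas, assuming tosses are encoded as a string of 'H'/'T' characters:

```python
def prob_fair(x):
    """P(x | fair coin) = (1/2)^n for a toss sequence x."""
    return 0.5 ** len(x)

def prob_biased(x):
    """P(x | biased coin) = (3/4)^k * (1/4)^(n-k), where k = number of Heads."""
    n, k = len(x), x.count('H')
    return (0.75 ** k) * (0.25 ** (n - k))

x = "HTHHHTHHTT"
print(prob_fair(x), prob_biased(x))
```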

P(x|fair coin) vs. P(x|biased coin)
P(x|fair coin) = P(x|biased coin) when
(1/2)^n = 3^k / 4^n
2^n = 3^k
n = k log2(3),
i.e. when k = n / log2(3) ≈ 0.63n.

Log-odds Ratio We define the log-odds ratio as follows:
log2( P(x|fair coin) / P(x|biased coin) ) = Σ_{i=1..n} log2( p+(x_i) / p-(x_i) ) = n - k log2(3),
where p+(x_i) and p-(x_i) are the probabilities of x_i under the fair and biased coins, respectively.

Computing Log-odds Ratio in Sliding Windows Consider a sliding window over the outcome sequence x_1 x_2 ... x_n and compute the log-odds ratio for each short window: values above 0 indicate that the Fair coin was most likely used, values below 0 that the Biased coin was most likely used.
Disadvantages:
- the length of a CG-island is not known in advance
- different windows may classify the same position differently
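A possible sliding-window sketch in Python, using the closed form w - k·log2(3) for a window of width w containing k Heads (which follows from the formulas above); the function name and the example window width are illustrative:

```python
from math import log2

def sliding_log_odds(x, w):
    """Log-odds log2(P(window|fair) / P(window|biased)) for each window of width w.

    For a window with k Heads out of w tosses this equals w - k*log2(3):
    positive values favour the Fair coin, negative values the Biased coin.
    """
    scores = []
    for i in range(len(x) - w + 1):
        window = x[i:i + w]
        k = window.count('H')
        scores.append(w - k * log2(3))
    return scores

print(sliding_log_odds("HTHHHHHHTTHT", 6))
```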

Hidden Markov Model (HMM) Can be viewed as an abstract machine with k hidden states that emits symbols from an alphabet Σ. Each state has its own emission probability distribution, and the machine switches between states according to transition probabilities. While in a certain state, the machine makes two decisions: What state should I move to next? What symbol from the alphabet Σ should I emit?

Why Hidden? Observers can see the emitted symbols of an HMM but cannot observe which state the HMM is currently in. Thus, the goal is to infer the most likely sequence of hidden states of an HMM from the given sequence of emitted symbols.

HMM Parameters Σ: set of emission characters. Ex.: Σ = {H, T} for coin tossing, Σ = {1, 2, 3, 4, 5, 6} for dice tossing. Q: set of hidden states, each emitting symbols from Σ. Ex.: Q = {F, B} for coin tossing.

HMM Parameters (cont'd)
A = (a_kl): a |Q| x |Q| matrix of the probabilities of changing from state k to state l.
a_FF = 0.9, a_FB = 0.1, a_BF = 0.1, a_BB = 0.9
E = (e_k(b)): a |Q| x |Σ| matrix of the probabilities of emitting symbol b while in state k.
e_F(0) = ½, e_F(1) = ½, e_B(0) = ¼, e_B(1) = ¾

HMM for Fair Bet Casino The Fair Bet Casino in HMM terms:
Σ = {0, 1} (0 for Tails, 1 for Heads)
Q = {F, B} (F for the Fair coin, B for the Biased coin)

Transition probabilities A:
         Fair          Biased
Fair     a_FF = 0.9    a_FB = 0.1
Biased   a_BF = 0.1    a_BB = 0.9

Emission probabilities E:
         Tails(0)      Heads(1)
Fair     e_F(0) = ½    e_F(1) = ½
Biased   e_B(0) = ¼    e_B(1) = ¾
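One way to encode these parameters is as plain Python dictionaries; the variable names states, alphabet, A, and E below are illustrative, not prescribed by the slides:

```python
# States, alphabet, transition matrix A and emission matrix E for the
# Fair Bet Casino HMM, written as plain dictionaries.
states = ['F', 'B']            # F = Fair coin, B = Biased coin
alphabet = ['0', '1']          # 0 = Tails, 1 = Heads

A = {                          # a_kl: probability of moving from state k to state l
    'F': {'F': 0.9, 'B': 0.1},
    'B': {'F': 0.1, 'B': 0.9},
}
E = {                          # e_k(b): probability of emitting symbol b in state k
    'F': {'0': 0.5,  '1': 0.5},
    'B': {'0': 0.25, '1': 0.75},
}
```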

HMM for Fair Bet Casino (cont'd)
(Figure: the HMM model for the Fair Bet Casino Problem.)

Hidden Paths A path π = π_1 ... π_n in the HMM is a sequence of states. Consider the path π = FFFBBBBBFFF and the sequence x = 01011101001.

x:                 0     1     0     1     1     1     0     1     0     0     1
π:                 F     F     F     B     B     B     B     B     F     F     F
P(x_i|π_i):        ½     ½     ½     ¾     ¾     ¾     ¼     ¾     ½     ½     ½
P(π_{i-1} → π_i):  ½     9/10  9/10  1/10  9/10  9/10  9/10  9/10  1/10  9/10  9/10

Here P(x_i|π_i) is the probability that x_i was emitted from state π_i, and P(π_{i-1} → π_i) is the transition probability from state π_{i-1} to state π_i.

P(x|π) Calculation P(x|π): probability that sequence x was generated by the path π:

P(x|π) = P(π_0 → π_1) Π_{i=1..n} P(x_i|π_i) P(π_i → π_{i+1})
       = a_{π_0,π_1} Π_{i=1..n} e_{π_i}(x_i) a_{π_i,π_{i+1}}
       = Π_{i=0..n-1} e_{π_{i+1}}(x_{i+1}) a_{π_i,π_{i+1}}   (if we count from i = 0 instead of i = 1)
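A minimal Python sketch of this computation for the example path above, reusing the dictionary encoding from the earlier sketch; the begin-state transition is assumed uniform (½ per coin), matching the ½ in the table but otherwise left open by the slides:

```python
A = {'F': {'F': 0.9, 'B': 0.1}, 'B': {'F': 0.1, 'B': 0.9}}
E = {'F': {'0': 0.5, '1': 0.5}, 'B': {'0': 0.25, '1': 0.75}}
START = {'F': 0.5, 'B': 0.5}   # begin-state transition; assumed uniform

def path_probability(x, pi):
    """P(x|pi) = a_{pi_0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_{i-1},pi_i},
    following the product-from-i=0 form (no explicit end state)."""
    p = START[pi[0]] * E[pi[0]][x[0]]
    for i in range(1, len(x)):
        p *= A[pi[i - 1]][pi[i]] * E[pi[i]][x[i]]
    return p

x  = "01011101001"
pi = "FFFBBBBBFFF"
print(path_probability(x, pi))
```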

Decoding Problem Goal: Find an optimal hidden path of states given observations. Input: Sequence of observations x = x_1 ... x_n generated by an HMM M(Σ, Q, A, E). Output: A path π that maximizes P(x|π) over all possible paths.

Building Manhattan for the Decoding Problem Andrew Viterbi used the Manhattan grid model to solve the Decoding Problem. Every choice of π = π_1 ... π_n corresponds to a path in the graph. The only valid direction in the graph is eastward. This graph has |Q|^2 (n-1) edges.

Edit Graph for Decoding Problem

Decoding Problem vs. Alignment Problem
(Figure: valid directions in the alignment problem vs. valid directions in the decoding problem.)

Decoding Problem as Finding a Longest Path in a DAG The Decoding Problem is reduced to finding a longest path in the directed acyclic graph (DAG) above. Note: the length of a path is defined as the product of its edge weights, not their sum.

Decoding Problem (cont'd) Every path in the graph corresponds to a state path π and has probability P(x|π). The Viterbi algorithm finds the path that maximizes P(x|π) among all possible paths. The Viterbi algorithm runs in O(n |Q|^2) time.

Decoding Problem: Weights of Edges What weight w should the edge from node (k, i) to node (l, i+1) carry? Since

P(x|π) = Π_{i=0..n-1} e_{π_{i+1}}(x_{i+1}) a_{π_i,π_{i+1}},

the i-th term is e_{π_{i+1}}(x_{i+1}) a_{π_i,π_{i+1}} = e_l(x_{i+1}) a_kl for π_i = k and π_{i+1} = l. Therefore the weight of the edge (k, i) → (l, i+1) is w = e_l(x_{i+1}) a_kl.

Decoding Problem and Dynamic Programming
s_{l,i+1} = max_{k ∈ Q} { s_{k,i} · (weight of the edge between (k, i) and (l, i+1)) }
          = max_{k ∈ Q} { s_{k,i} · a_kl · e_l(x_{i+1}) }
          = e_l(x_{i+1}) · max_{k ∈ Q} { s_{k,i} · a_kl }

Decoding Problem (cont'd) Initialization: s_{begin,0} = 1 and s_{k,0} = 0 for k ≠ begin. Let π* be the optimal path. Then
P(x|π*) = max_{k ∈ Q} { s_{k,n} · a_{k,end} }
Is there a problem here?

Viterbi Algorithm The value of the product can become extremely small, which leads to underflow. To avoid underflow, use log values instead:
s_{l,i+1} = log e_l(x_{i+1}) + max_{k ∈ Q} { s_{k,i} + log(a_kl) }
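A compact log-space Viterbi sketch in Python for the Fair Bet Casino HMM. The uniform begin-state probabilities and the variable names are assumptions of this example, and no explicit end state is modeled, so it may differ slightly from a formulation that includes a_{k,end}:

```python
from math import log, inf

A = {'F': {'F': 0.9, 'B': 0.1}, 'B': {'F': 0.1, 'B': 0.9}}
E = {'F': {'0': 0.5, '1': 0.5}, 'B': {'0': 0.25, '1': 0.75}}
START = {'F': 0.5, 'B': 0.5}     # begin-state probabilities; assumed uniform

def viterbi(x, states=('F', 'B')):
    """Most probable state path for observation string x, computed in log space.

    s[l][i] holds the log of the best score of any path ending in state l after
    emitting x[0..i]; back-pointers recover the path. Runs in O(n |Q|^2) time.
    """
    n = len(x)
    s = {l: [-inf] * n for l in states}
    back = {l: [None] * n for l in states}
    for l in states:                                  # initialization
        s[l][0] = log(START[l]) + log(E[l][x[0]])
    for i in range(1, n):                             # recurrence
        for l in states:
            best_k = max(states, key=lambda k: s[k][i - 1] + log(A[k][l]))
            s[l][i] = log(E[l][x[i]]) + s[best_k][i - 1] + log(A[best_k][l])
            back[l][i] = best_k
    last = max(states, key=lambda k: s[k][n - 1])     # traceback from the best final state
    path = [last]
    for i in range(n - 1, 0, -1):
        last = back[last][i]
        path.append(last)
    return ''.join(reversed(path))

print(viterbi("01011101001"))   # decode the example sequence from the slides
```

Working with log probabilities turns the product of edge weights into a sum, which avoids underflow for long observation sequences.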