Plan for today. Part 1: (Hidden) Markov models. Part 2: String matching and read mapping

Plan for today
- Part 1: (Hidden) Markov models
- Part 2: String matching and read mapping
  - 2.1 Exact algorithms
  - 2.2 Heuristic methods for approximate search

(Hidden) Markov models

Why consider probabilistic models of sequences?
- Classification (machine learning, data mining): which category (family, ...) does the sequence belong to?
- Estimating the likelihood of an observation
- Simulating temporal processes
- Average-case analysis of algorithms

Knowledge assumptions
- Basics of probability theory
- Conditional probability: P(A|B) = P(A & B) / P(B)
- Bayes formula: P(A|B) = P(B|A) P(A) / P(B)

Probabilistic model of sequences
- Simplest model: all letters are i.i.d. random variables,
  e.g. P(A) = P(C) = P(G) = P(T) = 0.25, or P(A) = P(T) = 0.2 and P(C) = P(G) = 0.3
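Under such an i.i.d. model the probability of a sequence is just the product of the letter probabilities. A minimal sketch (function and variable names are illustrative, using the second set of letter frequencies from the slide):

```python
# Probability of a sequence under the i.i.d. background model,
# with P(A)=P(T)=0.2 and P(C)=P(G)=0.3 as on the slide.
letter_prob = {"A": 0.2, "T": 0.2, "C": 0.3, "G": 0.3}

def iid_probability(seq, probs=letter_prob):
    p = 1.0
    for letter in seq:
        p *= probs[letter]          # independent letters: multiply
    return p

print(iid_probability("ACGT"))      # 0.2 * 0.3 * 0.3 * 0.2 = 0.0036
```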

Markov chains (models)
- Markov chain of order k: P(x_i) depends on x_{i-1}, x_{i-2}, ..., x_{i-k}
- A. A. Markov (А.А. Марков, 1856-1922)

Markov chains: example
- Example: assume three states {Rainy, Cloudy, Sunny} and a Markov chain of order 1 (first order), with the transition probabilities from the state diagram (each row sums to 1):

  From \ To   Sunny   Cloudy   Rainy
  Sunny        0.8     0.1      0.1
  Cloudy       0.2     0.6      0.2
  Rainy        0.3     0.3      0.4

Markov chains (cont)
If the weather today is Sunny, then the probability of the following S-S-R-R-S-C-S is

  P[SSSRRSCS | Model] = P[S] * P[S|S]^2 * P[R|S] * P[R|R] * P[S|R] * P[C|S] * P[S|C]
                      = 1 * (0.8)^2 * (0.1) * (0.4) * (0.3) * (0.1) * (0.2)
                      ≈ 1.536 * 10^-4

Example: given that today is Cloudy, what is the probability that it will be Rainy the day after tomorrow? (See the sketch below.)
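One way to answer the question is to condition on tomorrow's state and sum it out. The sketch below does exactly that, using the transition table from the example above (the function name two_step and the dictionary layout are illustrative):

```python
# Two-step prediction: condition on tomorrow's state and sum it out.
states = ["Sunny", "Cloudy", "Rainy"]
transition = {
    "Sunny":  {"Sunny": 0.8, "Cloudy": 0.1, "Rainy": 0.1},
    "Cloudy": {"Sunny": 0.2, "Cloudy": 0.6, "Rainy": 0.2},
    "Rainy":  {"Sunny": 0.3, "Cloudy": 0.3, "Rainy": 0.4},
}

def two_step(today, target):
    """P(state two days from now = target | state today = today)."""
    return sum(transition[today][tomorrow] * transition[tomorrow][target]
               for tomorrow in states)

print(two_step("Cloudy", "Rainy"))   # 0.6*0.2 + 0.2*0.4 + 0.2*0.1 = 0.22
```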

Markov chains (cont)
- Given that the model is in a known state, what is the probability that it stays in that state for exactly d days?
- The answer is a_ii^(d-1) (1 - a_ii), where a_ii is the self-transition probability: a geometric distribution
- Thus the expected number of consecutive days in the same state is 1 / (1 - a_ii)
- So the expected number of consecutive Sunny days, according to the model, is 1 / (1 - 0.8) = 5

Hidden Markov models
- at each moment the model is in one of a finite number of hidden states
- each hidden state holds a probability distribution for emitting letters (emission probabilities)
- switching between hidden states is governed by transition probabilities
- Example: you don't know the weather (S, C, R), but you observe whether the person you see carries an umbrella, and
  P(umbrella|S) = 0.05, P(umbrella|C) = 0.2, P(umbrella|R) = 0.9

CpG-Islands

Why CpG-Islands?
[Figure credit: CFCF, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30029083]

CpG Islands and the Fair Bet Casino
- The CpG islands problem can be modeled after a problem named "The Fair Bet Casino"
- The game is to flip coins, with only two possible outcomes: Head or Tail
- The Fair coin gives Heads and Tails with the same probability ½: P(H|F) = P(T|F) = ½
- The Biased coin gives Heads with probability ¾: P(H|B) = ¾, P(T|B) = ¼
- The dealer switches between the Fair and Biased coins with probability 0.1

The Fair Bet Casino Problem
- Input: a sequence x = x_1 x_2 ... x_n of coin tosses made by two possible coins (F or B)
- Output: a sequence π = π_1 π_2 ... π_n, with each π_i being either F or B, indicating that x_i is the result of tossing the Fair or the Biased coin, respectively

Decoding problem
- Any observed outcome of coin tosses could have been generated by any sequence of states
- Goal: compute the most likely sequence of states π producing x, i.e. π maximizing P(π|x)
- This problem is called the decoding problem

Warm-up: what if the coin stays the same?
- Assume that the dealer never changes the coin
- P(x|F): probability of the outcome x given that the dealer uses the F coin all along
- P(x|B): the same if the dealer uses the B coin
- P(x|F) = P(x_1 ... x_n | F) = Π_{i=1..n} P(x_i|F) = (1/2)^n
- P(x|B) = P(x_1 ... x_n | B) = (3/4)^k (1/4)^(n-k) = 3^k / 4^n, where k is the number of Heads in x

What if the coin stays the same? (cont)
- P(x|F) = P(x|B)  <=>  (1/2)^n = 3^k / 4^n  <=>  k = n / log_2 3  (k ≈ 0.63 n)
- We can compute the log-odds ratio to measure the discrimination of F vs. B:
  log_2( P(x|F) / P(x|B) ) = n - k log_2 3
  (positive means F is more likely, negative means B is more likely; see the sketch below)
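For instance, here is a small sketch of this warm-up computation (function and variable names are illustrative), applied to the toss sequence used later on the "Hidden Paths" slide:

```python
# Likelihood of a toss sequence under the all-Fair and all-Biased models,
# and the log-odds ratio n - k*log2(3) derived above.
import math

def log_odds_fair_vs_biased(tosses):
    """log2( P(x|F) / P(x|B) ) for a string of 'H'/'T' tosses."""
    n = len(tosses)
    k = tosses.count("H")                      # number of Heads
    p_fair = 0.5 ** n                          # (1/2)^n
    p_biased = 0.75 ** k * 0.25 ** (n - k)     # (3/4)^k (1/4)^(n-k)
    log_odds = math.log2(p_fair / p_biased)
    assert math.isclose(log_odds, n - k * math.log2(3))
    return log_odds

print(log_odds_fair_vs_biased("THTHHHTHTTH"))  # > 0, so F fits this x better
```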

Hidden Markov Model (HMM)
- Can be viewed as an abstract machine with k hidden states that emits symbols (observations) from an alphabet Σ
- Each state has its own probability distribution of moving to another state (transition probabilities); altogether, they define a Markov chain on the states
- Each state has a probability distribution of emitting symbols of Σ (emission probabilities)
- While in a certain state, the machine randomly decides:
  - what the next state is
  - what symbol is emitted

HMM Parameters
- Σ: alphabet of emitted symbols
- Q: set of hidden states, each emitting symbols from Σ
- A = (a_kl): transition probabilities, a_kl = P(π_i = l | π_{i-1} = k)
- E = (e_k(b)): emission probabilities, e_k(b) = P(x_i = b | π_i = k)

Summary: HMM for the Fair Bet Casino
- States: Q = {Fair, Biased}; alphabet: Σ = {Tails (0), Heads (1)}
- Transitions: a_FF = a_BB = 0.9, a_FB = a_BF = 0.1
- Emissions: e_F(0) = e_F(1) = 1/2; e_B(0) = 1/4, e_B(1) = 3/4

HMM for the Fair Bet Casino (cont)
[Figure: the two states F and B and the emitted symbols H and T]

Hidden Paths
- A path π = π_1 ... π_n in the HMM is defined as a sequence of states
- Consider the path π = FFFBBBBBFFF and the sequence x = THTHHHTHTTH

P(x|π) Calculation
- P(x|π): probability that sequence x was generated by the path π:

  P(x|π) = Π_{i=1..n} P(x_i | π_i) * P(π_{i-1} → π_i)

  assuming that P(π_0 → π_1) is the probability P(π_1) of π_1 being the starting state
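As a concrete check of this formula, here is a minimal sketch for the Fair Bet Casino model summarized above, applied to the path and sequence from the previous slide. The dictionary layout and names are illustrative, and the 0.5/0.5 starting distribution is an assumption (the slides do not state the initial probabilities).

```python
# P(x|pi) for the Fair Bet Casino: product of emissions and transitions
# along the path, as in the formula above.
transition = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emission = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}
initial = {"F": 0.5, "B": 0.5}                 # assumed starting distribution

def path_probability(x, pi):
    """P(x|pi) = prod_i P(x_i|pi_i) * P(pi_{i-1} -> pi_i)."""
    p = initial[pi[0]] * emission[pi[0]][x[0]]
    for i in range(1, len(x)):
        p *= transition[pi[i - 1]][pi[i]] * emission[pi[i]][x[i]]
    return p

print(path_probability("THTHHHTHTTH", "FFFBBBBBFFF"))
```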

Decoding Problem
- Goal: find an "optimal" (most likely) hidden path of states given the observations
- Input: sequence of observations x = x_1 ... x_n generated by an HMM M(Σ, Q, A, E)
- Output: a path that maximizes P(x|π) over all possible paths π = π_1 ... π_n

Viterbi algorithm (1967)
- Consider the prefix x_1 ... x_i
- For each hidden state l, let s_{l,i} be the maximum probability (over the i-1 previous states) of observing x_1 ... x_i and arriving in state π_i = l
- Why compute s_{l,i}? Take the sequence of states π* that realizes max{ s_{l,n} : l ∈ Q }:
  - observe that max_π P(π|x) = max_π P(π and x) / P(x)
  - hence π* is the most likely decoding
- How to compute s_{l,i}? By dynamic programming:
  s_{l,i} = max{ s_{k,i-1} * a_kl * e_l(x_i) : k ∈ Q }

DP implementation
- Consider the trellis graph with a node (l, i) for each state l ∈ Q and each position i = 0, 1, ..., n, and an edge from (k, i-1) to (l, i) for every pair of states k, l

DP implementation
- Every choice of π = π_1 ... π_n corresponds to a path in the graph
- This graph has |Q|^2 n edges
- Initialization: s_{l,0} = probability for the model to start from state l
- DP recurrence: s_{l,i} = max{ s_{k,i-1} * a_kl * e_l(x_i) : k ∈ Q }
- The resulting path π is retrieved by "backtracing", starting from the node argmax{ s_{l,n} : l ∈ Q }
- Time complexity: O(|Q|^2 n) (see the sketch below)
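The recurrence translates directly into a short dynamic program. The sketch below is one possible implementation, not the lecture's code: it takes the model as plain dictionaries (initial, transition, emission are illustrative parameter names) and works with log-probabilities, anticipating the "Computer arithmetic problems" slide below.

```python
# Log-space Viterbi decoding: products of many small probabilities
# underflow, so we maximise sums of log-probabilities instead.
import math

def viterbi(x, states, initial, transition, emission):
    """Most likely state path for observations x; s[l][i] is log s_{l,i}."""
    n = len(x)
    s = {l: [float("-inf")] * n for l in states}
    back = {l: [None] * n for l in states}
    for l in states:                                   # initialisation
        s[l][0] = math.log(initial[l]) + math.log(emission[l][x[0]])
    for i in range(1, n):                              # DP recurrence
        for l in states:
            best_k = max(states,
                         key=lambda k: s[k][i - 1] + math.log(transition[k][l]))
            s[l][i] = (s[best_k][i - 1] + math.log(transition[best_k][l])
                       + math.log(emission[l][x[i]]))
            back[l][i] = best_k
    state = max(states, key=lambda l: s[l][n - 1])     # argmax{ s_{l,n} }
    path = [state]
    for i in range(n - 1, 0, -1):                      # backtracing
        state = back[state][i]
        path.append(state)
    return "".join(reversed(path))
```

The two nested loops over positions and states visit each of the |Q|^2 n trellis edges once, matching the O(|Q|^2 n) bound above.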

Decoding Problem vs. Alignment Problem

Decoding Problem as Finding a Heaviest Path in a DAG
- The Decoding Problem can be reduced to finding a heaviest path in a directed acyclic graph (DAG)
- Note: the weight of a path is defined as the product of its edge weights, not their sum

Computer arithmetic problems
- The product of many small probabilities quickly underflows floating-point arithmetic
- Standard remedy: work with logarithms, so products of probabilities become sums of log-probabilities (as in the Viterbi sketch above)

Example
- Two hidden states: raining, not-raining
- The probability to stay in the same state is 0.7, to change state 0.3
- Probabilities modelling the person's behaviour: [emission table P(umbrella | state) given on the slide]
- The initial probability of raining is 0.5
- Question: what is the most likely sequence of hidden states for the observations (umbrella, umbrella, no umbrella)? (See the sketch below.)
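The viterbi() sketch from above can be run on this example. The slide's emission table did not survive transcription, so the values P(umbrella | raining) = 0.9 and P(umbrella | not raining) = 0.2 used below are an assumption for illustration only.

```python
# Umbrella example with assumed emission probabilities (0.9 / 0.2);
# reuses viterbi() defined in the earlier sketch.
states = ["R", "N"]                                    # R = raining, N = not raining
initial = {"R": 0.5, "N": 0.5}
transition = {"R": {"R": 0.7, "N": 0.3},
              "N": {"R": 0.3, "N": 0.7}}
emission = {"R": {"U": 0.9, "-": 0.1},                 # U = umbrella, "-" = no umbrella
            "N": {"U": 0.2, "-": 0.8}}

print(viterbi("UU-", states, initial, transition, emission))
```

With these assumed emission values the decoded path is raining on the first two days and not raining on the third; a different emission table could change the answer.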

Many applications
- speech recognition
- handwriting recognition
- computational finance
- ...
- bioinformatics:
  - gene prediction
  - protein classification
  - protein secondary structure and protein folding
  - DNA motif discovery (binding sites)
  - ...

HMM in speech recognition (figure from Gales & Young, Foundations and Trends in Signal Processing, 2007)

Main problems for HMMs
- Evaluation: compute the probability P(x) of an observed sequence
- Decoding: find the most likely sequence of hidden states (Viterbi algorithm)
- Learning: estimate the HMM parameters from data

Computing the probability of x (exercise)

Forward-Backward Problem
Given: a sequence of coin tosses x = x_1 ... x_n generated by an HMM.
Goal: compute the probability that the dealer was using the biased coin at a particular time i.
In general:
Given: a sequence x = x_1 ... x_n.
Goal: find the probability P(π_i = k | x).

Plan of the computation

  P(π_i = k | x) = P(x, π_i = k) / P(x) = f_k(i) * b_k(i) / P(x)

  Σ_k P(π_i = k | x) = 1

Forward algorithm
- forward probability: f_k(i) = P(x_1 ... x_i, π_i = k)
- dynamic programming again
- the recurrence for the forward algorithm: f_k(i) = e_k(x_i) * Σ_{l ∈ Q} f_l(i-1) * a_lk
- base case: f_k(1) = p_0(k) * e_k(x_1), where p_0 is the initial state distribution
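A minimal sketch of this recurrence, in the same dictionary-based style as the Viterbi sketch above. It keeps plain probabilities for readability, so scaling or logarithms would be needed for long sequences.

```python
# Forward algorithm: f[k][i] = P(x_1 .. x_i, pi_i = k) (0-based i in code).
def forward(x, states, initial, transition, emission):
    n = len(x)
    f = {k: [0.0] * n for k in states}
    for k in states:                                   # base case: f_k(1) = p_0(k) e_k(x_1)
        f[k][0] = initial[k] * emission[k][x[0]]
    for i in range(1, n):                              # f_k(i) = e_k(x_i) * sum_l f_l(i-1) a_lk
        for k in states:
            f[k][i] = emission[k][x[i]] * sum(
                f[l][i - 1] * transition[l][k] for l in states)
    return f
```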

Backward algorithm
However, the forward probability is not the only factor affecting P(π_i = k | x). The sequence of transitions and emissions that the HMM undergoes between positions i+1 and n also affects P(π_i = k | x).
[Diagram: the sequence split at x_i, with the forward part x_1 ... x_i on the left and the backward part x_{i+1} ... x_n on the right]

Backward algorithm (cont)
- backward probability: b_k(i) = P(x_{i+1} ... x_n | π_i = k)
- dynamic programming (of course)
- the recurrence for the backward algorithm: b_k(i) = Σ_{l ∈ Q} a_kl * e_l(x_{i+1}) * b_l(i+1)
- base case: b_k(n-1) = Σ_{l ∈ Q} a_kl * e_l(x_n)   (equivalently, b_k(n) = 1)

Forward-Backward algorithm
- The probability that the dealer used the biased coin at moment i:
  P(π_i = k | x) = P(x, π_i = k) / P(x) = f_k(i) * b_k(i) / P(x)
- P(x) can be recovered from the fact that Σ_k P(π_i = k | x) = 1
- Remark: the FB algorithm cannot replace the Viterbi algorithm (the individually most likely states do not, in general, form the most likely path)
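Continuing the same sketch, the backward recurrence and the posterior P(π_i = k | x) = f_k(i) b_k(i) / P(x) can be written as follows, reusing forward() from above. Here P(x) is taken as Σ_k f_k(n), which is equivalent to recovering it from the normalization.

```python
# Backward algorithm and posterior state probabilities.
def backward(x, states, transition, emission):
    """b[k][i] = P(x_{i+1} .. x_n | pi_i = k) (0-based i in code)."""
    n = len(x)
    b = {k: [1.0] * n for k in states}                 # b_k(n) = 1
    for i in range(n - 2, -1, -1):                     # b_k(i) = sum_l a_kl e_l(x_{i+1}) b_l(i+1)
        for k in states:
            b[k][i] = sum(transition[k][l] * emission[l][x[i + 1]] * b[l][i + 1]
                          for l in states)
    return b

def posterior(x, states, initial, transition, emission):
    """P(pi_i = k | x) = f_k(i) b_k(i) / P(x) for every position i and state k."""
    f = forward(x, states, initial, transition, emission)
    b = backward(x, states, transition, emission)
    p_x = sum(f[k][len(x) - 1] for k in states)        # P(x) = sum_k f_k(n)
    return {k: [f[k][i] * b[k][i] / p_x for i in range(len(x))] for k in states}
```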

Example (cont)
- In the raining / not-raining example, what is the probability that it was not raining on day 2, if the observations are (umbrella, umbrella, no umbrella)? (See the sketch below.)
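This question can be answered with the posterior() sketch above, reusing the assumed umbrella model from the Viterbi example (the 0.9/0.2 emission values are an assumption, not from the slides).

```python
# Posterior probability of "not raining" on day 2 given (U, U, no umbrella),
# reusing posterior() and the assumed umbrella model defined earlier.
post = posterior("UU-", states, initial, transition, emission)
print(post["N"][1])   # P(not raining on day 2 | umbrella, umbrella, no umbrella)
```

With these assumed numbers the posterior comes out to roughly 0.2.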