Hidden Markov Models 1


Hidden Markov Models

Dinucleotide Frequency
Consider all 2-mers in a sequence: {AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT}. Given 4 nucleotides, each with a probability of occurrence of 1/4, one would expect the probability of occurrence of any given dinucleotide to be 1/16. However, the frequencies of dinucleotides in DNA sequences vary widely. In particular, CG is typically underrepresented (the frequency of CG is typically well below 1/16).
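To make the counting concrete, here is a minimal Python sketch (not from the lecture; the function name and toy sequence are illustrative) that tallies overlapping 2-mers and their relative frequencies:

```python
from collections import Counter

def dinucleotide_frequencies(seq):
    """Return the relative frequency of every overlapping 2-mer in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}

# In a real genome, freqs.get("CG", 0.0) typically comes out well below the naive 1/16 = 0.0625.
freqs = dinucleotide_frequencies("ATGCGTACGTTAGCCATGACTG")
print(freqs.get("CG", 0.0))
```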

Example
In a 29829-base sequence, the expected frequency of any given dinucleotide is 0.0625, yet the observed frequency of CG is about 7 times smaller than expected.

Why so few CGs?
CG is the least frequent dinucleotide because the C in CG is easily methylated, and methylated Cs are easily mutated into Ts. However, methylation is suppressed around genes and transcription factor binding sites, so CG appears at a relatively higher frequency in these important regions. These localized regions of higher CG frequency are called CG-islands. Finding the CG-islands within a genome is among the most reliable gene-finding approaches.

CG Island Analogy
The CG-islands problem can be modeled by a toy problem named "The Fair Bet Casino". The outcome of the game is determined by coin flips with two possible outcomes: Heads or Tails. However, there are two different coins. The Fair coin: Heads and Tails each with probability 1/2. The Biased coin: Heads with probability 3/4, Tails with probability 1/4.

The "Fair Bet Casino" Thus, we define the probabilities: P(H Fair) =, P(T Fair) = 2 3 P(H Bias) =, P(T Bias) = 4 2 4 The house doesn t want to get caught switching between coins, so they do so infrequently Changes between Fair and Biased coins with probability 0% 6

Fair Bet Casino Problem
Input: A sequence x = x_1 x_2 x_3 ... x_n of observed coin tosses made by some combination of the two possible coins (F or B).
Output: A sequence π = π_1 π_2 π_3 ... π_n, with each π_i being either F or B, indicating whether x_i is the result of tossing the Fair or the Biased coin, respectively.

Problem Subtleties
Any observed outcome of coin tosses could have been generated by any combination of coin exchanges. However, not all coin-exchange combinations are equally likely.
Tosses:  T H H H H
Coins1:  F F F F F    P(Tosses|Coins1) = 1/2 · 1/2 · 1/2 · 1/2 · 1/2 = 1/32
Coins2:  B B B B B    P(Tosses|Coins2) = 1/4 · 3/4 · 3/4 · 3/4 · 3/4 = 81/1024
Coins3:  F F B B B    P(Tosses|Coins3) = 1/2 · 1/2 · 3/4 · 3/4 · 3/4 = 27/256
We ask, "Which coin-exchange combination has the highest probability of generating the observed series of tosses?" The coin tosses are a signal, and figuring out the most likely coin-exchange sequence is a Decoding Problem.
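The three numbers above are easy to verify with a few lines of Python. This is just a sanity-check sketch using exact fractions, with H/T for Heads/Tails and F/B for the Fair/Biased coin:

```python
from fractions import Fraction

# Emission probabilities from the model: Fair is 1/2-1/2, Biased is 3/4 Heads and 1/4 Tails.
emit = {
    "F": {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "B": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}

def prob_tosses_given_coins(tosses, coins):
    """P(tosses | coins): the product of the per-toss emission probabilities."""
    p = Fraction(1)
    for x, pi in zip(tosses, coins):
        p *= emit[pi][x]
    return p

tosses = "THHHH"
print(prob_tosses_given_coins(tosses, "FFFFF"))  # 1/32
print(prob_tosses_given_coins(tosses, "BBBBB"))  # 81/1024
print(prob_tosses_given_coins(tosses, "FFBBB"))  # 27/256
```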

Let's consider the extreme cases
Suppose that the dealer never exchanges coins. Some definitions:
P(x|Fair): the probability of generating x using only the Fair coin.
P(x|Biased): the probability of generating x using only the Biased coin.

P(x|Fair coin) vs. P(x|Biased coin)
P(x|Fair) = P(x_1 ... x_n | Fair) = Π_{i=1..n} p(x_i|Fair) = (1/2)^n
P(x|Biased) = P(x_1 ... x_n | Biased) = Π_{i=1..n} p(x_i|Biased) = (3/4)^k (1/4)^(n-k) = 3^k / 4^n
where k is the number of Heads observed in x.
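These two closed forms are straightforward to evaluate. The sketch below (illustrative, not part of the slides) compares them for a fixed sequence length as the number of Heads k varies:

```python
def p_given_fair(n):
    """P(x | Fair) = (1/2)^n for any sequence of n tosses."""
    return 0.5 ** n

def p_given_biased(n, k):
    """P(x | Biased) = (3/4)^k * (1/4)^(n-k) = 3^k / 4^n, where k is the number of Heads."""
    return 3 ** k / 4 ** n

# For n = 10 tosses, the Biased hypothesis overtakes the Fair one once k exceeds roughly 0.63 * n.
n = 10
for k in range(n + 1):
    print(k, p_given_fair(n), p_given_biased(n, k))
```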

P(x|Fair coin) = P(x|Biased coin)
When is a sequence equally likely to have come from the Fair or the Biased coin? Setting P(x|Fair) = (1/2)^n equal to P(x|Biased) = 3^k / 4^n gives 2^n = 3^k, i.e. n = k log2(3), so k = n / log2(3) ≈ 0.63 n. So when the number of Heads over a contiguous sequence of tosses is greater than about 63% of the tosses, the dealer most likely used the Biased coin.

Log-odds Ratio
We can define the log-odds ratio as follows:
log2( P(x|Fair) / P(x|Biased) ) = Σ_{i=1..n} log2( P(x_i|Fair) / P(x_i|Biased) ) = n − k log2(3)
The log-odds ratio is a means (a threshold) for deciding which of two alternative hypotheses is more likely. It is a "zero-crossing" measure: if the log-odds ratio is > 0, the numerator (Fair coin) is more likely; if the log-odds ratio is < 0, the denominator (Biased coin) is more likely; they are equally likely if the log-odds ratio = 0.
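A hypothetical helper for this quantity (the name log_odds is mine, not the lecture's) shows the zero crossing directly:

```python
from math import log2

def log_odds(n, k):
    """log2( P(x|Fair) / P(x|Biased) ) = n - k * log2(3), with k Heads among n tosses."""
    return n - k * log2(3)

# Positive favors the Fair coin, negative favors the Biased coin; the zero crossing sits near k = 0.63 * n.
for k in (50, 63, 64, 80):
    print(k, log_odds(100, k))
```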

Log-odds over a sliding window
Given a sequence of length n, consider the log-odds ratio over a sliding window of length w << n across x_1, x_2, x_3, ..., x_n.
Disadvantages: the length of a CG-island (the appropriate window size) is not known in advance, and different window sizes may classify the same position differently. And what about the rule that the house doesn't swap out the coins frequently?
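A minimal sliding-window sketch makes the first disadvantage visible: the window length w is a free parameter that the method gives no principled way to choose. (The helper below is illustrative only.)

```python
from math import log2

def sliding_log_odds(tosses, w):
    """Log-odds score (Fair vs. Biased) for every length-w window of the toss string."""
    scores = []
    for i in range(len(tosses) - w + 1):
        k = tosses[i:i + w].count("H")   # Heads inside this window
        scores.append(w - k * log2(3))   # n - k*log2(3) with n = w
    return scores

# Negative scores flag windows that look like the Biased coin (the analogue of a CG-island).
print(sliding_log_odds("TTTHHHHHHHTTTT", 5))
```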

Key Elements of the Problem
There is an unknown or hidden state for each observation (was the coin Fair or Biased?).
Outcomes are modeled probabilistically:
P(H|Fair) = 1/2, P(T|Fair) = 1/2
P(H|Bias) = 3/4, P(T|Bias) = 1/4
Transitions between states are modeled probabilistically:
P(π_i = Bias | π_{i-1} = Bias) = a_BB = 0.9
P(π_i = Bias | π_{i-1} = Fair) = a_FB = 0.1
P(π_i = Fair | π_{i-1} = Bias) = a_BF = 0.1
P(π_i = Fair | π_{i-1} = Fair) = a_FF = 0.9
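In code these parameters are just two small lookup tables. The dictionary layout below is one of many reasonable choices and is reused by the later sketches:

```python
# Transition probabilities a_{rs} and emission probabilities, exactly as listed above
# (H/T stand for Heads/Tails; F/B for the Fair/Biased coin).
transition = {
    ("F", "F"): 0.9, ("F", "B"): 0.1,
    ("B", "F"): 0.1, ("B", "B"): 0.9,
}
emission = {
    ("F", "H"): 0.5,  ("F", "T"): 0.5,
    ("B", "H"): 0.75, ("B", "T"): 0.25,
}
```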

Hidden Markov Model (HMM)
A generalization of this class of problem. An HMM can be viewed as an abstract machine with k hidden states that emits symbols from an alphabet Σ. Each state emits symbols with its own probability distribution, and the machine switches between states according to another probability distribution. While in a given state, the machine makes two decisions: (1) What symbol from the alphabet Σ should I emit? (2) What state should I move to next?
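Those two decisions describe a generative process that is easy to simulate. The sketch below samples from the Fair Bet Casino HMM using the transition and emission tables defined earlier; starting in the Fair state is my assumption, since the slides do not specify an initial distribution:

```python
import random

def generate(n, transition, emission, start="F"):
    """Sample a hidden state path and an emitted toss sequence of length n from the two-coin HMM."""
    states, symbols = [], []
    state = start
    for _ in range(n):
        # Decision 1: emit a symbol from the current state's emission distribution.
        symbol = "H" if random.random() < emission[(state, "H")] else "T"
        states.append(state)
        symbols.append(symbol)
        # Decision 2: move to the next state according to the transition distribution.
        other = "B" if state == "F" else "F"
        state = state if random.random() < transition[(state, state)] else other
    return "".join(states), "".join(symbols)

hidden_path, tosses = generate(20, transition, emission)
print(hidden_path)  # what the observer cannot see
print(tosses)       # what the observer sees
```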

Why "Hidden"? Observers see the emitted symbols of an HMM but cannot see which state the HMM is currently in. Thus, the goal is to infer the most likely hidden states of an HMM based on the given sequence of emitted symbols. 6

HMM Parameters
Σ: set of emission characters. Examples: Σ = {0, 1} for coin tossing (0 for Tails, 1 for Heads); Σ = {1, 2, 3, 4, 5, 6} for dice tossing.
Q: set of hidden states, each emitting symbols from Σ. Q = {Fair, Bias} for coin tossing.

HMM for Fair Bet Casino
The Fair Bet Casino in HMM terms: Σ = {0, 1} (0 for Tails, 1 for Heads); Q = {F, B}, F for the Fair and B for the Biased coin.
Transition probabilities A: a_FF = a_BB = 0.9, a_FB = a_BF = 0.1.
Emission probabilities E: e_F(0) = e_F(1) = 1/2; e_B(0) = 1/4, e_B(1) = 3/4.

HMM as a Graphical Model
A directed graph with two types of nodes and two types of edges: hidden states are shown as squares, emission outputs are shown as circles; transition edges connect states, and emission edges connect states to their outputs.

Hidden Paths
A path π = π_1 ... π_n in the HMM is defined as a sequence of hidden states.
Consider the path π = FFFBBBBBFFF and the sequence x = 01011101001:
x:                 0     1     0     1     1     1     0     1     0     0     1
π:                 F     F     F     B     B     B     B     B     F     F     F
P(x_i|π_i):        1/2   1/2   1/2   3/4   3/4   3/4   1/4   3/4   1/2   1/2   1/2
P(π_i → π_{i+1}):  9/10  9/10  1/10  9/10  9/10  9/10  9/10  1/10  9/10  9/10
What is the probability of the given path (FFFBBBBBFFF)?

P(x|π) calculation
P(x|π): the probability that sequence x was generated by the path π:
P(x|π) = Π_{i=1..n} P(x_i|π_i) · P(π_i → π_{i+1}) = Π_{i=1..n} E_{π_i, x_i} · A_{π_i, π_{i+1}}
How many such paths exist? What algorithmic approach would you use to find the best path? Branch and Bound? Divide and Conquer? Dynamic Programming?
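A direct translation of this formula, reusing the transition and emission tables from before; the leading factor of 1/2 for the dealer's initial coin is my assumption, since the slides do not specify an initial distribution:

```python
def path_probability(x, pi, transition, emission):
    """P(x | pi): product of E[pi_i, x_i] over all positions and A[pi_i, pi_{i+1}] between them."""
    p = 0.5  # assumed probability that the dealer starts with either coin (not given on the slides)
    for i in range(len(x)):
        p *= emission[(pi[i], x[i])]
        if i + 1 < len(x):
            p *= transition[(pi[i], pi[i + 1])]
    return p

# The emission-only comparison from the earlier slide (27/256) now also pays for the F -> B switch.
print(path_probability("THHHH", "FFBBB", transition, emission))
```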

Decoding Problem
Finding the optimal path in a graph is equivalent to the classical problem of decoding a message transmitted using signal constellations. This is very common when a discrete set of symbols is encoded for transport over an analog medium (e.g., modems, wired internet, wireless internet, digital television): a simple binary coding does not make good use of the dynamic range of the signal, but if you pack the codes too closely together, noise becomes a problem.
Goal: Find an optimal hidden path of state transitions given a set of observations.
Input: A sequence of observations x = x_1 ... x_n generated by an HMM M(Σ, Q, A, E).
Output: A path that maximizes P(x|π) over all possible paths π.

How do we solve this?
Brute-force approach: enumerate every possible path, compute P(x_{1..n}|π_{1..n}) for each one, and keep track of the most probable path.
A better approach: break any path into two parts, P(x_{1..i}|π_{1..i}) and P(x_{i..n}|π_{i..n}), so that
P(x_{1..n}|π_{1..n}) = P(x_{1..i}|π_{1..i}) · P(x_{i..n}|π_{i..n})
Will anything less than the highest P(x_{1..i}|π_{1..i}) ever improve the total probability? Note that P(x_{1..n}|π_{1..n}) ≤ P(x_{1..i}|π_{1..i}). Thus, to find the maximum we need to find the maximum of each subproblem, for i from 1 to n. What algorithm design approach does this remind you of?
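For intuition, the brute-force approach can be written down directly. This sketch reuses path_probability from the previous example and is only usable for tiny n, because it enumerates all 2^n paths:

```python
from itertools import product

def brute_force_decode(x, transition, emission):
    """Try every hidden path of length len(x) and return the one that maximizes P(x | pi)."""
    best_path, best_p = None, -1.0
    for pi in product("FB", repeat=len(x)):
        p = path_probability(x, pi, transition, emission)
        if p > best_p:
            best_path, best_p = "".join(pi), p
    return best_path, best_p

print(brute_force_decode("THHHHHHTTT", transition, emission))  # 2^10 = 1024 candidate paths
```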

Building Manhattan for Decoding
In 1967, Andrew Viterbi developed a Manhattan-like grid (a dynamic program) to solve the Decoding Problem. Every choice of π = π_1, π_2, ..., π_n corresponds to a path in the graph. The only valid direction in the graph is eastward. This graph has |Q|^2 (n − 1) edges.

Edit Graph for Decoding Problem
(Figure: the edit graph for the Decoding Problem.)

Viterbi Decoding of Fair-Bet Casino
Each vertex represents a possible state at a given position in the output sequence, and the observed sequence conditions the likelihood of each state. Dynamic programming reduces the search space to |Q| + (transition edges) × (n − 1) = 2 + 4 × 5 = 22 quantities for a six-toss example, compared with the naïve 2^6 = 64 possible paths.

Decoding Problem Solution
The Decoding Problem is equivalent to finding a longest path in a directed acyclic graph (DAG), where "longest" is defined as the maximum product of the probabilities along the path.
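As a preview of next lecture's recurrences, here is a minimal log-space Viterbi sketch of that longest-path computation, reusing the earlier transition and emission tables; the uniform initial distribution is my assumption, and the variable names are illustrative:

```python
from math import log

def viterbi(x, states, transition, emission, init):
    """Max-product (longest) path through the decoding DAG, computed by dynamic programming in log space."""
    # score[s] = best log-probability of any hidden path for the prefix seen so far that ends in state s.
    score = {s: log(init[s]) + log(emission[(s, x[0])]) for s in states}
    backpointers = []
    for symbol in x[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev, best = max(((r, score[r] + log(transition[(r, s)])) for r in states),
                             key=lambda t: t[1])
            new_score[s] = best + log(emission[(s, symbol)])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the optimal hidden path.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

init = {"F": 0.5, "B": 0.5}  # assumed uniform start; the slides leave this unspecified
print(viterbi("THHHHHHTTTT", ["F", "B"], transition, emission, init))
```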

Next Time
We'll derive the DP recurrence equations, see examples and what the algorithm looks like in code, see how truth and maximum likelihood do not always agree, and apply HMMs to problems of biological interest.