# Hidden Markov Models


Hidden Markov models (HMMs) were developed in the early 1970s and were at that time mostly applied in the area of computerized speech recognition. They were first described in a series of statistical papers by L. E. Baum et al. They began to be used extensively in bioinformatics (to model gene and protein behavior) in the mid-1980s.

General Idea: Similarly to a Markov chain, a hidden Markov model is a statistical model for a sequence of observations that depend on each other in a systematic way. A hidden Markov model consists of two parts: a sequence of states $q_1, q_2, q_3, \dots$ and a sequence of emitted observations $O_1, O_2, O_3, \dots$. Generally, it is assumed that the observations can be observed (duh!) but that the states cannot be observed in practice. That's the "hidden" part of the hidden Markov model.

[Figure: the hidden states $q_1, \dots, q_5$ are linked by transitions; each state $q_t$ emits the observation $O_t$.]

The state variables $q$ perform transitions from one state to another according to a discrete finite Markov chain. Each state emits observations $O$ from a finite alphabet with a probability distribution that may depend on the state $q$.

To fully describe a hidden Markov model, one needs the following components:

- A set of $N$ states $\{S_1, S_2, \dots, S_N\}$.
- An alphabet of $M$ distinct observation symbols $A = \{a_1, a_2, \dots, a_M\}$.
- The transition probability matrix for the states, $P = (p_{ij})$ with $p_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$.
- The emission probabilities (which may depend on the current state $q_t = S_i$). For each state $S_i$ and each $a \in A$ define $b_i(a) = P(S_i \text{ emits symbol } a)$. These emission probabilities may be arranged in the form of an $N \times M$ matrix $B$.
- The initial distribution vector $\pi = (\pi_1, \dots, \pi_N)$ for the first state, with $\pi_i = P(q_1 = S_i)$.
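The five components listed above map directly onto a small data structure. The following is only a sketch: the class name `HMM`, the method `validate`, and the numbers in the toy instance are illustrative assumptions, not part of the notes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    """Container for lambda = (P, B, pi) plus state and symbol labels."""
    states: List[str]        # S_1, ..., S_N
    alphabet: List[str]      # a_1, ..., a_M
    P: List[List[float]]     # N x N transition matrix, P[i][j] = p_ij
    B: List[List[float]]     # N x M emission matrix, B[i][k] = b_i(a_k)
    pi: List[float]          # initial distribution, pi[i] = P(q_1 = S_i)

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless shapes match and every row is a distribution."""
        n, m = len(self.states), len(self.alphabet)
        if len(self.pi) != n or len(self.P) != n or len(self.B) != n:
            raise ValueError("pi, P and B need one entry (row) per state")
        if any(len(r) != n for r in self.P) or any(len(r) != m for r in self.B):
            raise ValueError("P must be N x N and B must be N x M")
        for row in [self.pi, *self.P, *self.B]:
            if min(row) < 0 or abs(sum(row) - 1.0) > tol:
                raise ValueError(f"not a probability distribution: {row}")


# Hypothetical numbers, chosen only to illustrate the shapes (N = 2, M = 3).
toy = HMM(
    states=["S1", "S2"],
    alphabet=["a", "b", "c"],
    P=[[0.9, 0.1], [0.2, 0.8]],
    B=[[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]],
    pi=[0.5, 0.5],
)
toy.validate()
```

Checking that each row of $P$, each row of $B$, and $\pi$ sum to one catches the most common specification mistakes before any algorithm is run.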

Example: You have two coins: a fair coin in your right hand and a biased coin that comes up heads with probability 0.8 in your left hand. You toss the coins repeatedly (always starting with the right hand), switching hands according to a Markov chain with transition probability matrix $P$ (for right, left), and call out the results.

[Transition matrix $P$ with rows and columns labeled R, L; the numerical entries are missing.]

(a) What are the states and the observations in this example? Write down the state space and the observation alphabet.

(b) How many (free) parameters does this HMM have? Write them down.

(c) Suppose the coins are tossed three times and we observe THT. What is the probability of this outcome for this model?

(d) Given this outcome, find the most likely sequence of hidden states that has emitted it.

The collection of parameters of an HMM is denoted $\lambda = (P, B, \pi)$. In practice, the values of these parameters are usually unknown. Sometimes a structure, for instance the form of the emission distribution $B$, may be inferred (apart from parameter values) from the biological setting.

Today, hidden Markov models are used very frequently in biological applications. Examples of their use include:

Gene Finding: A chromosome consists of regions that code for proteins (coding regions) and regions whose function is today still largely unknown (non-coding regions). However, DNA has a "grammatical" structure; that is, the composition of base pairs is slightly different in coding and non-coding regions. For instance, it is known that the beginning regions of mammalian genes (promoters) are rich in CG combinations. The program GENSCAN relies on a hidden Markov model to divide a given nucleotide sequence into coding and non-coding regions.

Studying copy number variations: During replication or reproduction, whole regions in the genome are sometimes deleted or duplicated. That leads to some genes being present more than once in a genome. The number of times a gene is present is called the copy number of that gene. There may be evolutionary pressure for or against genes in higher copy numbers. Copy numbers can vary among individuals and may vary even between identical twins. Microarray technology can be used to measure copy number variation, and hidden Markov models are used to reconstruct which portions of the genome are duplicated or deleted in a specific individual.

Protein Family Characterization: A protein family is a group of evolutionarily related proteins. Proteins in the same family often have similar three-dimensional structures, similar functions, and similar amino acid sequences. Currently, there are over 60,000 defined protein families. How can a hidden Markov model help in defining protein families? Starting with training sequences of proteins that have known similarities in sequence and/or function, a hidden Markov model is fitted that describes how the sequences evolved from each other. The parameters of the HMM are estimated from the training data.

Knowing the statistical properties of the model allows one to perform hypothesis tests for new protein structures of unknown family:

$H_0$: the new protein does not belong to the family, versus $H_a$: the new protein does belong to the family.

The architecture of the HMM used for protein family modeling is described in great detail in Krogh et al. (J. Mol. Biol., 1994). Consider the following four amino acid sequences that may have evolved from a common ancestor:

| Protein | Sequence |
|---|---|
| Protein 1 | M P L H L T Q D E L D V |
| Protein 2 | I P H H F A Q D E L S S |
| Protein 3 | I P L H A A Y Q N L S W |
| Protein 4 | V V T H M A Q N F V D L |
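As a rough illustration of what "a distribution of amino acids per position" means for these four sequences, the sketch below tabulates relative frequencies column by column. It assumes the sequences are already aligned without gaps, which is a simplification; the function name `column_frequencies` is hypothetical, and actual profile HMM fitting as in Krogh et al. estimates these distributions together with insertions and deletions.

```python
from collections import Counter

# The four sequences from the table above, written without spaces.
sequences = [
    "MPLHLTQDELDV",
    "IPHHFAQDELSS",
    "IPLHAAYQNLSW",
    "VVTHMAQNFVDL",
]


def column_frequencies(seqs):
    """Relative amino acid frequencies at each position of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length (no gaps handled)")
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        freqs.append({aa: c / len(seqs) for aa, c in counts.items()})
    return freqs


freqs = column_frequencies(sequences)
for pos, dist in enumerate(freqs, start=1):
    print(pos, dist)
```

Position 4 is H in all four sequences, whereas position 1 is split between M, I, and V; a fitted model would reflect this with a sharply peaked emission distribution at one position and a flatter one at the other.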

Evolutionary events that any amino acid in the ancestral sequence may have undergone include:

- Match: the same distribution of amino acids at this position in all sequences.
- Deletion: the amino acid is deleted.
- Insertion: a new amino acid is inserted between two existing ones.

Consider the following HMM for a short ancestral sequence with only four amino acids.

[Figure: profile HMM with the states Begin, M1, M2, M3, M4, End in the middle row, the deletion states D1 to D4 above, and the insertion states I1 to I5 below.]

Here, the M-states are the match states. Each match state has a distinct distribution over the 20-letter amino acid alphabet. Corresponding to each match state, there is a deletion state D that emits a dummy variable $\delta$ (or "-", which stands for the deletion of an amino acid). On either side of a match state there is an insertion state I, which generates an amino acid, again from the 20-letter amino acid alphabet, but with a distribution characterized by the state I. The Begin and End states do not produce emissions.

(a) How many states does this HMM have?

(b) What are the model parameters $P$ and $B$ in this example?

(c) What would the model parameters have to be so that this model produces completely random sequences? Always the same sequence?
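The state labels of this architecture can be generated mechanically for an ancestral sequence of any length. The helper below is a hypothetical sketch that follows the labeling in the figure (one match and one deletion state per position, one insertion state per gap including both ends); it says nothing about which transitions are allowed.

```python
def profile_hmm_states(length):
    """State labels of a profile HMM for an ancestral sequence of the given length."""
    match = [f"M{k}" for k in range(1, length + 1)]
    delete = [f"D{k}" for k in range(1, length + 1)]
    # One insertion state before the first match, between matches, and after the last.
    insert = [f"I{k}" for k in range(1, length + 2)]
    return ["Begin", *match, *delete, *insert, "End"]


states = profile_hmm_states(4)
print(len(states), states)
```

With this labeling, the number of states grows linearly with the length of the ancestral sequence ($3L + 3$ for length $L$, counting Begin and End).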

## Hidden Markov Models in Gene Finding

In most (non-bacterial) genomes, genes are split into several coding and non-coding regions. The beginning and end of coding sequences within the genome are flanked by start and stop codons. The start codon is preceded by a promoter region. This is a DNA sequence to which the RNA polymerase (which facilitates the copying of DNA into RNA) can bind. The DNA sequences at the splice sites, where two different regions meet, are quite characteristic (usually GT at the 5' end and AG at the 3' end), and properties of the nucleotide sequence (for instance CG content) can differ within these regions.

Interactions of proteins with promoter sites can block the promoter and effectively make it impossible for the RNA polymerase to bind. This can cause the gene to be (at least temporarily) turned off. This concept will become very important for us when we study microarray technology and its applications in a few weeks.

When DNA is transcribed into RNA, it is first transcribed into pre-mRNA, which contains both the introns and the exons. RNA splicing is the process of removing the introns and joining the exons to form a continuous coding sequence.

The goal of a gene finding algorithm is to automatically detect the coding regions, given a long sequence of DNA. The algorithm may classify the regions into upstream, start codon, intron, exon, stop codon, downstream, and intergenic regions.

There are several different HMM-based algorithms whose goal is to identify coding regions in DNA. GENSCAN uses the hidden state structure shown in the accompanying figure. The states move from non-coding intergenic regions to promoters to start codons and then to one or more exons (interspersed with introns if there is more than one exon). The model specifies the probabilities with which each state (blue circle or red diamond in the figure) emits the nucleotides A, T, C, G. (Burge & Karlin, J. Mol. Biol., 1997)

## Statistical Inference for Hidden Markov Models

There are three different types of questions that can be asked (and answered) in the context of hidden Markov models.

(1) Given the parameters $\lambda = (P, B, \pi)$ of the model, efficiently calculate the probability of some given output sequence. One algorithm that can efficiently compute $P(O \mid \lambda)$ is called the Forward Algorithm.

(2) Given the parameters $\lambda = (P, B, \pi)$ of the model, find the sequence of hidden states that is most likely to have generated a specific sequence of observations. The algorithm that performs this task is called the Viterbi Algorithm. It finds

$$\operatorname*{argmax}_{Q} P(Q \mid O).$$

(3) The hardest task is to estimate the values of the model parameters, without knowing the hidden states, so as to maximize the likelihood of a given sequence of observations. In effect, both the model parameters and the hidden states have to be estimated from the data in order to make the model likelihood as large as possible:

$$\operatorname*{argmax}_{\lambda} P(O \mid \lambda).$$

This task is sometimes referred to in the literature as machine learning. The general solution for this problem is called the EM-Algorithm ("E" stands for expectation, and "M" stands for maximization). The special case of the EM-algorithm for hidden Markov models is called the Baum-Welch method.

## The Forward Algorithm

Recall that the task of the Forward algorithm is to compute the probability of a specific sequence of outputs (assuming the model structure of the HMM is fixed and the model parameters are known). Naively, this could be done by using the law of total probability and by considering all possible hidden state sequences $Q$:

$$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda).$$

However, considering all possible state sequences that may have led to an observed emission sequence $O$ quickly makes the search space very large, particularly if the observation sequence is long. It would require summing $N^T$ products of $2T$ terms each ($N$ is the size of the state space and $T$ is the length of the sequence of observations). Thus, an iterative procedure is clearly preferable.
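To make the cost of the naive approach concrete, the sketch below enumerates all $N^T$ state sequences for the coin example and sums their joint probabilities. The emission matrix follows from the coin description (fair coin on the right, heads with probability 0.8 on the left) and $\pi = (1, 0)$ from always starting with the right hand. The transition matrix is an assumption made up for illustration, because its entries are not given here; the function name is likewise hypothetical.

```python
from itertools import product

# States: 0 = right, 1 = left. Symbols: 0 = H, 1 = T.
pi = [1.0, 0.0]
B = [[0.5, 0.5],   # fair coin (right hand)
     [0.8, 0.2]]   # biased coin (left hand)
P = [[0.7, 0.3],   # ASSUMED transition probabilities, for illustration only
     [0.4, 0.6]]


def brute_force_likelihood(obs, pi, P, B):
    """P(O | lambda) by summing over all N**T hidden state sequences."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        prob = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            prob *= P[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += prob
    return total


tht = [1, 0, 1]  # the observation sequence T, H, T
print(brute_force_likelihood(tht, pi, P, B))
```

For $T = 3$ and $N = 2$ only eight paths are visited, but the loop count doubles with every additional toss, which is why the recursion of the Forward algorithm is preferred.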

The Forward Algorithm iteratively computes the joint probability that at time $t$ the Markov chain is in state $S_i \in S$ and that the observations along the way were $O_1, \dots, O_t$:

$$\alpha_i(t) = P(O_1, O_2, \dots, O_t, q_t = S_i \mid \lambda), \qquad t = 1, 2, \dots, T.$$

This allows for an efficient computation of the terminal probabilities

$$\alpha_i(T) = P(O_1, O_2, \dots, O_T, q_T = S_i \mid \lambda).$$

The probability of observing any particular sequence $O = (O_1, \dots, O_T)$ of emissions can then be written as

$$P(O) = \sum_{i=1}^{N} \alpha_i(T).$$

This calculation still requires the recursive computation of the $N$ terminal $\alpha$'s, but it is much less costly than an exhaustive search of all possible state sequences. To recursively compute $\alpha_i(T)$ for $i = 1, \dots, N$, we need an initialization, which follows directly from the definition:

$$\alpha_i(1) = P(O_1, q_1 = S_i \mid \lambda) = \pi_i\, b_i(O_1).$$

We also need an induction step in which $\alpha_j(t+1)$ is formulated as a function of the previous $\alpha$'s:

$$\alpha_j(t+1) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_i(t)\, p_{ij}.$$

Then, as before,

$$P(O) = \sum_{i=1}^{N} \alpha_i(T).$$

The Forward algorithm provides a solution for problem (1): calculating the probability of a given output sequence $O$. It can be carried out in on the order of $T N^2$ computations.
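The recursion can be sketched in a few lines of Python, using the standard initialization $\alpha_i(1) = \pi_i b_i(O_1)$ and induction step $\alpha_j(t+1) = b_j(O_{t+1}) \sum_i \alpha_i(t) p_{ij}$. The emission matrix and $\pi$ below are those of the coin example; the transition matrix is an assumed, made-up matrix because its entries are not given here.

```python
# States: 0 = right, 1 = left. Symbols: 0 = H, 1 = T.
pi = [1.0, 0.0]
B = [[0.5, 0.5],   # fair coin (right hand)
     [0.8, 0.2]]   # biased coin (left hand)
P = [[0.7, 0.3],   # ASSUMED transition probabilities, for illustration only
     [0.4, 0.6]]


def forward(obs, pi, P, B):
    """Return the table alpha[t][i] (0-based t) and the likelihood P(O | lambda)."""
    n = len(pi)
    # Initialization: alpha_i(1) = pi_i * b_i(O_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    # Induction: alpha_j(t+1) = b_j(O_{t+1}) * sum_i alpha_i(t) * p_ij
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * P[i][j] for i in range(n))
                      for j in range(n)])
    # Termination: P(O) = sum_i alpha_i(T)
    return alpha, sum(alpha[-1])


alpha, likelihood = forward([1, 0, 1], pi, P, B)  # observations T, H, T
print(alpha[-1], likelihood)
```

Each time step costs $N^2$ multiplications, which gives the $T N^2$ total mentioned above. For long sequences the $\alpha$ values become very small, so practical implementations usually rescale them or work on a log scale.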

Example: Recall the coin tossing example introduced earlier. In this example, the state space was $S = \{\text{right}, \text{left}\}$, the emissions were $A = \{H, T\}$, and the given model parameters were

$$\pi = (1, 0), \qquad B = \begin{pmatrix} 0.5 & 0.5 \\ 0.8 & 0.2 \end{pmatrix} \text{ (rows R, L; columns H, T)},$$

together with the transition matrix $P$ (rows and columns R, L; the numerical entries are missing). Use the forward algorithm to compute the probability of observing the sequence THT.

## The Backward Algorithm

In the forward algorithm, we started by considering the possible values of the first state and worked our way forward from there. Similarly, one could also start by considering the possible values for the last state $q_T$ and work backwards. That is, we find the probability of the ending sequence $(O_{t+1}, \dots, O_T)$ given the hidden state occupied at time $t$. Define

$$\beta_i(t) = P(O_{t+1}, \dots, O_T \mid q_t = S_i, \lambda).$$

Note that in this definition $\beta$ is a conditional probability, whereas the $\alpha$'s were defined as joint probabilities. The initial $\beta$ is defined as

$$\beta_i(T) = 1 \quad \text{for all } S_i \in S.$$

The $\beta$-terms are now defined recursively backwards in time:

$$
\begin{aligned}
\beta_i(t) &= P(O_{t+1}, \dots, O_T \mid q_t = S_i, \lambda) \\
&= \sum_{j=1}^{N} P(O_{t+1}, \dots, O_T, q_{t+1} = S_j \mid q_t = S_i, \lambda) \\
&= \sum_{j=1}^{N} P(O_{t+1}, \dots, O_T \mid q_t = S_i, q_{t+1} = S_j, \lambda)\, P(q_{t+1} = S_j \mid q_t = S_i, \lambda) \\
&= \sum_{j=1}^{N} P(O_{t+1}, \dots, O_T \mid q_{t+1} = S_j, \lambda)\, P(q_{t+1} = S_j \mid q_t = S_i, \lambda) \\
&= \sum_{j=1}^{N} P(O_{t+2}, \dots, O_T \mid q_{t+1} = S_j, \lambda)\, P(O_{t+1} \mid q_{t+1} = S_j, \lambda)\, P(q_{t+1} = S_j \mid q_t = S_i, \lambda) \\
&= \sum_{j=1}^{N} \beta_j(t+1)\, b_j(O_{t+1})\, p_{ij}.
\end{aligned}
$$
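The last line of the derivation translates directly into code. The sketch below fills the $\beta$ table from $t = T$ backwards for the coin example; as elsewhere, the transition matrix is an assumed, made-up matrix because its entries are not available, and the function name is hypothetical.

```python
# States: 0 = right, 1 = left. Symbols: 0 = H, 1 = T.
pi = [1.0, 0.0]
B = [[0.5, 0.5],   # fair coin (right hand)
     [0.8, 0.2]]   # biased coin (left hand)
P = [[0.7, 0.3],   # ASSUMED transition probabilities, for illustration only
     [0.4, 0.6]]


def backward(obs, P, B):
    """Return the table beta[t][i] (0-based t) of backward probabilities."""
    n, T = len(P), len(obs)
    # Initialization: beta_i(T) = 1 for all i
    beta = [[1.0] * n for _ in range(T)]
    # Recursion: beta_i(t) = sum_j beta_j(t+1) * b_j(O_{t+1}) * p_ij
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(beta[t + 1][j] * B[j][obs[t + 1]] * P[i][j]
                       for j in range(n))
                   for i in range(n)]
    return beta


obs = [1, 0, 1]  # observations T, H, T
beta = backward(obs, P, B)
# The likelihood can also be recovered from the first column of the table.
likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(beta[0], likelihood)
```

Recovering $P(O \mid \lambda) = \sum_i \pi_i\, b_i(O_1)\, \beta_i(1)$ from the backward table and comparing it with the forward result is a convenient consistency check.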

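## The Viterbi Algorithm

Recall that the goal of the Viterbi algorithm is to find the most likely sequence of states that has produced a given output sequence. Here, we assume that the parameters $\lambda$ of the hidden Markov model are known. We want to find

$$\operatorname*{argmax}_{Q} P(Q \mid O, \lambda) = \operatorname*{argmax}_{Q} P(Q, O \mid \lambda).$$

The two maximizers agree because $P(Q \mid O, \lambda) = P(Q, O \mid \lambda) / P(O \mid \lambda)$ and the denominator does not depend on $Q$.

Just like the forward and backward algorithms, the Viterbi algorithm is defined recursively. Let $v_i(t)$ be the probability of the most likely state sequence for the first $t$ observations that ends in state $S_i$. That is,

$$v_i(t) = \max_{1 \le j \le N} \big( P(O_t \mid q_t = S_i)\, p_{ji}\, v_j(t-1) \big).$$

The sequences are initialized by defining

$$v_i(1) = P(O_1 \mid q_1 = S_i)\, \pi_i.$$

The Viterbi path $x_1, \dots, x_T$ is defined as the sequence of states that maximize the $v_i(t)$ expressions. That is, the last state is

$$x_T = \operatorname*{argmax}_{1 \le i \le N} v_i(T),$$

and the earlier states are traced backwards for $t = T, T-1, \dots, 2$:

$$x_{t-1} = \operatorname*{argmax}_{1 \le j \le N} \big( P(O_t \mid q_t = x_t)\, p_{j x_t}\, v_j(t-1) \big).$$

As for the forward and backward algorithms, the complexity of this algorithm is $O(T N^2)$.

Example: Recall the coin tossing example introduced earlier. In this example, the state space was $S = \{\text{right}, \text{left}\}$, the emissions were $A = \{H, T\}$, and the given model parameters were

$$\pi = (1, 0), \qquad B = \begin{pmatrix} 0.5 & 0.5 \\ 0.8 & 0.2 \end{pmatrix} \text{ (rows R, L; columns H, T)},$$

together with the transition matrix $P$ (rows and columns R, L; the numerical entries are missing). Use the Viterbi algorithm to find the most likely state sequence that gave rise to the sequence THT.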
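A compact sketch of the recursion with back-pointers is given below. It stores, for every time step and state, the predecessor that achieved the maximum, and then traces the path back from the best final state. The transition matrix is again an assumed, made-up matrix because its entries are not available, so the resulting path illustrates the mechanics rather than the answer to the exercise.

```python
# States: 0 = right, 1 = left. Symbols: 0 = H, 1 = T.
pi = [1.0, 0.0]
B = [[0.5, 0.5],   # fair coin (right hand)
     [0.8, 0.2]]   # biased coin (left hand)
P = [[0.7, 0.3],   # ASSUMED transition probabilities, for illustration only
     [0.4, 0.6]]


def viterbi(obs, pi, P, B):
    """Return the most likely state path and its joint probability with obs."""
    n = len(pi)
    # Initialization: v_i(1) = b_i(O_1) * pi_i
    v = [B[i][obs[0]] * pi[i] for i in range(n)]
    back = []  # back[t][i] = best predecessor of state i at step t + 1
    for o in obs[1:]:
        ptr = [max(range(n), key=lambda j: v[j] * P[j][i]) for i in range(n)]
        # Recursion: v_i(t) = b_i(O_t) * max_j p_ji * v_j(t-1)
        v = [B[i][o] * P[ptr[i]][i] * v[ptr[i]] for i in range(n)]
        back.append(ptr)
    # Termination and traceback
    last = max(range(n), key=lambda i: v[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]


path, prob = viterbi([1, 0, 1], pi, P, B)  # observations T, H, T
print(["RL"[s] for s in path], prob)
```

Ties in the maximization are broken in favor of the lower state index here; that choice is arbitrary, and any tied path is equally valid.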

## The EM-Algorithm

Recall that the goal of the EM-algorithm is to estimate the (possibly numerous) parameters of a hidden Markov model from the sequence of observations without knowing the explicit sequence of hidden states. The original EM-algorithm was developed for maximum likelihood estimation of parameters in probability models with missing data (Dempster, Laird, Rubin, 1977).

Consider the following scenario: We have a probability model that generates observations $x$. The model has parameters $\theta$. There may be some missing data $y$. In the hidden Markov model context we have

| Notation | Description | HMM analog |
|---|---|---|
| $x$ | observed data | output sequence $O_1, \dots, O_T$ |
| $y$ | missing data | state sequence $q_1, \dots, q_T$ |
| $\theta$ | model parameters | $\lambda = (\pi, P, B)$ |

The likelihood of a set of parameter values $\theta$ is defined as the probability of observing a given set of outcomes given those parameter values:

$$L(\theta \mid x) = P(x \mid \theta).$$

Since joint probabilities are usually computed as products with many terms, it is more convenient in many cases to work with log-likelihood functions rather than with likelihood functions directly. A maximum-likelihood parameter estimate is the value $\hat\theta$ that maximizes the likelihood function. The same value will also maximize the log-likelihood function, since $\log(x)$ is monotone increasing. That is, we want to find $\hat\theta$ to maximize

$$\log P(x \mid \theta) = \log \sum_{y} P(x, y \mid \theta).$$

Here the sum should be taken over all possible values of the missing data.

We will now provide a heuristic justification (not a strict proof) for how the EM-algorithm works. First, note that

$$P(x, y \mid \theta) = P(y \mid x, \theta)\, P(x \mid \theta)$$

and write

$$\log P(x \mid \theta) = \log P(x, y \mid \theta) - \log P(y \mid x, \theta).$$

Suppose $\theta_t$ is a current estimate (not necessarily the optimal estimate) of the parameter vector $\theta$. Multiply both sides of the above equation by $P(y \mid x, \theta_t)$ and sum over all possible values of $y$. On the left side of the equation, nothing will change, since

$$\sum_{y} \log P(x \mid \theta)\, P(y \mid x, \theta_t) = \log P(x \mid \theta) \sum_{y} P(y \mid x, \theta_t) = \log P(x \mid \theta) \cdot 1 = \log P(x \mid \theta).$$

On the other side, we'll have

$$\log P(x \mid \theta) = \sum_{y} P(y \mid x, \theta_t) \log P(x, y \mid \theta) - \sum_{y} P(y \mid x, \theta_t) \log P(y \mid x, \theta).$$

Give the first term in this equation a name:

$$Q(\theta \mid \theta_t) = \sum_{y} P(y \mid x, \theta_t) \log P(x, y \mid \theta).$$

One can show that choosing $\theta$ to maximize $Q(\theta \mid \theta_t)$ never decreases the likelihood function $P(x \mid \theta)$ (compared to $P(x \mid \theta_t)$). Note that here $Q(\theta \mid \theta_t)$ is the expected value of $\log P(x, y \mid \theta)$ with respect to $y$, taken under the conditional distribution $P(y \mid x, \theta_t)$.

The name of the EM algorithm comes from the two steps that are now carried out in an alternating fashion:

- Initialization: Pick an initial value for $\theta$, say $\theta_0$.
- E-step: For the current $\theta$ estimate, say $\theta_t$, compute $Q(\theta \mid \theta_t)$, that is, $E_y[\log P(x, y \mid \theta) \mid x, \theta_t]$.
- M-step: Maximize $Q(\theta \mid \theta_t)$ with respect to $\theta$. Call the new argmax $\theta_{t+1}$.
- Repeat: Check whether the termination criterion is met and, if not, go back to the E-step with the updated $\theta_{t+1}$.

The EM-algorithm can be shown to converge to a local maximum of the likelihood function $P(x \mid \theta)$. Note that a local maximum is not necessarily equal to the global maximum. It is usually a good idea to restart the algorithm with different initial values $\theta_0$. The algorithm should be terminated either after a fixed number of steps or when the likelihood function no longer changes much (percentage change less than some threshold).

## The Baum-Welch Algorithm

In the specific context of hidden Markov models, the observed data are the outputs $O = (O_1, \dots, O_T)$, the missing data are the states $q = (q_1, \dots, q_T)$, and the model parameters are $\lambda = (\pi, P, B)$. Next, we need to introduce two new quantities. Define

$$\gamma_i(t) = P(q_t = S_i \mid O, \lambda).$$

Note that because of the Markov property we can write

$$P(O, q_t = S_i \mid \lambda) = \alpha_i(t)\, \beta_i(t)$$

so that

$$\gamma_i(t) = \frac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\, \beta_j(t)}.$$

Also define

$$\xi_{ij}(t) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda).$$

In a similar way one can show that

$$\xi_{ij}(t) = \frac{\alpha_i(t)\, p_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t)\, p_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)}.$$

Note that both $\gamma_i(t)$ and $\xi_{ij}(t)$ involve the forward probabilities $\alpha_i(t)$ as well as the backward probabilities $\beta_j(t)$.

Why do we need all this complicated notation? Let's take a look at what the two new quantities $\gamma$ and $\xi$ represent.

- $\gamma_i(t)$ is the probability that the hidden state is $S_i$ at time $t$, given the complete sequence of observations and the current set of parameters.
- $\xi_{ij}(t)$ is the probability that from time $t$ to time $t+1$ the hidden states transition from $i$ to $j$, given the complete sequence of observations and the current set of parameters.

To estimate the initial distribution $\pi$ we would like an estimate for the probability that the hidden chain is in state $i$ at time $t = 1$. Note that

$$\hat\pi_i = \gamma_i(1) = P(q_1 = S_i \mid O, \lambda),$$

where $\lambda$ is the current set of parameter estimates. Furthermore, $\sum_{t=1}^{T-1} \gamma_i(t)$ is the expected number of times the hidden chain will be in state $i$ (before the final time point), whereas $\sum_{t=1}^{T-1} \xi_{ij}(t)$ is the expected number of times the hidden chain transitions from $i$ to $j$. Hence, finally,

$$\hat p_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}, \qquad \hat b_j(o) = \frac{\sum_{t:\, O_t = o} \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}.$$
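The re-estimation formulas above can be combined into a single update. The sketch below computes $\alpha$, $\beta$, $\gamma$, and $\xi$ for one observation sequence and returns the updated $(\hat\pi, \hat P, \hat B)$ together with the likelihood under the old parameters. The starting values, the observation sequence, and the function name are made up for illustration; the code does no scaling, so it is suitable for short sequences only.

```python
def baum_welch_step(obs, pi, P, B):
    """One E-step plus M-step; returns (new_pi, new_P, new_B, old_likelihood)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    # Forward probabilities alpha[t][i]
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * P[i][j] for i in range(n))
                      for j in range(n)])
    # Backward probabilities beta[t][i]
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(P[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    lik = sum(alpha[-1])  # equals the normalizing sums in gamma and xi
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * P[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: the three re-estimation formulas
    new_pi = gamma[0][:]
    new_P = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == o) /
              sum(gamma[t][j] for t in range(T))
              for o in range(m)] for j in range(n)]
    return new_pi, new_P, new_B, lik


# Hypothetical starting values and data (symbols: 0 = H, 1 = T).
obs = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
params = ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.8, 0.2]])
likelihoods = []
for _ in range(5):
    *params, lik = baum_welch_step(obs, *params)
    likelihoods.append(lik)
print(likelihoods)
```

The printed likelihoods should form a non-decreasing sequence, which is the EM guarantee. Note that a state with zero expected occupancy would cause a division by zero in this sketch; a more careful implementation would guard against that.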

In summary, the Baum-Welch algorithm proceeds as follows:

- Initialization: Pick an initial value for $\lambda$, say $\lambda_0$. Often, the entries in $\pi$, $P$, and $B$ are chosen to all be equally likely. If entries are chosen to be zero, they will remain zero during the algorithm.
- E-step: For the current $\lambda$ estimate, say $\lambda_t$, first compute $\alpha_i(t)$ and $\beta_i(t)$ (for $i = 1, \dots, N$ and $t = 1, \dots, T$). Then use them to compute $\gamma_i(t)$ (for $i = 1, \dots, N$ and $t = 1, \dots, T$) and $\xi_{ij}(t)$ (for $i, j = 1, \dots, N$ and $t = 1, \dots, T-1$).
- M-step: Compute $\hat\pi_i$, $\hat p_{ij}$, and $\hat b_i(o)$ for $i, j = 1, \dots, N$ and $o \in A$.
- Repeat: Compute the model likelihood $P(O \mid \lambda)$. Check whether the termination criterion is met and, if not, go back to the E-step with the updated $\lambda_{t+1}$.

## The HMM package in R

For hidden Markov models, numerous R packages became available in the first decade of this century for the simulation of data and the execution of the algorithms we discussed (Forward, Backward, Viterbi, Baum-Welch, EM). The most prominent examples of such packages are HMM and HiddenMarkov. The different packages have many overlapping functionalities.

Example: The HMM package can simulate data from a hidden Markov model with specified parameters. Let the state space be $S = \{1, 2\}$ and the emission alphabet $A = \{a, b, c\}$. Specify the initial state distribution $\pi = (0.5, 0.5)$, the transition probability matrix $P$, and the emission probability matrix $B$ (the numerical entries of $P$ and $B$ are missing).

In R, one initializes the model (i.e., defines parameters, state space, and emission alphabet) with the command initHMM(). Then, the command simHMM() can be used to simulate a sequence of $n$ observations generated from the HMM previously defined. The results are random; that is, if you repeat the command, you may get different values.
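What such a simulation does can be illustrated outside of R. The following Python sketch is not the code of the HMM package; it only mimics the generative process (draw the first state from $\pi$, emit a symbol, move to the next state). The state space, alphabet, and $\pi$ follow the example above, while the entries of $P$ and $B$ are made up because they are not available; the function name `simulate_hmm` is hypothetical.

```python
import random

states = [1, 2]
symbols = ["a", "b", "c"]
pi = [0.5, 0.5]
P = [[0.9, 0.1],        # ASSUMED transition probabilities
     [0.2, 0.8]]
B = [[0.6, 0.3, 0.1],   # ASSUMED emission probabilities
     [0.1, 0.2, 0.7]]


def simulate_hmm(n, states, symbols, pi, P, B, seed=None):
    """Draw n (state, observation) pairs from the HMM; returns two lists."""
    rng = random.Random(seed)
    idx = list(range(len(states)))
    q = rng.choices(idx, weights=pi)[0]        # first state from pi
    path, obs = [], []
    for _ in range(n):
        path.append(states[q])
        obs.append(rng.choices(symbols, weights=B[q])[0])   # emit from row q of B
        q = rng.choices(idx, weights=P[q])[0]                # move along row q of P
    return path, obs


path, obs = simulate_hmm(10, states, symbols, pi, P, B, seed=1)
print(path)
print(obs)
```

Fixing the `seed` argument makes a run reproducible; leaving it at `None` gives different sequences on each call, mirroring the remark about randomness above.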

Given the parameters of an HMM (in the form of an initHMM() object), the forward() algorithm finds the probability of observing a specific output sequence together with the state of the last observed output. The probabilities are reported on a log scale.

The backward() algorithm finds the probabilities of observing future outputs, given the hidden state at time $t$ and the complete sequence of observations.

Recall that the viterbi() algorithm finds the most likely sequence of states that generated a given sequence of observations. The user has to provide model parameters in the form of an initHMM() object for this algorithm.

The baumWelch() algorithm finds both the estimated state sequence and estimates of the parameters $(\pi, P, B)$ of the model, given an output sequence and initial parameter estimates. The initial parameter estimates are often noninformative matrices, in which every transition or emission is equally likely.


More information

### Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

### Multiscale Systems Engineering Research Group Hidden Markov Model Prof. Yan Wang Woodruff School of Mechanical Engineering Georgia Institute of echnology Atlanta, GA 30332, U.S.A. yan.wang@me.gatech.edu Learning Objectives o familiarize the hidden

More information

### Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

### Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

### Hidden Markov Models Hidden Markov Models Lecture Notes Speech Communication 2, SS 2004 Erhard Rank/Franz Pernkopf Signal Processing and Speech Communication Laboratory Graz University of Technology Inffeldgasse 16c, A-8010

More information

### 6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

### order is number of previous outputs Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y

More information

### Hidden Markov Models. x 1 x 2 x 3 x K Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:

More information

### Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

More information

### Bioinformatics 2 - Lecture 4 Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

More information

### Hidden Markov Models 1 Hidden Markov Models Dinucleotide Frequency Consider all 2-mers in a sequence {AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT} Given 4 nucleotides: each with a probability of occurrence of. 4 Thus, one

More information

### Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

### Statistical NLP: Hidden Markov Models. Updated 12/15 Statistical NLP: Hidden Markov Models Updated 12/15 Markov Models Markov models are statistical tools that are useful for NLP because they can be used for part-of-speech-tagging applications Their first

More information

### Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) Reading Assignments R. Duda, P. Hart, and D. Stork, Pattern Classification, John-Wiley, 2nd edition, 2001 (section 3.10, hard-copy). L. Rabiner, "A tutorial on HMMs and selected

More information

### Hidden Markov Models (HMMs) November 14, 2017 Hidden Markov Models (HMMs) November 14, 2017 inferring a hidden truth 1) You hear a static-filled radio transmission. how can you determine what did the sender intended to say? 2) You know that genes

More information

### Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

### CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

### A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute

More information

### Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/14/07 CAP5510 1 CpG Islands Regions in DNA sequences with increased

More information

### Chapter 4: Hidden Markov Models Chapter 4: Hidden Markov Models 4.1 Introduction to HMM Prof. Yechiam Yemini (YY) Computer Science Department Columbia University Overview Markov models of sequence structures Introduction to Hidden Markov

More information

### Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n

More information

### STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

### Machine Learning for natural language processing Machine Learning for natural language processing Hidden Markov Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 33 Introduction So far, we have classified texts/observations

More information

### Parametric Models Part III: Hidden Markov Models Parametric Models Part III: Hidden Markov Models Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2014 CS 551, Spring 2014 c 2014, Selim Aksoy (Bilkent

More information

### ROBI POLIKAR. ECE 402/504 Lecture Hidden Markov Models IGNAL PROCESSING & PATTERN RECOGNITION ROWAN UNIVERSITY BIOINFORMATICS Lecture 11-12 Hidden Markov Models ROBI POLIKAR 2011, All Rights Reserved, Robi Polikar. IGNAL PROCESSING & PATTERN RECOGNITION LABORATORY @ ROWAN UNIVERSITY These lecture notes are prepared

More information

### Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related

More information

### Lecture 9. Intro to Hidden Markov Models (finish up) Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is

More information

### Hidden Markov Models Part 2: Algorithms Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:

More information

### BMI/CS 576 Fall 2016 Final Exam BMI/CS 576 all 2016 inal Exam Prof. Colin Dewey Saturday, December 17th, 2016 10:05am-12:05pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.

More information

### 1/22/13. Example: CpG Island. Question 2: Finding CpG Islands I529: Machine Learning in Bioinformatics (Spring 203 Hidden Markov Models Yuzhen Ye School of Informatics and Computing Indiana Univerty, Bloomington Spring 203 Outline Review of Markov chain & CpG island

More information

### Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

### Genome 373: Hidden Markov Models II. Doug Fowler Genome 373: Hidden Markov Models II Doug Fowler Review From Hidden Markov Models I What does a Markov model describe? Review From Hidden Markov Models I A T A Markov model describes a random process of

More information

### Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models. , I. Toy Markov, I. February 17, 2017 1 / 39 Outline, I. Toy Markov 1 Toy 2 3 Markov 2 / 39 , I. Toy Markov A good stack of examples, as large as possible, is indispensable for a thorough understanding

More information

### Hidden Markov Methods. Algorithms and Implementation Hidden Markov Methods. Algorithms and Implementation Final Project Report. MATH 127. Nasser M. Abbasi Course taken during Fall 2002 page compiled on July 2, 2015 at 12:08am Contents 1 Example HMM 5 2 Forward

More information

### Math 350: An exploration of HMMs through doodles. Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or

More information

### HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

### Expectation Maximization (EM) Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are

More information

### GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

### CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm + September13, 2016 Professor Meteer CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm Thanks to Dan Jurafsky for these slides + ASR components n Feature

More information

### Hidden Markov Models. x 1 x 2 x 3 x K Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)

More information

### Hidden Markov Models Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas CG-Islands Given 4 nucleotides: probability of occurrence is ~ 1/4. Thus, probability of

More information

### Lab 3: Practical Hidden Markov Models (HMM) Advanced Topics in Bioinformatics Lab 3: Practical Hidden Markov Models () Maoying, Wu Department of Bioinformatics & Biostatistics Shanghai Jiao Tong University November 27, 2014 Hidden Markov Models

More information

### DNA Feature Sensors. B. Majoros DNA Feature Sensors B. Majoros What is Feature Sensing? A feature is any DNA subsequence of biological significance. For practical reasons, we recognize two broad classes of features: signals short, fixed-length

More information

### Hidden Markov Models Hidden Markov Models CI/CI(CS) UE, SS 2015 Christian Knoll Signal Processing and Speech Communication Laboratory Graz University of Technology June 23, 2015 CI/CI(CS) SS 2015 June 23, 2015 Slide 1/26 Content

More information

### 3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

### Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

### Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

### Pair Hidden Markov Models Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]

More information

### Hidden Markov Models. Hosein Mohimani GHC7717 Hidden Markov Models Hosein Mohimani GHC7717 hoseinm@andrew.cmu.edu Fair et Casino Problem Dealer flips a coin and player bets on outcome Dealer use either a fair coin (head and tail equally likely) or

More information

### p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

### Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

### CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

### CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information