# HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

Size: px
Start display at page:

Transcription

1 I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2017

2 Content Consideration of two aligned sequences TWINSCAN Generalization to approaches for modeling multiple, aligned sequences Phylo-HMM Definition of phylo-hmm Pruning algorithm for calculating the probability of a column (of the multiple alignment) given a phylogenetic model Multivariate HMM

3 TWINSCAN model for gene finding in human and mouse genomes TWINSCAN is an augmented version of the GHMM used in Genscan. Input: syntenic regions in human and mouse genome assumption: the gene structure (exon/intron boundaries) is conserved in these two genomes, and the conserved boundaries are aligned precisely in the pairwise genome alignment Output: the annotation of gene structure

4 TWINSCAN model Human: ACGGCGACUGUGCACGU Mouse: ACUGUGAC GUGCACUU Alignment: : : - : intron exon Why using multiple sequences? 1)multiple sequences gives strong signal (e.g. if a sequence profile of a splicing site is preserved, it is more likely to be a true splicing site); and more importantly, 2)the conservation pattern can be used to discriminate exons and introns: exons tend to be more conserved than introns.

5 TWINSCAN algorithm The key idea: converting a pairwise alignment into a single observation sequence on an expanded alphabet 1. Align the two sequences (e.g. from human and mouse genomes); 2. Use the similar hidden states as Genscan; 3. Design a new alphabet for observation symbols: 4 x 3 = 12 symbols: S = { A-, A:, A, C-, C:, C, G-, G:, G, U-, U:, U } gap ( - ), mismatch ( : ), match ( )

6 Example Human: ACGGCGACUGUGCACGU Mouse: ACUGUGAC GUGCACUU Alignment: : : - : Input to TWINSCAN HMM (observation sequence) A C G: G C: G A C U- G U G C A C G: U Recall, e E (A ) > e I (A ) and e E (A-) < e I (A-) Likely exon will be annotated for the entire region

7 N-SCAN GHMM in TWINSCAN outputs a target genomic sequence and a conservation sequence GHMM in N-SCAN outputs a target genomic sequence and N informant sequences N-SCAN uses Bayesian network representation to calculate the probability of a column Target sequence Using multiple alignments to improve gene prediction. JCB, 2006

8 HMM for multiple aligned sequences Strategy 1: converting the alignment of multiple observation sequences into one observation sequence New alphabet representing convoluted observation symbols, e.g., the TWINSCAN model Not practical for n sequences: the size of the alphabet grows exponentially with O(2 n ). Strategy 2: employing another probabilistic model to emit multiple aligned observation sequences simultaneously (Phylo-HMM model) Strategy 3: emitting multiple aligned observation sequences simultaneously but independently, each following a different emission probability distribution (multivariate HMM)

9 Phylo-HMMs: model multiple alignments of syntenic sequences A phylo-hmm is a probabilistic machine that generates a multiple alignment, column by column, such that each column is defined by a phylogenetic model Unlike single-sequence HMMs, the emission probabilities of phylo-hmms are complex distributions defined by phylogenetic models Molecular evolution can be viewed as a combination of two Markov processes One operates in the dimension of space (along a genome) One operates in the dimension of time (along the branches of a phylogenetic tree) Phylo-HMMs combine phylogeny and HMM

10 Single-sequence HMM Phylo-HMM

11 Phylo-HMMs: formal definition A phylo-hmm can be specified as =(S,, A, b), 1) S = {S 1,S 2,,S M }, a set of states 2) = { 1, 2,, M }, a set of associated phylogenetic models 3) A = {a jk }(1 apple j, k apple M), a matrix of state-transition probabilities 4) b =(b 1,,b M ), a vector of state-initial probabilities a jk is the conditional probability of visiting state k at some site i given that state l is visited at site i 1. b j is the probability that state j is visited first.

12 Questions we can ask using phylo-hmm A path through the phylo-hmm is a sequence of states =( 1,, M), such that i 2 {1,,M} for all 1 apple i apple L. The joint probability of a path and an alignment is, p(,x ) =b 1 P (X 1 1 ) The probability of the observation (likelihood) is, LY a i 1, P (X i i i ) i=2 p(x ) = X P (,X ) The probability of emitting column i given a phylogenetic model The most probable (maxmum-likelihood) path, ˆ = arg max P (,X ) Forward algorithm Viterbi algorithm

13 Phylogenetic models The different phylogenetic models associated with the states of a phylo-hmm may reflect different overall rates of substitution (e.g. in conserved and nonconserved regions), different patterns of substitution or background distributions, or even different tree topologies (as with recombination)

14 Phylogenetic models ψ j = (Q j,π j,τ j, β j ) Q j p j t j b j : substitution rate matrix : background frequencies : binary tree : branch lengths HKY model The model is defined with respect to an alphabet S of size d The substitution rate matrix has dimension d x d The background frequencies vector has dimension d The tree has n leaves, corresponding to n extant taxa in the multiple alignment of observation sequences The branch lengths are associated with the tree How to calculate the likelihood of a column given a model?

15 Brute force approach to likelihood calculation Given a column AACT Brute force strategy: Try all 16 combinations of ancestral states and sum

16 Pruning algorithm Many calculations can be done just once, and then reused many times *The pruning algorithm was introduced by: Felsenstein, J Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: Paul Lewis

17 Pruning algorithm P(X Ψ) (X: a column; Ψ: phylogenetic model; subscript not shown for clarity) can be computed recursively from leaves to root. L i (x i ): the probability of observing leaves in the subtree rooted by i, while the root is assigned to x i (A, T, C or G), L i " \$ ( x i ) = # p xi x j t j %\$ x j ( )L j x j ( ) & \$ " ' (\$ \$ # p x i x k ( t k )L k x k %\$ x k Sum over all possible x j (A, T, C, or G) ( ) j & k: offspring nodes of node i t j & t k: the branch lengths p a,b (t): the probability to observe a substitution from a to b within the evolution time of t. The total probability at the root node r: P(X ψ) = L r x r & \$ ' (\$ x r j t j x j i x i ( ) t k k x k

18 Substitution probabilities Pruning algorithm requires the conditional probabilities of substitution p a,b (t) for all bases a,b Î S and branch lengths t Î b Another way to denote this probability: P(b a,t,ψ) It can be computed using a continuous-time Markov model of substitution, defined by the rate matrix Q P(t) = (e Q ) t = e Qt (matrix multiplication)

19 Example Substitution matrices: P(t=0.1) P(t=0.2) T C A G Branch lengths: t 1 =t 2 =t 3 =t 4 =t 5 =0.2; t 6 =t 7 =t 8 = 0.1. ( ) P(X i ψ j ) = P TCACC T,t 1,t 2,t 3,t 4,t 5,t 6,t 7,t 8,κ = π x0 P x0 x 6 ( t 6 )P x0 x 8 ( t 8 )P x6 x 7 ( t 7 )P x7 T ( t 1 )P x7 C ( t 2 )P x6 A ( t 3 )P x8 C ( t 4 )P x8 C t 5 x 0 x 6 x 7 x 8 P (0.1) = P (0.2) = ( ) [ ] Y Y

20 Example By using pruning algorithm, it can be computed through L i (x i ), from leaves to root. L i (T) L i (C) L i (A) L i (G) P(TCACC)? Summing at the root, P= Initialization

21 Applications of Phylo-HMMs Improving phylogenetic modeling that allow for variation among sites in the rate of substitution (Felsenstein & Churchill, 1996; Yang, 1995) Protein secondary structure prediction (Goldman et al., 1996; Thorne et al., 1996) Detection of recombination from DNA multiple alignments (Husmeier & Wright, 2001) Comparative genomics (Siepel, et. al. Haussler, 2005)--phastCons Inferring sequence regions under functional divergence in duplicate genes (Huang & Golding, Bioinformatics, 2012)

22 phastcons phastcons is based on a two-state phylogenetic hidden Markov model (phylo-hmm), with a state for conserved regions and a state for nonconserved regions; the free parameters of the model were estimated from a multiple alignment by maximum likelihood, using an EM algorithm. Developed for searching for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes) The predicted (conserved) elements cover roughly 3% 8% of the human genome (depending on the details of the calibration procedure) HCEs (highly conserved elements) are associated with the 3 UTRs of regulatory genes, stable gene deserts, and megabasesized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.

23 State-transition diagram for the phylo-hmm used by phastcons, which consists of a state for conserved regions (c) and a state for nonconserved regions (n). Siepel A et al. Genome Res. 2005;15: Cold Spring Harbor Laboratory Press

24 PHAST & RPHAST PHAST and RPHAST: phylogenetic analysis with space/time models (Brief Bioinform. 2011) Include phastcons, phastodds, phylop, dbless, etc

25 HMMDiverge Inferring Sequence Regions under Functional Divergence in Duplicate Genes Bioinformatics (2011) An example with three states: M0 (no functional divergence) M1 & M2 (with functional divergence)

26 Multivariate HMM M M M M S 1 S 2 S L-1 S L E 1 E 1 E 1 E 1 E 2 X 1,1 X 2,1 X L-1 x L E 2 E 2 E 2 X 1,2 X 2,2 X L-1,2 X L,2 E N E N E N E N X 1,N X 2,N X L-1,N X L,N

27 Multivariate HMM (formal definition) A multivariate HMM θ has N sets of observation symbols, each for one given observation sequence n (n=1, 2,, N) A set of hidden states Transition probabilities a ij, for any pair of hidden states i and j Initial probabilities B j =a 0j for any hidden states j N sets of emission probabilities e n s(x n ) for the observation symbol being emitted in the nth observation sequence from the hidden state s.

28 Multivariate HMM Given N observation sequences of the same length L, X={(x 1,1 x 1,L ),,(x N,1 x N,L )} and the hidden state sequence S=(s 1 s L ), the full probability from the multivariate HMM is, ( ) = ' a s j 1 s j e s j x n, j P S,X θ N L j =1 % & N n =1 ( ) Let e ( si x n,1,..., x ) n, j = e ( si x ) n, j, the multivariate HMM n =1 can be reduced to conventional HMM, except the observation symbol becomes a vector (x n,1 x n,j ) at position j. The same algorithms for model inference (Viterbi and forward/backward) and learning can be used. ( * )

### 3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

### 1/22/13. Example: CpG Island. Question 2: Finding CpG Islands

I529: Machine Learning in Bioinformatics (Spring 203 Hidden Markov Models Yuzhen Ye School of Informatics and Computing Indiana Univerty, Bloomington Spring 203 Outline Review of Markov chain & CpG island

### Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

### Today s Lecture: HMMs

Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

### Evolutionary Models. Evolutionary Models

Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

### 6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

### Supplementary Material

Supplementary Material 1 Sequence Data and Multiple Alignments The five vertebrate, four insect, two worm, and seven yeast genomes used in the analysis are summarized in Table S1, and the four genome-wide

### The Phylo- HMM approach to problems in comparative genomics, with examples.

The Phylo- HMM approach to problems in comparative genomics, with examples. Keith Bettinger Introduction The theory of evolution explains the diversity of organisms on Earth by positing that earlier species

### Computational Identification of Evolutionarily Conserved Exons

Computational Identification of Evolutionarily Conserved Exons Adam Siepel Center for Biomolecular Science and Engr. University of California Santa Cruz, CA 95064, USA acs@soe.ucsc.edu David Haussler Howard

### MCMC: Markov Chain Monte Carlo

I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

### O 3 O 4 O 5. q 3. q 4. Transition

Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

### Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall 2011 1 HMM Lecture Notes Dannie Durand and Rose Hoberman October 11th 1 Hidden Markov Models In the last few lectures, we have focussed on three problems

### Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related

### Lecture 7 Sequence analysis. Hidden Markov Models

Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden

### Basic math for biology

Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

### HMMs and biological sequence analysis

HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

### Hidden Markov Models. Terminology, Representation and Basic Problems

Hidden Markov Models Terminology, Representation and Basic Problems Data analysis? Machine learning? In bioinformatics, we analyze a lot of (sequential) data (biological sequences) to learn unknown parameters

### Hidden Markov Models (HMMs) November 14, 2017

Hidden Markov Models (HMMs) November 14, 2017 inferring a hidden truth 1) You hear a static-filled radio transmission. how can you determine what did the sender intended to say? 2) You know that genes

### 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution Lecture 05 Hidden Markov Models Part II 1 2 Module 1: Aligning and modeling genomes Module 1: Computational foundations Dynamic programming:

### Expectation-Maximization (EM) algorithm

I529: Machine Learning in Bioinformatics (Spring 2017) Expectation-Maximization (EM) algorithm Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2017 Contents Introduce

### Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

### Alignment Algorithms. Alignment Algorithms

Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.

### Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)

### CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

### Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

### Lecture 3: Markov chains.

1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.

### 9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

### Genome 373: Hidden Markov Models II. Doug Fowler

Genome 373: Hidden Markov Models II Doug Fowler Review From Hidden Markov Models I What does a Markov model describe? Review From Hidden Markov Models I A T A Markov model describes a random process of

### HMM: Parameter Estimation

I529: Machine Learning in Bioinformatics (Spring 2017) HMM: Parameter Estimation Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2017 Content Review HMM: three problems

### Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

### Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

### Statistical Methods for NLP

Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured

### Note Set 5: Hidden Markov Models

Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional

### Data Mining in Bioinformatics HMM

Data Mining in Bioinformatics HMM Microarray Problem: Major Objective n Major Objective: Discover a comprehensive theory of life s organization at the molecular level 2 1 Data Mining in Bioinformatics

### Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

### STA 414/2104: Machine Learning

STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

### Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:

### Hidden Markov Models

Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

### Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

### Markov Models & DNA Sequence Evolution

7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

### Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n

### Linear Dynamical Systems

Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

### Hidden Markov Models. Terminology and Basic Algorithms

Hidden Markov Models Terminology and Basic Algorithms The next two weeks Hidden Markov models (HMMs): Wed 9/11: Terminology and basic algorithms Mon 14/11: Implementing the basic algorithms Wed 16/11:

### Hidden Markov Models (I)

GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables

### Hidden Markov Models in Bioinformatics 14.11

idden Markov Models in Bioinformatics 14.11 60 min Definition Three Key Algorithms Summing over Unknown States Most Probable Unknown States Marginalizing Unknown States Key Bioinformatic Applications Pedigree

### Applications of hidden Markov models for comparative gene structure prediction

pplications of hidden Markov models for comparative gene structure prediction sger Hobolth 1 and Jens Ledet Jensen 2 1 Bioinformatics Research entre, University of arhus 2 Department of heoretical Statistics,

### VL Algorithmen und Datenstrukturen für Bioinformatik ( ) WS15/2016 Woche 16

VL Algorithmen und Datenstrukturen für Bioinformatik (19400001) WS15/2016 Woche 16 Tim Conrad AG Medical Bioinformatics Institut für Mathematik & Informatik, Freie Universität Berlin Based on slides by

### An Introduction to Bioinformatics Algorithms Hidden Markov Models

Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

### Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

### Information Theoretic Distance Measures in Phylogenomics

Information Theoretic Distance Measures in Phylogenomics Pavol Hanus, Janis Dingel, Juergen Zech, Joachim Hagenauer and Jakob C. Mueller Institute for Communications Engineering Technical University, 829

### Markov Chains and Hidden Markov Models. = stochastic, generative models

Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,

### Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

### STATS 306B: Unsupervised Learning Spring Lecture 5 April 14

STATS 306B: Unsupervised Learning Spring 2014 Lecture 5 April 14 Lecturer: Lester Mackey Scribe: Brian Do and Robin Jia 5.1 Discrete Hidden Markov Models 5.1.1 Recap In the last lecture, we introduced

### Stephen Scott.

1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative

### Linear Dynamical Systems (Kalman filter)

Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete

### Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

### Hidden Markov Models for biological sequence analysis

Hidden Markov Models for biological sequence analysis Master in Bioinformatics UPF 2017-2018 http://comprna.upf.edu/courses/master_agb/ Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA

### Statistical NLP: Hidden Markov Models. Updated 12/15

Statistical NLP: Hidden Markov Models Updated 12/15 Markov Models Markov models are statistical tools that are useful for NLP because they can be used for part-of-speech-tagging applications Their first

### Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics

### Hidden Markov Models Part 2: Algorithms

Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:

### Hidden Markov Models for biological sequence analysis I

Hidden Markov Models for biological sequence analysis I Master in Bioinformatics UPF 2014-2015 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Example: CpG Islands

### Hidden Markov Models

Andrea Passerini passerini@disi.unitn.it Statistical relational learning The aim Modeling temporal sequences Model signals which vary over time (e.g. speech) Two alternatives: deterministic models directly

### Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

### Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

### Phylogenetics. BIOL 7711 Computational Bioscience

Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

### Hidden Markov Models 1

Hidden Markov Models Dinucleotide Frequency Consider all 2-mers in a sequence {AA,AC,AG,AT,CA,CC,CG,CT,GA,GC,GG,GT,TA,TC,TG,TT} Given 4 nucleotides: each with a probability of occurrence of. 4 Thus, one

### Phylogenetics: Building Phylogenetic Trees

1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

### Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models Modeling the statistical properties of biological sequences and distinguishing regions

### Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

### EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

### Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

### p(d θ ) l(θ ) 1.2 x x x

p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

### Hidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208

Hidden Markov Model Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/19 Outline Example: Hidden Coin Tossing Hidden

### Bioinformatics 2 - Lecture 4

Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

### EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

### DNA Feature Sensors. B. Majoros

DNA Feature Sensors B. Majoros What is Feature Sensing? A feature is any DNA subsequence of biological significance. For practical reasons, we recognize two broad classes of features: signals short, fixed-length

### Moreover, the circular logic

Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

### CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2

### Lecture Notes: Markov chains

Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

### L23: hidden Markov models

L23: hidden Markov models Discrete Markov processes Hidden Markov models Forward and Backward procedures The Viterbi algorithm This lecture is based on [Rabiner and Juang, 1993] Introduction to Speech

### Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm

### Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

### Computational Genomics. Uses of evolutionary theory

Computational Genomics 10-810/02 810/02-710, Spring 2009 Model-based Comparative Genomics Eric Xing Lecture 14, March 2, 2009 Reading: class assignment Eric Xing @ CMU, 2005-2009 1 Uses of evolutionary

### Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

### Molecular Evolution and Comparative Genomics

Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model 10-810, CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation

### Hidden Markov Models

Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

### CSCE 471/871 Lecture 3: Markov Chains and

and and 1 / 26 sscott@cse.unl.edu 2 / 26 Outline and chains models (s) Formal definition Finding most probable state path (Viterbi algorithm) Forward and backward algorithms State sequence known State

### Hidden Markov Models. Three classic HMM problems

An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Hidden Markov Models Slides revised and adapted to Computational Biology IST 2015/2016 Ana Teresa Freitas Three classic HMM problems

### BMI/CS 776 Lecture 4. Colin Dewey

BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2

### Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

### Taming the Beast Workshop

Workshop and Chi Zhang June 28, 2016 1 / 19 Species tree Species tree the phylogeny representing the relationships among a group of species Figure adapted from [Rogers and Gibbs, 2014] Gene tree the phylogeny

### Multiple Alignment of Genomic Sequences

Ross Metzger June 4, 2004 Biochemistry 218 Multiple Alignment of Genomic Sequences Genomic sequence is currently available from ENTREZ for more than 40 eukaryotic and 157 prokaryotic organisms. As part

### An Introduction to Bioinformatics Algorithms Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training

### Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

### Combining Phylogenetic and Hidden Markov Models in Biosequence Analysis

Combining Phylogenetic and Hidden Markov Models in Biosequence Analysis Adam Siepel Center for Biomolecular Science and Engr. University of California Santa Cruz, CA 95064, USA acs@soe.ucsc.edu David Haussler

### Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of