Hidden Markov Models and some applications

Similar documents
Hidden Markov Models and some applications

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

HIDDEN MARKOV MODELS

Hidden Markov Models

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Hidden Markov Models

Markov Chains and Hidden Markov Models. = stochastic, generative models

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

Hidden Markov Models. based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Steven L. Scott. Presented by Ahmet Engin Ural

Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models.

Stephen Scott.

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

Hidden Markov Models

Hidden Markov Models 1

Hidden Markov Models. Three classic HMM problems

O 3 O 4 O 5. q 3. q 4. Transition

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands

Hidden Markov Models for biological sequence analysis

Biology 644: Bioinformatics

Hidden Markov Models for biological sequence analysis I

Hidden Markov Models (I)

15-381: Artificial Intelligence. Hidden Markov Models (HMMs)

Hidden Markov Models

Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University

Note Set 5: Hidden Markov Models

Gibbs Sampling Methods for Multiple Sequence Alignment

HMM: Parameter Estimation

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Hidden Markov Models (HMMs) November 14, 2017

Bioinformatics 2 - Lecture 4

Probabilistic Machine Learning

Lecture 9. Intro to Hidden Markov Models (finish up)

VL Algorithmen und Datenstrukturen für Bioinformatik ( ) WS15/2016 Woche 16

ROBI POLIKAR. ECE 402/504 Lecture Hidden Markov Models IGNAL PROCESSING & PATTERN RECOGNITION ROWAN UNIVERSITY

Hidden Markov Models. music recognition. deal with variations in - pitch - timing - timbre 2

11.3 Decoding Algorithm

L23: hidden Markov models

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

CSCE 471/871 Lecture 3: Markov Chains and

Data Mining in Bioinformatics HMM

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

STA 414/2104: Machine Learning

HMMs and biological sequence analysis

{ p if x = 1 1 p if x = 0

Assignments for lecture Bioinformatics III WS 03/04. Assignment 5, return until Dec 16, 2003, 11 am. Your name: Matrikelnummer: Fachrichtung:

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

HMM part 1. Dr Philip Jackson

Hidden Markov Models. Hosein Mohimani GHC7717

Introduction to Machine Learning CMU-10701

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

STA 4273H: Statistical Machine Learning

Basic math for biology

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

Computational Biology: Basics & Interesting Problems

Parametric Models Part III: Hidden Markov Models

Hidden Markov Models. Ron Shamir, CG 08

Example: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding

Chapter 4: Hidden Markov Models

Hidden Markov Models. x 1 x 2 x 3 x N

A comparison of reversible jump MCMC algorithms for DNA sequence segmentation using hidden Markov models

Hidden Markov models in population genetics and evolutionary biology

Introduction to Hidden Markov Modeling (HMM) Daniel S. Terry Scott Blanchard and Harel Weinstein labs

Advanced Data Science

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

6.864: Lecture 5 (September 22nd, 2005) The EM Algorithm

Approximate Inference

Lecture 7 Sequence analysis. Hidden Markov Models

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II

1 What is a hidden Markov model?

Introduction to Artificial Intelligence (AI)

Hidden Markov Models

Today s Lecture: HMMs

Expectation-Maximization (EM) algorithm

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

BMI/CS 576 Fall 2016 Final Exam

Introduction to Stochastic Processes

Markov Models and Hidden Markov Models

MCMC: Markov Chain Monte Carlo

HIDDEN MARKOV MODELS IN SPEECH RECOGNITION

Computational Genomics and Molecular Biology, Fall

Hidden Markov Models in Language Processing

Hidden Markov Models. Terminology, Representation and Basic Problems

Pattern Recognition with Hidden Markov Modells

List of Code Challenges. About the Textbook Meet the Authors... xix Meet the Development Team... xx Acknowledgments... xxi

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Topics in Probability Theory and Stochastic Processes Steven R. Dunbar. Examples of Hidden Markov Models

Hidden Markov Models and Gaussian Mixture Models

CS711008Z Algorithm Design and Analysis

Hidden Markov Models. x 1 x 2 x 3 x K

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named

Sequence Analysis. BBSI 2006: Lecture #(χ+1) Takis Benos (2006) BBSI MAY P. Benos

Lecture 15. Probabilistic Models on Graph

6 Markov Chains and Hidden Markov Models

Transcription:

Oleg Makhnin New Mexico Tech Dept. of Mathematics November 11, 2011

HMM description Application to genetic analysis Applications to weather and climate modeling Discussion

HMM description Application to genetic analysis Applications to weather and climate modeling Discussion

HMM description Application to genetic analysis Applications to weather and climate modeling Discussion

HMM description Application to genetic analysis Applications to weather and climate modeling Discussion

HMM description Application to genetic analysis Applications to weather and climate modeling Discussion

HMM description Hidden Markov Models (HMM) a) Markov Chain describing the system state (hidden) Markov Property: the value of X t+1 depends only on immediate past X t b) Emission probabilities describing how the hidden state affects the observable outcomes. Observed states: Y t depends only on X t

HMM past and present Originated in speech processing ( 1970)

HMM past and present Originated in speech processing ( 1970) Popular in

HMM past and present Originated in speech processing ( 1970) Popular in - Fraud detection (Scott 2002)

HMM past and present Originated in speech processing ( 1970) Popular in - Fraud detection (Scott 2002) - Language and music analysis

HMM past and present Originated in speech processing ( 1970) Popular in - Fraud detection (Scott 2002) - Language and music analysis - Bioinformatics (analysis of nucleic acid and protein sequences)

HMM past and present Originated in speech processing ( 1970) Popular in - Fraud detection (Scott 2002) - Language and music analysis - Bioinformatics (analysis of nucleic acid and protein sequences) -......

A toy example Suppose a coin can be switched from fair coin (Head to Tail ratio 1:1) to biased coin (Head to Tail ratio 3:1). It cannot be switched too often because the people will notice. It also cannot stay biased for very long, or the people will become suspicious. Thus, we will have the transition matrix P and the emission matrix G P = fair biased G = next fair next biased 0.95 0.05 0.1 0.9 Heads Tails 0.5 0.5 0.9 0.1 fair biased

Estimation methods Decoding Parameter estimation Let F = fair, B = biased, H = Heads, T = Tails Suppose that P and G matrices are known. For example, let the sequence Y 1:3 = HHH be observed. For every possible sequence of X 1:3, FFF, for example, we can calculate conditional probability P(FFF HHH) = (Bayes Theorem). X,Y,Z {F,B} P(HHH FFF )P(FFF ) P(HHH XYZ)P(XYZ)

Estimation methods Then, one possible decoding method is to determine the most likely sequence of X s to generate the observed Y s. Another method is based on determining P(X t = F Y 1:T ), for all t = 1,..., T. This calculation is very costly (2 T possible values for X 1:T )

Estimation methods Exploiting the Markov property of X can be done iteratively Forward pass: let P F t = P(X t = F Y 1:t ) P F 1 = P(X 1 = F Y 1 ) is easy, then P(X t = F and X t+1 = F Y 1:(t+1) ) is obtained using Markov Property and then P(X t+1 = F Y 1:(t+1) ) P F t+1 is obtained etc. Eventually we get P(X T = F Y 1:T ).

Estimation methods Backward pass: Based on P(X t = F Y 1:T ), apply Markov Property again to get P(X t 1 = F Y 1:T ), etc all the way down to X 1. A similar version of backward pass lets you find the most likely sequence, or to sample the values of X t (conditional on all data) one by one.

Estimation methods If matrices P, G and other parameters are not known, then we can use an iterative sampling method within the Gibbs sampler: sample the sequence X 1:T given the current values of P, G, then update P, G based on current sequence X 1:T etc... etc... until we get a long enough sample of X s. We can use it to estimate the probability that P(X t = F all data) for all t

Output: observed Y, simulated X and the decoded probabilities (red line) of X t = B. A sample trajectory of Y Y[1:200] 1.0 1.4 1.8 0 50 100 150 200 t A sample trajectory of X X[1:200] 1.0 1.4 1.8 0 50 100 150 200 t

HMMs and bioinformatics Genetic code: TTGGTTATCGTTTTTCAC... four letters (bases) A, C, G, T (adenine, cytosine, guanine, thymine). Some sections of DNA code for proteins ( coding regions, about 1.5% of human genome), most do not ( junk DNA ) but might have some function. First uses of HMM in bioinformatics: late 80 s (Churchill) Applications to gene-finding, protein sequence alignment etc etc

CpG islands CpG islands (Irizarry et al, 2009): Areas in the genome with a higher concentration of C and G bases, along with a higher than usual occurrence of CG-pairs. Have some important, still not well understood biological functions, e.g. in regulating gene expression. Original definition of CpG island: a region with at least 200 bp, based on GC content higher than 50% and OE ratio above 0.6 note: OE (observed/expected ratio) is given by p(cg) = OE*p(C)*P(G) This definition misses some regions with the function similar to known CpG islands. Also, it might differ for different species. Need a more flexible definition.

A little problem: CpG appearance cannot be modeled by an HMM (since Y t is conditionally independent of Y t 1, given the hidden state X t ). Solution: run HMM not on single bases, but on the segments of, say, 20 bp. The emission probabilities are assumed Poisson with the means a i 20 p C p G where i = 1 is baseline, and i = 2 is a CpG island state. For Arabidopsis (below), a 1 0.5, a 2 0.9, and p C p G 0.2.

Genetic example: Arabidopsis has a small genome, 157 megabase pairs (Mbp) [human genome 2900 Mbp] popular in plant genetic studies, first plant genome to be sequenced TTGGTTATCGTTTTTCAC...

count of CG s in the chunks of 20 bp: 1 0 0 1 0 0 2 1 4 2 1 1 0 1 0 0 1 0 0 0... number of CpG's 0 200 400 0 200 400 600 800 1000 chunks of 20 number of C or G's 0 4000 8000 0 200 400 600 800 1000 chunks of 20

Bibliography A species-generalized probabilistic model-based definition of CpG islands, Irizarry, Wu & Feinberg, Mamm Genome (2009) 20:674-680 Bayesian methods for Hidden Markov Models, S.L. Scott, Journal of Amer. Stat. Assoc. (2002) 97, 337-351 Twentieth century bipolar seesaw of the Arctic and Antarctic surface air temperatures, Chylek, Folland, Lesins and Dubey, GRL (2010)

QUESTIONS?

THANK YOU! see www.nmt.edu/~olegm/talks/hmm