Data Mining in Bioinformatics HMM


Data Mining in Bioinformatics: HMM, Microarray

Problem: Major Objective
- Discover a comprehensive theory of life's organization at the molecular level

Data Mining in Bioinformatics
- Many genomes are being sequenced
- A resulting problem in computational biology: finding genes in DNA sequences

Gene Finding
- Gene finding refers to identifying stretches of nucleotide sequence in genomic DNA that are biologically functional
- Computational gene finding deals with algorithmically identifying protein-coding genes

Gene Finding
- Gene finding is not an easy task: gene structure can be very complex
- The coding part of a gene is discontinuous:
  - Exons: regions that encode a sequence of amino acids
  - Introns: non-coding polynucleotide sequences that interrupt the coding sequences (the exons) of a gene

- In gene finding there are some important biological rules (illustrated in the sketch below):
  - Translation starts with a start codon (ATG)
  - Translation ends with a stop codon (TAG, TGA, TAA)
  - An exon can never follow another exon without an intron in between
  - A complete gene can never end with an intron
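These rules can be illustrated with a crude scanner. The sketch below is not from the slides and ignores introns, splice sites, and the coding statistics that an HMM-based gene finder actually models; it only demonstrates the start/stop-codon constraints on a toy string.

```python
# Minimal sketch (not from the slides): scan a DNA string for substrings that
# respect the start/stop-codon rules listed above.
START = "ATG"
STOPS = {"TAG", "TGA", "TAA"}

def candidate_orfs(dna):
    """Yield (start, end) of substrings that begin with ATG and end at the
    first in-frame stop codon. Real gene finders also model introns/exons."""
    dna = dna.upper()
    for i in range(len(dna) - 2):
        if dna[i:i + 3] != START:
            continue
        for j in range(i + 3, len(dna) - 2, 3):   # stay in the same reading frame
            if dna[j:j + 3] in STOPS:
                yield (i, j + 3)                  # include the stop codon
                break

if __name__ == "__main__":
    print(list(candidate_orfs("CCATGAAATTTGGGTAGCC")))  # [(2, 17)]
```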

Gene Finder
- HMMs can be applied efficiently to well-known biological problems, such as:
  - Protein secondary structure recognition
  - Multiple sequence alignment
  - Gene finding

Hidden Markov Models in Bioinformatics
- An HMM is a statistical model for sequences of discrete symbols.
- HMMs are well suited to the gene finding task: categorizing nucleotides within a genomic sequence can be interpreted as a classification problem over a set of ordered observations that possess hidden structure, which is exactly the kind of problem hidden Markov models are designed for.

Example of a Markov Model

Markov Chains (figure: rainy / sunny / cloudy state diagram)
- States: three states - sunny, cloudy, rainy
- State transition matrix: the probability of today's weather given the previous day's weather
- Initial distribution: the probability of the system being in each of the states at time 0
- For each Markov chain, a transition matrix stores the transition probability for each combination of states.
- As the order (memory length) of the Markov chain increases, the size of the matrix increases exponentially and the matrix becomes extremely sparse.
- Processing time also increases accordingly.

Example of a Hidden Markov Model

Markov chain: an example
Weather model:
- 3 states {rainy, cloudy, sunny}
Problem:
- Forecast the weather state, based on the current weather state

Markov Chain Model Definition
- N states {S_1, S_2, ..., S_N}
- Sequence of states Q = {q_1, q_2, ...}
- Initial probabilities π = {π_1, π_2, ..., π_N}, with π_i = P(q_1 = S_i)
- Transition matrix A (N×N), with a_ij = P(q_{t+1} = S_j | q_t = S_i)
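To make the definition concrete, here is a small sketch (not from the slides; the transition values are invented for illustration) of the three-state weather chain being sampled forward in time.

```python
# Minimal sketch of the weather Markov chain (pi, A); values are made up.
import random

states = ["rainy", "cloudy", "sunny"]
pi = [0.3, 0.4, 0.3]                      # pi_i = P(q_1 = S_i)
A = [[0.5, 0.3, 0.2],                     # a_ij = P(q_{t+1} = S_j | q_t = S_i)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]

def sample_chain(T, rng=random.Random(0)):
    """Generate a state sequence q_1..q_T from the Markov chain (pi, A)."""
    q = rng.choices(range(3), weights=pi)[0]
    seq = [states[q]]
    for _ in range(T - 1):
        q = rng.choices(range(3), weights=A[q])[0]
        seq.append(states[q])
    return seq

print(sample_chain(5))   # prints a 5-day state sequence
```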

Mixture Models: an example
Weather model:
- 3 hidden states {rainy, cloudy, sunny}
- Measure weather-related variables (e.g. temperature, humidity, barometric pressure)
Problem:
- Given the values of the weather variables, what is the state?

Gaussian Mixture Model Definition
- N states observed through an observation x
- Model parameters θ = {p_1...p_N, μ_1...μ_N, Σ_1...Σ_N}
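Written out explicitly (standard GMM formulas, assumed here rather than taken from the slide, whose equations are not in the transcription), the mixture density and the state posterior that answers "what is the state?" are:

```latex
p(x \mid \theta) = \sum_{i=1}^{N} p_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
\qquad
P(\text{state}=i \mid x) =
\frac{p_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i)}
     {\sum_{j=1}^{N} p_j \, \mathcal{N}(x;\, \mu_j, \Sigma_j)} .
```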

HMM: an example
Weather model:
- 3 hidden states {rainy, cloudy, sunny}
- Measure weather-related variables (e.g. temperature, humidity, barometric pressure)
Problem:
- Forecast the weather state, given the current weather variables

Hidden Markov Model Definition (1/2)
- N hidden states {S_1, S_2, ..., S_N}
- Sequence of states Q = {q_1, q_2, ...}
- Sequence of observations O = {O_1, O_2, ...}

Hidden Markov Model Definition (2/2)
- λ = (A, B, π): hidden Markov model
- A = {a_ij}: state transition probabilities (similar to a Markov chain), a_ij = P(q_{t+1} = S_j | q_t = S_i)
- B = {b_i(v)}: observation probability distribution (similar to a mixture model), b_i(v) = P(O_t = v | q_t = S_i)
- π = {π_i}: initial state distribution (similar to a Markov chain), π_i = P(q_1 = S_i)

HMM Graph (figure: transitions as in a Markov chain, emissions as in a mixture model)
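A hedged sketch of what λ = (A, B, π) looks like as data, together with the generative process the HMM graph describes (the emission values are invented for illustration; only the structure matters):

```python
# Minimal sketch (not from the slides): lambda = (A, B, pi) for the weather
# example, plus a generator showing how the hidden chain emits observations.
import random

states = ["rainy", "cloudy", "sunny"]          # hidden states S_i
symbols = ["low", "medium", "high"]            # discretized humidity readings
pi = [0.3, 0.4, 0.3]                           # pi_i = P(q_1 = S_i)
A = [[0.5, 0.3, 0.2],                          # a_ij = P(q_{t+1}=S_j | q_t=S_i)
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
B = [[0.1, 0.3, 0.6],                          # b_i(v) = P(O_t = v | q_t = S_i)
     [0.3, 0.5, 0.2],
     [0.7, 0.2, 0.1]]

def generate(T, rng=random.Random(1)):
    """Sample a (hidden state path, observation sequence) pair from lambda."""
    q = rng.choices(range(3), weights=pi)[0]
    path, obs = [], []
    for _ in range(T):
        path.append(states[q])
        obs.append(symbols[rng.choices(range(3), weights=B[q])[0]])
        q = rng.choices(range(3), weights=A[q])[0]
    return path, obs

print(generate(4))
```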

The three basic problems
- Evaluation: given O and λ, compute P(O | λ)
- Uncover the hidden part (decoding): given O and λ, find the Q that maximizes P(Q | O, λ)
- Learning: given {O}, find the λ that maximizes P(O | λ)

Evaluation
- Given O and λ, compute P(O | λ)
- Solved by using the forward-backward procedure
- Applications:
  - Evaluation of a sequence of observations
  - Finding the most suitable HMM
  - Used in the other two problems

Uncover the hidden part (decoding)
- Given O and λ, find the Q that maximizes P(Q | O, λ)
- Solved by the Viterbi algorithm
- Applications:
  - Find the real states
  - Learn about the structure of the model
  - Estimate statistics of the states
  - Used in the learning problem

Learning
- Given {O}, find the λ that maximizes P(O | λ)
- No analytic solution
- Usually solved by Baum-Welch (an EM variant)
- Applications:
  - Unsupervised learning (single HMM)
  - Supervised learning (multiple HMMs)
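The Viterbi algorithm itself is not shown in the slides; a compact sketch of the standard dynamic-programming recursion, using the λ = (A, B, π) layout from the sketches above, could look like this:

```python
def viterbi(obs, A, B, pi):
    """Return the most likely state index path for an observation index
    sequence. A[i][j] = P(S_j | S_i), B[i][k] = P(v_k | S_i), pi[i] = P(q_1 = S_i)."""
    N, T = len(pi), len(obs)
    delta = [[0.0] * N for _ in range(T)]   # best path probability so far
    psi = [[0] * N for _ in range(T)]       # back-pointers
    for i in range(N):
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
    # backtrack from the best final state
    path = [max(range(N), key=lambda i: delta[T - 1][i])]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return list(reversed(path))
```

With the weather model sketched above, `viterbi(obs, A, B, pi)` returns the index sequence of the most likely hidden weather states for an observed symbol sequence `obs`.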

Hidden Markov models
- Set of states: {s_1, s_2, ..., s_N}
- The process moves from one state to another, generating a sequence of states s_{i1}, s_{i2}, ..., s_{ik}, ...
- Markov chain property: the probability of each subsequent state depends only on the previous state: P(s_{ik} | s_{i1}, s_{i2}, ..., s_{ik-1}) = P(s_{ik} | s_{ik-1})
- States are not visible, but each state randomly generates one of M observations (visible symbols) {v_1, v_2, ..., v_M}

Hidden Markov models
- To define a hidden Markov model, the following probabilities have to be specified:
  - Matrix of transition probabilities A = (a_ij), a_ij = P(s_j | s_i)
  - Matrix of observation probabilities B = (b_i(v_m)), b_i(v_m) = P(v_m | s_i)
  - Vector of initial probabilities p = (p_i), p_i = P(s_i)
- The model is represented by M = (A, B, p).

Word recognition
- Typed word recognition; assume all characters are separated.
- A character recognizer outputs the probability of an image being a particular character, P(image | character), for example:

  character:      a      b      c     ...   z
  P(image | .):   0.5    0.03   0.005 ...   0.31

  (hidden state = character, observation = image)

Word recognition
- Hidden states of the HMM = characters.
- Observations = typed images of characters segmented from the word image, v_a. Note that there is an infinite number of possible observations.
- Observation probabilities = character recognizer scores: B = (b_i(v_a)) = (P(v_a | s_i))
- Transition probabilities will be defined differently in the two subsequent models.

Word recognition
- If a lexicon is given, we can construct a separate HMM model for each lexicon word, e.g. a left-to-right chain of states for "Amherst" (a-m-h-e-r-s-t) and one for "Buffalo" (b-u-f-f-a-l-o).
- Here recognition of a word image is equivalent to evaluating a few HMM models. This is an application of the Evaluation problem.
- Alternatively, we can construct a single HMM for all words:
  - Hidden states = all characters in the alphabet.
  - Transition probabilities and initial probabilities are calculated from a language model.
  - Observations and observation probabilities are as before.
- Here we have to determine the best sequence of hidden states, the one that most likely produced the word image. This is an application of the Decoding problem.

Character recognition with HMM
- The structure of hidden states is chosen; observations are feature vectors extracted from vertical slices.
- Probabilistic mapping from hidden state to feature vectors:
  1. use a mixture of Gaussian models, or
  2. quantize the feature vector space.
- The structure of hidden states: s_1 -> s_2 -> s_3
- Observation = number of islands in the vertical slice.

HMM for character 'A':
- Transition probabilities {a_ij}:
  ( .8  .2   0 )
  (  0  .8  .2 )
  (  0   0   1 )
- Observation probabilities {b_jk}:
  ( .9  .1   0 )
  ( .1  .8  .1 )
  ( .9  .1   0 )

HMM for character 'B':
- Transition probabilities {a_ij}:
  ( .8  .2   0 )
  (  0  .8  .2 )
  (  0   0   1 )
- Observation probabilities {b_jk}:
  ( .9  .1   0 )
  (  0  .2  .8 )
  ( .6  .4   0 )

Suppose that after character image segmentation the following sequence of island numbers in 4 slices was observed: {1, 3, 2, 1}. Which HMM is more likely to have generated this observation sequence, the HMM for 'A' or the HMM for 'B'? (A small forward-algorithm sketch answering this follows below.)
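The slides leave this as a question; the forward algorithm can answer it numerically. One assumption is needed that the slides do not state: the left-to-right character models are taken to start in s_1, i.e. π = (1, 0, 0).

```python
# Minimal sketch (not from the slides) of the forward algorithm applied to
# the question above, assuming pi = (1, 0, 0) for both character models.
A_trans = [[.8, .2, 0.0], [0.0, .8, .2], [0.0, 0.0, 1.0]]   # shared {a_ij}

B_A = [[.9, .1, 0.0], [.1, .8, .1], [.9, .1, 0.0]]          # {b_jk} for 'A'
B_B = [[.9, .1, 0.0], [0.0, .2, .8], [.6, .4, 0.0]]         # {b_jk} for 'B'

def forward(obs, A, B, pi=(1.0, 0.0, 0.0)):
    """P(O | lambda) via the forward recursion; obs holds 0-based symbol indices."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(alpha))) * B[j][o]
                 for j in range(len(alpha))]
    return sum(alpha)

obs = [0, 2, 1, 0]            # island counts {1, 3, 2, 1} mapped to indices 0..2
print("P(O | 'A') =", forward(obs, A_trans, B_A))
print("P(O | 'B') =", forward(obs, A_trans, B_B))
```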

Law of Total Probability
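The formula behind this slide title is not reproduced in the transcription; stated for the HMM evaluation problem (standard result, assumed rather than recovered from the slide), the law of total probability decomposes the observation likelihood over all hidden state paths:

```latex
P(O \mid \lambda) \;=\; \sum_{Q} P(O, Q \mid \lambda)
\;=\; \sum_{q_1, \ldots, q_T} \pi_{q_1}\, b_{q_1}(O_1)
      \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(O_t).
```

The forward algorithm evaluates this sum in O(N^2 T) time instead of enumerating the exponentially many paths.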

- The algorithm makes use of the principle of dynamic programming.
- It efficiently computes, in two passes, the values required to obtain the posterior marginal distributions.
- The first pass goes forward in time while the second goes backward in time; hence the name forward-backward algorithm.
- In the first pass, the algorithm computes a set of forward probabilities which provide the probability of ending up in any particular state given the first k observations.
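In standard notation (the slides' own derivation is in untranscribed figures, so the usual definition is assumed here), the forward probabilities α are computed recursively:

```latex
\alpha_t(i) = P(O_1, \ldots, O_t,\; q_t = S_i \mid \lambda),
\qquad
\alpha_1(i) = \pi_i\, b_i(O_1),
\qquad
\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(O_{t+1}).
```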


- In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point k.
- These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time, given the entire observation sequence.
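Correspondingly (again assuming the standard definitions, since the slide formulas are not in the transcription), the backward probabilities and the combined state posterior are:

```latex
\beta_t(i) = P(O_{t+1}, \ldots, O_T \mid q_t = S_i, \lambda),
\qquad
\beta_T(i) = 1,
\qquad
\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j),
```

```latex
\gamma_t(i) = P(q_t = S_i \mid O, \lambda)
            = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} .
```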


Expectation Maximization
- Expectation (E) step: creates a function for the expectation of the log-likelihood, evaluated using the current estimate of the parameters.
- Maximization (M) step: computes parameters maximizing the expected log-likelihood found in the E step.
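For HMMs this EM procedure is the Baum-Welch algorithm mentioned under the learning problem. A condensed, unscaled sketch of one re-estimation iteration (not from the slides; suitable only for short sequences, since without scaling the probabilities underflow):

```python
def baum_welch_step(obs, A, B, pi):
    """One EM (Baum-Welch) re-estimation of (A, B, pi) from a single
    observation sequence `obs` of 0-based symbol indices. No scaling."""
    N, M, T = len(pi), len(B[0]), len(obs)
    # E step: forward (alpha) and backward (beta) probabilities
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
    lik = sum(alpha[T - 1])                       # P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M step: parameters maximizing the expected complete-data log-likelihood
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_A, new_B, new_pi
```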


Finding genes in a DNA sequence
This is one of the most challenging and interesting problems in computational biology at the moment. With so many genomes being sequenced so rapidly, it remains important to begin by identifying genes computationally.


Example HMM (figure: state diagram with transition and output probabilities)
Scoring a sequence with an HMM: the probability of ACCY along this path is
.4 * .3 * .46 * .6 * .97 * .5 * .015 * .73 * .01 * 1 = 1.76 x 10^-6.
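The state diagram behind these numbers is not reproduced in the transcription, but the quoted arithmetic can be checked directly; each factor is a transition or output probability along the assumed path:

```python
# Sketch: multiply the transition/output probabilities quoted above for ACCY.
import math

factors = [.4, .3, .46, .6, .97, .5, .015, .73, .01, 1]
p = math.prod(factors)
print(f"{p:.2e}")   # 1.76e-06, matching the slide
```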


- When using HMMs, we first have to specify a model.
- When choosing the model, we have to take its complexity into consideration, such as the number of states and the allowed transitions.