Hidden Markov Models

Similar documents
Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

EECS730: Introduction to Bioinformatics

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Today s Lecture: HMMs

Hidden Markov Models

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Lecture 3: Markov chains.

Hidden Markov Models

Sequences and Information

Statistical NLP: Hidden Markov Models. Updated 12/15

Markov Chains and Hidden Markov Models. = stochastic, generative models

STA 414/2104: Machine Learning

STA 4273H: Statistical Machine Learning

Computational Genomics and Molecular Biology, Fall

Statistical Methods for NLP

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

HIDDEN MARKOV MODELS

Hidden Markov Models Hamid R. Rabiee

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Data Mining in Bioinformatics HMM

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Hidden Markov Models

O 3 O 4 O 5. q 3. q 4. Transition

CSCE 471/871 Lecture 3: Markov Chains and

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Parametric Models Part III: Hidden Markov Models

Hidden Markov Models. Three classic HMM problems

Pair Hidden Markov Models

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

Stephen Scott.

Hidden Markov Models NIKOLAY YAKOVETS

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

Stephen Scott.

Computational Genomics and Molecular Biology, Fall

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Hidden Markov Models for biological sequence analysis I

Hidden Markov Models. Terminology, Representation and Basic Problems

LECTURER: BURCU CAN Spring

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands

A.I. in health informatics lecture 8 structured learning. kevin small & byron wallace

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Hidden Markov Models

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

CS 7180: Behavioral Modeling and Decision- making in AI

Linear Dynamical Systems (Kalman filter)

Hidden Markov Models (I)

VL Algorithmen und Datenstrukturen für Bioinformatik ( ) WS15/2016 Woche 16

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

Introduction to Machine Learning CMU-10701

Biology 644: Bioinformatics

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

HMM: Parameter Estimation

11.3 Decoding Algorithm

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Dept. of Linguistics, Indiana University Fall 2009

Biochemistry 324 Bioinformatics. Hidden Markov Models (HMMs)

Recap: HMM. ANLP Lecture 9: Algorithms for HMMs. More general notation. Recap: HMM. Elements of HMM: Sharon Goldwater 4 Oct 2018.

10. Hidden Markov Models (HMM) for Speech Processing. (some slides taken from Glass and Zue course)

Lecture 11: Hidden Markov Models

Lecture 7 Sequence analysis. Hidden Markov Models

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II

Hidden Markov Models Part 2: Algorithms

One-Parameter Processes, Usually Functions of Time

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Hidden Markov Models

Hidden Markov Models

Hidden Markov Models

Hidden Markov Models for biological sequence analysis

Hidden Markov Models. based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes

Graphical models for part of speech tagging

A gentle introduction to Hidden Markov Models

Lecture 12: Algorithms for HMMs

Hidden Markov Models

Hidden Markov Models (HMMs) and Profiles

Hidden Markov Methods. Algorithms and Implementation

CAP 5510 Lecture 3 Protein Structures

Example: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding

HMM part 1. Dr Philip Jackson

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay

15-381: Artificial Intelligence. Hidden Markov Models (HMMs)

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Hidden Markov Models. music recognition. deal with variations in - pitch - timing - timbre 2

MACHINE LEARNING 2 UGM,HMMS Lecture 7

Lecture 3: ASR: HMMs, Forward, Viterbi

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Dynamic Approaches: The Hidden Markov Model

Basics of protein structure

What s an HMM? Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) for Information Extraction

Lecture 12: Algorithms for HMMs

Supervised Learning Hidden Markov Models. Some of these slides were inspired by the tutorials of Andrew Moore

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Statistical Sequence Recognition and Training: An Introduction to HMMs

Advanced Data Science

Assignments for lecture Bioinformatics III WS 03/04. Assignment 5, return until Dec 16, 2003, 11 am. Your name: Matrikelnummer: Fachrichtung:

Hidden Markov Models (Part 1)

Transcription:

Hidden Markov Models A selection of slides taken from the following: Chris Bystroff Protein Folding Initiation Site Motifs Iosif Vaisman Bioinformatics and Gene Discovery Colin Cherry Hidden Markov Models in Bioinformatics P. Agius Spring 2008

Sequence = Structure Structure = Function Function = Life Sequence = Life

Protein sequence grammar 1. Letters: Amino acid profiles 2. Words: I-sites motifs 3. Phrases: a hidden Markov model 4. Sentences

Motif grammar? Arrangement of I-sites motifs in proteins is highly non-random helix helix cap beta strand beta turn The dependencies can be modeled as a Markov chain

How to make a Markov chain Sequence data The dog bit the mailman. The mailman kicked the dog back. Markov model the dog mailman bit kicked back Stochastic output The dog back. The mailman kicked the mailman kicked the dog bit the dog bit the dog bit the mailman kicked the dog....

A "hidden" Markov model What's "hidden" about it? An HMM is a Markov chain where the meaning of the Markov state is probabilistic.

Basics of Hidden Markov Models Probabilistic, generative, graphical models HMMs consist of states and state transitions Each state emits characters of an alphabet with characteristic probabilities A sequence is generated by a random path through the model guided by the transition probabilities HMMs are applied to problems where the state sequence is typically unknown. Purpose of using an HMM: Sequence classification: is a given sequence generated by a given HMM? Parsing: which residue of the sequence is generated by which part if the model? EPFL Bioinformatics I 05 Dec 2005

Specification of an HMM N - number of states Q = {q 1 ; q 2 ; : : : ;q T } - set of states M - the number of symbols (observables) O = {o 1 ; o 2 ; : : : ;o T } - set of symbols

Specification of an HMM A - the state transition probability matrix aij = P(q t+1 = j q t = i) B- observation probability distribution b j (k) = P(o t = k q t = j) i k M π - the initial state distribution

Specification of an HMM Full HMM is thus specified as a triplet: λ = (A,B,π)

Sequence alignment data How to make a hidden Markov chain The dog bit the mailman. The mailman kicked the dog back. The dog attacked the postman. The postman hit the dog. hidden Markov model the dog mailman 0.5 postman 0.5 bit 0.3 attacked 0.7 kicked 0.6 hit 0.4 back Stochastic output The dog back. The mailman kicked the postman kicked the dog bit the dog bit the dog attacked the mailman kicked the dog....

Markov Model (or Markov Chain) A T C T A G Probability for each character based only on several preceding characters in the sequence # of preceding characters = order of the Markov Model Probability of a sequence P(s) = P[A] P[A,T] P[A,T,C] P[T,C,T] P[C,T,A] P[T,A,G]

Hidden Markov Models States -- well defined conditions Edges -- transitions between the states A T C G T A C ATGAC ATTAC ACGAC ACTAC Each transition asigned a probability. Probability of the sequence: single path with the highest probability ---Viterbi path sum of the probabilities over all paths --Baum-Welch method

Measures of Prediction Accuracy Nucleotide Level TN FN TP FP TN FN TP FN TN REALITY PREDICTION PREDICTIO c nc REALITY c nc TP FP FN TN Sensitivity S n = TP / (TP + FN) Specificity S p = TP / (TP + FP)

HMM Advantages Statistical Grounding Statisticians are comfortable with the theory behind hidden Markov models Freedom to manipulate the training and verification processes Mathematical / theoretical analysis of the results and processes HMMs are still very powerful modeling tools far more powerful than many statistical methods

HMM Advantages continued Modularity HMMs can be combined into larger HMMs Transparency of the Model Assuming an architecture with a good design People can read the model and make sense of it The model itself can help increase understanding

HMM Advantages continued Incorporation of Prior Knowledge Incorporate prior knowledge into the architecture Initialize the model close to something believed to be correct Use prior knowledge to constrain training process

HMM Disadvantages Markov Chains States are supposed to be independent P(x) P(y) P(y) must be independent of P(x), and vice versa This usually isn t true Can get around it when relationships are local Not good for RNA folding problems

HMM Disadvantages continued Standard Machine Learning Problems Watch out for local maxima Model may not converge to a truly optimal parameter set for a given training set Avoid over-fitting You re only as good as your training set More training is not always good

Speed!!! HMM Disadvantages continued Almost everything one does in an HMM involves: enumerating all possible paths through the model There are efficient ways to do this Still slow in comparison to other methods