Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Similar documents
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

HMMs and biological sequence analysis

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

Hidden Markov Models

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models

Hidden Markov Models

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

An Introduction to Bioinformatics Algorithms Hidden Markov Models

HIDDEN MARKOV MODELS

Hidden Markov Models

Hidden Markov Models. Three classic HMM problems

EECS730: Introduction to Bioinformatics

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

Stephen Scott.

Hidden Markov Models (I)

Bioinformatics 1--lectures 15, 16. Markov chains Hidden Markov models Profile HMMs

O 3 O 4 O 5. q 3. q 4. Transition

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models for biological sequence analysis

CSCE 471/871 Lecture 3: Markov Chains and

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Chapter 4: Hidden Markov Models

Hidden Markov Models for biological sequence analysis I

Hidden Markov Models

Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University

Example: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models and some applications

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Gibbs Sampling Methods for Multiple Sequence Alignment

Lecture 9. Intro to Hidden Markov Models (finish up)

0 Algorithms in Bioinformatics, Uni Tübingen, Daniel Huson, SS 2004

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Introduction to Machine Learning CMU-10701

Hidden Markov Models and some applications

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Pair Hidden Markov Models

HMM: Parameter Estimation

Computational Genomics and Molecular Biology, Fall

Hidden Markov Models. based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes

Today s Lecture: HMMs

Hidden Markov Models (HMMs) November 14, 2017

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands

Hidden Markov models in population genetics and evolutionary biology

192 Algorithms in Bioinformatics II, Uni Tübingen, Daniel Huson, SS 2003

Multiple Sequence Alignment using Profile HMM

Markov Chains and Hidden Markov Models. = stochastic, generative models

CS 136a Lecture 7 Speech Recognition Architecture: Training models with the Forward backward algorithm

BMI/CS 576 Fall 2016 Final Exam

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

order is number of previous outputs

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Statistical Machine Learning from Data

Computational Genomics and Molecular Biology, Fall

Markov Models & DNA Sequence Evolution

Lecture 7 Sequence analysis. Hidden Markov Models

Hidden Markov Models (HMMs) and Profiles

Stephen Scott.

Recap: HMM. ANLP Lecture 9: Algorithms for HMMs. More general notation. Recap: HMM. Elements of HMM: Sharon Goldwater 4 Oct 2018.

Pairwise sequence alignment and pair hidden Markov models

Steven L. Scott. Presented by Ahmet Engin Ural

Data Mining in Bioinformatics HMM

HMM : Viterbi algorithm - a toy example

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Basic math for biology

Pairwise alignment using HMMs

6 Markov Chains and Hidden Markov Models

Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models.

Hidden Markov Models. Ron Shamir, CG 08

Hidden Markov Models 1

Bioinformatics 2 - Lecture 4

VL Algorithmen und Datenstrukturen für Bioinformatik ( ) WS15/2016 Woche 16

R. Durbin, S. Eddy, A. Krogh, G. Mitchison: Biological sequence analysis. Cambridge University Press, ISBN (Chapter 3)

Applications of Hidden Markov Models

CS 7180: Behavioral Modeling and Decision- making in AI

Statistical Methods for NLP

Lecture 12: Algorithms for HMMs

List of Code Challenges. About the Textbook Meet the Authors... xix Meet the Development Team... xx Acknowledgments... xxi

CS1820 Notes. hgupta1, kjline, smechery. April 3-April 5. output: plausible Ancestral Recombination Graph (ARG)

Toward Probabilistic Forecasts of Convective Storm Activity for En Route Air Traffic Management

11.3 Decoding Algorithm

STA 414/2104: Machine Learning

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

ROBI POLIKAR. ECE 402/504 Lecture Hidden Markov Models IGNAL PROCESSING & PATTERN RECOGNITION ROWAN UNIVERSITY

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

Evolutionary Models. Evolutionary Models

Info 2950, Lecture 25

Alignment Algorithms. Alignment Algorithms

Design and Implementation of Speech Recognition Systems

Lecture 12: Algorithms for HMMs

Genome 373: Hidden Markov Models II. Doug Fowler

Hidden Markov Models

Human Mobility Pattern Prediction Algorithm using Mobile Device Location and Time Data

Transcription:

CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/14/07 CAP5510 1

CpG Islands Regions in DNA sequences with increased occurrences of substring CG Rare: typically C gets methylated and then mutated into a T. Often around promoter or start regions of genes Few hundred to a few thousand bases long 2/14/07 CAP5510 2

Problem 1: Input: Small sequence S Output: Is S from a CpG island? Build Markov models: M+ and M Then compare 2/14/07 CAP5510 3

2/14/07 CAP5510 4 0.182 0.384 0.355 0.079 T 0.125 0.375 0.339 0.161 G 0.188 0.274 0.368 0.171 C 0.120 0.426 0.274 0.180 A T G C A + 0.292 0.292 0.239 0.177 T 0.208 0.298 0.246 0.248 G 0.302 0.078 0.298 0.322 C 0.210 0.285 0.205 0.300 A T G C A Markov Models

How to distinguish? Compute S = P( x M + ) = P( x M ) L ( i 1) i ( x) log x( i 1) i= 1 p log m x x ( i 1) x x i = L i= 1 r x i r=p/m A C A -0.740-0.913 C 0.419 0.302 G 0.580 1.812 T -0.803-0.685 Score(GCAC) =.461-.913+.419 < 0. GCAC not from CpG island. G T -0.624-1.169 0.461 0.573 0.331 0.393-0.730-0.679 Score(GCTC) =.461-.685+.573 > 0. GCTC from CpG island. 2/14/07 CAP5510 5

Problem 1: Input: Small sequence S Output: Is S from a CpG island? Build Markov Models: M+ & M- Then compare Problem 2: Input: Long sequence S Output: Identify the CpG islands in S. Markov models are inadequate. Need Hidden Markov Models. 2/14/07 CAP5510 6

Markov Models P(A+ A+) + A C G T A+ T+ A 0.180 0.274 0.426 0.120 C 0.171 0.368 0.274 0.188 G 0.161 0.339 0.375 0.125 T 0.079 0.355 0.384 0.182 C+ P(G+ C+) G+ 2/14/07 CAP5510 7

CpG Island + in an ocean of First order Hidden Markov Model P(A+ A+) A+ MM=16, HMM= 64 transition probabilities (adjacent bp) T+ A- T- P(C- A+) C+ P(G+ C+) G+ C- G- 2/14/07 CAP5510 8

Hidden Markov Model (HMM) States Transitions Transition Probabilities Emissions Emission Probabilities What is hidden about HMMs? Answer: The path through the model is hidden since there are many valid paths. 2/14/07 CAP5510 9

How to Solve Problem 2? Solve the following problem: Input: Hidden Markov Model M, parameters, emitted sequence S Output: Most Probable Path How: Viterbi s Algorithm (Dynamic Programming) Define [i,j] = MPP for first j characters of S ending in state i Define P[i,j] = Probability of [i,j] Compute state i with largest P[i,j]. 2/14/07 CAP5510 10

Profile Method 2/14/07 CAP5510 11

Profile Method 2/14/07 CAP5510 12

Profile Method 2/14/07 CAP5510 13

Profile HMMs START STATE 1 STATE 2 STATE 3 STATE 4 STATE 5 STATE 6 END 2/14/07 CAP5510 14

Profile HMMs with InDels Insertions Deletions Insertions & Deletions DELETE 1 DELETE 2 DELETE 3 START STATE 1 STATE 2 STATE 3 STATE 4 STATE 5 STATE 6 END INSERT 3 INSERT 4 2/14/07 CAP5510 15

Profile HMMs with InDels DELETE 1 DELETE 2 DELETE 3 DELETE 4 DELETE 5 DELETE 6 START STATE 1 STATE 2 STATE 3 STATE 4 STATE 5 STATE 6 END INSERT 4 INSERT 4 INSERT 3 INSERT 4 INSERT 4 INSERT 4 Missing transitions from DELETE j to INSERT j and from INSERT j to DELETE j+1. 2/14/07 CAP5510 16

How to model Pairwise Sequence Alignment LEAPVE LAPVIE DELETE Pair HMMs Emit pairs of synbols Emission probs? Related to Sub. Matrices START MATCH END INSERT How to deal with InDels? Global Alignment? Local? Related to Sub. Matrices 2/14/07 CAP5510 17

How to model Pairwise Local Alignments? START Skip Module Align Module Skip Module END How to model Pairwise Local Alignments with gaps? START Skip Module Align Module Skip Module END 2/14/07 CAP5510 18

Standard HMM architectures 2/14/07 CAP5510 19

Standard HMM architectures 2/14/07 CAP5510 20

Standard HMM architectures 2/14/07 CAP5510 21

Profile HMMs from Multiple Alignments HBA_HUMAN VGA--HAGEY HBB_HUMAN V----NVDEV MYG_PHYCA VEA--DVAGH GLB3_CHITP VKG------D GLB5_PETMA VYS--TYETS LGB2_LUPLU FNA--NIPKH GLB1_GLYDI IAGADNGAGV Construct Profile HMM from above multiple alignment. 2/14/07 CAP5510 22

HMM for Sequence Alignment 2/14/07 CAP5510 23

Problem 3: LIKELIHOOD QUESTION Input: Sequence S, model M, state i Output: Compute the probability of reaching state i with sequence S using model M Backward Algorithm (DP) Problem 4: LIKELIHOOD QUESTION Input: Sequence S, model M Output: Compute the probability that S was emitted by model M Forward Algorithm (DP) 2/14/07 CAP5510 24

Problem 5: LEARNING QUESTION Input: model structure M, Training Sequence S Output: Compute the parameters Criteria: ML criterion maximize P(S M, ) HOW??? Problem 6: DESIGN QUESTION Input: Training Sequence S Output: Choose model structure M, and compute the parameters No reasonable solution Standard models to pick from 2/14/07 CAP5510 25

Iterative Solution to the LEARNING QUESTION (Problem 5) Pick initial values for parameters 0 Repeat Run training set S on model M Count # of times transition i j is made Count # of times letter x is emitted from state i Update parameters Until (some stopping condition) 2/14/07 CAP5510 26

Entropy Entropy measures the variability observed in given data. E = c log pc pc Entropy is useful in multiple alignments & profiles. Entropy is max when uncertainty is max. 2/14/07 CAP5510 27