Introduction to Hidden Markov Modeling (HMM). Daniel S. Terry, Scott Blanchard and Harel Weinstein labs.


HMMs are useful for many, many problems: speech recognition and translation, weather modeling, sequence alignment, financial modeling.

So let's say you're riding out a nuclear war in a bunker. To keep sane, you want to know what the weather outside is like, but all you can observe is whether the security guard brings his umbrella.

Probabilistic reasoning: the hidden state X (the weather) must be inferred from the observations E (the umbrella). We care about quantities such as P(Sunny | Umbrella), P(Cloudy | Umbrella), P(Rain | Umbrella), P(Sunny | No Umbrella), P(Cloudy | No Umbrella), and P(Rain | No Umbrella). P(X | E) is the probability of X happening given that E is observed.

Probabilistic reasoning in stochastic processes: over time, the hidden state evolves as a sequence X_0, X_1, X_2, X_3, X_4, ..., and each state produces an observation ("emission") E_0, E_1, E_2, E_3, E_4, ... This is called a Markov chain.

Assumptions in Markov modeling
Assumption 1: This is a stationary process, specifically a first-order Markov process: P(X_t | X_{t-1}, X_{t-2}, X_{t-3}, ...) = P(X_t | X_{t-1}). In other words, the current state depends only on the previous state. We call this the transition model.
Assumption 2: The current observation depends only on the current state: P(E_t | X_t, X_{t-1}, X_{t-2}, ..., E_{t-1}, E_{t-2}, E_{t-3}, ...) = P(E_t | X_t). We call this the observation (or emission) model.

The initial and transition probability models: π and A
Initial probabilities π: P(Sunny) = 0.7, P(Cloudy) = 0.15, P(Raining) = 0.15.
Transition model A, giving P(X_t | X_{t-1}):
  from Sunny:   P(Sunny) = 0.70, P(Cloudy) = 0.25, P(Raining) = 0.05
  from Cloudy:  P(Sunny) = 0.33, P(Cloudy) = 0.33, P(Raining) = 0.33
  from Raining: P(Sunny) = 0.20, P(Cloudy) = 0.60, P(Raining) = 0.20
These tables encode prior knowledge about weather trends.

The observation probability model: B
P(E_t = Umbrella | X_t): Sunny 0.05, Cloudy 0.10, Raining 0.85.
This encodes prior knowledge about how likely people are to bring their umbrella depending on weather conditions.

Together these parameters define a hidden Markov model, λ = {π, A, B}: the initial state probabilities π, the state transition probabilities A, and the observation distributions B.
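To make the notation concrete, here is a minimal sketch of the umbrella model's parameters in Python/NumPy, using the numbers from the tables above (variable names are my own; the short sampler at the end is just for illustration):

```python
import numpy as np

# States of the umbrella example.
states = ["Sunny", "Cloudy", "Raining"]

# pi: initial state probabilities.
pi = np.array([0.70, 0.15, 0.15])

# A[i, j] = P(X_t = state j | X_{t-1} = state i).
A = np.array([
    [0.70, 0.25, 0.05],   # from Sunny
    [0.33, 0.33, 0.33],   # from Cloudy (this row sums to 0.99 on the slide)
    [0.20, 0.60, 0.20],   # from Raining
])

# B[i] = P(umbrella seen | X_t = state i); the no-umbrella probability is 1 - B[i].
B_umbrella = np.array([0.05, 0.10, 0.85])

# Draw a short state/observation sequence from the model.
rng = np.random.default_rng(0)
x = rng.choice(3, p=pi)
for t in range(5):
    saw_umbrella = rng.random() < B_umbrella[x]
    print(t, states[x], "umbrella" if saw_umbrella else "no umbrella")
    x = rng.choice(3, p=A[x] / A[x].sum())   # renormalize the 0.99 row
```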

Predicting state sequences from observations: given the observation sequence (t = 1..T) and the Markov model λ = {π, A, B}, predict the hidden state sequence.

Finding the optimal state sequence with Viterbi
Given a model λ = {π, A, B} that describes the system, we can determine the optimal state sequence (idealization) as follows. For each state at time t, calculate the probability that X_t is a particular state x_i (Sunny, Raining, etc.), given the observations and the previous states:
P(X_t = x_i | E_t, E_{t-1}, E_{t-2}, ..., X_{t-1}, X_{t-2}, X_{t-3}, ...) = P(X_t = x_i | E_t, X_{t-1} = x_j) = P(X_t = x_i | E_t) · P(X_t = x_i | X_{t-1} = x_j),
with the initial condition P(X_0 = x_i) = π_i.

Finding the optimal state sequence with Viterbi
Repeat these calculations for all possible transitions, recursively. Then at each point in time we have an estimate of how likely we are to be in each particular state, given all possible previous paths. We also keep track of the most likely previous state at each point in time. (This complex-looking diagram of states laid out over time is called a trellis. Can you see why?)

Finding the optimal state sequence with Viterbi
Find the most likely end state from the final probabilities. We can then backtrack through the trellis to find the most likely state sequence. You have seen a similar procedure with sequence alignment.
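Here is a minimal Viterbi sketch for the umbrella model (my own code, not the authors'), using log probabilities to avoid numerical underflow and assuming the pi, A, and B_umbrella arrays from the earlier sketch:

```python
import numpy as np

def viterbi(obs, pi, A, B_umbrella):
    """Most likely state path for a 0/1 umbrella observation sequence.

    obs: sequence of 0/1 values (1 = umbrella seen at that time point).
    Returns (state index path, log-probability of that path).
    """
    def log_emit(o):
        # Emission probability of this observation for every state.
        p = B_umbrella if o == 1 else 1.0 - B_umbrella
        return np.log(p)

    n_states, T = len(pi), len(obs)
    log_delta = np.zeros((T, n_states))        # best log-prob ending in each state
    backptr = np.zeros((T, n_states), dtype=int)

    log_delta[0] = np.log(pi) + log_emit(obs[0])
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving to state j.
        scores = log_delta[t - 1][:, None] + np.log(A)
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = scores.max(axis=0) + log_emit(obs[t])

    # Backtrack from the most likely final state.
    path = [int(log_delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(log_delta[-1].max())
```

For example, viterbi([1, 1, 0, 1], pi, A, B_umbrella) returns the most likely weather sequence for four days of umbrella sightings.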

Predicting state sequences from observations: given the observation sequence (t = 1..T) and the Markov model λ = {π, A, B}, we can now predict the hidden state sequence.

OK, so I'm bored of talking about the weather. A practical example of Markov modeling: analysis of single-molecule fluorescence trajectories. [Figure: fluorescence and FRET traces over 0-5 min.]

Neurotransmitter release and reuptake are central to neuronal signaling and proper functioning of the brain. (Image: www.nia.nih.gov, public domain.)

Neurotransmitter:Sodium Symporter (NSS) proteins mediate this reuptake and are the targets of many clinically important drugs, including therapeutic inhibitors and drugs of abuse. (Image: www.nia.nih.gov, public domain.)

A practical example of Markov modeling: analysis of single-molecule fluorescence trajectories. NSS transporters move neurotransmitter across the membrane using the sodium gradient (high Na+ outside the cell, low Na+ inside). Key question: what are the specific conformational changes required for such a mechanism, and how do they mediate transport?

Single-molecule FRET: a tool for examining conformational dynamics. [Figure: FRET efficiency between donor and acceptor fluorophores vs. distance (nm), with the characteristic distance R_0 marked.]

FRET imaging of single molecules can be achieved using a few tricks, including total internal reflection (TIR) excitation. [Figure: surface-immobilized molecules excited by 532 nm TIR illumination; donor and acceptor fluorescence and FRET traces over 0-5 min.]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system. Here the sequence of hidden states X_0, X_1, X_2, ... is the conformation of the molecule, and the sequence of observations E_0, E_1, E_2, ... is the measured FRET trace. We want to know: 1) How many distinct states are there? 2) What are their FRET values? 3) What are the rates? 4) What is the most likely state at each point in time? Unlike with the weather, we have to learn the model from the data itself!

Hidden Markov models have three components, λ = {π, A, B}:
1) Initial state probabilities: π
2) Transition probabilities: A = {a_i,j}
3) Observation probability distribution (OPD): for FRET data, each state i is described by a Gaussian,
   b_i(E_t) = (1 / √(2π σ_i²)) · exp( −(E_t − μ_i)² / (2σ_i²) ),
   the FRET distribution for state i, with mean μ_i and width σ_i.
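As a small sketch (names are mine), the Gaussian OPD for state i can be evaluated directly from μ_i and σ_i:

```python
import numpy as np

def gaussian_emission(fret, mu_i, sigma_i):
    """b_i(E_t): probability density of observing a FRET value given state i."""
    return (np.exp(-(fret - mu_i) ** 2 / (2.0 * sigma_i ** 2))
            / np.sqrt(2.0 * np.pi * sigma_i ** 2))

# Example: density of observing FRET = 0.55 in a state centered at 0.6 with width 0.05.
print(gaussian_emission(0.55, 0.6, 0.05))
```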

Goal: the best model to explain the experimental data. In other words, we want to maximize the probability of the model given the data:
λ̂ = argmax_λ P(λ | E)    (where λ is the model and E is the observed FRET trajectory)
But we don't know how to calculate P(λ | E)! Instead, turn it around using Bayes' theorem:
P(λ | E) = P(E | λ) P(λ) / P(E)
The probability of the data, P(E), is independent of the model choice and will not affect model ranking. If we assume all models are equally likely, then:
λ̂ = argmax_λ P(λ | E) = argmax_λ P(E | λ)
P(E | λ) is easy to calculate: it is the observation distribution. Why is X not here? We have to do this over all possible state sequences!

Segmental k-means (SKM): optimization on the cheap
Starting from an initial model λ_0, alternate two steps until convergence: state assignment (Viterbi idealization) and parameter re-estimation, giving an updated model λ_i at each iteration.
To get B, simply calculate the mean and standard deviation of the FRET values assigned to each state. To get A, count the number of transitions of each type and normalize. To get π, count the number of dwells starting in each state x_i and normalize.
This works only if the starting model is close to the final one. F. Qin (2004), Biophys J 86: 1488.
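The loop described above might look roughly like this (a sketch under my own naming, not the published SKM implementation; viterbi_path is a hypothetical helper that returns one state index per frame):

```python
import numpy as np

def skm_iteration(fret, mu, sigma, A, pi):
    """One segmental k-means iteration: hard Viterbi assignment followed by
    re-estimation of B (mu, sigma) and A from the assigned states.
    fret: 1-D NumPy array of FRET values; mu, sigma: arrays, one entry per state."""
    states = np.asarray(viterbi_path(fret, mu, sigma, A, pi))  # hypothetical helper

    n = len(mu)
    # B: mean and standard deviation of the frames assigned to each state.
    for i in range(n):
        pts = fret[states == i]
        if pts.size:
            mu[i], sigma[i] = pts.mean(), pts.std() + 1e-6

    # A: count transitions of each type, then normalize each row.
    counts = np.full((n, n), 1e-12)            # tiny floor avoids divide-by-zero
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    A = counts / counts.sum(axis=1, keepdims=True)

    return mu, sigma, A
```

Iterating until the state assignments stop changing gives the SKM estimate; as noted above, this only behaves well when the starting model is already close to the final one.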

Model optimization: expectation-maximization (EM)
Expectation: calculate the probability of the data given the current model. For a given state path, the initial (π), transition (A), and observation (B) terms factorize as
P(X, E | λ) = P(X_0) · ∏_{t=1..T} P(X_t | X_{t-1}) · ∏_{t=0..T} P(E_t | X_t),
or in log form,
LL = log P(X_0) + Σ_{t=1..T} log P(X_t | X_{t-1}) + Σ_{t=0..T} log P(E_t | X_t).
Maximization: adjust the model parameters to better fit the calculated probabilities.
Termination: iterate until the log-likelihood converges (e.g., ΔLL < 10⁻⁴).
Restarts: if the likelihood landscape is very frustrated, restarting from a random initial model can help get out of local minima.

The forward-backward algorithm (Baum-Welch)
We calculate the probability of each state at a particular point in time t, using both the past and the future of the trace:
P(X_t | E_{1..T}) = P(X_t | E_{1..t}, E_{t+1..T}) ∝ P(X_t | E_{1..t}) · P(E_{t+1..T} | X_t)
The first factor is the forward probability and the second is the backward probability. We can split them like this because of Bayes' rule and the conditional independence of the observations over time. We calculate these terms much like we did with Viterbi.

The forward algorithm
Partial probabilities α are calculated recursively as
α_t(j) = P(observation at time t | hidden state is j) × P(all paths to state j at time t).
Initial condition: α_0(j) = π(j) · B(j, E_0).
Iterate: α_{t+1}(j) = B(j, E_{t+1}) · Σ_{i=1..n} α_t(i) · a_{i,j}.
The total probability of the observation sequence is then the sum of the final α's.
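A minimal forward-pass sketch (my own code): emission_probs[t, j] holds b_j(E_t), and each α row is rescaled to sum to one, a standard trick for avoiding numerical underflow rather than something on the slide:

```python
import numpy as np

def forward(emission_probs, pi, A):
    """Scaled forward variables alpha and the total log-likelihood log P(E | model).

    emission_probs: array of shape (T, n_states) with b_j(E_t) in each row.
    """
    T, n_states = emission_probs.shape
    alpha = np.zeros((T, n_states))

    alpha[0] = pi * emission_probs[0]             # alpha_0(j) = pi(j) * B(j, E_0)
    norm = alpha[0].sum()
    alpha[0] /= norm
    log_lik = np.log(norm)

    for t in range(1, T):
        # alpha_t(j) = B(j, E_t) * sum_i alpha_{t-1}(i) * a_{i,j}
        alpha[t] = (alpha[t - 1] @ A) * emission_probs[t]
        norm = alpha[t].sum()
        alpha[t] /= norm
        log_lik += np.log(norm)

    return alpha, log_lik
```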

Maximization using forward-backward probabilities
From the forward-backward pass we obtain the probability of transitioning from state i to state j at time t (often written ξ_t(i,j)) and the probability of being in state i at time t (γ_t(i)). The model parameters are then adjusted to maximize the log-likelihood. This is very much like SKM, except that we use these explicit probabilities instead of just counting hard state assignments.
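The slide showed these quantities as figures; under the standard Baum-Welch formulation they can be computed from the forward (α) and backward (β) variables roughly like this (a sketch, assuming scaled α and β arrays of shape (T, n_states)):

```python
import numpy as np

def posteriors(alpha, beta, A, emission_probs):
    """Standard Baum-Welch posteriors (sketch).

    gamma[t, i]  = P(being in state i at time t | all observations)
    xi[t, i, j]  = P(being in state i at t and state j at t+1 | all observations)
    """
    T, n = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (emission_probs[t + 1] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()
    return gamma, xi
```

The re-estimated A then comes from ξ summed over time and normalized, π from γ at the first frame, and the Gaussian means and widths from γ-weighted averages of the FRET values, mirroring the SKM counting step with soft probabilities.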

The problem of bias
You can always get a better fit using more parameters! But it may not be a good model.
Bayesian information criterion (BIC): BIC = −2·LL + k·ln(n), where LL is the log-likelihood of the optimal fit, k is the number of free parameters, and n is the number of data points.
Akaike information criterion (AIC): AIC = 2·k − 2·LL.
There are also maximum-evidence methods (vbFRET), etc.
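Both criteria are one-liners given the optimized log-likelihood; a sketch (names are mine):

```python
import numpy as np

def bic(log_lik, k, n):
    """Bayesian information criterion: lower is better."""
    return -2.0 * log_lik + k * np.log(n)

def aic(log_lik, k):
    """Akaike information criterion: lower is better."""
    return 2.0 * k - 2.0 * log_lik
```

Fitting models with increasing numbers of states and keeping the one with the lowest BIC is a common way to decide how many states the data actually support.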

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system: a sequence of hidden states (conformations) X_0, X_1, X_2, ... and a sequence of observations (the FRET trace) E_0, E_1, E_2, ... We want to know: 1) How many distinct states are there? 2) What are their FRET values? 3) What are the rates? 4) What is the most likely state at each point in time?

Quantifying kinetics is then useful for understanding how outside factors (ligands) influence dynamics. [Figure: FRET trace at 2 mM Na+ with 2 mM Ala; occupancy (%) and dwell times (s) of the open and closed states as a function of log [Ala] (M).] Zhao and Terry, et al. (2011), Nature 474.

Other important examples of Markov modeling: single-channel recordings (patch clamp), sequence analysis, cardiac electrical modeling, and systems modeling of metabolic networks.

We can do non-equilibrium Markov modeling, too. Geggier et al. (2010), JMB 399: 576.

HMMs are useful for many, many problems: speech recognition and translation, weather modeling, sequence alignment, financial modeling.

Some useful references
Artificial Intelligence: A Modern Approach.
http://www.comp.leeds.ac.uk/roger/hiddenmarkovmodels/html_dev/main.html
Rabiner (1989), Proc. of the IEEE 77: 257.
Qin F. Principles of single-channel kinetic analysis. Methods Mol Biol. 2007; 403.
Bronson et al. (2009), Biophys J 97: 3196.
QuB software suite: www.qub.buffalo.edu