Basic math for biology

Basic math for biology. Lei Li, Florida State University, Feb 6, 2002.

The EM algorithm: setup
Parametric models: $\{P_\theta\}$. Data: full data $(Y, X)$; partial (observed) data $Y$. Missing data: $X$.
Likelihood and maximum likelihood estimate:
$\log P_\theta(Y) = \log P_\theta(Y, X) - \log P_\theta(X \mid Y)$,
that is, $\log L(Y; \theta) = \log L(Y, X; \theta) - \log P(X \mid Y; \theta)$.
Maximum likelihood estimate: $\hat\theta$ maximizes $\log L(Y; \theta)$. However, it is usually the MLE based on the full data $(Y, X)$ that has a closed form.

The EM algorithm: key idea
Take conditional expectations given $Y = y$ at the current parameter value $\theta'$ and let
$Q(\theta; \theta') = E_{\theta'}[\log L(Y, X; \theta) \mid Y = y]$,
$H(\theta; \theta') = E_{\theta'}[\log P(X \mid Y; \theta) \mid Y = y]$,
so that $\log L(Y; \theta) = Q(\theta; \theta') - H(\theta; \theta')$.
EM algorithm: starting from an initial value $\theta'$, iterate between the following two steps.
E-step: calculate $Q(\theta; \theta')$ for the current value of $\theta'$;
M-step: maximize $Q(\theta; \theta')$ with respect to $\theta$, and take the maximizer as the new $\theta'$.
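
As a concrete illustration of the E- and M-steps (added here; the two-component Gaussian mixture, the simulated data, and all variable names are assumptions, not part of the slides), a minimal Python sketch where the missing data $X$ are the component labels:

# Minimal EM sketch for a two-component Gaussian mixture (illustrative example).
# Missing data X = component labels; observed data Y = the real-valued samples.
import numpy as np

rng = np.random.default_rng(0)
# Simulated observed data Y from a mixture (assumed example).
y = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 100)])

# Initial parameter guess theta' = (weights, means, variances).
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def normal_pdf(y, mu, var):
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: posterior responsibilities E_{theta'}[X | Y = y] (Q is linear in them here).
    dens = w * normal_pdf(y[:, None], mu, var)          # shape (n_samples, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form maximizers of Q(theta; theta') from the full-data likelihood.
    nk = resp.sum(axis=0)
    w = nk / len(y)
    mu = (resp * y[:, None]).sum(axis=0) / nk
    var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", w, "means:", mu, "variances:", var)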

The EM algorithm: the magic
Conditional expectation: can be calculated in closed form in cases such as the exponential family.
Don't worry about $H(\theta; \theta')$! (Jensen's inequality, Shannon's first theorem.)
The partial likelihood always goes up!
Convergence: to a local maximum.
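
A short supporting derivation (standard EM theory, added here to spell out the "don't worry" claim):

% By Jensen's inequality (non-negativity of the Kullback-Leibler divergence),
%   H(\theta; \theta') - H(\theta'; \theta')
%     = E_{\theta'}\!\left[\log \frac{P(X \mid Y; \theta)}{P(X \mid Y; \theta')} \,\middle|\, Y = y\right]
%     = -\,\mathrm{KL}\bigl(P(\cdot \mid Y; \theta') \,\|\, P(\cdot \mid Y; \theta)\bigr) \le 0 .
% Hence any M-step choice with Q(\theta; \theta') \ge Q(\theta'; \theta') gives
\log L(Y; \theta) - \log L(Y; \theta')
  = \bigl[Q(\theta; \theta') - Q(\theta'; \theta')\bigr]
    - \bigl[H(\theta; \theta') - H(\theta'; \theta')\bigr] \ge 0 .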

Bayesian inference
Parametric models: $\{p(\text{data} \mid \theta)\}$. A prior distribution for $\theta$: $\pi(\theta)$; here $\theta$ is a random variable.
Posterior distribution of $\theta$:
$p(\theta \mid \text{data}) = \dfrac{p(\text{data} \mid \theta)\,\pi(\theta)}{\int p(\text{data} \mid \theta)\,\pi(\theta)\,d\theta}$
MAP solution: the $\theta$ that maximizes the posterior.
Posterior mean: the expected value of $\theta$ with respect to the posterior distribution.
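
A minimal numerical sketch (added; the Bernoulli likelihood, Beta(2, 2) prior, and data are assumptions chosen for illustration) of computing the posterior, the MAP solution, and the posterior mean on a grid:

# Posterior on a grid for an assumed Bernoulli model with a Beta(2, 2) prior (illustrative).
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])        # assumed coin-flip data
theta = np.linspace(1e-3, 1 - 1e-3, 1000)        # grid over the parameter

likelihood = theta ** data.sum() * (1 - theta) ** (len(data) - data.sum())
prior = theta * (1 - theta)                      # Beta(2, 2) density up to a constant
unnorm = likelihood * prior
posterior = unnorm / np.trapz(unnorm, theta)     # p(theta|data) = p(data|theta) pi(theta) / integral

theta_map = theta[np.argmax(posterior)]          # MAP: the theta maximizing the posterior
theta_mean = np.trapz(theta * posterior, theta)  # posterior mean
print("MAP:", theta_map, "posterior mean:", theta_mean)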

Gibbs sampling
Conditional distributions of $(X_1, X_2)$ are known: $p(X_1 = x_1 \mid X_2 = x_2)$ and $p(X_2 = x_2 \mid X_1 = x_1)$.
Gibbs sampling scheme:
1. Start with $x_2^{(n)}$;
2. Generate $x_1^{(n+1)}$ according to $p(X_1 \mid X_2 = x_2^{(n)})$;
3. Generate $x_2^{(n+1)}$ according to $p(X_2 \mid X_1 = x_1^{(n+1)})$ and go back to Step 2.
Joint distribution of $(X_1, X_2)$: approximated by the draws $(x_1^{(n)}, x_2^{(n)})$, $n = 1, 2, \ldots$
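
A minimal sketch of the scheme (added; the bivariate normal target with correlation rho is an assumed example where both full conditionals are available in closed form):

# Gibbs sampling sketch for a standard bivariate normal with correlation rho.
# Both full conditionals p(X1 | X2) and p(X2 | X1) are univariate normals.
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8
x1, x2 = 0.0, 0.0                       # step 1: start with an initial value
samples = []
for _ in range(5000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # step 2: draw x1 ~ p(X1 | X2 = x2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # step 3: draw x2 ~ p(X2 | X1 = x1)
    samples.append((x1, x2))
samples = np.array(samples)
# The pairs (x1^(n), x2^(n)) approximate draws from the joint distribution of (X1, X2).
print(np.corrcoef(samples[500:].T))     # empirical correlation, close to rho after burn-in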

Bayesian treatment of the missing data problem
Data: full data $(Y, X)$; partial (observed) data $Y$. Parametric models: $\{p(Y, X \mid \theta)\}$.
What do we need? $p(X, \theta \mid Y)$, and hence $p(\theta \mid Y)$.
We can apply Gibbs sampling if we know the two full conditionals:
$p(X \mid \theta, Y)$ and $p(\theta \mid X, Y)$.
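
A sketch of this data-augmentation idea (added; the normal model, the $N(0, 10^2)$ prior, and the data are assumptions): alternate between drawing $X$ from $p(X \mid \theta, Y)$ and $\theta$ from $p(\theta \mid X, Y)$.

# Data-augmentation Gibbs sketch: full data (Y, X) i.i.d. N(theta, 1), X unobserved,
# prior theta ~ N(0, 10^2). All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
y_obs = rng.normal(2.0, 1.0, size=20)    # observed part Y
n_miss = 5                               # number of missing values X
n_total = len(y_obs) + n_miss
theta = 0.0
draws = []
for _ in range(2000):
    # p(X | theta, Y): impute the missing values given the current theta.
    x_miss = rng.normal(theta, 1.0, size=n_miss)
    # p(theta | X, Y): conjugate normal update given the completed data.
    prec = n_total / 1.0 + 1.0 / 100.0
    mean = (y_obs.sum() + x_miss.sum()) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / prec))
    draws.append(theta)
print("posterior mean of theta:", np.mean(draws[500:]))   # approximates p(theta | Y)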

Markov Chain
Markov property: the future and the past are independent given the present.
Markov models: why do we need them?
Time dependence and stochastic processes.
The Markov property is simple but general enough.
Characterized by the transition matrix $\{p_{ij}\}$.
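
A minimal simulation sketch (added; the two-state transition matrix is an assumed example): the next state depends only on the present state, through $\{p_{ij}\}$.

# Simulating a Markov chain from a transition matrix {p_ij} (illustrative two-state example).
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1],        # p_ij = P(X_{t+1} = j | X_t = i)
              [0.2, 0.8]])
x = 0                            # initial state
path = [x]
for _ in range(1000):
    x = rng.choice(len(P), p=P[x])   # transition depends only on the present state
    path.append(x)
print("empirical state frequencies:", np.bincount(path) / len(path))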

Structure and notation
Hidden process (Markov chain): $\{X_t\}$ takes values in $n$ states $s_i$, with transition probability matrix $\{p_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)\}$.
Observation: each hidden state $X_t$ emits a random variable $O_t$ taking values in $m$ letters $v_k$, with emission probabilities $\{e_{jk} = P(O_t = v_k \mid X_t = s_j)\}$.
Parameters $\lambda$: the initial distribution of hidden states $\theta = (\theta_1, \ldots, \theta_n)$, $\{p_{ij}\}$, and $\{e_{jk}\}$.
Topology of the hidden Markov chain: represents our a priori knowledge.
The time index of the observation process is not necessarily one-dimensional, e.g., in the alignment of two sequences.
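
To fix the notation concretely, a minimal Python sketch (added; the state labels, letters, and probabilities are assumed examples; the later algorithm sketches reuse these array names):

# HMM parameters lambda = (theta, {p_ij}, {e_jk}) as plain arrays (illustrative values).
import numpy as np

states = ["s1", "s2"]                 # hidden states s_i
letters = ["v1", "v2", "v3"]          # emitted letters v_k
theta = np.array([0.6, 0.4])          # initial distribution of hidden states
P = np.array([[0.7, 0.3],             # transition probabilities p_ij
              [0.4, 0.6]])
E = np.array([[0.5, 0.4, 0.1],        # emission probabilities e_jk = P(O_t = v_k | X_t = s_j)
              [0.1, 0.3, 0.6]])
assert np.allclose(P.sum(axis=1), 1) and np.allclose(E.sum(axis=1), 1)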

The three basic problems in HMM
Likelihood: what is the probability of a sequence of observations? The forward-backward algorithm.
Parameter estimation: what are the maximum likelihood estimates of the parameters? The EM algorithm.
Decoding: what is the most likely sequence of states that produced a given sequence of observations? Viterbi decoding, marginal decoding.

The forward algorithm
Let $\alpha_t(i) = P(o_1 o_2 \cdots o_t, X_t = s_i; \lambda)$.
1. Initialization: $\alpha_1(i) = \theta_i\, e_{i,o_1}$;
2. Induction: $\alpha_{t+1}(j) = \bigl[\sum_{i=1}^{n} \alpha_t(i)\, p_{ij}\bigr] e_{j,o_{t+1}}$;
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{n} \alpha_T(i)$.
Complexity: $n(n+1)(T-1) + n$ multiplications and $n(n-1)(T-1)$ additions.
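
A minimal Python sketch of the recursion (added; it assumes the arrays theta, P, E from the notation sketch above and an observation sequence obs of letter indices):

# Forward algorithm sketch: alpha_t(i) = P(o_1 ... o_t, X_t = s_i; lambda).
import numpy as np

def forward(theta, P, E, obs):
    T, n = len(obs), len(theta)
    alpha = np.zeros((T, n))
    alpha[0] = theta * E[:, obs[0]]                   # initialization: alpha_1(i) = theta_i e_{i,o_1}
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * E[:, obs[t]]  # induction: sum_i alpha_t(i) p_ij, then emit
    return alpha, alpha[-1].sum()                     # termination: P(O | lambda) = sum_i alpha_T(i)

# Example: alpha, likelihood = forward(theta, P, E, [0, 2, 1]) with the arrays sketched earlier.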

The backward algorithm
Let $\beta_t(i) = P(o_{t+1} o_{t+2} \cdots o_T \mid X_t = s_i; \lambda)$.
1. Initialization: $\beta_T(i) = 1$;
2. Induction: $\beta_t(i) = \sum_{j=1}^{n} p_{ij}\, e_{j,o_{t+1}}\, \beta_{t+1}(j)$;
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{n} \alpha_T(i)$.
Complexity: on the order of $n^2 T$ computations.
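
A minimal Python sketch of the backward recursion (added; same assumed arrays and observation encoding as above):

# Backward algorithm sketch: beta_t(i) = P(o_{t+1} ... o_T | X_t = s_i; lambda).
import numpy as np

def backward(theta, P, E, obs):
    T, n = len(obs), len(theta)
    beta = np.zeros((T, n))
    beta[-1] = 1.0                                     # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (E[:, obs[t + 1]] * beta[t + 1]) # induction: sum_j p_ij e_{j,o_{t+1}} beta_{t+1}(j)
    return beta

# Sanity check: sum_i theta_i * E[i, obs[0]] * beta[0, i] equals the forward likelihood P(O | lambda).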

Marginal decoding
Let $\gamma_t(i) = P(X_t = s_i \mid O = o; \lambda)$. Then
$\gamma_t(i) = \dfrac{\alpha_t(i)\, \beta_t(i)}{P(O = o \mid \lambda)} = \dfrac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{n} \alpha_t(i)\, \beta_t(i)}$.
The state that maximizes this marginal posterior probability at each time $t$ gives the solution of marginal decoding.
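
A minimal sketch (added), combining the forward and backward quantities from the sketches above:

# Marginal (posterior) decoding sketch: gamma_t(i) proportional to alpha_t(i) * beta_t(i).
def marginal_decode(alpha, beta):
    # alpha, beta: (T, n) numpy arrays from the forward and backward sketches above.
    gamma = alpha * beta
    gamma = gamma / gamma.sum(axis=1, keepdims=True)   # normalize over states at each t
    return gamma, gamma.argmax(axis=1)                 # posterior and the maximizing state per t

# Example: gamma, path = marginal_decode(alpha, beta).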

The Viterbi decoding
Goal: find $x^* = \arg\max_{x} P(x, O = o \mid \lambda)$.
Soul: finding an optimal path on a directed acyclic graph (DAG).
Intermediate variables in the recursion: let $\eta_t(i)$ be the probability of the most probable path ending in state $s_i$, namely
$\eta_t(i) = \max_{x_1, \ldots, x_{t-1}} P(X_1 = x_1, \ldots, X_{t-1} = x_{t-1}, X_t = s_i, o_1 o_2 \cdots o_t \mid \lambda)$.
Keep track of the argument that maximizes the above quantity: $\psi_t(i)$.

The Viterbi decoding: recursion
1. Initialization: $\eta_1(i) = \theta_i\, e_{i,o_1}$, $\psi_1(i) = 0$;
2. Induction: $\eta_t(j) = \max_{1 \le i \le n} [\eta_{t-1}(i)\, p_{ij}]\, e_{j,o_t}$, $\psi_t(j) = \arg\max_{1 \le i \le n} [\eta_{t-1}(i)\, p_{ij}]$;
3. Termination: $P(x^*, O) = \max_{1 \le i \le n} \eta_T(i)$, $x_T^* = \arg\max_{1 \le i \le n} \eta_T(i)$;
4. Traceback: $x_t^* = \psi_{t+1}(x_{t+1}^*)$.
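
A minimal Python sketch of the full recursion and traceback (added; same assumed arrays theta, P, E and observation indices obs as above):

# Viterbi decoding sketch: eta_t(j) = max_i [eta_{t-1}(i) p_ij] e_{j,o_t}, traceback via psi.
import numpy as np

def viterbi(theta, P, E, obs):
    T, n = len(obs), len(theta)
    eta = np.zeros((T, n))
    psi = np.zeros((T, n), dtype=int)
    eta[0] = theta * E[:, obs[0]]                      # initialization
    for t in range(1, T):
        scores = eta[t - 1][:, None] * P               # eta_{t-1}(i) p_ij for all i, j
        psi[t] = scores.argmax(axis=0)                 # psi_t(j) = argmax_i
        eta[t] = scores.max(axis=0) * E[:, obs[t]]     # induction
    path = np.zeros(T, dtype=int)
    path[-1] = eta[-1].argmax()                        # termination: x*_T
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]              # traceback: x*_t = psi_{t+1}(x*_{t+1})
    return path, eta[-1].max()                         # most probable path and its probability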

The EM algorithm in HMM
Missing data in the HMM: the hidden states $X$.
Conditional expectations:
$\gamma_t(i) = P(X_t = s_i \mid O = o; \lambda)$,
$\xi_t(i, j) = P(X_t = s_i, X_{t+1} = s_j \mid O = o; \lambda)$,
where
$\xi_t(i, j) = \dfrac{\alpha_t(i)\, p_{ij}\, e_{j,o_{t+1}}\, \beta_{t+1}(j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_t(i)\, p_{ij}\, e_{j,o_{t+1}}\, \beta_{t+1}(j)}$.
MLE of the full data:
$\hat p_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$, \qquad $\hat e_{jk} = \dfrac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$.
Computation: beware of numerical underflow.
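
A minimal Python sketch of one EM update (often called Baum-Welch; added here, with the common rescaling of alpha and beta as one standard way to handle the underflow mentioned above; the function name and array conventions are assumptions matching the earlier sketches):

# One EM (Baum-Welch) update for an HMM, with per-step rescaling to avoid numerical underflow.
import numpy as np

def baum_welch_step(theta, P, E, obs):
    T, n = len(obs), len(theta)
    # Scaled forward/backward recursions.
    alpha = np.zeros((T, n)); beta = np.zeros((T, n)); c = np.zeros(T)
    alpha[0] = theta * E[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ P) * E[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (E[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    # E-step quantities: gamma_t(i) and xi_t(i, j).
    gamma = alpha * beta
    gamma = gamma / gamma.sum(axis=1, keepdims=True)
    xi = alpha[:-1, :, None] * P[None] * (E[:, obs[1:]].T * beta[1:])[:, None, :]
    xi = xi / xi.sum(axis=(1, 2), keepdims=True)
    # M-step: re-estimate parameters (full-data MLE weighted by the posteriors).
    new_theta = gamma[0]
    new_P = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_E = np.zeros_like(E)
    for k in range(E.shape[1]):
        new_E[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_theta, new_P, new_E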