Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Sequence Motifs. What is a sequence motif? A sequence pattern of biological significance, typically repeated several times in the genome. Examples: protein binding sites in DNA; protein sequences corresponding to common functions or conserved pieces of structure. 2

Sequence Motifs Example. A CAP-binding motif model based on 59 binding sites in E. coli, and a helix-turn-helix motif model based on 100 aligned protein sequences. Figure from Crooks et al., Genome Research 14:1188-90, 2004. 3

Sequence Logos. Based on the entropy H of a random variable X representing the distribution of character states at each position:

    H(X) = -\sum_x p(x) \log_2 p(x)

The height of the logo at a given position is determined by the decrease in entropy from its maximum:

    H_{max} - H(X) = \log_2 N + \sum_x p(x) \log_2 p(x)

4
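As a concrete illustration of this formula, here is a minimal Python sketch (not from the lecture) that computes the per-column logo height from raw character counts; the function name and the example counts are mine.

    import math

    def column_information(counts):
        """Information content (logo height, in bits) of one motif column.

        counts: dict mapping characters to observed counts at this position.
        Height = H_max - H(X) = log2(N) - entropy of the column distribution.
        """
        total = sum(counts.values())
        n_chars = len(counts)                      # alphabet size N (4 for DNA)
        h_max = math.log2(n_chars)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values() if c > 0)
        return h_max - entropy

    # Example: a column dominated by A is highly informative (~1.43 bits here).
    print(column_information({"A": 18, "C": 1, "G": 1, "T": 0}))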

The Motif Model Learning Task. Given: a set of sequences that are thought to contain an unknown motif of interest. Do: infer a model of the motif and predict the locations of the motif in the given sequences. 5

Motifs and Profile Matrices (a.k.a. Position Weight Matrices). Given a set of aligned sequences, it is straightforward to construct a profile matrix characterizing a motif of interest. Each element represents the probability of a given character at a specified position of the shared motif:

    position   1    2    3    4    5    6    7    8
    A        0.1  0.3  0.1  0.2  0.2  0.4  0.3  0.1
    C        0.5  0.2  0.1  0.1  0.6  0.1  0.2  0.7
    G        0.2  0.2  0.6  0.5  0.1  0.2  0.2  0.1
    T        0.2  0.3  0.2  0.2  0.1  0.3  0.3  0.1

6
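A minimal sketch (not part of the lecture) of how such a profile matrix could be built from already-aligned motif occurrences; the function name, pseudocount handling, and example strings are my own choices.

    from collections import Counter

    def profile_matrix(aligned_motifs, alphabet="ACGT", pseudocount=0.0):
        """Build a profile (position weight) matrix from aligned motif instances.

        aligned_motifs: list of equal-length strings, one motif occurrence each.
        Returns profile[c][k] = probability of character c at column k.
        """
        width = len(aligned_motifs[0])
        profile = {c: [0.0] * width for c in alphabet}
        for k in range(width):
            column = Counter(m[k] for m in aligned_motifs)
            total = sum(column[c] + pseudocount for c in alphabet)
            for c in alphabet:
                profile[c][k] = (column[c] + pseudocount) / total
        return profile

    # Example with three aligned occurrences of a width-4 motif
    print(profile_matrix(["ACGT", "ACGA", "TCGT"]))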

Motifs and Profile Matrices. How can we construct the profile if the sequences aren't aligned? In the typical case we don't know what the motif looks like: use an Expectation Maximization (EM) algorithm. 7

The EM Approach. EM is a family of algorithms for learning probabilistic models in problems that involve hidden state. In our problem, the hidden state is where the motif starts in each training sequence. 8

The MEME Algorithm (Bailey & Elkan, 1993, 1994, 1995) uses the EM algorithm to find multiple motifs in a set of sequences. The first EM approach to motif discovery was Lawrence & Reilly, 1990. 9

Representing Motifs in MEME. A motif is assumed to have a fixed width, W, and is represented by a matrix of probabilities: p_{c,k} represents the probability of character c in column k. We also represent the background (i.e., outside the motif) probability of each character: p_{c,0} represents the probability of character c in the background. 10

MEME Motif Example. Example: a motif model of length 3, with column 0 holding the background distribution:

    column   0 (background)   1     2     3
    A             0.25       0.1   0.5   0.2
    C             0.25       0.4   0.2   0.1
    G             0.25       0.3   0.1   0.6
    T             0.25       0.2   0.2   0.1

11

Basic EM Approach. The element Z_{i,j} of the matrix Z represents the probability that the motif starts in position j in sequence i. Example: given 4 DNA sequences of length 6, where W = 3 (the slide shows the length-6 sequence G C T G T A four times, once for each possible motif placement):

    position   1    2    3    4
    seq1     0.1  0.1  0.2  0.6
    seq2     0.4  0.2  0.1  0.3
    seq3     0.3  0.1  0.5  0.1
    seq4     0.1  0.5  0.1  0.3

12

Basic EM Approach. Given: length parameter W, training set of sequences.

    set initial values for p
    do
        re-estimate Z from p (E-step)
        re-estimate p from Z (M-step)
    until change in p < ε
    return: p, Z

13

Calculating the Probability of a Sequence Given a Hypothesized Starting Position:

    P(X_i | Z_{i,j} = 1, p) = \prod_{k=1}^{j-1} p_{c_k,0} \prod_{k=j}^{j+W-1} p_{c_k,k-j+1} \prod_{k=j+W}^{L} p_{c_k,0}

where the three products run over the positions before the motif, within the motif, and after the motif; X_i is the ith sequence; Z_{i,j} is 1 if the motif starts at position j in sequence i; and c_k is the character at position k in sequence i. 14

Example: for the sequence X_i = G C T G T A G and the motif model below (column 0 = background), P(X_i | Z_{i,j} = 1, p) can be computed for each hypothesized start j:

    column   0 (background)   1     2     3
    A             0.25       0.1   0.5   0.2
    C             0.25       0.4   0.2   0.1
    G             0.25       0.3   0.1   0.6
    T             0.25       0.2   0.2   0.1

15
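A minimal Python sketch of this calculation for the example above (the function and argument names are mine, and the code assumes 0-based start positions):

    def sequence_prob(seq, start, motif, background):
        """P(X_i | Z_{i,j}=1, p): motif column probabilities inside the
        hypothesized window, background probabilities outside it.

        seq:        the sequence, e.g. "GCTGTAG"
        start:      0-based start position of the hypothesized motif occurrence
        motif:      motif[c][k] = P(character c in motif column k), k = 0..W-1
        background: background[c] = P(character c outside the motif)
        """
        width = len(next(iter(motif.values())))
        prob = 1.0
        for pos, ch in enumerate(seq):
            if start <= pos < start + width:
                prob *= motif[ch][pos - start]   # within the motif window
            else:
                prob *= background[ch]           # before / after the motif
        return prob

    # Motif model from the slide (columns 1-3); background = column 0.
    motif = {"A": [0.1, 0.5, 0.2], "C": [0.4, 0.2, 0.1],
             "G": [0.3, 0.1, 0.6], "T": [0.2, 0.2, 0.1]}
    background = {c: 0.25 for c in "ACGT"}
    print(sequence_prob("GCTGTAG", 2, motif, background))  # motif hypothesized at the 3rd position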

The E-step: Estimating Z. To estimate the starting positions in Z at step t:

    Z_{i,j}^{(t)} = \frac{P(X_i | Z_{i,j} = 1, p^{(t)}) P(Z_{i,j} = 1)}{\sum_{k=1}^{L-W+1} P(X_i | Z_{i,k} = 1, p^{(t)}) P(Z_{i,k} = 1)}

This comes from Bayes' rule applied to P(Z_{i,j} = 1 | X_i, p^{(t)}). 16

The E-step: Estimating Z assume that, a priori, it is equally likely that the motif will start in any position 17

Example: Estimating Z. For the sequence X_i = G C T G T A G and the motif model below, compute Z_{i,j} \propto P(X_i | Z_{i,j} = 1, p) for each start j, then normalize so that \sum_{j=1}^{L-W+1} Z_{i,j} = 1:

    column   0 (background)   1     2     3
    A             0.25       0.1   0.5   0.2
    C             0.25       0.4   0.2   0.1
    G             0.25       0.3   0.1   0.6
    T             0.25       0.2   0.2   0.1

18
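A self-contained sketch of this E-step row computation under a uniform prior over start positions (names and structure are my own, not MEME's code):

    def e_step_row(seq, motif, background):
        """One E-step row: Z_{i,j} proportional to P(X_i | Z_{i,j}=1, p),
        normalized over all candidate starts j (uniform prior over starts)."""
        width = len(next(iter(motif.values())))

        def prob_given_start(start):
            p = 1.0
            for pos, ch in enumerate(seq):
                in_motif = start <= pos < start + width
                p *= motif[ch][pos - start] if in_motif else background[ch]
            return p

        unnormalized = [prob_given_start(j) for j in range(len(seq) - width + 1)]
        total = sum(unnormalized)
        return [z / total for z in unnormalized]

    motif = {"A": [0.1, 0.5, 0.2], "C": [0.4, 0.2, 0.1],
             "G": [0.3, 0.1, 0.6], "T": [0.2, 0.2, 0.1]}
    background = {c: 0.25 for c in "ACGT"}
    print(e_step_row("GCTGTAG", motif, background))  # sums to 1 over the 5 possible starts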

The M-step: Estimating p. Recall that p_{c,k} represents the probability of character c in position k; values for k = 0 represent the background. The update is

    p_{c,k}^{(t+1)} = \frac{n_{c,k} + d_{c,k}}{\sum_b (n_{b,k} + d_{b,k})}

where the d_{c,k} are pseudo-counts, n_{c,k} for k > 0 is the expected number of c's at position k of the motif (computed from Z^{(t)}), and n_{c,0} is the total number of c's in the data set minus the expected counts attributed to the motif columns. 19

Example: Estimating p, using the three sequences A C A G C A, A G G C A G, and T C A G T C (the slide works through the expected-count calculation for each motif column). 20
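A minimal, self-contained sketch of the M-step just described, with uniform pseudo-counts; the function name, data layout, and the uniform toy Z matrix are my own choices.

    def m_step(seqs, Z, width, alphabet="ACGT", pseudocount=1.0):
        """Re-estimate the probability matrix p from the expected start positions Z.

        seqs: list of sequences; Z[i][j] = P(motif starts at position j of sequence i).
        Returns p[c][k] for k = 0..width, where column 0 is the background.
        """
        # Expected counts n_{c,k} for motif columns k = 1..W
        n = {c: [0.0] * (width + 1) for c in alphabet}
        for seq, z_row in zip(seqs, Z):
            for j, z in enumerate(z_row):
                for k in range(width):
                    n[seq[j + k]][k + 1] += z
        # Background counts: total occurrences minus expected motif counts
        for c in alphabet:
            n[c][0] = sum(seq.count(c) for seq in seqs) - sum(n[c][1:])
        # Normalize each column, adding pseudo-counts
        p = {c: [0.0] * (width + 1) for c in alphabet}
        for k in range(width + 1):
            denom = sum(n[b][k] + pseudocount for b in alphabet)
            for c in alphabet:
                p[c][k] = (n[c][k] + pseudocount) / denom
        return p

    # Toy usage with the three sequences from the example slide and a uniform Z
    seqs = ["ACAGCA", "AGGCAG", "TCAGTC"]
    Z = [[0.25] * 4 for _ in seqs]   # length 6, width 3 -> 4 start positions each
    print(m_step(seqs, Z, width=3))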

The ZOOPS Model. The approach as we've outlined it assumes that each sequence has exactly one motif occurrence; this is the OOPS model. The ZOOPS model assumes zero or one occurrences per sequence. 21

E-step in the ZOOPS Model. We need to consider another alternative: the ith sequence doesn't contain the motif. We add another parameter λ (and its relative γ): λ is the prior probability that any given position in a sequence is the start of a motif, and γ = λ(L - W + 1) is the prior probability of a sequence containing a motif. 22

E-step in the ZOOPS Model. Q_i is a random variable for which Q_i = 1 if sequence X_i contains a motif, and Q_i = 0 otherwise. 23

M-step in the ZOOPS Model. Update p the same as before; update λ and γ as follows:

    \lambda^{(t+1)} = \frac{1}{n(L - W + 1)} \sum_{i=1}^{n} \sum_{j=1}^{L-W+1} Z_{i,j}^{(t)}, \qquad \gamma^{(t+1)} = \lambda^{(t+1)} (L - W + 1)

24
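A small sketch of these two updates, assuming (as in the formula above) that all sequences share the same length L; names are mine.

    def zoops_lambda_gamma(Z, L, W):
        """ZOOPS M-step updates for the priors: lambda is the average of Z over
        all n(L - W + 1) candidate starts, and gamma = lambda * (L - W + 1)."""
        n = len(Z)
        lam = sum(sum(row) for row in Z) / (n * (L - W + 1))
        gamma = lam * (L - W + 1)
        return lam, gamma

    # Example: 3 sequences of length 6, motif width 3 -> 4 candidate starts each
    Z = [[0.1, 0.2, 0.3, 0.1], [0.0, 0.05, 0.05, 0.0], [0.2, 0.2, 0.3, 0.2]]
    print(zoops_lambda_gamma(Z, L=6, W=3))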

The TCM Model the TCM (two-component mixture model) assumes zero or more motif occurrences per sequence 25

Likelihood in the TCM Model. The TCM model treats each length-W subsequence independently. To determine the likelihood of such a subsequence, we compute its probability assuming a motif starts there (using the motif column probabilities) and assuming a motif doesn't start there (using the background probabilities). 26

E-step in the TCM Model. Each Z_{i,j} weighs the two alternatives, the subsequence is a motif vs. the subsequence isn't a motif:

    Z_{i,j}^{(t)} = \frac{\lambda^{(t)} P(X_{i,j} | \text{motif})}{\lambda^{(t)} P(X_{i,j} | \text{motif}) + (1 - \lambda^{(t)}) P(X_{i,j} | \text{background})}

The M-step is the same as before. 27
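A minimal sketch of this responsibility computation for a single width-W window; the function name and the toy λ value are mine.

    def tcm_responsibility(subseq, motif, background, lam):
        """TCM E-step for one length-W window: posterior probability that the
        window was generated by the motif component rather than the background."""
        p_motif, p_background = 1.0, 1.0
        for k, ch in enumerate(subseq):
            p_motif *= motif[ch][k]
            p_background *= background[ch]
        return lam * p_motif / (lam * p_motif + (1 - lam) * p_background)

    motif = {"A": [0.1, 0.5, 0.2], "C": [0.4, 0.2, 0.1],
             "G": [0.3, 0.1, 0.6], "T": [0.2, 0.2, 0.1]}
    background = {c: 0.25 for c in "ACGT"}
    print(tcm_responsibility("CAG", motif, background, lam=0.05))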

Extending the Basic EM Approach in MEME How to choose the width of the motif? How to find multiple motifs in a group of sequences? How to choose good starting points for the parameters? How to use background knowledge to bias the parameters? 28

Choosing the Width of the Motif: try various widths, estimating the parameters each time, and apply a likelihood ratio test based on the probability of the data under the motif model versus the probability of the data under the null model, penalized by the number of parameters in the model. 29
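A rough sketch of the idea of comparing widths with a penalized likelihood ratio; the BIC-style penalty here is an assumption standing in for MEME's actual criterion, and all numbers are toy values.

    import math

    def penalized_score(log_lik_motif, log_lik_null, num_params, num_obs):
        """Likelihood-ratio score penalized by model size (BIC-style penalty;
        MEME's actual test differs in detail)."""
        return (log_lik_motif - log_lik_null) - 0.5 * num_params * math.log(num_obs)

    def choose_width(candidates):
        """candidates: (width, log_lik_motif, log_lik_null, num_params, num_obs) tuples."""
        return max(candidates, key=lambda c: penalized_score(*c[1:]))[0]

    # Toy numbers: wider motifs fit better but pay a larger parameter penalty.
    print(choose_width([(6, -950.0, -1000.0, 18, 500),
                        (8, -940.0, -1000.0, 24, 500),
                        (10, -938.0, -1000.0, 30, 500)]))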

Finding Multiple Motifs. We might want to find multiple motifs in a given set of sequences. How can we do this without rediscovering previously learned motifs, or discovering a motif that substantially overlaps with previously learned motifs? (The slide illustrates both failure modes across iterations 1 and 2.) 30

Finding Multiple Motifs. Basic idea: discount the likelihood that a new motif starts in a given position if this motif would overlap with a previously learned one. When re-estimating Z_{i,j}, multiply by U_{i,j} (defined on the following slides). 31

Finding Multiple Motifs. To determine U_{i,j} we need to take into account the individual positions in the window. Use U_{i,j} random variables to encode positions that are not part of a previously found motif. 32

Finding Multiple Motifs. After each pass p, update U using the Z values at the end of that pass:

    U_{i,j}^{(p)} = U_{i,j}^{(p-1)} \left( 1 - \max_{k = j-W+1, \ldots, j} Z_{i,k}^{(p)} \right)

33
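A small sketch of this update, with the simplifying assumption that U and Z are stored over the same candidate-start indices; names are mine.

    def update_u(U_prev, Z_pass, W):
        """After one MEME pass, discount positions that overlap a motif found in
        that pass: U[i][j] *= 1 - max(Z[i][j-W+1 .. j])."""
        U_new = []
        for u_row, z_row in zip(U_prev, Z_pass):
            row = []
            for j, u in enumerate(u_row):
                window = z_row[max(0, j - W + 1): j + 1]
                row.append(u * (1.0 - max(window)))
            U_new.append(row)
        return U_new

    # One sequence with 4 candidate starts; a motif was confidently found at the third start.
    U = [[1.0, 1.0, 1.0, 1.0]]
    Z = [[0.01, 0.02, 0.95, 0.02]]
    print(update_u(U, Z, W=3))  # starts overlapping the found motif are heavily discounted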

Finding Multiple Motifs 34

Starting Points in MEME. EM is susceptible to local maxima. For every distinct subsequence of length W in the training set, derive an initial p matrix from this subsequence and run EM for one iteration; then choose the motif model (i.e., p matrix) with the highest likelihood and run EM to convergence. 35

Using Subsequences as Starting Points for EM. Set the values corresponding to the letters in the subsequence to some value π, and set the other values to (1 - π)/(N - 1), where N is the size of the alphabet. Example: for the subsequence TAT with π = 0.5:

    column   1     2     3
    A      0.17  0.5   0.17
    C      0.17  0.17  0.17
    G      0.17  0.17  0.17
    T      0.5   0.17  0.5

36
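A one-function sketch of deriving such a starting matrix from a subsequence (reproducing the slide's example up to rounding); the function name is mine.

    def initial_motif_from_subsequence(subseq, pi=0.5, alphabet="ACGT"):
        """Starting p matrix from one width-W subsequence: probability pi for the
        observed letter in each column, (1 - pi)/(N - 1) for the other letters."""
        other = (1.0 - pi) / (len(alphabet) - 1)
        return {c: [pi if c == ch else other for ch in subseq] for c in alphabet}

    # Reproduces the slide's TAT example (0.5 and ~0.17 entries)
    print(initial_motif_from_subsequence("TAT"))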

Final MEME algorithm 37

Using Background Knowledge to Bias the Parameters accounting for palindromes that are common in DNA binding sites using Dirichlet mixture priors to account for biochemical similarity of amino acids 38

Representing Palindromes. Parameters in probabilistic models can be tied or shared. During motif search, try tying parameters according to the palindromic constraint; accept the tied model if it is favored by a likelihood ratio test (it has half as many parameters). 39

Dirichlet mixtures in MEME Useful for protein motifs Amino acids can be grouped into classes based on certain properties (size, charge, etc.) Idea: use different pseudocounts in columns depending on most likely classes 40

Amino acids 41

Dirichlet Distribution. Each motif column corresponds to a multinomial distribution p_k. Using pseudocounts for estimating p_k is essentially equivalent to using a Dirichlet prior; the Dirichlet distribution is the conjugate prior of the multinomial.

    Multinomial:  P[n | \theta] = M^{-1}(n) \prod_{i=1}^{K} \theta_i^{n_i}

    Dirichlet:    P[\theta | \beta] = Z^{-1}(\beta) \prod_{i=1}^{K} \theta_i^{\beta_i - 1} \, \delta\!\left(\sum_{i=1}^{K} \theta_i - 1\right)

42

Mixtures of Dirichlets.

    P[p_k | \beta^1, \ldots, \beta^R] = \sum_{i=1}^{R} q_i \, D(p_k | \beta^i)

Think of q_i as the prior probability of the ith Dirichlet component. Given c_k (the expected counts of characters in column k),

    P[p_k | c_k] = \sum_i P[p_k | \beta^i, c_k] \, P[\beta^i | c_k] = \sum_i D[p_k | \beta^i + c_k] \, P[\beta^i | c_k]

    P[\beta^i | c_k] = \frac{q_i P[c_k | \beta^i]}{\sum_j q_j P[c_k | \beta^j]}

PME: posterior mean estimate (different from the maximum likelihood estimate):

    p_k^{PME} = \frac{c_k + d_k}{|c_k + d_k|}, \qquad \text{where } d_{k,\pi} = \sum_{i=1}^{R} P[\beta^i | c_k] \, \beta_\pi^i

See chapter 11 of the textbook for more details. 43
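A compact sketch of the posterior mean estimate above. The Dirichlet-multinomial likelihood used for P[c_k | β^i] and the two-component mixture parameters are my own illustrative assumptions, not MEME's actual priors (see chapter 11 of the textbook for the exact treatment).

    import math

    def log_dirichlet_multinomial(counts, beta):
        """log P[c | beta]: likelihood of counts under a Dirichlet prior with
        parameters beta (Polya / Dirichlet-multinomial distribution)."""
        n, b = sum(counts), sum(beta)
        out = math.lgamma(n + 1) + math.lgamma(b) - math.lgamma(n + b)
        for c, bi in zip(counts, beta):
            out += math.lgamma(c + bi) - math.lgamma(bi) - math.lgamma(c + 1)
        return out

    def posterior_mean_estimate(counts, components):
        """p_k^PME = (c_k + d_k) / |c_k + d_k|, where d_k mixes the Dirichlet
        parameters beta^i weighted by the posterior P[beta^i | c_k].

        counts:     expected character counts c_k for one motif column
        components: list of (q_i, beta_i) pairs defining the Dirichlet mixture
        """
        # P[beta^i | c_k] proportional to q_i * P[c_k | beta^i]
        log_w = [math.log(q) + log_dirichlet_multinomial(counts, beta)
                 for q, beta in components]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total_w = sum(w)
        w = [x / total_w for x in w]
        # d_{k,pi} = sum_i P[beta^i | c_k] * beta^i_pi
        d = [sum(wi * beta[j] for wi, (_, beta) in zip(w, components))
             for j in range(len(counts))]
        total = sum(counts) + sum(d)
        return [(c + dj) / total for c, dj in zip(counts, d)]

    # Hypothetical 2-component mixture over a 4-letter alphabet (toy parameters)
    components = [(0.6, [1.0, 1.0, 1.0, 1.0]), (0.4, [5.0, 0.5, 0.5, 0.5])]
    print(posterior_mean_estimate([8, 1, 0, 1], components))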

Maximum Likelihood Estimation. During the maximization (M) step of the EM algorithm, we have to find the parameters that maximize the expected log likelihood. How do we do this in general? Take the derivative of the expected log likelihood with respect to the parameter to be optimized, set the derivative equal to zero, and solve for the parameter; use constrained optimization techniques for dependent parameters. 44

Likelihood in the MEME OOPS model:

    l = \log P[X, Z | \theta] = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{i,j} \log P[X_i | Z_{i,j} = 1, \theta] + n \log \frac{1}{m}

      = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{i,j} \left( \sum_{k=1}^{W} I(i, j+k-1)^T \log p_k + \sum_{k \in \Delta_{i,j}} I(i, k)^T \log p_0 \right) + n \log \frac{1}{m}

where X_i is sequence i in the data set (i = 1, ..., n); Z_{i,j} is a binary indicator random variable, 1 if the motif starts at position j in sequence i; I(i, j) is a vector-valued indicator variable with I(i, j)_\pi = 1 if X_{i,j} = \pi and 0 otherwise; \Delta_{i,j} = \{1, 2, ..., j-1, j+W, ..., L\} is the set of positions of sequence i outside the motif window; and m = L - W + 1 is the number of candidate start positions. 45

Constrained optimization. Use Lagrange multipliers for the parameter constraints \sum_\pi p_{k,\pi} = 1:

    l' = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{i,j} \left( \sum_{k=1}^{W} I(i, j+k-1)^T \log p_k + \sum_{k \in \Delta_{i,j}} I(i, k)^T \log p_0 \right) + n \log \frac{1}{m} + \sum_{k=0}^{W} \lambda_k \left( 1 - \sum_\pi p_{k,\pi} \right)

    \frac{\partial l'}{\partial p_{k,\pi}} = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{i,j} I(i, j+k-1)_\pi \frac{1}{p_{k,\pi}} - \lambda_k, \quad \text{for } k > 0

The maximizing parameters of this function are the maximizing parameters of the original likelihood, subject to the constraints. 46

Constrained optimization. Set the derivative to zero and solve for \hat{p}_{k,\pi}:

    0 = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{i,j} I(i, j+k-1)_\pi \frac{1}{\hat{p}_{k,\pi}} - \lambda_k = c_{k,\pi} \frac{1}{\hat{p}_{k,\pi}} - \lambda_k, \quad \text{where } c_{k,\pi} = \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{i,j} I(i, j+k-1)_\pi

so \hat{p}_{k,\pi} \lambda_k = c_{k,\pi}. Solve for \lambda_k by summing over \pi:

    \lambda_k = \sum_\pi \hat{p}_{k,\pi} \lambda_k = \sum_\pi c_{k,\pi} = |c_k|, \qquad \text{giving} \qquad \hat{p}_{k,\pi} = \frac{c_{k,\pi}}{\lambda_k} = \frac{c_{k,\pi}}{|c_k|}

47

Hidden Markov model representation. For each sequence X_i, hidden states S_{i,1}, S_{i,2}, S_{i,3}, ..., S_{i,L} emit the observed characters X_{i,1}, X_{i,2}, X_{i,3}, ..., X_{i,L}. X_{i,j} is the observed character j of sequence i; S_{i,j} is the hidden state j for sequence i, with S_{i,j} ∈ {0, 1, 2, ..., W}. Emission probabilities = profile matrix probabilities (state = column in profile). 48

OOPS and ZOOPS HMM state transitions (figure). OOPS: Start → 0_s → 1 → 2 → ... → W → 0_e → End. ZOOPS: Start → 0_s → 1 → 2 → ... → W → 0_e → End, with an additional path for sequences containing no motif occurrence. A third, unlabeled (???) diagram shows the states 0, Start, 1, 2, ..., W, End. 49