Learning Sequence Motif Models Using Gibbs Sampling


Learning Sequence Motif Models Using Gibbs Sampling
BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018
Anthony Gitter gitter@biostat.wisc.edu
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter.

Goals for Lecture
Key concepts:
Markov chain Monte Carlo (MCMC) and Gibbs sampling (see the CS 760 slides for background)
Gibbs sampling applied to the motif-finding task
Parameter tying
Incorporating prior knowledge using Dirichlets and Dirichlet mixtures

Gibbs Sampling: An Alternative to EM
EM can get trapped in local maxima. One approach to alleviate this limitation: try different (perhaps random) initial parameters. Gibbs sampling exploits randomized search to a much greater degree, and we can view it as a stochastic analog of EM for this task. In theory, Gibbs sampling is less susceptible to local maxima than EM [Lawrence et al., Science 1993].

Gibbs Sampling Approach
In the EM approach, we maintained a distribution $Z_i^{(t)}$ over the possible motif starting points for each sequence at iteration $t$.
In the Gibbs sampling approach, we'll maintain a specific starting point $a_i$ for each sequence, but we'll keep randomly resampling these.

Gibbs Sampling Algorithm for Motif Finding
given: length parameter W, training set of sequences
choose random starting positions for a
do
    pick a sequence $X_i$
    estimate $p$ given the current motif positions $a$, using all sequences but $X_i$ (predictive update step)
    sample a new motif position $a_i$ for $X_i$ (sampling step)
until convergence
return: $p$, $a$

Markov Chain Monte Carlo (MCMC)
Consider a Markov chain in which, on each time step, a grasshopper randomly chooses to stay in its current state, jump one state left, or jump one state right.
[Figure: states -4 through +4 arranged in a line; from each state the grasshopper stays with probability 0.5 and jumps one state left or right with probability 0.25 each. Figure from Koller & Friedman, Probabilistic Graphical Models, MIT Press.]
Let $P^{(t)}(u)$ represent the probability of being in state $u$ at time $t$ in the random walk:
$P^{(0)}(0) = 1$, $P^{(0)}(-1) = 0$, $P^{(0)}(-2) = 0$
$P^{(1)}(0) = 0.5$, $P^{(1)}(-1) = 0.25$, $P^{(1)}(-2) = 0$
$P^{(2)}(0) = 0.375$, $P^{(2)}(-1) = 0.25$, $P^{(2)}(-2) = 0.0625$
$P^{(100)}(0) \approx 0.11$, $P^{(100)}(-1) \approx 0.11$, $P^{(100)}(-2) \approx 0.11$
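
To make these numbers concrete, here is a minimal sketch (not from the lecture) that propagates the state distribution of the grasshopper chain forward in time with NumPy. It assumes the chain is truncated to the nine states -4..+4, with the probability of jumping off either end folded into staying put; under that assumption it approximately reproduces the values above and flattens toward roughly 0.11 per state.

```python
import numpy as np

# Transition matrix for the grasshopper chain on states -4 .. +4 (9 states).
# Assumption: at the two end states the probability of jumping off the end
# is folded into staying put, so every row still sums to 1.
n_states = 9
T = np.zeros((n_states, n_states))
for s in range(n_states):
    T[s, s] += 0.5                                   # stay
    if s > 0:
        T[s, s - 1] += 0.25                          # jump one state left
    else:
        T[s, s] += 0.25
    if s < n_states - 1:
        T[s, s + 1] += 0.25                          # jump one state right
    else:
        T[s, s] += 0.25

# Start at state 0 (index 4) and propagate P^(t+1)(u) = sum_v P^(t)(v) P(u|v).
p = np.zeros(n_states)
p[4] = 1.0
for t in range(1, 101):
    p = p @ T
    if t in (1, 2, 100):
        print(f"t={t:3d}  P(0)={p[4]:.4f}  P(-1)={p[3]:.4f}  P(-2)={p[2]:.4f}")
```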

The Stationary Distribution
Let $P(u)$ represent the probability of being in state $u$ at any given time in a random walk on the chain. A distribution is stationary when one step of the chain leaves it unchanged:
$P^{(t+1)}(u) = \sum_v P^{(t)}(v) \, P(u \mid v) = P^{(t)}(u)$
where $P^{(t)}(v)$ is the probability of state $v$ and $P(u \mid v)$ is the probability of the transition $v \to u$. The stationary distribution is the set of such probabilities for all states.

Markov Chain Monte Carlo (MCMC)
We can view the motif finding approach in terms of a Markov chain. Each state represents a configuration of the starting positions ($a_i$ values) for a set of random variables $A_1, \ldots, A_n$. Transitions correspond to changing selected starting positions (and hence moving to a new state).
[Figure: two states u and v over the same ten sequences (ACATCCG, CGACTAC, ATTGAGC, CGTTGAC, GAGTGAT, TCGTTGG, ACAGGAT, TAGCTAT, GCTACCG, GGCCTCA); in state u the motif in the first sequence starts at $A_1 = 5$, in state v it starts at $A_1 = 3$; the transition $u \to v$ changes that one starting position.]

Markov Chain Monte Carlo
In the motif-finding task the number of states is enormous. Key idea: construct a Markov chain whose stationary distribution is equal to the distribution of interest, then use sampling to find the most probable states.
Detailed balance: $P(u) \, P(v \mid u) = P(v) \, P(u \mid v)$, where $P(u)$ is the probability of state $u$ and $P(v \mid u)$ is the probability of the transition $u \to v$.
When detailed balance holds: $\lim_{N \to \infty} \frac{\mathrm{count}(u)}{N} = P(u)$, where $N$ is the number of samples.
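
A quick check (not on the original slide) of why detailed balance is sufficient: summing the balance condition over all states $v$,
$\sum_v P(v) \, P(u \mid v) = \sum_v P(u) \, P(v \mid u) = P(u) \sum_v P(v \mid u) = P(u)$
so a distribution satisfying detailed balance is left unchanged by one step of the chain, i.e. it is the stationary distribution.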

MCMC with Gibbs Sampling
Gibbs sampling is a special case of MCMC in which the Markov chain transitions involve changing one variable at a time. The transition probability is the conditional probability of the changed variable given all the others. We sample the joint distribution of a set of random variables $P(A_1, \ldots, A_n)$ by iteratively sampling from $P(A_i \mid A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_n)$.
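
As a minimal illustration of the one-variable-at-a-time idea (using a toy target that is not from the slides), the sketch below samples a bivariate normal with correlation 0.8 by alternately resampling each variable from its conditional given the other.

```python
import numpy as np

# Toy Gibbs sampler: bivariate normal with correlation rho, resampling one
# variable at a time from its conditional given the other:
#   A1 | A2 ~ N(rho * A2, 1 - rho^2),   A2 | A1 ~ N(rho * A1, 1 - rho^2).
rng = np.random.default_rng(0)
rho = 0.8
a1, a2 = 0.0, 0.0
samples = []
for _ in range(20000):
    a1 = rng.normal(rho * a2, np.sqrt(1 - rho ** 2))   # resample A1 given A2
    a2 = rng.normal(rho * a1, np.sqrt(1 - rho ** 2))   # resample A2 given A1
    samples.append((a1, a2))

samples = np.array(samples[2000:])                      # discard burn-in
print("empirical correlation:", round(np.corrcoef(samples.T)[0, 1], 3))  # near 0.8
```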

Gibbs Sampling Approach
[Figure: possible state transitions when the first sequence is selected. The same ten sequences (ACATCCG, CGACTAC, ATTGAGC, CGTTGAC, GAGTGAT, TCGTTGG, ACAGGAT, TAGCTAT, GCTACCG, GGCCTCA) are shown several times; the candidate states differ only in the starting position chosen for the first sequence.]

Gibbs Sampling Approach
Lawrence et al. maximize the likelihood ratio $\frac{P(X \mid \text{motif})}{P(X \mid \text{background})}$.
How do we get the transition probabilities when we don't know what the motif looks like?

Gibbs Sampling Approach
The probability of a state is given by
$P(u) \propto \prod_{j=1}^{W} \prod_c \left( \frac{p_{c,j}}{p_{c,0}} \right)^{n_{c,j}(u)}$
where $n_{c,j}(u)$ is the count of character $c$ in motif position $j$ under state $u$, $p_{c,j}$ is the probability of $c$ in motif position $j$, and $p_{c,0}$ is the background probability for character $c$.
[Figure: the ten example sequences with their selected motif occurrences, and the corresponding count matrix $n(u)$ over A, C, G, T for each motif position.]
See Liu et al., JASA 1995 for the full derivation.

Sampling New Motif Positions
For each possible starting position $A_i = j$, compute the likelihood ratio (leaving sequence $i$ out of the estimates of $p$):
$LR(j) = \prod_{k=j}^{j+W-1} \frac{p_{c_k,\, k-j+1}}{p_{c_k,\, 0}}$
where $c_k$ is the character at position $k$ of sequence $X_i$.
Randomly select a new starting position $A_i = j$ with probability
$\frac{LR(j)}{\sum_{k \in \{\text{starting positions}\}} LR(k)}$
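
Putting the predictive update step and the sampling step together, here is a minimal sketch of the motif sampler outlined on these slides. It assumes DNA sequences, add-one pseudocounts, a background distribution estimated from the non-motif positions, and a fixed number of iterations in place of a convergence test; the function and variable names are illustrative, not from the lecture or any published implementation. The phase-shift step described later in the lecture is omitted here.

```python
import random

ALPHABET = "ACGT"

def estimate_p(seqs, starts, W, leave_out, pseudo=1.0):
    """Predictive update step: estimate p[c][j] for j = 1..W (motif columns)
    and j = 0 (background) from every sequence except the held-out one."""
    counts = {c: [pseudo] * (W + 1) for c in ALPHABET}
    for i, (seq, a) in enumerate(zip(seqs, starts)):
        if i == leave_out:
            continue
        for j in range(W):
            counts[seq[a + j]][j + 1] += 1           # motif columns
        for k, c in enumerate(seq):
            if not (a <= k < a + W):
                counts[c][0] += 1                    # background positions
    p = {c: [0.0] * (W + 1) for c in ALPHABET}
    for j in range(W + 1):
        total = sum(counts[c][j] for c in ALPHABET)
        for c in ALPHABET:
            p[c][j] = counts[c][j] / total
    return p

def sample_position(seq, p, W):
    """Sampling step: choose a new start j with probability LR(j) / sum_k LR(k)."""
    lrs = []
    for j in range(len(seq) - W + 1):
        lr = 1.0
        for k in range(W):
            c = seq[j + k]
            lr *= p[c][k + 1] / p[c][0]
        lrs.append(lr)
    return random.choices(range(len(lrs)), weights=lrs)[0]

def gibbs_motif_finder(seqs, W, n_iters=2000, seed=0):
    random.seed(seed)
    starts = [random.randrange(len(s) - W + 1) for s in seqs]   # random initial positions
    for _ in range(n_iters):                                    # fixed iteration budget
        i = random.randrange(len(seqs))                         # pick a sequence
        p = estimate_p(seqs, starts, W, leave_out=i)            # predictive update step
        starts[i] = sample_position(seqs[i], p, W)              # sampling step
    return starts

# Toy usage with a planted TATAAT-like motif.
seqs = ["ACGTTATAATCG", "TTATAATGGCAC", "GGCATATAATTC", "CTATAATCGTAG"]
print(gibbs_motif_finder(seqs, W=6))
```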

The Phase Shift Problem
The Gibbs sampler can get stuck in a local maximum that corresponds to the correct solution shifted by a few bases.
Solution: add a special step to shift the $a_i$ values by the same amount for all sequences. Try different shift amounts and pick one in proportion to its probability score.

Convergence of Gibbs
[Figure: convergence behavior of the Gibbs sampler on data with the true motif deleted from the input sequences.]

Using Background Knowledge to Bias the Parameters
Let's consider two ways in which background knowledge can be exploited in motif finding:
1. Accounting for palindromes that are common in DNA binding sites
2. Using Dirichlet mixture priors to account for biochemical similarity of amino acids

Using Background Knowledge to Bias the Parameters
Many DNA motifs have a palindromic pattern because they are bound by a protein homodimer: a complex consisting of two identical proteins.
[Figure from Porteus & Carroll, Nature Biotechnology 2005]

Representing Palindromes
Parameters in probabilistic models can be "tied" or "shared":
$p_{a,0}$  $p_{c,0}$  $p_{g,0}$  $p_{t,0}$
$p_{a,1}$  $p_{c,1}$  $p_{g,1}$  $p_{t,1}$
...
$p_{a,W}$  $p_{c,W}$  $p_{g,W}$  $p_{t,W}$
During motif search, try tying parameters according to the palindromic constraint; accept if it increases the likelihood ratio test (half as many parameters).

Updating Tied Parameters
Under the palindromic constraint, complementary positions share parameters, e.g. $p_{a,1} = p_{t,W}$, $p_{c,1} = p_{g,W}$, $p_{g,1} = p_{c,W}$, $p_{t,1} = p_{a,W}$.
The tied parameters are updated with pooled counts, e.g.
$p_{a,1} = p_{t,W} = \frac{n_{a,1} + n_{t,W} + d_{a,1} + d_{t,W}}{\sum_b \left( n_{b,1} + n_{b,W} + d_{b,1} + d_{b,W} \right)}$
where $n_{c,k}$ are observed counts and $d_{c,k}$ are pseudocounts.
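
A small sketch of the pooled update above (assuming DNA, reverse-complement tying of column k with column W+1-k, and dictionaries of observed counts n and pseudocounts d keyed by (character, column); the names are illustrative).

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def tied_update(n, d, W):
    """Palindromic (reverse-complement) tying: motif column k is tied to
    column W + 1 - k, so the observed counts n[(c, k)] and pseudocounts
    d[(c, k)] of the two columns are pooled before normalizing."""
    p = {}
    for k in range(1, W + 1):
        k2 = W + 1 - k                                 # complementary column
        denom = sum(n[(b, k)] + n[(b, k2)] + d[(b, k)] + d[(b, k2)] for b in "acgt")
        for c in "acgt":
            c2 = COMPLEMENT[c]
            p[(c, k)] = (n[(c, k)] + n[(c2, k2)] + d[(c, k)] + d[(c2, k2)]) / denom
    return p
```

By construction, tied entries such as p[("a", 1)] and p[("t", W)] come out equal, and each column still sums to 1.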

Including Prior Knowledge
Recall that MEME and Gibbs update parameters by
$p_{c,k} = \frac{n_{c,k} + d_{c,k}}{\sum_b \left( n_{b,k} + d_{b,k} \right)}$
Can we use background knowledge to guide our choice of pseudocounts $d_{c,k}$? Suppose we're modeling protein sequences.

Amino Acids
Can we encode prior knowledge about amino acid properties into the motif finding process? There are classes of amino acids that share similar properties.

Using Dirichlet Mixture Priors
The prior is for a single PWM column, not the entire motif. Because we're estimating multinomial distributions (frequencies of amino acids at each motif position), a natural way to encode prior knowledge is using Dirichlet distributions.
Let's consider: the Beta distribution, the Dirichlet distribution, and mixtures of Dirichlets.

The Beta Distribution
Suppose we're taking a Bayesian approach to estimating the parameter $\theta$ of a weighted coin. The Beta distribution provides an appropriate prior:
$P(\theta) = \frac{\Gamma(\alpha_h + \alpha_t)}{\Gamma(\alpha_h)\,\Gamma(\alpha_t)} \, \theta^{\alpha_h - 1} (1 - \theta)^{\alpha_t - 1}$, for $\theta \in [0, 1]$
where $\alpha_h$ is the number of imaginary heads we have seen already, $\alpha_t$ is the number of imaginary tails we have seen already, and $\Gamma$ is a continuous generalization of the factorial function.

The Beta Distribution
Suppose now we're given a data set D in which we observe $D_h$ heads and $D_t$ tails. The posterior distribution is also Beta:
$P(\theta \mid D) = \frac{\Gamma(\alpha_h + \alpha_t + D_h + D_t)}{\Gamma(\alpha_h + D_h)\,\Gamma(\alpha_t + D_t)} \, \theta^{\alpha_h + D_h - 1} (1 - \theta)^{\alpha_t + D_t - 1} = \mathrm{Beta}(\alpha_h + D_h,\ \alpha_t + D_t)$
We say that the set of Beta distributions is a conjugate family for binomial sampling.
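
A quick worked illustration of the conjugate update, with numbers chosen here rather than taken from the slides:

```python
# Beta-binomial conjugate update (illustrative numbers, not from the slides).
alpha_h, alpha_t = 2.0, 2.0                      # prior "imaginary" heads and tails
D_h, D_t = 3, 1                                  # observed heads and tails

post_h, post_t = alpha_h + D_h, alpha_t + D_t    # posterior is Beta(5, 3)
print("posterior parameters:", (post_h, post_t))
print("posterior mean of theta:", post_h / (post_h + post_t))   # 0.625
```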

The Dirichlet Distribution
For discrete variables with more than two possible values, we can use Dirichlet priors. Dirichlet priors are a conjugate family for multinomial data. If $P(\theta)$ is $\mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K)$, then $P(\theta \mid D)$ is $\mathrm{Dirichlet}(\alpha_1 + D_1, \ldots, \alpha_K + D_K)$, where $D_i$ is the number of occurrences of the $i$th value.
$P(\theta) = \frac{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}$

Dirichlet Distributions
[Figure: probability density, shown on a simplex, of Dirichlet distributions for K = 3 and various parameter vectors α: (6, 2, 2), (3, 7, 5), (2, 3, 4), (6, 2, 6). Image from Wikipedia; Python code adapted from Thomas Boggs.]

Mixture of Dirichlets
We'd like to have Dirichlet distributions characterizing amino acids that tend to be used in certain roles. Brown et al. [ISMB '93] induced a set of Dirichlets from trusted protein alignments: large, charged and polar; polar and mostly negatively charged; hydrophobic, uncharged, nonpolar; etc.

Trusted Protein Alignments
A trusted protein alignment is one in which known protein structures are used to determine which parts of the given set of sequences should be aligned.

Using Dirichlet Mixture Priors
Recall that EM/Gibbs update the parameters by
$p_{c,k} = \frac{n_{c,k} + d_{c,k}}{\sum_b \left( n_{b,k} + d_{b,k} \right)}$
We can set the pseudocounts using a mixture of Dirichlets:
$d_{c,k} = \sum_j P\!\left(\alpha^{(j)} \mid n_k\right) \alpha_c^{(j)}$
where $\alpha^{(j)}$ is the $j$th Dirichlet component.

Using Dirichlet Mixture Priors
$d_{c,k} = \sum_j P\!\left(\alpha^{(j)} \mid n_k\right) \alpha_c^{(j)}$
where $P(\alpha^{(j)} \mid n_k)$ is the probability of the $j$th Dirichlet given the observed counts $n_k$, and $\alpha_c^{(j)}$ is the parameter for character $c$ in the $j$th Dirichlet.
We don't have to know which Dirichlet to pick. Instead, we'll hedge our bets, using the observed counts to decide how much to weight each Dirichlet. See textbook section 11.5.
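
A minimal sketch of this pseudocount computation (illustrative two-component mixture and variable names of my own; the responsibilities $P(\alpha^{(j)} \mid n_k)$ are computed in the usual way, as mixture weights times the Dirichlet-multinomial marginal likelihood of the column counts):

```python
from math import lgamma, log, exp

def log_marginal(counts, alpha):
    """log P(counts | alpha): Dirichlet-multinomial marginal likelihood of one
    column's counts under one Dirichlet component (the multinomial coefficient
    is omitted because it cancels when comparing components)."""
    N, A = sum(counts), sum(alpha)
    return (lgamma(A) - lgamma(N + A)
            + sum(lgamma(n_c + a_c) - lgamma(a_c) for n_c, a_c in zip(counts, alpha)))

def mixture_pseudocounts(counts, components):
    """d_c = sum_j P(alpha^(j) | counts) * alpha_c^(j), where components is a
    list of (mixture weight q_j, alpha^(j) vector) pairs."""
    log_w = [log(q) + log_marginal(counts, alpha) for q, alpha in components]
    m = max(log_w)
    w = [exp(lw - m) for lw in log_w]
    post = [x / sum(w) for x in w]               # responsibilities P(alpha^(j) | counts)
    return [sum(post[j] * alpha[c] for j, (q, alpha) in enumerate(components))
            for c in range(len(counts))]

# Toy usage: two made-up components over a 4-letter alphabet (real mixtures,
# e.g. Brown et al., use many components over the 20 amino acids).
components = [(0.5, [5.0, 1.0, 1.0, 5.0]),       # favors characters 0 and 3
              (0.5, [1.0, 5.0, 5.0, 1.0])]       # favors characters 1 and 2
counts = [6, 0, 1, 5]                            # observed column counts n_k
d = mixture_pseudocounts(counts, components)
p = [(n_c + d_c) / (sum(counts) + sum(d)) for n_c, d_c in zip(counts, d)]
print("pseudocounts d:", [round(x, 2) for x in d])
print("estimated p:   ", [round(x, 2) for x in p])
```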

Motif Finding: EM and Gibbs
These methods compute local multiple alignments and optimize the likelihood or likelihood ratio of the sequences. EM converges to a local maximum. Gibbs will converge to a global maximum in the limit; in a reasonable amount of time, probably not. Both can take advantage of background knowledge by tying parameters and using Dirichlet priors.
There are many other methods for motif finding. In practice, motif finders often fail: the motif signal may be weak, the search space is large with many local minima, and the methods do not consider binding context.