Learning Sequence Motif Models Using Gibbs Sampling
BMI/CS 776  www.biostat.wisc.edu/bmi776/  Spring 2018
Anthony Gitter  gitter@biostat.wisc.edu
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter
Goals for Lecture
Key concepts:
- Markov chain Monte Carlo (MCMC) and Gibbs sampling (CS 760 slides for background)
- Gibbs sampling applied to the motif-finding task
- parameter tying
- incorporating prior knowledge using Dirichlets and Dirichlet mixtures
Gibbs Sampling: An Alternative to EM
EM can get trapped in local maxima
One approach to alleviate this limitation: try different (perhaps random) initial parameters
Gibbs sampling exploits randomized search to a much greater degree
Can view it as a stochastic analog of EM for this task
In theory, Gibbs sampling is less susceptible to local maxima than EM [Lawrence et al., Science 1993]
Gibbs Sampling Approach
In the EM approach we maintained a distribution Z_i^(t) over the possible motif starting points for each sequence at iteration t
In the Gibbs sampling approach we'll maintain a specific starting point a_i for each sequence, but we'll keep randomly resampling these
Gibbs Sampling Algorithm for Motif Finding
given: length parameter W, training set of sequences
choose random positions for a
do
  pick a sequence X_i
  estimate p given current motif positions a, using all sequences but X_i (predictive update step)
  sample a new motif position a_i for X_i (sampling step)
until convergence
return: p, a
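The loop above can be sketched in Python. This is a minimal illustration, not the Lawrence et al. implementation: the pseudocount value, the uniform background, and the fixed iteration count standing in for a convergence test are all assumptions.

```python
import random
from collections import Counter

random.seed(0)

def column_counts(seqs, positions, W, skip):
    """Character counts at each motif column, excluding sequence `skip`."""
    counts = [Counter() for _ in range(W)]
    for i, (s, a) in enumerate(zip(seqs, positions)):
        if i != skip:
            for j in range(W):
                counts[j][s[a + j]] += 1
    return counts

def gibbs_motif(seqs, W, iters=100, pseudo=1.0, bg=0.25):
    """Sketch of the Gibbs sampling algorithm for motif finding."""
    # choose random positions for a
    positions = [random.randrange(len(s) - W + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # predictive update step: estimate p from all sequences but X_i
            counts = column_counts(seqs, positions, W, skip=i)
            n = len(seqs) - 1
            def p(c, j):
                return (counts[j][c] + pseudo) / (n + 4 * pseudo)
            # sampling step: draw a new a_i in proportion to the
            # likelihood ratio of motif vs. background at each start
            weights = []
            for j in range(len(s) - W + 1):
                lr = 1.0
                for k in range(W):
                    lr *= p(s[j + k], k) / bg
                weights.append(lr)
            positions[i] = random.choices(range(len(weights)), weights=weights)[0]
    return positions
```

In practice one would monitor the positions (or the model score) for convergence rather than running a fixed number of sweeps.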
Markov Chain Monte Carlo (MCMC)
Consider a Markov chain in which, on each time step, a grasshopper randomly chooses to stay in its current state (probability 0.5), jump one state left (probability 0.25), or jump one state right (probability 0.25).
[Figure: a chain of states -4 ... +4 with these transition probabilities; from Koller & Friedman, Probabilistic Graphical Models, MIT Press]
Let P^(t)(u) represent the probability of being in state u at time t in the random walk:
P^(0)(0) = 1        P^(0)(-1) = 0        P^(0)(-2) = 0
P^(1)(0) = 0.5      P^(1)(-1) = 0.25     P^(1)(-2) = 0
P^(2)(0) = 0.375    P^(2)(-1) = 0.25     P^(2)(-2) = 0.0625
P^(100)(0) ≈ 0.11   P^(100)(-1) ≈ 0.11   P^(100)(-2) ≈ 0.11
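The table above can be reproduced exactly by iterating the transition rule; the boundary behavior (a jump blocked at ±4 leaves the grasshopper in place) is an assumption about the figure.

```python
# States -4 .. +4; stay w.p. 0.5, jump left/right w.p. 0.25 each.
# Assumed boundary rule: a jump off the end leaves the grasshopper in place.
states = list(range(-4, 5))

def step(dist):
    """One application of the transition probabilities to a distribution."""
    new = dict.fromkeys(states, 0.0)
    for u, pr in dist.items():
        new[u] += 0.5 * pr
        new[max(u - 1, -4)] += 0.25 * pr
        new[min(u + 1, 4)] += 0.25 * pr
    return new

dist = {u: 1.0 if u == 0 else 0.0 for u in states}  # P^(0): start in state 0
history = [dist]
for _ in range(100):
    dist = step(dist)
    history.append(dist)

print(history[1][0], history[1][-1])   # 0.5 0.25
print(history[2][0], history[2][-2])   # 0.375 0.0625
print(round(history[100][0], 2))       # 0.11 -- approaching uniform (1/9)
```

By t = 100 every state has probability near 1/9: the walk has essentially forgotten where it started.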
The Stationary Distribution
Let P(u) represent the probability of being in state u at any given time in a random walk on the chain:
P(u) = lim_{t→∞} P^(t)(u)
where
P^(t+1)(u) = Σ_v P^(t)(v) P(u | v)
with P^(t)(v) the probability of state v at time t, and P(u | v) the probability of the transition v → u
The stationary distribution is the set of such probabilities for all states
Markov Chain Monte Carlo (MCMC)
We can view the motif-finding approach in terms of a Markov chain
Each state represents a configuration of the starting positions (a_i values) for a set of random variables A_1 ... A_n
Transitions correspond to changing selected starting positions (and hence moving to a new state)
[Figure: two states u and v over the same ten sequences (ACATCCG, CGACTAC, ...); the transition u → v with probability P(v | u) moves the motif window in the first sequence from A_1 = 5 to A_1 = 3]
Markov Chain Monte Carlo
In the motif-finding task the number of states is enormous
Key idea: construct a Markov chain with stationary distribution equal to the distribution of interest; use sampling to find the most probable states
Detailed balance:
P(u) P(v | u) = P(v) P(u | v)
with P(u) the probability of state u, and P(v | u) the probability of the transition u → v
When detailed balance holds:
lim_{N→∞} (1/N) count(u) = P(u)
where N is the number of samples
MCMC with Gibbs Sampling
Gibbs sampling is a special case of MCMC in which Markov chain transitions involve changing one variable at a time
The transition probability is the conditional probability of the changed variable given all others
We sample the joint distribution of a set of random variables P(A_1, ..., A_n) by iteratively sampling from
P(A_i | A_1, ..., A_{i-1}, A_{i+1}, ..., A_n)
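A minimal illustration of this recipe on a made-up joint distribution over two binary variables: each transition resamples one variable from its conditional given the other, and the long-run sample frequencies match the joint.

```python
import random

random.seed(1)

# Hypothetical joint distribution over two binary variables (A, B)
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def resample_a(b):
    """Draw A from P(A | B=b), proportional to the joint with B clamped."""
    return random.choices([0, 1], weights=[joint[(0, b)], joint[(1, b)]])[0]

def resample_b(a):
    """Draw B from P(B | A=a), proportional to the joint with A clamped."""
    return random.choices([0, 1], weights=[joint[(a, 0)], joint[(a, 1)]])[0]

a, b = 0, 0
counts = dict.fromkeys(joint, 0)
burn, n = 1000, 100_000
for t in range(burn + n):
    a = resample_a(b)   # change one variable at a time
    b = resample_b(a)
    if t >= burn:
        counts[(a, b)] += 1

for s in sorted(joint):
    print(s, round(counts[s] / n, 2), "vs", joint[s])
```

Note that the sampler only ever needs the conditionals, which is exactly what makes Gibbs attractive when the joint is too large to normalize.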
Gibbs Sampling Approach
[Figure: possible state transitions when the first sequence is selected — five copies of the same ten sequences (ACATCCG, CGACTAC, ...), each showing a different candidate position of the motif window in the first sequence]
Gibbs Sampling Approach
Lawrence et al. maximize the likelihood ratio
P(X | motif) / P(X | background)
How do we get the transition probabilities when we don't know what the motif looks like?
Gibbs Sampling Approach
The probability of a state u is given by
P(u) ∝ ∏_{j=1}^{W} ∏_c ( p_{c,j} / p_{c,0} )^{n_{c,j}(u)}
where n_{c,j}(u) is the count of character c in motif position j in state u, p_{c,j} is the probability of c in motif position j, and p_{c,0} is the background probability for character c
[Figure: an example state u over ten sequences, with the count matrix n(u) of A, C, G, T at each motif position]
See Liu et al., JASA 1995 for the full derivation
Sampling New Motif Positions
For each possible starting position A_i = j, compute the likelihood ratio (leaving sequence i out of the estimates of p):
LR(j) = ∏_{k=j}^{j+W-1} p_{c_k, k-j+1} / p_{c_k, 0}
Randomly select a new starting position A_i = j with probability
LR(j) / Σ_{k ∈ {starting positions}} LR(k)
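A small worked example of this sampling step; the width-3 motif model, uniform background, and sequence are invented for illustration.

```python
import random

random.seed(2)

W = 3
# hypothetical motif model p[c][k] (columns 1..W stored as indices 0..W-1)
p = {"A": [0.7, 0.1, 0.1], "C": [0.1, 0.7, 0.1],
     "G": [0.1, 0.1, 0.7], "T": [0.1, 0.1, 0.1]}
p0 = {c: 0.25 for c in "ACGT"}   # uniform background
seq = "TTACGTT"

# LR(j) = prod over motif columns of p_{c,k} / p_{c,0} for the window at j
lrs = []
for j in range(len(seq) - W + 1):
    lr = 1.0
    for k in range(W):
        c = seq[j + k]
        lr *= p[c][k] / p0[c]
    lrs.append(lr)

# sampling step: pick position j with probability LR(j) / sum_k LR(k)
probs = [lr / sum(lrs) for lr in lrs]
new_pos = random.choices(range(len(lrs)), weights=lrs)[0]
print([round(x, 3) for x in probs])   # → [0.003, 0.003, 0.988, 0.003, 0.003]
```

The window "ACG" at position 2 matches the model's preferred characters, so nearly all the probability mass lands there, but the other positions remain reachable; that residual randomness is what lets the sampler escape poor configurations.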
The Phase Shift Problem
The Gibbs sampler can get stuck in a local maximum that corresponds to the correct solution shifted by a few bases
Solution: add a special step to shift the a_i values by the same amount for all sequences
Try different shift amounts and pick one in proportion to its probability score
Convergence of Gibbs
[Figure: sampler behavior when the true motif is deleted from the input sequences]
Using Background Knowledge to Bias the Parameters
Let's consider two ways in which background knowledge can be exploited in motif finding:
1. Accounting for palindromes that are common in DNA binding sites
2. Using Dirichlet mixture priors to account for biochemical similarity of amino acids
Using Background Knowledge to Bias the Parameters
Many DNA motifs have a palindromic pattern because they are bound by a protein homodimer: a complex consisting of two identical proteins
[Figure from Porteus & Carroll, Nature Biotechnology 2005]
Representing Palindromes
Parameters in probabilistic models can be tied or shared: the motif columns (p_{a,1}, p_{c,1}, p_{g,1}, p_{t,1}) ... (p_{a,W}, p_{c,W}, p_{g,W}, p_{t,W}) are constrained so that each column matches the complement of its mirror column
During the motif search, try tying parameters according to the palindromic constraint; accept if it increases the likelihood ratio test (half as many parameters)
Updating Tied Parameters
With the palindromic constraint, complementary parameters are tied, e.g. p_{a,1} = p_{t,W}, and the update pools the corresponding counts n and pseudocounts d:
p_{a,1} = p_{t,W} = ( n_{a,1} + n_{t,W} + d_{a,1} + d_{t,W} ) / Σ_b ( n_{b,1} + n_{b,W} + d_{b,1} + d_{b,W} )
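A sketch of the pooled update with invented counts and uniform pseudocounts; columns are indexed 1..W and `comp` maps each base to its Watson-Crick complement.

```python
W = 4
bases = "ACGT"
comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
# hypothetical observed counts n[c][j-1] and pseudocounts d[c][j-1]
n = {"A": [6, 1, 1, 0], "C": [1, 5, 2, 1],
     "G": [1, 1, 5, 1], "T": [0, 1, 0, 6]}
d = {c: [1] * W for c in bases}

def tied_p(c, j):
    """p_{c,j} under the palindromic tie p_{c,j} = p_{comp(c), W-j+1}."""
    cc, jj = comp[c], W - j + 1
    num = n[c][j - 1] + d[c][j - 1] + n[cc][jj - 1] + d[cc][jj - 1]
    den = sum(n[b][j - 1] + d[b][j - 1] + n[comp[b]][jj - 1] + d[comp[b]][jj - 1]
              for b in bases)
    return num / den

print(round(tied_p("A", 1), 3))          # → 0.583 (pooled A-col-1 / T-col-W)
print(tied_p("A", 1) == tied_p("T", W))  # → True: the tie holds by construction
```

Pooling the two columns is what halves the number of free parameters: each pair of tied entries is estimated once from the combined evidence.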
Including Prior Knowledge
Recall that MEME and Gibbs update parameters by:
p_{c,k} = ( n_{c,k} + d_{c,k} ) / Σ_b ( n_{b,k} + d_{b,k} )
Can we use background knowledge to guide our choice of pseudocounts d_{c,k}?
Suppose we're modeling protein sequences
Amino Acids
Can we encode prior knowledge about amino acid properties into the motif-finding process?
There are classes of amino acids that share similar properties
Using Dirichlet Mixture Priors
Prior for a single PWM column, not the entire motif
Because we're estimating multinomial distributions (frequencies of amino acids at each motif position), a natural way to encode prior knowledge is using Dirichlet distributions
Let's consider
- the Beta distribution
- the Dirichlet distribution
- mixtures of Dirichlets
The Beta Distribution
Suppose we're taking a Bayesian approach to estimating the parameter θ of a weighted coin
The Beta distribution provides an appropriate prior:
P(θ) = Γ(α_h + α_t) / ( Γ(α_h) Γ(α_t) ) · θ^{α_h - 1} (1 - θ)^{α_t - 1},  θ ∈ [0, 1]
where α_h is the # of imaginary heads we have seen already, α_t is the # of imaginary tails we have seen already, and Γ is a continuous generalization of the factorial function
The Beta Distribution
Suppose now we're given a data set D in which we observe D_h heads and D_t tails
P(θ | D) ∝ θ^{α_h + D_h - 1} (1 - θ)^{α_t + D_t - 1} = Beta(α_h + D_h, α_t + D_t)
The posterior distribution is also Beta: we say that the set of Beta distributions is a conjugate family for binomial sampling
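Conjugacy makes the Bayesian update a matter of adding counts; a tiny sketch with invented prior and data:

```python
# Prior Beta(a_h, a_t): a_h imaginary heads, a_t imaginary tails
a_h, a_t = 2, 2                  # prior mean a_h / (a_h + a_t) = 0.5

# Observe D_h heads and D_t tails
D_h, D_t = 8, 2

# Posterior is Beta(a_h + D_h, a_t + D_t): conjugacy = just add the counts
post_h, post_t = a_h + D_h, a_t + D_t
post_mean = post_h / (post_h + post_t)
print(post_h, post_t)        # → 10 4
print(round(post_mean, 3))   # → 0.714, pulled from the prior 0.5 toward the data
```

The same add-the-counts update is what the pseudocounts d in the earlier parameter estimates implement.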
The Dirichlet Distribution
For discrete variables with more than two possible values, we can use Dirichlet priors:
P(θ) = Γ( Σ_{i=1}^{K} α_i ) / ∏_{i=1}^{K} Γ(α_i) · ∏_{i=1}^{K} θ_i^{α_i - 1}
Dirichlet priors are a conjugate family for multinomial data
If P(θ) is Dirichlet(α_1, ..., α_K), then P(θ | D) is Dirichlet(α_1 + D_1, ..., α_K + D_K), where D_i is the # occurrences of the i-th value
Dirichlet Distributions
[Figure: probability density (shown on a simplex) of Dirichlet distributions for K = 3 and various parameter vectors α: (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4)]
Image from Wikipedia; Python code adapted from Thomas Boggs
Mixture of Dirichlets
We'd like to have Dirichlet distributions characterizing amino acids that tend to be used in certain roles
Brown et al. [ISMB '93] induced a set of Dirichlets from trusted protein alignments:
- large, charged, and polar
- polar and mostly negatively charged
- hydrophobic, uncharged, nonpolar
- etc.
Trusted Protein Alignments
A trusted protein alignment is one in which known protein structures are used to determine which parts of the given set of sequences should be aligned
Using Dirichlet Mixture Priors
Recall that EM/Gibbs update the parameters by:
p_{c,k} = ( n_{c,k} + d_{c,k} ) / Σ_b ( n_{b,k} + d_{b,k} )
We can set the pseudocounts using a mixture of Dirichlets:
d_{c,k} = Σ_j P( α^(j) | n_k ) α_c^(j)
where α^(j) is the j-th Dirichlet component
Using Dirichlet Mixture Priors
d_{c,k} = Σ_j P( α^(j) | n_k ) α_c^(j)
where P( α^(j) | n_k ) is the probability of the j-th Dirichlet given the observed counts, and α_c^(j) is the parameter for character c in the j-th Dirichlet
We don't have to know which Dirichlet to pick
Instead, we'll hedge our bets, using the observed counts to decide how much to weight each Dirichlet
See textbook section 11.5
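A sketch of how the weights P(α^(j) | n) fall out of the counts: each component is scored by the Dirichlet-multinomial marginal likelihood of the observed column counts, and the pseudocounts are the responsibility-weighted average of the component parameters. The two 4-letter components below are invented (a real mixture such as Brown et al.'s has many components over the 20 amino acids).

```python
from math import lgamma, log, exp

# invented mixture: (prior weight q_j, parameter vector alpha^(j))
components = [
    (0.5, [5.0, 1.0, 1.0, 1.0]),   # favors character 0
    (0.5, [1.0, 1.0, 1.0, 5.0]),   # favors character 3
]

def log_marginal(counts, alpha):
    """log P(n | alpha): Dirichlet-multinomial marginal of column counts n."""
    A, N = sum(alpha), sum(counts)
    out = lgamma(A) - lgamma(A + N)
    for ni, ai in zip(counts, alpha):
        out += lgamma(ai + ni) - lgamma(ai)
    return out

def pseudocounts(counts):
    # P(alpha^(j) | n) proportional to q_j * P(n | alpha^(j))
    logs = [log(q) + log_marginal(counts, a) for q, a in components]
    m = max(logs)
    w = [exp(x - m) for x in logs]       # shift logs for numerical stability
    post = [x / sum(w) for x in w]
    # d_c = sum_j P(alpha^(j) | n) * alpha_c^(j)
    return [sum(pj * a[c] for pj, (q, a) in zip(post, components))
            for c in range(len(counts))]

d = pseudocounts([7, 1, 1, 0])   # a column dominated by character 0
print([round(x, 2) for x in d])  # close to [5, 1, 1, 1]: component 0 dominates
```

With counts concentrated on character 0, nearly all responsibility goes to the first component, so the pseudocounts inherit its shape; ambiguous counts would instead blend the components.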
Motif Finding: EM and Gibbs
These methods compute local multiple alignments
Optimize the likelihood or likelihood ratio of the sequences
EM converges to a local maximum
Gibbs will converge to a global maximum in the limit; in a reasonable amount of time, probably not
Can take advantage of background knowledge by
- tying parameters
- Dirichlet priors
There are many other methods for motif finding
In practice, motif finders often fail:
- motif signal may be weak
- large search space, many local minima
- do not consider binding context