Gibbs sampling. Massimo Andreatta, Center for Biological Sequence Analysis, Technical University of Denmark.


1 Gibbs sampling. Massimo Andreatta, Center for Biological Sequence Analysis, Technical University of Denmark.

2 Monte Carlo simulations. MC methods use repeated random sampling to numerically approximate solutions to problems.

3 Monte Carlo simulations. A simple example: computing π with sampling.

4 Monte Carlo simulations. A simple example: computing π with sampling. A circle of radius r inscribed in a square of side 2r: $A_c = \pi r^2$, $A_s = (2r)^2$.

5 Monte Carlo simulations. A simple example: computing π with sampling. $A_c = \pi r^2$ and $A_s = (2r)^2$, so $\frac{A_c}{A_s} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}$, hence $\pi = 4 \frac{A_c}{A_s}$.

6 Monte Carlo simulations. A simple example: computing π with sampling. $\pi = 4 \frac{A_c}{A_s}$.

7 Monte Carlo simulations. A simple example: computing π with sampling. Throw darts randomly: $\frac{\text{hits in circle}}{\text{hits in square}} = \frac{\text{hit}}{\text{hit} + \text{miss}} \approx \frac{A_c}{A_s}$, so $\pi = 4 \frac{A_c}{A_s}$.

8 Monte Carlo simulations. A simple example: computing π with sampling.

    hit = 0
    for N iterations:
        x = random(-1, 1)
        y = random(-1, 1)
        dist = sqrt(x^2 + y^2)
        if (dist < 1) hit++

$\pi = 4 \frac{A_c}{A_s}$

9 Monte Carlo simulations. A simple example: computing π with sampling.

    hit = 0
    for N iterations:
        x = random(-1, 1)
        y = random(-1, 1)
        dist = sqrt(x^2 + y^2)
        if (dist < 1) hit++
    pi = 4 * hit/N

$\pi = 4 \frac{A_c}{A_s}$
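The pseudocode above translates almost line for line into runnable Python; a minimal sketch (the function name is mine):

```python
import random

def estimate_pi(n):
    """Monte Carlo estimate of pi: sample points uniformly in the square
    [-1, 1] x [-1, 1] and count how many land inside the unit circle."""
    hit = 0
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:  # dist < 1, without the needless sqrt
            hit += 1
    return 4.0 * hit / n

print(estimate_pi(1_000_000))  # typically within a few thousandths of 3.14159
```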

10 Monte Carlo simulations. A simple example: computing π with sampling. [figure]

11 Monte Carlo simulations. A simple example: computing π with sampling. More iterations give a more accurate estimate; after 1,000,000 iterations the estimate was approximately 3.14.

12 Gibbs sampling. A special kind of Monte Carlo method (Markov Chain Monte Carlo, or MCMC):
- estimates a distribution by sampling from it
- the samples are taken with pseudo-random steps
- stepping to the next state depends only on the current state (memoryless chain)

13 Gibbs sampling. Stochastic search. [figure: fitness f(Z) over the state space Z]

14 Gibbs sampling. Stochastic search.

$dE = f(Z_i) - f(Z_{i-1})$
$P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$

$Z_i$ = current state of the system; $P$ = probability of accepting the move; $T$ = a scalar lowered during the search.
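This acceptance rule is a few lines of Python; a minimal sketch using the slide's sign convention, where higher fitness is better, so any move with dE > 0 is always accepted:

```python
import math
import random

def accept_move(dE, T):
    """Metropolis acceptance: improving moves (dE > 0) are always taken;
    worsening moves are taken with probability exp(dE / T)."""
    p = min(1.0, math.exp(dE / T))
    return random.random() < p
```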

15 Gibbs sampling - down to biology. Sequence alignment.

SLFIGLKGDIRESTV DGEEEVQLIAAVPGK VFRLKGGAPIKGVTF SFSCIAIGIITLYLG IDQVTIAGAKLRSLN WIQKETLVTFKNPHAKKQDV KMLLDNINTPEGIIP ELLEFHYYLSSKLNK LNKFISPKSVAGRFA ESLHNPYPDYHWLRT

$dE = f(Z_i) - f(Z_{i-1})$, $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$; $Z_i$ = current state of the system, $P$ = probability of accepting the move, $T$ = a scalar lowered during the search.

16 Gibbs sampling - down to biology. Sequence alignment (same sequence set as above).

$E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$ (information content of the alignment core, with $p_{p,a}$ the frequency of amino acid $a$ at position $p$ and $q_a$ the background frequency)
$dE = E_i - E_{i-1}$
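A sketch of this energy function in Python, assuming (for illustration only) that the alignment core is represented as a list of per-position amino-acid frequency dictionaries:

```python
import math

def alignment_energy(columns, q, C=1.0):
    """E = C * sum over positions p and amino acids a of
    p[p][a] * log(p[p][a] / q[a]).
    `columns[p]` maps amino acid -> frequency at core position p;
    `q` maps amino acid -> background frequency."""
    E = 0.0
    for col in columns:
        for aa, f in col.items():
            if f > 0.0:
                E += f * math.log(f / q[aa])
    return C * E
```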

17 Gibbs sampling - sequence alignment. State transition: move to state i+1. [alignment of the sequence set above] $E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$, $dE = E_i - E_{i-1}$.

18 Gibbs sampling - sequence alignment. State transition: move to state i+1. [alignment before and after the move] $E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$, $dE = E_i - E_{i-1}$. Accept or reject the move? $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$. Note that the probability of going to the new state only depends on the previous state.

19 Gibbs sampling - sequence alignment. Numerical example 1: move to state i+1 with T = 0.2, $E_{i-1} = 2.44$, $E_i = 2.52$. $P = \min\left[1, \exp\left(\frac{0.08}{0.2}\right)\right] = \min[1, 1.49] = 1$: accept the move with probability 100%.

20 Gibbs sampling - sequence alignment. Numerical example 2: move to state i+1 with T = 0.2, $E_{i-1} = 2.44$, $E_i = 2.35$. $P = \min\left[1, \exp\left(\frac{-0.09}{0.2}\right)\right] = \min[1, 0.638] = 0.638$: accept the move with probability 63.8%.
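Both probabilities can be checked with a few lines of Python:

```python
import math

T = 0.2
for dE in (2.52 - 2.44, 2.35 - 2.44):
    print(f"dE = {dE:+.2f} -> P(accept) = {min(1.0, math.exp(dE / T)):.3f}")
# dE = +0.08 -> P(accept) = 1.000
# dE = -0.09 -> P(accept) = 0.638
```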

21 Gibbs sampling - sequence alignment. Now, one thing at a time.

22 Gibbs sampling - sequence alignment. What is the MC temperature T? It's a scalar decreased during the simulation. [figure: T as a function of the iteration number]

23 Gibbs sampling - sequence alignment. What is the MC temperature T? It's a scalar decreased during the simulation. E.g., the same dE = -0.3 at different temperatures:

$t_1 = 0.4$: $P(t_1) = \min\left[1, \exp\left(\frac{-0.3}{0.4}\right)\right] = 0.47$
$t_2 = 0.1$: $P(t_2) = \min\left[1, \exp\left(\frac{-0.3}{0.1}\right)\right] \approx 0.05$
$t_3 = 0.02$: $P(t_3) = \min\left[1, \exp\left(\frac{-0.3}{0.02}\right)\right] \approx 3 \times 10^{-7}$
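The same arithmetic in Python shows how cooling progressively freezes out unfavourable moves:

```python
import math

dE = -0.3
for T in (0.4, 0.1, 0.02):
    print(f"T = {T:4.2f} -> P(accept) = {min(1.0, math.exp(dE / T)):.2e}")
# T = 0.40 -> P(accept) = 4.72e-01
# T = 0.10 -> P(accept) = 4.98e-02
# T = 0.02 -> P(accept) = 3.06e-07
```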


25 Move freely around states while the system is warm, then cool it off to force it into a state of high fitness. [figure: fitness landscape f(Z) over states Z]

26 Gibbs sampling - sequence alignment. Why sampling? 50 sequences, 12 amino acids long: trying all possible combinations with a 9-mer core gives 4 placements per sequence, i.e. $4^{50} \approx 10^{30}$ possible combinations, which is computationally infeasible.

SLFIGLKGDIRESTV DGEEEVQLIAAVPGK VFRLKGGAPIKGVTF SFSCIAIGIITLYLG IDQVTIAGAKLRSLN WIQKETLVTFKNPHAKKQDV KMLLDNINTPEGIIP ELLEFHYYLSSKLNK LNKFISPKSVAGRFA ESLHNPYPDYHWLRT DFAAQVDYPSTGLY

27 Single sequence move: shift the core window of one sequence (move to state i+1). [alignment before and after the move] $E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$, $dE = E_i - E_{i-1}$. Accept or reject the move? $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$.

28 Phase shift move: shift all sequences (move to state i+1). [alignment before and after the move] $E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$, $dE = E_i - E_{i-1}$. Accept or reject the move? $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$.

29 A sketch for the alignment algorithm (see the Python sketch below):

    Start from a random alignment
    Set the initial temperature
    For N iterations:
        pick a random sequence
        suggest a shift move
        accept or reject the move with probability $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$
        every $P_{sh}$ moves, attempt a phase shift move
        decrease the temperature
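A minimal, self-contained Python sketch of this loop, under assumptions of my own: a flat amino-acid background, a linear cooling schedule, and a fixed per-iteration probability of attempting the phase-shift move:

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"
BG = {a: 1.0 / len(AA) for a in AA}  # flat background: an assumption

def energy(seqs, starts, L):
    """Information content of the L-mer cores selected by `starts`."""
    E = 0.0
    for p in range(L):
        counts = {}
        for s, st in zip(seqs, starts):
            aa = s[st + p]
            counts[aa] = counts.get(aa, 0) + 1
        for aa, c in counts.items():
            f = c / len(seqs)
            E += f * math.log(f / BG[aa])
    return E

def gibbs_align(seqs, L=9, n_iter=50_000, T0=0.5, T1=0.01, p_shift=0.02):
    """Simulated-annealing alignment with single-sequence and phase-shift moves."""
    starts = [random.randrange(len(s) - L + 1) for s in seqs]
    E = energy(seqs, starts, L)
    for i in range(n_iter):
        T = T0 + (T1 - T0) * i / n_iter  # linear cooling
        new = list(starts)
        if random.random() < p_shift:    # phase shift: move all cores together
            d = random.choice((-1, 1))
            if not all(0 <= st + d <= len(s) - L for s, st in zip(seqs, new)):
                continue
            new = [st + d for st in new]
        else:                            # single sequence move
            k = random.randrange(len(seqs))
            new[k] = random.randrange(len(seqs[k]) - L + 1)
        E_new = energy(seqs, new, L)
        dE = E_new - E
        if dE >= 0 or random.random() < math.exp(dE / T):
            starts, E = new, E_new
    return starts

# e.g. gibbs_align(["SLFIGLKGDIRESTV", "DGEEEVQLIAAVPGK", ...], L=9)
```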

30 Does it work?

31 Gibbs sequence alignment - performance. [figure]

32 More Gibbs sampling: aligning scoring matrices.

33 Alignment of scoring matrices. 4 networks trained on HLA*DRB. [figure]

34 Alignment of scoring matrices. Combined logo: equally valid solutions, but with different core registers. [figure]

35 The PSSM-align algorithm. Individual PSSM (20 x L). [figure]

36 The PSSM-align algorithm. Individual PSSM (20 x L). 1. Extend the matrix with background (BG) frequencies.

37 The PSSM-align algorithm. All individual PSSMs (20 x L). 1. Extend each matrix with background (BG) frequencies.


39 The PSSM-align algorithm. All individual PSSMs (20 x L). 1. Extend each matrix with background (BG) frequencies. 2. Apply a random shift.

40 The PSSM-align algorithm. [figure: shared core] 1. Extend each matrix with background (BG) frequencies. 2. Apply a random shift. 3. Do Gibbs sampling for many iterations, accepting moves with probability $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$. Maximize the combined information content of the core.
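The objective in step 3 can be sketched as follows, assuming each PSSM is stored as a list of per-position frequency dictionaries and the offsets select each matrix's core window (all names are illustrative):

```python
import math

def core_information_content(pssms, offsets, core_len, bg):
    """Information content of the position-wise average of the aligned
    matrices over the shared core; the quantity the sampler maximizes.
    `pssms[k][p]` maps amino acid -> frequency at position p of matrix k."""
    ic = 0.0
    for p in range(core_len):
        avg = {}
        for pssm, off in zip(pssms, offsets):
            for aa, f in pssm[off + p].items():
                avg[aa] = avg.get(aa, 0.0) + f / len(pssms)
        ic += sum(f * math.log(f / bg[aa]) for aa, f in avg.items() if f > 0)
    return ic
```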

41 The PSSM-align algorithm. [figure: offsets, shared core, and the averaged matrix] 1. Extend each matrix with background (BG) frequencies. 2. Apply a random shift. 3. Do Gibbs sampling for many iterations. Maximize the combined information content of the core, computed on the averaged matrix.

42 Alignment of scoring matrices: before alignment vs. after alignment. [figure]

43 And more Gibbs sampling: clustering peptide data.

44 Gibbs clustering. Multiple motifs: cluster the sequences and align each cluster.

SLFIGLKGDIRESTV DGEEEVQLIAAVPGK VFRLKGGAPIKGVTF SFSCIAIGIITLYLG IDQVTIAGAKLRSLN WIQKETLVTFKNPHAKKQDV KMLLDNINTPEGIIP ELLEFHYYLSSKLNK LNKFISPKSVAGRFA ESLHNPYPDYHWLRT NKVKSLRILNTRRKL MMGMFNMLSTVLGVS AKSSPAYPSVLGQTI RHLIFCHSKKKCDELAAK

Cluster 1: SLFIGLKGDIRESTV-- --DGEEEVQLIAAVPGK VFRLKGGAPIKGVTF ---SFSCIAIGIITLYLG IDQVTIAGAKLRSLN-- WIQKETLVTFKNPHAKKQDV KMLLDNINTPEGIIP
Cluster 2: --ELLEFHYYLSSKLNK LNKFISPKSVAGRFA ESLHNPYPDYHWLRT NKVKSLRILNTRRKL MMGMFNMLSTVLGVS---- AKSSPAYPSVLGQTI RHLIFCHSKKKCDELAAK-

45 Gibbs clustering - the algorithm. 1. List of peptides: FIGLKGDIR EEEVQLIAA RLKGGAPIK SCIAIGIIT QVTIAGAKL QKETLVTFK LLDNINTPE LEFHYYLSS KFISPKSVA LHNPYPDYH VKSLRILNT GMFNMLSTV SSPAYPSVL LIFCHSKKK

46 Gibbs clustering - the algorithm. 1. List of peptides. 2. Create N random groups $g_1, g_2, \ldots, g_N$. [figure: the peptides distributed at random over the groups]

47 Gibbs clustering - the algorithm. 1. List of peptides. 2. Create N random groups. 3. Move a sequence: pick a random peptide (e.g. GMFNMLSTV). [figure]

48 Gibbs clustering - the algorithm. 1. List of peptides. 2. Create N random groups. 3. Move a sequence. 4b. Remove the peptide from its group I. [figure]

49 Gibbs clustering - the algorithm. 1. List of peptides. 2. Create N random groups. 3. Move a sequence. 4b. Remove the peptide from its group I. 5b. Score the peptide against a new random group R and against its original group I: $dE = S_R - S_I$.

50 Gibbs clustering - the algorithm. 1. List of peptides. 2. Create N random groups. 3. Move a sequence. 4b. Remove the peptide from its group I. 5b. Score the peptide against a new random group R and against its original group I: $dE = S_R - S_I$. 6b. Accept or reject the move: $P = \min\left[1, \exp\left(\frac{dE}{T}\right)\right]$.

51 Gibbs clustering - the algorithm. Steps as above, iterated many times while gradually decreasing T (see the sketch below).
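A sketch of one clustering move in Python; `score(pep, group)` is an assumed helper returning the log-odds score of a peptide against the matrix built from a group (with the peptide itself left out, as discussed further below):

```python
import math
import random

def gibbs_cluster_move(groups, score, T):
    """One move: pick a random peptide, remove it from its group I,
    score it against a random other group R, and accept the transfer
    with probability min(1, exp(dE / T)), where dE = S_R - S_I."""
    i = random.randrange(len(groups))
    if len(groups[i]) < 2:                     # keep groups non-empty
        return
    pep = groups[i].pop(random.randrange(len(groups[i])))
    r = random.choice([g for g in range(len(groups)) if g != i])
    dE = score(pep, groups[r]) - score(pep, groups[i])
    if dE >= 0 or random.random() < math.exp(dE / T):
        groups[r].append(pep)                  # accept the move
    else:
        groups[i].append(pep)                  # reject: restore
```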

52 Does it work? Two MHC class I alleles, HLA-A*0101 and HLA-B*4402: a mixture of 100 binders for the two alleles.

ATDKAAAAY A*0101
EVDQTKIQY A*0101
AETGSQGVY B*4402
ITDITKYLY A*0101
AEMKTDAAT B*4402
FEIKSAKKF B*4402
LSEMLNKEY A*0101
GELDRWEKI B*4402
LTDSSTLLV A*0101
FTIDFKLKY A*0101
TTTIKPVSY A*0101
EEKAFSPEV B*4402
AENLWVPVY B*4402

53 Two MHC class I alleles: HLA-A*0101 and HLA-B*4402. [figure: mixed clustering of the two alleles across groups G1 and G2]

54 Two MHC class I alleles: HLA-A*0101 and HLA-B*4402. [figure: the initially mixed groups G1 and G2 become resolved into the two allele motifs]

55 Five MHC class I alleles: A0101, A0201, A0301, B0702, B4402, clustered into groups G0 to G4. [figure]

56 Five MHC class I alleles clustered into groups G0 to G4. [table: percentage of HLA-A0101, HLA-A0201, HLA-A0301, HLA-B0702, and HLA-B4402 binders assigned to each group]

57 HLA-A*02:01 sub-motifs. 666 peptide binders (affinity < 500 nM). [figure: two sub-motifs, one with <Aff> = 10 nM and <Th> = 4 hours, the other with <Aff> = 10 nM and <Th> = 1.5 hours]

58 Splitting with Gibbs clustering. [figure: two clusters, one with <Aff> = 10 nM and <Th> = 3.5 hours, the other with <Aff> = 10 nM and <Th> = 2.25 hours]

59 Gibbs clustering. And what if we don't know a priori the number of clusters?

60 How many clusters? We could run the algorithm with different numbers of clusters k and choose the k with the highest information content.

61 How many clusters? We could run the algorithm with different numbers of clusters k and choose the k with the highest information content. What's going on? [figure]

62 How many clusters? We could run the algorithm with different numbers of clusters k and choose the k with the highest information content. What's going on? Smaller groups tend to have higher information content.

63 How many clusters? Let's look back at the energy function: $E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$

64 How many clusters? Let's look back at the energy function: $E = C \sum_{p,a} p_{p,a} \log \frac{p_{p,a}}{q_a}$. This is equivalent to scoring each sequence S against its (20 x L) matrix: $E = \sum_{p,a \in S} \log \frac{p_{p,a}}{q_a}$

65 How many clusters? Scoring each sequence S against its matrix, $E = \sum_{p,a \in S} \log \frac{p_{p,a}}{q_a}$. What is the problem? Overfitting: S was also used to calculate the log-odds matrix, and the contribution of S to the matrix will be larger if the cluster is small.


67 How many clusters? Before scoring S, remove it and update the matrix: replace $E = \sum_{p,a \in S} \log \frac{p_{p,a}}{q_a}$ with $E = \sum_{p,a \in S} \log \frac{p^{-S}_{p,a}}{q_a}$, where $p^{-S}_{p,a}$ are the frequencies calculated without S. This removes the overfitting: S no longer contributes to the log-odds matrix it is scored against.
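A sketch of leave-one-out scoring in Python; the background `q` and the small pseudocount are simplifications of mine:

```python
import math

def score_loo(seq, cluster, q):
    """Score `seq` against the frequency matrix of `cluster`, rebuilding
    the per-position frequencies with `seq` itself left out."""
    others = [s for s in cluster if s is not seq]
    E = 0.0
    for p, aa in enumerate(seq):
        c = sum(1 for s in others if s[p] == aa)
        f = (c + 0.001) / (len(others) + 0.02)  # tiny pseudocount avoids log(0)
        E += math.log(f / q[aa])
    return E
```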

68 How many clusters? Is this so important?

YQAFRTKVH SPRTLNAWV YALTVVWLL LSSIGIPAY AVAKCNLNH TPYDINQML LLMMTLPSI KELENEYYF IENATFFIF AEMLASIDL ...

$E = \sum_{p,a \in S} \log \frac{p_{p,a}}{q_a}$ (scoring without removing S) vs. $E = \sum_{p,a \in S} \log \frac{p^{-S}_{p,a}}{q_a}$ (scoring after removing S)

69 How many clusters? Is this so important? YES. [figure: score of YALTVVWLL against a matrix built including vs. excluding YALTVVWLL, as a function of the number of sequences in the cluster]

70 How many clusters? The quality of a clustering is not only determined by the information content of the individual clusters (intra-cluster distance), but also by the ability of different groups to discriminate (inter-cluster distance).

71 How many clusters? The quality of a clustering is not only determined by the information content of the individual clusters (intra-cluster distance), but also by the ability of different groups to discriminate (inter-cluster distance): $E = \sum_{p,a \in S} \log \frac{p^{-S}_{p,a}}{q^{-S}_{p,a}}$, with a position- and cluster-specific background ($q^{-S}_{p,a}$ is calculated on all groups not containing S, so it accounts for the inter-cluster distance).

72 How many clusters? One last thing and we are ready: $E = \sum_{p,a \in S} \log \frac{p^{-S}_{p,a}}{q^{-S}_{p,a}} - \lambda n$. A parameter λ modulates the tightness of the clustering (n is the number of clusters).

73 How many clusters? $E = \sum_{p,a \in S} \log \frac{p^{-S}_{p,a}}{q^{-S}_{p,a}} - \lambda n$: the frequencies $p^{-S}_{p,a}$ are calculated after removing the sequence S being scored; the background $q^{-S}_{p,a}$ is position- and cluster-specific (calculated on all groups not containing S, accounting for the inter-cluster distance); λ modulates the tightness of the clustering (n is the number of clusters).
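Putting the pieces together, the model-selection objective is then simple to express; a sketch, with λ = 0.02 taken from the next slide:

```python
def clustering_energy(per_peptide_scores, n_clusters, lam=0.02):
    """Total objective for one clustering solution: the summed
    leave-one-out, background-corrected peptide scores minus a
    penalty proportional to the number of clusters."""
    return sum(per_peptide_scores) - lam * n_clusters
```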

74 How many clusters? [figure: KLD sum vs. number of groups, for binders of 2 to 9 MHC class I alleles, with lambda = 0.02]

75 How many clusters? [figure: number of clusters found vs. number of alleles, for random allele combinations at different values of the lambda penalty]

76 In conclusion. Sampling methods can solve problems where the search space is too large to be exhaustively explored. Gibbs sampling can detect even weak motifs in a sequence alignment (e.g. MHC class II). More than 1,000 papers in PubMed use Gibbs sampling methods: transcription start sites, receptor binding sites, acceptor/donor sites, ...
