An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering

Size: px
Start display at page:

Download "An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering"

Transcription

1 An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering Dilan Görür Gatsby Unit University College London Yee Whye Teh Gatsby Unit University College London Abstract We propose an efficient sequential Monte Carlo inference scheme for the recently proposed coalescent clustering model [1]. Our algorithm has a quadratic runtime while those in [1] is cubic. In experiments, we were surprised to find that in addition to being more efficient, it is also a better sequential Monte Carlo sampler than the best in [1], when measured in terms of variance of estimated likelihood and effective sample size. 1 Introduction Algorithms for automatically discovering hierarchical structure from data play an important role in machine learning. In many cases the data itself has an underlying hierarchical structure whose discovery is of interest, examples include phylogenies in biology, object taxonomies in vision or cognition, and parse trees in linguistics. In other cases, even when the data is not hierarchically structured, such structures are still useful simply as a statistical tool to efficiently pool information across the data at different scales; this is the starting point of hierarchical modelling in statistics. Many hierarchical clustering algorithms have been proposed in the past for discovering hierarchies. In this paper we are interested in a Bayesian approach to hierarchical clustering [, 3, 1]. This is mainly due to the appeal of the Bayesian approach being able to capture uncertainty in learned structures in a coherent manner. Unfortunately, inference in Bayesian models of hierarchical clustering are often very complex to implement, and computationally expensive as well. In this paper we build upon the work of [1] who proposed a Bayesian hierarchical clustering model based on Kingman s coalescent [4, 5]. [1] proposed both greedy and sequential Monte Carlo SMC based agglomerative clustering algorithms for inferring hierarchical clustering which are simpler to implement than Markov chain Monte Carlo methods. The algorithms work by starting with each data item in its own cluster, and iteratively merge pairs of clusters until all clusters have been merged. The SMC based algorithm has computational cost On 3 per particle, where n is the number of data items. We propose a new SMC based algorithm for inference in the coalescent clustering of [1]. The algorithm is based upon a different perspective on Kingman s coalescent than that in [1], where the computations required to consider whether to merge each pair of clusters at each iteration is not discarded in subsequent iterations. This improves the computational cost to On per particle, allowing this algorithm to be applied to larger datasets. In experiments we show that our new algorithm achieves improved costs without sacrificing accuracy or reliability. Kingman s coalescent originated in the population genetics literature, and there has been significant interest there on inference, including Markov chain Monte Carlo based approaches [6] and SMC approaches [7, 8]. The SMC approaches have interesting relationship to our algorithm and to that of [1]. While ours and [1] integrate out the mutations on the coalescent tree and sample the coalescent 1

2 times, [7, 8] integrate out the coalescent times, and sample mutations instead. Because of this difference, ours and that of [1] will be more efficient in higher dimensional data, as well as other cases where the state space is too large and sampling mutations will be inefficient. In the next section, we review Kingman s coalescent and the existing SMC algorithms for inference on this model. In Section 3, we describe a cheaper SMC algorithm. We compare our method with that of [1] in Section 4 and conclude with a discussion in Section 5. Hierarchical Clustering using Kingman s Coalescent Kingman s coalescent [4, 5] describes the family relationship between a set of haploid individuals by constructing the genealogy backwards in time. Ancestral lines coalesce when the individuals share a common ancestor, and the genealogy is a binary tree rooted at the common ancestor of all the individuals under consideration. We briefly review the coalescent and the associated clustering model as presented in [1] before presenting a different formulation more suitable for our proposed algorithm. Let π be the genealogy of n individuals. There are coalescent events in π, we order these events with i = 1 being the most recent one, and i = n 1 for the last event when all ancestral lines are coalesced. Event i occurs at time T i < 0 in the past, and involves the coalescing of two ancestors, denoted ρ li and ρ ri, into one denoted ρ i. Let A i be the set of ancestors right after coalescent event i, and A 0 be the full set of individuals at the present time T 0 = 0. To draw a sample π from Kingman s coalescent we sample the coalescent events one at a time starting from the present. At iteration i we pick the pair of individuals ρ li, ρ ri uniformly at random from the n i + 1 individuals available in A i 1, pick a waiting time δ i Exp n i+1 from an exponential distribution with rate n i+1 equal to the number of pairs available, and set A i = A i 1 {ρ li, ρ ri } + {ρ i }, T i = T i 1 δ i. The probability of π is thus: pπ = n i + 1 exp δ i. 1 The coalescent can be used as a prior over binary trees in a model where we have a tree-structured likelihood function for observations at the leaves. Let θ i be the subtree rooted at ρ i and x i be the observations at the leaves of θ i. [1] showed that by propagating messages up the tree the likelihood function can be written in a sequential form: px π = Z 0 x Z ρi x i θ i, where Z ρi is a function only of the coalescent times associates with ρ li, ρ ri, ρ i and of the local messages sent from ρ li, ρ ri to ρ i, and Z 0 x is an easily computed normalization constant in eq.. Each function has the form see [1] for further details: Z ρi x i θ i = p 0 y i py c y i, θ i M ρc y c dy c dy i 3 c=l i,r i where M ρc is the message from child ρ c to ρ i. The posterior is proportional to the product of eq. 1 and eq. and our aim is to have an efficient way to compute the posterior. For this purpose, we will give a different perspective to constructing the coalescent in the following and describe our sequential Monte Carlo algorithm in Section 3..1 A regenerative race process In this section we describe a different formulation of the coalescent based on the fact that each stage of the coalescent can be interpreted as a race between the n i+1 pairs of individuals to coalesce. Each pair proposes a coalescent time, the pair with most recent coalescent time wins the race and gets to coalesce, at which point the next stage starts with n i pairs in the race. Naïvely this race process would require a total of On 3 pairs to propose coalescent times. We show that using the regenerative memoryless property of exponential distributions allows us to reduce this to On.

3 Algorithm 1 A regenerative race process for constructing the coalescent inputs: number of individuals n, set starting time T 0 = 0 and A 0 the set of n individuals for all pairs of existing individuals ρ l, ρ r A 0 do propose coalescent time t lr using eq. 4 end for for all coalescence events i = 1 : n 1 do find the pair to coalesce ρ li, ρ ri using eq. 5 set coalescent time T i = t lir i and update A i = A i 1 {ρ li, ρ ri } + {ρ i } remove pairs with ρ l {ρ li, ρ ri }, ρ r A i 1 \{ρ li, ρ ri } for all new pairs with ρ l = ρ i, ρ r A i \{ρ i } do propose coalescent time using eq. 4 end for end for The same idea will allow us to reduce the computational cost of our SMC algorithm from On 3 to On. At stage i of the coalescent we have n i + 1 individuals in A i 1, and n i+1 pairs in the race to coalesce. Each pair ρ l, ρ r A i 1, ρ l ρ r proposes a coalescent time t lr T i 1 T i 1 Exp1, 4 that is, by subtracting from the last coalescent time a waiting time drawn from an exponential distribution of rate 1. The pair ρ li, ρ ri with most recent coalescent time wins the race: } ρ li, ρ ri = argmax {t lr, ρ l, ρ r A i 1, ρ l ρ r 5 ρ l,ρ r and coalesces into a new individual ρ i at time T i = t lir i. At this point stage i + 1 of the race begins, with some pairs dropping out of the race specifically those with one half of the pair being either ρ li or ρ ri and new ones entering specifically those formed by pairing the new individual ρ i with an existing one. Among the pairs ρ l, ρ r that did not drop out nor just entered the race, consider the distribution of t lr conditioned on the fact that t lr < T i since ρ l, ρ r did not win the race at stage i. Using the memoryless property of the exponential distribution, we see that t lr T i T i Exp1, thus eq. 4 still holds and we need not redraw t lr for the stage i + 1 race. In other words, once t lr is drawn once, it can be reused for subsequent stages of the race until it either wins a race or drops out. The generative process is summarized in Algorithm 1. We obtain the probability of the coalescent π as a product over the i = 1,..., n 1 stages of the race, of the probability of each event ρ li, ρ ri wins stage i and coalesces at time T i given more recent stages. The probability at stage i is simply the probability that t lir i = T i, and that all other proposed coalescent times t lr < T i, conditioned on the fact that the proposed coalescent times t lr for all pairs at stage i are all less than T i 1. This gives: pπ = pt lir i = T i t lir i < T i 1 pt l r < T i t l r < T i 1 6 = ptlir i = T i pt lir i < T i 1 ρ l,ρ r ρ li,ρ ri ρ l,ρ r ρ li,ρ ri pt lr < T i pt lr < T i 1 where the second product runs over all pairs in stage i except the winning pair. Each pair that participated in the race has corresponding terms in eq. 7, starting at the stage when the pair entered the race, and ending with the stage when the pair either dropped out or wins the stage. As these terms cancel, eq. 7 simplifies to, pπ = pt lir i = T i ρ l {ρ li,ρ ri },ρ r A i 1\{ρ li,ρ ri } 7 pt lr < T i, 8 3

4 where the second product runs only over those pairs that dropped out after stage i. The first term is the probability of pair ρ li, ρ ri coalescing at time T i given its entrance time, and the second term is the probability of pair ρ l, ρ r dropping out of the race at time T i given its entrance time. We can verify that this expression equals eq. 1 by plugging in the probabilities for exponential distributions. Finally, multiplying the prior eq. 8 and the likelihood eq. we have, px, π = Z 0 x Z ρi x i θ i pt lir i = T i ρ l {ρ li,ρ ri }, ρ r A i 1\{ρ li,ρ ri } pt lr < T i. 9 3 Efficient SMC Inference on the Coalescent Our sequential Monte Carlo algorithm for posterior inference is directly inspired by the regenerative race process described above. In fact the algorithm is structurally exactly as in Algorithm 1, but with each pair ρ l, ρ r proposing a coalescent time from a proposal distribution t lr Q lr instead of from eq. 4. The idea is that the proposal distribution Q lr is constructed taking into account the observed data, so that Algorithm 1 produces better approximate samples from the posterior. The overall probability of proposing π under the SMC algorithm can be computed similarly to eq. 6-8, and is, qπ = q lir i t lir i = T i ρ l {ρ li,ρ ri },ρ r A i 1\{ρ li,ρ ri } q lr t lr < T i, 10 where q lr is the density of Q lr. As both eq. 9 and eq. 10 can be computed sequentially, the weight w associated with each sample π can be computed on the fly as the coalescent tree is constructed: w 0 = Z 0 x w i = w i 1 Z ρi x i θ i pt lir i = T i q lir i t lir i = T i ρ l {ρ li,ρ ri }, ρ r A i 1\{ρ li,ρ ri } pt lr < T i q lr t lr < T i. 11 Finally we address the choice of proposal distribution Q lr to use. [1] noted that Z ρi x i θ i acts as a local likelihood term in eq. 9. We make use of this observation and use eq. 4 as a local prior, i.e. the following density for the proposal distribution Q lr : q lr t lr Z ρlr x lr t lr, ρ l, ρ r, θ i 1 pt lr T clr 1 where ρ lr is a hypothetical individual resulting from coalescing l and r, T clr denotes the time when the pair ρ l, ρ r enters the race, x lr are the data under ρ l and ρ r, and pt lr T clr = e t lr T clritlr < T clr is simply an exponential density with rate 1 that has been shifted and reflected. I is an indicator function returning 1 if its argument is true, and 0 otherwise. The proposal distribution in [1] also has a form similar to eq. 1, but with the exponential rate being n i+1 instead, if the proposal was in stage i of the race. This dependence means that at each stage of the race the coalescent times proposal distribution needs to be recomputed for each pair, leading to an On 3 computation time. On the other hand, similar to the prior process, we need to propose a coalescent time for each pair only once when it is first created. This results in On computational complexity per particle 1. Note that it may not always be possible or efficient to compute the normalizing constant of the density in eq. 1 even if we can sample from it efficiently. This means that the weight updates eq. 11 cannot be computed. In that case, we can use an approximation Z ρlr to Z ρlr instead. In the following subsection we describe the independent-sites parent-independent model we used in the experiments, and how to construct Z ρlr. 1 Technically the time cost is On m + log n, where n is the number of individuals, and m is the cost of sampling from and evaluating eq. 1. The additional log n factor comes about because a priority queue needs to be maintained to determine the winner of each stage efficiently, but this is negligible compared to m. 4

5 3.1 Independent-Sites Parent-Independent Likelihood Model In our experiments we have only considered coalescent clustering of discrete data, though our approach can be applied more generally. Say each data item consists of a D dimensional vector where each entry can take on one of K values. We use the independent-sites parent-independent mutation model over multinomial vectors in [1] as our likelihood model. Specifically, this model assumes that each point on the tree is associated with a D dimensional multinomial vector, and each entry of this vector on each branch of the tree evolves independently thus independent-sites, forward in time, and with mutations occurring at rate λ d on entry d. When a mutation occurs, a new value for the entry is drawn from a distribution φ d, independently of the previous value at that entry thus parentindependent. When a coalescent event is encountered, the mutation process evolves independently down both branches. Some calculations show that the transition probability matrix of the mutation process associated with entry d on a branch of length t is e λdt I K + 1 e λdt φ d 1 K, where I K is the identity matrix, 1 K is a vector of 1 s, and we have implicitly represented the multinomial distribution φ d as a vector of probabilities. The message for entry d from node ρ i on the tree to its parent is a vector Mρ d i = [Mρ d1 i,..., Mρ dk i ], normalized so that φ d M ρ d i = 1. The local likelihood term is then: K Zρ d lr x lr t lr, ρ l, ρ r, θ i 1 = 1 e λ dt lr t l t r 1 φ dk Mρ dk l Mρ dk r 13 The logarithm of the proposal density is then: log q lr t lr = constant + t lr T clr + k=1 D log Zρ d lr x lr t lr, ρ l, ρ r, θ i 1 14 This is not of standard form, and we use an approximation log q lr t lr instead. Specifically, we use a piecewise linear log q lr t lr, which can be easily sampled from, and for which the normalization term is easy to compute. The approximation is constructed as follows. Note that log Z d ρ lr x lr t lr, ρ l, ρ r, θ i 1, as a function of t lr, is concave if the term inside the parentheses in eq. 13 is positive, convex if negative, and constant if zero. Thus eq. 14 is a sum of linear, concave and convex terms. Using the upper and lower envelopes developed for adaptive rejection sampling [9], we can construct piecewise linear upper and lower envelopes for log q lr t lr by upper and lower bounding the concave and convex parts separately. The upper and lower envelopes give exact bounds on the approximation error introduced, and we can efficiently improve the envelopes until a given desired approximation error is achieved. Finally, we used the upper bound as our approximate log q lr t lr. Note that the same issue arises in the proposal distribution for SMC-PostPost, and we used the same piecewise linear approximation. The details of this algorithm can be found in [10]. 4 Experiments The improved computational cost of inference makes it possible to do Bayesian inference for the coalescence models on larger datasets. The SMC samplers converge to the exact solution in the limit of infinite particles. However, it is not enough to be more efficient per particle, the crucial point is how efficient the algorithm is overall. An important question is how many particles we need in practice. To address this question, we compared the performance of our algorithm SMC1 to SMC-PostPost on the synthetic data shown in Figure 1. There are 15 binary 1-dimensional vectors in the dataset. There is overlap between the features of the data points however the data does not obey a tree structure, which will result in a multimodal posterior. Both SMC1 and SMC-PostPost recover the structure with only a few particles. However there is room for improvement as the variance in the likelihood obtained from multiple runs decreases with increasing number of particles. Since both SMC algorithms are exact in the limit, the values should converge as we add more particles. We can check convergence by observing the variance of likelihood estimates of multiple runs. The variance The comparison is done in the importance sampling setting, i.e. without using resampling for comparison of the proposal distributions. d=1 5

6 a b Figure 1: Synthetic data features is shown on the left; each data point is a binary column vector. A sample tree from the SMC1 algorithm demonstrate that the algorithm could capture the similarity structure. The true covariance of the data a and the distance on the tree learned by the SMC1 algorithm averaged over particles b are shown, showing that the overall structure was corerctly captured. The results obtained from SMC-PostPost were very similar to SMC1 therefore are not shown here. should shrink as we increase the number of particles. Figure shows the change in the estimated likelihood as a function of number of particles. From this figure, we can conclude that the computationally cheaper algorithm SMC1 is more efficient also in the number of particles as it gives more accurate answers with less particles. 7 x likelihood effective sample size number of particles, 7 runs each number of particles, 7 runs each Figure : The change in the likelihood left and the effective sample size right as a function of number of particles for SMC1 solid and SMC-PostPost dashed. The mean estimate of both algorithms are very close, with the SMC1 having a much tighter variance. The variance of both algorithms shrink and the effective sample size increases as the number of particles increase. A quantity of interest in genealogical studies is the time to the most recent common ancestor MRCA, which is the time of the last coalescence event. Although there is not a physical interpretation of this quantity for hierarchical clustering, it gives us an indication about the variance of the particles. We can observe the variation in the time to MRCA to assess convergence. Similar to the variance behaviour in the likelihood, with small number of particles SMC-PostPost has higher variance than SMC1. However, as there are more particles, results of the two algorithms almost overlap. The mean time for each step of coalescence together with its variance for 750 particles for both algorithms is depicted in Figure 3. It is interesting that the first few coalescence times of SMC1 are shorter than those for SMC-PostPost. The distribution of the particle weights is important for the efficiency of the importance sampler. Ideally, the weights would be uniform such that each particle contributes equally to the posterior estimation. If there is only a few particles that come from a high probability region, the weights of those particles would be much larger than 6

7 smc1 smcpostpost Figure 3: Times for each coalescence step averaged over 750 particles. Note that both algorithms almost converged at the same distribution when given enough resources. There is a slight difference in the mean coalescence time. It is interesting that the SMC1 algorithm proposes shorter times for the initial coalescence events. the rest, resulting in a low effective sample size. We will discuss this point more in the next section. Here, we note that for the synthetic dataset, the effective sample size of SMC-PostPost is very poor, and that of SMC1 is much higher, see Figure. 5 Discussion We described an efficient Sequential Monte Carlo algorithm for inference on hierarchical clustering models that use Kingman s coalescent as a proir. Our method makes use of a regenerative perspective to construct the coalescent tree. Using this construction, we achieve quadratic run time per particle. By employing a tight upper bound on the local likelihood term, the proposed algorithm is applicable to general data generation processes. We also applied our algorithm for inferring the structure in the phylolinguistic data used in [1]. We used the same Indo-European subset of the data, with the same subset of features, that is 44 languages with 100 binary features. Three example trees with the largest weights out of 7750 samples are depicted in Figure 4. Unfortunately, on this dataset, the effective sample size of both algorithms is close to one. A usual method to circumvent the low effective sample size problem in sequential Monte Carlo algorithms is to do resampling, that is, detecting the particles that will not contribute much to the posterior from the partial samples and prune them away, multiplying the promising samples. There are two stages to doing resampling. We need to decide at what point to prune away samples, and how to select which samples to prune away. As shown by [11], different problems may require different resampling algorithms. We tried resampling using Algorithm 5.1 of [1], however this only had a small improvement in the final performance for both algorithms on this data set. Note that both algorithms use local likelihoods for calculating the weights, therefore the weights are not fully informative about the actual likelihood of the partial sample. Furthermore, in the recursive calculation of the weights in SMC1, we are including the effect of a pair only when they either coalesce or cease to exist for the sake of saving computations. Therefore the partial weights are even less informative about the state of the sample and the effective sample size cannot really give full explanation about whether the current sample is good or not. In fact, we did observe oscillations on the effective sample size calculated on the weights along the iterations, i.e. starting off with a high value, decreasing to virtually 1 and increasing later before the termination, which also indicates that it is not clear which of the particles will be more effective eventually. An open question is how to incorporate a resampling algorithm to improve the efficiency. References [1] Y. W. Teh, H. Daume III, and D. M. Roy. Bayesian agglomerative clustering with coalescents. In Advances in Neural Information Processing Systems, volume 0,

8 a no resampling Slavic Russian Slavic Ukrainian Slavic Polish Slavic Slovene Slavic Serbian Baltic Lithuanian Baltic Latvian Slavic Czech Celtic Irish Celtic Gaelic Celtic Welsh Celtic Cornish Celtic Breton Germanic Icelandic Germanic English Germanic German Germanic Dutch Germanic Norwegian Germanic Swedish Germanic Danish Romance Spanish Romance Portuguese Romance French Romance Romanian Romance Italian Romance Catalan Greek Greek Slavic Bulgarian Albanian Albanian Indic Kashmiri Iranian Pashto Indic Panjabi Indic Hindi Indic Nepali Iranian Ossetic Indic Maithili Indic Sinhala Indic Marathi Indic Bengali Iranian Tajik Iranian Persian Iranian Kurdish Armenian Armenian W Armenian Armenian E b no resampling Iranian Tajik Iranian Persian Iranian Kurdish Celtic Irish Celtic Gaelic Celtic Welsh Celtic Cornish Celtic Breton Armenian Armenian E Iranian Ossetic Indic Nepali Indic Marathi Indic Maithili Iranian Pashto Indic Panjabi Indic Hindi Indic Kashmiri Indic Bengali Indic Sinhala Armenian Armenian W Germanic Icelandic Germanic English Germanic German Germanic Dutch Germanic Swedish Germanic Norwegian Germanic Danish Romance Spanish Romance Italian Romance Catalan Greek Greek Slavic Bulgarian Romance Portuguese Romance French Romance Romanian Albanian Albanian Baltic Latvian Slavic Polish Slavic Ukrainian Slavic Czech Baltic Lithuanian Slavic Russian Slavic Slovene Slavic Serbian c with resampling Slavic Polish Baltic Latvian Slavic Czech Celtic Irish Celtic Gaelic Celtic Welsh Celtic Breton Greek Greek Slavic Bulgarian Iranian Tajik Iranian Persian Iranian Kurdish Indic Sinhala Indic Kashmiri Indic Bengali Iranian Ossetic Iranian Pashto Indic Nepali Indic Marathi Indic Maithili Indic Panjabi Indic Hindi Armenian Armenian W Armenian Armenian E Germanic Icelandic Germanic English Germanic German Germanic Dutch Germanic Norwegian Germanic Swedish Germanic Danish Slavic Serbian Slavic Slovene Slavic Ukrainian Slavic Russian Baltic Lithuanian Romance Romanian Celtic Cornish Albanian Albanian Romance Spanish Romance Portuguese Romance French Romance Italian Romance Catalan Figure 4: Tree structure infered from WALS data. a,b Samples from a run with 7750 particles without resampling. c Sample from a run with resampling. The values above the trees are normalized weights. Note that the weight of a is almost one, which means that the contribution from the rest of the particles is infinitesimal although the tree structure in b also seem to capture the similarities between languages. [] R. M. Neal. Defining priors for distributions using Dirichlet diffusion trees. Technical Report 0104, Department of Statistics, University of Toronto, 001. [3] C. K. I. Williams. A MCMC approach to hierarchical mixture modelling. In Advances in Neural Information Processing Systems, volume 1, 000. [4] J. F. C. Kingman. On the genealogy of large populations. Journal of Applied Probability, 19:7 43, 198. Essays in Statistical Science. [5] J. F. C. Kingman. The coalescent. Stochastic Processes and their Applications, 13:35 48, 198. [6] J. Felsenstein. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17: , [7] R. C. Griffiths and S. Tavare. Simulating probability distributions in the coalescent. Theoretical Population Biology, 46: , [8] M. Stephens and P. Donnelly. Inference in molecular population genetics. Journal of the Royal Statistical Society, 6: , 000. [9] W.R. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41: , 199. [10] D. Görür and Y.W. Teh. Concave convex adaptive rejection sampling. Technical report, Gatsby Computational Neuroscience Unit, 008. [11] Y. Chen, J. Xie, and J. Liu. Stopping-time resampling for sequential monte carlo methods. Journal of the Royal Statistical Society, 67, 005. [1] P. Fearnhead. Sequential Monte Carlo Method in Filter Theory. PhD thesis, Merton College, University of Oxford,

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Adaptive Rejection Sampling with fixed number of nodes

Adaptive Rejection Sampling with fixed number of nodes Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, Brazil. Abstract The adaptive rejection sampling

More information

Adaptive Rejection Sampling with fixed number of nodes

Adaptive Rejection Sampling with fixed number of nodes Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, São Carlos (São Paulo). Abstract The adaptive

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Bioinformatics with pen and paper: building a phylogenetic tree

Bioinformatics with pen and paper: building a phylogenetic tree Image courtesy of hometowncd / istockphoto Bioinformatics with pen and paper: building a phylogenetic tree Bioinformatics is usually done with a powerful computer. With help from Cleopatra Kozlowski, however,

More information

Bayesian Nonparametrics: Dirichlet Process

Bayesian Nonparametrics: Dirichlet Process Bayesian Nonparametrics: Dirichlet Process Yee Whye Teh Gatsby Computational Neuroscience Unit, UCL http://www.gatsby.ucl.ac.uk/~ywteh/teaching/npbayes2012 Dirichlet Process Cornerstone of modern Bayesian

More information

Parsimonious Adaptive Rejection Sampling

Parsimonious Adaptive Rejection Sampling Parsimonious Adaptive Rejection Sampling Luca Martino Image Processing Laboratory, Universitat de València (Spain). Abstract Monte Carlo (MC) methods have become very popular in signal processing during

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Tree-Based Inference for Dirichlet Process Mixtures

Tree-Based Inference for Dirichlet Process Mixtures Yang Xu Machine Learning Department School of Computer Science Carnegie Mellon University Pittsburgh, USA Katherine A. Heller Department of Engineering University of Cambridge Cambridge, UK Zoubin Ghahramani

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Modelling Genetic Variations with Fragmentation-Coagulation Processes

Modelling Genetic Variations with Fragmentation-Coagulation Processes Modelling Genetic Variations with Fragmentation-Coagulation Processes Yee Whye Teh, Charles Blundell, Lloyd Elliott Gatsby Computational Neuroscience Unit, UCL Genetic Variations in Populations Inferring

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Learning Energy-Based Models of High-Dimensional Data

Learning Energy-Based Models of High-Dimensional Data Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal

More information

Fast MCMC sampling for Markov jump processes and extensions

Fast MCMC sampling for Markov jump processes and extensions Fast MCMC sampling for Markov jump processes and extensions Vinayak Rao and Yee Whye Teh Rao: Department of Statistical Science, Duke University Teh: Department of Statistics, Oxford University Work done

More information

Sequential Monte Carlo Methods for Bayesian Computation

Sequential Monte Carlo Methods for Bayesian Computation Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter

More information

A Brief Overview of Nonparametric Bayesian Models

A Brief Overview of Nonparametric Bayesian Models A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

Dublin City University (DCU) GCS/GCSE Recognised Subjects

Dublin City University (DCU) GCS/GCSE Recognised Subjects Dublin City University (DCU) GCS/GCSE Recognised Subjects Subjects listed below are recognised for entry to DCU. Unless otherwise indicated only one subject from each group may be presented. Applied A

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

28 : Approximate Inference - Distributed MCMC

28 : Approximate Inference - Distributed MCMC 10-708: Probabilistic Graphical Models, Spring 2015 28 : Approximate Inference - Distributed MCMC Lecturer: Avinava Dubey Scribes: Hakim Sidahmed, Aman Gupta 1 Introduction For many interesting problems,

More information

Bayesian Hidden Markov Models and Extensions

Bayesian Hidden Markov Models and Extensions Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling

More information

MCMC for non-linear state space models using ensembles of latent sequences

MCMC for non-linear state space models using ensembles of latent sequences MCMC for non-linear state space models using ensembles of latent sequences Alexander Y. Shestopaloff Department of Statistical Sciences University of Toronto alexander@utstat.utoronto.ca Radford M. Neal

More information

Approximate inference in Energy-Based Models

Approximate inference in Energy-Based Models CSC 2535: 2013 Lecture 3b Approximate inference in Energy-Based Models Geoffrey Hinton Two types of density model Stochastic generative model using directed acyclic graph (e.g. Bayes Net) Energy-based

More information

Bisection Ideas in End-Point Conditioned Markov Process Simulation

Bisection Ideas in End-Point Conditioned Markov Process Simulation Bisection Ideas in End-Point Conditioned Markov Process Simulation Søren Asmussen and Asger Hobolth Department of Mathematical Sciences, Aarhus University Ny Munkegade, 8000 Aarhus C, Denmark {asmus,asger}@imf.au.dk

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Bayesian Nonparametric Regression for Diabetes Deaths

Bayesian Nonparametric Regression for Diabetes Deaths Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Bayesian nonparametric models for bipartite graphs

Bayesian nonparametric models for bipartite graphs Bayesian nonparametric models for bipartite graphs François Caron Department of Statistics, Oxford Statistics Colloquium, Harvard University November 11, 2013 F. Caron 1 / 27 Bipartite networks Readers/Customers

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences Indian Academy of Sciences A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences ANALABHA BASU and PARTHA P. MAJUMDER*

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes MAD-Bayes: MAP-based Asymptotic Derivations from Bayes Tamara Broderick Brian Kulis Michael I. Jordan Cat Clusters Mouse clusters Dog 1 Cat Clusters Dog Mouse Lizard Sheep Picture 1 Picture 2 Picture 3

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Density Modeling and Clustering Using Dirichlet Diffusion Trees

Density Modeling and Clustering Using Dirichlet Diffusion Trees BAYESIAN STATISTICS 7, pp. 619 629 J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West (Eds.) c Oxford University Press, 2003 Density Modeling and Clustering

More information

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance

More information

Infinite Mixtures of Gaussian Process Experts

Infinite Mixtures of Gaussian Process Experts in Advances in Neural Information Processing Systems 14, MIT Press (22). Infinite Mixtures of Gaussian Process Experts Carl Edward Rasmussen and Zoubin Ghahramani Gatsby Computational Neuroscience Unit

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007 Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks

Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks Vinayak Rao Gatsby Computational Neuroscience Unit University College London vrao@gatsby.ucl.ac.uk Yee Whye Teh Gatsby

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

Lecture 3a: Dirichlet processes

Lecture 3a: Dirichlet processes Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

New Insights into History Matching via Sequential Monte Carlo

New Insights into History Matching via Sequential Monte Carlo New Insights into History Matching via Sequential Monte Carlo Associate Professor Chris Drovandi School of Mathematical Sciences ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Infinite Hierarchical Hidden Markov Models

Infinite Hierarchical Hidden Markov Models Katherine A. Heller Engineering Department University of Cambridge Cambridge, UK heller@gatsby.ucl.ac.uk Yee Whye Teh and Dilan Görür Gatsby Unit University College London London, UK {ywteh,dilan}@gatsby.ucl.ac.uk

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Haupthseminar: Machine Learning. Chinese Restaurant Process, Indian Buffet Process

Haupthseminar: Machine Learning. Chinese Restaurant Process, Indian Buffet Process Haupthseminar: Machine Learning Chinese Restaurant Process, Indian Buffet Process Agenda Motivation Chinese Restaurant Process- CRP Dirichlet Process Interlude on CRP Infinite and CRP mixture model Estimation

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information

Infinite Latent Feature Models and the Indian Buffet Process

Infinite Latent Feature Models and the Indian Buffet Process Infinite Latent Feature Models and the Indian Buffet Process Thomas L. Griffiths Cognitive and Linguistic Sciences Brown University, Providence RI 292 tom griffiths@brown.edu Zoubin Ghahramani Gatsby Computational

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Understanding Covariance Estimates in Expectation Propagation

Understanding Covariance Estimates in Expectation Propagation Understanding Covariance Estimates in Expectation Propagation William Stephenson Department of EECS Massachusetts Institute of Technology Cambridge, MA 019 wtstephe@csail.mit.edu Tamara Broderick Department

More information

Expectation Propagation Algorithm

Expectation Propagation Algorithm Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Doing Bayesian Integrals

Doing Bayesian Integrals ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to

More information

State-Space Methods for Inferring Spike Trains from Calcium Imaging

State-Space Methods for Inferring Spike Trains from Calcium Imaging State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Part IV: Monte Carlo and nonparametric Bayes

Part IV: Monte Carlo and nonparametric Bayes Part IV: Monte Carlo and nonparametric Bayes Outline Monte Carlo methods Nonparametric Bayesian models Outline Monte Carlo methods Nonparametric Bayesian models The Monte Carlo principle The expectation

More information

Stopping-Time Resampling for Sequential Monte Carlo Methods

Stopping-Time Resampling for Sequential Monte Carlo Methods Stopping-Time Resampling for Sequential Monte Carlo Methods Yuguo Chen and Junyi Xie Duke University, Durham, USA Jun S. Liu Harvard University, Cambridge, USA July 27, 2004 Abstract Motivated by the statistical

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

Another Walkthrough of Variational Bayes. Bevan Jones Machine Learning Reading Group Macquarie University

Another Walkthrough of Variational Bayes. Bevan Jones Machine Learning Reading Group Macquarie University Another Walkthrough of Variational Bayes Bevan Jones Machine Learning Reading Group Macquarie University 2 Variational Bayes? Bayes Bayes Theorem But the integral is intractable! Sampling Gibbs, Metropolis

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods

Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods Jonas Hallgren 1 1 Department of Mathematics KTH Royal Institute of Technology Stockholm, Sweden BFS 2012 June

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Inferring biological dynamics Iterated filtering (IF)

Inferring biological dynamics Iterated filtering (IF) Inferring biological dynamics 101 3. Iterated filtering (IF) IF originated in 2006 [6]. For plug-and-play likelihood-based inference on POMP models, there are not many alternatives. Directly estimating

More information

An Introduction to Bayesian Machine Learning

An Introduction to Bayesian Machine Learning 1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems

More information

Clustering with k-means and Gaussian mixture distributions

Clustering with k-means and Gaussian mixture distributions Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Clustering Finding a group structure in the data Data in one cluster similar to

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

Bayesian Nonparametrics for Speech and Signal Processing

Bayesian Nonparametrics for Speech and Signal Processing Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer

More information